The Claude Code Source Leak: fake tools, frustration regexes, undercover mode - https://news.ycombinator.com/item?id=47586778 - March 2026 (406 comments)
Claude Code's source code has been leaked via a map file in their NPM registry - https://news.ycombinator.com/item?id=47584540 - March 2026 (956 comments)
Also related: https://www.ccleaks.com
If you don't have a rigid, external state machine governing the workflow, you have to brute-force reliability. That codebase bloat is likely 90% defensive programming: frustration regexes, context sanitizers, tool-retry loops, and state rollbacks, all just to stop the agent from drifting or silently breaking things.
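A minimal sketch of the "frustration regex plus retry loop" pattern being described (all names and patterns here are illustrative guesses, not from the leaked code):

```python
import re

# Detect a drifting/refusing model reply and retry with a nudge,
# instead of letting the bad output propagate into the workflow.
FRUSTRATION_PATTERNS = [
    re.compile(r"I (?:can't|cannot|am unable to)", re.I),
    re.compile(r"as an AI", re.I),
]

def run_with_retries(call_model, prompt: str, max_retries: int = 3) -> str:
    for attempt in range(max_retries):
        reply = call_model(prompt)
        if not any(p.search(reply) for p in FRUSTRATION_PATTERNS):
            return reply
        # Roll back and nudge: the defensive-programming part.
        prompt = f"{prompt}\n(Previous attempt refused; try again concretely.)"
    raise RuntimeError("model kept drifting after retries")

# Toy model that refuses once, then answers.
replies = iter(["I can't do that.", "done: patch applied"])
result = run_with_retries(lambda p: next(replies), "apply the patch")
```

Multiply this pattern across every tool, every output format, and every failure mode, and the line count adds up fast.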
The visual map is great, but from an architectural perspective, we're still herding cats with massive code volume instead of actually governing the agents at the system level.
My takeaway from looking at the tool list is that they got the fundamental architecture right - try to create a very simple and general set of tools on the client-side (e.g. read file, output rich text, etc) so that the server can innovate rapidly without revving the client (and also so that if, say, the source code leaks, none of the secret sauce does).
Overall, when I see this I think they are focused on the right issues, and I think their tool list looks pretty simple/elegant/general. I picture the server team constantly thinking: we have these client-side tools/APIs, how can we use them optimally? How can we get more out of them? That is where the secret sauce lives.
If you can download it client side you can likely place a copy in a folder and ask claude
‘decompile the app in this folder to answer further questions on how it works. As an example first question, explain what happens when a user does X’.
I do this with obscure video games where I want a guide on how some mechanics work. E.g. https://pastes.io/jagged-all-69136 as a result of a session.
It can ruin some games, but despite the possibility of hallucinations I find it waaay more reliable than random internet answers.
Works for apps too. Obfuscation doesn’t seem to stop it.
What is the naive implementation you're comparing against? Ssh access to the client machine?
If it doesn't deliver on the promise we have bigger problems than "oh no the code is insecure". We went from "I think this will work" to "this has to work because if it doesn't we have one of those 'you owe the bank a billion dollars' situations"
If they fail, doesn't software and the giant companies that make it go back to owning the world?
As I’m reading this, I’m thinking about how, in 1980, it was imagined that everyone needed to learn how to program in BASIC or COBOL, and that the way computers would become ubiquitous would be that everybody would be writing programs for them. That turned out to be a quaint and optimistic idea.
It seems like the pitch today is that every company that has a software-like need will be able to use AI to manifest that software into existence, or more generally, to manifest some kind of custom solution into existence. I don’t buy it. Coding the software has never been the true bottleneck, anyone who’s done a hackathon project knows that part can be done quickly. It’s the specifying and the maintenance that is the hard part.
To me, the only way this will actually bear the fruit it’s promising is if they can deliver essentially AGI in a box. A company will pay to rent some units of compute that they can speak to like a person and describe the needs, and it will do anything - solve any problem - a remote worker could do. IF this is delivered, indeed it does invalidate virtually all business models overnight, as whoever hits AGI will price this rental X%[1] below what it would cost to hire humans for similar work, breaking capitalism entirely.
[1] X = 80% below on day 1 as they’ll be so flush with VC cash, and they’d plan to raise the price later. Of course, society will collapse before then because of said breaking of capitalism itself.
It's one thing to give Claude a narrow task with clear parameters, and another to watch errors or incorrect assumptions snowball as you have a more complex conversation or open-ended task.
Their demo even says:
`Paste any code or text below. Our model will produce an AI-generated, byte-for-byte identical output.`
Unless this is a parody site, can you explain what I am missing here? Token echoing isn't even to the lexeme/pattern level, and not even close to WSD, Ogden's Lemma, symbol-grounding etc...
The intentionally 'probably approximately correct' statistical learning model fundamentally limits reproducibility for PAC/statistical methods like transformers.
CFG inherent ambiguity == post correspondence problem == halting problem == open-domain frame problem == system identification problem == symbol-grounding problem == Entscheidungsproblem
The only way to get around that is to construct a grammar that isn't. It will never exist for CFGs, programs, types, etc... with arbitrary input.
I just don't see why placing a `14-billion parameter identity transformer` that just basically echos tokens is a step forward on what makes the problem hard.
Please help me understand.
Can you expand on this?
My experience is they require excessive steering but do not “break”
I don't know where you get this. You should ask folks at Meta. They are probably the biggest and happiest users of CC.
System-level governance means the LLM is completely stripped of orchestration rights. It becomes a stateless, untrusted function. The state lives in a rigid, external database (like SQLite). The database dictates the workflow, hands the LLM a highly constrained task, and runs external validation on the output before the state is ever allowed to advance. The LLM cannot unilaterally decide a task is done.
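A toy sketch of what that architecture looks like, using SQLite from the standard library (the schema, task names, and validator are hypothetical, purely for illustration):

```python
import sqlite3

# The workflow lives in the database; the LLM is a stateless,
# untrusted function handed one constrained task at a time.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, step TEXT, status TEXT)")
db.execute("INSERT INTO tasks (step, status) VALUES ('write_patch', 'pending')")

def call_llm(step: str) -> str:
    # Stand-in for the real model call.
    return f"output for {step}"

def validate(step: str, output: str) -> bool:
    # External check (tests, linters, schema) the LLM cannot bypass.
    return output.startswith("output for")

# The database, not the model, drives the loop.
task_id, step, _ = db.execute(
    "SELECT id, step, status FROM tasks WHERE status = 'pending'").fetchone()
output = call_llm(step)
# Only the validator, never the LLM, decides the task is done.
new_status = "done" if validate(step, output) else "pending"
db.execute("UPDATE tasks SET status = ? WHERE id = ?", (new_status, task_id))
status = db.execute("SELECT status FROM tasks WHERE id = ?", (task_id,)).fetchone()[0]
```

The point is the direction of control: state advances only through the external validator, so the model can never unilaterally mark its own work complete.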
I got so frustrated with the former while working on a complex project that I paused it to build a CLI to enforce the latter. Planning to drop a Show HN for it later today, actually.
This sounds like where lat.md[0] is headed. Only thing is it doesn't do task constraint. Generally I find the path these tools are taking interesting.
edit: Also, it seems like people's replies are getting downvoted to hell, marked as dead, and disappearing. Someone must not like your idea :-)
So what specifically is the gripe? If it works, it works right?
Considering what the entire system ends up being capable of, 500k lines is about 0.001% of what I would have expected something like that to require 10 years ago.
You can combine that with all the training and inference code, and at the end of the day, a system that literally writes code ends up being smaller than the LibreOffice codebase.
It boggles the mind, really.
https://github.com/badlogic/pi-mono/tree/main/packages/codin...
You really need to compare it to the model weights though. That’s the “code”.
Is that the case? I'm pretty sure Claude Code is one of the most massively successful pieces of software made in the last decade. I don't know how that proves your point. Will this codebase become unmanageable eventually? Maybe, but literally every agent harness out there is just copying their lead at this point.
The fact that the industry is copying a 500k-line harness is the problem. We're automating security vulnerabilities at scale because people are trying to put the guardrails inside the probabilistic code instead of strictly above it.
Standardizing on half a million lines of defensive spaghetti is a huge liability.
For starters, CC's TUI is React-based.
" Most people's mental model of Claude Code is that "it's just a TUI" but it should really be closer to "a small game engine".
For each frame our pipeline constructs a scene graph with React then -> layouts elements -> rasterizes them to a 2d screen -> diffs that against the previous screen -> finally uses the diff to generate ANSI sequences to draw
We have a ~16ms frame budget so we have roughly ~5ms to go from the React scene graph to ANSI written. "
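The diff-to-ANSI step in that quoted pipeline can be sketched in a few lines (a simplification: real layout/rasterization is far more involved, and these escape sequences are just the standard CSI cursor-move and erase-line codes):

```python
def diff_to_ansi(prev: list[str], curr: list[str]) -> str:
    """Return ANSI escape sequences that update a terminal showing
    `prev` so that it displays `curr`, redrawing only changed rows."""
    out = []
    for row, line in enumerate(curr):
        if row >= len(prev) or prev[row] != line:
            # CSI row;1H moves the cursor (1-indexed); CSI K clears the line.
            out.append(f"\x1b[{row + 1};1H\x1b[K{line}")
    return "".join(out)

prev_frame = ["> hello", "status: idle   "]
curr_frame = ["> hello", "status: running"]
seq = diff_to_ansi(prev_frame, curr_frame)
# Only the changed second row is rewritten; the first row emits nothing.
```

Diffing against the previous frame is what keeps a full-redraw architecture within a terminal's bandwidth, which is why the quote compares it to a game engine's render loop.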
Claude Code CLI is actually horrible: it's a full headless-browser-style rendering that's then converted in real time to text to show in the terminal. And that fact leaks to the user: when the model outputs ASCII, the converter will happily convert it to Unicode (no later than yesterday there was a TFA complaining about Unicode characters breaking Unix pipes / parsers expecting ASCII commands).
It's ultra annoying during debugging sessions (that is not when in a full agentic loop where it YOLOs a solution): you can't easily cut/paste from the CLI because the output you get is not what the model did output.
Mega, mega, mega annoying.
What should be something simple becomes a rube-goldberg machinery that, of course, fucks up something fundamental: converting the model's characters to something else is just pathetically bad.
Anyone from Anthropic reading? Get your shit together: if you keep this "headless browser rendering converted to text", at least do not fucking modify the characters.
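The user-side workaround implied here is a reverse mapping: fold the common Unicode substitutions back to ASCII before pasting into a pipe (the mapping below covers the usual suspects; it is an illustrative sketch, not an exhaustive table):

```python
# Characters that typically replace ASCII in "prettified" output,
# mapped back so copied commands survive ASCII-only parsers.
UNICODE_TO_ASCII = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-",                  # en dash
    "\u2014": "--",                 # em dash
    "\u2026": "...",                # ellipsis
    "\u00a0": " ",                  # non-breaking space
}

def to_ascii(text: str) -> str:
    return "".join(UNICODE_TO_ASCII.get(ch, ch) for ch in text)

# A command whose quotes and dash were "prettified" by the renderer:
fixed = to_ascii("echo \u201chello\u201d \u2013n")
```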
- Dark mode design with lots of colors
- Buttons that have vibrant, bright borders and duller backgrounds
- Excessive (IMO) usage of monospace fonts for stylistic reasons
None of this proves that it's AI (the other comments have covered that) but in my experience it's always correct.
Isn't it a simple REPL with some tools and integrations, written in a very high level language? How the hell is it so big? Is it because it's vibecoded and LLMs strive for bloat, or is it meaningful complexity?
- Opencode (anomalyco/opencode) is about 670k LOC
- Codex (openai/codex) is about 720k LOC
- Gemini (google-gemini/gemini-cli) is about 570k LOC
Claude Code's 500k LOC doesn't seem out of the ordinary.
Aren't all the other products also vibe-coded? "All vibe-coded products look like this" doesn't really seem to answer the question "Why is it so damn large?"
It's a repl, that calls out to a blackbox/endpoint for data, and does basic parsing and matching of state with specific actions.
I feel the bulk of those lines should be actions that are performed. Either this is correct or this is not:
1. If the bulk of those lines implement specific and simple actions, why is it so large compared to other software that implements single actions (coreutils, etc)
2. If the actions constitute only a small part of the codebase, wtf is the rest of it doing?
> You're complaining about vibe coding while also complaining about how you "feel" about the code. Do you see the irony in that?
Where did I complain about how I feel about the actual code? I have feelings, negative ones, about the size of the code given the simple functionality it has, but I have no feelings on the code because I did not look at the code.
For comparison, it takes me less time to load Chrome and go to gemini.google.com.
Yeahhh strong disagree there, I find Codex and CC to be buggy as hell. Desktop CC is very bad and web version is nigh unusable.
I think a lot of the people praising Claude & co are on Macs.
edit: Claude is actually 395K (TS). So Gemini is more bloated. Codex is arguable since it's written in a lower-level language.
I'm not saying that this is necessarily too much, I'm genuinely asking if this is a bloat or if it's justified.
I doubt it needs to be more than 20-50kloc.
You can create a full 3D game with a custom 3D engine in 500k lines. What the hell is Claude Code doing?
For example, I found an implementation of a PRNG, mulberry32 [1], in one of the files. That's pretty strange considering TS and Javascript have decent PRNGs built into the language and this thing is being used as literally just a shuffle.
[1] https://github.com/AprilNEA/claude-code-source/blob/main/src...
If you search mulberry32 in the code, you'll see they use it for a deterministic random. They use your user ID to always pick the same random buddy. Just like you might use someone's user ID to always generate the same random avatar.
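A rough Python port of mulberry32 (the original is ~10 lines of JS; the `pick_buddy` helper and its toy hash are hypothetical, just to illustrate the seeded-by-user-ID use described above):

```python
def mulberry32(seed: int):
    """32-bit PRNG: the same seed always yields the same sequence."""
    state = seed & 0xFFFFFFFF

    def rand() -> float:
        nonlocal state
        state = (state + 0x6D2B79F5) & 0xFFFFFFFF
        t = state
        t = ((t ^ (t >> 15)) * (t | 1)) & 0xFFFFFFFF
        inner = ((t ^ (t >> 7)) * (t | 61)) & 0xFFFFFFFF
        t = t ^ ((t + inner) & 0xFFFFFFFF)
        return ((t ^ (t >> 14)) & 0xFFFFFFFF) / 2**32

    return rand

# Deterministic pick: hashing a user ID to a seed means the same
# user always gets the same "random" buddy.
buddies = ["crab", "lobster", "shrimp", "krill"]

def pick_buddy(user_id: str) -> str:
    seed = sum(ord(c) for c in user_id)  # toy hash for the sketch
    return buddies[int(mulberry32(seed)() * len(buddies))]
```

This is also why the built-in `Math.random` wouldn't do: it can't be seeded, so it can't reproduce the same pick for the same user across sessions.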
So that's 10 lines of code accounted for. Any other examples?
What every developer learns during their “psh i could build that” weekendware attempt is that there is infinite polish to be had, and that their 20k loc PoC was <1% of the work.
That said, doesn't TFA show you what they use their loc for?
It's a completely different domain, e.g. very different integration surface area and abstractions.
Claude Code's source is dumped online so there's probably a more concrete analysis to be had than "that sounds like too many loc".
Also, a AAA game (with the engine), with physics, networking, and rendering code, is up there among the most complex pieces of software.
For example, without looking at the code, the superstition also works in the opposite direction: Claude Code is an interface to using AI to do any computer task while a 3D game just lets you shoot some bad guys, so surely the 3D game must be done in fewer loc. That's equally unsatisfying.
You'd have to be more concrete than "sounds like a lot".
Shouldn't interfaces be smaller than the implementation?
A GUI/client can be arbitrarily more or less complex than the things it's GUI'ing.
Claude Code is quite literally a wrapper around a few APIs. At one point it needed 68GB of RAM to run and requires 11ms to "lay a scene graph" to display a few hundred characters on screen. All links here: https://news.ycombinator.com/item?id=47598488
> while a 3D game just lets you shoot some bad guys, so surely the 3D game must be done in fewer loc.
Yes, most games should be done in fewer loc
A cursory glance at the codebase shows that it's not just a wrapper around a few APIs.
As for the Z-machine interpreter (V3 games at least, enough for the mentioned example): even a Game Boy, the one used to play Pokemon Red/Blue - and Crystal/Silver, on just slightly better specs than the OG GB - can run Tristam Island, with keypad-based input picking either selected words from the text or letter by letter, as when you name a character in an RPG. A damn Game Boy, a pocket console from 1989. Not straight-up running the game, again: emulating a simple text computer - the virtual machine - to play it. No slowdowns, no nothing, and you can save the game (the interpreter status) to a battery-backed cartridge, such as the Everdrive. Everything under... 128k.
Claude Code and the rest of 'examples' it's what happens when trade programmers call themselves 'engineers' without even a CS degree.
This file is exactly what I'm talking about.
Take the loadInitialMessage function: It's encumbered with real world incremental requirements. You can see exactly the bolted-on conditionals where they added features like --teleport, --fork-session, etc.
The runHeadlessStreaming function is a more extreme version of that where a bunch of incremental, lateral subsystems are wired together, not an example of superfluous loc.
You don't have to explain why there might be better ways to write some code because the claim is about lines of code. It could be the case that perfectly organizing and abstracting the code would result in even more loc.
Sure. You could have. But you're not the one playing football in the Champions League.
There were many roads that could have gotten you to the Champions League. But now you're in no position to judge the people who got there in the end and how they did it.
Or you can, but whatever.
The only reason people are using Claude Code is because it's the only way to use their (heavily subsidized) subscription plans. People who are okay with using and paying for their APIs often opt for other, better tools.
Also, analogies don't work. As we know for a fact that Claude Code is a bloated mess that these "champions league-level engineers" can't fix. They literally talk about it themselves: https://news.ycombinator.com/item?id=47598488 (they had to bring in actual Champions League engineers from bun to fix some of their mess).
You just repeat the same statement.
That bloated mess is what got them to the Champions League. They did what was necessary to get them here. And they succeeded so far.
But hey, according to some it can be replicated in 50k lines of wrapper code around a terminal command, so for Anthropic it's just one afternoon of vibe coding to get rid of this mess. So what's the problem? /s
Since you keep putting words in my mouth that I never said, and keep being deliberately obtuse, this particular branch is over.
Go enjoy Win11 written by same level of champions or something.
Adieu.
Engineers using LOC as a measure of quality is the inverse of managers using LOC as a measure of productivity.
The principles of good software don't suddenly vanish just because now it's a machine writing the code instead of a human, they still have to deal with the issues humans have for more than half a century. The history of programming is new developers coming up with a new paradigm, then rediscovering all the issues that the previous generation had figured out before them.
It turns out that there is a tradeoff in code between velocity and quality that smart businesses consider relative to hardware cost/quality. The businesses that are outcompeting others are rarely those who have the highest quality code, but rather those that are shipping quickly at a quality level that is satisfactory for current hardware.
That worked because of rapid advancements in CPU performance. We’ve left that era.
It’s about more than performance. Code is and always has been a liability. Even with agents, you start seeing massive slowdowns with code base size.
It’s why I can nearly one shot a simple game for my kid in 20 minutes with Claude, but using it at work on our massive legacy codebase is only marginally faster than doing it by hand.
But given that we know the functionality of Claude Code, we can guess how much complexity should be required. We could also be wrong.
>Why does it matter?
If there’s massively more code than there needs to be that does matter to the end user because it’s harder to maintain and has more surface area for bugs and security problems. Even with agents.
The more lines of code you have the more likely there is for one of them to be wrong and go unnoticed. It results in bugs, vulnerabilities,... and leaks.
Because it's unmaintainable slop that they themselves don't know how to fix when something happens? https://news.ycombinator.com/item?id=47598488
At some point someone will probably take their LLM code and repoint it at the LLM and say 'hey, let's refactor this so it uses less code and is easier to read but does the same thing' and let it churn.
One project I worked on, I saw one engineer delete 20k lines of code in a day. He replaced it with a few lines of stored procedure. That 20k lines of code was in production for years. No one wanted to do anything with it, but it was a crucial part of the way the thing worked. It just takes someone going 'hey, this isn't right' and sitting down to fix it.
I guess I just find it weird because all the signals are messed up so whenever I see these sorts of layouts, I feel like I'm looking at the average where I don't think "gorgeous and interesting" at all. Instead, I'm forced to think "I should be skeptical of this based on the presentation because it presents as high quality but this may be hiding someone who is not actually aware of what they're presenting in any depth" as the author may have just shoved in a prompt and let it spin.
There's actually a similarly designed website (font weights, font styles etc) here in New Zealand (https://nzoilwatch.com/) where at a glance, it might seem like some overloaded professional-backed thing but instead it's just some guy who may or may not know anything about oil at all, yet people are linking it around the place like some sort of authoritative resource.
I would have way less of an issue if people just put their names by things and disclosed their LLM usage (which again, is fine) rather than giving the potentially false impression to unequipped people that the information presented is actually as accurate and trustworthy as the polish would suggest.
I'm serious. The hype chasing clearly matters.
things like this: https://github.com/instructkr/claw-code I mean ok, serious people put in years of effort for 100 of those stars ...
it's continually wild how extremely irrelevant hard effortful careful work is.
I think that's the game. Get up, look at the headlines, figure out how you can exploit them with vibe coding, do some hyphy project and repeat.
Maybe some lobster themed bullshit between openclaw and the claudecode leak.
I'm not being a cynic here, I'm just telling you what I'm going to do tomorrow.
It's sloppy work
Does not matter. Sloppiness is unimportant
My shit's always too complicated. Let's see.
Last week I was struggling to go from vague prompt to an OMG-it's-so-nice-looking web app. I remembered that example above and then decided to create my own component library, which I did in a couple days: https://www.substrateui.dev/. I was actually super happy that I was able to accomplish that, and then I realized I wanted to better understand the content that I had vibe coded into existence. So now I'm recreating that design system step by step w/ Claude Code, filling in gaps in my knowledge & learning a bit about colors, typography, CSS, blah blah blah. It's actually a lot of fun because I'm able to explore all of the concepts and learn enough to build a front end that doesn't suck & is good enough for my use case, without getting stuck for days on trying to center a stupid div by hand or playing whack-a-mole-fix-something-and-break-something-else when trying to clean up AI slop.
Content resizing, needing to juggle a speed knob to read, and the overall presentation makes it feel like Edward Tufte flavored nightmare fuel.
It's basically a slideshow which advances and presents several content areas which are intended to be read, all while advancing and resizing themselves.
Pausing and clicking through manually stepwise is also pretty obnoxious.
Would much rather just see the content all laid out at once
That just seems to be human nature unfortunately - the complainers are always louder.
Those within well informed, technical circles will fall somewhere in between the for/against labels, myself included.
The GenAI hype cycle is finally starting to collapse as the general population starts to realize that these systems aren't the panacea for "everything" after all. They provide enormous utility in some domains like coding, but even then there are massive tradeoffs, footguns and the usual horse blinder ills that come with every hype cycle. I just hope we stop having to "learn the hard way" with respect to undisciplined use of current-gen LLM systems writ large, and cooler heads prevail sooner rather than later.
I've created a Chinese-character learning website, and it took me typing about 1/3 of LotR's length to get there[1]. I would have typed like 1% of that writing the code directly. It is a different process, but it still needs some direction.
I love your implementation.
Here was my first stab:
A 1yo project may be in good shape if written by just one dev, maybe a few. But if you have many devs, I can guarantee it will be messy and buggy. If anything, at 1yo it is probably still full of bugs because not enough time has elapsed for people to run into them.
And I'm sure we all know that when working on a greenfield project you can produce a lot more LoC per day than maintaining a legacy one.
Given that vibe code is significantly more verbose, you're probably talking about ~15 engineers worth of code?
I know that's all silly numbers, but this is just attempting to give people some context here, this isn't a massive code base. I've not read a lot of it, so maybe it's better than the verbose code I see Claude put out sometimes.
Correction: a code base of 500kLoC would take 23 engineers a year to write. There is no indication that the functionality needed in a TUI app that does what this app does needs 500kLoC.
This is a two-pizza team sized project, so it's not a project that the code quality would inevitably spiral out of control due to communication problems.
A single senior architect COULD have kept the code quality under control.
This is why I personally don't take technical debt arguments about how LLM maintained code bases deteriorate with size/age seriously; it presumes that at some point I'll give up with the LLM and be left with a mess to clean up by hand, but that's not going to happen, future maintenance is to be left to LLMs and if that isn't possible for some reason then the project is as good as dead anyway. When you start a project with a LLM the plan should be to see it through with LLMs, planning to have unaided humans take over maintenance at some point is a mistake.
I find LLMs very useful and capable, but in my experience they definitely perform worse when things are unorganized. Maintenance isn't just aesthetics, it's a direct input to correctness.
Just a thought experiment, I very much doubt I'm the first one to think of it. It's probably in the same line of "why doesn't an LLM just write assembly directly"
I liken it to the problem of applying machine learning to hard video games (e.g. Starcraft). When trained to mimic human strategies, it can be extremely effective, but machine learning will not discover broadly effective strategies on a reasonable timescale.
If you convert "human strategies" to "human theory, programming languages, and design patterns", perhaps the point will be clear.
But: could the ouroboric cycle of LLM use decay the common strategies and design patterns we use into inexplicable blobs of assembly? Can LLMs improve at programming if humans do not advance the theory or invent new languages, patterns, etc?
The current training loop for coding is RL as well - so a departure from human coding patterns is not unexpected (even if departure from human coding structure is unexpected, as that would require development of a new coding language).
My suspicion is that the "language" part of LLMs means they tend to prefer languages which are closer to human languages than assembly and benefit from much of the same abstractions and tooling (hence the recent acquisition of bun and astral).
> It's a shame, because it's still the best coding agent, in my experience.
If it is the best, and if it delivers the value users are asking for, then why would they have an incentive to make further $$$ investments to make it of a "higher" quality if the value this difference could make is not substantial or hurts the ROI?
On many projects I found this "higher quality" not only to be false of delivering more substantial value but actually I found it was hurting the project to deliver the value that matters.
Maybe we are after all entering the era of SWE where all this bike-shedding is gone and only type of engineers who will be able to survive in it will be the ones who are capable of delivering the actual value (IME very few per project).
Or is that why they had to buy Bun, with actual engineers, to work on Claude Code to reduce memory peaks from 68 GB (yes, 68 gigabytes) to a "measly" 1.7? Because code quality doesn't matter?
Or that a year later they still cannot figure out how to render anything in the terminal without flickering?
The only reason people use Claude Code is because it's the only way to use Anthropic's heavily subsidized subscription. You get banned if you use it through other, better tools.
Meanwhile I apparently need to change my perspective about this: https://news.ycombinator.com/item?id=47598488
Now whether that’s actually possible is a second topic.
That's how you get "oh this TUI API wrapper needs 68GB of RAM" https://x.com/jarredsumner/status/2026497606575398987 or "we need 16ms to lay out a few hundred characters on screen that's why it's a small game engine": https://x.com/trq212/status/2014051501786931427
I particularly valued the tool list. People in these comments are complaining about how bad the code is, but I found the client-side tools that the model actually uses to be pretty clean/general.
My takeaway was more that at a very basic level they know what they are doing - keep the client general, so that you can innovate on the server side without revving the client as much.
The real value of Anthropic is in the models that they spent hundreds of millions training. Anyone can build a frontend that does a loop, using the model to call tools and accomplish a task. People do it every day.
Sure, they've worked hard to perfect this particular frontend. But it's not like any of this is revolutionary.
Also I definitely want a Claude Code spirit animal
(Yes, I know I can turn it off. I have.)
This deployment is temporarily paused
https://web.archive.org/web/20260331105051/https://www.cclea...
BTW, that's why you should use your own infrastructure and not depend on Vercel
For example the whole animation on this website, what does it say beyond that you make a request to backend and get a response that may have some tool call?
If you prompt with little raw material and little actual specification of what you want to see in the end, eg you just say make a detailed breakdown dashboard-like site that analyzes this codebase, the result will have this uncanny character.
I'd describe it as a kind of "fanfic". It (and now I'm not just talking about this website but my overall impression of this phenomenon) reminds me a bit of how, when I was 15 or so, I had an idea about how the world works, and then things turned out to be less flashy, less movie-like, less clear-cut, less-impressive-to-a-teenage-boy than I had thought.
If you know the concept of "stupid man's idea of a smart man", I'd say AI made stuff (with little iteration) gives this outward appearance of a smart man from the Reddit-midwit-cinematic-universe. It's like how guns in movies sound more like guns than real guns. It's hyperreality.
Again this is less about the capabilities of AI and it's more connected to the people-pleasing nature of it. It's like you prompt it for some epic dinner and it heaps you up some hmmm epic bacon with bacon yeah (referring to the hivemind-meme). Or BigMac on the poster vs the tray, and the poster one is a model made with different components that are more photogenic. It's a simulacrum.
It looks more like your naive currently imagined thing about what you think you need vs what you'd actually need. It's like prompting your ideal girlfriend into AI avatar existence. I'm sure she will fit your ideal thought and imagination much better but your actual life would need the actual thing.
This relates to the Persona thing that Anthropic has been exploring: each prompt guides the model towards adopting a certain archetypal fictional character as its persona, and there are certain attraction basins that get reinforced with post-training. And in the computer world, simulated action can easily be turned into real action with harnesses and tools, so I'm not saying that it doesn't accomplish the task. But it seems that there are sloppier personas, and it seems that experts can more easily avoid summoning them by giving context that reflects more mundane reality than a novice, or an expert who gives little context. Otherwise the AI persona will be summoned from the Reddit midwit movie.
I'm not fully clear about all this, but I think we have a lot to figure out around how to use and judge the output of AI in a productive workflow. I don't think it will go away ever, but will need some trimming at the edges for sure.
Here is another one that goes in depth as well: www.markdown.engineering for anyone going deep on learning.
I use it all day and love it. Don't get me wrong. But it's a terminal-based app that talks to an LLM and calls local functions. Ooookay…
Agents in general are easy to make, and trivial to make for yourself especially, and the result will be much better than what any of the big providers can make for you.
`pi` with whatever commands/extensions you want to make for yourself is better than CC if you really don't want to go through the trouble of making your own thing.
Sincerely, someone running a team building similar things for analytics.
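The "agents are trivial to make" claim can be sketched in a few dozen lines (everything here is illustrative: `fake_model` stands in for a real API call, and the tool names are made up):

```python
# A REPL-style agent loop: send the conversation to a model,
# execute any tool call it returns, feed the result back, and
# stop when the model answers in prose.

def read_file(path: str) -> str:
    # Toy tool; a real agent would hit the filesystem.
    return {"notes.txt": "hello from notes"}.get(path, "<missing>")

TOOLS = {"read_file": read_file}

def fake_model(messages: list[dict]) -> dict:
    # Pretend the model asks for a tool once, then answers.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "read_file", "args": {"path": "notes.txt"}}
    return {"answer": "The file says: " + messages[-1]["content"]}

def agent_loop(prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    while True:
        reply = fake_model(messages)
        if "answer" in reply:
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})

answer = agent_loop("what's in notes.txt?")
```

The loop itself is the easy part; the hundreds of thousands of lines in the real products are everything around it: rendering, permissions, retries, sandboxing, and session state.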
Curious, as I haven't gotten around to writing my own agent yet.
Anything general is always going to be worse for specific use cases, and agents from these big providers are very general. They'll spend tons of tokens doing things that you might not need, including spend extra tokens on supporting MCP, etc., when you might not even need that.
The open source models are quite close, and they'd probably be just as good with the equivalent amount of compute/data the frontier labs have access to.
I looked at the leaked code expecting some "secret sauce", but honestly didn't find anything interesting.
I don't get the hype around Claude Code. There's nothing new or unique. The real strength are the models.
First command I looked at:
/stickers:
Displays earned achievement stickers for milestones like first commit, 100 tool calls, or marathon sessions. Stickers are stored in the user profile and rendered as ASCII art in the terminal.
That is not what it does at all - it takes you to a stickermule website. What is the motivation for someone to put out junk like this?
Getting something with a link to their GitHub onto the frontpage of HN. Because form matters much more in this world than substance.
The animated explanation at the top is also way too fast at 1x, almost impossible to follow; that immediately hinted at the author not fully reading/experiencing the result before publishing this.
How is this on the front page?
The utils directory should only contain truly generic, business-agnostic utilities (such as date retrieval, simple string manipulation, etc.).
We can see that vibe-coded output is not what a professional engineer would write. This may be due to the engineers relying on vibe-coding tools.
0 - https://github.com/zackautocracy/claude-code/blob/main/src/u...
it looks really interesting.
- find nothing
- still manage to fill entire pages
- somehow have a similar structure
- are boring as fuck
At least this one is 3/4, the previous one had BINGO.
The fact that now every agent designer knows what was already built is a huge shot of steroids to their codebase!
In all seriousness. I think you‘re supposed to run these in some kind of sandbox.
Which emperor, specifically?
I've been working on my own coding agent setup for a while. I mostly use pi [0] because it's minimal and easy to extend. When the leak happened, I wanted to study how Anthropic structured things: the tool system, how the agent loop flows. A 500K-line codebase is a lot to navigate, so I mapped it visually to give myself a quick reference I could come back to while adapting ideas into my own harness and workflow.
I'm actively updating the site based on feedback from this thread. If anything looks off, or you find something I missed, lmk.
[0] https://pi.dev/
I strongly disagree, but it made me chuckle a bit, thinking about labeling software as "handmade" or marketing software house as "artisanal".
The only suggestion/nit I have is that you could add some kind of asterisk or hover helper to the part when you talk about 'Anthropic's message format', as it did make me want to come here and point out how it's ackchually OpenAI's format and is very common.
Only because I figure if this was my first time learning about all this stuff I think I'd appreciate a deep dive into the format or the v1 api as one of the optional next steps.
I had used pi and cc to analyze the unpacked cc to compare their design, architecture and implementation.
I guess your site was also coded with pi and it is very impressive. Wonderful if you can do a visualization for pi vs cc as well. My local models might not be powerful enough.
Thanks for the hard work!
https://gist.github.com/ontouchstart/d7e3b7ec6e568164edfd482... (cc)
M5 (24G)