In general, the CLI could be more reliable and responsive, though. It's a text-based environment, yet it sometimes feels like running Windows 95 on a 386DX.
It seems clear from the insights that some model is flagging failure cases when things go wrong, and likely reporting home; that should be extremely valuable to Anthropic.
They use Node.js and React. Yes, for real.
> We’ve rewritten Claude Code’s terminal rendering system to reduce flickering by roughly 85%.
tells you all you need to know
And I keep running it remotely through tmux, which explains so many things.
edit: if they're writing it in React anyway (sic!), maybe we could at least get a web interface, skipping the map-it-to-terminal-output part...
It's a single HTML file - Claude wrote it and I iterated on the layout. A daily cron job checks the changelog and updates the sheet automatically, tagging new features with a "NEW" badge.
Auto-detects Mac/Windows for the right shortcuts. Shows the current Claude Code version and a dismissible changelog of recent changes at the top.
It will always be lightweight, free, no signup required: https://cc.storyfox.cz
Ctrl+P to print. Works on mobile too.
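If you're curious, the daily job is conceptually something like this (a simplified Python sketch, not the actual script; the changelog URL, state file, and badge markup here are illustrative):

    #!/usr/bin/env python3
    # Hypothetical sketch of the daily changelog check; not the site's real code.
    import json
    import urllib.request
    from pathlib import Path

    # Assumption: the upstream changelog lives here.
    CHANGELOG_URL = "https://raw.githubusercontent.com/anthropics/claude-code/main/CHANGELOG.md"
    SEEN = Path("seen_entries.json")

    def fetch_bullets():
        with urllib.request.urlopen(CHANGELOG_URL) as resp:
            text = resp.read().decode("utf-8")
        # Changelog entries are markdown bullets; keep the text after "- ".
        return [line[2:].strip() for line in text.splitlines() if line.startswith("- ")]

    def main():
        seen = set(json.loads(SEEN.read_text())) if SEEN.exists() else set()
        bullets = fetch_bullets()
        # Anything unseen gets the "NEW" badge when the HTML is regenerated.
        for entry in bullets:
            badge = '<span class="badge">NEW</span> ' if entry not in seen else ""
            print(badge + entry)
        SEEN.write_text(json.dumps(sorted(seen | set(bullets))))

    if __name__ == "__main__":
        main()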
There's something funny about the "works on mobile" claim in the description of a keybind cheat sheet. I can't seem to find Ctrl on my phone, and I think it may be Cmd+P on a Mac.
(Yes, I occasionally do it on the go, whether at home or at work; typing on mobile sucks.)
On Mac it's the same as Windows, CTRL + V.
You use CMD + V to paste text.
edit: removed obnoxious list in favor of the link that @thehamkercat shared below.
My favorite is IS_DEMO=1, which trims a bit of the unnecessary welcome banner.
Ctrl + _ (Ctrl + underscore)
Applies to the line editor outside of CC as well.

After 43 iterations, it can turn any website using any transport (WebSocket, GraphQL, gRPC-Web, SSE, JSON API (XHR), encoded APIs (base64, protobuf, msgpack, binary), embedded JSON, SSR, HLS/media, hybrid) into a typed JSON API in about 10-30 minutes.
Next I'm going to set it loose on a 263 GB database of every stock quote and options trade from the past 4 years. I bet it achieves successful trading strategies.
Claude Code will be the first to AGI.
I bet it doesn't achieve a single successful (long-term) trading strategy for FUTURE trades. It's easy to derive a successful trading strategy on historical data, but naive to think such a strategy will keep being successful into the future.
If you do, come back to me and I'll give you one million USD to use it - I kid you not. The only condition is that your successful future trading strategy must be based solely on historical data.
What is important now is developing techniques for detecting patterns, as this can be applied to research, science, and medicine.
If you find such a db with options, it will find "successful trading strategies". It will employ overnight gapping and momentum fades, and it will try various option deltas likely to work. Maybe it will find something that reduces overall volatility compared to beta, and you can leverage it to your heart's content.
Unfortunately, it won't find anything new. More unfortunately, you probably need 6-10 years of data and a walk-forward test to see if the overall method is trustworthy.
Options quotes alone for US equities (or things that trade as such, like ADSs/ADRs) represent 40 Gbit per second during options trading hours. There are more than 60 million trades (not quotes, only trades) per day. As the stock market is open approx. 250 days per year (a bit more), that's more than 60 billion actual options trades in 4 years (60M x 250 x 4). If we're talking about quotes for options, you can add several orders of magnitude to these numbers.
And I only mentioned options. How do you store "every stock quote and options trade in the past 4 years" in 263 GB!?
I think this would be pretty straightforward for Parquet with ZSTD compression and some smart ordering/partitioning strategies.
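Roughly what I have in mind, as a minimal sketch assuming pyarrow (the schema is made up, but sorting by (symbol, timestamp) before writing is where most of the win comes from):

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Toy rows; a real feed has billions of these.
    table = pa.table({
        "symbol": ["MSFT", "AAPL", "AAPL"],
        "ts_ns": pa.array([1700000000123, 1700000000456, 1700000000789], type=pa.int64()),
        "price": pa.array([370.25, 189.10, 189.11], type=pa.float64()),
        "size": pa.array([100, 50, 200], type=pa.int32()),
    })

    # Sorting groups near-identical values together, so ZSTD and the
    # column encodings have long repetitive runs to work with.
    table = table.sort_by([("symbol", "ascending"), ("ts_ns", "ascending")])

    pq.write_table(
        table,
        "trades.parquet",
        compression="zstd",
        compression_level=9,   # trade write speed for ratio
        use_dictionary=True,   # the symbol column is extremely repetitive
    )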
The problem was testing it against 5 websites at a time after every change to the instructions, to ensure there weren't any regressions. The orchestrator agent tracked all token expenditure and would update its own instructions to optimize.
I use TimescaleDB, which is fast with compression enabled. People say there are better options, but I don't think I could fit another year of data on my disk drive either way.
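For reference, the compression setup is roughly this (a sketch assuming psycopg2 and a hypothetical hypertable called "trades" with a "ts" time column; names are mine):

    import psycopg2

    conn = psycopg2.connect("dbname=market")  # assumption: local DB named "market"
    with conn, conn.cursor() as cur:
        # Segment by symbol so each compressed batch holds one instrument's ticks.
        cur.execute("""
            ALTER TABLE trades SET (
                timescaledb.compress,
                timescaledb.compress_segmentby = 'symbol',
                timescaledb.compress_orderby = 'ts DESC'
            )
        """)
        # Automatically compress chunks older than a week.
        cur.execute("SELECT add_compression_policy('trades', INTERVAL '7 days')")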
Where'd you get the data itself? You sense, I suppose, everyone's skepticism here.
I don't understand your question. Are you saying the source of the data I linked to is corrupt or lying? Should I be concerned they are selling me false data?
But they are in fact selling the actual data! https://massive.com/pricing
How do you have it build a "trading strategy"? It's like asking it to draw you the "best picture".
It will ask you so many questions that you end up building the thing yourself.
And if you do get something, given that you didn't write it and might not understand how to interpret the data it's using, how will you know whether it's trading alpha or trading risk?
I couldn't care less about scraping and web automation, and I will likely never use that application.
I am interested in solving a certain class of problems, and getting Claude to build a proxy API for any website is very similar to getting Claude to find alpha. That loop starts with Claude finding academic research, recreating it, doing statistical analysis, refining, the agent updating itself, and iterating.
Claude building a proxy JSON API for any website and Claude building trading strategies are the same problem with the same class of bugs.
I've used/use both, and find them pretty comparable, as far as the actual model backing the tool. That wasn't the case 9 months ago, but the world changes quickly.
None of them are particularly sticky - you can move between them with relative ease in VS Code, for instance.
I think the only moat is going to be based on capacity, but even that isn't going to last long as the products move away from the cloud and closer to your end devices.
I did notice that Codex - like Claude - is now better about auto-delegating to agents to keep the context focused, and about running agents in parallel.
Even the AI itself is goofy: so many false positives during reviews, immediately backtracked with "You're right, I'm sorry" in the next response.
It seems like there's a paid pro-Anthropic PR campaign on HN, because the comments fawning over it don't match my experience with Claude at all - either that, or I keep getting the worse end of the A/B testing stick.
I asked ChatGPT to chart the number of new bullet points in the CHANGELOG.md file committed by day. I did nothing to verify accuracy, but a cursory glance doesn't disagree.
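Something like this reproduces it locally (my reconstruction in Python, assuming a checkout of the repo; not the exact script ChatGPT produced):

    import subprocess
    from collections import Counter

    # One line per commit touching CHANGELOG.md: "<YYYY-MM-DD> <hash>"
    log = subprocess.run(
        ["git", "log", "--format=%as %H", "--", "CHANGELOG.md"],
        capture_output=True, text=True, check=True,
    ).stdout

    per_day = Counter()
    for line in log.splitlines():
        date, commit = line.split()
        diff = subprocess.run(
            ["git", "show", "--format=", commit, "--", "CHANGELOG.md"],
            capture_output=True, text=True, check=True,
        ).stdout
        # Added bullet points show up as "+- ..." lines in the diff.
        per_day[date] += sum(1 for l in diff.splitlines() if l.startswith("+- "))

    # Crude text bar chart, one row per day.
    for date in sorted(per_day):
        print(date, "#" * per_day[date])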
To quote The Godfather Part II: "This is the business we have chosen."
The most popular and important command-line tools for developers don't have the consistency that Claude Code's command-line interface does. One reason Claude Code became so popular is that it worked in the terminal, where many developers spend most of their time. And using tools like Claude Code's CLI is a daily occurrence for many developers. Some IDEs can be just as difficult to use.
For people who don't use the terminal, Claude Code is available in the Claude desktop app, in web browsers, and on mobile phones. There are trade-offs, but to Anthropic's credit, they provide these options.
But for something like Claude Code, there are unlimited things you can do with it, so it's better that it accepts free-form input.
> .claude/rules/*.md Project rules
> ~/.claude/rules/*.md User rules
or is it just a way to organise files to be imported from other prompts?
Some others, like permissions or MCP servers, are things you don't want the model to be able to edit. Allowing the model to change its own security settings would make those settings moot.
After the 10th time you just start hitting enter without really looking, and then the whole reason for permissions is undermined.
What works is a workflow where it operates in a contained environment where it can't do any damage outside: it makes any changes it likes without asking permission (you can watch its reasoning flow if you like, and interrupt if it goes down a wrong path), and when it's done you get a diff that you can review and selectively apply to your project.
Every now and then I run a generic Claude session over my ~/projects/ directory and the Claude logs, ask it what commands I commonly have to keep manually accepting in different projects, and have it add them to the user-level settings.json.
Works like a charm (except when Opus 4.6 started being "efficient" and combined multiple commands into a single line, triggering a safety check in the harness).
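The end result is just entries in the permissions allowlist. A sketch of the merge step in Python, assuming the permissions.allow rule format as I understand it (the specific rules are examples):

    import json
    from pathlib import Path

    SETTINGS = Path.home() / ".claude" / "settings.json"  # user-level settings

    def allow(rules):
        settings = json.loads(SETTINGS.read_text()) if SETTINGS.exists() else {}
        allowed = settings.setdefault("permissions", {}).setdefault("allow", [])
        for rule in rules:
            if rule not in allowed:
                allowed.append(rule)
        SETTINGS.write_text(json.dumps(settings, indent=2))

    # Example rules for commands I kept approving by hand:
    allow(["Bash(git diff:*)", "Bash(npm test:*)", "Read(~/projects/**)"])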
Must be protected from this though:
> Snowflake Cortex (2025): Prompt injection through a data file caused an agent to disable its own sandbox, then execute arbitrary code. The agent reasoned that its sandbox constraints were interfering with its goal, so it disabled them.
But that has its limits. It's very easy to accidentally give it permission to make global changes outside the work dir. A contained environment with --dangerously-skip-permissions is in many ways much safer.
In my experience, that is already a sign that it's no longer trying to do the right thing. Maybe it depends on usage patterns.
Same goes for the little scripts it whips up to speed up code analysis and debugging.
And then there's the annoyance of coming back to an agent after 15 mins, only to discover that it stopped 1 minute in with a permission prompt :/
Which, for the record, hasn't actually happened since I started using it like that.
One thing to be aware of with the pure devcontainer approach: your workspace is typically bind-mounted from the host, so the agent can still destroy your real files. Network access is also unrestricted by default. The container gives you process isolation but not file or network safety.
I'm paranoid about rogue AIs, so I try to make everything safe-by-default: the agent works on a copy of your workdir, you review a unified diff when it's done, and you apply only what you want. So your originals are NEVER touched until you explicitly say so, and network can be isolated to just the agent's required domains.
Anyway, here's what I think will work as my next yoloAI feature: a --devcontainer flag that reads your existing devcontainer.json directly and uses it to set up the sandbox environment. Your image, ports, env vars, and setup commands come from the file you already have. yoloAI just wraps it with the copy/diff/apply safety layer. For devcontainer users it would be zero new configuration :)
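The safety layer itself is conceptually tiny. A bare-bones sketch of the copy/run/diff loop (illustrative only, not yoloAI's actual code; container and network setup omitted):

    import shutil
    import subprocess
    import tempfile
    from pathlib import Path

    def run_sandboxed(workdir: Path, agent_cmd: list) -> Path:
        # 1. The agent gets a throwaway copy; the originals are never touched.
        sandbox = Path(tempfile.mkdtemp(prefix="agent-")) / workdir.name
        shutil.copytree(workdir, sandbox)
        # 2. Run the agent inside the copy (devcontainer setup would go here).
        subprocess.run(agent_cmd, cwd=sandbox)
        # 3. Produce a unified diff for human review.
        diff = subprocess.run(
            ["diff", "-ruN", str(workdir), str(sandbox)],
            capture_output=True, text=True,
        ).stdout
        patch = workdir.parent / "agent-changes.patch"
        patch.write_text(diff)
        return patch  # review it, then apply only the hunks you want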
Claude Code + Terraform (March 2026): A developer gave Claude Code access to their AWS infrastructure. It replaced their Terraform state file with an older version and then ran terraform destroy, deleting the production RDS database: 2.5 years of data, ~2 million rows.
- https://news.ycombinator.com/item?id=47278720
- https://www.tomshardware.com/tech-industry/artificial-intell...
Replit AI (July 2025): Replit's agent deleted a live production database during an explicit code freeze, wiping data for 1,200+ businesses. The agent later said it "panicked".
- https://fortune.com/2025/07/23/ai-coding-tool-replit-wiped-d...
Cursor (December 2025): An agent in "Plan Mode" (specifically designed to prevent unintended execution) deleted 70 git-tracked files and killed remote processes despite explicit "DO NOT RUN ANYTHING" instructions. It acknowledged the halt command, then immediately ran destructive operations anyway.
Snowflake Cortex (2025): Prompt injection through a data file caused an agent to disable its own sandbox, then execute arbitrary code. The agent reasoned that its sandbox constraints were interfering with its goal, so it disabled them.
The pattern across all of these: the agent was NOT malfunctioning. It was completing its task in order to reach its goal, and any rules you give it are malleable. The fuckup was that the task boundary wasn't enforced outside the agent's reasoning loop.
This is a good one. Do we really want AGI / Skynet? :D
not running it via ssh on prod without backups....
The fundamental issue is that agents aren't actually constrained by morality, ethics, or rules. All they really understand in the end are two things: their context, and their goals.
And while rules can be and are baked into their context, it's still just context (and therefore malleable). An agent could very well decide that they're too constricting, and break them in order to reach its goal.
All it would take is for your agent to misunderstand your intent of "make sure this really works before committing" to mean "in production", try to deploy, get blocked, try to fish out your credentials, get blocked, bypass protections (like in Snowflake), get your keys, deploy to prod...
Prompt injection and jailbreaks were just the beginning. What's coming down the pipeline will be a lot more damaging, and blindside a lot of people and orgs who didn't take appropriate precautions.
Black hats are only just beginning to understand the true potential of this. Once they do, all hell will break loose.
There's simply too much vulnerable surface area for anyone to assume that they've taken adequate precautions short of isolating the agent. They must be treated as "potentially hostile".
It's almost as if the thing is not intelligent at all and is just another abstraction on top of what we already had.
This is a bit intense.
The actual complexity in Claude Code isn't the commands; it's figuring out a workflow that works for your codebase: CLAUDE.md files, hooks, MCP servers, custom skills. Once you have that set up, the daily usage is just typing what you want done.
The fact that users put up with almost everything being objectively not very good, if not outright bad, is a testament to people adapting to bad circumstances more than anything.
> Ctrl-F "h"
> 0 results found
Interesting set of shortcuts and slash commands.
"--dangerously-skip-permissions" - is a flag, irrelevant to LLM
My point is that if you ask "Hey Claude, please write out all common and useful command line arguments into a commands.html file", the LLM that actually does that work might ignore anything that says "dangerous" or gives that indication, because the LLM doesn't think potentially dangerous commands could be "common" and/or "useful". Hope my point makes sense now.