(Waiting for Cerebras coding plan to stop being sold out ;)
I've used them for smaller tasks (making small edits), and the "realtime" aspect of it does provide a qualitative difference. It stops being async and becomes interactive.
A sufficient shift in quantity produces a phase shift in quality.
--
That said, the main issue I find with agentic coding is my mental model getting desynchronized. No matter how fast the models get, it takes a fixed amount of time for me to catch up and understand what they've done.
The most enjoyable way I've found of staying synced is to stay in the driver's seat, and to command many small rapid edits manually. (i.e. I have my own homebrew "agent" that's just a loop of, I prompt it, it proposes edits, I accept or edit, repeat.)
So then the "synchronization" of the mental state is happening continuously, because there is no opportunity for desynchronization. Because you are the one driving. I call that approach semi-auto, or Power Coding (akin to Power Armor, which is wielded manually but greatly enhances speed and strength).
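For anyone curious what such a loop looks like, here is a minimal sketch, not my actual tool. The names `semi_auto_loop`, `propose_edit`, and `review` are made up for illustration, and the model and the human are replaced by stand-in functions so the snippet runs on its own:

```python
from typing import Callable, List, Optional


def semi_auto_loop(
    prompts: List[str],
    propose_edit: Callable[[str, str], str],
    review: Callable[[str, str], Optional[str]],
    text: str,
) -> str:
    """Apply one small edit per prompt; every edit passes through the human."""
    for prompt in prompts:
        proposal = propose_edit(text, prompt)  # model suggests a new version
        decision = review(text, proposal)      # human accepts, edits, or rejects (None)
        if decision is not None:
            text = decision                    # only reviewed text is ever kept
    return text


def fake_model(text: str, prompt: str) -> str:
    # Hypothetical stand-in for a model call: appends the requested line.
    return text + prompt + "\n"


def fake_reviewer(current: str, proposal: str) -> Optional[str]:
    # Stand-in for the human: reject anything containing "TODO", accept the rest.
    return None if "TODO" in proposal else proposal


result = semi_auto_loop(["x = 1", "TODO later", "y = 2"], fake_model, fake_reviewer, "")
print(result)
```

The point of the shape is that nothing lands in `text` without going through `review`, so there is never a batch of unseen changes to catch up on.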
This is why I'm so skeptical of anyone running 6+ Claude sessions at a time. I've gotten to 5 but really that was across 3 sessions with 2 standing by just to commit stuff. And even with just 3 sessions I constantly lost where I was and wasted time re-orienting myself, doing work in the wrong session, etc.
>The most enjoyable way I've found of staying synced is to stay in the driver's seat, and to command many small rapid edits manually.
Same, there's a fantastic flow state/momentum I can get in a single session just knocking off features. I don't mind switching between two sessions in this state but the experience is better when it's two different projects vs two different features on the same project. The complete context switch lets me re-orient more easily.
Even if you eliminate model latency and keep yourself fully in sync via a tight human-in-the-loop workflow, the shared mental model of the team still advances at human speed. Code review, design discussion, and trust-building are all bandwidth-limited in ways that do not benefit much from faster generation.
There is also an asymmetry: local flow can be optimized aggressively, but collaboration introduces checkpoints. Reviewers need time to reconstruct intent, not just verify correctness. If the rate of change exceeds the team’s ability to form that understanding, friction increases: longer reviews, more rework, or a tendency to rubber-stamp changes.
This suggests a practical ceiling where individual "power coding" outpaces team coherence. Past that point, gains need to come from improving shared artifacts rather than raw output: clearer commit structure, smaller diffs, stronger invariants, better automated tests, and more explicit design notes. In other words, the limiting factor shifts from generation speed to synchronization quality across humans.
With a good modern setup, everyone can be that "productive", and the only thing that keeps a project coherent is if the original design holds, therefore making rearchitecture a very rare event. It will also push us to have smaller teams in general, just because the idea of anyone managing a project with, say, 8 developers writing a codebase at full speed seems impossible, just like it was when we added enough high performance, talented people to a project. It's just harder to keep coherence.
You can see this risk mentioned in The Mythical Man Month already. The idea of "The Surgery Team", where in practice you only have a couple of people truly owning a codebase, and most of the work we used to hand juniors just being done via AI. It'd be quite funny if the way we have to change our team organization moves towards old recommendations.
Agentic coding is only speeding up or parallelising a small part of the workflow - the rest is still sequential and human-driven.
The current tools are the infancy of AI-assisted coding. It’s like the MS-DOS era. Over time, maybe back-propagating edits from “your comfort language” to the “target language” could become commonplace.
To be fair, that's not part of the article's title, but rather the title of the website that the article was posted to.
I also agree with everything the author mentions. I can't attest to all the examples, but I know what a UI is.
The author mentions the center of focus of attention. We should hear more often about the periphery of our attention field. Its bandwidth, so to speak, is an order of magnitude lower than the center's, but it's still there and can guide some decisions without intruding on flow.
(Major) eye movements are a detriment to attention, which itself should be treated like a commodity (in the case of a UI that thousands use, more like a borrowed commodity).
It was a good article though
Would suggest that one of the mods remove it
Up until recently, LLMs just plain sucked. You'd set them on a task and then spend hours hand-holding them to output something almost correct.
Nowadays you can have a conversation with the chatbot, hash out a design, rubber duck and discuss what-ifs until you have a solid idea of the thing you're building, codified in a way an agent could understand, and now you have a PLAN.
From there, it's a matter of setting the agent in motion and checking from time to time to make sure it's not getting stuck on something under-specified.
That said, I've found that this kind of workflow works a lot better with Claude than with Gemini.
I have recently been using this tiny[1] skill to generate a suggested order for reviewing a PR, and it has been very helpful to me.
I find the inline stuff so incredibly annoying because they move around the text I am looking at.
I do appreciate in-IDE functionality that can search the codebase etc etc, but I want to hit a button when I need it.
> “Focus on…” would allow the user to specify what they're interested in changing and present only files and lines of code related to their specified interest.
> “Edit as…” would allow the user to edit the file or selected code as if it were a different programming language or file format.
When a human junior writes code, they leave breadcrumbs of their thinking — commit messages, PR descriptions, comments explaining why they chose approach A over B. You can reconstruct their reasoning from the artifact trail.
Agents don't do this naturally. You get a diff with no context for why it went that direction. So the reviewer has to reverse-engineer the thinking from the code alone, which is actually harder than reviewing human code because there are no "tells" — no familiar coding style, no consistent patterns that hint at the developer's mental model.
The semi-auto approach mentioned upthread works precisely because it solves this: you were there for every decision, so there's nothing to reconstruct. The productivity loss from staying in the loop is offset by the time you save not having to audit opaque changes after the fact.
- Break down big problems into smaller ones
- Create extensive plan + documentation (context)
- Make sure that, where possible, parts of the plan can be done simultaneously and don't create too many dependencies.
- Define success criteria (tests?)
Then just unleash the agents. The more you put in, the more you get out.
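To make the "done simultaneously" part concrete, here is a rough sketch of turning a plan into batches that could be handed to agents in parallel. It assumes the plan is written down as a mapping from each task to the tasks it depends on; the task names and the `parallel_batches` helper are invented for the example, and `graphlib` is in the Python standard library from 3.9 on:

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def parallel_batches(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into batches; tasks in one batch have no unmet dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the plan is circular
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything that can start right now
        batches.append(ready)
        sorter.done(*ready)                 # pretend the batch finished
    return batches


# Hypothetical plan: each task maps to the set of tasks it depends on.
plan = {
    "schema": set(),
    "api": {"schema"},
    "ui": {"schema"},
    "tests": {"api", "ui"},
}

batches = parallel_batches(plan)
print(batches)
```

A plan with few dependencies gives a few wide batches; a plan where everything depends on the previous step gives many batches of one, which is a hint that the agents will mostly be waiting on each other.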
Nothing wrong with that, except that, unlike with any other tool out there, smart senior engineers who would otherwise spend time reading documentation and understanding a new package/tool/framework before drawing conclusions about it approach agentic coding with “I spun up Claude Code and it’s not working”. Dunno why the same level of diligence isn’t applied to agentic coding as well.
The first question I always have for such engineers is “what model have you tried?” And it always ends up being non-SOTA models for tasks that are not simple. Have you tried Claude Opus?
Second question: have you tried plan mode?
And then I politely ask them to read some documentation on using these tools, because the simplicity of the chat interface is deceptive.
I always wonder what the purpose of posting these generic, superficial defenses of a certain form of LLM-based coding is?
My experience is different in that case, but it certainly depends on the type of technical challenge, the programming language, etc.
Candidates that perform better or worse exist with and without agentic coding tools. I've had positive and negative experience on both fronts, so I'd attribute the OP's experience to the N=1 problem, and perhaps to the model's jagged intelligence.
I work mostly in TypeScript, and it's well known that models are particularly well versed in it. I know that other programming languages are less well supported because there is less training data for them, in which case models could be worse with them across the board (or some SOTA models could be better than others).
Just that many brilliant engineers like themselves test agentic tools without the same thorough understanding that they bring to other software engineering tools they try out.
If the "Calm Tech"(™) people/institute/whatever actually wanted to move the needle, they'd be lobbying for regulations, building tools for consumers to fight back, or trying to do anything at all that actually shifts the underlying institutional and incentive structures. As it sits, they're the equivalent of a recess monitor suggesting maybe the bully would be happier if he shared the toys with the other kids - and frankly, given the degree of branding around the whole thing, it all starts to smell more like "influencer" than "genuine attempt to improve technology."
I have a similar impression. It seems to me that people get something that kind of works and then their interest runs out and they're left with a shallow understanding of the result and how it might be achieved. This seems detrimental to learning, which tends to happen when one is struggling.
"I strongly believe that chat is the least interesting interface to LLMs"
This is also something I agree with. When I work with databases, the best part is not sitting with an immediate client writing raw queries by hand.
^ This is a genius idea - someone add this to claude
I've thought for a while of building this exact thing as a vscode extension because of how utterly shit it is :D
I really want the source code!
It's very rough, but I plan on cleaning it up soon (the cluster labeler still needs a lot of work) and writing another post about it.
I think this is true of AI agents. What is the object of our engineering attention? Applications, features, defect resolution. Not code.
I've also tinkered with this idea myself in the context of prompt engineering with my Grace Browser project (https://trygrace.dev/), which converts code to an equivalent dynamic web form live within your browser.
I do think it's useful to remember that code is not the end goal and is itself just another mediated interface to the actual goal: the product you're building. However, I think even if you cut code out of the picture the chat interface is still not necessarily the right interface for building a product. A great example of how to build a non-chat interface to product building (predating the AI boom) is Bret Victor's Inventing on Principle talk (https://www.youtube.com/watch?v=PUv66718DII) and there might be ways we can refresh the ideas in that talk for the AI era (although I still don't have any specific thoughts along those lines yet).
If you are going to do a big build out of something, spec up front at least to have a clear idea of the application architectural boundaries.
If you are adding features to a mature code base, then the general order of the day is: First have the AI scout all the code related to the thing you are changing. Then have it give you a summary of its general plan of action. Then fire it off and review the results (or watch it, less needed now though).
For smaller edits or even significant features, I often just give it very short instructions of a few sentences. If I have done my job well, the code is fairly opinionated, the models pick up the patterns well, and I don't really have to give much guidance. I'll usually just ask for a few touch-ups, like introducing some fluent API niceties.
That being said, I do tend to make a few surgical requests of the AI when I review the PR, usually around abstraction seams.
(For my play projects I don't even look at the code any more unless I hit a wall, and I haven't really hit a wall since Opus 4.5, though I do have a material physics simulator that Opus 4.5 wrote that runs REALLY slow that I should muck around in, but I'm thinking of seeing if Opus 4.6 can move it to the GPU by itself first.)
So if I were doing an interview with an interview question, I would probably do a "let's break down what we know", "what can we apply to this", "ok. let's start with x" and then iterate quickly and look at the code to validate as needed.
The same goes for using Claude in a programming interview. If the environment of interview is not representative of how people actually work then the interview needs to be changed.
But the hard part is designing the problem so that it exercises skill.
I’ve interviewed hundreds of engineers for software and hardware roles. A good coding test is based on self-contained problems that the team actually encountered while developing our product. Boil the problem down to its core, create a realistic setup that reflects the information the team had when they encountered the challenge, and then ask the candidate to think it through. It doesn’t matter if they only write notes or pseudo code, and it doesn’t matter if they reach the wrong conclusion. What it’s testing for is the thought process. The fact the candidate has to ask the interviewer questions as though the interviewer is effectively the IDE, is great! The interviewer experiences the engineer’s thought process first-hand. And the interviewer can nudge the candidate in the correct direction by communicating answers that aren’t just typical IDE error messages.
To validate these kinds of questions in advance, I’d often run them on existing team members that hadn’t already been exposed to the real challenge the problem was based on.
They're not unusual because they're legally risky; they're unusual because they don't work well.
Why not? It sounds like a skill issue to me.
>It ideally also requires iterating upon the prompt to refine it before execution.
I don't understand. It's not like you would need to one shot it.
I tried to approach it that way as well, but I am realizing that when I let the agent do the implementation, even with clear instructions, I might miss all the “wrong” design decisions it makes, because if I only review and do not implement, I do not discover the “right” way to build something. That is especially true in places where I am not so familiar myself, and those are the places where it is most tempting to rely on an agent.
Will agents simply dig the trenches deeper towards the direction of the best existing tests, and does it take a human to turn off the agent noise and write code manually for a new, innovative direction?
Yikes.
By the way, the whole website is strange. Just the name alone "haskell for all".
Many years ago when I tried to learn Haskell (and wrote some haskell code that worked but it was sooooo much harder when compared to ruby or python), one of the few things that appeared early on, aside from the monad barrier, was that many haskell people said that Haskell is deliberately not for everyone. Back then this was when IRC was still en vogue, so I "heard" that via various discussions on #haskell.
I did not fully understand this part, because ... why would you write a language that only a few big brain people could use? I found that elitist and snobbish, even arrogant.
Only at a later time did I understand one part of the meaning. The "we don't want you here" also means "we don't want YOU to change haskell into some other new meta-variant". I understood this much better when some guys wanted to have ruby embrace types. Then I understood that people not only want to change a language but also want to ruin it; whether on purpose or because they prefer something else (such as their brain embraced types-only code bases) is a separate discussion. I still find the haskell attitude very elitist but I at the least understand that they don't want everyone to use - and change - Haskell.
> For example, someone who was new to Haskell could edit a Haskell file “as Python” and then after finishing their edits the AI attempts to back-propagate their changes to Haskell.
I like the general idea behind "write in any language, have it work in EVERY language". But the whole AI movement seems to be more about dumbing people down or making them lazy, in many ways. I have seen people use it to great effect, so I am not at all saying AI has no use cases. What I have noticed, however, is that it has made many normal folks super-lazy. They type on their smartphone, solution comes out, task finished, move on. That's not necessarily only bad, but it comes with trade-offs. My approach is much slower, but it is systematic and I am in full control of what is documented how and where.
> This is obviously not a comprehensive list of ideas, but I wrote this to encourage people to think of more innovative ways to incorporate AI into people's workflows
Oh he has achieved this in a different way. Now I have another reason to not want AI in my "workflows". The whole website also seems super-strange to me. Has he used AI to write the whole content and layout? It's hard to say because I don't know how it used to be in the past, but the paragraphs and the content seem so strange. I suspect he used AI to generate the layout too; and some of the content as well. We are losing "interaction" with real humans here too (ok ok, there is not a lot of interaction with regards to a static website, but if a blog is written by AI, then there is not really any possibility for interaction with a human - you could not even distinguish WHO wrote the content or made the decisions such as which style to choose and so forth; it looks very fake to me or, at the least, in part. I typically don't see this with other blogs.).
I did not use AI to generate my blog's content nor layout.
Also, the reason my blog is named "Haskell for all" is because I originally created my blog a long time ago to try to make Haskell more accessible to people and counter the elitist tendencies.
Your entire take is super strange and presumptive.
I feel this should be a bigger focus than it is. All the AI code review startups are mostly doing “hands off” code review. It’s just an agent reviewing everything.
Why not have an agent create a perfect “review plan” for human consumption? Split the review up in parts that can be individually (or independently) reviewed and then fixed by the coding agent. Have a proper ordering in files (GitHub shows files in a commit alphabetically, which is suboptimal), and hide boring details like function implementations that can be easily unit tested.
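As a toy illustration of the ordering part only (no agent involved), here is a sketch that orders changed files so that each file's dependencies are reviewed before it, instead of alphabetically. It assumes you already have a map of which changed file depends on which; the `review_order` helper and the file names are hypothetical:

```python
from typing import Dict, Iterable, List


def review_order(changed: Iterable[str], depends_on: Dict[str, List[str]]) -> List[str]:
    """Order changed files so each file's changed dependencies come first."""
    changed_set = set(changed)
    order: List[str] = []
    seen = set()

    def visit(path: str) -> None:
        if path in seen:  # already placed (also stops dependency cycles)
            return
        seen.add(path)
        for dep in sorted(depends_on.get(path, ())):
            if dep in changed_set:  # unchanged files need no review slot
                visit(dep)
        order.append(path)

    for path in sorted(changed_set):  # alphabetical only as a tie-breaker
        visit(path)
    return order


# Hypothetical PR: the handler uses the model, the test exercises the handler.
changed_files = ["api/handlers.py", "models/user.py", "tests/test_handlers.py"]
deps = {
    "api/handlers.py": ["models/user.py"],
    "tests/test_handlers.py": ["api/handlers.py"],
}

order = review_order(changed_files, deps)
print(order)
```

Here the model file comes first even though it sorts after the handler alphabetically, so the reviewer sees a definition before its uses. A real review plan would need more than this (grouping, hiding trivially testable bodies), which is where an agent could help.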
Yes exactly! I have been using this to create a comment on the PR, showing suggested review order and a diagram of how changes relate to each other. And even this super simple addition has been very helpful for code review so far!
(more on this: https://www.dev-log.me/pr_review_navigator_for_claude/)
One idea that comes to mind to make review easier would be to re-create commits following Kent Beck's SB Changes concept - splitting structure changes (tidying/refactoring) and behavior changes (features). The structure changes could then be quickly skimmed (especially with good coverage) and it should save focus for review of the behavior changes.
The challenge is that it is not the same as just committing the hunks in different order. But maybe a skill with basic agent loop could work with capabilities of models nowadays.
Many very low risk applications of AI can add up to high payoff without high risk.