The question “what is this trying to do?” has never been harder to answer. LLM output creates scenarios where 99% is correct but the most important part is subtly broken. I prefer human-written code, where 60-80% will be correct and the problematic areas gradually start to smell more and more.
In my experience, LLMs can sometimes hide the truth from you in a haystack made of needles.
It's harder to catch because nothing is factually wrong. You have to ask: could this output have been produced without actually reading my codebase?
I often go back after the fact and ask for refactors and documentation.
It works. It's probably a lot slower than using agents, but I test every step, and it is a lot faster than doing it unassisted.
And if you can define "quality" in a way the agent can check against, it will follow the instructions.
Would that be so bad? "Readability" is certainly subjective, and so, it seems, is "code quality".
Ask 10 programmers to rate the quality of a snippet of code, and you'll get 10 different answers.
All the while, it's people arguing with people and wasting time on pure feels. People get offended, angry and defensive, and nothing good ever comes of it.
But when you pick a style and enforce it with a tool like gofmt or black, both locally and in CI, the arguments go away. That is how all code merged into the codebase must look, and you deal with it like a professional.
Go proverb: "Gofmt's style is no one's favorite, yet gofmt is everyone's favorite."
In my 15 years of experience, I have never worked at a place like that. Those are distractions. Any time style came up, the solution was simply to enforce a linter, a pre-commit process, a blacklist for certain functions, etc. It can easily be automated. When those tools didn't exist for a particular ecosystem, we made our own.
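For the "blacklist for certain functions" case, here is a minimal Python sketch of such a homemade check, assuming Python sources and a hypothetical denylist (the banned names and messages are illustrative, not from any real project). It walks the syntax tree with the standard ast module instead of grepping, so comments and strings that merely mention eval don't trigger it.

```python
import ast

# Hypothetical denylist: banned call names mapped to the message shown to the author.
BANNED_CALLS = {
    "eval": "avoid eval(); parse the input explicitly",
    "exec": "avoid exec(); it hides control flow from reviewers",
    "print": "use the logging module instead of print()",
}


def find_banned_calls(source: str, filename: str = "<string>") -> list[str]:
    """Return one 'file:line: message' string per banned call in source."""
    problems = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        # Only bare-name calls such as eval(...) are checked; attribute calls
        # such as builtins.eval(...) would need extra handling.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            message = BANNED_CALLS.get(node.func.id)
            if message is not None:
                problems.append(f"{filename}:{node.lineno}: {message}")
    return problems


def main(paths: list[str]) -> int:
    """Check each file and return a process exit code (non-zero fails CI)."""
    failures = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            failures.extend(find_banned_calls(handle.read(), path))
    for failure in failures:
        print(failure)
    return 1 if failures else 0
```

In a pre-commit hook or CI step you would call sys.exit(main(sys.argv[1:])), so any finding fails the run and nobody has to argue about it in review.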
Holy strawman, Batman!
Have you ever given a code review? These are the lowest items on the totem pole of things usually considered critical for a code review.
Here’s an example from a code review I gave a colleague this week, paraphrased:
“We should consider using fewer log statements and raising more exceptions in functions like this. This condition shouldn’t happen very often, and a failure of this service is more desirable than it silently chugging along but filling STDOUT with error messages.”
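A minimal Python sketch of the two styles the review comment contrasts. The exchange-rate scenario, the function names and MissingRateError are all hypothetical; the point is only the shape: the first version logs and carries on with a guessed value, the second fails loudly.

```python
import logging

logger = logging.getLogger(__name__)


class MissingRateError(Exception):
    """Raised when a required exchange rate is absent (hypothetical example)."""


def convert_quietly(amount: float, rates: dict[str, float], currency: str) -> float:
    # The pattern the review pushes back on: log an error and keep going.
    rate = rates.get(currency)
    if rate is None:
        logger.error("no rate for %s, using 1.0", currency)
        rate = 1.0  # the caller silently gets possibly wrong data
    return amount * rate


def convert_strictly(amount: float, rates: dict[str, float], currency: str) -> float:
    # The suggested pattern: a condition that "shouldn't happen" raises,
    # so callers and monitoring see a failure instead of bad numbers.
    try:
        rate = rates[currency]
    except KeyError:
        raise MissingRateError(f"no exchange rate for {currency!r}") from None
    return amount * rate
```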
Don't you have "format on save" enabled in your editor? You open a file, change two lines and save, and boom: 500 changed lines, because the previous programmer used different formatting rules than you. Whoops.
This is why the low totem pole stuff needs to be enforced automatically so that actual humans can focus on the higher stuff that's about feels and intuition - things that are highly context dependent and can't be codified into rules.
After a certain point in your career, you don't care what brace style the new dev used, even if the project has lint rules. You do care if critical errors are ignored and possibly incorrect data is returned. These two situations are in no way equivalent; there's no need to bikeshed the former when discussing the latter.
…
> This is why the low totem pole stuff needs to be enforced automatically so that actual humans can focus on the higher stuff that's about feels and intuition - things that are highly context dependent and can't be codified into rules.
I’m confused: have you switched your position on this topic over the course of this thread? Maybe I’ve misinterpreted your position entirely. If so, my bad.
Code quality is about how well a piece of code expresses what it intends to do. It’s like quality writing.
It "expresses what it intends to do" perfectly well, for the original author. Nobody else can decipher it without spending a significant number of mental cycles.
Jack Kerouac is "quality writing", as is the Finnish national epic, the Kalevala.
But neither is the kind you want to read in a hurry when you need to understand something.
I want the code at work to be boring, standard and easy to understand. I can get excited by fancy expressive tricks on my own time.
I’m generally of the opinion that LLM-supplied code is “prolix,” but works well. I don’t intend to maintain the code personally; I plan to have an LLM do that, so I ask the LLM to document the code, with the constraint that an LLM will be reading it.
It tends to write somewhat wordy documentation, but it's quite understandable to humans.
In fact, it does such a good job that I plan on having an LLM rewrite a lot of my docs (and I have a lot of code documentation: cloc says it’s about 50/50 between code and documentation).
Personally, I wish Apple would turn an LLM loose on the header docs for their SwiftUI codebase. It would drastically improve their docs (which are clearly DocC-generated).
[EDITED TO ADD] By the way, it warms my heart to see actual discussion threads on code quality on HN.
Do you think no one has tried this with human programmers over the past 80 years, and that now, with LLMs, we can suddenly manage it? Why would linters, formal verification and testing exist if we could have just codified code quality in the first place?
To me, this is like telling a carpenter that we can codify what makes a chair comfortable or not.
Relatedly, it seems to me that there are two types of tests: those created TDD-style, which can be modified, and those that come from acceptance criteria, which should only be changed very carefully.
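One lightweight way to make that distinction visible is to keep the two kinds in separate suites. A minimal sketch with Python's built-in unittest, where shipping_cost and its acceptance criterion are hypothetical:

```python
import unittest


def shipping_cost(weight_kg: float) -> float:
    """Hypothetical function under test: flat fee plus a per-kilogram rate."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    return 5.0 + 2.0 * weight_kg


class ShippingCostDevTests(unittest.TestCase):
    """TDD-style tests: written alongside the code, free to change with it."""

    def test_rejects_zero_weight(self):
        with self.assertRaises(ValueError):
            shipping_cost(0)

    def test_rate_per_kilogram(self):
        self.assertAlmostEqual(shipping_cost(2) - shipping_cost(1), 2.0)


class ShippingCostAcceptanceTests(unittest.TestCase):
    """Acceptance tests: encode agreed requirements. Change them only
    deliberately, when the requirement itself changes."""

    def test_one_kilogram_parcel_costs_seven(self):
        # Hypothetical acceptance criterion: "a 1 kg parcel costs 7.00".
        self.assertAlmostEqual(shipping_cost(1), 7.0)
```

Keeping acceptance tests in their own class (or directory) lets a reviewer, or a CI rule, treat any diff that touches them as a change to requirements rather than to implementation.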
I use a test harness: I step through the code, look at the debug logs, and abuse the code as much as possible.
It's kind of a pain, but I find unit tests to be a bit of a "false hope" kind of thing: https://littlegreenviper.com/testing-harness-vs-unit/
It's not a massively complex AI monstrosity (it's from 2018, after all) or a perfect solution, but it's a good jumping-off point.
With a light sprinkling of LLM, this could be improved quite a bit: not necessarily by having the agent write the documentation, but by checking parity between code and docs and flagging gaps for users.
For example, a CI job that checks that the relevant documentation has been created or updated whenever new functionality is added or existing functionality is changed.
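A deliberately crude version of that CI check, sketched in Python. It assumes a hypothetical layout (code under src/, docs under docs/) and that CI supplies the list of changed paths, for example from git diff --name-only origin/main...HEAD (branch name assumed). It can only tell you the docs were untouched, not whether they are still accurate; that judgment is where an LLM could be asked to flag likely gaps.

```python
from pathlib import PurePosixPath

# Assumed repository layout (hypothetical): code under src/, docs under docs/.
SOURCE_ROOT = "src"
DOCS_ROOT = "docs"


def needs_docs_update(changed_files: list[str]) -> bool:
    """Return True when Python sources changed but no documentation file did."""
    paths = [PurePosixPath(p) for p in changed_files]
    code_changed = any(
        p.parts[:1] == (SOURCE_ROOT,) and p.suffix == ".py" for p in paths
    )
    docs_changed = any(p.parts[:1] == (DOCS_ROOT,) for p in paths)
    return code_changed and not docs_changed
```

A CI step would fail (or just post a warning) when this returns True, leaving a human to decide whether the change really needs docs.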
It allows you to write simple unit tests directly in your docstrings, essentially by copying REPL output, so each test doubles as an example.
Combined with something like Sphinx, that is almost exactly what you’re looking for.
doctest kind of sucks for anything where you need to set up state, but if you’re writing functional code it is often a quick and easy way to document and test your code/documentation at the same time.
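For reference, a minimal doctest looks like this: the examples in the docstring are pasted REPL sessions, and doctest re-runs them and compares the printed output. slugify is a hypothetical pure function, which is exactly the stateless case where doctest works best.

```python
def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated URL slug.

    The examples below are copied from a REPL session and double as tests:

    >>> slugify("Hello World")
    'hello-world'
    >>> slugify("  Code   Quality, on HN! ")
    'code-quality-on-hn'
    >>> slugify("")
    ''
    """
    # Keep letters, digits and whitespace; punctuation is dropped.
    cleaned = "".join(ch for ch in title.lower() if ch.isalnum() or ch.isspace())
    # str.split() with no argument also collapses runs of whitespace.
    return "-".join(cleaned.split())
```

Running python -m doctest -v slugify.py executes the examples, and Sphinx's sphinx.ext.doctest extension can also test doctest blocks that end up in the rendered docs.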
That system is a unit test that checks that functions are documented in the documentation. Nothing to do with docstrings.
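A sketch of that kind of test in Python, under two assumptions: the documentation is plain text (say, Markdown files read into a string), and "documented" simply means the function's name appears in it as a whole word. undocumented_functions is a hypothetical helper, not a library function.

```python
import inspect
import re
import types


def undocumented_functions(module: types.ModuleType, docs_text: str) -> list[str]:
    """Return public functions of module whose names never appear in docs_text.

    A whole-word match anywhere in the docs counts as "documented", which is
    crude but cheap; a stricter check might look for specific headings.
    """
    missing = []
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        # Skip private helpers and functions merely imported from elsewhere.
        if name.startswith("_") or obj.__module__ != module.__name__:
            continue
        if not re.search(rf"\b{re.escape(name)}\b", docs_text):
            missing.append(name)
    return missing
```

In a test suite this would become something like assert undocumented_functions(billing, docs_text) == [], where billing is the module under test and docs_text is read from the (hypothetical) docs directory.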
Even without doctest, generating your documentation from docstrings is much easier to keep updated than writing your documentation somewhere else, because it is right there as you are making changes.
I started working on something today I hadn't touched in a couple of years. I asked for a summary of the code structure, the choices I made, why I made them, required inputs and expected outputs. Of course it wasn't perfect, but it was a very fast way to get back up to speed. Faster than picking through my old code to re-familiarize myself, for sure.
This is the same discussion that goes round ad nauseam about comments. Nobody needs comments to tell us what the code does. We need comments to explain why choices were made.
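A small Python illustration of the difference. fetch_with_retry is hypothetical, and so is the rationale in its comment; what matters is that the comment records a decision and the alternative that was rejected, which the code alone cannot tell you.

```python
import time


def fetch_with_retry(fetch, attempts: int = 3, delay_s: float = 0.5):
    """Call fetch() until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except ConnectionError:
            if attempt == attempts:
                raise
            # A "what" comment would just say: sleep, then retry.
            # A "why" comment records the decision (hypothetical rationale):
            # the upstream service drops connections for a few hundred ms
            # during deploys, so a short fixed delay is enough; exponential
            # backoff was considered and judged unnecessary here.
            time.sleep(delay_s)
```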
https://testing.googleblog.com/2025/10/simplify-your-code-fu...
The key insight of FCIS (functional core, imperative shell) is that complicated logic with large dependencies leads to a large test suite that runs slowly. The solution is to isolate the complicated logic in the functional core and test it separately from the simpler, more sequential tests of the imperative shell.
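A minimal Python sketch of the split, using a hypothetical invoice-reminder job. The core decides who gets a reminder from plain data; the shell only loads, sends and counts. The load and send parameters stand in for a database query and an email client (both hypothetical), so even the shell can be exercised with simple fakes.

```python
from dataclasses import dataclass


# Functional core: plain data in, plain data out, no I/O.
@dataclass(frozen=True)
class Invoice:
    customer: str
    days_overdue: int
    amount: float


def reminders_to_send(invoices: list[Invoice], grace_days: int = 30) -> list[str]:
    """Decide who gets a reminder (hypothetical business rule)."""
    return [
        f"Reminder: {inv.customer} owes {inv.amount:.2f}"
        for inv in invoices
        if inv.days_overdue > grace_days
    ]


# Imperative shell: thin, sequential glue around the core.
def run_reminder_job(load, send) -> int:
    messages = reminders_to_send(load())
    for message in messages:
        send(message)
    return len(messages)
```

Most tests then target reminders_to_send with many small data cases, while one or two integration tests cover run_reminder_job against the real dependencies.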
For instance, two things I'm currently working on:
- A reasonably complicated indie game project I've been doing solo for four years.
- A basic web API exposing data from a legacy database for work.
I can see how the API could be developed mostly by agents; it's a pretty cookie-cutter affair, and my main value in the equation is just my knowledge of the legacy database in question.
But for the game... man, there's a lot of stuff in there that's very particular when it comes to performance and the logic flow. An example: entities interacting with each other. You have to worry about stuff like the ordering of events within a frame, what assumptions each entity can make about the other's state, when and how they talk to each other given there's job-based multi-threading, and a lot of performance constraints to boot (thousands of active entities at once). And that's just a small example from a much bigger iceberg.
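To make the ordering problem concrete, here is a toy Python sketch with two hypothetical entities that each move halfway toward the other. Updating in place gives different results depending on which entity is processed first; reading only from the previous frame's snapshot (double-buffering, one common technique, not necessarily what this project uses) makes the order irrelevant, at the cost of extra copies and entities reacting one frame late.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Entity:
    name: str
    x: float
    target: str  # name of the entity this one moves toward


def step_in_place(entities: dict[str, Entity]) -> dict[str, Entity]:
    # Order-dependent: later entities see positions already updated this frame.
    world = dict(entities)
    for name in list(world):
        me = world[name]
        other = world[me.target]
        world[name] = replace(me, x=me.x + (other.x - me.x) / 2)
    return world


def step_double_buffered(entities: dict[str, Entity]) -> dict[str, Entity]:
    # Every entity reads last frame's snapshot and writes into a new one,
    # so iteration order (or which worker runs first) cannot change the result.
    previous = entities
    return {
        name: replace(me, x=me.x + (previous[me.target].x - me.x) / 2)
        for name, me in previous.items()
    }
```

With a at 0.0 and b at 10.0, in-place updating yields (5.0, 7.5) or (2.5, 5.0) depending on order, while the double-buffered step yields (5.0, 5.0) either way.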
I'm pretty confident that if I leaned into using agents on the game I'd spend more time re-explaining things to them than I do just writing the code myself.
I was shocked recently when it helped me diagnose a musl compile issue, fork a sys package, and rebuild large parts of it in 2 hours. It would've taken me at least 2 weeks without AI.
I don't want to reveal the specific task, but it was a problem far outside the training data, and it still helped me take what would normally have taken 2 weeks down to 2 hours.
Since then I've been going pretty hard at maximizing my agent usage, and tend to have a few going at most times.
As does the reductionist idea that human thinking is something crude in comparison.
It's basically like hiring a new developer for one task and letting them go right after. They don't know your conventions, your history, or why things are the way they are. The only thing they have is what they can see in the code. Your code quality is basically the prompt now.
You can ask the agent to make 10 different solutions in the time it takes you to make 0.5.
Then you review them based on whatever criteria you feel is right and either throw them all away and do it yourself (maybe with inspiration from the other solutions) or pick one to progress further.
This is my take on how to not write slop.
In a blameless-postmortem style process, you would look at not just the mistake itself but the factors influencing the mistake and how to mitigate them. E.g., the doctor was tired AND the hospital demanded long hours AND the industry has normalized this.
So yes, the programmers need to hold the line, AND ALSO the velocity of the tool makes it easy to get tired, AND its confidence and often-good results promote laziness (or maybe folks just don’t know better), AND it can thrash your context and bounce you around the codebase, making it hard to remember the subtleties, AND on and on.
Anyway, strong agree on “dude, review better” as a key part of the answer. Also work on all this other stuff and understand the cost of VeLOciTy…
If you're not familiar enough with the patch to answer any question about it, you shouldn't submit it for review.
And you’re completely right, humans are still the ones in control here. It’s entirely possible to use AI without lowering your standards.
I generally agree on this as best practice today, though I think it will become irrelevant in the next 2 generations of models.
The useful part is not just asking it to write code, but giving it context: how the codebase got here, what constraints are intentional, where the sharp edges are, and what direction we want to take.
With that guidance, it can be excellent. Without it, it tends to produce changes that make sense in isolation but not in the system.
This is a beautiful articulation of a major pet peeve when using these coding tools. One of my first review steps is just looking for all the extra optional arguments it's added instead of designing something good.
To solve this permanently, use a linter and apply a "ratchet" in CI so that the LLM cannot use ignore comments
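A minimal sketch of such a ratchet in Python, assuming CI has the file contents at hand and stores the last accepted count somewhere in the repo (both hypothetical details). The rule is simply that the number of suppression comments may never go up; whenever it goes down, the lower number becomes the new baseline.

```python
import re

# Suppression markers to count (ESLint, flake8/ruff, mypy); extend as needed.
SUPPRESSION = re.compile(r"eslint-disable|#\s*noqa|#\s*type:\s*ignore")


def count_suppressions(files: dict[str, str]) -> int:
    """Count suppression comments across a {path: file contents} mapping."""
    return sum(len(SUPPRESSION.findall(text)) for text in files.values())


def check_ratchet(files: dict[str, str], baseline: int) -> tuple[bool, int]:
    """Return (ok, next_baseline).

    ok is False when suppressions grew past the baseline. next_baseline only
    ever moves down, so cleanups are locked in for the next run.
    """
    current = count_suppressions(files)
    return current <= baseline, min(current, baseline)
```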
Basically just a bunch of .js rules that are executed like:
In practice this works really well and can be kept in the loop during AI coding. For example, I can disallow stuff like eslint-disable for entire files and demand that a reason comment be added when disabling individual lines (which can then be critiqued in review afterwards), with even the error messages giving clear guidelines on what to do:
The downside is that this approach means your rules have to parse the code from whatever lines of text there are (that hasn't been a blocker yet), but the upside is that with slightly different rules I can support Java, .NET, Python, or anything else (and it's very easy to check whether a rule works). And since the rules are there to prevent AI (or me) from doing stupid shit, they don't have to be super complex or perfect either, just usable for me. Furthermore, since it's Go, the executable ends up being a 10 MB tool I can put in CI container images or on my local machine, and, for example, I can add pre-run checks for my app, so that when I launch it in a JetBrains IDE, it also checks whether my application configuration is actually correct for development.
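The kind of rule described above (no file-wide eslint-disable, a reason required on single-line disables) could look roughly like this. It is a Python sketch, not the commenter's JS-rules-plus-Go runner, and like those rules it works on raw lines of text rather than a parsed AST. It relies on ESLint accepting a description after a -- separator in directive comments; the error wording is made up.

```python
import re

# File-wide form: a block comment "eslint-disable" not followed by "-line" etc.
FILE_WIDE = re.compile(r"/\*\s*eslint-disable(?!-)\b")
# Per-line forms: eslint-disable-line and eslint-disable-next-line.
PER_LINE = re.compile(r"eslint-disable-(?:next-)?line\b(?P<rest>.*)")


def check_eslint_directives(path: str, text: str) -> list[str]:
    """Return 'path:line: message' errors for disallowed or unexplained disables."""
    errors = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if FILE_WIDE.search(line):
            errors.append(
                f"{path}:{lineno}: file-wide eslint-disable is not allowed; "
                "disable single lines and explain why"
            )
            continue
        match = PER_LINE.search(line)
        # Require a " -- reason" description after the rule list.
        if match and " -- " not in match.group("rest"):
            errors.append(
                f"{path}:{lineno}: add a reason, e.g. "
                "'// eslint-disable-next-line no-console -- CLI output'"
            )
    return errors
```

Because the message itself says what to do, an agent hitting this in its loop can usually fix the problem without a human stepping in.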
Currently I have plenty of rules covering disabling code checks, requiring that reusable components show up in a showcase page in the app, checking specific back-end configuration for specific Git branches, how to use Pinia stores on the front end, requiring an API abstraction instead of direct Axios or fetch calls, how Celery tasks must be handled, how the code has to be documented (what code needs comments, in what format), and so on.
Obviously the codebase is more or less slop, so I don't have anything publish-worthy atm, but anyone can make something like that in a weekend to supplement existing language-specific linters. Tbh ECMAScript is probably not the best choice, but hey, it's just code with some imports like:
I can personally recommend the general approach. Maybe someone could even turn it into real software (not just the slop for personal use that I have), perhaps with a saner scripting language for writing the rules.