Hacker News

23 points by fagnerbrack 7 hours ago | 19 comments

> Rework is almost free

Is it? All the electricity and capital investment in computing hardware costs real money. Is this properly reflected in the fees that AI companies charge or is venture capital propping each one up in the hope that they will kill off the competition before they run out of (usually other people's) money?

ramoz 32 minutes ago

> If I had to roll out such a development process today, I’d make a standardized Markdown specification the new unit of knowledge for the software project. Product owners and engineers could initially collaborate on this spec and on test cases to enforce business rules. Those should be checked into the project repositories along with the implementing code. There would need to be automated pull-request checks verifying not only that tests pass but that code conforms to the spec. This specification, and not the code that materializes it, is what the team would need to understand, review, and be held accountable for.

The constant urge I have today is for some sort of spec or simpler facts to be continuously verified at any point in the development process; something agents would need to be aware of. I agree with the blog and think it's going to become a team sport to manage these requirements. I'm going to try this out by evolving my open source tool [1] (used to review specs and code) into a bit more of a collaborative & integrated plane for product specs/facts - https://plannotator.ai/workspaces/

[1] https://github.com/backnotprop/plannotator
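
Checking that code conforms to the spec is the hard part, but a much weaker pull-request check is easy to sketch: confirm that every requirement in the Markdown spec is mentioned by at least one test. The `REQ-<number>` ID convention and the function name below are assumptions made for illustration, not something plannotator or the blog post defines.

```python
import re

# Hypothetical convention: requirements appear in the spec as "REQ-<number>".
REQ_ID = re.compile(r"\bREQ-\d+\b")

def uncovered_requirements(spec_markdown: str, test_sources: list[str]) -> list[str]:
    """Return requirement IDs found in the spec that no test source mentions."""
    required = set(REQ_ID.findall(spec_markdown))
    covered = set()
    for source in test_sources:
        covered.update(REQ_ID.findall(source))
    return sorted(required - covered)

spec = """
## Checkout
- REQ-1: Orders over 100 EUR ship free.
- REQ-2: Discount codes cannot be combined.
"""
tests = ["def test_free_shipping():  # covers REQ-1\n    assert True\n"]

missing = uncovered_requirements(spec, tests)
print(missing)  # a CI job could fail the build when this list is non-empty
```

This only shows that a test claims to cover each requirement; it says nothing about whether the code actually satisfies it.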

jondwillis 2 minutes ago

I’ve been considering this as well, and trying to get my colleagues to understand and start doing it. I use it to pretty decent effect in my vibe coded slop side projects.

In the new world of mostly-AI code that is mostly not going to be properly reviewed or understood by humans, one potential answer is increasingly robust manifestation, enforcement, and regeneration of the specs via the coding harness configuration, combined with good old-fashioned deterministic checks.

Taken to an extreme, the code doesn’t matter, it’s just another artifact generated by the specs, made manifest through the coding harness configuration and CI. You could re-generate it from scratch every time the specs/config change.

“Clean room code generation-compiler-thing.”
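
One rough way to picture "regenerate whenever the specs/config change" is to fingerprint the spec and harness files and compare against the fingerprint the current code was built from. This is only a sketch: the file names, their contents, and both function names are hypothetical, and the actual regeneration step is left out.

```python
import hashlib
from typing import Optional

def spec_fingerprint(spec_files: dict[str, str]) -> str:
    """Hash spec and harness-config contents in a stable (sorted-by-path) order."""
    digest = hashlib.sha256()
    for path in sorted(spec_files):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")  # separator so path and content cannot run together
        digest.update(spec_files[path].encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()

def needs_regeneration(spec_files: dict[str, str], last_fingerprint: Optional[str]) -> bool:
    """True when the specs differ from what the code was last generated from."""
    return spec_fingerprint(spec_files) != last_fingerprint

# Hypothetical project inputs, held in memory to keep the sketch self-contained.
specs = {
    "specs/search.md": "Results are sorted by relevance.",
    "harness.toml": "model = 'example'",
}
built_from = spec_fingerprint(specs)

unchanged = needs_regeneration(specs, built_from)
specs["specs/search.md"] += " Ties are broken by date."
changed = needs_regeneration(specs, built_from)
print(unchanged, changed)
```

A CI job could store the fingerprint next to the generated code and trigger a rebuild whenever the comparison reports a change.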

tyleo 2 hours ago

The underlying mechanism is still the same: humans type and products come out.

So something which must be true if this author is right is that whatever the new language is (the thing people are typing into markdown) must be able to express the same rigor in fewer words than existing source code.

Otherwise the result is just legacy coding in a new programming language.

SoftTalker 26 minutes ago

> Otherwise the result is just legacy coding in a new programming language.

And this is why everything from COBOL through various implementations of CASE tools, "software through pictures", flowcharts, UML, etc., all of which were supposed to let business SMEs write software without needing programmers, has failed to achieve that goal.

tyleo 18 minutes ago

While they failed to achieve the goal outright, I'd argue that each is a concrete step towards it. The languages we have today are more productive than the languages we had decades ago.

I think it's an open question whether we achieve the holy grail language the submission describes. My guess is that we inch towards the submission's direction, even if we never achieve it. It won't surprise me if new languages take LLMs into account just like some languages now take the IDE experience into account.

debesyla 20 minutes ago

I found that adding "philosophy" descriptions helps guide the tooling. No specs, just general vibes about what the point is, because we can't make everyone happy and that's not the goal of a good tool (I believe).

Technology and implementation may change, but the general point of "why!?" stays.

montroser 3 hours ago

This could very well be a pattern that some teams evolve into. Specs are the new source -- they describe the architectural approach, as well as the business rules and user experience details. End-to-end tests are described here too. All of this is what goes through PRs and the review process, and the code becomes a build artifact.

vips7L 2 hours ago

It just doesn’t work though. Anthropic couldn’t even get Claude to build a working C compiler which has a way better specification than any team can write and multiple reference implementations.

wizzwizz4 3 hours ago

> There would need to be automated pull-request checks verifying not only that tests pass but that code conforms to the spec.

As I understand, this is an unsolved problem.

InsideOutSanta 53 minutes ago

Step 1: solve the halting problem.

soraminazuki 2 minutes ago

Yep, it's not an "unsolved" problem at all. We already have mathematical proof that it's impossible.
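
For anyone who hasn't seen it, the proof being referred to is the textbook diagonal argument, and it can be sketched in a few lines. Everything here is illustrative: `halts` stands in for any claimed checker, and `says_never_halts` is just one deliberately simple candidate.

```python
from typing import Callable

Program = Callable[[], str]

def make_contrarian(halts: Callable[[Program], bool]) -> Program:
    """Build a program that does the opposite of whatever `halts` predicts for it."""
    def contrarian() -> str:
        if halts(contrarian):
            while True:  # predicted to halt, so loop forever instead
                pass
        return "halted"  # predicted to loop, so halt immediately
    return contrarian

# One concrete (and necessarily wrong) candidate checker: "nothing ever halts".
def says_never_halts(program: Program) -> bool:
    return False

contrarian = make_contrarian(says_never_halts)
prediction = says_never_halts(contrarian)  # the checker's verdict: will not halt
outcome = contrarian()                     # yet the program returns right away
print(prediction, outcome)
```

A checker that answers `True` for `contrarian` fails the other way, since the program then loops forever, so no choice of `halts` can be right about its own contrarian.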

Ecys 3 hours ago

this is actually precisely what humans' roles will be.

"is this implementation/code actually aligned with what i want to do?"

humanic responsibility's focus will move entirely from implementing code to deciding whether it should be implemented or not.

u probably mean unsolved as in "not yet able to be automated", and that's true.

if pull-request checks verifying that tests are conforming to the spec are automated, then we'd have AGI.

phainopepla2 55 minutes ago

> humanic

No, thank you

wizzwizz4 3 hours ago

This is a task that humans are exceptionally bad at, because we are not computers. If something uses the right words in the right order such that it communicates the correct algorithm to a human, then a human is likely to say "yup, that's correct", even if an hour's study of these 15 lines would reveal that a subtle punctuation choice, or a subtle mismatch between a function's name and its semantics, means it implements a different algorithm to the expected one.

LLMs do not understand prose or code in the same way humans do (such that "understand" is misleading terminology), but they understand them in a way that's way closer to fuzzy natural language interpretation than pedantic programming language interpretation. (An LLM will be confused if you rename all the variables: a compiler won't even notice.)

So we've built a machine that makes the kinds of mistakes that humans struggle to spot, used RLHF to optimise it for persuasiveness, and now we're expecting humans to do a good job reviewing its output. And, per Kernighan's law:

> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?

And that's the ideal situation where you're the one who's written it: reading other people's code is generally harder than reading your own. So how do you expect to fare when you're reading nobody's code at all?
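
The compiler half of the renamed-variables parenthetical can be checked directly. A minimal sketch, assuming CPython as the example compiler (the comment names no language) and two made-up functions that differ only in their identifiers:

```python
def total_price(prices, tax_rate):
    subtotal = sum(prices)
    return subtotal * (1 + tax_rate)

# Same function with every parameter and local renamed to something meaningless.
def f(a, b):
    c = sum(a)
    return c * (1 + b)

# CPython refers to locals by slot index, so the instruction bytes should match
# even though the recorded variable names differ.
same_bytecode = total_price.__code__.co_code == f.__code__.co_code
different_names = total_price.__code__.co_varnames != f.__code__.co_varnames
print(same_bytecode, different_names)
```

To a human or an LLM, `total_price` carries a lot of information that `f` does not; to the compiler the two are the same program.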

Ecys 2 hours ago

i meant on a higher, agentic level where the AI's code is infallible. and that's going to happen very soon:

say: human wants to make a search engine that makes money for them.

1. for a task, ask several agents to make their own implementation and a super agent to evaluate each one and interrogate each agent and find the best implementation/variable names, and then explain to the human what exactly it does. or just mythos

2. the feature is something like "let videos be in search results, along with links"

3. human's job "is it worth putting videos in this search engine? will it really drive profits higher? i guess people will stay on the search engine longer, but hmmm maybe not. maybe let's do some a/b testing and see whether it's worth implementing???" etc...

this is where the developer has to start thinking like a product manager. meaning his position is abolished and the product manager can do the "coding" part directly.

now this should be basic knowledge in 2026. i am just reading and writing back the same thing on HN omds.

wizzwizz4 52 minutes ago

The AI's code is not going to be infallible any time soon. It's been "very soon" for the past 4 years, and the AI systems are still making the same kinds of mistakes, which are the mistakes you'd expect from a first-principles study of their model architectures. There's no straightforward path to modifying the systems we have now, to make them infallible.

Ecys 3 hours ago

very true. and we already know and agree with this.

user experience/what the app actually does >>> actually implementing it.

elon musk said this a looong time ago. we move from layer 1 (coding, how do we implement this?) to layer 2 thinking (what should the code do? what do we code? should we implement this? (what to code to get the most money?))

this is basic knowledge

duskdozer 32 minutes ago

Elon Musk has been saying Teslas would have fully autonomous self-driving within 1-3 years since 2013