It's why famously, programmers always say, the code is the documentation, because writing detailed docs is very tedious and nobody wants to do it.
Behaviour Driven Development or Spec Driven Development are, loosely, forms of Test Driven Development where you encode the specification into the code base. No impedance, full insight, formality through code.
I think people get really dogmatic about “test” projects, but with a touch of effort a unit test harness can be split up into integration tests, acceptance tests, and specification compliance tests. Pull the data out as human readable reports and you have a living, verifiable, specification.
Particularly using something comparable to back-ticks in F#, which let test names be defined with spaces and punctuation (ie “fulfills requirement 5.A.1, timeouts fail gracefully on mobile”), you can create specific layers of compiled, versioned, and verifiable specification baked into the codebase and available for PMs and testers and clients and approval committees.
Second is that I'm doing a lot less "seat of my pants prompting" and doing more engineering and ideating, which was a big goal of mine. So I'm feeling less psychotic there too.
And sort of tangentially to that, I think a significant subset of devs actually are willing to just prompt their way to nirvana, day in and day out. I'm not. I think the spec will carry a lot of weight for a long time. Maybe they will get further than I give them credit for? Maybe the whole digital world becomes a single chat box?
1. Don’t write in yaml. It’s really hard for humans. Write in markdown and use a standard means to convert to lists / yaml.
2. Think beyond you writing your own specs - how does this expand into teams of tens or more. The ticketing system you have (jira? Bugzilla) is not designed for discussion of the acceptance criteria. I think we are heading into a world of waterfall again where we have discussions around the acceptance criteria. This is not a bad thing - is used to be called product management and they would write an upfront spec.
If this new world of a tech and a business user lead the writing of a new spec (like a PEP) and then then AI implements it and it’s put into a UAT harness for larger review and a daily cycle begins, we might have something.
Good luck
This seems like the answer to that thought!
Also, I mainly pursue these tools so that I can have AI accelerate this process and broker an agreement after negotiating specs with the agent.
The one thing I like that OP brings is to tie specs and code together. The openspec flow does help a lot in keeping code synced with specs, but when a spec changes, AI needs to find the relevant code to change it. It's pretty easy to miss something in large codebase (especially when there is lots of legacy stuff).
Being able to search for numbered spec tags to find relevant bits of code makes it much more likely to find what needs to be changed (and probably with less token use too).
https://haskellforall.com/2026/03/a-sufficiently-detailed-sp...
An executable spec like gherkin or hitchstory is config - it has no loops or conditionals. There are a number of rarely recognized benefits to this.
If you're genuinely confused, and haven't tried Opus for coding, then it's not surprising you're confused!
It is also okay for you to just not like the idea of LLMs for coding (but say that!).
I have seen the same idea with processes, pipelines, lists, bullet points, jsons, yamls, trees, prioritization queues all for LLM context and instruction alignment. It's like the authors take the structure they are familiar with, and go 100% in on it until it provides value for them and then they think it's the best thing since sliced bread.
I would like, for once, to see some kind of exploration/abalation against other methods. Or even better, a tool that uses your data to figure out your personal bias and structure preference for writing specs, so that you can have a way of providing yourself value.
First it was choice of editor: people were micro optimizing every aspect of their typing experience, editor wars where people would literally slaughter over suggesting another camp.
Editor wars v2: IDEs arrived and second editor war began.
Revenge of the note taking apps: Obsidian/Roam/Joplin/Apple Notes/Logseq. Just one plugin, just one more knowledge graph, bro, and I’ll have peak productivity. 10x is almost here.
AI: you’re witnessing it now.
Do people NOT have anything else in life? How are y’all finding time to do all of this shit? Are you doing it on company time? Do you have hobbies, do you learn foreign languages, travel, have kids or spouses, drive a car, other thousand “normie” things outside of staring at the freaking monitor or thinking about this shit 24/7? Did I miss the invention of a Time Machine?
How are any of those things even remotely as interesting as arguing with people about an Emacs config?
People are people.
Otherwise, I like the idea of machine-readable specs.
This industry has become a parody of itself, and people are celebrating.
Don't we just love the hard fact conclusions based on sample size N=1 and hand-waving arguments?
This industry is just getting more and more bonkers.
fyi language alone can’t define/describe requirements which is why UML existed.
You could deterministically process any UML diagram into a prose equivalent.
And in fact you couldn't do the other way around (any prose -> UML) because UML is less powerful than natural language and actually can't express everything that natural language can.
Can it also fully describe a composition by Bach or a Rembrandt's painting? In some weird, overly complex way it probably 'could', but it would be very painful. That's why we pick other forms of expression. We use other forms of expression to compact and optimise information delivery. Another benefit is that we cut out the noise. So yes UML cannot describe everything natural language can, but then again why should it - it was designed as a specific framework for designing relations between objects. Not more and not less. Similar for sequence diagrams or other forms of communicating ideas efficiently.
> My point is, the spec must live somewhere, even if you don’t write it down. The spec is what you want the software to be. It often exists only in your head or in conversations. You and your team and your business will always care what the spec says, and that’s never going to change. So you’re better off writing it down now! And I think that a plain old list of acceptance criteria is a good place to start. (That’s really all that `feature.yaml` is.)
Did I miss something or is everyone back in 1970s, working in waterfall processes now?
You don't plan to follow the plan. You plan in order to understand the whole problem space. Obviously no plan survives contact with reality.
Another point of view is that LLM:s perform to an extent on the same level as outsourcing does. This interface requires a bit more contract mass than doing everything within single team.
“Specsmaxxing” is basically the right response to this. When you can't rely on authorial memory, you have to put the intent somewhere durable. Specs become the source of truth by default if we continue down the road of AI generated code.
1: https://ossature.dev/blog/ai-generated-code-has-no-author/
Unlike you, I wish for the LLM to do as much of the work as possible -- but "as possible" is doing a lot of work in that sentence. I'm still trying to get clear on exactly where I am needed and where Opus and iterations will get there eventually.
It has really challenged me to get clearer on what a requirement is vs a constraint (e.g., "you don't get to reinvent the database schema, we're building part of a larger system"). And I still battle with when and how to specify UI behaviours: so much UI is implicit, and it seems quite daunting to have to specify so much to get it working. I have new respect for whoever wrote the undoubtedly bajillion tests for Flutter and other UI toolkits.
1. Specifications that live outside the code. We have a lot of code for which "what should this do?" is a subjective answer, because "what was this written to do?" is either oral legend or lost in time. As future Claude sessions add new features, this is how Claude can remember what was intentional in the existing code and what were accidents of implementation. And they're useful for documenters, support, etc.
2. Specifications that stay up to date as code is written. No spec survives first contact with the enemy (implementation in the real world). "Huh, there are TWO statuses for Missing orders, but we wrote this assuming just one. How do we display them? Which are we setting or is it configurable?" etc. Implementer finds things the specifier got wrong about reality, things the specifier missed that need to be specified/decided, and testing finds what they both missed.
I have a colleague working on saving architecture decisions, and his description of it feels like a higher-abstraction version of my saving and maintaining requirements.
I am also stealing the idea of talking to LLMs as if it's an email. So funny, we need to be joymaxxing a bit more I think :)
You probably don't want people associating your work with abusing crystal meth and hitting yourself in the face with a hammer.
For anyone missing the reference, SNL has a pretty good explainer:
https://www.youtube.com/watch?v=4XMPLdiXB1k