The software is a tool designed specifically around my requirements for managing lectures, which need to be prepared and managed and come with presentations, grading, etc. I wanted one big space where I can quickly access all related data in a workspace and fold and unfold important aspects, while also editing and moving content across multiple days/lectures.
The first version is a VS Code plugin, which I have been using for about 4 months now, with no or only minor modifications, to manage my lectures and private data. The second version is a standalone application which improves on the ideas of the first version and goes a few steps further.
AI can quickly make you something that looks like it's running. But when you try to finish it, it takes way longer than you'd think. You need to specify every little detail. You need to make it KISS and DRY, etc. You let it analyze the application structure and simplify and clean up nearly as many times as you add features. While fixing bugs you might need to run the same thing multiple times and revert any unneeded changes. You need to think about a good level of debug logging and ways that the program can help you find errors and report them quickly.
I hope my project will be ready in about 2 to 3 months. According to a quick analysis, the current version is over 850 files with 250'000 lines of code.
I spent about 2000$ on AI subscriptions in that time: 200$ for Claude for a while, down to 100$ a month now; 20$ to OpenAI, which is very important for architecture and reviews; 20$ on tests with other AIs, but I rarely use them in my work. I also spent 1500$ on 2 * 3090s to hopefully have a local AI agent in the future.
I spend about 2 to 4 hours each day (including weekends) checking the app and writing prompts.
I would never have been able to create such a large and complex project alongside my other tasks, and I am very confident that the final product will be good enough for productive work.
This is the correct way to code with AI. If you don't understand the code, we're not yet at a point where the model can do it all. Well, it can, but not at a point where you can confidently move forward knowing it's been thoroughly built and reviewed by a model that's up to par. Some day maybe, but not currently.
Sure, Claude Code and Codex can write (most of) the code for me - but the amount of technical knowledge I need to decide what and how to build remains enormous.
As an example: I'm working on a system right now that works like Claude Artifacts, allowing custom HTML+JS apps to safely run in an iframe sandbox inside a larger application.
Just understanding why that's a useful thing that can be built requires deep knowledge of sandboxing, security threats, browser security models, and half a dozen different platform features that have been evolving over a couple of decades.
A vibe coder without that technical understanding would have zero chance of prompting such a thing into existence, no matter how much guidance the LLMs gave them.
It really saddens me to see some developers talk about literally quitting their careers over AI, right when the benefits of existing deep technical experience have never been more valuable.
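For readers unfamiliar with the iframe sandbox idea mentioned above, here is a minimal sketch, not the commenter's actual system. It uses Python only to build the markup, and `sandboxed_iframe` and its parameters are hypothetical names. The one real platform feature it leans on is the HTML `sandbox` attribute: granting `allow-scripts` without `allow-same-origin` lets the embedded app run JavaScript while keeping it in an opaque origin.

```python
import html


def sandboxed_iframe(app_html: str, title: str = "untrusted app") -> str:
    """Return an <iframe> tag that embeds app_html via srcdoc (hypothetical helper)."""
    # allow-scripts WITHOUT allow-same-origin: scripts run, but the frame is
    # treated as an opaque origin, so it cannot read the host page's cookies,
    # storage, or DOM.
    return (
        '<iframe sandbox="allow-scripts" '
        f'title="{html.escape(title, quote=True)}" '
        # srcdoc is an attribute value, so the app markup must be escaped.
        f'srcdoc="{html.escape(app_html, quote=True)}"></iframe>'
    )


tag = sandboxed_iframe('<script>document.body.textContent = "hi"</script>')
print(tag)
```

A real system would also need a messaging channel between host and frame (for example `postMessage`) and a Content Security Policy, both of which this sketch leaves out.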
There's an interesting repository with 63600 stars on GitHub (1). The developer of the repository is No. 1 on GitHub's trending contributors list (2). However, it seems like the application isn't what it's described to be (3), and the developers, on their end, are unable to clearly answer whether this is real or not, as it's just messy LLM output.
Proof that the suit alone doesn't make anyone Iron Man.
1. https://github.com/ruvnet/RuView
So, a nonfunctional project is created by AI, and AI is used to attest to its nonfunctionality.
What a brave new world.
AI creates a delusional product, people don't trust their own opinion regarding it and follow it, another AI is needed to prove that the product is unreal.
In the loop.
Better headline: "Why AI Multiplies Developer Skills Rather Than Replacing Them"
I work with two pretty green developers. The rate at which they can make a mess is now phenomenal. And the sense of confidence the tools give them with early successes means any experience I might have to offer means less now. Which is ok, I’m not going to be that “my experience has to be useful to you so I still feel relevant” old guy. But I do find myself curious how “lessons are learned” that lead to greater and greater tool exploitation in this brave new world.
Those are people who weren't making it to the MVP stage before LLMs.
There is no doubt that highly technical people are getting A LOT more out of LLMs than people without dev experience, in an absolute sense. I think it's less clear in a relative sense.
A question I also ask myself a lot: What are the skills I'm leveraging, exactly, as a highly experienced developer that's now doing a lot of vibe coding?
1) I'm choosing good technology for the task, and thinking about what LLM-agents are good at and choosing technology that they can work well with.
2) I'm choosing good workflows for the LLM-agent, starting a new context at the right time, having it test things, making sure it has logging that it can inspect, making sure it can operate the application in a way that it can debug and inspect it.
3) I'm thinking about the code even though I'm not looking at it, I'm telling it how I want things implemented, I'm telling it how to debug things.
I think these are all hard things for non-developers to do, but I also think non-developers will be able to replicate a large chunk of #1 and #2 relatively quickly. I only have to figure out that it's valuable to tell the LLM-agent to use playwright when working on web page visuals once, and then I can tell you to do that too. Or the coding agents will come with that knowledge built-in (to the model or as a builtin skill or whatever). Knowledge around this will accumulate and become easier for non-developers to access, and in many cases be builtin to the models or harnesses.
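As one illustration of "logging that it can inspect" from point 2, here is a minimal sketch assuming a Python application. It emits one JSON object per log line, which an agent can grep or parse without guessing at a format. `JsonLineFormatter`, `make_logger`, and the logger name `app.import` are made up for illustration; only the standard `logging` and `json` modules are used.

```python
import io
import json
import logging


class JsonLineFormatter(logging.Formatter):
    """Format each record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Include the traceback text so failures are self-explanatory.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def make_logger(name: str, stream) -> logging.Logger:
    """Build a DEBUG-level logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonLineFormatter())
    logger.handlers = [handler]
    return logger


buffer = io.StringIO()  # stands in for a log file the agent can read
log = make_logger("app.import", buffer)
log.debug("parsed %d rows", 3)
print(buffer.getvalue(), end="")
```

In practice the stream would be a file in the workspace, so the agent can be told where to look after each run.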
Someone needs to watch Iron Man 3...
What folks seem to avoid is that a Junior (in ANY subject) has the ability to LEARN so much faster with an AI research assistant, and that becoming an expert has accelerated for those with the personal stamina to dig deep (this as a requirement hasn't changed). I spend just as much time with my AI tooling asking questions as I do asking it to "build" or "fix" things. "How does this work?". "Can you suggest other tools?".
I think some people always think about AI as an input / output relationship, when a lot of the time, the fiddling in between, with or without AI, was always the important part. Yes, people will suck in the beginning, but then again they always did. I think the good folks though will suck for a MUCH shorter time than I did getting into things.
A lot of people will drop out and get discouraged. That happened before too. Learning things requires persistence. I think the only real case to be made is that AI's sense of immediate pleasure can neuter people away from running into friction. AI natives likely won't understand friction and question it.
I’m not seeing this. And based on what we’re seeing at the university level, I’m not expecting to.
The analogy is unlimited typing in Gmail won’t make you a better writer or typesetter on its own.
If anything, it allows people to be as lazy as possible. I have not seen anyone digging deeper with the AI tools.
Companies with AI will move faster than those without.
AI itself could subsume what we collectively consider as Engineering Taste.
AI is faster at what it does. So even if a junior costs less on his own than AI, paying extra for AI means gaining first mover advantage.
This is a testable hypothesis with a severe lack of citations. Intuition would argue the opposite. We learn by using our brains; if we offload the thinking to a machine and copy its output, we don’t learn. A child does not learn multiplication by using a calculator, and a language learner will not learn a new language by machine translating every sentence. In both cases all they’ve learnt is using a tool to do what they skipped learning.
For such a person, I believe AI can be very empowering for learning. Like Google, Wikipedia, Stack Overflow, and arXiv before it, AI tools give access to a lot of information. They let you quickly dig deep into any topic you can imagine. And yes, the quality is variable - so one needs to find ways to filter and synthesize from imperfect info. But that was also the case before. Furthermore, AI tools can be used to find holes in an argument or a paper. And by coding, one can use them to test things out in practice. These are also powerful (albeit imperfect) learning tools. But they will not apply themselves.
And as we are talking about junior developers, it is safe to assume your conditions (1), (2), and (4) are all true; if any of them were false, why did that person apply for and get a job as a junior developer? As for condition (3), every workplace eventually hires a person who does not fulfill this; then they either fire that person, or they give them a talk and the developer grows out of it and changes their behavior to fulfill that condition.
Aside: you listed 4 conditions for learning. I am not sure these are actually conditions recognized as such by behavior science. In fact, I doubt they are and that these conditions are just your opinions (man).
I like to think of it as a normal distribution: the further a programmer is to the right of the mean, the greater their benefit. It's almost like it's their standard deviation squared (σ²). So someone like Matt Perry (as OP mentioned), who is a >99.99th-percentile programmer for argument's sake and is therefore four standard deviations away from the mean... Matt gets a (4×4) 16x multiplying effect on their productivity.
Someone who is a slightly above average programmer might see a 2 or 3x boost on their productivity, which is huge(!) and might also make them fear for their job. Which tracks with the level of moral panic we are seeing and experiencing. This math kinda still holds up for "bad programmers" too (i.e. left of the mean), as in they still see a boost to their productivity (negative squared is a positive number)... but there's something iffy about their results. The technical debt is unmaintainable and because they don't _understand_ the systems that they're operating in, they end up in the "3 hour" prompt loops that the OP refers to.
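The back-of-the-envelope model above can be written out as follows. This is only a sketch of the commenter's arithmetic, with `multiplier` and `z_for_percentile` as made-up names; nothing here is an empirical result.

```python
from statistics import NormalDist


def multiplier(z: float) -> float:
    """Productivity multiplier in the comment's model: distance from the mean, squared."""
    return z ** 2


def z_for_percentile(p: float) -> float:
    """Standard deviations from the mean for a percentile p in (0, 1)."""
    return NormalDist().inv_cdf(p)


print(multiplier(4.0))    # 16.0, the "four standard deviations" case
print(multiplier(-2.0))   # 4.0, left of the mean still comes out positive

z = z_for_percentile(0.9999)
print(round(z, 2))        # roughly 3.72, so 99.99% is a bit under 4 sigma
print(round(multiplier(z), 1))
```

Taken literally, the square drops below 1 for anyone within one standard deviation of the mean (z = 0.5 gives 0.25), so the formula is best read as a loose illustration rather than a model.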
> Similarly, if Matt Perry handed me the keys to the Motion repository and told me to take over, I wouldn’t have the same results even though I have access to the same set of LLM tools.
The question is -- how long is this multiplier going to exist for? Some people would wager "for the foreseeable long-term future"; some people think it will widen further; and some people think it will diminish or god forbid even collapse. It feels like most arguments at the moment (like this article's) are that the humans who "know what they are doing" will be able to batten down the hatches and avoid being usurped by ever more capable models. I saw it in a café yesterday: someone was using a coding agent to build a marketing website for their project, getting more and more frustrated by not getting the outcome they wanted. Their friend typed a couple of sentences on their keyboard and got a "Dude! How did you do that? That was sick!" a minute or so later. "I used to build websites" the friend said. -- The friend 'knew what they were doing'.
How much longer is knowing what you're doing going to be a moat?
For a looooonnnnngggg time, unless there's massive progress in AI research.
Fundamentally, next token prediction is limited. Granted, I'm pretty amazed at how well it's done, but if you can't activate the right parts of the models (with your prompts), then you're not going to get good results.
And to be fair, for lots of things this doesn't matter. Steve in Finance or Mindy in Marketing can create dashboards that actually help them, and the code quality mostly doesn't matter.
For stuff that needs to be shipped, monitored and maintained you still need to know what you're doing.
The question that really matters is whether that will continue to be the case. My guess is that technical expertise matters less over time, and the ability to specify the desired outcome is eventually the only thing that becomes important. But I could be wrong! The direction this all goes is pretty fuzzy in my mind.
To me, I don't see how this will ever not be an advantage. All software requires constraints. Some of those constraints might be objective (scale, performance, etc.) but a lot of them are subjective and require active decision making (architecture, UI, readability).
So if there was only one way to do something or only one desired output, then yes I think models would surpass humans. But like art, I don't think there is an objective truth to software and because of that, humans get the opportunity to play an important role.
Now whether that is valued from a business/industry perspective is a question that I think we all know the answer to unfortunately.
But the general argument of 'we will need skilled operators' still holds.
For every 'junior' displaced by AI, there will be some other kind of relevant role they're needed for.
Agentic workflows, integration, all the data science stuff, new UX paradigms.
I don't think the job numbers will dwindle, just shift.
What’s not clear to me is: if writing more code per engineer is possible, does that result in fewer engineers or just more software, especially in areas that traditionally got squeezed: UX, testing, DevEx, documentation, etc. Perhaps the bar just gets raised?
AIs have skills humans aren’t good at like nerding out on technical details.
That’s not a perfect map because I’m spitballing. However there is a symbiosis.
I am not sure I am productive anymore with AI, as I am up to 125 repos and agents, most of which are tools for managing AIs, and things break so frequently that it feels like spinning plates.
I spent two months in November and December last year writing by hand a fundamental library to constrain how the AIs build clis. That did make things move a lot faster but for those two months I felt the slowness.
I think it will always be like this. It’s the nature of paradigm shift to shift.
What is the llm equivalent?
The problem is just that the question is not whether "human developers will be necessary in the near future", it's "how many human developers will be necessary in the near future" - managers wanting to exploit the efficiency gains by deciding that fewer developers can now do more work "thanks" to AI.
You cannot hold a computer liable for any of those reasons. You can, however, sue the human that built or used the AI. So those concerns shouldn't be any different with or without AI. The same problems will be here either way. If you really care about those problems, you would demand your representatives in government actually enshrine those things in law, with some teeth, to ensure companies prevent problems with them. If you don't do something about those problems (with or without AI), then it's clear by your actions that ethical/environmental/safety concerns aren't actually that important to you.
I've found I can prevent the LLM, in many cases, from thrashing on a bug/feature for long periods of time by switching into plan mode and, even in the middle of a conversation, having it reassess the structure around the problem, first. If you keep prompting about the same bug, it may keep producing variations of the problem code. But forcing it to stop and 'think' for a bit, has yielded much better results.
- Fewer engineers needed overall -> less demand for human engineers -> lower compensation
- insufficient training at junior levels.
- longer time to productive human engineering skill.
These are playing out right now, and are a concern for all engineers in the industry. Iron Man amplification doesn't address the above.
1. AIs aren't yet good at architecture.
2. AIs aren't yet good at imagining technically exciting stuff to build.
And I agree that there's still space there to build a career in the short to medium term (plus Jevons Paradox). When both those points are no longer true we are certainly much closer to, dare I say it, AGI. I suspect that (1) will be solved for somewhat limited domains in the near future using harnesses. And it could snowball from there.
E.g., ‘productivity’ is seemingly increasing, but what is the effect on a firm's financial position? It’s all speculative and experimental right now.
I used to be a PM and am technically literate enough but can only very minimally write code. I have been using LLMs to build (or try to, at least) internal tools for my business since GPT-4.
In the early days, I'd get a little ways, then the LLM would start breaking things, and I'd try but fail to get it to fix things. But over successive generations, I was increasingly able to get it unstuck by offering suggestions on where it may have gone wrong. With Opus 4.7, I don't even really have to do that - if something isn't working it's usually sufficient to just tell it what's broken. It can figure out how to fix it without my input. And of course fewer things are broken in the first place.
So I think I'm very well positioned to understand how these things are improving - better able to get the LLM to do what I want than the post OP quoted from /vibecoding (though I am 99% sure that post is actually AI slop), but less so than most of the people posting in this thread. As they've improved, whatever ability I have to guess at the causes of problems based on my experience having seen things go wrong with products I've PMed has become less necessary to getting the right outcome.
I expect that trend to continue - increasingly the LLM won't need the guidance of people with a great deal of technical expertise. I basically no longer have to attempt to diagnose problems in order to get them fixed, though with the caveat that I am building internal tools for which I am the only user, so certainly much simpler in scope than the stuff OP is talking about.
> Without guidance, LLMs tend to paint themselves into a corner, because they’re generating code to solve individual prompts, not thinking holistically about an application’s architecture.
The crux of what I'm trying to say here is that I absolutely believe that this line is 100% true today, but I would be deeply cautious about assuming that it will continue to be true given the improvements in LLMs over the past few years.
Not the most talented developer, but this has been pretty much my experience as well. Just keep it under control, know what it's doing and why at every step, read the code, and then it will boost your productivity.
I didn't think this 6 months ago but today after what I've seen these models debug and accomplish in established, messy production monoliths, I'm fully convinced even the worst vibe coders are only a year or two away from being able to actually create something from scratch and have it not blow up 50 files in.
So I guess I take the totally opposite stance, today's AI is the worst AI will ever be at coding, and I believe the vested interests behind AI do not plan on making it any worse at this task, so...
This sentiment will stray further from the truth as time goes on.
Sure, it's a multiplier for those who are already skilled, but for those who are unskilled, it is capable of taking you from 0 -> 1+.
The ones currently benefiting from AI are the ones who (i) have a general understanding of how an AI works and experience with using it and (ii) have a very generic understanding of what it is they're trying to do (programming, most likely) and know the limits of their tools, but don't know how to actually do anything meaningful.
The whole point of AI is to open the door of complexity to normies; they are the ones benefiting most from it. For a skilled developer, it may turn a 1hr task into 5 mins; for a normie, it turns something that was utterly impossible into something now within reach. The difference for normies is just more life-changing.
If you think of skilled developers as the ceiling and normies as the floor, AI raises the floor higher by giving normies more capability, which makes the ceiling seem less impressive. But eventually the floor will surpass the ceiling, and then it'll be a matter of who can operate AI better/how good AI is.
I understand the need to make a living but hard to take this stuff seriously/sincerely with the, "and buy my course!" angle.
Maybe not the same agency you would expect from a human being, but if you put them in a ralph loop they can go far, far away, mostly because of how we built our world in the pre-LLM era: do you need to order something (or do you want to hire a hitman)? -> you can do it on a web site or via WhatsApp or by calling some API.
The point is they mostly wind up somewhere stupid, and it takes expertise to spot and correct that. (Maybe that changes with further development.)
It's essentially a "brute force" approach, but in most cases, they only need to succeed once.
The article’s point is this is not true. They wind up in bullshit attractors where they hit a wall and then get lost within their muddled context window.
> they only need to succeed once
Yet they don’t. Not on their own. Like, you haven’t had an LLM get stuck in a stupid loop where you point out the flaw and then it gets unstuck?
Y’all sound the same:
> Let’s start with an uncomfortable truth: AI models have become shockingly good at completing a wide variety of programming tasks. They’re certainly not perfect, but in many cases, they’re good enough. I’m not happy about this, for a wide variety of ethical/environmental/safety reasons, but it is what it is.
More Inevitabilism posting with the “not happy with” but is-what-it-is washing of your hands. At a distance you all look the same: an army of posts insisting the obvious, the inevitable; who knows why you all need to sound the same and say the same thing, but I guess it is to keep it top-of-mind for us alls. It is what it is.
> [...] It’s never been easier to learn about new topics, with tools like ChatGPT that can answer any questions you have. But that only works when you know what questions to ask. My course offers a curated curriculum that will introduce you to all sorts of new techniques. I think you’ll be amazed at what you can build after taking the course.
Okay, sure. I ask these LLMs things too (c.f. outright --be coding) so that’s not necessarily incongruent with the stance of being not-happy-about-this.
Everything these days is either the greatest thing ever or the worst thing ever. All the stuff in the middle has vanished. Very few it seems acknowledge AI as being a useful tool. It's either "We're all being replaced" or "The technology is all slop" and everyone talks over each other like it's the Super Bowl and their teams are battling it out.
It would be nice if we could just look to the opportunities this tech offers and focus on that.
When you see rising inequality, don't just cheer because you happen to win for now.. maybe think about the future and also others..
Seemingly every AI pilled programmer who writes a blog post on AI's impact on software engineering has the same philosophical argument, and its wording changes slightly every 6-12 months to reflect the newest models' capabilities.
In 2023 it was: "AI is just autocomplete. It can't code whole blocks on its own."
In 2024 it was: "AI is only good for scaffolding new projects, or boilerplate code. It can't write the application wholesale."
Since November 2025 it's been: "AI is only writing the code for us. It can't manage architecture, or do the long term planning required for real world applications."
In 6-12 months when the AI is doing an increasing amount of the architecture and high level planning, what will AI pilled programmers fall back on then?
So while the author's points are completely true and valid, an executive will say "True, but Claude will get smarter faster than these problems and in 3 years it'll fix everything" and there's absolutely nothing you can say or do in response to this.
The code it generated was awful. The kind of garbage that people who don’t know any better would ship: it looked right and it worked. But it was instantly a maintenance dead end. But I had an effortless time converging on a design that I wouldn’t have been able to do on my own (I’m not a designer). And then I had a reference design and I manually implemented it with better code (the part I am good at).
In the Tailwind thread the other day I was explicitly told that the intended experience of many frameworks is "write-only code" so maybe this is just the way of the future that we have to learn to embrace. Don't worry how it's all hooked up, if it works it works and if it stops working tell the AI to fix it.
It's kind of liberating I guess. I'm not sure if I've reached AI nirvana on accepting this yet, but I do think that moment is close.
Which is probably why so many random buttons in microsoft/apple/spotify just stop working once you get off the beaten path or load the app in some state which is slightly off base
The number of edge cases in a piece of software is not fixed at all. One of the largest markers of competence in software development is being able to keep them to a minimum, and LLMs tend to make that number higher than humanly possible.
The people pushing AI _over_ humans never thought they were. They just don't care about 'good' or 'bad', only 'time-to-market'. A bad app making money is better than a good one that isn't deployed yet. And who cares about anything past the end of the quarter? That's the next guy's problem.
Humorously, this could be the result of LLMs vacuuming up all the sentiment on the web that the code that LLMs produce is trash-tier.
In terms of "junior dev following" it would be the model trying to think and write it as a Senior or Staff Level engineer would.
I did an experiment on this a few weekends ago and Codex for example was a lot more adversarial and thorough in its review when given Claude-authored code compared to when given the same code with "I wrote this, can you review it?"
Prototypes are practically free now. You can ask the AI to try each architectural or stylistic option and just see which code you like better.
To your point, another interesting note is that rewriting and rearchitecting are also very good.
One pattern I like is to vibe code a set of solutions, pick the approach, then backfill tests and do major refactors to make it maintainable.
Here the skill is knowing what good architecture looks like, and knowing how to prompt and validate (eg what level of tests will speed up the feedback cycle or enable me to make the LLM’s changes legible).
To be fair, the “ready, fire, aim” approach of rapid prototyping has been known for a long time, but you needed to be quite quick at coding in the old world for it to work well IMO.
- first I've created a skill describing what the architecture of the system should look like
- I'll tell the LLM to follow the guidelines; it will not do that 100%, but it will be good enough
- I'll go through what it produced, align to the template; if I like something (either I've not thought about the problem in that way, or simply forgot) I add that to the skill template
- rinse and repeat
This is not only for architecture of the system, but also when (and how to) write backend, frontend, e2e tests, docs. I know what I want to achieve = I know how the code should be organized and how it should work, I know how tests should be written. LLMs allow me to eliminate the tediousness of following the same template every time. Without these guardrails it switches patterns so often, creating unmaintainable crap
Bear in mind - the output requires constant supervision = the LLM will touch something I told it not to touch, or not follow what I told it to do. The amount of output can also sometimes be overwhelming (so peer review is still needed), but at this point I can iterate over what the LLM produces with it, then with another LLM, then give it to a human if it all makes sense together
At the moment, we understand the basic tech, could reasonably DIY, but choose not to, knowing full well there's a mess of understandable code somewhere we could go clean up but don't want to. We accept fast iterations because we know roughly the shape of how it "should be" and can guide an automated framework towards that. This is especially true on our own projects or something we built originally! Stark/Iron man knew/moved, the suit assisted by adding momentum.
We're riding our "knowledge momentum".
If companies can hold out long enough, that knowledge completely fades, and the tool is all you have. At that point, they are locked in. Then it's not Iron man, it's an Iron lung (couldn't resist!)
I love the Iron lung reference. Perfect.
I keep asking myself the same questions, and the conclusion I keep coming to is the clean modeled structure we want to see is for humans to maintain and extend, but the AI doesn't need this.
There's definitely an efficiency angle here where it's faster for AI to go from a clean modeled solution to the desired solution because it's likely been trained on cleaner code. Is this really going to matter though?
The best argument I can come up with is that the clean modeled solution is better for existing development tools because they're less likely to get confused by the patchwork of vibes throughout the code; but this feels like it ultimately becomes an efficiency concern as well.
This just might be the new reality, and we need to stop looking behind the curtain and accept what the wizard presents us.
This does not match my experience. I do a lot of AI-assisted coding at this point, and what I've seen is that when the AI is asked to extend or modify existing code, it does a much better job on clean, well-structured and well-abstracted code.
I think the reason is simple, and tracks for humans as well: well-structured code is simply easier to understand and reason about, and takes a smaller amount of working-set memory. Even as LLMs get better with coding, I expect that they would converge on the same conclusion, namely that good structure + good abstractions make for code that is more efficient to work with.
I think it’s all about the structure you work in and how you use the model. We are shipping better, more human-friendly code, with fewer bugs, than we ever did before, and doing it at 1/10 of what it cost before LLMs.
But we are definitely not vibe coding, and the key seems to be devs with years of experience managing teams, managing the LLM instead. Basically you create the same kind of formal specifications, conventions, and documentation that you would develop for a project with two or three teams, then use that to keep the project on the rails recursively looping back through the docs as you go along. I’ve only had to back out of a couple of issues over the last year, and even though that cost a couple of hours, it was still extremely cheap.
Meanwhile we are shipping at 4x speed with 1/4 the labor, and the code is better than it was because the “overhead” of writing maintainable, self documented code has inverted into the secret ingredient to shipping bug free code at unprecedented speed.
If you just explain the standards to which you want the code written, use a strict style guide, have a separate process that ensures test coverage (not in the same context) you can get example quality code all the way through. Turns out that’s also in the training data.
I think we eventually end up at the tool approach via vendors providing the tools to other companies, but it still feels like there's a long road ahead to get there.
That's not true. The LLM performance will degrade as the codebase gets messier as well. You get to a point where every fix breaks something else and you can't really make forward progress.
Yes, you might be able to get a bit further with a messy codebase just because the LLM won't complain and will just grind through fixing things, but eventually it will just start disabling failing tests instead of actually fixing things.
Of course that just leads to: what’s the best way to achieve that goal? Through elegant code or adding lots of tests? Which is a debate from long before LLMs existed.
This is how societies become shittier. People who are ostensibly responsible for doing their jobs not giving a damn about quality.
LLMs have a limit to how deeply they can understand and refactor architectural issues.
That limit is far, far lower than a human's.
I'm not a designer either, but I've been around designers long enough to recognize when something is bad but just not know what is needed to make it better/good. I've taken time to find sites that are designed well and then recreated them by hand coding the html/css to the point that I consider myself pretty decent at css now. I don't need libraries or frameworks. My css/html is so much lighter than what's found in those frameworks as well. I still would not call myself a designer, but pages look like they were designed by a mediocre designer rather than an engineer :shrug:
Which I think is perfectly worthy of exploration. Some people want to check in the prompts. Or even better, check in a plan.md or evenest betterest: some set of very well-defined specifications.
I'm not sure what the answer will be. Probably some mix of things. But today it is absolutely imperative that the code I write for the case I wrote it in is good quality and can be maintained by more than just me.
It's tempting to move out a layer and try making prompts and plan.md the "source code", and then the generated actual-source-code becomes just another ephemeral form of "intermediate representation" in the toolchain while building the final executable product. But then how are you versioning the toolchain and maintaining any reasonable sense of "stability" (in terms of features/bugs/etc) in the final output?
Example: last week, someone ran our "LLM inputs" source code through AgentCo SuperModel-7-39b, and produced a product output that users loved and it seemed to work well. Next week, management asks for a new feature. The "developer" adds the new feature to the prompting with a few trial iterations, but the resulting new product now has 339 new subtle bugs in areas that were working fine in last week's build, owing to the fact that, in the meantime, AgentCo has tweaked some weights in SuperModel-7-39b under the hood because of some concern about CSAM results or whatever, and this had subtle unrelated effects. Or better yet: next month, management has learned that OtherCo MegaModel-42.7c seems to be the new hotness and tells everyone to switch models. Re-building from our "source" with the new model fixes 72 known bugs filed by users, fixes another 337 bugs nobody had even noticed yet, and causes 111 new bugs to be created that are yet-unknown.
If you treat the output source code as a write-only messy artifact, and you don't have stable, repeatable models, and don't treat model updates/changes as carefully as switching compiler vendors and build environments, this kind of methodology can only lead to chaos.
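Treating the model like a compiler vendor could start with something as simple as a lock record: fingerprint the model identity plus every prompt/spec input, store it next to the build, and refuse to compare builds whose records differ. This is only a sketch under assumptions; `build_fingerprint`, `drift`, and the version strings are made up for illustration.

```python
import hashlib
from typing import Dict, List


def build_fingerprint(
    model: str, model_version: str, inputs: Dict[str, str]
) -> Dict[str, str]:
    """Record the model identity and a SHA-256 digest of each input file.

    inputs maps a file name (e.g. "plan.md") to its text content.
    """
    record = {"model": model, "model_version": model_version}
    for name, text in sorted(inputs.items()):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        record["input:" + name] = digest
    return record


def drift(locked: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return the sorted keys whose values differ between two records."""
    keys = set(locked) | set(current)
    return sorted(k for k in keys if locked.get(k) != current.get(k))
```

Note the limit: this only catches changes the vendor exposes through a version identifier. Weights tweaked "under the hood" behind the same name would pass unnoticed, which is exactly the stability problem described above.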
And don't even get me started on the parallel excuses of "Your specifications should be more-perfect" (perfection is impossible), or "An expansive testsuite should catch and correct all new bugs" (also impossible: testing is only as good as the imperfect specification, and then layers in its own finite capabilities to boot).
I never tried spec-driven development for myself, but when I review others' MRs I am typically exhausted after the first 10 lines.
And there are hundreds of lines, nearly always with major inaccuracies.
For myself I always found the plan mode to work well. Once the implementation is done, the code is the source of truth. If it works, it works.
When I want to add more functionality or change it, I just tell the agent what I want changed.
I doubt walls of semi-accurate existing specs are going to be beneficial there, but maybe my work differs from yours.
I mitigate this with a few things: 1. Checkpoints every few days to thoroughly review and flag issues. Asking the LLM to impersonate someone (Linus Torvalds is my favorite) yields different results. 2. Frequent refactors. LLMs don't get discouraged from throwing things out like humans do. So I ask for a refactor when enough stuff accumulates. 3. Use verbose, typed languages. C# on the backend, TypeScript on the frontend.
Does it produce quality code? Locally yes, architecturally I don't know - it works so far, I guess. Anyway, my alternative is not making this software better, but not making it at all for lack of time, so even if it's subpar it still brings business value.
I suppose you could solve that in two ways. Manually rewrite it as you did. Or formalize an architecture and let the AI rewrite it with that in mind. I suspect that either works.
The power comes from creating the machine you can steer. Treat AI like an overeager college intern whom you need to hand-hold, but who does get tasks done.
Iron Man created Jarvis whose capabilities are way beyond any models in the near future. So it wasn’t an Iron Man moment.
(And on a personal note, I'm glad we don't have a publicly released Jarvis before we get our act together about its use.)