Hacker News

176 points by swah 6 days ago | 362 comments

After re-reading the post once again, because I honestly thought I was missing something obvious that would make the whole thing make sense, I started to wonder if the author actually understands the scope of a computer language. When he says:

> LLMs are far more nondeterministic than previous higher level languages. They also can help you figure out things at the high level (descriptions) in a way that no previous layer could help you dealing with itself. […] What about quality and understandability? If instead of a big stack, we use a good substrate, the line count of the LLM output will be much less, and more understandable. If this is the case, we can vastly increase the quality and performance of the systems we build.

How does this even work? There is no universe I can imagine where a natural language can be universal, self descriptive, non ambiguous, and have a smaller footprint than any purpose specific language that came before it.

jmalicki 32 minutes ago

Just as in Rust you can download a package with Cargo and use it without reimplementing it, an LLM can download and explore all written human knowledge to produce a solution.

Or how you can efficiently loop over all combinations of all inputs in a short computer program; it will just take a while!

If you have a programming language where finding an efficient algorithm is a compiler optimization, then your programs can get a lot shorter.
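To make the "short program, long runtime" point concrete, here is a minimal Python sketch that uses subset sum as a stand-in problem (my choice, not the commenter's). The search is exhaustive over combinations, so the source stays tiny while the work grows exponentially with the input size. A "compiler" that could rewrite this into the usual dynamic-programming solution would be doing the kind of algorithm-finding optimization the comment imagines.

```python
from itertools import combinations
from typing import List, Optional, Tuple


def subset_sum_brute_force(numbers: List[int], target: int) -> Optional[Tuple[int, ...]]:
    """Return the first subset (smallest size first) whose elements sum to target.

    The program is a few lines long, but it may inspect all 2**len(numbers) subsets.
    """
    for size in range(len(numbers) + 1):
        for subset in combinations(numbers, size):
            if sum(subset) == target:
                return subset
    return None


print(subset_sum_brute_force([3, 34, 4, 12, 5, 2], 9))  # (4, 5)
print(subset_sum_brute_force([1, 2], 10))              # None
```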

onlyrealcuzzo 8 hours ago

You're going to be pretty hard pressed to do Rust better than Rust.

There's minimal opportunity with lifetime annotations, and I'm sure there are very small ones elsewhere, too.

The idea of replacing Rust with natural language seems insane. Maybe I'm being naive, but I can't see why or how it could possibly be useful.

Rust is simply Chinese unless you understand what it's doing. If you translate it to natural language, it's still gibberish, unless you understand what it does and why first. In which case, the syntax is nearly infinitely more expressive than natural language.

That's literally the point of the language, and it wasn't built by morons!

manuelabeledo 8 hours ago

I believe the author thinks of this problem in terms of “the LLM will figure it out”, i.e. it will be trained on so much code that compiles that the LLM just needs to put the functional blocks together.

Which might work to a degree with languages like JavaScript.

onlyrealcuzzo 6 hours ago

That point makes no sense.

If the LLM is not perfect at scale (and it's extraordinarily unlikely that it would be), then it becomes relevant to understand the actual language.

That's either natural language that's somehow supposed to be debuggable, or a language like Rust, which actually is.

slfnflctd 9 hours ago

To be generous and steelman the author, perhaps what they're saying is that at each layer of abstraction, there may be some new low-hanging fruit.

Whether this is doable through orchestration or through carefully guided HITL by various specialists in their fields - or maybe not at all! - I suspect will depend on which domain you're operating in.

coldtea 7 hours ago

>After re-reading the post once again, because I honestly thought I was missing something obvious that would make the whole thing make sense, I started to wonder if the author actually understands the scope of a computer language.

The problem is that you restrict the scope of a computer language to the familiar mechanisms and artifacts (parsers, compilers, formalized syntax, etc.), instead of taking it to be "something we instruct the computer with, so that it does what we want".

>How does this even work? There is no universe I can imagine where a natural language can be universal, self descriptive, non ambiguous, and have a smaller footprint than any purpose specific language that came before it.

Doesn't matter. Who said it needs to be "universal, self descriptive, non ambiguous, and have a smaller footprint than any purpose specific language that came before it"?

It's enough that it can be used to instruct computers more succinctly and at a higher level of abstraction, and that a program will come out at the end which is more or less (doesn't have to be exactly) what we wanted.

manuelabeledo 6 hours ago

If you cannot even provide a clear definition of what you want it to be, then this is all science fiction.

coldtea 5 hours ago

Doesn't have to be "a clear definition"; a rough definition within some quite lax boundaries is fine.

You can just say to Claude, for example, "Make me an app that accepts daily weight measurements and plots them in a graph" and it will make one. Tell it to use that framework or this pattern, and it will do so too. Ask for more features as you go, in similarly vague language. At some point your project is done.

Even before AI, the vast majority of software was not written with any "clear definition" to begin with. There's some rough architecture and idea, people code as they go, and they often have to clarify or rebuild things to get them the way they want, or discover they want something slightly different, or that the initial design had some issues and needs changing.

collingreen 4 hours ago

This is the most handwaving per paragraph I've ever seen.

I think a fair summarization of your point is "LLM generated programs work well enough often enough to not need more constraints or validation than natural language", whatever that means.

If you take that as a true thing then sure why would you go deeper (eg, I never look at the compiled bytecode my high level languages produce for this exact reason - I'm extremely confident that translation is right to the point of not thinking about it anymore).

Most people who have built, maintained, and debugged software aren't ready to accept the premise that all of this is just handled well by LLMs at this point. Many, many folks have lots of first-hand experience watching it not be true, even when people are confidently claiming otherwise.

I think if you want to be convincing in this thread you need to go back one step and explain why the LLM code is "good enough" and how you determined that. Otherwise it's just two sides talking totally past each other.

coldtea 4 hours ago

>This is the most handwaving per paragraph I've ever seen.

Yes: "LLM generated programs work well enough often enough to not need more constraints or validation than natural language" is a fair summarization of my point.

Not sure what the purpose of the "whatever that means" you added is. It's clear what it means. Though, casual language seems to be a problem for you. Do you only ever discuss in formally verified proofs? If so, that's a you problem, not an us or LLM problem :)

>Most people who have built, maintained, and debugged software aren't ready to accept the premise that all of this is just handled well by LLMs at this point.

I don't know who those "most people are". Most developers already hand those tasks to LLMs, and more will in the future, as it's a market/job pressure.

(I'm not saying it's good or good enough as a quality assessment. In fact, I don't particularly like it. But I am saying it's "good enough" as in, people will deem it good enough to be shipped).

manuelabeledo 2 hours ago

> I don't know who those "most people are". Most developers already hand those tasks to LLMs, and more will in the future, as it's a market/job pressure.

This is definitely not true. Outside of the US, very few devs can afford to pay for the compute and/or services. And in a couple of years, I believe, devs in the US will be in for a rude awakening when current prices skyrocket.

manuelabeledo 2 hours ago

You do need a clear definition of what this “LLM as a high level language” is supposed to be. Otherwise it’s all just wishful thinking.

“It’s good enough” so it generates apps that could otherwise be boilerplate. OK, I guess? But that’s not what OP was talking about in their post.

heikkilevanto 14 hours ago

If we consider the prompts and LLM inputs to be the new source code, I want to see some assurance we get the same results every time. A traditional compiler will produce a program that behaves the same way, given the same source and options. Some even go out of their way to guarantee they produce the same binary output, which is a good thing for security and package management. That is why we don't need to store the compiled binaries in the version control system.

Until LLMs start to get there, we still need to save the source code they produce, and review and verify that it does what it says on the label, and not in a totally stupid way. I think we have a long way to go!

afavour 10 hours ago

> If we consider the prompts and LLM inputs to be the new source code, I want to see some assurance we get the same results every time.

There’s a related issue that gives me deep concern: if LLMs are the new programming languages we don’t even own the compilers. They can be taken from us at any time.

New models come out constantly and over time companies will phase out older ones. These newer models will be better, sure, but their outputs will be different. And who knows what edge cases we’ll run into when being forced to upgrade models?

(and that’s putting aside what an enormous step back it would be to rent a compiler rather than own one for free)

devsda 6 hours ago

> New models come out constantly and over time companies will phase out older ones. These newer models will be better, sure, but their outputs will be different.

IIUC, same model with same seed and other parameters is not guaranteed to produce the same output.

If anyone is imagining a future where your "source" git repo is just a bunch of highly detailed prompt files and "compilation" just needs an extra LLM code generator, they are signing up for disappointment.

carlmr 6 hours ago

>IIUC, same model with same seed and other parameters is not guaranteed to produce the same output.

Models are so large that random bit flips make such guarantees impossible with current computing technology:

https://aclanthology.org/2025.emnlp-main.528.pdf

nerdsniper 7 hours ago

Presumably, open models will work almost, but not quite, as well and you can store those on your local drive and spin them up in rented GPUs.

energy123 13 hours ago

Greedy decoding gives you that guarantee (determinism). But I think you'll find it to be unhelpful. The output will still be wrong the same % of the time (slightly more, in fact) in equally inexplicable ways. What you don't like is the black box unverifiable aspect, which is independent of determinism.
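A toy Python sketch of that distinction, using an invented next-token table rather than a real model: greedy decoding takes the argmax at every step, so there is no randomness at all, yet here it locks into the cycle "the cat sat on the cat ..." and returns the same unhelpful output on every run. Determinism and correctness are separate properties.

```python
from typing import Dict, List

# Hypothetical next-token probabilities for a toy "model" (illustration only).
NEXT: Dict[str, Dict[str, float]] = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "mat": 0.3, "<eos>": 0.2},
    "cat": {"sat": 0.7, "<eos>": 0.3},
    "sat": {"on": 0.8, "<eos>": 0.2},
    "on": {"the": 0.9, "a": 0.1},
    "a": {"mat": 0.6, "cat": 0.4},
    "mat": {"<eos>": 1.0},
}


def greedy_decode(max_len: int = 10) -> List[str]:
    """Always pick the most probable next token: fully deterministic."""
    token, out = "<s>", []
    for _ in range(max_len):
        token = max(NEXT[token], key=NEXT[token].get)  # argmax over candidates
        if token == "<eos>":
            break
        out.append(token)
    return out


print(greedy_decode())
# ['the', 'cat', 'sat', 'on', 'the', 'cat', 'sat', 'on', 'the', 'cat']
```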

layer8 12 hours ago

What people don’t like is that the input-output relation of LLMs is difficult, if not impossible, to reason about. While determinism isn’t the only factor here (you can have a fully deterministic system that is still unpredictable in practical terms), it is still a factor.

willj 10 hours ago

If you’re using a model from a provider (not one that you’re hosting locally), greedy decoding via temperature = 0 does not guarantee determinism. A temperature of 0 doesn’t result in the same responses every time, in part due to floating-point precision and in part due to a lack of batch invariance [1]

[1] https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
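A small Python illustration of the floating-point half of that explanation: adding the same numbers in a different order can give a different result. As I understand the linked post, a kernel that splits a reduction differently depending on batch size is effectively changing that order, so identical prompts can come back with slightly different logits and, occasionally, a different top token. The numbers below are chosen to make the effect visible; they are not taken from any model.

```python
import math

# Floating-point addition is not associative: grouping changes the rounded result.
print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6

BIG = float(2**53)  # from 2**53 upward, adjacent doubles are 2 apart


def left_to_right(values):
    """Sum in the given order, rounding after every addition (like a naive loop)."""
    total = 0.0
    for v in values:
        total += v
    return total


a = left_to_right([BIG, 1.0, 1.0])  # each +1.0 rounds back to BIG (ties go to even)
b = left_to_right([1.0, 1.0, BIG])  # 1.0 + 1.0 = 2.0 first, then BIG + 2.0 is exact
print(a, b)                          # 9007199254740992.0 9007199254740994.0
print(math.fsum([BIG, 1.0, 1.0]) == math.fsum([1.0, 1.0, BIG]))  # True: fsum rounds the exact sum once
```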

trklausss 10 hours ago

The question is: if we keep the same context and model, and the same LLM configuration (quantization etc.), does it provide the same output at same prompt?

If the answer is no, then we cannot be sure to use it as a high-level language. The whole purpose of a language is providing useful, concise constructs to avoid something not being specified (undefined behavior).

If we can't guarantee that the behavior of the language is going to be the same, it is no better than handing someone some requirements and not checking what they are doing until the delivery date.

properbrew 11 hours ago

> I want to see some assurance we get the same results every time

Genuine question, but why not set the temperature to 0? I do this for non-code related inference when I want the same response to a prompt each time.

willj 10 hours ago

A temperature of 0 doesn’t result in the same responses every time, in part due to floating-point precision and in part due to a lack of batch invariance [1]

[1] https://thinkingmachines.ai/blog/defeating-nondeterminism-in...

properbrew 8 hours ago

Thank you for this, this was a really interesting read about batch invariance, something I didn't even know about.

assbuttbuttass 9 hours ago

This still doesn't help when you update your compiler to use a newer model

pjmlp 14 hours ago

Anyone doing benchmarks with managed runtimes, or serverless, knows it isn't quite true.

Which is exactly what the AOT-only, no-GC crowds use as an example of why theirs is better.

dgb23 11 hours ago

But there is functional equivalence. While I don't want to downplay the importance of performance, we're talking about something categorically different when comparing LLMs to compilers.

pjmlp 7 hours ago

Not when those LLMs are tied to agents, replacing what would be classical programming.

Using low-code platforms with AI-based automations, as most iPaaS offerings are now doing.

If the agent is able to retrieve the required data from a JSON file, fill in an email with the proper subject and body, and send it to another SaaS application, that is one less piece of integration middleware that has to be written.

From any practical business point of view, it is an application.
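For comparison, the classical middleware that such an agent would replace can be a few lines of deterministic glue. A minimal Python sketch, assuming a made-up JSON payload shape and example addresses; actually delivering the message (SMTP or the target SaaS's own API) is left out.

```python
import json
from email.message import EmailMessage

# Hypothetical payload; a real integration would pull this from the source SaaS.
ORDER_JSON = (
    '{"order_id": 1042, "customer": {"name": "Ada", "email": "ada@example.com"},'
    ' "total": "19.90"}'
)


def build_notification(raw: str, sender: str) -> EmailMessage:
    """Map one JSON record onto an email: the glue code an agent would stand in for."""
    order = json.loads(raw)
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = order["customer"]["email"]
    msg["Subject"] = f"Order #{order['order_id']} confirmed"
    msg.set_content(
        f"Hi {order['customer']['name']},\n\n"
        f"Your order #{order['order_id']} totalling {order['total']} has been received.\n"
    )
    return msg


msg = build_notification(ORDER_JSON, sender="noreply@example.com")
print(msg["Subject"])  # Order #1042 confirmed
```

The trade-off being argued is visible here: this version produces the same email on every run, while an agent doing the same job is cheaper to set up but gives no such guarantee.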

zozbot234 13 hours ago

Reproducible builds exist. AOT/JIT and GC are just not very relevant to this issue, not sure why you brought them up.

pjmlp 13 hours ago

Because they are dynamic compilers!

manuelabeledo 13 hours ago

Even those are way more predictable than LLMs, given the same input. But more importantly, LLMs aren’t stateless across executions, which is a huge no-no.

zozbot234 13 hours ago

> But more importantly, LLMs aren’t stateless across executions, which is a huge no-no.

They are, actually. A "fresh chat" with an LLM is non-deterministic but also stateless. Of course agentic workflows add memory, possibly RAG etc. but that memory is stored somewhere in plain English; you can just go and look at it. It may not be stateless but the state is fully known.

manuelabeledo 13 hours ago

Using the managed runtime analogy, what you are saying is that, if I wanted to benchmark LLMs like I would do with runtimes, I would need to take the delta between versions, plus that between whatever memory they may have. I don’t see how that helps with reproducibility.

Perhaps more importantly, how would I quantify such “memory”? In other words, how could I verify that two memory inputs are the same, and how could I formalize the entirety of such inputs with the same outputs?
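One partial answer to "how could I verify that two memory inputs are the same" is to fingerprint every input together: model identifier, sampling parameters, memory, and prompt, serialized canonically and hashed. A minimal Python sketch, assuming memory is a list of plain-text notes; the model id and parameter names are hypothetical. Note that this only establishes that two runs had byte-identical inputs. It says nothing about whether their outputs match, which is the harder half of the question.

```python
import hashlib
import json
from typing import Any, Dict, List


def fingerprint(model_id: str, params: Dict[str, Any], memory: List[str], prompt: str) -> str:
    """Return a SHA-256 digest over every input that could influence the output.

    Serializing with sorted keys and fixed separators makes logically equal
    inputs produce byte-identical JSON, so equal inputs get equal digests.
    """
    record = {"model": model_id, "params": params, "memory": memory, "prompt": prompt}
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


run_a = fingerprint("example-model-v1", {"temperature": 0, "seed": 7},
                    ["User prefers Rust."], "Write a CSV parser.")
run_b = fingerprint("example-model-v1", {"seed": 7, "temperature": 0},
                    ["User prefers Rust."], "Write a CSV parser.")
run_c = fingerprint("example-model-v1", {"temperature": 0, "seed": 7},
                    ["User prefers Go."], "Write a CSV parser.")
print(run_a == run_b, run_a == run_c)  # True False
```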

pjmlp 13 hours ago

Can you reliably predict the JIT-generated machine code from the JVM bytecode?

Without taking into account anything else the JIT uses in its decision tree?

manuelabeledo 11 hours ago

For a single execution, to a certain extent, yes.

But that’s not the point I’m trying to make here. JIT compilers are vastly more predictable than LLMs. I can take any two JVMs from any two vendors, and over several versions and years, I’m confident that they will produce the same outputs given the same inputs, to a certain degree, where the input is not only code but GC, libraries, etc.

I cannot do the same with two versions of the same LLM offering from a single vendor, that had been released one year apart.

pjmlp 7 hours ago

Good luck mapping OpenJDK with Azul's cloud JIT, in generated machine code.

manuelabeledo 6 hours ago

The output being the actual program output, not the byte code. No one is arguing that in the scope of LLMs.

ThunderSizzle 12 hours ago

Enough so that I've never had a runtime issue because the compiler did something odd once and the correct thing the next time. At least in C#. If Java is doing that, then stop using it...

If the compiler had issues like LLMs do, then half my builds would be broken, running the same source.

aurareturn 13 hours ago

> If we consider the prompts and LLM inputs to be the new source code, I want to see some assurance we get the same results every time.

Give a spec to a designer or developer. Do you get the same result every time?

I’m going to guess no. The results can vary wildly depending on the person.

The code generated by LLMs will still be deterministic. What differs is the tooling the product team uses to create that product.

At a high level, does using LLMs to do all or most of the coding ultimately help the business?

jug 13 hours ago

This comparison only holds up for me in the long-standing debate of "LLMs as the new engineer", not "LLMs as a new programming language" (like here).

I think there are important distinctions there, predictability being one of them.

intrasight 12 hours ago

Even as a SSWE I do often wonder if I am but a high-level language.

matheus-rr 15 hours ago

The intermediate product argument is the strongest point in this thread. When we went from assembly to C, the debugging experience changed fundamentally. When we went from C to Java, how we thought about memory changed. With LLMs, I'm still debugging the same TypeScript and Python I was before.

The generation step changed. The maintenance step didn't. And most codebases spend 90% of their life in maintenance mode.

The real test of whether prompts become a "language" is whether they become versioned, reviewed artifacts that teams commit to repos. Right now they're closer to Slack messages than source files. Until prompt-to-binary is reliable enough that nobody reads the intermediate code, the analogy doesn't hold.

andai 11 hours ago

>With LLMs, I'm still debugging the same TypeScript and Python I was before.

Aren't you telling Claude/Codex to debug it for you?

pjmlp 14 hours ago

We went from Assembly to Fortran, with several languages in between, until C came to be almost 15 years later.

surajrmal 9 hours ago

Note that a lot of people also still work in C.

BudapestMemora 14 hours ago

"Until prompt-to-binary is reliable enough that nobody reads the intermediate code, the analogy doesn't hold."

1. OK, let's run 100 instances of the prompt under the hood: 1-2 will hallucinate, 3-5 will produce something different from the remaining 90%, and it can compile based on the 90% that agree.

2. Computer memory is also not 100% reliable, but we somehow live with it without a man-in-the-middle layer that checks it manually?

whoisthemachine 10 hours ago

Computer memory, even cheap consumer grade stuff, has much higher reliability than 90%. Otherwise your computer would be completely unusable!

Lwerewolf 14 hours ago

I wonder what ECC is for. So, unless you're Google and you're having to deal with "mercurial cores"...

Also, sorry, but what did I just actually attempt to read?

zvitiate 13 hours ago

Okay but if you aren’t using RAIM or a TMR system then is he really wrong?

And if you weren’t being snarky I’m sure you could understand. Generate 100 answers. Compare them. You’ll find ~90% the same. Choose that one.
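The "generate N, keep the majority" idea is sometimes called majority voting or self-consistency. A minimal Python sketch with made-up sample strings; comparing answers after collapsing whitespace is a crude stand-in for "the same answer", and real code would need a behavioral comparison. Voting filters out uncorrelated errors such as a single hallucination, but not systematic ones: if the model is consistently wrong, 90 samples will agree on the wrong answer. It also multiplies inference cost by N.

```python
from collections import Counter
from typing import List, Optional


def majority_answer(samples: List[str], min_share: float = 0.5) -> Optional[str]:
    """Return the most common answer if it wins more than min_share of the votes."""
    if not samples:
        return None
    normalized = [" ".join(s.split()) for s in samples]  # collapse whitespace differences
    answer, votes = Counter(normalized).most_common(1)[0]
    return answer if votes / len(normalized) > min_share else None


# Hypothetical tally mirroring the comment: 90 agree, the rest scatter.
samples = ["return a + b"] * 90 + ["return a - b"] * 5 + ["return b"] * 3 + ["pass"] * 2
print(majority_answer(samples))  # return a + b
```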

anon946 18 hours ago

Isn't this a little bit of a category error? LLMs are not a language. But prompts to LLMs are written in a language, more or less a natural language such as English. Unfortunately, natural languages are not very precise and full of ambiguity. I suspect that different models would interpret wordings and phrases slightly differently, leading to behaviors in the resulting code that are difficult to predict.

pjmlp 15 hours ago

Not really, because when they are fed into agents, those agents will take over tasks that previously required writing some kind of classical program.

I have already seen integrations between SaaS products being deployed with agents instead of classical middleware.

elzbardico 15 hours ago

I've seen them too. They are not pretty.

pjmlp 14 hours ago

Like microservices, cloud, and whatever new cool tech gets used to deliver something that could be done on a laptop, they aren't going away.

empressplay 17 hours ago

Right, but that's the point -- prompting an LLM still requires 'thinking about thinking' in the Papert sense. While you can talk to it in 'natural language' that natural language still needs to be _precise_ in order to get the exact result that you want. When it fails, you need to refine your language until it doesn't. So prompts = high-level programming.

zekica 17 hours ago

You can't fully reason about refining your prompt for LLMs, as they are probabilistic. Your re-prompts are just retrying until you hit a jackpot; refining only works to increase the chance of getting what you want.

When making them deterministic (setting the temperature to 0), LLMs (even new ones) get stuck in loops for longer streams of output tokens. The only way to make sure you get the same output twice is to use the same temperature and the same seed for the RNG used, and most frontier models don't have a way for you to set the RNG seed.
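A toy Python sketch of that last point: temperature sampling where the randomness comes from an explicitly seeded generator. With the seed and temperature fixed, reruns produce the same tokens, and as the temperature shrinks the sampler converges on the greedy (argmax) choice. The probability table is invented, and this says nothing about hosted models, where, as other comments here point out, numerical and batching effects can break reproducibility even with these knobs pinned.

```python
import math
import random
from typing import Dict, List

# Hypothetical next-token probabilities for a toy "model" (illustration only).
TABLE: Dict[str, Dict[str, float]] = {
    "<s>": {"print": 0.7, "return": 0.3},
    "print": {"(x)": 0.9, "<eos>": 0.1},
    "return": {"x": 0.8, "<eos>": 0.2},
    "(x)": {"<eos>": 1.0},
    "x": {"<eos>": 1.0},
}


def sample(temperature: float, seed: int, max_len: int = 10) -> List[str]:
    """Temperature sampling driven by a private, explicitly seeded RNG."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; use argmax for T = 0")
    rng = random.Random(seed)  # same seed -> same sequence of random draws
    token, out = "<s>", []
    for _ in range(max_len):
        tokens = list(TABLE[token])
        logs = [math.log(TABLE[token][t]) for t in tokens]
        top = max(logs)
        # softmax(log p / T), shifted by the max for numerical stability
        weights = [math.exp((lp - top) / temperature) for lp in logs]
        token = rng.choices(tokens, weights=weights, k=1)[0]
        if token == "<eos>":
            break
        out.append(token)
    return out


print(sample(temperature=1.0, seed=42) == sample(temperature=1.0, seed=42))  # True
print(sample(temperature=0.01, seed=3))  # ['print', '(x)']: low T approaches argmax
```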

red75prime 15 hours ago

Randomness is not a problem by itself. Algorithms in BQP are probabilistic too. Different prompts might have different probabilities of successful generation, so refinement could be possible even for stochastic generation.

And provably correct one-shot program synthesis based on an unrestricted natural language prompt is obviously an oxymoron. So, it's not like we are clearly missing the target here.

fauigerzigerk 14 hours ago

>Different prompts might have different probabilities of successful generation, so refinement could be possible even for stochastic generation.

Yes, but that requires a formal specification of what counts as "success".

In my view, LLM based programming has to become more structured. There has to be a clear distinction between the human written specification and the LLM generated code.

If LLMs are a high level programming language, it has to be clear what the source code is and what the object code is.
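A minimal Python sketch of that separation, under made-up assumptions: the human-written specification is a list of input/output examples for a hypothetical `slugify` task (so "success" is formally defined), and the "generated code" is drawn at random from a fixed pool of candidate functions standing in for LLM output. The spec plays the role of source code; whichever candidate passes is treated as disposable object code. Example-based tests are a weak specification, of course: a candidate can pass all three cases and still be wrong elsewhere.

```python
import random
from typing import Callable, List, Optional, Tuple

# Human-written specification for a hypothetical `slugify` task: input -> expected output.
SPEC: List[Tuple[str, str]] = [
    ("Hello World", "hello-world"),
    ("  Rust  is fun ", "rust-is-fun"),
    ("", ""),
]


# Stand-ins for generated code; in practice these would come from an LLM call.
def cand_keeps_case(s: str) -> str:
    return s.replace(" ", "-")


def cand_keeps_extra_spaces(s: str) -> str:
    return s.lower().replace(" ", "-")


def cand_correct(s: str) -> str:
    return "-".join(s.lower().split())


CANDIDATES: List[Callable[[str], str]] = [cand_keeps_case, cand_keeps_extra_spaces, cand_correct]


def satisfies(fn: Callable[[str], str]) -> bool:
    """The spec is the source of truth: a candidate passes only if every case matches."""
    return all(fn(inp) == expected for inp, expected in SPEC)


def generate_until_verified(
    rng: random.Random, budget: int = 100
) -> Optional[Tuple[Callable[[str], str], int]]:
    """Draw candidates until one satisfies the spec or the budget runs out.

    If each draw passes independently with probability p, the chance of at
    least one pass in k draws is 1 - (1 - p) ** k.
    """
    for attempt in range(1, budget + 1):
        fn = rng.choice(CANDIDATES)  # hypothetical "generate code" step
        if satisfies(fn):
            return fn, attempt
    return None


result = generate_until_verified(random.Random(0))
print(result[0].__name__ if result else "no passing candidate")
```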

red75prime 14 hours ago

I don't think framing LLMs as a "new programming language" is correct. I was addressing the point about randomness.

A natural-language specification is not source code. In most cases it's an underspecified draft that needs refinement.

fulafel 15 hours ago

Programs written in traditional PLs are also often probabilistic. It seems that the same mechanisms could be used to address this in both types (formal methods).

bandrami 15 hours ago

Huh?

What's an example of a probabilistic programming language?

jmalicki 26 minutes ago

This isn't what the parent was talking about, but probabilistic programming languages are totally a thing!

https://en.wikipedia.org/wiki/Probabilistic_programming

fulafel 14 hours ago

Race conditions, effects of memory safety and other integrity bugs, behaviours of distributed systems, etc.
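The race-condition case can be made concrete with a tiny exhaustive simulation (a toy of my own, not from the thread): two threads each increment a shared counter with a separate read and write. The scheduler picks one interleaving at run time, so from the program's point of view the result behaves like a random variable. Enumerating every interleaving, as explicit-state model checkers do at much larger scale, is one of the formal-methods approaches the comment alludes to.

```python
from itertools import combinations
from typing import Iterator, Tuple


def run_schedule(schedule: Tuple[int, ...]) -> int:
    """Run two non-atomic increments (read, then write) in the given thread order."""
    counter = 0
    local = [0, 0]  # each thread's private copy of what it read
    step = [0, 0]   # 0 = about to read, 1 = about to write
    for tid in schedule:
        if step[tid] == 0:
            local[tid] = counter       # read shared counter
        else:
            counter = local[tid] + 1   # write back (may overwrite the other thread)
        step[tid] += 1
    return counter


def all_schedules() -> Iterator[Tuple[int, ...]]:
    """Every interleaving of thread 0's two steps with thread 1's two steps."""
    for slots in combinations(range(4), 2):  # positions taken by thread 0
        yield tuple(0 if i in slots else 1 for i in range(4))


outcomes = {run_schedule(s) for s in all_schedules()}
print(sorted(outcomes))  # [1, 2]: the lost update happens on some schedules only
```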

bandrami 13 hours ago

Ah sorry I read your comment wrong. Yes I agree we can and do make probabilistic systems; we've just to date been using deterministic tools to do so.

cubefox 16 hours ago

... yet.

tomaytotomato 2 days ago

I would like to hijack the "high level language" term to mean dopamine hits from using an LLM.

"Generate a Frontend End for me now please so I don't need to think"

LLM starts outputting tokens

Dopamine hit to the brain as I get my reward without having to run npm and figure out what packages to use

Then out of a shadowy alleyway a man in a trenchcoat approaches

"Pssssttt, all the suckers are using that tool, come try some Opus 4.6"

"How much?"

"Oh that'll be $200.... and your muscle memory for running maven commands"

"Shut up and take my money"

----- 5 months later, washed up and disconnected from cloud LLMs ------

"Anyone got any spare tokens I could use?"

cyberax 2 days ago

> and your muscle memory for running maven commands

Here's $1000. Please do that. Don't bother with the LLM.

tomaytotomato 8 hours ago

I'm guessing you like Gradle? :)

allovertheworld 17 hours ago

aka a mind virus

imiric 16 hours ago

I can't tell if your general premise is serious or not, but in case it is: I get zero dopamine hits from using these tools.

My dopamine rush comes from solving a problem, learning something new, producing a particularly elegant and performant piece of code, etc. There's an aspect of hubris involved, to be sure.

Using a tool to produce the end result gives me no such satisfaction. It's akin to outsourcing my work to someone who can do it faster than me. If anything, I get cortisol hits when the tool doesn't follow my directions and produces garbage output, which I have to troubleshoot and fix myself.

jatora 21 hours ago

If you're disconnected from cloud LLMs you've got bigger problems than coding can solve lol

WoodenChair 20 hours ago

The article starts with a philosophically bad analogy in my opinion. C -> Java != Java -> LLM, because the intermediate product (the code) changed its form with previous transitions. LLMs still produce the same intermediate product. I expanded on this in a post a couple months back:

https://www.observationalhazard.com/2025/12/c-java-java-llm....

"The intermediate product is the source code itself. The intermediate goal of a software development project is to produce robust maintainable source code. The end product is to produce a binary. New programming languages changed the intermediate product. When a team changed from using assembly, to C, to Java, it drastically changed its intermediate product. That came with new tools built around different language ecosystems and different programming paradigms and philosophies. Which in turn came with new ways of refactoring, thinking about software architecture, and working together.

LLMs don’t do that in the same way. The intermediate product of LLMs is still the Java or C or Rust or Python that came before them. English is not the intermediate product, as much as some may say it is. You don’t go prompt->binary. You still go prompt->source code->changes to source code from hand editing or further prompts->binary. It’s a distinction that matters.

Until LLMs are fully autonomous with virtually no human guidance or oversight, source code in existing languages will continue to be the intermediate product. And that means many of the ways that we work together will continue to be the same (how we architect source code, store and review it, collaborate on it, refactor it, etc.) in a way that it wasn’t with prior transitions. These processes are just supercharged and easier because the LLM is supporting us or doing much of the work for us."

valenterry 20 hours ago

What would you say if someone has a project written in, let's say, PureScript and then they use a Java backend to generate/overwrite and also version control Java code. If they claim that this would be a Java project, you would probably disagree right? Seems to me that LLMs are the same thing, that is, if you also store the prompt and everything else to reproduce the same code generation process. Since LLMs can be made deterministic, I don't see why that wouldn't be possible.

WoodenChair 20 hours ago

PureScript is a programming language. English is not. A better analogy would be what would you say about someone who uses a No Code solution that behind the scenes writes Java. I would say that's a much better analogy. NoCode -> Java is similar to LLM -> Java.

I'm not debating whether LLMs are amazing tools or whether they change programming. Clearly both are true. I'm debating whether people are using accurate analogies.

cortesoft 19 hours ago

> PureScript is a programming language. English is not.

Why can’t English be a programming language? You would absolutely be able to describe a program in English well enough that it would unambiguously be able to instruct a person on the exact program to write. If it can do that, why couldn’t it be used to tell a computer exactly what program to write?

matthewbauer 19 hours ago

I don’t think you can do that. Or at least if you could, it would be an unintelligible version of English that would not seem much different from a programming language.

davnicwil 17 hours ago

I agree with your conclusion but I don't think it'd necessarily be unintelligible. I think you can describe a program unambiguously using everyday natural language, it'd just be tediously inefficient to interpret.

To make it sensible you'd end up standardising the way you say things: words, order, etc and probably add punctuation and formatting conventions to make it easier to read.

By then you're basically just at a verbose programming language, and the last step to an actual programming language is just dropping a few filler words here and there to make it more concise while preserving the meaning.
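A toy illustration of that progression, assuming a made-up four-sentence "controlled English" in which every sentence must match one fixed pattern and anything else is rejected. It reads as plain English, but the fixed vocabulary and word order that make it unambiguous are exactly what make it a (very small, very verbose) programming language.

```python
import re
from typing import Dict, List

# A tiny, hypothetical controlled English: each sentence must match one fixed pattern.
PATTERNS = [
    (re.compile(r"^Set (\w+) to (-?\d+)\.$"), "set"),
    (re.compile(r"^Add (-?\d+) to (\w+)\.$"), "add"),
    (re.compile(r"^Multiply (\w+) by (-?\d+)\.$"), "mul"),
    (re.compile(r"^Print (\w+)\.$"), "print"),
]


def run(program: str) -> List[int]:
    """Interpret one sentence per line; return the values printed."""
    env: Dict[str, int] = {}
    printed: List[int] = []
    for line_no, sentence in enumerate(program.strip().splitlines(), start=1):
        sentence = sentence.strip()
        for pattern, op in PATTERNS:
            m = pattern.match(sentence)
            if m:
                break
        else:  # no pattern matched: the sentence is not part of the language
            raise SyntaxError(f"line {line_no}: not a recognized sentence: {sentence!r}")
        if op == "set":
            env[m.group(1)] = int(m.group(2))
        elif op == "add":
            env[m.group(2)] += int(m.group(1))
        elif op == "mul":
            env[m.group(1)] *= int(m.group(2))
        else:
            printed.append(env[m.group(1)])
    return printed


PROGRAM = """
Set total to 3.
Add 4 to total.
Multiply total by 2.
Print total.
"""
print(run(PROGRAM))  # [14]
```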

valenterry 17 hours ago

I think so too.

However, I think there is a confusion between being "deterministic" and being "unambiguous". Even C is an "ambiguous" programming language, but it is "deterministic" in that it behaves in the same ambiguous/undefined way under the same conditions.

The same can be achieved with LLMs too. They are "more" ambiguous of course and if someone doesn't want that, then they have to resort to exactly what you just described. But that was not the point that I was making.

davnicwil 3 hours ago

I'm not sure there's any conflict with what you're saying, which I guess is that language can describe instructions which are deterministic while still being ambiguous in certain ways.

My point is just a narrower version of that: where language is completely unambiguous, it is also deterministic when interpreted in some deterministic way. In that sense, plain, intelligible English can be a sort of (very verbose) programming language if you just ensure it is unambiguous, which is certainly possible.

It may be that this can still be the case if it's partly ambiguous but that doesn't conflict with the narrower case.

I think we're agreed on LLMs in that they introduce non-determinism in the interpretation of even completely unambiguous instructions. So it's all thrown out as the input is only relevant in some probabilistic sense.

cortesoft 17 hours ago

I don't think it would be unintelligible.

It would be very verbose, yes, but not unintelligible.

valenterry 17 hours ago

Why not?

Here's a very simple algorithm: you tell the other person (in English) literally what key they have to press next. So you can easily have them write all the java code you want in a deterministic and reproducible way.

And yes, maybe that doesn't seem much different from a programming language which... is the point no? But it's still natural English.

skydhash 18 hours ago

> Why can’t English be a programming language? You would absolutely be able to describe a program in English well enough that it would unambiguously be able to instruct a person on the exact program to write

Various attempts have been made. We got Cobol, Basic, SQL,… A programming language needs to be formal, and English is not.

geon 17 hours ago

No. Natural language is vague, ambiguous and indirect.

Watch these poor children struggle with writing instructions for making a sandwich:

https://youtu.be/FN2RM-CHkuI

lpnam0201 19 hours ago

English can be ambiguous. Programming languages like C or Java cannot

cortesoft 17 hours ago

English CAN be ambiguous, but it doesn't have to be.

Think about it. Human beings are able to work out ambiguity when it arises between people, given enough time and dedication, and how do they do it? They use English (or another equivalent human language). With enough back and forth, clarifying questions, or enough specificity in the words you choose, you can resolve any ambiguity.

Or, think about it this way. In order for the ambiguity to be a problem, there would have to exist an ambiguity that could not be removed with more English words. Can you think of any example of ambiguous language, where you are unable to describe and eliminate the ambiguity only using English words?

Covenant0028 9 hours ago

Human beings are able to work out the ambiguity because a lot of meaning is carried in shared context, which in turn arises out of cultural grounding. That achieves disambiguation, but only in a limited sense. If humans could perfectly disambiguate, you wouldn't see disputes between otherwise loving spouses and friends that arise merely from misunderstanding what the other person said.

Programming languages are designed to eliminate that ambiguity, because you don't want your bank server to make a payment because it misinterpreted ambiguous language, the way you might misinterpret your spouse's remarks.

Can that ambiguity be resolved with more English words? Maybe. But that would require humans to be perfect communicators, which is not that easy because again, if it were possible, humans would have learnt to first communicate perfectly with the people closest to them.

manuelabeledo 12 hours ago

COBOL was designed under the same principles: a simple, unambiguous, English-like language that works for computers.

valenterry 17 hours ago

C can absolutely be ambiguous: https://en.wikipedia.org/wiki/Undefined_behavior

lelandbatey 18 hours ago

A deterministic prompt + seed used to generate an output is interesting as a way to deterministically record how code came about, but it's also not something people are actually doing. Right now, everyone is slinging around LLM outputs without any attempt at reproducibility; no seed, nothing. What you've described and what the article describes are very different.

valenterry 17 hours ago

Yes, you are right. I was mostly speaking in theoretical terms - currently people don't work like that. And you would also have to use the same trained LLM of course, so using a third party provider probably doesn't give that guarantee.

But it would be possible in theory.

ekropotin 20 hours ago

IDK how everyone else feels about it, but a non-deterministic “compiler” is the last thing I need.

ChrisGreenHeur 20 hours ago

I may have bad news for you on how compilers typically work.

sarchertech 20 hours ago

The difference is that what most languages compile to is much, much more stable than what is produced by running a spec through an LLM.

A language or a library might change the implementation of a sorting algorithm once in a few years. An LLM is likely to do it every time you regenerate the code.

It’s not just a matter of non-determinism, either; it’s about how chaotic LLMs are. Compilers can produce different machine code from slightly different inputs, but it’s nothing compared to how wildly LLM output varies with very small differences in input. Adding a single word to your spec file can change the final code far more drastically than adding a new line to a C file changes the compiled output.

If you are only checking in the spec, which is the logical conclusion of “this is the new high level language”, then every time you regenerate your code, all of the thousands upon thousands of unspecified implementation details will change.

Oops, I didn’t think I needed to specify what’s going to happen when a user tries to do C before A but after B. Yesterday it didn’t seem to do anything, but today it resets their account balance to $0. But after the deployment 5 minutes ago, it seems to be fixed.

Sometimes users dragging a box across the screen will see the box disappear behind other boxes. I can’t reproduce it though.

I changed one word in my spec and now there’s an extra 500k LOC to implement a hidden asteroids game on the home page that uses 100% of every visitor’s CPU.

This kind of stuff happens now, but the scale at which it will happen if you actually use LLMs as a high level language is unimaginable. The chaos of all the little unspecified implementation details constantly shifting is just insane to contemplate as a user or a maintainer.

acuozzo 16 hours ago

> A language or a library might change the implementation of a sorting algorithm once in a few years.

I think GP was referring to heuristics and PGO.

sarchertech 12 hours ago

That makes sense, but I was addressing more than just potential compiler non-determinism.

hndc 20 hours ago

Deterministic compilation, aka reproducible builds, has been a basic software engineering concept and goal for 40+ years. Perhaps you could provide some examples of compilers that produce non-deterministic output along with your bad news.

mike_hearn 15 hours ago

JIT compilers.

hndc 6 hours ago

The compiler artifact is still deterministic. They were clearly not referring to runtime behavior that is input-dependent.

booleandilemma 14 hours ago

Account created 11 months ago. They're probably just some slop artist with too much confidence. They probably don't even know what a compiler is.

ChrisGreenHeur 8 hours ago

He is a software engineer with a master's degree in computer science and about 15 years of industry experience, primarily with C++. Currently employed at a company whose name you most likely know.

jcranmer 20 hours ago

Compilers aim to be fully deterministic. The biggest source of nondeterminism when building software isn't the compiler itself, but build systems invoking the compiler nondeterministically (because iterating the files in a directory isn't necessarily deterministic across different machines).

csmantle 19 hours ago

If you are referring to timestamps, build IDs, compile-time environments, hardwired optimization heuristics, or even compiler bugs, those are not the same kind of non-determinism as in LLMs. The former can be mitigated by the long-standing practices of reproducible builds, while the latter is intrinsic to LLMs if they are meant to be more useful than a voice recorder.
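jcranmer and csmantle name two usual culprits: nondeterministic directory iteration order and embedded timestamps. A minimal sketch of the standard mitigations, assuming a zip file stands in for a build artifact: entries are written in sorted order, and every timestamp comes from the `SOURCE_DATE_EPOCH` environment variable (the reproducible-builds.org convention) instead of the wall clock. `build_archive`, `digest`, and the 1980 fallback are illustrative choices, not any particular build tool's behavior.

```python
import hashlib
import io
import os
import time
import zipfile
from typing import Dict, Optional

# Assumed fallback when SOURCE_DATE_EPOCH is unset: 1980-01-01 00:00 UTC, the
# earliest timestamp the zip format can store. (Real tools usually fall back to "now".)
DEFAULT_EPOCH = 315532800


def build_archive(files: Dict[str, bytes], epoch: Optional[int] = None) -> bytes:
    """Pack files into a zip whose bytes depend only on the inputs."""
    if epoch is None:
        epoch = int(os.environ.get("SOURCE_DATE_EPOCH", DEFAULT_EPOCH))
    stamp = time.gmtime(epoch)[:6]  # (year, month, day, hour, minute, second)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Sorted order: never depend on how a directory listing or dict iterates.
        for name in sorted(files):
            info = zipfile.ZipInfo(name, date_time=stamp)  # fixed mtime, not "now"
            zf.writestr(info, files[name])
    return buf.getvalue()


def digest(blob: bytes) -> str:
    """Bit-for-bit comparison of two builds reduces to comparing hashes."""
    return hashlib.sha256(blob).hexdigest()


first = build_archive({"b.txt": b"2", "a.txt": b"1"}, epoch=1700000000)
second = build_archive({"a.txt": b"1", "b.txt": b"2"}, epoch=1700000000)
print(digest(first) == digest(second))  # prints True
```

Inputs listed in a different order still produce identical bytes; changing the epoch (or the file contents) changes the hash, which is the property reproducible-build checks rely on.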

rezonant 19 hours ago

You'll need to share with the class because compilers are pretty damn deterministic.

pjmlp 13 hours ago

Not if they are dynamic compilers.

Two runs of the same programme can produce different machine code from the JIT compiler, unless everything in the universe that happened during the first run gets replicated during the second.

sarchertech 11 hours ago

That’s 100% correct, but importantly JIT compilers are built with the goal of outputting semantically equivalent instructions.

And the vast, vast majority of the time, adding a new line to the source code will not result in an unrecognizably different output.

With an LLM, changing one word can, and frequently does, cause the output to be 100% different. Literally no lines are the same in a diff. That’s such a vastly different scope of problem that comparing them is pointless.

pjmlp 7 hours ago

No, but it can certainly result in a completely different sequence of machine code instructions, or not, depending on what that line actually does, what dynamic types it uses, how often it actually gets executed, the existence of vector units, and so forth.

Likewise, as long as the agent delivers the same outcome, e.g. an email is sent with a specific subject and body, the observed behaviour remains.

sarchertech 2 hours ago

The reason this works for compilers is because machine code is so low level that it’s possible to easily prove semantic equivalence between 2 different sets of instructions.

That is not true for an English language prompt like “send an email with this specific subject and body”. There are so many implicit decisions that have to be made in that statement that will be different every time you regenerate the code.

English language specs will always have this ambiguity.

whoisthemachine 10 hours ago

Do these compilers sometimes give correct instructions and sometimes incorrect instructions for the same higher level code, and it's considered an intrinsic part of the compiler that you just have to deal with? Because otherwise this argument is bunk.

pjmlp 7 hours ago

Possibly, hence the discussion about security in JavaScript runtimes and JITs, which goes as far as disabling JIT execution completely.

https://microsoftedge.github.io/edgevr/posts/Super-Duper-Sec...

Also, the exact sequence of generated machine instructions depends on various factors; the same source can have various outputs, depending on code execution, the hardware present, and heuristics.

ChrisGreenHeur 8 hours ago

They do in fact have bugs, yes, inescapably so (hardly anyone provides formal proofs for production-level compilers).

fragmede 16 hours ago

Only mostly, and only relatively recently. The first compiler is generally attributed to Grace Hopper in 1952. 2013 is when Debian kicked off its program to do bit-for-bit reproducible builds. Thirteen years later, NixOS can maybe produce bit-for-bit identical builds if you treat her really well. We don't look into the details because it just works and we trust it to work, but because computers are all distributed systems these days, getting a bit-for-bit identical build out of the compiler is actually freaking hard. We just trust them to work well enough (and they do), but they've had three fourths of a century to get there.

leptons 19 hours ago

Compilers are about 10 orders of magnitude more deterministic than LLMs, if not more.

misiek08 16 hours ago

Currently it’s about closing that gap.

And 10 orders of magnitude is an optimistic figure: right now, LLMs are random, with some probability of solving the real problem (and I'm thinking of real systems, not a PoC landing page or a CRUD app with 2-3 models). Of course, they are visibly getting better every month.

The “old” world may output different assembly or bytecode every time, but running it will produce the same outputs, maybe slower, maybe faster. Right now, for the same prompt, LLMs can generate a working solution, a non-working one, or a faked one.

As always - what a time to be alive!

JackSlateur 10 hours ago

Reproducible builds are a thing (and they are used in many, many places).

r0b05 20 hours ago

Elaborate please

Applejinx 15 hours ago

I love the 'I may have' :)

ChrisGreenHeur 8 hours ago

made some people angry at me :)

pjmlp 13 hours ago

I have been using them everywhere since the late 1990s; it is called a managed runtime.

discreteevent 13 hours ago

That is a completely different category. I've never experienced a logic error due to a managed runtime and only once or twice ever due to a C++ compiler.

pjmlp 13 hours ago

I have certainly experienced crashes due to JIT miscompilations, although it was a while back, on WebSphere with IBM's Java implementation.

Also, it is almost impossible to guarantee that two runs of an application will trigger the same machine code output, unless the JIT's heuristics and PGO analysis are very dumb, or one is lucky enough to reproduce the same computation environment.

cesarb 11 hours ago

> Also it is almost impossible to guarantee two runs of an application will trigger the same machine code output

As long as the JIT is working properly, it shouldn't matter: the code should always run "as if" it was being run on an interpreter. That is, the JIT is nothing more than a speed optimization; even if you disable the JIT, the result should still be the same.
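cesarb's "as if" argument can be illustrated with a toy: an interpreter that defines the semantics, and a "compiler" that turns the same expression into nested closures once and folds operations whose operands are both literals. This is only an analogy (the expression format, `interpret`, and `compile_expr` are made up, and nothing here emits machine code), but it shows the property being claimed: the optimized form is structured differently, yet it must return the interpreter's result for every input.

```python
from typing import Callable, Dict, Union

# A tiny expression language: int literals, variable names, and
# (op, lhs, rhs) tuples for binary operations.
Expr = Union[int, str, tuple]

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def interpret(expr: Expr, env: Dict[str, int]) -> int:
    """Reference semantics: walk the tree on every evaluation."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, lhs, rhs = expr
    return OPS[op](interpret(lhs, env), interpret(rhs, env))


def compile_expr(expr: Expr) -> Callable[[Dict[str, int]], int]:
    """'Compile' once into closures; only allowed to change speed, not results."""
    if isinstance(expr, int):
        return lambda env: expr
    if isinstance(expr, str):
        return lambda env: env[expr]
    op, lhs, rhs = expr
    fn = OPS[op]
    if isinstance(lhs, int) and isinstance(rhs, int):
        folded = fn(lhs, rhs)  # constant folding: computed at compile time
        return lambda env: folded
    left, right = compile_expr(lhs), compile_expr(rhs)
    return lambda env: fn(left(env), right(env))


expr = ("+", ("*", "x", ("+", 2, 3)), "y")
fast = compile_expr(expr)
print(fast({"x": 2, "y": 1}))  # prints 11, same as interpret(expr, {"x": 2, "y": 1})
```

A miscompilation, in this framing, is any input where `fast(env) != interpret(expr, env)`; that is the bar a real JIT is held to, whatever machine code it happens to emit on a given run.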

pjmlp 7 hours ago

Same with agent actions.

rzmmm 11 hours ago

I think it's technically possible to achieve determinism with LLM output. The LLM makers typically make them non-deterministic by default but it's not inherent to them.
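A toy illustration of rzmmm's point and of the "prompt + seed" idea discussed earlier in the thread: once the weights are frozen, decoding is a deterministic function, and the randomness comes from sampling. It can be removed by always taking the top token (greedy decoding) or pinned by seeding the sampler. The bigram `LOGITS` table and `generate` are made-up stand-ins for a trained model. Real deployments add complications this sketch ignores; for example, batching and floating-point reduction order on GPUs can perturb the logits themselves, and a hosted model can change underneath you.

```python
import math
import random
from typing import Dict, List, Optional

# Made-up stand-in for frozen model weights: next-token logits keyed only by
# the previous token. A real LLM conditions on the whole context.
LOGITS: Dict[str, Dict[str, float]] = {
    "<s>": {"sort": 2.0, "filter": 1.5},
    "sort": {"the": 2.0, "a": 1.0},
    "filter": {"the": 1.0, "a": 2.0},
    "the": {"list": 2.0, "array": 1.5},
    "a": {"list": 1.0, "array": 2.0},
    "list": {"</s>": 2.0, "again": 0.5},
    "array": {"</s>": 2.0, "again": 0.5},
    "again": {"</s>": 1.0},
}


def softmax(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    scaled = {tok: v / temperature for tok, v in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def generate(temperature: float, seed: Optional[int] = None, max_tokens: int = 10) -> List[str]:
    rng = random.Random(seed)  # seed=None draws fresh OS entropy: not reproducible
    out: List[str] = []
    prev = "<s>"
    for _ in range(max_tokens):
        logits = LOGITS[prev]
        if temperature == 0:
            # Greedy decoding: always take the highest-scoring token.
            tok = max(logits, key=lambda t: logits[t])
        else:
            probs = softmax(logits, temperature)
            tokens = sorted(probs)  # fixed order so a seeded draw is repeatable
            tok = rng.choices(tokens, weights=[probs[t] for t in tokens])[0]
        if tok == "</s>":
            break
        out.append(tok)
        prev = tok
    return out


print(generate(temperature=0))               # prints ['sort', 'the', 'list']
print(generate(temperature=0.8, seed=1234))  # repeats exactly across runs with this seed
```

Note what this does not buy you: determinism for a fixed prompt says nothing about how much the output changes when the prompt changes by one word, which is the "chaos" objection raised above.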

booleandilemma 14 hours ago

Well I've been seeing on HN how everyone else feels about it and I'm terrified.

robrenaud 20 hours ago

A compiler that can turn cash into improved code without round-tripping through a human is very cool, though. As those steps get longer and succeed more often in more difficult circumstances, what it means to be a software engineer changes a lot.

void-star 19 hours ago

LLMs may occasionally turn bad code into better code, but letting them loose on “good” or even “good enough” code won't necessarily make it “better”.

tjr 19 hours ago

What compiler accepts cash as input?

nly 2 days ago

I have a source file of a few hundred lines implementing an algorithm that no LLM I've tried (and I've tried them all) is able to replicate, or even suggest, when prompted with the problem. Even with many follow up prompts and hints.

The implementations that come out are buggy or just plain broken.

The problem is a relatively simple one, and the algorithm uses a few clever tricks. The implementation is subtle... but nonetheless it exists in both open and closed source projects.

LLMs can replace a lot of CRUD apps and skeleton code, tooling, scripting, infra setup etc, but when it comes to the hard stuff they still suck.

Give me a whiteboard and a fellow engineer any day.

kranner 20 hours ago

I'm seeing the same thing with my own little app that implements several new heuristics for functionality and optimisation over a classic algorithm in this domain. I came up with the improvements by implementing the older algorithm and just... being a human and spending time with the problem.

The improvements become evident from the nature of the problem in the physical world. I can see why a purely text-based intelligence could not have derived them from the specs, and I haven't been able to coax them out of LLMs with any amount of prodding and persuasion. They reason about the problem in some abstract space detached from reality; they're brilliant savants in that sense, but you can't teach a blind person what the colour red feels like to see.

chasd00 19 hours ago

Well, I think that’s kind of the point, or the value, of these tools. Let the AI do the tedious stuff, saving your energy for the hard stuff. At least that’s how I use them: just save me from all the typing and tedium. I’d rather describe something like an auth0 integration to an LLM than do it all myself. Same goes for the typical list of records: click one, view the details, then a list of related records and all the operations that go with that. It’s so boring; let the LLM do that stuff for you.

prxm 21 hours ago

This is one of my favourite activities with LLMs as well. After implementing some idea for an algorithm, I try seeing what an LLM would come up with. I give it hints as well and push it in the right direction over many iterations, but never reveal the ideal solution. And as a matter of fact, they can never reach the quality of my initial implementation.

dgb23 10 hours ago

> but when it comes to the hard stuff they still suck.

Also much of the really annoying, time consuming stuff, like frontend code. Writing UIs is not rocket science, but hard in a bad way and LLMs are not helping much there.

Plus, while they are _very_ good at finding common issues and gotchas quickly that are documented online (say you use some kind of library that you're not familiar with in a slightly wrong way, or you have a version conflict that causes an issue), they are near useless when debugging slightly deeper issues and just waste a ton of time.

simianwords 17 hours ago

There's very low chance this is possible. If you can share the problem, I'm 90% sure an LLM can come up with a non buggy implementation.

It's easy to claim this and just walk away. But it's better for the overall discussion to provide the example.

jatora 21 hours ago

I bet I could replicate it if you showed me the source file.

raincole 19 hours ago

[flagged]

DavidPiper 18 hours ago

One of the reasons we have programming languages is they allow us to express fluently the specificity required to instruct a machine.

For very large projects, are we sure that English (or other natural languages) are actually a better/faster/cheaper way to express what we want to build? Even if we could guarantee fully-deterministic "compilation", would the specificity required not balloon the (e.g.) English out to well beyond what (e.g.) Java might need?

Writing code will become writing books? Still thinking through this, but I can't help but feel natural languages are still poorly suited and slower, especially for novel creations that don't have a well-understood (or "linguistically-abstracted") prior.

kristjansson 17 hours ago

Perhaps we'll go the way of the Space Shuttle? One group writes a highly structured, highly granular, branch-by-branch 2500-page spec, another group (the LLM) writes 25000 lines of code, and then the first group congratulates itself on producing good software without having to write code?

toprerules 2 days ago

After working with the latest models I think these "it's just another tool" or "another layer of abstraction" or "I'm just building at a different level" kind of arguments are wishful thinking. You're not going to be a designer writing blueprints for a series of workers to execute on, you're barely going to be a product manager translating business requirements into a technical specification before AI closes that gap as well. I'm very convinced non-technical people will be able to use these tools, because what I'm seeing is that all of the skills that my training and years of experience have helped me hone are now implemented by these tools to the level that I know most businesses would be satisfied by.

The irony is that I haven't seen AI have nearly as large an impact anywhere else. We truly have automated ourselves out of work; people are just catching up with that fact, and the people who just wanted to make money from software can now finally stop pretending that "passion" for "the craft" was ever really part of their motivating calculus.

asa400 2 days ago

If all you (not you specifically, more of a royal “you” or “we”) are is a collection of skills centered around putting code into an editor and opening pull requests as fast as possible, then sure, you might be cooked.

But if your job depends on taste, design, intuition, sociability, judgement, coaching, inspiring, explaining, or empathy in the context of using technology to solve human problems, you’ll be fine. The premium for these skills is going _way_ up.

toprerules 2 days ago

The question isn't whether businesses will have zero human element to them; the question is whether AI leaves a big enough gap that technical skills are still required, such that technical roles are still hired for. Someone in product can have all of those skills without a computer science degree or any design experience, and AI will do the technical work at the level of design, implementation, and maintenance. What I am seeing with the new models isn't just writing code; it's taking fundamental problems as input and designing holistic software solutions as output, and the quality is there.

apical_dendrite 21 hours ago

I am only seeing that if the person writing the prompts knows what a quality solution looks like at a technical level and is reviewing the output as they go. Otherwise you end up with an absolute mess that may work at least for "happy path" cases but completely breaks down as the product needs change. I've described a case of this in some detail in another comment.

Kerrick 17 hours ago

> the person writing the prompts knows what a quality solution looks like at a technical level and is reviewing the output as they go

That is exactly what I recommend, and it works like a charm. The person also has to have realistic expectations for the LLM, and be willing to work with a simulacrum that never learns (as frustrating as it seems at first glance).

falloutx 9 hours ago

When your title is software engineer, good luck convincing the layoff machine about your taste, design, intuition, sociability, judgement, coaching, inspiring, explaining, or empathy in the context of using technology to solve human problems.

jatora 21 hours ago

Ah the age old 'but humans have heart, and no machine can replicate that' argument. Good luck!

asa400 20 hours ago

The process of delivering useful, working software for nontrivial problems cannot be reduced to simply emitting machine instructions as text.

jatora 20 hours ago

Yes, so you need some development and SysOps skills (for now), not all of that other nonsense you mentioned.

idiotsecant 20 hours ago

It turns out that corporations value these things right up until a cheaper almost as good alternative is available.

The writing is on the wall for all white collar work. Not this year or next, but it's coming.

sarchertech 19 hours ago

If all white collar work goes, we’re going to have to completely restructure the economy or collapse completely.

Being a plumber won’t save you when half the work force is unemployed.

hackyhacky 2 days ago

> The irony is that I haven't seen AI have nearly as large of an impact anywhere else.

We are in this pickle because programmers are good at making tools that help programmers. Programming is the tip of the spear, as far as AI's impact goes, but there's more to come.

Why pay an expensive architect to design your new office building, when AI will do it for peanuts? Why pay an expensive lawyer to review your contract? Why pay a doctor, etc.

Short term, doing for lawyers, architects, civil engineers, doctors, etc what Claude Code has done for programmers is a winning business strategy. Long term, gaining expertise in any field of intellectual labor is setting yourself up to be replaced.

heavyset_go 12 hours ago

> Why pay an expensive architect to design your new office building, when AI will do it for peanuts? Why pay an expensive lawyer to review your contract? Why pay a doctor, etc.

All of those jobs are mandated by law to be done by accredited and liable humans.

hackyhacky 7 hours ago

> All of those jobs are mandated by law to be done by accredited and liable humans.

Good point. The jobs I listed will be protected for a little while due to statutory limitations. At first, firms will have one AI-augmented lawyer take on the work of a dozen lawyers. Of course his salary won't increase, and the others will be fired. Eventually, he'll just be rubber-stamping the AI's results, purely for the sake of compliance. Then the ruling class will petition the legislature to change the law in the name of "efficiency," and that will be the end of that.

Meanwhile, programmers have no such protection. Nor do customer service agents, secretaries, publishers, copywriters, bankers, or office managers. There is no safety net.

mike_hearn 15 hours ago

> Why pay an expensive architect to design your new office building, when AI will do it for peanuts?

Will it? AI is getting good at some parts of programming because of RLVR. You can test architectural designs automatically to some extent but not entirely, because people tend to want unique buildings that stand out (if it weren't the case architects would have already become a niche profession due to everyone using prefabs all the time). At some point an architectural design has to be built and you can't currently simulate real building sites at high speed inside a datacenter. This use case feels marginal.

There's going to be a lot of cases like this. The safe jobs are ones where there's little training data available online, the job has a large component of unarticulated experience or intuition, and where you can't verify purely in software whether the work artifact is correct or not.

hackyhacky 7 hours ago

> people tend to want unique buildings that stand out

Just tell the LLM that you want a unique design. I've found LLMs respond well to requests for "originality," at least in poetry, prose, and coding. No reason they can't do that in architecture as well.

> At some point an architectural design has to be built and you can't currently simulate real building sites at high speed inside a datacenter.

First of all, you can simulate a building site, or any physical environment. We've been doing that for years, even in games. AI companies are working towards a "world model" for precisely that reason. Second of all, even without a physical simulation, the laws of physics are deterministic and easy for an LLM to understand.

> The safe jobs are ones where there's little training data available online,

These cases are "safe" only in relative terms. Lack of easily-available training data is friction but not insurmountable. AI companies have bet big and they have a strong incentive to find and use appropriate training data.

shahbaby 2 days ago

> what I'm seeing is that all of the skills that my training and years of experience have helped me hone are now implemented by these tools to the level that I know most businesses would be satisfied by.

So when things break or they have to make changes, and the AI gets lost down a rabbit hole, who is held accountable?

toprerules 2 days ago

The answer is the AI. It's already handling complex issues and debugging solely by gathering its own context, doing major refactors successfully, and doing feature design work. The people that will be held responsible will be the product owners, but it won't be for bugs, it will be for business impact.

My point is that SWEs are living on a prayer that AI will be perched on a knife's edge where there will still be some amount of technical work to make our profession sustainable, and from what I'm seeing that's not going to be the case. It won't happen overnight, but I doubt my kids will ever even think about a computer science degree or doing what I did for work.

Quothling 24 hours ago

I work in the green energy industry and we see it a lot now. Two years ago the business would've had to either buy a bunch of bad "standard" systems that didn't really fit, or wait for their challenges to be prioritised enough to get some of our programmers' time. Today 80-90% of the software produced in our organisation isn't even seen by our programmers. It's built by LLMs in the hands of various technically inclined employees who make it work. Sometimes some of it scales up enough that our programmers get involved, but for the most part, the quality matters very little. Sure, I could write software that does the same thing faster and with much less compute, but when the compute is $5 a year, I'd have to write it rather fast to make up for the cost of my time.

I make it sound like I agree with you, and I do to an extent. Hell, I'd want my kids to be plumbers or similar, where a couple of years ago I would've wanted them to go to university. With that said, I still haven't seen anything from AIs to convince me that you don't need computer science. To put it bluntly, you don't need software engineering to write software, until you do. A lot of the AI-produced software doesn't scale, and none of our agents have been remotely capable of producing quality, secure code, even in the hands of experienced programmers. We haven't seen any change in that over the past two years either.

Of course, this doesn't mean you're wrong either, because we're going to need a lot fewer programmers regardless. We need the people who know how computers work, but in my country that is a fraction of the total IT worker pool available. In many CS programs, students aren't even taught how a CPU or memory works. They are instead taught design patterns, OOP, and clean architecture. Those are great when humans are maintaining code, but even small abstractions will cause L1-L3 cache misses. Which doesn't matter, until it does.

mjr00 2 days ago

And what happens when the AI can't figure it out?

toprerules 24 hours ago

Same situation as when an engineer can't figure something out, they translate the problem into human terms for a product person, and the product person makes a high level decision that allows working around the problem.

mjr00 24 hours ago

Uh, that's not what engineers do; do you not have any software development experience, or rather any outside of vibe coding? That would explain your perspective. (For context, I have 15+ years of experience and am a former FAANG dev.)

I don't mean this to sound inflammatory or anything; it's just that the idea of a developer who encounters a difficult bug going to the product manager, of all people, for help is so incredibly outlandish and unrealistic that I can't imagine anyone would think this would happen unless they've never actually worked as a developer.

skeptic_ai 18 hours ago

As a product owner I ask you to make a button that when I click auto installs an extension without user confirmation.

toprerules 24 hours ago

Staff engineer (also at FAANG), so yes, I have at least comparable experience. I'm not trying to summarize every level of SWE in a few sentences. The point is that AI's fallibility is no different from human fallibility. You may fire a human for a mistake, but that won't solve the business problems they may have created, so I believe the accountability argument is bogus. You can hold the next layer up accountable. The new models are startlingly good at direction setting, technical-to-product translation, providing leadership guidance on technical matters, and offering multiple routes around roadblocks.

We're starting to see engineers running into bugs and roadblocks feed input into AI and not only root causing the problem, but suggesting and implementing the fix and taking it into review.

mjr00 24 hours ago

Surely at some point in your career as a SWE at FAANG you had to "dive deep" as they say and learn something that wasn't part of your "training data" to solve a problem?

toprerules 23 hours ago

I would have said the same thing a year or two ago, but AI is capable of doing deep dives. It can selectively clone and read dependencies outside of its data set. It can use tool calls to read documentation. It can log into machines and insert probes. It may not be better than everyone, but it's good enough and continuing to improve such that I believe subject matter expertise counts for much less.

mjr00 23 hours ago

I'm not saying that AI can't figure out how to handle bugs (it absolutely can; in fact even a decade ago at AWS there was primitive "AI" that essentially mapped failure codes to a known issues list, and it would not take much to allow an agent to perform some automation). I'm saying there will be situations the AI can't handle, and it's really absurd that you think a product owner will be able to solve deeply technical issues.

You can't product manage away something like "there's an undocumented bug in MariaDB which causes database corruption with spatial indexes" or "there's a regression in jemalloc which is causing Tomcat to leak memory when we upgrade to Java 8". Both are real things I had to dive deep on and discover in my career.

anonymars 19 hours ago

There are definitely issues in human software engineering which reach some combination of the following end states:

1. The team is unable to figure it out

2. The team is able to figure it out but a responsible third-party dependency is unable to fix it

3. The team throws in the towel and works around the issue

At the end of the day it always comes down to money: how much more money do we throw at trying to diagnose or fix this versus working around or living with it? And is that determination not exactly the role of a product manager?

I don't see why this would ipso facto be different with AI

For clarity I come at this with a superposition of skepticism at AI's ultimate capabilities along with recognition of the sometimes frightening depth encountered in those capabilities and speed with which they are advancing

I suppose the net result would be a skepticism of any confident predictions of where this all ends up

mjr00 19 hours ago

> I don't see why this would ipso facto be different with AI

Because humans can learn information they currently do not have, AI cannot?

anonymars 9 hours ago

But does that change the end result? Finding a compiler or SQL bug doesn't mean you yourself can learn enough to fix it. I don't see any reason why AI would be inherently incapable of also concluding that there's a bug in an underlying layer beyond its ability to fix

That doesn't mean AI can do or replace everything but what fraction of software engineering work requires that final frontier?

randallsquared 11 hours ago

They can, by putting what they just "learned" into the context window. Claude Code does this without (my) prompting from time to time, adding to its CLAUDE.md things that it has learned about the project or my preferences. Currently this is limited to literally writing it down, but as context windows grow and models continue training on their own usage, it's not clear to me how that will significantly differ from an ability to "learn information they currently do not have".
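The mechanism randallsquared describes amounts to a simple loop: write down what was learned, then feed that file back into the context of every later session. A minimal sketch under that reading; the file name `NOTES.md` and the helpers `remember` and `build_context` are hypothetical, and this is not how Claude Code actually implements CLAUDE.md.

```python
from pathlib import Path

# Hypothetical memory file in the spirit of CLAUDE.md: the "write it down,
# read it back" loop, not any tool's real implementation.


def remember(memory_file: Path, note: str) -> None:
    """Persist a lesson by appending it to the file as a Markdown bullet."""
    with memory_file.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")


def build_context(memory_file: Path, task: str) -> str:
    """Prepend everything 'learned' so far to the prompt for the next session."""
    notes = memory_file.read_text(encoding="utf-8") if memory_file.exists() else ""
    return f"# Project notes\n{notes}\n# Task\n{task}\n"
```

Nothing about the model changes here; all of the "learning" lives in a text file whose usefulness is bounded by the context window, which is exactly the limitation and the open question the comment points at.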

raincole 20 hours ago

> translating business requirements into a technical specification

a.k.a. Being a programmer.

> The irony is that I haven't seen AI have nearly as large of an impact anywhere else.

What lol. Translation? Graphic design?

coffeebeqn 13 hours ago

Writing? Education?

anonnon 15 hours ago

> After working with the latest models I think these "it's just another tool" or "another layer of abstraction" or "I'm just building at a different level" kind of arguments are wishful thinking. You're not going to be a designer writing blueprints for a series of workers to execute on, you're barely going to be a product manager translating business requirements into a technical specification before AI closes that gap as well

I think it's doubtful you'll be even that; certainly not with the salary and status that normally entails.

> I'm very convinced non-technical people will be able to use these tools

This suggests that the skill ceiling of "Vibe Coding" is actually quite low, calling into question the sense of urgency with which certain AI influencers present it, as if it were a skill that you need to invest major time & effort to hone now (with their help, of course), lest you get left behind and have to "catch up" later. Yet one could easily see it being akin to Googling, which was also a skill (when Google was usable), one that did indeed increase your efficiency and employability, but with a low ceiling, such that "Googler" was never a job by itself, the way some suggest "prompt engineer" will be. The Google analogy is apt, in that you're typing keywords into a black box until it spits out what you want, quite akin to how people describe "prompt engineering."

Also the Vibe Coding skillset--a bag of tricks and book of incantations you're told can cajole the model--has a high churn rate. Once, narrow context windows meant restarting a session de novo was advisable if you hit a roadblock, but now it's usually the opposite.

If this is all true, then wouldn't the correct takeaway, rather than embracing and mastering "Vibe Coding" (as influencers suggest), be to "pivot" to a new career, like welding?

> The irony is that I haven't seen AI have nearly as large of an impact anywhere else. We truly have automated ourselves out of work, people are just catching up with that fact

What's funny is artists immediately, correctly perceived the threat of AI. You didn't see cope about it being "just another tool, like Photoshop."

redox99 6 hours ago

Gen AI for art was different because it would just output a final image with basically 0 control for the artist. It's like if AI programming would output a binary instead of source code.

voxleone 10 hours ago

One thing I think the “LLM as new high-level language” framing misses is the role of structure and discipline. LLMs are great at filling in patterns, but they struggle with ambiguity, the exact thing we tolerate in human languages.

A practical way to get better results is to stop prompting with prose and start providing explicit models of what we want. In that sense, UML-like notations can act as a bridge between human intent and machine output. Instead of:

“Write a function to do X…”

we give:

“Here’s a class diagram + state machine; generate safe C/C++/Rust code that implements it.”

UML is already a formal, standardized DSL for software structure. LLMs have no trouble consuming textual forms (PlantUML, Mermaid, etc.) and generating disciplined code from them. The value isn’t diagrams for humans but constraining the model’s degrees of freedom.
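A minimal sketch of the idea, assuming a hypothetical door-lock state machine (the states and events are invented for illustration). In the diagram-first workflow, the transition table would come from the PlantUML/Mermaid source, and the generated code is only allowed to implement those transitions:

```python
from enum import Enum, auto

# Hypothetical state machine a PlantUML/Mermaid diagram might describe:
#   Locked   --unlock--> Unlocked
#   Unlocked --lock-->   Locked
#   Unlocked --open-->   Open
#   Open     --close-->  Unlocked
class State(Enum):
    LOCKED = auto()
    UNLOCKED = auto()
    OPEN = auto()

# The explicit model: any (state, event) pair not listed here is illegal.
TRANSITIONS = {
    (State.LOCKED, "unlock"): State.UNLOCKED,
    (State.UNLOCKED, "lock"): State.LOCKED,
    (State.UNLOCKED, "open"): State.OPEN,
    (State.OPEN, "close"): State.UNLOCKED,
}

def step(state: State, event: str) -> State:
    """Apply one event, rejecting any transition the model does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.name} --{event}-->") from None
```

For example, `step(State.LOCKED, "open")` raises `ValueError` because the diagram has no such edge. The point is that the table, not the prose prompt, fixes the model's degrees of freedom.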

FuckButtons 7 hours ago

Have you tried this? How did it go?

frigg 12 hours ago

>Following this hypothesis, what C did to assembler, what Java did to C, what Javascript/Python/Perl did to Java, now LLM agents are doing to all programming languages.

What did Javascript/Python do to Java? They are not interchangeable nor comparable. I don't think Federico's opinion is worth reading further.

randallsquared 11 hours ago

Three of Java's top application categories are webapps, banking/financial services, and big data. Node and PySpark have displaced quite a lot of that.

frigg 10 hours ago

Most serious banking apps and financial services are still written in Java, it hasn't displaced much of anything. Big data is a relatively 'new' fad that is already becoming less and less relevant.

niels8472 12 hours ago

Plus Python and Perl predate Java.

notepad0x90 3 hours ago

The problem is you still have to type prompts. That might take fewer words, but you still have to type them up, and they won't be short. For a small code base, your LLM prompts might be a couple of pages, but for a complex code base they might be the size of a medium-length novel.

In the end, you have lengthy text typed by humans, and it might contain logic errors, contradictions, and unforeseen issues in the instructions. The same processes and tooling used for syntactic code might need to apply to it. You will need to version control your prompts, for example.

LLMs solve the labor problem, not the management problem. You have to spend a lot of time and effort with pages and pages of LLM prompts, trying to figure out which part of the prompt is generating which part of your code base. LLMs can debug and troubleshoot, but they can't debug and troubleshoot your prompts for you. I doubt they can take their own output, generated by multiple agents and lots of sessions and trace it all back to what text in your prompt caused all the mess either.

On one hand, I want to see what this experimentation will yield, on the other hand, it had better not create a whole suite of other problems to solve just to use it.

My confusion really is when experienced programmers advocate for this stuff. Actually typing in the code isn't very hard. I like the LLM-assistance aspect of figuring out what to actually code, and doing some research. But for figuring out what code to type in, sure, LLMs save time, but not that much time. Getting it to work, debugging, troubleshooting, maintaining: those tend to be the pain points.

Perhaps there are shops out there that just crank out lots of LoC, and even measure developer performance based on LoC? I can see where this might be useful.

I do think LLM-friendly high-level languages need to evolve for sure. But the ideal workflow is always going to be a co-pilot type of workflow. Humans researching and guiding the AI.

Psychologically, until AI can maintain its own code, this is a really bad idea. Actually typing out the code is extremely important for humans to be able to understand it. Even if someone else wrote the code, you have to write something that becomes part of that code base and figure out how things fit together; AI can't do that for you if you're still maintaining the codebase in any capacity.

gloosx 10 hours ago

A novice prefers declarative control, an expert prefers procedural control

Beginner programmers want: "make this feature"

Experienced devs want: control over memory, data flow, timing, failure modes

That is why abstractions feel magical at first and suffocating later which sparks this whole debate.

PunchyHamster 17 hours ago

At this point I'm just waiting for people to claim they managed a team of 20 people, where the "20 people" were LLMs being fed a prompt.

pavlov 17 hours ago

This claim has been going around for a while.

I know someone who has been spending $7k / month on Cursor tokens for the past six months, managing such a team of agents… But curiously the results seem to be endless PDF-ware, and every month there’s a new reason why the project is not yet quite ready to be used on real data.

LLMs are very good at making you think they’re giving you what you hoped to get.

PunchyHamster 4 hours ago

There are clearly uses where it can be very useful: if you just want to prototype some solutions, having throwaway AI code that you don't understand is entirely fine.

But if you commit that and turn it into a production project... that's basically starting with massive tech debt from the get-go, and you can't pull the "just let AI write it from scratch again" trick too many times while growing features.

noduerme 16 hours ago

They're also very good at demoralizing people who actually code for a living. Or at least the hype surrounding them is. Up until a couple years ago I'd simply dive into a new coding assignment and be excited to solve all the problems. Some new widget? Maybe I could find a way to make the UX flow better, or add a couple neat little transitions, or heck even improve the mechanics or the business logic. I now find myself looking at assignments and thinking: Is this a waste of my time? I know how I'd do it, but I'm prone to a lot of yak-shaving and perfectionism. Should I just ask Claude to do it?

So alright, let's see what Claude does.

And then I get presented with a piece of code and a product that I wouldn't have chosen - but in some cases would be perfectly sufficient for minor work I shouldn't be yak-shaving. On the other hand, going beyond the simplest or most obvious is how I built my business. Bringing ideas to the table and executing them. So then I discard the Claude code and sit down to write it from scratch.

It is at that exact point that I start to wonder: does anyone even care?

People who take the time to think deeply through their work, and who "hammer in the extra nail" as my grandpa used to say (he was a general contractor) do so most often for their own sense of pride in a job well done, and for the trust that comes from their clients or employers knowing that they will go the extra mile and do a job well. But what happens when employers or clients don't treat the work as important - and when they would be okay with a bad or mediocre version on the cheap? That's not a new problem - usually I just won't work for those people.

But I just hate the pessimistic feeling I get about doing what I always loved to do - writing bespoke code - when I constantly keep asking myself do these vibe coders know something more than I do? Should I try yet again on that route? Worse, am I just wasting my time perfecting something that no one else will appreciate?

To anyone outside engineering, writing code looks like wizardry. I think the most common and most demoralizing outcome of LLM vibe coding has been to make them incorrectly think it's suddenly easier. And a new crop of vibe coders who think they don't have to think for themselves? They're not engineers. They may even be enemies of good engineering. They have more in common with the bosses and clients we've always had who said "hey, this should be a very easy request, can you just add a button to the customer app that will (fill in the blank with some wildly complicated business transaction that definitely can't be handled in one click)."

All of it is sapping my motivation to do what I was good at, which is, solving problems and writing well tested code.

Sorry for the rant.

JackSlateur 9 hours ago

This is correct: we see code whose quality is not really considered important (and we saw it before the AI slop, too).

At work, we have a lot of internal web apps coded like hell: "people just have to reload the page...". This is lame.

On the other hand, in many situations, when people are faced with something that works, that is well designed and well executed, it's a breath of fresh air. They do notice, and they remark that "that team is not like the others" (even if they cannot always pinpoint the "why").

High quality work also leads to higher velocity, low/no regressions etc

Management does notice, and product owners do notice.

pavlov 15 hours ago

This happened to painters and illustrators a few years earlier than programmers.

People used to think detailed drawing and realistic painting were kinds of unachievable wizardry. Now they go on ChatGPT or Midjourney and press a button. Professionals despair at the slop results, but they’re good enough for most people.

I try to think of this positively as a transition similar to the invention of photography. The bourgeois classes used to need artists to paint pictures of themselves and their daughters. The camera turned portrait creation into a one-button affair. But that actually freed artists to be more creative. Monet and Picasso happened after photography for an obvious reason.

heavyset_go 12 hours ago

> People used to think detailed drawing and realistic painting were kinds of unachievable wizardry. Now they go on ChatGPT or Midjourney and press a button. Professionals despair at the slop results, but they’re good enough for most people.

Anyone who would actually pay for a portrait would not hang Midjourney or ChatGPT output on their walls.

rl3 16 hours ago

It's not a complete agentic setup until your chain of command goes at least nine levels deep.*

I want my C-suite and V-suite LLMs to feel like they earned their positions through hard work, values, and commitment to their company.

* = (Not to be confused with a famous poem by Dante Alighieri)

Izkata 16 hours ago

Well, I found this a bit ago through an offhand mention on reddit: https://github.com/bmad-code-org/BMAD-METHOD

> Specialized Agents: 12+ domain experts (PM, Architect, Developer, UX, Scrum Master, and more)

DaedalusII 16 hours ago

The biggest and least controversial thing will be when Anthropic creates a OneDrive/Google Drive integration that lets white-collar employees create, edit, and export Word documents to PDF, referring to other files in the folder. This alone will increase average white-collar employee productivity by 100x and lead to the most job displacement.

For instance: Here is an email from my manager at 1pm today. Open the policy document he is referring to, create a new version, and add the changes he wants. Refer to the entire codebase (our company OneDrive/Google Drive/Dropbox, whatever) to make sure it is contextually correct.

>Sure, here is the document for your review

Great, reply back to manager with attachment linked to OneDrive

sublinear 15 hours ago

Your example actually perfectly describes why it won't displace anyone.

The user still has to be in the loop to instruct the LLM every time, and the tiniest nuances in each execution of that work still matter. If they didn't, we'd have replaced people with bash scripts many decades ago. When we've tried to do that, the maintenance of those scripts became a never-ending game of whack-a-mole, and the scripts were eventually abandoned. I think sometimes people forget how ineffective most software is even when written as well as it can be. LLMs don't unlock any new abilities there.

What this actually does is make people more available for meeting time. Productivity doesn't budge at all. :)

In other words, the "busy work" has always been faster to do than the decision making, and if someone has meetings to attend they don't strictly do busy work.

Maybe the more interesting outcome is that with the increased meeting time comes much deeper drinks of the kool-aid, and businesses become more cultish than they already are. That to me sounds far more sinister than kicking people out onto the curb. Employees become "agents of change" through the money and influence they're given. They might actually hire more :D

Davidzheng 15 hours ago

Even if your argument is correct here, it would only mean this particular method of replacement doesn't immediately work for this job.

1zael 17 hours ago

There's nothing novel in this article. This is what every other AI clickbait article is propagating.

podgorniy 7 hours ago

I want to see an example of an application with well-written documentation from which an LLM produces a well-working application.

I discovered that it is not trivial to conceptualize an app with the degree of clarity required for deterministic LLM output. It's much easier to claim than to actually do yourself (that's why examples are so interesting to see).

Going backwards, deriving the spec/docs from the source code, doesn't work well enough either.

omarreeyaz 13 hours ago

I’m not sure I buy this. GPT-5.2 codex still makes design errors that I as an engineer have to watch and correct. The only way I know how to catch it and then steer the model towards a correction is to be able to read the code and write some code into the prompt. So one can’t abstract programming language away through an agent…

euroderf 10 hours ago

The US military loves zillion-page requirements documents. Has anyone (besides maybe some Ph.Dork at DARPA) tried feeding a few to coder LLMs to generate applications - and then thrown them at test suites ?

freetonik 13 hours ago

There's a reason we distinguish between programmers and managers; if "LLMs are just the new high level language", then a manager is just another programmer operating on a level above code. I mean, sure, we can say that, but words become kind of meaningless at this point.

asim 12 hours ago

I wouldn't call it the new high level language. It's a new JIT. But that's not doing it justice. There's a translation of natural language to a tokenizer and processor that's akin to the earliest days of CPUs. It's a huge step change from punch cards. But there's also a lot to learn. I think we will eventually develop a new language that's more efficient for processing, or multiple layers of transformers. Tbh Google is leapfrogging everyone in this arena, and eventually we're going to move to more exotic forms of modelling we've never seen before except in nature. But from an engineering perspective, all I can see right now is a JIT.

kazinator 2 days ago

This is a good summary of any random week's worth of AI shilling from your LinkedIn feed, that you can't get rid of.

QuadrupleA 18 hours ago

Can we stop repeating this canard, over and over?

Every "classic computing" language mentioned, and pretty much every one in history, is highly deterministic and mind-bogglingly, huge-number-of-9s reliable (when was the last time your CPU did the wrong thing on one of the billions of machine instructions it executes every second, or your compiler gave two different outputs from the same code?)

LLMs are not even "one 9" reliable at the moment. Indeed, each token is a freaking RNG draw off a probability distribution. "Compiling" is a crap shoot, a slot machine pull. By design. And the errors compound/multiply over repeated pulls as others have shown.
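The compounding point is simple arithmetic: if each step independently succeeds with probability p, a chain of n steps succeeds with probability p^n. A minimal sketch, assuming independent steps (a simplification; real errors are correlated) and purely illustrative values of p:

```python
def chain_success(p: float, n: int) -> float:
    """Probability that n independent steps, each succeeding with probability p, all succeed."""
    return p ** n

# Illustrative per-step reliabilities, not measurements of any real model or CPU.
for p in (0.9, 0.99, 0.999_999_999):
    print(f"p={p}: 10 steps -> {chain_success(p, 10):.4f}, "
          f"1000 steps -> {chain_success(p, 1000):.4f}")
```

With p = 0.99, for example, a 100-step chain succeeds only about 37% of the time (0.99^100 ≈ 0.366), which is why "many 9s" per step matters so much.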

I'll take the gloriously reliable classical compute world to compile my stuff any day.

kykat 18 hours ago

Agreed, yet we will have to keep seeing this take over and over again. As if I needed more reasons to believe the world is filled with morons.

kaapipo 15 hours ago

If we can treat the prompts as the versionable source code artefact, then sure. But as long as we need to fine-tune the output, it's not a high level language, in the same way that no one would edit the assembly a compiler produces.

imdsm 15 hours ago

If we're able to produce an LLM which takes a seed and produces the same output per input, then we'd be able to do this
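Mechanically, fixing the seed does make the sampling step repeatable. A minimal sketch, using a toy next-token distribution (vocabulary and probabilities invented for illustration) and Python's `random.Random`: the same seed yields the same token sequence. In a real deployment, the weights, the serving harness, and likely the floating-point behavior of the inference hardware would also have to be pinned for this to hold.

```python
import random

# Toy next-token distribution; a real model would recompute this at every step.
VOCAB = ["foo", "bar", "baz", "qux"]
WEIGHTS = [0.5, 0.25, 0.15, 0.10]

def sample_tokens(seed: int, n: int) -> list[str]:
    """Draw n tokens using a private RNG seeded with `seed`."""
    rng = random.Random(seed)
    return rng.choices(VOCAB, weights=WEIGHTS, k=n)

# Same seed, same output; different seeds will usually differ.
print(sample_tokens(42, 10))
print(sample_tokens(42, 10))
```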

layer8 11 hours ago

There must be good reasons why we don’t have this. I suspect one reason is that the SOTA providers are constantly changing the harness around the core model, so you’d need to version that harness as well.

Verdex 11 hours ago

If the LLM is a high level language, then why aren't we saving the prompts in git?

Last I checked with every other high level language, you save the source and then rerun the compiler to generate the artifact.

With LLMs you throw away the 'source' and save the artifact.

dgb23 11 hours ago

Some people do to a degree. Just not quite in the sense that this headline suggests.

There are now approaches where the prompt itself is structured (sort of like a spec) so you get to a similar result more quickly. Not sure how well those work (I actually assume they suck, but I haven't tried them).

Also, some frameworks, templates and so on provide a bunch of structured markdown files that nudge LLM assistants to avoid common issues and do things in a certain way.

andai 11 hours ago

The "prompt" is a guy who wrestled with Claude Code for several hours straight.

bwat49 23 minutes ago

I feel so attacked right now

AlexeyBrin 23 hours ago

This is an exaggeration. If you store the prompt that was "compiled" by today's LLMs, there is no guarantee that 4 months from now you will be able to replicate the same result.

I can take some C or Fortran code from 10 years ago, build it and get identical results.

gnatolf 17 hours ago

That is a wobbly assertion. You would certainly need to run the same compiler and forgo any recent optimisations, architecture updates and the like if your code has numerically sensitive parts.

You certainly can get identical results, but it's equally certainly not going to be that simple a path frequently.

layer8 11 hours ago

The more important point is that even when you don’t get identical binary output, you still get identical observable behavior as specified by the programming language, unless there’s a compiler bug. That’s not the case for LLMs, they are more like an always randomly buggy compiler. You wouldn’t want to use such a compiler.

AlexeyBrin 12 hours ago

> You certainly can get identical results, but it's equally certainly not going to be that simple a path frequently.

But at least I know that if I need to, I can do it. With an LLM, if you don't store the original weights, all bets are off. Reproducibility of results can be a hard requirement in certain cases or industries.

skaul 7 hours ago

Programming with LLMs is fundamentally different than going from a lower-level to a higher-level language, even apart from the whole non-determinism thing. With a programming language, you're still writing a for-loop, whether that's in C, Java or Rust. There's language primitives that help you think better in certain languages, but they're still, at the end of the day, code and context that you have to hold in your head and be intimately familiar with.

That changes with LLMs. For now, you can use LLMs to help you code that way; a programming buddy whose code you review. That's soon going to become "quaint" (to quote the author) given the projected productivity gains of agents (and for many developers it already has).

redox99 7 hours ago

A programming language does not need to have a for loop. In fact many don't.

skaul 4 hours ago

Programming languages need to give the developer a way to iterate (map, fold, for-loop, whatever) over a collection of items. Over time we've come up with more elegant ways of doing this, but as a programmer, until LLMs, you've still had to be actively involved in the control logic. My point is that a developer's relationship with the code is very different now, in a way that wasn't true with previous low-to-high level language climbs.

redox99 4 hours ago

I was thinking of something like SQL, which is declarative: broadly speaking, you tell it what you want, not how to do it.
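To make the distinction in this subthread concrete, here is the same query written three ways in Python: an explicit for-loop, a fold via `functools.reduce`, and a genuinely declarative SQL statement run through the standard-library `sqlite3` module. The table and rows are made up for illustration.

```python
import sqlite3
from functools import reduce

orders = [("alice", 30), ("bob", 12), ("alice", 8)]  # hypothetical (customer, amount) rows

# 1. Imperative: the programmer spells out the control flow.
total_loop = 0
for customer, amount in orders:
    if customer == "alice":
        total_loop += amount

# 2. Fold: iteration is abstracted away, but you still write the step function.
total_fold = reduce(lambda acc, row: acc + row[1] if row[0] == "alice" else acc, orders, 0)

# 3. Declarative: state what you want; the engine decides how to compute it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
(total_sql,) = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer = 'alice'"
).fetchone()
conn.close()

print(total_loop, total_fold, total_sql)
```

All three agree; what changes is how much of the control logic the programmer has to hold in their head.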

Akef 10 hours ago

Paradigm shift ahead, folks. What I observe in the comments—often more compelling than the article itself—is the natural tension within the scientific community surrounding the 'scientific method,' a debate that's been playing out for... what, a year now? Maybe less? True, this isn't perfect, nor does it come with functionality guarantees. Talking about 10x productivity? That's relative—it hinges on the tool, the cultural background of the 'orchestra conductor,' or the specific, hands-on knowledge accumulated by the conductor, their team, organization, and even the target industry.

In essence: we're witnessing a paradigm shift. And for moments like these—I invite you—it's invaluable to have studied Popper and Kuhn in those courses.

An even more provocative hypothesis: the 'Vienna Circle' has morphed into the 'Circle of Big Tech,' gatekeepers of the data. What's the role of academia here? What happened to professional researchers? The way we learn has been hijacked by these brilliant companies, which—at least this time—have a clear horizon: maximizing profits. What clear horizon did the stewards of the scientific method have before? Wasn't it tainted by the enunciator's position? The personal trajectory of the scientist, the institution (university) funding them? Ideology, politics?

This time, it seems, we know exactly where we're headed.

(This comment was translated from Spanish, please excuse the rough edges)

BudapestMemora 14 hours ago

And sooner or later it will happen, imho. With probabilistic compiling, and several "prompts/agents" under the hood: whichever reply the majority agrees on wins and gets compiled. Of course, good context will contribute to a better-refined probability.

Ask yourself: "Computer memory and disks are also not 100% reliable, but we live with it somehow without a man-in-the-middle manual check layer, yes?" The answer about LLMs will be the same, once a good enough level of similar/identical answers is achieved.
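The "majority wins" idea resembles N-version voting (or what LLM papers call self-consistency). A minimal sketch, assuming a hypothetical `generate` callable standing in for one agent run, and assuming outputs can be compared for exact equality (real code would need normalization or behavioral comparison, e.g. running tests):

```python
import random
from collections import Counter
from typing import Callable, Optional

def majority_output(generate: Callable[[int], str], runs: int,
                    quorum: float = 0.5) -> Optional[str]:
    """Call `generate` `runs` times and return the most common output,
    or None if it was not produced by more than `quorum` of the runs."""
    outputs = [generate(i) for i in range(runs)]
    winner, count = Counter(outputs).most_common(1)[0]
    return winner if count / runs > quorum else None

def fake_agent(run_id: int) -> str:
    """Hypothetical stand-in for one LLM/agent run: usually right, sometimes not."""
    rng = random.Random(run_id)
    return "return a + b" if rng.random() < 0.8 else "return a - b"

print(majority_output(fake_agent, runs=9))
```

Returning `None` when no quorum is reached is a design choice: it surfaces disagreement instead of silently compiling a coin flip.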

geon 17 hours ago

> The code that LLMs make is much worse than what I can write: almost certainly; but the same could be said about your assembler

Has this been true since the 90s?

I pretty much only hear people saying modern compilers are unbeatable.

apical_dendrite 2 days ago

I'm trying to work with vibe-coded applications and it's a nightmare. I am trying to make one application multi-tenant by moving a bunch of code that's custom to a single customer into config. There are 200+ line methods, dead code everywhere, tons of unnecessary complexity (for instance, extra mapping layers that were introduced to resolve discrepancies between keys, instead of just using the same key everywhere). No unit tests, of course, so it's very difficult to tell if anything broke. When the system requirements change, the LLM isn't removing old code, it's just adding new branches and keeping the dead code around.

I ask the developer the simplest questions, like "which of the multiple entry-points do you use to test this code locally", or "you have a 'mode' parameter here that determines which branch of the code executes, which of these modes are actually used?", and I get a bunch of babble, because he has no idea how any of it works.

Of course, since everyone is expected to use Cursor for everything and move at warp speed, I have no time to actually untangle this crap.

The LLM is amazing at some things - I can get it to one-shot adding a page to a React app, for instance. But if you don't know what good code looks like, you're not going to get a maintainable result.

danparsonson 21 hours ago

You've just described the entirely-human-made project that I'm working on now.... at least now we can deliver the intractable mess much more quickly!

pjmlp 15 hours ago

Already there for anyone using iPaaS platforms, and despite their flaws, it is the new normal in many enterprise consulting scenarios.

fpereiro 3 hours ago

Hi HN! OP here. Thanks for reading and commenting -- (and @swah for posting!). It's unsettling to hit the HN front page, and even more so with an article that I hastily wrote down. I guess you never know what's going to hit a nerve.

Some context: I'm basically trying to make sense of the tidal wave that's engulfing software development. Over the last 2-3 weeks I've realized that LLMs will start writing most code very soon (I could be wrong, though!). This article is just me making sense of it, not trying to convince anybody of anything (except of, perhaps, giving the whole thing a think). Most of the "discarded" objections I presented in the list were things I espoused myself over the past year. I should have clarified that in the article.

I (barely) understand that LLMs are not a programming language. My point was that we could still think of them as a "higher level programming language", despite them 1) not being programming languages; 2) being wildly nondeterministic; 3) also jumping levels, since they can help you direct them. This way of looking at LLMs is an attempt to see whether previous shifts in programming can at least partially explain the dynamics we are seeing unfold so quickly (to find, in Ray Dalio's words, "another kind of those").

I am stepping into this world of LLM code generation with complicated feelings. I'm not an AI enthusiast, at least not yet. I love writing code by hand and I am proud of my hand-written open source libraries. But I am also starting to experience the possibilities of working on a higher level of programming and being able to do much more in breadth and depth.

I fixed an important typo - here I meant: "Economically, only quality is undisputable as a goal".

Responding to a few interesting points:

@manuelabeledo: during 2025 I've been building a programming substrate called cell (think language + environment) that attempts to be both very compact and very expressive. Its goal is to massively reduce complexity to make general-purpose code more understandable (I know this is laughably ambitious and I'm desperately limited in my ability to pull off something like that). But because of the LLM tsunami, I'm reconsidering the role of cell (or any other successful substrate): even if we achieve the goal, how will this interact with a world where people mostly write and validate code through natural language prompts? I never meant to say that natural language would itself be this substrate, or that the combination of LLMs and natural languages could do that: I still see that there will be a programming language behind all of this. Apologies for the confusion.

@heikkilevanto & @matheus-rr: Mario Zechner has a very interesting article where he deals with this problem (https://mariozechner.at/posts/2025-06-02-prompts-are-code/#t...). He's exploring how structured, sequential prompts can achieve repeatable results from LLMs, which you still have to verify. I'm experimenting with the same, though I'm just getting started. The idea I sense here is that perhaps a much tighter process of guiding the LLM, with current models, can get you repeatable and reliable results. I wonder if this is the way things are headed.

@woodenchair: I think that we can already experience a revolution with LLMs that are not fully autonomous. The potential is that an engineering-like approach to a prompt flow can allow you to design and review (not write) a lot more code than before. Though you're 100% correct that the analogy doesn't strictly hold until we can stop looking at the code in the same way that a js dev doesn't look at what the interpreter is emitting.

@nly: great point. The thing is that most code we write is not elegant implementations of algorithms, but mostly glue or CRUDs. So LLMs can still broadly be useful.

I hope I didn't rage bait anybody - if I did, it wasn't intentional. This was just me thinking out loud.

dainiusse 16 hours ago

Does the product work? Is it maintainable?

Everything else is secondary.

senfiaj 8 hours ago

For me, LLMs feel closer to an IDE on steroids. Unless LLMs produce the same output from the same input, I can't view them as compilers.

TZubiri 2 days ago

"Following this hypothesis, what C did to assembler, what Java did to C, what Javascript/Python/Perl did to Java, now LLM agents are doing to all programming languages."

This is not an appropriate analogy, at least not right now.

Code Agents are generating code from prompts, in that sense the metaphor is correct. However Agents then read the code and it becomes input and they generate more code. This was never the case for compilers, an LLM used in this sense is strictly not a compiler because it is not cyclic and not directional.

danparsonson 20 hours ago

I think it's appropriate in terms of the results rather than the process; the bigger problem I see is that programming languages are designed to be completely unambiguous, whereas human language is not ("Go to the shop and buy one box of eggs, and if they have carrots, buy three") so we're transitioning from exactly specifying what we want the software to do, to tying ourselves in knots trying to specify it exactly, while a machine tries to disambiguate our request. I bet lawyers would make good vibe coders.
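The eggs-and-carrots request is a classic joke precisely because it has two readings, and code forces you to pick one. A minimal sketch (function names invented for illustration):

```python
def shop_reading_intended(has_carrots: bool) -> dict[str, int]:
    """What the speaker meant: one box of eggs, plus three carrots if available."""
    basket = {"eggs": 1}
    if has_carrots:
        basket["carrots"] = 3
    return basket

def shop_reading_literal(has_carrots: bool) -> dict[str, int]:
    """The programmer-joke reading: "three" refers back to the boxes of eggs."""
    return {"eggs": 3 if has_carrots else 1}

for has_carrots in (False, True):
    print(has_carrots, shop_reading_intended(has_carrots), shop_reading_literal(has_carrots))
```

The two functions agree when there are no carrots and diverge when there are, which is exactly the kind of case a natural-language spec leaves the machine to guess at.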

phplovesong 14 hours ago

Please stop with these rage-click posts. There is so much wrong in this article I won't even start...

badgersnake 14 hours ago

More garbage content on the front page. It’s a constant stream of AI hype pieces with zero substance from people who just happen to work for AI companies. Hacker News is really going downhill.

redbell 16 hours ago

As per Andrej Karpathy's viral tweet from three years ago [1]:

  The hottest new programming language is English

______________

1. https://x.com/i/status/1617979122625712128

freejazz 4 hours ago

When a computer programmer finally discovers English.

rco8786 10 hours ago

The thing that’s always missing from these critiques isn’t code quality or LoC or slop.

The issue is that if you fire off 10 agents to work autonomously for an extended period of time at least 9 of them will build the WRONG THING.

The problem is context management and decision making based on that context. LLMs will always make assumptions about what you want, and the more assumptions they make the higher the likelihood that one or more of them is wrong.

cess11 13 hours ago

Was StackOverflow "the new high level language"? The proliferation of public git repos?

Because that's pretty much what "agentic" LLM coding systems are an automation of, skimming through forums or repos and cribbing the stuff that looks OK.

Razengan 7 hours ago

If you see magazines, articles, ads and TV shows from the 1980s (there are lots on YouTube and a fun rabbit hole, like the BBC Archive), the general promise was "Computers can do anything, if you program them."

Well, nobody could figure out how to program them. Except the few outcasts like us who went on to suffer for the rest of our lives for it :')

With phones & LLMs this is the closest we have come to that original promise of a computer in every home and everyone being able to do anything with it, that isn't pre-dictated by corporations and their apps:

Ideally, ChatGPT etc. should be able to create interactive apps on the fly on an iPhone, etc. Imagine having a specific need and just being able to say it and get an app right away just for you on your device.

karmasimida 14 hours ago

LLMs are new runtimes.

coffeebeqn 13 hours ago

Alright I’m out

pvtmert 15 hours ago

Except that the output depends on the alignment of the stars.

Imagine a machine that does the job sometimes but fails at other times. Wonderful, isn't it?

kristjansson 17 hours ago

Code written in an HLL is a sufficient[1] description of the resulting program/behavior. The code, in combination with the runtime, defines constraints on the behavior of the resulting program. A finished piece of HLL code encodes all the constraints the programmer desired. Presuming a 'correct' compiler/runtime, any variation in the resulting program (equivalently, in the behavior of an interpreter running the HLL code) stays within the boundaries of those constraints.

Code in general is also local, in the sense that a small perturbation to the code has effects limited to a small and corresponding portion of the program/behavior. A change to the body of a function changes the generated machine code for that function, and nothing else[2].
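A rough Python analogy for this locality claim, using bytecode rather than machine code: compile a tiny module twice, editing only the body of `f`, and compare the compiled code objects. The module source and the `function_code`/`fingerprint` helpers are hypothetical illustrations; footnote [2]'s caveats (inlining, global state, timing) still apply in real toolchains.

```python
import types

BEFORE = '''
def f(x):
    return x + 1

def g(x):
    return x * 2
'''

# The same module with only the body of f edited (same line count, new constant).
AFTER = BEFORE.replace("x + 1", "x + 2")


def function_code(source: str) -> dict[str, types.CodeType]:
    """Compile a module and return its top-level function code objects by name."""
    module = compile(source, "<demo>", "exec")
    return {c.co_name: c for c in module.co_consts if isinstance(c, types.CodeType)}


def fingerprint(code: types.CodeType) -> tuple:
    # Bytecode plus constants: enough to tell whether a compiled body changed.
    return (code.co_code, code.co_consts)


before, after = function_code(BEFORE), function_code(AFTER)
print(fingerprint(before["f"]) == fingerprint(after["f"]))  # False: f changed
print(fingerprint(before["g"]) == fingerprint(after["g"]))  # True: g is untouched
```

An edit to a prompt has no comparable guarantee: nothing confines its effect to the part of the generated program you meant to change.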

Prompts provided to an LLM are neither sufficient nor local in the same way.

The inherent opacity of the LLM means we can make only probabilistic guarantees that the constraints the prompt intends to encode are reflected by the output. No theory we currently know of can even attempt to supply such a guarantee. A given (sequence of) prompts might result in a program that happens to encode the constraints the programmer intended, but that _must_ be verified by inspection and testing.

One might argue that, of course, an LLM can be made to produce precisely the same output for the same input; it is itself a program, after all. However, that 'reproducibility' should not convince us that the prompts + weights totally define the code any more than random.Random(1).random() being constant should cause us to declare Python's .random() broken. In both cases we're looking at a single sample from a PRNG. Any variation whatsoever would result in a different generated program, with no guarantee that the program would satisfy the constraints the programmer intended to encode in the prompts.
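A minimal Python sketch of the PRNG analogy, assuming a toy "generator" that picks one of a few hypothetical candidate programs. `CANDIDATES` and `generate` are illustrative stand-ins, not how any real LLM works. A fixed (prompt, seed) pair always gives the same pick, but that is reproducibility of one sample, not a specification of the output.

```python
import random

# A seeded PRNG is reproducible: the same seed gives the same first draw.
assert random.Random(1).random() == random.Random(1).random()

# Hypothetical candidate "programs" a generator could emit for one prompt.
CANDIDATES = ["impl_a", "impl_b", "impl_c", "impl_d"]


def generate(prompt: str, seed: int) -> str:
    """Toy generator: fixed output for a fixed (prompt, seed), like a frozen model."""
    rng = random.Random(f"{prompt}|{seed}")  # str seeds are converted to an int deterministically
    return rng.choice(CANDIDATES)


# Same prompt and seed: the same "program" every time.
assert generate("burger site", 1) == generate("burger site", 1)

# Vary only the seed and the output wanders across the candidates.
outputs = {generate("burger site", seed) for seed in range(50)}
print(sorted(outputs))
```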

Locality fails in a similar way, though one might point out that an agentic LLM can easily make a local change to code if asked. I would argue that an agentic LLM's prompts are not just the inputs from the user but the entire codebase in its repo (if sparsely attended to by RAG or retrieval tool calls or w/e). The prompts _alone_ cannot be changed locally in a way that guarantees a local effect.

The prompt -> LLM -> program abstraction leaks in such volume and variety that it cannot be ignored the way the code -> compiler -> program abstraction can. Continuing to make forward progress on a project requires that the robot (and likely the human) attend to the generated code.

Does any of this matter? Compilers and interpreters themselves are imperfect; their formal verification is incomplete and underutilized. We have to verify properties of programs via testing anyway. And who cares if the prompts alone are insufficient? We can keep a few hundred KB of code around and retrieve over it to keep the robot on track, and the human more or less in the loop. And if it ends up rewriting the whole thing every few iterations as it drifts, who cares?

For some projects, where quality, correctness, interoperability, novelty, etc. don't matter, that might be fine. Even in those, defining a program purely via prompts seems likely to devolve eventually into aggravation. For the rest, reports of the end of software engineering seem greatly exaggerated.

[1]: loosely in the statistical sense of containing all the information the programmer was able to encode https://en.wikipedia.org/wiki/Sufficient_statistic

[2]: there are, of course, many tiny exceptions to this: we might be changing a function that's inlined all over the place; we might be changing something that's explicitly global state; we might vary the timing of something so that async tasks get scheduled in a different order, etc. I believe the point stands regardless.

svilen_dobrev 16 hours ago

so, prompt engineering it is. Happy new LLMium.

retinaros 12 hours ago

As long as SOTA is dumb-looping until the LLM verifies some end goal, spending as many tokens as possible, it won't be a language. At best an inelegant dialect.

rvz 2 days ago

So we are certainly going to see more incidents like this [0] from people who don't understand LLM-written code, as 'engineers' now let their skills decay because 'LLMs know best'.

[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...

koiueo 14 hours ago

Then commit your prompt to a git repository.

Gosh, LLMs have been a thing for only a few years, but people have become stupid already.

> what Javascript/Python/Perl did to Java

FFS... What did python do to java?

mock-possum 17 hours ago

Hasn't natural language always been the highest-level language?

gloosx 10 hours ago

Natural language is not the highest level – it is just the highest level that still needs words.

stared 2 days ago

No, prompts are not the new source code, vide https://quesma.com/blog/vibe-code-git-blame/.

lofaszvanitt 18 hours ago

Why use agents if there is absolutely zero need for them? It's the usual: here, we spent a shitton of money on this, now figure out how we MUST include this horrible thing in our already bloated dev environment.

renewiltord 19 hours ago

We’re missing the boat here. There are already companies with millions in revenue that are pure agent loops of English text. They can do things our traditional software cannot.

kristjansson 17 hours ago

An example or two would go a long way here.

renewiltord 16 hours ago

A natural question to ask in response, but in each case I found out in a way that means I shouldn't say anything about their engineering, and since I've now said something about it, I can't say who. I get how it comes off. If the Criterion of Embarrassment helps, I find myself sorely unqualified for this new world.

To be honest, I shouldn't have said anything in the first place. It isn't useful as it is because you can't reasonably believe me. I just feel blindsided by the stuff that's working now.

paulhebert 17 hours ago

Can you share some examples?

OutOfHere 24 hours ago

The side effect of using LLMs for programming is that no new programming language can now emerge to be popular, that we will be stuck with the existing programming languages forever for broad use. Newer languages will never accumulate enough training data for the LLM to master them. Granted, non-LLM AIs with true neural memory can work around this, as can LLMs with an infinite token frozen+forkable context, but these are not your everyday LLMs.

spacephysics 21 hours ago

I wouldn’t be surprised if in the next 5-10 years the new and popular programming language is one built with the idea of optimizing how well LLMs (or, at that point, world models) understand and can use it.

Right now, LLMs are using languages designed to help humans understand code better through abstraction. What if the next language is designed for optimal LLM/world model understanding?

Or, instead of an entirely new language, there's some form of compiling/transpiling from the model language to a human-centric one, like WASM for LLMs.

raincole 19 hours ago

I don't think we need that many programming languages anyway.

I'm more worried about the opposite: the next popular programming paradigm will be something that's hard to read for humans but not-so-hard for LLM. For example, English -> assembly.

slopusila 16 hours ago

you can invent a new language, ask LLM to translate existing code bases into it, then train on that

Just like AlphaZero ditched human Go matches and trained on synthetic ones, and got better this way

dsr_ 2 days ago

It's not a programming language if you can't read someone else's code, figure out what it does, figure out what they meant, and debug the difference between those things.

"I prompted it like this"

"I gave it the same prompt, and it came out different"

It's not programming. It might be having a pseudo-conversation with a complex system, but it's not programming.

Closi 2 days ago

> It's not a programming language if you can't read someone else's code, figure out what it does, figure out what they meant, and debug the difference between those things.

Well I think the article would say that you can diff the documentation, and it's the documentation that is feeding the AI in this new paradigm (which isn't direct prompting).

If the definition of programming is "a process to create sets of instructions that tell a computer how to perform specific tasks" there is nothing in there that requires it to be deterministic at the definition level.

skydhash 17 hours ago

> If the definition of programming is "a process to create sets of instructions that tell a computer how to perform specific tasks" there is nothing in there that requires it to be deterministic at the definition level.

The whole goal of getting a computer to do a task is that it’s capable of doing it many times, reliably. Especially in business, infrastructure, and manufacturing,…

Once you turn specifications and requirements into code, they’re formalized and the behavior is fixed. Only then is it possible to evaluate it: not against the specifications, but against another set of code that is known to be good (or is easier to verify).

The specification is just a description of an idea. It is a map. But the code is the territory. And I’ve never seen anyone farm or mine from a map.

furyofantares 2 days ago

I think I 100% agree with you, and yet the other day I found myself telling someone "Did you know OpenClaw was written with Codex and not Claude Code?", and I really think I meant it in the same sense I'd mean a programming language or framework. I only noticed what I'd said a few minutes later.

hackyhacky 2 days ago

> "I gave it the same prompt, and it came out different"

I wrote a program in C and gave it to gcc. Then I gave the same program to clang and I got a different result.

I guess C code isn't programming.

svieira 2 days ago

Note that the prompt wasn't fed to another LLM, but to the same one. "I wrote a program in C and gave it to GCC. Then I gave the same program to GCC again and I got a different result" would be more like it.

hackyhacky 24 hours ago

> Then I gave the same program to GCC again and I got a different result" would be more like it.

This is a completely realistic scenario, given variance between compiler output based on optimization level, target architecture, and version.

Sure, LLMs are non-deterministic, but that doesn't matter if you never look at the code.

Retric 24 hours ago

Optimization level, target architecture, etc are just information fed to the compiler. If it’s still nondeterministic with everything kept the same your compiler is broken.

hackyhacky 24 hours ago

You're missing the point. I'm not saying that compilers are nondeterministic or that LLMs are deterministic. I'm saying that it doesn't matter. No one cares about deterministic results except programmers. The non-technical user who will make software in the future just knows that he gets what he asked for.

Retric 24 hours ago

Systems failing randomly is a significant concern for non-programmers, and that’s inherent to the nondeterministic nature of LLMs.

I can send specific LLM output to QA, I can’t ask QA to validate that this prompt will always produce bug free code even for future versions of the AI.

hackyhacky 24 hours ago

Huh? No.

The output of the LLM is nondeterministic, meaning that the same input to the LLM will result in different output from the LLM.

That has nothing to do with whether the code itself is deterministic. If the LLM produces non-deterministic code, that's a bug, which hopefully will be caught by another sub-agent before production. But there's no reason to assume that programs created by LLMs are non-deterministic just because the LLMs themselves are. After all, humans are non-deterministic.

> I can send specific LLM output to QA, I can’t ask QA to validate that this prompt will always produce bug free code even for future versions of the AI.

This is a crazy scenario that does not correspond to how anyone uses LLMs.

Retric 24 hours ago

I agree it’s nonsense.

That we agree it’s nonsense means we agree that using LLM prompts as a high level language is nonsense.

cocoto 2 days ago

If there is no error in the compiler implementation and no undefined behavior, the resulting programs are equivalent, and the few differences are mostly implementation-defined details left to the compiler to decide (though gcc and clang often make the same choices). Performance might also differ. It’s clearly not comparable to the many differences you can get in LLM output.

hackyhacky 2 days ago

It just depends what level of abstraction you're willing to pretend doesn't matter.

gcc and clang produce different assembly code, but it "does the same thing," for certain definitions of "same" and "thing."

Claude and Gemini produce different Rust code, but it "does the same thing," for certain definitions of "same" and "thing."
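A toy Python illustration of "for certain definitions of 'same'": the two hypothetical functions below both satisfy the loose spec "remove duplicates", yet anyone who cares about ordering would call them different programs. They are illustrative stand-ins, not output from any particular model.

```python
def dedupe_ordered(items: list[int]) -> list[int]:
    # Keeps the first occurrence of each item, in original order.
    return list(dict.fromkeys(items))


def dedupe_sorted(items: list[int]) -> list[int]:
    # Also removes duplicates, but returns the items in ascending order.
    return sorted(set(items))


data = [3, 1, 3, 2, 1]
print(dedupe_ordered(data))  # [3, 1, 2]
print(dedupe_sorted(data))   # [1, 2, 3]

# "Same" under a loose definition: identical contents.
assert set(dedupe_ordered(data)) == set(dedupe_sorted(data))
# Not "same" under a stricter one: the order differs.
assert dedupe_ordered(data) != dedupe_sorted(data)
```

Whether that difference matters depends entirely on who is looking at the output, which is the disagreement running through the rest of this subthread.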

The issue is that the ultimate beneficiary of AI is the business owner. He's not a programmer, and he has a much looser definition of "same."

ebb_earl_co 24 hours ago

No, the ultimate beneficiary of LLM-created code is the toll collectors who stole as much intellectual property as they could (and continue to do so), fleecing everyone else that they are Promethean for having done so and for continuing to do so.

danelski 24 hours ago

It has to stop somewhere. A business owner can also hire a different company to create the product and get a result that differs by as little as 5% in performance, or something with clearly inferior maintainability and UX. You'd hardly argue that it's 'the same' just because they followed the same spec, which will never be fully precise at the business level. I agree that talking to an LLM is akin to using business-oriented logic at the module or even function level.

runarberg 23 hours ago

Your logic sounds like willful ignorance. You are relying on some odd definitions of "definitions", "equivalence", and "procedures". These are all rigorously defined in the underlying theory of computer science (using formal logic, lambda calculus, etc.)

Claude and Gemini do not "do the same thing" in the way that Clang and GCC do the same thing with the same code (as long as certain axioms of the code hold).

The C Standard has been rigorously written to uphold certain principles such that the same code (following its axioms) will always produce the same results (under specified conditions) for any standard compliant compiler. There exists no such standard (and no axioms nor conditions to speak of) where the same is true of Claude and Gemini.

If you are interested, you can read the standard here (after purchasing access): https://www.iso.org/obp/ui/#iso:std:iso-iec:9899:ed-5:v1:en

hackyhacky 23 hours ago

> Claude and Gemini do not "do the same thing" in the way that Clang and GCC do the same thing with the same code (as long as certain axioms of the code hold).

True, but none of that is relevant to the non-programmer end user.

> You are relying on some odd definitions of "definitions", "equivalence", and "procedures"

These terms have rigorous definitions for programmers. The person making software in the future is a non-programmer and doesn't care about any of that. They care only that the LLM can produce what they asked for.

> The C Standard has been rigorously written to uphold certain principles

I know what a standard is. The point is that the standard is irrelevant if you never look at the code.

runarberg 23 hours ago

It is indeed extremely relevant to the end user. For websites the end user is not the creator of the web site who pushes it to the server, it is the user who opens it on a browser. And for that user it matters a great deal if a button is green or blue, if it responds to keyboard events, if it says “submit” or “continue”, etc. It also matters to the user whether their information is sent to a third party, whether their password is leaked, etc.

Your argument here (if I understand you correctly) is the same as arguing that to build a bridge you do not need to know all the laws of physics that prevent it from collapsing. The project manager of the construction team doesn’t need to know them, and certainly not the bicyclists who cross it. But the engineer who draws the blueprints needs to know them, and it matters that every detail on those blueprints is rigorously defined, so that the project manager of the construction team can follow them to the letter. If the engineer does not know the laws of physics, or the project manager does not follow the blueprints to the letter, the bridge will likely collapse, killing the end user, that poor bicyclist.

measurablefunc 24 hours ago

The output will conform to the standard & it will be semantically equivalent. You're making several category errors w/ your comparison.

hackyhacky 24 hours ago

> You're making several category errors w/ your comparison.

I don't think I am. If you ask an LLM for a burger web site, you will get a burger web site. That's the only category that matters.

mjr00 24 hours ago

> I don't think I am. If you ask an LLM for a burger web site, you will get a burger web site. That's the only category that matters.

If one burger website generated uses PHP and the other is plain javascript, which completely changes the way the website has to be hosted--this category matters quite a bit, no?

hackyhacky 24 hours ago

> which completely changes the way the website has to be hosted--this category matters quite a bit, no?

It matters to you because you're a programmer, and you can't imagine how someone could create a program without being a programmer. But it doesn't really matter.

The non-technical user of the LLM won't care if the LLM generates PHP or JS code, because they don't care how it gets hosted. They'll tell the LLM to take care of it, and it will. Or more likely, the user won't even know what the word "hosting" means, they'll simply ask the LLM to make a website and publish it, and the LLM takes care of all the details.

mjr00 24 hours ago

Is the LLM paying for hosting in this scenario, too? Is the LLM signing up for the new hosting provider that supports PHP after initially deploying to github pages?

Feels like the non-programmer is going to care a little bit about paying for 5 different hosting providers because the LLM decided to generate their burger website in PHP, JavaScript, Python, Ruby and Perl in successive iterations.

hackyhacky 24 hours ago

> Is the LLM paying for hosting in this scenario, too? Is the LLM signing up for the new hosting provider that supports PHP after initially deploying to github pages?

It's an implementation detail. The user doesn't care. OpenClaw can buy its own hosting if you ask it to.

> Feels like the non-programmer is going to care a little bit about paying for 5 different hosting providers because the LLM decided to generate their burger website in PHP, JavaScript, Python, Ruby and Perl in successive iterations.

There's this cool new program that the kids are using. It's called Docker. You should check it out.

mjr00 24 hours ago

How's the non-programmer going to tell the LLM to use Docker? They don't know what Docker is.

How do you guarantee that the prompt "make me a burger website" results in a Docker container?

hackyhacky 24 hours ago

> How's the non-programmer going to tell the LLM to use Docker? They don't know what Docker is.

At this point, I think you are intentionally missing the point.

The non-programmer doesn't need to know about Docker, or race conditions, or memory leaks, or virtual functions. The non-programmer says "make me a web site" and the LLM figures it out. It will use an appropriate language and appropriate libraries. If appropriate, it will use Docker, and if not, it won't. If the non-programmer wants to change hosting, he can say so, and the LLM will change the hosting.

The level of abstraction goes up. The details that we've spent our lives thinking about are no longer relevant.

It's really not that complicated.

sarchertech 20 hours ago

And then the user says “LLM make this slight change to my website” and suddenly the website is subtly different in 100 different ways and users are confused and frustrated and they massively hemorrhage customers.

mjr00 24 hours ago

How does the non-programmer know about hosting? They just want a burger site. What's hosting? Is that like Facebook?

To maybe get out of this loop: your entire thesis is that nonfunctional requirements don't matter, which is a silly assertion. Anyone who has done any kind of software development work knows that nonfunctional requirements are important, which is why they exist in the first place.

skydhash 17 hours ago

Yeah, in most SWE roles, you get a task from the PM that describes a new feature. But you were not hired to just translate the ticket to code. More often than not, it’s your responsibility to figure out all those non-functional requirements, like not breaking anything that currently works.

More often, the issue with legacy code is not that you don’t know how to make a change, it’s because you don’t know if and how it will blow up after making it.

raincole 19 hours ago

> this category matters quite a bit, no?

No. Put yourself in the shoes of the owner of the burger restaurant (who has heard the term "JavaScript" only twice in his life and vaguely remembers it's probably something related to "Java", which he has heard three times) and you'll know why the answer is no.

mjr00 19 hours ago

I put myself in the shoes of the burger restaurant owner. I vibe coded a website. Sweet. I talk to my cousin who's doing the web hosting and he says he can't host a "Pee Haich Pee" site. I don't know what that is. Suddenly the thing that didn't matter actually really fucking matters

This is like saying it doesn't matter if your pipes are iron, lead or PVC because you don't see them. They all move water and shit where they need to be, so no problem. Ignorance is bliss I guess? Plumbers are obsolete!

hackyhacky 17 hours ago

> I talk to my cousin who's doing the web hosting and he says he can't host a "Pee Haich Pee" site

I already addressed this exact argument in another comment. In short: hosting is an implementation detail. The LLM can solve this problem just like any coding bug. The user can give his cousin's email to the LLM, which will solve the issue by finding better hosting, rewriting the program in another language, or using Docker.

Hosting is not a hard problem to solve, compared to other issues.

mjr00 17 hours ago

> The user can give his cousin's email to the LLM, which will solve the issue by finding better hosting, rewriting the program in another language, or using Docker.

Can you please point to a currently available LLM that can do this?

hackyhacky 16 hours ago

Openclaw

measurablefunc 24 hours ago

I think you are b/c you lack actual understanding of how compilers work & what it would mean to compile the same C code w/ two different conformant C compilers & get semantically different results.

hackyhacky 24 hours ago

> you lack actual understanding of how compilers work

My brother in Christ, please get off your condescending horse. I have written compilers. I know how they work. And also you've apparently never heard of undefined behavior.

The point is that the output is different at the assembly level, but that doesn't matter to the user. Likewise, output from one LLM may differ from another's, but the user doesn't care.

runarberg 23 hours ago

Undefined behavior is an edge case in C. Other programming languages (like JavaScript) go to great lengths in defining their standards so that it is almost impossible to write code with undefined behavior. The vast majority of code written out there has no undefined behavior. I think it is safe to assume that everyone here (except you) is talking about C code without undefined behavior when we say that the same code produces the same results regardless of the compiler (as long as the compiler is standards-conforming).

hackyhacky 21 hours ago

Language-based undefined behavior is just one kind of nondeterminism that programmers deal with every day. There are other examples, such as concurrency. So that claim that using LLMs isn't programming because "nondeterminism" makes no sense.

runarberg 20 hours ago

> Language-based undefined behavior is just one kind of nondeterminism that programmers deal with every day.

Every day, you say. I program every day, and in my 20 years of programming I have never intentionally written code with undefined behavior. I think you may be exaggerating a bit here.

I mean, sure, some leet programmers do dabble in undefined behavior; they may even rely on some compiler bug for some extreme edge case during code golf. Whatever. However, it is not uncommon that when enough programmers start relying on undefined behavior behaving in a certain way, it later becomes part of the standard and is therefore no longer “undefined behavior”.

Like I said in a different thread, I suspect you may be willfully ignorant about this. I suspect you actually know the difference between:

a) written instructions compiled into machine code for the machine to perform, and,

b) output of a statistical model, that may or may not include written instructions of (a).

There are a million reasons to claim (a) is not like (b), the fact that (a) is (mostly; or rather desirably) deterministic, while (b) is stochastic is only one (albeit a very good) reason.

measurablefunc 24 hours ago

You don't sound like you have written any code at all actually. What you do sound like is someone who is pretending like they know what it means to program which happens a lot on the internet.

hackyhacky 24 hours ago

> You don't sound like you have written any code at all actually

Well, you sound like an ignorant troll who came here to insult people and start fights. Which also happens a lot on the internet.

Take your abrasive ego somewhere else. HN is not for you.

measurablefunc 24 hours ago

I don't care what I sound like to people who front about their programming skills. I'm not here to impress people like you.

beefsack 2 days ago

Prompting isn't programming. Prompting is managing.

echelon 23 hours ago

Is it?

If I know the system I'm designing and I'm steering, isn't it the same?

We're not punching cards anymore, yet we're still telling the machines what to do.

Regardless, the only thing that matters is to create value.

aryonoco 24 hours ago

Interesting how the definition of “real programming” keeps changing. I’m pretty sure when the assembler first came along, bare-metal machine code programmers said “this isn’t programming”. And I can imagine their horror when the compiler came along.

chasd00 19 hours ago

I think about those conversations a lot. I’m in the high-power rocketry hobby and wrote my own flight controller using MicroPython and parts from Adafruit. It worked just fine and did everything I wanted it to do, yet the others in the hobby just couldn’t stand that it wasn’t in C. They were genuinely impressed until I said MicroPython, and all of a sudden it was trash. People just have these weird obsessions that blind them to anything different.

Daviey 24 hours ago

How did you come up with this definition?

problynought 2 days ago

All programming achieves the same outcome: it requests that the OS/machine set aside some memory to hold salient values and mutate those values in line with a mathematical recipe.

Functions like:

updatesUsername(string) returns result

...can be turned into generic functional euphemism

takeStringRtnBool(string) returns bool

...same thing. context can be established by the data passed in, external system interactions (updates user values, inventory of widgets)

as workers SWEs are just obfuscating how repetitive their effort is to people who don't know better

the era of pure data driven systems is arrived. in-line with the push to dump OOP we're dumping irrelevant context in the code altogether: https://en.wikipedia.org/wiki/Data-driven_programming

echelon 2 days ago

[flagged]

happytoexplain 2 days ago

I'm not sure I will ever understand the presentation of "inevitable === un-criticizable" as some kind of patent truth. It's so obviously fallacious that I'm not sure what could even drive a human to write it, and yet there it is, over and over and over.

Lots of horrifying things are inevitable because they represent "progress" (where "progress" means "good for the market", even if it's bad for the idea of civilization), and we, as a society, come to adapt to them, not because they are good, but because they are.

orbital-decay 2 days ago

>"I prompted it like this"

>"I gave it the same prompt, and it came out different"

1:1 reproducibility is much easier in LLMs than in software building pipelines. It's just not guaranteed by major providers because it makes batching less efficient.

isodev 24 hours ago

> 1:1 reproducibility is much easier in LLMs than in software building pipelines

What’s a ‘software building pipeline’ in your view here? I can’t think of parts of the usual SDLC that are less reproducible than LLMs, could you elaborate?

orbital-decay 21 hours ago

Reproducibility across all existing build systems took a decade of work involving everything from compilers to sandboxing, and a hard reproducibility guarantee in completely arbitrary cases is either impossible or needs deterministic emulators which are terribly slow. (e.g. builds that depend on hardware probing or a simulation result)

Input-to-output reproducibility in LLMs (assuming the same model snapshot) is a matter of optimizing the inference for it and fixing the seed, which is vastly simpler. Google for example serves their models in an "almost" reproducible way, with the difference between runs most likely attributed to batching.
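A minimal sketch of the "fix the seed" point, assuming a toy sampler over a made-up 4-token vocabulary with fixed logits (a real model recomputes logits after every token). With the same logits, temperature, and seed, the sampled sequence is identical across runs. What this toy cannot show is the batching caveat: if batching changes floating-point reduction order on a real server, the logits themselves can shift, and a fixed seed does not fix that.

```python
import math
import random


def softmax(logits: list[float], temperature: float) -> list[float]:
    # Scale by temperature, subtract the max for numerical stability, normalize.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_tokens(logits: list[float], n: int, seed: int, temperature: float = 0.8) -> list[int]:
    """Toy sampler: draws n token ids from fixed logits using a seeded RNG."""
    rng = random.Random(seed)
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=n)


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
run_a = sample_tokens(logits, n=10, seed=42)
run_b = sample_tokens(logits, n=10, seed=42)
assert run_a == run_b  # fixed seed + identical computation -> identical output
```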

sarchertech 19 hours ago

It’s not just about non-determinism, but about how chaotic LLMs are. A one-word difference in a spec can and frequently does produce unrecognizably different output.

If you are using an LLM as a high-level language, that means that every time you make a slight change to anything and “recompile,” all of the thousands upon thousands of unspecified implementation details are free to change.

You could try to ameliorate this by training LLMs to favor making fewer changes, but that would likely end up encoding every bad architecture decision made along the way and essentially forcing a convergence on bad design.

Fixing this, I think, requires judgment on a level far beyond what LLMs have currently demonstrated.

orbital-decay 18 hours ago

>It’s not just about non-determinism

I'm very specifically addressing prompt reproducibility mentioned above, because it's a notorious red herring in these discussions. What you want is correctness, not determinism/reproducibility, which is relatively trivial. (Although, thinking of it more, maybe not that trivial... if you want usable repro in the long run, you'll have to store the model snapshot and the inference code, and make the inference deterministic too.)
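What might "storing everything" look like in practice? Here is a minimal sketch. It fingerprints the pieces the comment lists (model snapshot, inference code) together with the prompt and seed, so you can check whether two runs had matching inputs. The field names and the choice of SHA-256 are assumptions for illustration. The byte strings stand in for real files.

```ruby
require "digest"

# Hypothetical reproducibility record: every input that can change the output.
ReproRecord = Struct.new(:model_sha, :inference_sha, :prompt, :seed, keyword_init: true) do
  # One fingerprint over all inputs; "\0" separates fields unambiguously.
  def fingerprint
    Digest::SHA256.hexdigest([model_sha, inference_sha, prompt, seed.to_s].join("\0"))
  end
end

def record_for(model_bytes, inference_source, prompt, seed)
  ReproRecord.new(
    model_sha: Digest::SHA256.hexdigest(model_bytes),
    inference_sha: Digest::SHA256.hexdigest(inference_source),
    prompt: prompt,
    seed: seed
  )
end

a = record_for("weights-v1", "def infer; end", "Add a /health endpoint", 42)
b = record_for("weights-v1", "def infer; end", "Add a /health endpoint", 42)
c = record_for("weights-v2", "def infer; end", "Add a /health endpoint", 42)

puts a.fingerprint == b.fingerprint # prints true
puts a.fingerprint == c.fingerprint # prints false: the model snapshot changed
```

Matching fingerprints only show that the inputs were identical. The inference itself still has to be deterministic for the outputs to match.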

>A one word difference in a spec can and frequently does produce unrecognizably different output.

This is well out of scope for reproducibility and doesn't affect it in the slightest. And for practical software development this is also a red herring; the real issue is correctness and spec gaming. As long as the output is correct and doesn't circumvent the intention of the spec, prompt instability is unimportant. It's just the ambiguous nature of the domain LLMs and humans operate in.

sarchertech 12 hours ago

Well, if you want to use it as a high-level language where you check in the spec and regenerate the code, then prompt instability/chaotic output makes that infeasible.

You can’t just tell users “sorry there are a million tiny differences all over the app every time we change the slightest thing, that’s just the ambiguous nature of reality”.

orbital-decay 11 hours ago

>where you check in the spec and regenerate the code then prompt instability/chaotic output makes that infeasible

What, why would you want to write the code anew? Identify the changes in the spec and bring the existing code in line with them.
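"Identify the changes in the spec" can start with a line-level diff between two spec versions. Only the requirements that actually changed then get handed to the model. Here is a minimal sketch, assuming one requirement per line; the spec text is hypothetical.

```ruby
# Compare two spec versions requirement-by-requirement (one requirement per line).
def spec_changes(old_spec, new_spec)
  old_reqs = old_spec.lines.map(&:strip).reject(&:empty?)
  new_reqs = new_spec.lines.map(&:strip).reject(&:empty?)
  {
    added:   new_reqs - old_reqs, # requirements the code must now satisfy
    removed: old_reqs - new_reqs  # requirements whose code should change or go
  }
end

old_spec = <<~SPEC
  Users can log in with email.
  Sessions expire after 30 minutes.
SPEC

new_spec = <<~SPEC
  Users can log in with email.
  Sessions expire after 60 minutes.
  Users can reset their password.
SPEC

changes = spec_changes(old_spec, new_spec)
p changes[:added]   # prints ["Sessions expire after 60 minutes.", "Users can reset their password."]
p changes[:removed] # prints ["Sessions expire after 30 minutes."]
```

Requirements that appear in neither list are unchanged, so the code implementing them doesn't need to be touched.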

sarchertech 54 minutes ago

That’s the whole thesis of the article. Using an LLM as a high level language.

>The codebase should be reconstructable from the documentation

zkmon 16 hours ago

Why isn't there a downvote button for the thread?

gaigalas 9 hours ago

But what we really want is to cut down layers of abstraction, not increase them.

I mean, we only have them because they are strictly necessary. If we could make architectures friendly to programming directly, we would have.

In that sense, high-level languages are not a marvelous thing but a burden we have to carry because of the strict requirements of low-level ones. The fewer burdens like those we have, the better.

titaniumrain 15 hours ago

lol, this author doesn't even understand what an LLM is

fullstackchris 16 hours ago

> Using LLM agents is expensive: if they give you already 50% more productivity, and your salary is an average salary, they are not. And LLMs will only get cheaper. They are only expensive in absolute, not in relative terms.

critical distinction: unless you're getting paid in proportion to your output (literally 0 traditional 9-5 software jobs I know of, unfortunately), this is in fact the opposite: a subscription to any of these services reduces your overall salary, it doesn't make it higher...

then there's the case, which I know the dishonest are doing, of firing off Claude or whatever and going for a walk
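Here are the two framings side by side. The salary and subscription price are made-up illustrative numbers; only the 50% figure comes from the quoted article. The comment's point is that the productivity gain goes to whoever is paid for output, while a self-paid subscription comes straight out of a fixed salary.

```ruby
# Hypothetical annual figures, for illustration only.
salary            = 100_000.0
subscription      = 2_400.0 # assumed to be paid out of pocket by the employee
productivity_gain = 0.5     # the article's "50% more productivity"

# Article's framing: extra output, valued here (by assumption) at salary rate.
extra_output_value = salary * productivity_gain

# Comment's framing: pay is fixed, so the only change to take-home is the cost.
take_home_change = -subscription

puts extra_output_value # prints 50000.0
puts take_home_change   # prints -2400.0
```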

abcde666777 18 hours ago

Are these kinds of articles a new breed of rage bait? They keep ending up on the front page with thriving comment sections, but in terms of content they're pretty low in nutritional value.

So I'm guessing they just rise because they spark a debate?

sph 17 hours ago

It’s both rage and hype bait, farming karma.

The optimists upvote and praise this type of content, then the pessimists come to comment why this field is going to the dogs. Rinse and repeat.

elzbardico 15 hours ago

Vibe coders are the new Eternal September

herodoturtle 14 hours ago

I didn’t catch this reference, so adding the below for other folks in the same boat:

https://en.wikipedia.org/wiki/Eternal_September

“Eternal September or the September that never ended was a cultural phenomenon during a period beginning around late 1993 and early 1994, when Internet service providers began offering Usenet access to many new users. Before this, the only sudden changes in the volume of new users of Usenet occurred each September, when cohorts of university students would gain access to it for the first time, in sync with the academic calendar.”

direwolf20 13 hours ago

More generally, it's whenever the expert to newbie ratio in a community crosses a certain threshold and never returns.

discreteevent 13 hours ago

Eternal LLMber

feverzsj 15 hours ago

You can get to the front page easily with a dozen upvotes, like from your colleagues and friends. Sadly, that's possibly the only way to get your post some attention here now.

janandonly 13 hours ago

I would disagree. It’s true that scouring the New section reveals a lot of hidden gems, but I know from experience that one in every 20-30 of my submissions ends up on the front page.

etrvic 12 hours ago

I once read here on HN that a good metric for filtering controversial comment sections is the ratio of upvotes to comments. If it's below one, the thread is probably controversial.
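The rule of thumb fits in a one-line check. In this minimal sketch, `controversial?` is a hypothetical helper name, and the threshold of 1 is the one stated in the comment. This thread's own header (176 points, 362 comments) is used as sample input.

```ruby
# Flag a thread as likely controversial when upvotes per comment drop below 1,
# i.e. when the comment count exceeds the score.
def controversial?(points, comments)
  return false if comments.zero? # no discussion, nothing to flag

  points.fdiv(comments) < 1.0
end

puts controversial?(176, 362) # prints true  (ratio ~0.49)
puts controversial?(500, 120) # prints false (ratio ~4.2)
```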

layer8 12 hours ago

Are you saying controversial is good or bad?

etrvic 11 hours ago

Neither, that's up to your individual preference. I do think controversial threads have more noise, but they sometimes provide a more enjoyable read.

layer8 11 hours ago

So what use is it to filter them? It seems you still have to judge their worth based on their actual contents.

jermaustin1 11 hours ago

The front page has an algorithm that is "less noise, more news" but if you go to the /active page, you get more conversation-driven submissions. I tend to load both up and refresh every few hours.

etrvic 10 hours ago

> The FAQ notes that submission rank is impacted by "software which downweights overheated discussions." A good rule of thumb for this effect is when the number of comments on a submission exceeds its score. Moderators can overrule the downranking for appropriate, not-actually-a-flame-war discussions.

https://github.com/minimaxir/hacker-news-undocumented

amelius 11 hours ago

Consider there are 100 upvotes and 100 downvotes. Net votes: 0. The submission would end up with a lower ranking than you wanted it to have.

layer8 9 hours ago

Submissions don’t have downvotes, only flagging.

amelius 7 hours ago

Fair. But it still holds for the comments section.

cgfjtynzdrfht 12 hours ago

[dead]

ThrowawayTestr 11 hours ago

In common parlance, this would be called "ratioing".

eastbound 18 hours ago

There’s barely any debate; people don’t answer each other. It’s rather about invoking the wonder and imagination of everyone’s brain. Like space conquest or an economic crisis: it will change everything, but you can’t do anything immediately about it, and everyone tries to understand what it will change so they can adapt. It’s more akin to the 24-hour junk news cycle, where everything is presented as an alert but you’re tempted to listen because it might affect you.

anoplus 17 hours ago

I am interested in a post that will teach me how to consume only the news that is really relevant to me

sph 17 hours ago

Here’s one: just don’t read the news.

Google what you are interested in at the moment, and dive long into what matters to you, rather than being fed engagement bait.

Easier said than done, I understand that.

anoplus 16 hours ago

I guess I'll have an agent for that one day. News agent.

gilleain 14 hours ago

You could even have your 'news agent' print your news every day on some cheap paper.

Like a 'news periodical', shall we say.

direwolf20 13 hours ago

I heard The Onion does that!

avaer 14 hours ago

The hidden fallacy in your comment is that there is such a thing as "news that is really relevant for you".

This isn't all that different than saying that it would be nice if someone else did your thinking for you -- which is a totally fine thing to want, but let's not get confused.

"News that is relevant for you" is a concept made up by advertising companies to legitimize them in having power over what you see. Because if they presented it plainly, you would be rightly alarmed.

layer8 12 hours ago

It is the meta-level counterpart of the fact that LLMs are difficult to reason about.

bonoboTP 12 hours ago

This comment has even lower nutritional value. It's just a "dislike" with more words. You could have offered your counterarguments or if you're too tired of it but still feel you need to be heard, you could have linked to a previous comment or post of yours.

abcde666777 11 hours ago

Well, I didn't articulate it but behind my comment was a question - who's actually upvoting this stuff and why?

To me the claim of the article was silly on the surface of it, silly enough that I was surprised that folks consider it worthy of discussion.

Is there just a large number of upvoters here without even a basic understanding of the topic at hand? Or is there some other explanation beyond that?

bonoboTP 11 hours ago

Hacker news is a bit different from Reddit or other social media. This is a good summary, especially the section titled "Comments". https://news.ycombinator.com/newsguidelines.html

Edit: I notice I'm talking to a 2 week old account, I should've checked before engaging.

rTX5CMRXIfFG 12 hours ago

I mean, are you gonna die on a hill defending every piece of low-quality content on HN? Because I think it’s perfectly OK to call it out so that moderators can notice and improve. You seem to think that readers have an inherent responsibility to salvage someone else’s bad article.

bonoboTP 12 hours ago

It's the top comment. But I'll disengage as this gets too meta. I'd encourage talking about the substance.

g947o 12 hours ago

I complained about the same thing, but apparently people take the bait.

Which is why I only quickly scan through the comments to see if there are new insights I haven't seen in the past few months. Surprise, almost never.

xyzsparetimexyz 15 hours ago

This one didn't contain a sepia-tinted AI slop diagram, so it beats the average

j45 15 hours ago

Maybe it still could be new to some.

zkmon 16 hours ago

> So I'm guessing they just rise because they spark a debate?

Precisely. Attention economy. It rules.

simianwords 17 hours ago

What I find interesting is that similar propositions made 2 years ago were ragebait to the same people, but they ended up coming true.

wiseowise 15 hours ago

It is still rage bait.

ares623 15 hours ago

I'll believe it when companies and projects start committing prompts into GitHub and nothing else. Let CI/CD regenerate the entire thing on each build.

We don't commit compiled blobs in source control. Why can't the same be done for LLMs?

kittbuilds 7 hours ago

[dead]

MORPHOICES 14 hours ago

[dead]

niobe 15 hours ago

Utterly brainless article. Why am I even commenting.

dankobgd 12 hours ago

echelon 2 days ago

These models are nothing short of astounding.

I can write a spec for an entirely new endpoint, and Claude figures out all of the middleware plumbing and the database queries. (The catch: this is in Rust and the SQL is raw, without an ORM. It just gets it. I'm reviewing the code, too, and it's mostly excellent.)

I can ask Claude to add new data to the return payloads - it does it, and it can figure out the cache invalidation.
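Why would adding a field to a payload involve cache invalidation? Cached copies of the old payload shape would otherwise keep being served without the new field. One common approach, assumed here and not described in the comment, is to put a schema version in the cache key. This minimal sketch uses a hypothetical in-memory `PayloadCache`.

```ruby
# Hypothetical read-through cache for API payloads. The payload schema version
# is part of the key, so adding a field (and bumping the version) makes old
# entries unreachable instead of serving payloads that lack the new field.
class PayloadCache
  def initialize
    @store = {}
  end

  # Return the cached payload, or build it with the block and cache it.
  def fetch(id, version)
    @store[key(id, version)] ||= yield
  end

  # Explicit invalidation for when the underlying record itself changes.
  def invalidate(id, version)
    @store.delete(key(id, version))
  end

  private

  def key(id, version)
    "user:#{id}:v#{version}"
  end
end

cache = PayloadCache.new
cache.fetch(1, 1) { { name: "Ada" } }

# Cache hit: the block is not called, so this raise never fires.
hit = cache.fetch(1, 1) { raise "should have been cached" }
puts hit[:name] # prints Ada

# Payload gained an email field: new version, new key, fresh build.
v2 = cache.fetch(1, 2) { { name: "Ada", email: "ada@example.com" } }
puts v2.key?(:email) # prints true

# The record changed: drop the v2 entry so the next read rebuilds it.
cache.invalidate(1, 2)
rebuilt = cache.fetch(1, 2) { { name: "Ada L.", email: "ada@example.com" } }
puts rebuilt[:name] # prints Ada L.
```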

These models are blowing my mind. It's like I have an army of juniors I can actually trust.

Calavar 2 days ago

I'm not sure I'd call agents an army of juniors. More like a high school summer intern who has infinite time to do deep dives into StackOverflow but doesn't have nearly enough programming experience yet to have developed a "taste" for good code.

In my experience, agentic LLMs tend to write code that is very branchy, with high cyclomatic complexity. They don't follow DRY principles unless you push them very hard in that direction (and even then not always), and sometimes they do things that just fly in the face of common sense. Example of that last part: I was writing some Ruby tests with Opus 4.6 yesterday, and I got dozens of tests that amounted to this:

   x = X.new
   assert x.kind_of?(X)

This is of course an entirely meaningless check. But if you aren't reading the tests and you just run the test job and see hundreds of green check marks and dozens of classes covered, it could give you a false sense of security.
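To make that concrete: a type-check assertion passes even when the class is broken, while a test of actual behavior catches the bug. This is a minimal plain-Ruby sketch with no test framework. `Invoice` and its bug are hypothetical.

```ruby
# Hypothetical class with a bug: total ignores quantity.
class Invoice
  def initialize(unit_price, quantity)
    @unit_price = unit_price
    @quantity = quantity
  end

  def total
    @unit_price # BUG: should be @unit_price * @quantity
  end
end

inv = Invoice.new(10, 3)

# Tautological check in the style of the generated tests: always true after .new.
tautological_passes = inv.kind_of?(Invoice)

# Behavioral check: asserts what the class is supposed to compute.
behavioral_passes = inv.total == 30

puts tautological_passes # prints true  (the bug goes unnoticed)
puts behavioral_passes   # prints false (the bug is caught)
```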

hackyhacky 2 days ago

> In my experience, agentic LLMs tend to write code that is very branchy with cyclomatic complexity

You are missing the forest for the trees. Sure, we can find flaws in the current generation of LLMs. But they'll be fixed. We have a tool that can learn to do anything as well as a human, given sufficient input.

JackSlateur 9 hours ago

We have heard that for years

“trust us, it will work soon .. we just need a bit more time and a couple dozen more billion dollars .. just trust us, bro ..”

hackyhacky 7 hours ago

> We have heard that for years

LLMs have been a thing for about three years now, so you can't have been hearing this for very long. In those three years, the rate of progress has been astounding and there is no sign of slowing down.

JackSlateur 6 hours ago

And god knows that we have heard that a lot in just 3 years.

tryauuum 2 days ago

> this is in Rust and the SQL is raw, without an ORM.

where's the catch? SQL is an old technology, surely an LLM is good with it