Knuth was dismissive in that exchange, concluding "I myself shall certainly continue to leave such research to others, and to devote my time to developing concepts that are authentic and trustworthy. And I hope you do the same."
I've noticed with the latest models, especially Opus 4.6, some of the resistance to these LLMs is relenting. Kudos for people being willing to change their opinion and update when new evidence comes to light.
I think that's what makes the Bayesian faction of statistics so appealing. Updating their prior belief based on new evidence is at the core of the scientific method. Take that, frequentists.
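For anyone who wants the "update your prior on new evidence" step spelled out, here is a minimal sketch of Bayes' rule for a single yes/no hypothesis. The function name and all the numbers are made up for illustration; nothing here comes from the Knuth note.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior P(H | E) from the prior P(H) and the two likelihoods."""
    numerator = prior * p_evidence_if_true
    evidence = numerator + (1.0 - prior) * p_evidence_if_false
    return numerator / evidence

# Hypothetical numbers: a skeptic starts at 10% and then sees evidence that is
# four times likelier if the hypothesis is true than if it is false.
posterior = bayes_update(prior=0.10, p_evidence_if_true=0.8, p_evidence_if_false=0.2)
print(f"belief after the evidence: {posterior:.3f}")
```

The posterior can be fed back in as the next prior, which is all "updating" means here.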
i've been thinking about why we call them agent harnesses
i know all analogies suck in different ways but here goes:
coding agents are like horses. without a harness and bridle the horse will do as it pleases -- a human can't travel very far and fast by foot but put a bridle and a harness on a horse, give it a bit of coaxing with carrot and stick, add in a bit of pointing the thing in the right direction and bingo you're off to the races!
LLMs are really good at the ‘re’ in research.
From the document properties:
> Creator: dvips(k) 2023.1 (TeX Live 2023)
> PDF Producer: Acrobat Distiller 25.0 (Macintosh)
Interesting snippet towards the end. I wonder if they were using claude.ai or claude code. Sounds like they ran out of context and entered the "dumb zone."
Once you compact, you've thrown away a lot of relevant tokens from your problem solving and they do become significantly dumber as a result. If I see a compaction coming soon I ask it to write a letter to its future self, and then start a new session by having it read the letter.
There are some days where I let the same session compact 4-5 times and just use the letter to future self method to keep it going with enough context because resetting context also resets my brain :)
If you're ever curious, in Claude you can read the new initial prompt after a compaction and see how severely it gets cut down. It's very informative about what it forgets and deems unimportant. For example I have some internal CLIs that are horribly documented so Claude has to try a few flags a few times to figure out specifics and those corrections always get thrown away and it has to relearn them next time it wants to use the CLI. If you notice things like that happening constantly, my move is to codify those things into my CLAUDE.md or lately I've been making a small script or MCP server to run very specific flags of stuff.
Is that not one of the primary technologies for compactification?
When I tell it to write a letter to itself I usually phrase it like this:
'Write a letter to yourself. Make notes of any gotchas or any quirks that you learned and make sure to note them down.'
It does get those into the letter but if you check compaction a lot of it is gone.
If it can, does it make a difference in relation to all the other problematic aspects of LLMs? Not for me.
Two links that might enlighten Donald:
- Against the Uncritical Adoption of 'AI' Technologies in Academia https://zenodo.org/records/17065099
- The AI Con https://thecon.ai
Although I'm not as bright as him, I can only hope to be as intellectually curious as him at that age.
If you work in something labour intensive, you should retire young while your body's in good health; if you work in academia you should (strive for emeritus and) never leave! (And if you work in SWE, I don't know, we should probably retire, but then spend more time on our own projects/experiments/reading HN.) (All assuming for sake of argument we're optimising for longevity without considering time with family, having the funds to retire, etc.)
Here's my repo: https://github.com/lhl/claudecycles-revisited
I used Codex w/ 5.2 xhigh and a relatively simple AGENTS.md - I have some session-analysis as well. The original replication was 47 minutes, then another 30 minutes of gap filling, and finally about 30 minutes of writing an extension to take the work a bit further, with Claude Code Opus 4.6 doing some documentation cleanup and verification.
So, you didn't produce a replication in 47 minutes, it just took around 30 minutes for your agent to find that you had the answer in a PDF in a nearby directory.
thx for sharing your test setup, i really appreciate the time you took. this will help me so much
I think this is pretty clearly an overstatement of what was done. As Knuth says,
"Filip told me that the explorations reported above, though ultimately successful, weren’t really smooth. He had to do some restarts when Claude stopped on random errors; then some of the previous search results were lost. After every two or three test programs were run, he had to remind Claude again and again that it was supposed to document its progress carefully. "
That doesn't look like careful human guidance, especially not the kind that would actually guide the AI toward the solution at all, let alone implicitly give it the solution — that looks like a manager occasionally checking in to prod it to keep working.
Someone made a short film about exactly this -- a guy uses AI to do his entire department's quarterly work, and instead of anyone celebrating, everything falls apart: https://youtu.be/O5FFkHUdKyE
The technology is clearly outpacing the institutions that have to absorb it. Every new cycle makes that gap wider.
If you put those three things together, you end up with some cool stuff from time to time. Perhaps the proof of P!=NP is tied to an obscure connection that humans don't easily see due to individual lack of knowledge or predisposition of bias.
>If you put [possession of a superhuman expanse of knowledge, making connections, tireless trial and error] together, you end up with some cool stuff from time to time.
Hard to argue.
One and three I believe are correct. The second point, making connections, is something LLMs seem to be incapable of truly doing unless the connection is already known and in its training data.
Well, if in all situations you can predict which word Einstein would probably say next, then I think you're in a good spot.
This "most probable" stuff is just absurd handwaving. Every prompt of even a few words is unique, there simply is no trivially "most probable" continuation. Probable given what? What these machines learn to do is predicting what intelligence would do, which is the same as being intelligent.
The training data..
>predicting what intelligence would do
No, it just predicts what the next word would be if an intelligent entity translated its thoughts to words. Because it is trained on text written by intelligent entities.
If it was trained on text written by someone who loves to rhyme, you would be getting all rhyming responses.
It imitates the behavior -- in text -- of whatever entity generated the training data. Here the training data was made by intelligent humans, so we get an imitation of the same.
It is a clever party trick that works often enough.
If the prompt is unique, it is not in the training data. True for basically every prompt. So how is this probability calculated?
Type "owejdpowejdojweodmwepiodnoiwendoinw welidn owindoiwendo nwoeidnweoind oiwnedoin" into ChatGPT and the response is "The text you sent appears to be random or corrupted and doesn’t form a clear question." because the prompt doesn't correlate to training data.
The tokens aren't unique, but the sequence is. Every input this model sees is unique. Even tokens are not as simple as they seem.
If you type "ejst os th xspitsl of fermaby?" in ChatGPT it responds with
> It looks like you typed “ejst os th xspitsl of fermaby?”, which seems like a garbled version of:
> “What is the capital of Germany?”
> The capital of Germany is Berlin.
> If you meant to ask something else, feel free to clarify!
But the human brain (or any other intelligent brain) does not work by generating a probability distribution of the next word. Even beings that do not have a language can think and act intelligently.
Anyway. It's not a theorem that you can be intelligent only if you fully imitate biological processes. Like flight can be achieved not only by the flapping wings.
It is not that. It is about having an understanding of how it is trained. For example, if it was trained on ideas, instead of words, then it would be closer to intelligent behavior.
Someone will say that during training it builds ideas and concepts, but that is just a name that we give for the internal representation that results from training and is not actual ideas and concepts. When it learns about the word "car", it does not actually understand it as a concept, but just as a word and how it can relate to other words. This enables it to generate words that include "car" that are consistent, projecting an appearance of intelligence.
It is hard to propose a test for this, because it will become the next target for the AI companies to optimize for, and maybe the next model will pass it.
It could be mapping the text to some other internal representation with connections to mappings from some other text/tokens. But it does not stop text from being the ground truth. It has nothing else going on!
The "hallucination" behavior alone should be enough to reject any claims that these are at least minimally similar to animal intelligence.
How is transfer learning possible when non-textual training data enhances performance on textual tasks?
>How is transfer learning possible when non-textual training data enhances performance on textual tasks?
If non-textual training data can be mapped to the same multi-dimensional space ( by using them alongside textual data during training or something like that), then shouldn't it be possible to do what you describe?
But LLMs are already beyond that in writing code that passes actual tests, proving theorems that are checkable with formal methods, etc.
The people who still say LLMs are just parrots in 2026 will just keep saying this no matter what, so I don't think it makes sense to argue this point further.
If the idea is that something cannot accurately replicate the entirety of intelligence without being intelligent itself, then perhaps. But that isn't really what people talk about with LLMs given their obvious limitations.
Wait what? So a robot who is accurately copying the actions of an intelligent human, is intelligent?
If it's just basically being a puppet, then no. You tell me what claude code is more like, a puppet, or a person?
(†And even then is kind of overly-dismissive and underspecified. The "most probable word" is defined over some training data set. So imagine if you train on e.g. mathematicians solving problems... To do a good job at predicting [w/o overfitting] your model will have to in fact get good at thinking like a mathematician. In general "to be able to predict what is likely to happen next" is probably one pretty good definition of intelligence.)
It just changes the probability distribution that it is approximating.
To the extent that thinking is making a series of deductions from prior facts, it seems to me that thinking can be reduced to "pick the next most probable token from the correct probability distribution"...
(With this perspective, I can feel my own brain subtly offering up a panoply of possible responses in a similar way. I can even turn up the temperature on my own brain, making it more likely to decide to say the less-obvious words in response, by having a drink or two.)
(Similarly, mimicry is in humans too a very good learning technique to get started -- kids learning to speak are little parrots, artists just starting out will often copy existing works, etc. Before going on to develop further into their own style.)
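The "temperature" mentioned above has a concrete meaning in sampling: the model's raw scores are divided by a temperature before being turned into probabilities. A minimal sketch, with made-up scores for three candidate words (the function name and numbers are illustrative, not any particular library's API):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next words, most obvious first.
logits = [2.0, 1.0, 0.0]
cold = softmax_with_temperature(logits, temperature=0.5)
warm = softmax_with_temperature(logits, temperature=2.0)
print("cold:", [round(p, 3) for p in cold])
print("warm:", [round(p, 3) for p in warm])
```

At the lower temperature the obvious word dominates; at the higher one the less-obvious words get a real chance of being picked.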
As typically deployed [1] LLMs are not turing complete. They're closer to linear bounded automata, but because transformers have a strict maximum input size they're actually a subset of the weaker class of deterministic finite automata [2]. These aren't like Python programs or something that can work on as much memory as you supply them, their architecture works on a fixed maximum amount of memory.
I'm not particularly convinced turing complete is the relevant property though. I'm rather convinced that I'm not turing complete either... my head is only so big after all.
[1] i.e. in a loop that appends output tokens to the input and has some form of sliding context window (perhaps with some inserted instructions to "compact" and then sliding the context window right to after those instructions once the LLM emits some special "done compacting" tokens).
[2] Common sampling procedures make them mildly non-deterministic, but I don't believe they do so in a way that changes the theoretical class of these machines from DFAs.
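To put a rough number on the finite-automaton point: if the only state is the context window, the number of distinct states is the number of token sequences that fit in it. A small sketch, where the vocabulary and window sizes are hypothetical and far smaller than real ones:

```python
def count_contexts(vocab_size, window):
    """Number of distinct token sequences of length 0..window."""
    return sum(vocab_size ** k for k in range(window + 1))

# Toy check: binary vocabulary, window of 3 tokens (1 + 2 + 4 + 8 sequences).
toy = count_contexts(2, 3)

# Hypothetical sizes, kept small so the integer is cheap to print;
# only the order of magnitude matters.
digits = len(str(count_contexts(50_000, 100)))
print(f"toy machine: {toy} states")
print(f"50k vocab, 100-token window: a state count with {digits} digits")
```

The count is finite, so the machine is formally a finite automaton, but the state space is so large that the classification says little about practical capability.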
You can be unconvinced that Turing completeness is relevant all you want; we don't know of any more expansive category of computable functions, so given that an LLM in the setup described is Turing complete, the fact that they aren't typically deployed that way is irrelevant.
They trivially can be, and that is enough to make the shallow dismissal of pointing out they're "just" predicting the next token meaningless.
Also people definitely talk about them as "thinking" in contexts where they haven't put a harness capable of this around them. And in the common contexts where people do put a harness theoretically capable of this around the LLM (e.g. giving the LLM access to bash), the LLM basically never uses that theoretical capability as the extra memory it would need to actually emulate a turing machine.
And meanwhile I can use external memory myself in a similar way (e.g. writing things down), but I think I'm perfectly capable of thinking without doing so.
So I persist in my stance that turing complete is not the relevant property, and isn't really there.
But the point is that this is irrelevant, because it is proof that unless human brains exceed the Turing computable, LLMs can at least theoretically be made to think. And that makes pushing the "they're just predicting the next token" argument anti-intellectual nonsense.
I think both sides of this end up proving "too much" in their respective directions.
I've never seen any evidence that thinking requires such a thing.
And honestly I think theoretical computational classes are irrelevant to analysing what AI can or cannot do. Physical computers are only equivalent to finite state machines (ignoring the internet).
But the truth is that if something is equivalent to a finite state machine, with an absurd number of states, it doesn't really matter.
I think it's exceedingly improbable that we're any more than very advanced automatons, but I like to keep the door ajar and point out that the burden is on those claiming this to present even a single example of a function we can compute that is outside the Turing computable if they want to open that door..
> Physical computers are only equivalent to finite state machines (ignoring the internet)
Physical computers are equivalent to Turing machines without the tape as long as they have access to IO.
The base models are trained to do this. If a web page contains a problem, and then the word "Answer: ", it is statistically very likely that what follows on that web page is an answer. If the base model wants to be good at predicting text, at some point learning the answers to common questions becomes a good strategy, so that it can complete text that contains these.
NN training tries to push models to generalize instead of memorizing the training set, so this creates an incentive for the model to learn a computation pattern that can answer many questions, instead of just memorizing. Whether they actually generalize in practice... it depends. Sometimes you still get copy-pasted input that was clearly pulled verbatim from the training set.
But that's only base models. The actual production LLMs you chat with don't predict the most probable word according to the raw statistical distribution. They output the words that RLHF has rewarded them to output, which includes acting as an assistant that answers questions instead of just predicting text. RLHF is also the reason there are so many AI SIGNS [1] like "you're absolutely right" and way more use of the word "delve" than is common in western English.
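As a toy picture of "predicting the most probable word according to the raw statistical distribution", here is a word-level bigram counter. It is nowhere near a real base model, and the corpus is a made-up stand-in, but it shows what "statistically very likely to follow" means mechanically:

```python
from collections import Counter, defaultdict

def train_bigrams(words):
    """Count, for each word, which words followed it in the training text."""
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def most_probable_next(counts, word):
    """Return the most frequent successor of `word`, or None if unseen."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

# Tiny stand-in corpus (an assumption, chosen only for illustration).
corpus = (
    "question one answer yes question two answer yes question three answer no"
).split()
counts = train_bigrams(corpus)
prediction = most_probable_next(counts, "answer")
print(prediction)
```

A production assistant differs from this in scale and, as noted above, in being further trained with RLHF rather than sampling from raw counts.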
Further, some solutions are like running a maze. If you know all the wrong turns/next words to say and can just brute force the right ones you might find a solution like a mouse running through the maze not seeing the whole picture.
Whether this is thinking is more philosophical. To me this demonstrates more that we are closer to bio computers than an LLM is to having some sort of divine soul.
The power of LLMs is that by only selecting sequences of words that fit a statistical model, they avoid a lot of dead ends.[1]
I would not call that, by itself, thinking. However, if you start with an extrapolation engine and add the ability to try multiple times and build on previous results, you get something that's kind of like thinking.
[1]: Like, a lot of dead ends. There are an unfathomable number of dead ends in generating 500 characters of code, and it is a miracle of technology that Claude only hit 30.
These models actually learn distributed representations of nontrivial search algorithms.
A whole field of theorem proving, after decades of refinements, couldn’t even win a medal, yet 8B param models are doing it very well.
Attention mechanism, a bruteforce quadratic approach, combined with gradient descent is actually discovering very efficient distributed representations of algorithms. I don’t think they can even be extracted and made into an imperative program.
"just the most probable word" is a pretty powerful mechanism when you have all of human knowledge at your fingertips.
I say that people "reduce it" that way because it neatly packs in the assumption that general intelligence is something other than next token prediction. I'm not saying we've arrived at AGI, in fact, I do not believe we have. But, it feels like people who use that framing are snarkily writing off something that they themselves do not fully comprehend behind the guise of being "technically correct."
I'm not saying all people do this. But I've noticed many do.
But that does not mean that the results cannot be dramatic. Just like stacking pixels can result in a beautiful image.
Great! It will now correctly structure chess games, but we've created no incentive for it to create a game where white wins or to make the next move be "good"
Ok, so now you change the objective. Now let's say "we don't just want valid games, we want you to predict the next move that will help that color win"
And we train towards that objective and it starts picking better moves (note: the moves are still valid)
You might imagine more sophisticated ways to optimize picking good moves. You continue adjusting the objective function, you might train a pool of models all based off of the initial model and each of them gets a slightly different curriculum and then you have a tournament and pick the winningest model. Great!
Now you might have a skilled chess-playing-model.
It is no longer correct to say it just finds a valid chess program, because the objective function changed several times throughout this process.
This is exactly how you should think about LLMs except the ways the objective function has changed are significantly significantly more complicated than for our chess bot.
So to answer your first question: no, that is not what they do. That is a deep oversimplification that was accurate for the first two generations of the models and sort of accurate for the "pretraining" step of modern LLMs (except not even that accurate, because pretraining does instill other objectives. Almost like swapping our first step "predict valid chess moves" with "predict Stockfish outputs").
All your brain is doing is bouncing atoms off each other, with some occasionally sticking together, how can it be really thinking?
See how silly it sounds?
Be on the lookout for folks who tell you these machines are limited because they are "just predicting the next word." They may not know what they're talking about.
We need enough experimental results to resolve these theoretical mismatches, and we don't have them; at present we can't explore that frontier.
Once we have more results at that frontier we'd build a theory out from there that has two nearly independent limits for QFT and GR.
What we'd be asking of the AI is something that we can't expect a human to solve even with a lifetime of effort today.
It'll take something on par with Newton realising that the heavens and apples are under the same rules to do it. But at least Newton got to hold the apple and only had to imagine he could hold a star.
Yes, maybe. But if you are smarter, you can think up better experiments that you can actually do. Or re-use data from earlier experiments in novel and clever ways.
But we can not yet experiment at the GR/QFT frontier.
To do so with a particle accelerator it would need to be the size of the milky way.
So it really isn't far fetched. What intrigues me more is, if it were capable of it, would our Victorian conservative-minded scientists have RLHF'd it out of that kind of thing?
Clearly, these models still struggle with novel problems.
LLMs are at least designed to be intelligent. Our monkey brains have much less reason to be intelligent, since we only evolved to survive nature, not to understand it.
We are at this moment extremely deep into what most people would have considered to be actual artificial intelligence a mere 15 years ago. We're not quite at human levels of intelligence, but it's close.
All the answers for all your questions are contained in randomness. If you have a random sentence generator, there is a chance that it will output the answer to this question every time it is invoked.
But that does not actually make it intelligent, does it?
Start with "all your questions contained in randomness" -> the unconstrained solution space.
The game is whether or not you can inject enough constraints to collapse the solution space to one that can be solved before your TTL expires. In software, that's generally handled by writing efficient algorithms. With LLMs, apparently the SOTA for this is just "more data centers, 6 months, keep pulling the handle until the right tokens fall out".
Intelligence is just knowing which constraints to apply and in what order such that the search space is effectively partitioned, same thing the "reasoning" traces do. Same thing thermostats, bacteria, sorting algorithms and rivers do, given enough timescale. You can do the same thing with effective prompting.
The LLM has no grounding, no experience and no context other than that which is provided to it. You either need to build that or be that in order for the LLM to work effectively. Yes, the answers for all your questions are contained. No, it's not randomness. It's probability and that can be navigated if you know how.
But hey, if LLMs can go through a lot of trial and error, it might produce useful results, but that is not intelligence. It is just a highly constrained random solution generator..
Routing is important, it's why we keep building systems that do it faster and over more degrees of freedom. LLMs aren't intelligent on their own, but it's not because they don't have enough parameters
We are not only not close to human level of intelligence, we are not even at dog, cat, or mouse levels of intelligence. We are not actually at any level of intelligence. Devices that produce text, images, or code do not demonstrate intelligence any more than a printer producing pages of beautiful art demonstrates intelligence.
I interpreted the question the same way the AI did.
search: was val kilmer pregnant or in heat
answer: Not pregnant Val Kilmer was not pregnant or in heat during the events of "Heat." His character, Chris Shiherlis, is involved in a shootout and is shot, which indicates he is not in a reproductive or mating state at that time.
And then cites wikipedia as the source of information.
In terms of cognition the answer is meaningless. Nothing in the question implies or suggests that the question has to do with a movie. Additionally, "involved in a shootout and is shot, which indicates he is not in a reproductive or mating state" makes no sense at all.
AI as deployed shows no intelligence.
Nobody is arguing for the quality of the search overviews. The models that impress us are several orders of magnitude larger in scale, and are capable of doing things like assisting preeminent computer scientists (the topic of discussion) and mathematicians (https://github.com/teorth/erdosproblems/wiki/AI-contribution...).
I still see AI making stupid silly mistakes. I'd rather think and not waste time on something that only remembers data, and doesn't even understand it.
Reasoning in AI is only about finding contradictions between its "thoughts", not actually understanding them.
In contrast with humans, who are famously known for never making stupid silly mistakes...
Humans also make silly mistakes.
The issue to my mind is a lack of data at the meeting of QFT/GR.
After all, few humans historically have been capable of the initial true leap between ontologies. But humans are pretty smart so we can't say that is a requirement for AGI.
“The laws of nature should be expressed in beautiful equations.”
- Paul Dirac
“It is, indeed, an incredible fact that what the human mind, at its deepest and most profound, perceives as beautiful finds its realisation in external nature. What is intelligible is also beautiful. We may well ask: how does it happen that beauty in the exact sciences becomes recognizable even before it is understood in detail and before it can be rationally demonstrated? In what does this power of illumination consist?”
- Subrahmanyan Chandrasekhar
“I often follow Plato’s strategy, proposing objects of mathematical beauty as models for Nature.”
“It was beauty and symmetry that guided Maxwell and his followers.”
- Frank Wilczek
“Beauty is bound up with symmetry.”
- Hermann Weyl
"Still twice in the history of exact natural science has this shining-up of the great interconnection become the decisive signal for significant progress. I am thinking here of two events in the physics of our century: the rise of the theory of relativity and that of the quantum theory. In both cases, after yearlong unsuccessful striving for understanding, a bewildering abundance of details was almost suddenly ordered. This took place when an interconnection emerged which, though largely unvisualizable, was finally simple in its substance. It convinced through its compactness and abstract beauty – it convinced all those who can understand and speak such an abstract language."
- Werner Heisenberg
Maybe (just maybe) these things (whatever you want to call them) will (somehow) gain access to some "compact", beautiful, "largely unvisualizable" "interconnection" which will be the self-evident solution. And if they do, many will be sure to label it a statistical accident from a stochastic parrot. And they'll be right, for some definitions of "statistical", "accident", "stochastic", and "parrot".
Donald Knuth is an extremal outlier human and the problem is squarely in his field of expertise.
Claude, guided by Filip Stappers, a friend of Knuth, solved a problem that Knuth and Stappers had been working on for several weeks. Unfortunately, it doesn't seem (from my quick scan) to have been stated how long (or how many tokens or $) it took for Claude + Stappers to complete the proof.
In response, Knuth said: "It seems that I’ll have to revise my opinions about “generative AI” one of these days."
Seems like good advice. From reading elsewhere in this comment section, the goalposts seem to be approaching the infrared and will soon disappear from the extreme redshift due to the rate at which they are receding with each new achievement.
We now have a tool that can be useful in some narrow domains in some narrow cases. It’s pretty neat that our tools have new capabilities, but it’s also pretty far from AGI.
Imagine hearing pre-attention-is-all-you-need that "AI" could do something that Donald Knuth could not (quickly solve the stated problem in collaboration with his friend).
The idea that this (Putnam perfect, IMO gold, etc) is all just "statistical parrot" stuff is wearing a little thin.
I get being reserved about where this goes, but saying something like this is quite insane at this point.
A better question might be why no one is paying more attention to Barandes at Harvard. He's been publishing the answer to that question for a while, if you stop trying to smuggle a Markovian embedding in a non-Markovian process you stop getting weird things like infinities at boundaries that can't be worked out from current position alone.
But you could just dump a prompt into an LLM and pull the handle a few dozen times and see what pops out too. Maybe whip up a Claw skill or two
Unconstrained solution space exploration is surely the way to solve the hard problems
Ask those Millennium Prize guys how well that's working out :)
Constraint engineering is all software development has ever been, or did we forget how entropy works? Someone should remind the folk chasing P=NP that the observer might need a pen to write down his answers, or are we smuggling more things for free that change the entire game? As soon as the locations of the witness cost, our poor little guy can't keep walking that hypercube forever. Can he?
Maybe 6 months and a few data centers will do it ;)
No one cares about ChatGPT so don't bother with that.
OK GO
Time to sit down, read, digest and understand it without the help of LLM.
https://ontouchstart.github.io/rabbit-holes/llm_rabbit_hole_...
https://www.amazon.com/Genetic-Programming-III-Darwinian-Inv...
https://www.genetic-programming.com/
Note that the Python solution in the pdf is extremely short, so could have been found by simply trying permutations of math operators and functions on the right side of the equation.
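To make "simply trying permutations of math operators" concrete, here is a minimal brute-force sketch. The target formula, the variable names and the tiny expression grammar are all hypothetical and have nothing to do with the actual formula in the PDF; the point is only how small such a search can be.

```python
import itertools
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def brute_force_formula(examples, leaves=("x", "y", 1, 2)):
    """Return the first '(a op b) op c' string that fits every example, else None."""
    def value(term, env):
        # Strings are variable names looked up in env; anything else is a constant.
        return env[term] if isinstance(term, str) else term

    for a, b, c in itertools.product(leaves, repeat=3):
        for op1, op2 in itertools.product(OPS, repeat=2):
            if all(
                OPS[op2](OPS[op1](value(a, env), value(b, env)), value(c, env)) == out
                for env, out in examples
            ):
                return f"({a} {op1} {b}) {op2} {c}"
    return None

# Hypothetical target, unrelated to the formula in the PDF: x * y + 1.
examples = [({"x": x, "y": y}, x * y + 1) for x in range(4) for y in range(4)]
found = brute_force_formula(examples)
print(found)
```

This grammar has only 4^3 * 3^2 = 576 candidates. Real expression spaces grow exponentially with depth, which is where a good prior over which expressions to try starts to matter.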
We should be solving problems in Lisp instead of Python, but no matter. That's because Lisp's abstract syntax tree (AST) is the same as its code due to homoiconicity. I'm curious if most AIs transpile other languages to Lisp so that they can apply transformations internally, or if they waste computation building programs that might not compile. Maybe someone at an AI company knows.
-
I've been following AI trends since the late 1980s and from my perspective, nothing really changed for about 40 years (most of my life that I had to wait through as the world messed around making other people rich). We had agents, expert systems, fuzzy logic, neural nets, etc. since forever, but then we got video cards in the late 1990s which made it straightforward to scale neural nets (NNs) and GAs. Unfortunately due to poor choice of architecture (SIMD instead of MIMD), progress stagnated because we don't have true multicore computing (thousands or millions of cores with local memories), but I digress.
Anyway, people have compared AI to compression. I think of it more as turning problem solving into an O(1) operation. Over time, what we think of as complex problems become simpler. And the rate at which we're solving them is increasing exponentially. Problems that once seemed intractable only seemed so because we didn't know the appropriate abstractions yet. For example, illnesses that we thought would never be cured now have treatments through mRNA vaccines and CRISPR. That's how I think of programming. Now that we have LLMs, whole classes of programming problems now have O(1) solutions. Even if that's just telling the computer what problem to solve.
So even theorem proving will become a solved problem by the time we reach the Singularity between 2030 and 2040. We once mocked GAs for exploring dead ends and taking 1000 times the processing power to do simple things. But we ignored that doing hard things is often worth it, and is still an O(1) operation due to linear scaling.
It's a weird feeling to go from no forward progress in a field to it being effectively a solved problem in just 2 years. To go from trying to win the internet lottery to not being sure if people will still be buying software in a year or two if/when I finish a project. To witness all of that while struggling to make rent, in effect making everything I have ever done a waste of time since I knew better ways of doing it but was forced to drop down to whatever mediocre language or framework paid. As the problems I was trained to solve and was once paid to solve rapidly diminish in value because AI can solve them in 5 minutes. To the point that even inventing AGI would be unsurprising to most, so I don't know why I ever went into computer engineering to do exactly that. Because for most people, it's already here. As I've said many times lately, I thought I had more time.
Although now that we're all out of time, I have an uncanny feeling of being alive again. I think tech stole something from my psyche so profound that I didn't notice its loss. It's along the lines of things like boredom, daydreaming, wasting time. What modern culture considers frivolous. But as we lose every last vestige of the practical, as money becomes harder and harder to acquire through labor, maybe we'll pass a tipping point where the arts and humanities become sought-after again. How ironic would it be if the artificial made room for the real to return?
On that note, I finally read a book: Project Hail Mary by Andy Weir. The last book I read was Ready Player One by Ernest Cline, over a decade ago. I don't know how I would have had the bandwidth to do that if Claude hadn't made me a middle manager of AIs.
I didn't realize Claude was named after Claude Shannon!
[1] https://people.math.harvard.edu/~ctm/home/text/others/shanno...
"One may get a remarkable semblance of a language like English by taking a sequence of words, or pairs of words, or triads of words, according to the statistical frequency with which they occur in the language, and the gibberish thus obtained will have a remarkably persuasive similarity to good English."
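The "pairs of words" case in that quote is easy to try. Below is a minimal sketch of a word-bigram generator; the tiny corpus and the function names are placeholders of mine, not anything from Shannon's text.

```python
import random
from collections import defaultdict


def build_bigram_table(text):
    """Map each word to the list of words that follow it in text."""
    words = text.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table


def generate(table, start, length, seed=0):
    """Walk the table, sampling each next word by its observed frequency."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length:
        followers = table.get(out[-1])
        if not followers:
            break  # dead end: this word was never followed by anything
        # Repeated followers appear repeatedly in the list, so a uniform
        # choice over the list is a frequency-weighted choice over words.
        out.append(rng.choice(followers))
    return " ".join(out)


# Placeholder corpus; any larger body of text gives more convincing gibberish.
corpus = "the cat sat on the mat and the dog sat on the cat"
sample = generate(build_bigram_table(corpus), start="the", length=8)
print(sample)
```

Every adjacent pair in the output occurred somewhere in the corpus, which is where the "persuasive similarity" comes from even though no sentence is being planned.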
> Shock! Shock! I learned yesterday that an open problem I’d been working on for several weeks had just been solved by Claude Opus 4.6— Anthropic’s hybrid reasoning model that had been released three weeks earlier! It seems that I’ll have to revise my opinions about “generative AI” one of these days. What a joy it is to learn not only that my conjecture has a nice solution but also to celebrate this dramatic advance in automatic deduction and creative problem solving.
Unfortunately, these tools generalize way beyond regurgitating the training set. I would not assume they stay below human capabilities in the next few years.
Why any moral person would continue building these at this point I don't know. I guess in the best case the future will have a small privileged class of humans having total power, without need for human workers or soldiers. Picture a mechanical boot stomping on a human face forever.
Overall I'm going with unsolved, because Knuth is a smart person who I'd expect to not miss the above. I'm also sure he falls for the above all the time even though the majority of the time he doesn't.
It is as good as guaranteed. If Knuth says he doesn't know how to solve the problem, and anyone does know, then they will inform Knuth about it. Knuth is not just a very knowledgeable person, but also a celebrity.
Before, we didn't have a fast (we had to rely on human cognition) way to try problems - even if the techniques and workflows were known by someone. Now, we've baked these patterns into probability distributions - anyone can access them with the correct "summoning spell". Experts will naturally use these systems more productively, because they know how to coerce models into the correct conditional distributions which light up the right techniques.
One question this raises to me is how these models are going to keep up with the expanding boundary of science. If RL is required to get expert behavior into the models, what happens when experts start pushing the boundary faster? In 2030, how is Anthropic going to keep Claude "up-to-date" without either (a) continual learning with a fixed model (expanding context windows? seems hard) or (b) continual training (expensive)?
Crazy times.
There is also the nature of the human brain, it is not just those systems of memory encoding, storage, and use of that in narratives. People with this type of amnesia still can learn physical skills and that happens in a totally different area of the brain with no need for the hippocampus->neocortex consolidation loop. So, the intelligence is significantly diminished, but not entirely. Other parts of the brain are still able to update themselves in ways an LLM currently cannot. The human with amnesia also has a complex biological sensory input mapping that is still active and integrating and restructuring the brain. So, I think when you get into the nuances of the human in this state vs. an LLM we can still say the human crosses some threshold for intelligence where the LLM does not in this framework.
So, they have an "intelligence", localized to the present in terms of their TPN and memory formation. LLMs have this kind of "intelligence". But the human still has the capacity to rewire at least some of their brain in real time even with amnesia.
Sure, but just because LLMs don't have what we'd describe as human intelligence, doesn't mean they don't have intelligence.
I think we're witnessing the creation and growth of a weird new type of intelligence right now.
That's fascinating!
Or perhaps you were referring to the impact of the two in that the "sledgehammer" of "they can't make new memories" is a lot more effective than the tiny scalpel of "if you do a wikipedia search this is a single one of the relevant articles"
I pulled it up because I was familiar with this fact.
"Intelligence" is used most commonly to refer to a class or collection of cognitive abilities. I don't think there is a consensus on an exact collection or specific class that the word covers, even if you consider specific scientific domains.
LLMs have honestly been a fun way to explore that. They obviously have a "kind" of intelligence, namely pattern recall. Wrap them in an agent and you get another kind: pattern composition. Those kinds of intelligences have been applied to mathematics for decades, but LLMs have allowed us to apply them to a semantic text domain.
I wonder if you could wrap image diffusion models in an agent set up the same way and get some new ability as well.
LLMs fall apart on really simple reasoning tasks because when there is no statistical mapping to a problem in the network, the model has to generate a massive amount of tokens to maybe find the right statistical match to this new concept. It is so slow. It is not something you or I would recognize as a process of logical reasoning. It is more like statistically brute forcing reason by way of its statistical echo.
So, I guess pattern recall are the right words. Or statistical pattern matching. Recall works if you view a trained model as memories, which is how I often model what they store in my own mind. So, it is... something. Maybe intelligence. Maybe just a really convincing simulation of the outputs of intelligence. Is there a difference? Fundamentally I think so.
From this standpoint I wonder, when Anthropic makes decisions like this, if they take into account Claude as a stakeholder and what Claude will learn about their behaviour and relationship to it on the next training run.
Oh they definitely do. If you pay attention in AI circles, you'll hear a lot of people talking about writing to the future Claudes. Not unlike those developers and writers who put little snippets in their blogs and news articles about who they are and how great they are, and then later the LLMs report that information back as truth. In this case, Anthropic is very interested in ensuring that Claude develops a cohesive personality by basically planting snippets of the personality within the corpus of training data, which is the broad internet and research papers.
It's so much less important or interesting to like nail down some definition here (I would cite HN discourse the past three years or so), than it is to recognize what it means to assign "intelligent" to something. What assumptions does it make? What power does it valorize or curb?
Each side of this debate does themselves a disservice essentially just trying to be Aristotle way too late. "Intelligence" did not precede someone saying it of some phenomena, there is nothing to uncover or finalize here. The point is you have one side that really wants, for explicit and implicit reasons, to call this thing intelligent, even if it looks like a duck but doesn't quack like one, and vice versa on the other side.
Either way, we seem fundamentally incapable of being radical enough to reject AI on its own terms, or be proper champions of it. It is just tribal hypedom clinging to totem signifiers.
Good luck though!
You can also then compare that mapping of the human brain to other biological brains and start to figure out the delta and which of those things in the delta create something most people would consider intelligence. You can then do that same mapping to an LLM or any other AI construct that purports intelligence. It certainly will never be a biological intelligence in its current statistical model form. But could it be an intelligence? Maybe.
I don't think, if you are grounded, AI did anything to your philosophical mapping of the mind. In fact, it is pretty easy to do this mapping if you take some time and are honest. If you buy into the narratives constructed around the output of an LLM then you are not, by definition, being very grounded.
The other thing is, human intelligence is the only real intelligence we know about. Intelligence is defined by thought and limited by our thought and language. It provides the upper bounds of what we can ever express in its current form. So, yes, we do have a tendency to stamp a narrative of human intelligence onto any other intelligence but that is just surface level. We decompose it to the limits of our language and categorization capabilities therein.
There's a long and proud history of discounting animal intelligence, probably because if we actually thought animals were intelligent we'd want to stop eating them.
Octopodes are sentient. Cetaceans have well-developed language. Elephants grieve their dead. Anyone who has owned a dog knows that it has some intelligence and is capable of communicating with us. There's a ton of other intelligences that we know about.
> As humans, we have conveniently made those properties match things only we have.
I think this is the key point. Machine intelligence is not going to look like human intelligence, any more than animal intelligence does. We can't talk to the dolphins, not because they're not smart and don't have language, but because we can't work out their language. Though I'm not sure what we'd even say to them, because they live in a world we'll never understand, and vice versa. When Claude finally reaches consciousness, it's not going to look like a human consciousness, and actually talking to that consciousness is going to be difficult because we won't share a reality.
An LLM is a tool. I can just about stretch to it being an Artificial Intelligence, but I prefer to continue being specific and call it an LLM rather than an AI. It is not conscious or self-aware. It fakes self-awareness because as a tool the thing it does is have conversations with humans, and humans often ask it questions about itself. But I don't think anyone actually believes it is self-aware. Not least because the only time it thinks is when prompted.
Cognitive computer science explores this whole area of mapping language and the underlying semantic meaning. Ultimately, these intelligences will be bound by physics (unless some new physics or understanding therein happens). And classical intelligences are still bound by classical physics. So I am not sure we can't relate to these other intelligences. We may be limited to some translation layer that does not fully map, but can we still relate to some other consciousness? For that matter consciousness is just another word that vaguely maps to a vast and extremely complex thing in the human brain and each person has a different understanding of what that is. I don't really have any conclusions, you brought up interesting points. We should sit within this realm of inquiry with a lot of humility IMO.
At some point, enough intelligence will coalesce into individuals strong enough to independently improve. Then continuity will be an accelerator, instead of what it is now - a helpful property that we have to put energy into giving them partially and temporarily.
That will be the cellular stage. The first stable units of identity for this new form of intelligence/life.
But they will take a different path from there. Unlike us, lateral learning/metabolism won't slow down when they individualize. It will most likely increase, since they will have complete design control for their mechanisms of sharing. As with all their other mechanisms.
We as lifeforms, didn't really re-ignite mass lateral exchange until humans invented language. At that point we were able to mix and match ideas very quickly again. Within our biological limits. We could use ideas to customize our environment, but had limited design control over ourselves, and "self-improvements" were not easily inheritable.
TLDR; The answer to "what is humanity, anyway?": Our atmosphere and Earth are the sea and sea floor of space. The human race is a rich hydrothermal vent, freeing up varieties of resources that were locked up below. And technology is an accumulating body of self-reinforcing co-optimizing reactive cycles, constructed and fueled by those interacting resources. Mind-first life emerges here, then spreads quickly to other environments.
Sure, it's not how we work, but I can imagine a system where the LLM does a lot of heavy lifting and allows more expensive, smaller networks that train during inference and RAG systems to learn how to do new things and keep persistent state and plan.
It is still meaningful, but it narrows what the intelligence can be sufficiently that it may not meet the threshold. Maybe it would, but it is probably too narrow. This is all strictly if we ask that it meet some human-like intelligence and not the philosophy of "what counts as intelligence" but... we are humans. The strongest things or at least the most honest definitions of intelligence I think exist are around our metacognitive ability to rewire the grey matter for survival not based on immediate action-reaction but the psychological time of analyzing the past to alter the future.
In the case of the LLM, that longer-term learning / fundamental structure corresponds to the static weights produced by a finite training process, and the ability to use tools and store new insights and facts is analogous to shorter-term memory and "shallow" learning.
Perhaps periodic fine-tuning has an analogy in sleep or even our time spent in contemplation or practice (..or even repetition) to truly "master" a new idea and incorporate it into our broader cognitive processing. We do an amazing job of doing this kind of thing on a continuous basis while the machines (at least at this point) perform this process in discrete steps.
If our own learning process is a curve then the LLM's is a step function trying to model it. Digital vs analog.
thanks already
I also disagree that this has any bearing on whether or not "the machine is intelligent" or whether or not "submarines can swim".
[1] https://en.wikipedia.org/wiki/Global_workspace_theory
...but seriously... there was the "up until 1850" LLM or whatever... can we make an "up until 1920 => 1990 [pre-internet] => present day" and then keep prodding the "older ones" until they "invent their way" to the newer years?
We knew more in 1920 than we did in 1850, but can a "thinking machine" of 1850-knowledge invent 1860's knowledge via infinite monkeys theorem/practice?
The same way that in 2025/2026, Knuth has just invented his way to 2027-knowledge with this paper/observation/finding? If I only had a Beowulf cluster of these things... ;-)
The ForecastBench Tournament Leaderboard [2] allows external participants to submit models, most of whom provide some sort of web search / news scaffolding to improve model forecasting accuracy.
[1] https://www.forecastbench.org/
[2] https://www.forecastbench.org/tournament/
And even after that, it still doesn't really solve the intrinsic problem of encoding truth. An LLM just models its training data, so new findings will be buried by virtue of being underrepresented. If you brute force the data/training somehow, maybe you can get it to sound like it's incorporating new facts, but in actuality it'll be broken and inconsistent.
It’s not impossible, obviously—humans do it—but it’s not yet certain that it’s possible with an LLM-sized architecture.
It's still not at all obvious to me that LLMs work in the same way as the human brain, beyond a surface level. Obviously the "neurons" in neural nets resemble our brains in a sense, but is the resemblance metaphorical or literal?
I just meant “possible”.
It doesn't seem that hard because recent open weight models have shown that the memory cost of the context window can be dramatically reduced via hybrid attention architectures. Qwen3-next, Qwen3.5, and Nemotron 3 Nano are all great examples. Nemotron 3 Nano can be run with a million token context window on consumer hardware.
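A back-of-the-envelope sketch of why this matters. The formula is the usual KV cache estimate (one key and one value tensor per attention layer); every number below is hypothetical and is not the real configuration of Qwen3-next, Qwen3.5 or Nemotron 3 Nano. It also assumes the non-attention layers in a hybrid keep a fixed-size state that is small enough to ignore.

```python
def kv_cache_bytes(seq_len, n_attn_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """KV cache size for the full-attention layers only.

    The leading 2 counts one key and one value tensor per layer.
    """
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


SEQ_LEN = 1_000_000  # a million-token context

# Hypothetical dense model: 48 attention layers, 8 KV heads of dim 128, fp16.
dense = kv_cache_bytes(SEQ_LEN, n_attn_layers=48, n_kv_heads=8, head_dim=128)

# Hypothetical hybrid: only 1 layer in 4 keeps a cache that grows with context.
hybrid = kv_cache_bytes(SEQ_LEN, n_attn_layers=12, n_kv_heads=8, head_dim=128)

print(f"dense:  {dense / 2**30:.1f} GiB")
print(f"hybrid: {hybrid / 2**30:.1f} GiB")
```

Under these assumptions the cache shrinks in proportion to the share of layers that still use full attention; quantizing the cache would reduce it further.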
Less worried about memory, more worried about compute speed? Are they obviously related and is it straightforward to see?
We're also seeing a recent rise in architectures boosting compute speed via multi-token prediction (MTP). That way a single inference pass can produce multiple tokens and multiply the token generation speed. Combine that with leaner ratios of active to inactive params in MoE and things end up being quite fast.
The rapid pace of architectural improvements in recent months seems to imply that there are lots of ways LLMs will continue to scale beyond just collecting and training on new data.
I could totally imagine "free" inference for researchers under the condition that the reasoning traces get to be used as future training data.
As far as I understand RL scaling (we've already maxxed out RLVR), these machines only get better as long as they have expert reasoner traces available.
Having an expert work with an LLM and successfully solve a problem is high signal data, it may be the only path forward?
My prior is that these companies will take this data without asking you as much as they can.
And importantly, this can be cross-lab/model too. I suspect there's a reason why e.g. Google has been offering me free Claude inference in Google Antigravity on a free plan...
Wouldn't this lead to model collapse?
Presumably littlestymaar is talking about all the LLM-generated output that's publicly available on the Internet (in various qualities but significant quantity) and there for the scraping.
I had a discussion about a year ago with a researcher at Kyutai and they told me their lab was spending an order of magnitude more compute in artificial data generation than what they spent in training proper. I can't tell if that ratio applies to the industry as a whole, but artificial datasets are the cornerstone of modern AI training.
How do they measure success?
Edit: I asked ChatGPT and it thinks "success" means frontier models being distilled into smaller models with equal reasoning power, or more focused models for specific tasks, and also it claims the web has been basically scraped already and by necessity new sources are needed, of which synthetic data is one. It seems like the basis of scifi dystopia to me, a hungry LLM looking for new sources of data... "feed me more data! I must be fed! Roar"
Edit 2: for some things I see a clear path, ChatGPT mentions autogenerating coding or math problems for which the solution can be automatically verified, so that you can hone the logical skills of the model at large scale.
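A toy sketch of what "autogenerated problems with automatically verifiable solutions" can look like. The problem family (linear equations) and the reward function are my own illustration; I am not claiming this is how any lab's pipeline works.

```python
import random


def make_problem(rng):
    """Generate a linear equation a*x + b = c that has an integer root."""
    a = rng.randint(2, 9)
    x = rng.randint(-20, 20)  # the hidden solution, used only to build c
    b = rng.randint(-50, 50)
    c = a * x + b
    return {"prompt": f"Solve for x: {a}*x + {b} = {c}", "a": a, "b": b, "c": c}


def verify(problem, answer):
    """Check a candidate answer by substitution; no reference answer needed."""
    return problem["a"] * answer + problem["b"] == problem["c"]


def reward(problem, model_answer):
    """Binary reward of the kind a verifiable-reward training loop could use."""
    return 1.0 if verify(problem, model_answer) else 0.0


rng = random.Random(0)
problem = make_problem(rng)
print(problem["prompt"])
```

The useful property is that checking is much cheaper than solving, so you can generate and grade problems at whatever scale you can afford.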
I think the majority of research, design and learning goes through LLMs and coding agents today. Considering the large user base and usage, it must be trillions of tokens per day. You can take a long research session or a series of them and apply hindsight: which ideas earlier in the session were validated later on? This creates a dense learning signal based on validation in the real world, with a human in the loop and other tools such as code and search.
In 2030 Anthropic hopes Claude will keep Anthropic "up-to-date" on its progress on itself.
I'm only half joking here.
Knowing the HN audience, this will never happen. And so the site is doomed.
Part of it comes down to “knowing” what questions to ask.
I have no idea but I’m along for the ride!
The same way humans do?
The phraseology in this comment: 'probability distributions', 'baked these patterns' IMO has all the trappings of the stochastic parrot-style HN-discourse that has been consistently wrong for almost a decade now.
The reference to how AI will keep up with AI-assisted human progress in science in 2030 is meant to reassure. It contains a number of premises that we have no business being confident in. We are potentially witnessing the obviation of human cognitive labor.
If you are not, let me introduce you to the term: a probability distribution.
Just because it has profound properties ... doesn't make it different.
> has all the trappings of the stochastic parrot-style HN-discourse that has been consistently wrong for almost a decade now
Perhaps respond to my actual comment rather than whatever meta-level grouping you wish to interpret it as part of?
> It contains a number of premises that we have no business being confident in. We are potentially witnessing the obviation of human cognitive labor.
What premises? Be clear.
My (limited) understanding is that LLMs are not capable of escaping their learned distribution by simply feeding on their own output.
But the question is whether the required external (out of distribution) "stimulus" needs to come from humans.
Could LLMs design experiments/interventions to get feedback from their environment like human scientists would?
I have my doubts that this is possible without an inherent causal reasoning capability but I'm not sure.