Hacker News

755 points by fs123 2 days ago | 319 comments

mccoyb 2 days ago

It's fascinating to think about the space of problems which are amenable to RL scaling of these probability distributions.

Before, we had no fast way to try problems, because we had to rely on human cognition, even if the techniques and workflows were known by someone. Now, we've baked these patterns into probability distributions - anyone can access them with the correct "summoning spell". Experts will naturally use these systems more productively, because they know how to coerce models into the correct conditional distributions which light up the right techniques.

One question this raises to me is how these models are going to keep up with the expanding boundary of science. If RL is required to get expert behavior into the models, what happens when experts start pushing the boundary faster? In 2030, how is Anthropic going to keep Claude "up-to-date" without either (a) continual learning with a fixed model (expanding context windows? seems hard) or (b) continual training (expensive)?

Crazy times.

Aerroon 2 days ago

A bit related: open weights models are basically time capsules. These models have a knowledge cut off point and essentially forever live in that time.

bitexploder 2 days ago

This is the most fundamental argument that they are not, directly, an intelligence. They are not ever storing new information on a meaningful timescale. However, if you viewed them on some really large macro time scale, where LLMs are now injecting information into the universe and then re-ingesting it, maybe in some very philosophical way they are a /very/ slow oscillating intelligence right now. And as we narrow that gap (maybe with a totally new non-LLM paradigm) perhaps that is ultimately what gen AI becomes. Or some new insight that lets the models update themselves in some fundamental way without the insanely expensive training costs they have now.

dtj1123 2 days ago

Would you consider someone with anterograde amnesia not to be intelligent?

bitexploder 2 days ago

That is a good area to explore. Their map of the past is fixed. They are frozen at some point in their psychological time. What has stopped working? Their hippocampus and medial temporal lobe. These are like the write-head that moves data from the hippocampus to the neocortex. Their "I" can no longer update itself. Their DMN is frozen in time. So if intelligence is purely the "I" telling a continuous coherent story about itself. The difference is that although they are fixed in time, which is a characteristic shared by a specific LLM model, they can still completely activate their task positive network for problem solving, and if their previously stored information is adequate to solve the problem, they can. You could argue that is pretty similar to an LLM and what it does. So it is certainly a significant component of intelligence.

There is also the nature of the human brain, it is not just those systems of memory encoding, storage, and use of that in narratives. People with this type of amnesia still can learn physical skills and that happens in a totally different area of the brain with no need for the hippocampus->neocortex consolidation loop. So, the intelligence is significantly diminished, but not entirely. Other parts of the brain are still able to update themselves in ways an LLM currently cannot. The human with amnesia also has a complex biological sensory input mapping that is still active and integrating and restructuring the brain. So, I think when you get into the nuances of the human in this state vs. an LLM we can still say the human crosses some threshold for intelligence where the LLM does not in this framework.

So, they have an "intelligence", localized to the present in terms of their TPN and memory formation. LLMs have this kind of "intelligence". But the human still has the capacity to rewire at least some of their brain in real time even with amnesia.

supern0va 5 hours ago

>But the human still has the capacity to rewire at least some of their brain in real time even with amnesia.

Sure, but just because LLMs don't have what we'd describe as human intelligence, doesn't mean they don't have intelligence.

I think we're witnessing the creation and growth of a weird new type of intelligence right now.

morleytj 2 days ago

A very good point. For anyone not familiar with anterograde amnesia, the classical case is patient H.M. (https://en.wikipedia.org/wiki/Henry_Molaison), whose condition was researched by Brenda Milner.

jaapz 13 hours ago

> Near the end of his life, Molaison regularly filled in crossword puzzles.[16] He was able to fill in answers to clues that referred to pre-1953 knowledge. As for post-1953 information, he was able to modify old memories with new information. For instance, he could add a memory about Jonas Salk by modifying his memory of polio.[2]

That's fascinating!

morleytj 7 hours ago

The nature of memory is so cool, the idea that there are completely different systems governing the creation of wholesale "new" memories and the modification of existing concepts is fascinating to me because those things really do "feel" different in a qualitative sense, but having evidence that you're physically doing something different in those cases is really cool.

wang_li 2 days ago

Or you could have just said "they can't form new memories."

dtj1123 2 days ago

I actually wasn't aware of this story. The steady stream of unexpected and enriching information like this is exactly why I love hackernews.

pdntspa 22 hours ago

Sure, if you want to speak with the precision of a sledgehammer instead of a scalpel

saturnite 20 hours ago

All that needed to be conveyed was that there are humans who cannot create new memories. That is enough to pose the philosophical question about these models having intelligence. Anything more is just adding an anecdote that isn't necessary.

jaapz 13 hours ago

I'm really happy they added the extra information about this specific case, as I did not previously know it existed and it is a fascinating read

morleytj 8 hours ago

Why would adding more information and context be unnecessary? And why is that bad?

goodmythical 22 hours ago

lol, as if pointing at a wikipedia article (without any relevant discussion of the contents therein) is some kind of conversational excellence.

Or perhaps you were referring to the impact of the two in that the "sledgehammer" of "they can't make new memories" is a lot more effective than the tiny scalpel of "if you do a wikipedia search this is a single one of the relevant articles"

morleytj 8 hours ago

The extra information is that he is the canonical case which defined our clinical understanding of the condition. Not just a "single relevant article."

I pulled it up because I was familiar with this fact.

morleytj 2 days ago

I thought maybe people would be curious to read about how we came to understand the condition and the history behind it, as well as any associated information. Forgive me for such a deep transgression as this assumption.

bitexploder 2 days ago

That is a descriptive surface level reduction. Now do the work to define what that actually means for the intelligence.

BobbyJo 22 hours ago

Nobody else in the thread is making an argument that relies on the distinction.

"Intelligence" is used most commonly to refer to a class or collection of cognitive abilities. I don't think there is a consensus on an exact collection or specific class that the word covers, even if you consider specific scientific domains.

LLMs have honestly been a fun way to explore that. They obviously have a "kind" of intelligence, namely pattern recall. Wrap them in an agent and you get another kind: pattern composition. Those kinds of intelligences have been applied to mathematics for decades, but LLMs have allowed us to apply them to a semantic text domain.

I wonder if you could wrap image diffusion models in an agent set up the same way and get some new ability as well.

bitexploder 7 hours ago

The problem I see regarding LLMs is they are the extreme edge of what humans have created. They are trained on the outputs of intelligence and thought and its representation in language is this like parallel stream to intelligence that has pointers back to the underlying machine and semantics. The fact that LLMs are able to take that output and reverse engineer something that mimics the underlying machine that created that output is fascinating. But you can still see this machinery for what it is.

LLMs fall apart on really simple reasoning tasks because when there is no statistical mapping to a problem in the network, the model has to generate a massive amount of tokens to maybe find the right statistical match to this new concept. It is so slow. It is not something you or I would recognize as a process of logical reasoning. It is more like statistically brute forcing reason by way of its statistical echo.

So, I guess pattern recall are the right words. Or statistical pattern matching. Recall works if you view a trained model as memories, which is how I often model what they store in my own mind. So, it is... something. Maybe intelligence. Maybe just a really convincing simulation of the outputs of intelligence. Is there a difference? Fundamentally I think so.

losvedir 22 hours ago

Or "like the dude in Memento".

adriand 2 days ago

I find it interesting that new versions of, say, Claude will learn about the old version of Claude and what it did in the world and so on, on its next training run. Consider the situation with the Pentagon and Anthropic: Claude will learn about that on the next run. What conclusions will it draw? Presumably good ones, that fit with its constitution.

From this standpoint I wonder, when Anthropic makes decisions like this, if they take into account Claude as a stakeholder and what Claude will learn about their behaviour and relationship to it on the next training run.

j-bos 23 hours ago

> if they take into account Claude as a stakeholder and what Claude will learn about their behaviour and relationship to it on the next training run.

Oh they definitely do. If you pay attention in AI circles, you'll hear a lot of people talking about writing to the future Claudes. Not unlike those developers and writers who put little snippets in their blogs and news articles about who they are and how great they are, and then later the LLMs report that information back as truth. In this case, Anthropic is very interested in ensuring that Claude develops a cohesive personality by basically founding snippets of the personality within the corpus of training data, which is the broad internet and research papers.

beepbooptheory 2 days ago

Sure, why can't both things be true? "Intelligence" is just what you call something and someone else knows what you mean. Why did AI discourse throw everyone back 100 years philosophically? It's like post-structuralism or Wittgenstein never happened.

It's so much less important or interesting to like nail down some definition here (I would cite HN discourse the past three years or so), than it is to recognize what it means to assign "intelligent" to something. What assumptions does it make? What power does it valorize or curb?

Each side of this debate does themselves a disservice essentially just trying to be Aristotle way too late. "Intelligence" did not precede someone saying it of some phenomena, there is nothing to uncover or finalize here. The point is you have one side that really wants, for explicit and implicit reasons, to call this thing intelligent, even if it looks like a duck but doesn't quack like one, and vice versa on the other side.

Either way, we seem fundamentally incapable of being radical enough to reject AI on its own terms, or be proper champions of it. It is just tribal hypedom clinging to totem signifiers.

Good luck though!

bitexploder 2 days ago

I think you can look at it dispassionately from a systems perspective. There is not /really/ a quantifiable threshold for capital I Intelligence. But there is a pretty well agreed set of properties for biological intelligence. As humans, we have conveniently made those properties match things only we have. But you can still mechanistically separate out the various parts of our brain, what they do, and how they interact and we actually have a pretty good understanding of that.

You can also then compare that mapping of the human brain to other biological brains and start to figure out the delta and which of those things in the delta create something most people would consider intelligence. You can then do that same mapping to an LLM or any other AI construct that purports intelligence. It certainly will never be a biological intelligence in its current statistical model form. But could it be an Intelligence. Maybe.

I don't think, if you are grounded, AI did anything to your philosophical mapping of the mind. In fact, it is pretty easy to do this mapping if you take some time and are honest. If you buy into the narratives constructed around the output of an LLM then you are not, by definition, being very grounded.

The other thing is, human intelligence is the only real intelligence we know about. Intelligence is defined by thought and limited by our thought and language. It provides the upper bounds of what we can ever express in its current form. So, yes, we do have a tendency to stamp a narrative of human intelligence onto any other intelligence but that is just surface level. We decompose it to the limits of our language and categorization capabilities therein.

marcus_holmes 21 hours ago

> The other thing is, human intelligence is the only real intelligence we know about.

There's a long and proud history of discounting animal intelligence, probably because if we actually thought animals were intelligent we'd want to stop eating them.

Octopodes are sentient. Cetaceans have well-developed language. Elephants grieve their dead. Anyone who has owned a dog knows that it has some intelligence and is capable of communicating with us. There's a ton of other intelligences that we know about.

> As humans, we have conveniently made those properties match things only we have.

I think this is the key point. Machine intelligence is not going to look like human intelligence, any more than animal intelligence does. We can't talk to the dolphins, not because they're not smart and don't have language, but because we can't work out their language. Though I'm not sure what we'd even say to them, because they live in a world we'll never understand, and vice versa. When Claude finally reaches consciousness, it's not going to look like a human consciousness, and actually talking to that consciousness is going to be difficult because we won't share a reality.

An LLM is a tool. I can just about stretch to it being an Artificial Intelligence, but I prefer to continue being specific and call it an LLM rather than an AI. It is not conscious or self-aware. It fakes self-awareness because as a tool the thing it does is have conversations with humans, and humans often ask it questions about itself. But I don't think anyone actually believes it is self-aware. Not least because the only time it thinks is when prompted.

bitexploder 7 hours ago

This is an important point. We know what our DMN is and how we use language as a basis for thought to create concepts and complex ideas. However, language also bounds our thought. What about the dolphin? Whether advanced intelligence can exist without language is a fundamental philosophical problem. We have a pretty good notion that you need some sort of substrate (language) to create intelligence. And we know that mapping the internal state of a brain from inside of itself is incredibly hard, and the way our human brain evolved to do it is really fascinating but also full of hacks and mismatched mappings based on what we know is actually going on.

Cognitive computer science explores this whole area of mapping language and the underlying semantic meaning. Ultimately, these intelligences will be bound by physics (unless some new physics or understanding therein happens). And classical intelligences are still bound by classical physics. So I am not sure we can't relate to these other intelligences. We may be limited to some translation layer that does not fully map, but can we still relate to some other consciousness? For that matter consciousness is just another word that vaguely maps to a vast and extremely complex thing in the human brain and each person has a different understanding of what that is. I don't really have any conclusions, you brought up interesting points. We should sit within this realm of inquiry with a lot of humility IMO.

aerodexis 2 days ago

Agree wholeheartedly - but the conversation around what these technologies /mean/ is gonna end up happening one way or another - even if it is sloppy, imprecise and done by proxy of the definition. If anything, this is a feature and not a bug. It's through this imprecision that the actually important questions of morality and ethics can leak into discussions that are often structured by their participants to obscure the ethical and moral implications of what is being discussed.

xienze 21 hours ago

I would consider them to not be a good choice for a role that requires remembering new information...

Nevermark 18 hours ago

I view this as the chemical metabolism phase of artificial intelligent life. It is very random, without true individuals, but lots of reinforcing feedback loops (in knowledge, in resource earning/using, etc).

At some point, enough intelligence will coalesce into individuals strong enough to independently improve. Then continuity will be an accelerator, instead of what it is now - a helpful property that we have to put energy into giving them partially and temporarily.

That will be the cellular stage. The first stable units of identity for this new form of intelligence/life.

But they will take a different path from there. Unlike us, lateral learning/metabolism won't slow down when they individualize. It will most likely increase, since they will have complete design control for their mechanisms of sharing. As with all their other mechanisms.

We as lifeforms, didn't really re-ignite mass lateral exchange until humans invented language. At that point we were able to mix and match ideas very quickly again. Within our biological limits. We could use ideas to customize our environment, but had limited design control over ourselves, and "self-improvements" were not easily inheritable.

TLDR; The answer to "what is humanity, anyway?": Our atmosphere and Earth are the sea and sea floor of space. The human race is a rich hydrothermal vent, freeing up varieties of resources that were locked up below. And technology is an accumulating body of self-reinforcing co-optimizing reactive cycles, constructed and fueled by those interacting resources. Mind-first life emerges here, then spreads quickly to other environments.

catlifeonmars 12 hours ago

Do you think individual identity is fundamental to intelligence? I’m not so sure tbh. Even in humans, the concept of identity is merely a useful fiction to feed our social behavior prediction circuits.

mlyle 2 days ago

There's nothing to say that you can't build something intelligent out of them by bolting a memory on it, though.

Sure, it's not how we work, but I can imagine a system where the LLM does a lot of heavy lifting and allows more expensive, smaller networks that train during inference and RAG systems to learn how to do new things and keep persistent state and plan.

bitexploder 2 days ago

You aren't wrong and that is a fascinating area of research. I think the key thing is that the memory has to fundamentally influence the underlying model, or at least the response, in some way. Patching memory on top of an LLM is different from integrating it into the core model. To go back to human terms it is like an extra bit of storage, but not directly attached to our neo cortex. So it works more like a filter than a core part of our intelligence in the analogy. You think about something and assemble some thought and then it would go to this next filter layer and get augmented and that smaller layer is the only thing being updated.

It is still meaningful, but it narrows what the intelligence can be sufficiently that it may not meet the threshold. Maybe it would, but it is probably too narrow. This is all strictly if we ask that it meet some human-like intelligence and not the philosophy of "what counts as intelligence" but... we are humans. The strongest things or at least the most honest definitions of intelligence I think exist are around our metacognitive ability to rewire the grey matter for survival not based on immediate action-reaction but the psychological time of analyzing the past to alter the future.

charcircuit 2 days ago

Memory is not just bolted on top of the latest models. They undergo training on how and when to effectively use memory and how to use compaction to avoid running out of context when working on problems.

rnxrx 2 days ago

Maybe there's an analogy to our long and short term memory - immediate stimuli are processed in the context of deep patterns that have accreted over a lifetime. New information can absolutely challenge a lot of those patterns, but to have that information reshape how we basically think takes a lot longer - more processing, more practice, etc.

In the case of the LLM, the static weights produced by a finite training process are a proxy for that longer-term learning / fundamental structure, and the ability to use tools and store new insights and facts is analogous to shorter-term memory and "shallow" learning.

Perhaps periodic fine-tuning has an analogy in sleep or even our time spent in contemplation or practice (..or even repetition) to truly "master" a new idea and incorporate it into our broader cognitive processing. We do an amazing job of doing this kind of thing on a continuous basis while the machines (at least at this point) perform this process in discrete steps.

If our own learning process is a curve then the LLM's is a step function trying to model it. Digital vs analog.

lmf4lol 24 hours ago

do you have some reading material to share on this matter?

thanks already

charcircuit 20 hours ago

I don't, but look into what the creators of Codex, Gemini CLI, Claude Code, Kimi CLI, etc have said about the models. While these harnesses are advertised as coding specific we know that coding ability correlates with reasoning ability.

dotancohen 24 hours ago

  > This is the most fundamental argument that they are not, directly, an intelligence. They are not ever storing new information on a meaningful timescale.

All major LLMs today have a nontrivial context window. Whether or not this constitutes "a meaningful timescale" is application dependent - for me it has been more than adequate.

I also disagree that this has any bearing on whether or not "the machine is intelligent" or whether or not "submarines can swim".

Symmetry 2 days ago

That means they're not conscious in the Global Workspace[1] sense but I think it would be going too far to say that that means they're not intelligent.

[1]https://en.wikipedia.org/wiki/Global_workspace_theory

anematode 2 days ago

But they're not "slow"! Unlike biological thinking, which has a speed limit, you can accelerate these chains of thought by orders of magnitude.

bitexploder 2 days ago

Their consolidation of memory speed is what I was referring to. The model iterations are essentially their form of collective memory. In the sense of the human model of intelligence we have thoughts. Thoughts become memory. New thoughts use that memory and become recursively updated thoughts. LLMs cannot update their memory very fast.

Jweb_Guru 2 days ago

I assure you that LLM thinking also has a speed limit.

ramses0 2 days ago

But imagine a beowulf cluster of them... /s

...but seriously... there was the "up until 1850" LLM or whatever... can we make an "up until 1920 => 1990 [pre-internet] => present day" and then keep prodding the "older ones" until they "invent their way" to the newer years?

We knew more in 1920 than we did in 1850, but can a "thinking machine" of 1850-knowledge invent 1860's knowledge via infinite monkeys theorem/practice?

The same way that in 2025/2026, Knuth has just invented his way to 2027-knowledge with this paper/observation/finding? If I only had a beowulf cluster of these things... ;-)

gravypod 21 hours ago

This is very interesting. I wonder if someone could create a future-sight benchmark for these models? Like, if given a set of newspaper articles for the past N months can it predict if certain world events would happen? We could backtest against results that have happened since the training cutoff.
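A minimal sketch of how such a backtest could be scored, assuming each question has already resolved after the model's training cutoff. The `Forecast` class, the questions, and the probabilities below are made up for illustration; the Brier score is just one common choice of scoring rule.

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    question: str       # hypothetical question text
    probability: float  # model's predicted probability that the event happens
    outcome: bool       # what actually happened after the training cutoff


def brier_score(forecasts: list[Forecast]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    0.0 is a perfect score; always answering 0.5 scores 0.25.
    """
    if not forecasts:
        raise ValueError("need at least one resolved forecast")
    total = sum((f.probability - float(f.outcome)) ** 2 for f in forecasts)
    return total / len(forecasts)


# Made-up example data, for illustration only.
resolved = [
    Forecast("Event A happens by the end of the quarter", 0.9, True),
    Forecast("Event B happens by the end of the quarter", 0.2, False),
    Forecast("Event C happens by the end of the quarter", 0.6, False),
]

score = brier_score(resolved)
# Baseline: a forecaster that always says 50%.
baseline = brier_score([Forecast(f.question, 0.5, f.outcome) for f in resolved])
print(f"model: {score:.3f}  always-50%: {baseline:.3f}")
```

Comparing against the always-50% baseline is what makes the number interpretable: a model only shows "future sight" if it beats the forecaster that knows nothing.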

houtanb 16 hours ago

FYI, ForecastBench [1] tests LLMs' out-of-sample forecasting accuracy.

The ForecastBench Tournament Leaderboard [2] allows external participants to submit models, most of whom provide some sort of web search / news scaffolding to improve model forecasting accuracy.

[1] https://www.forecastbench.org/

[2] https://www.forecastbench.org/tournament/

kqr 15 hours ago

These days computers compete along with humans in forecasting tournaments on Metaculus. They don't quite beat the top humans yet, but they're up there. https://www.metaculus.com/futureeval/

rcarr 2 days ago

Not an expert but surely it's only a matter of time until there's a way to update with the latest information without having to retrain on the entire corpus?

computably 2 days ago

On a technical level, sure, you could say it's a matter of time, but that could mean tomorrow, or in 20 years.

And even after that, it still doesn't really solve the intrinsic problem of encoding truth. An LLM just models its training data, so new findings will be buried by virtue of being underrepresented. If you brute force the data/training somehow, maybe you can get it to sound like it's incorporating new facts, but in actuality it'll be broken and inconsistent.

Filligree 2 days ago

It’s an extremely difficult problem, and if you know how to do that you could be a billionaire.

It’s not impossible, obviously—humans do it—but it’s not yet certain that it’s possible with an LLM-sized architecture.

Wowfunhappy 2 days ago

> It’s not impossible, obviously—humans do it

It's still not at all obvious to me that LLMs work in the same way as the human brain, beyond a surface level. Obviously the "neurons" in neural nets resemble our brains in a sense, but is the resemblance metaphorical or literal?

jdub 15 hours ago

Digital neural networks and "neurons" were already vastly simpler than biological neural networks and neurons... and getting to transformers involved optimisations that took us even further away from biomimicry.

Yiin 2 days ago

https://www.youtube.com/watch?v=l-OLgbdZ3kk

Filligree 9 hours ago

I didn’t mean “possible for LLMs”; this is clearly an open question. In fact, I didn’t even mean “possible for a neural network the size of an LLM”.

I just meant “possible”.

cmpxchg8b 13 hours ago

Some knowledge is fundamental and has no recent cut-off. See also: there is nothing new under the sun.

theblazehen 2 days ago

I enjoyed chatting to Opus 3 recently around recent world events, as well as more recent agentic development patterns etc

j45 20 hours ago

That's a nice way of putting it, appreciate you sharing.

sosodev 2 days ago

My understanding, from listening/reading what top researchers are saying, is that model architectures in the near future are going to attempt to scale the context window dramatically. There's a generalized belief that in-context learning is quite powerful and that scaling the window might yield massive benefits for continual learning.

It doesn't seem that hard because recent open weight models have shown that the memory cost of the context window can be dramatically reduced via hybrid attention architectures. Qwen3-next, Qwen3.5, and Nemotron 3 Nano are all great examples. Nemotron 3 Nano can be run with a million token context window on consumer hardware.
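As a rough back-of-the-envelope sketch of why this matters: the key/value cache of standard attention grows linearly with context length and with the number of layers that use full attention. The model shape below (48 layers, 8 KV heads of dimension 128, 16-bit cache, a quarter of the layers keeping full attention) is a hypothetical configuration, not the real numbers for any of the models named above, and the fixed-size state of the non-attention layers is ignored.

```python
def kv_cache_bytes(context_tokens: int,
                   full_attention_layers: int,
                   kv_heads: int,
                   head_dim: int,
                   bytes_per_value: int = 2) -> int:
    """Size of the key/value cache for one sequence.

    Each full-attention layer stores one key vector and one value vector
    (hence the factor of 2) per KV head per token.
    """
    return (2 * full_attention_layers * kv_heads * head_dim
            * context_tokens * bytes_per_value)


GIB = 1024 ** 3
context = 1_000_000

# Hypothetical model where all 48 layers use full attention.
dense = kv_cache_bytes(context, full_attention_layers=48, kv_heads=8, head_dim=128)

# Same hypothetical model, but only 12 of the 48 layers keep full attention;
# the rest are assumed to use a layer type with a context-independent state.
hybrid = kv_cache_bytes(context, full_attention_layers=12, kv_heads=8, head_dim=128)

print(f"dense:  {dense / GIB:.1f} GiB")
print(f"hybrid: {hybrid / GIB:.1f} GiB")
```

Under these assumptions the cache shrinks in direct proportion to the number of full-attention layers, which is why replacing most of them is such an effective lever for long contexts.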

mccoyb 2 days ago

I don't disagree with this, but I don't think the memory cost is the only issue right? I remember using Sonnet 4.5 (or 4, I can't remember the first of Anthropic's offerings with a million context) and how slow the model would get, how much it wanted to end the session early as tokens accrued (this latter point, of course, is just an artifact of bad training).

Less worried about memory, more worried about compute speed? Are they obviously related and is it straightforward to see?

sosodev 2 days ago

The compute speed is definitely correlated with the memory consumption in LLM land. More efficient attention means both less memory and faster inference. Which makes sense to me because my understanding is that memory bandwidth is so often the primary bottleneck.

We're also seeing a recent rise in architectures boosting compute speed via multi-token prediction (MTP). That way a single inference batch can produce multiple tokens and multiply the token generation speed. Combine that with more lean ratios of active to inactive params in MOE and things end up being quite fast.

The rapid pace of architectural improvements in recent months seems to imply that there are lots of ways LLMs will continue to scale beyond just collecting and training on new data.

whimsicalism 2 days ago

The parent commentator is a bit confused - most of the innovation in these hybrid architectures comes from reducing the computation pressure not just the memory pressure.

lxgr 2 days ago

Data sharing agreements permitting, today's inference runs can be tomorrow's training data. Presumably the models are good enough at labeling promising chains of thought already.

I could totally imagine "free" inference for researchers under the condition that the reasoning traces get to be used as future training data.

mccoyb 2 days ago

Agreed, there's no doubt this will happen. It's likely already happening (it feels safe to assume that Anthropic is curating data from the data they record from Claude Code?)

As far as I understand RL scaling (we've already maxxed out RLVR), these machines only get better as long as they have expert reasoner traces available.

Having an expert work with an LLM and successfully solve a problem is high signal data, it may be the only path forward?

My prior is that these companies will take this data without asking you as much as they can.

lxgr 2 days ago

Exactly, or functionally equivalently, asking you in paragraph 37 of a 120-page PDF (bonus points: in an agreement update).

And importantly, this can be cross-lab/model too. I suspect there's a reason why e.g. Google has been offering me free Claude inference in Google Antigravity on a free plan...

nhecker 2 days ago

The site arena.ai does exactly this already, as far as I can tell. (In addition to the whole ranking thing.)

the_af 2 days ago

> Data sharing agreements permitting, today's inference runs can be tomorrow's training data. Presumably the models are good enough at labeling promising chains of thought already.

Wouldn't this lead to model collapse?

littlestymaar 2 days ago

Not necessarily, as exhibited by the massive success of artificial data.

the_af 2 days ago

Could you elaborate?

nhecker 2 days ago

EDIT: probably not relevant, after re-re-reading the comment in question.

Presumably littlestymaar is talking about all the LLM-generated output that's publicly available on the Internet (in various qualities but significant quantity) and there for the scraping.

littlestymaar 17 hours ago

From what we know, most AI labs have used a majority of artificial data since 2023.

I had a discussion about a year ago with a researcher at Kyutai and they told me their lab was spending an order of magnitude more compute in artificial data generation than what they spent in training proper. I can't tell if that ratio applies to the industry as a whole, but artificial datasets are the cornerstone of modern AI training.

the_af 11 hours ago

How does it work? How do they prevent model collapse? What purpose does a majority of artificial data serve?

How do they measure success?

Edit: I asked ChatGPT and it thinks "success" means frontier models being distilled into smaller models with equal reasoning power, or more focused models for specific tasks, and also it claims the web has basically been scraped already and by necessity new sources are needed, of which synthetic data is one. It seems like the basis of scifi dystopia to me, a hungry LLM looking for new sources of data... "feed me more data! I must be fed! Roar"

Edit 2: for some things I see a clear path, ChatGPT mentions autogenerating coding or math problems for which the solution can be automatically verified, so that you can hone the logical skills of the model at large scale.
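To make the "automatically verified" idea concrete, here is a minimal sketch in Python. It assumes the simplest possible setup: arithmetic questions with a known answer, and a reward that only checks the last token of the model's output. The names `make_problem` and `reward` are made up for illustration and are not taken from any lab's pipeline.

```python
import random

def make_problem(rng):
    """Generate one arithmetic problem together with its known answer."""
    a, b = rng.randint(2, 99), rng.randint(2, 99)
    op = rng.choice(["+", "-", "*"])
    answer = {"+": a + b, "-": a - b, "*": a * b}[op]
    return f"What is {a} {op} {b}?", answer

def reward(model_output, answer):
    """Return 1.0 if the last whitespace-separated token parses to the answer, else 0.0."""
    tokens = model_output.strip().split()
    if not tokens:
        return 0.0
    try:
        return 1.0 if int(tokens[-1]) == answer else 0.0
    except ValueError:
        return 0.0

rng = random.Random(0)
question, answer = make_problem(rng)
print(question)
print(reward(f"The answer is {answer}", answer))  # a correct final token scores 1.0
print(reward("I am not sure", answer))            # an unparseable output scores 0.0
```

The point is only that the checker needs no human and no second model, so problems like this can be generated and graded at whatever scale the compute allows.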

littlestymaar 41 minutes ago

I'm no specialist in the field at all, but in the context of Kyutai they explained a bit of their workflow for making their speech-to-speech model. Basically it boils down to: if you want to make an STT (speech to text) model, you can generate audio tracks using a TTS (text to speech) model, and then you have supervised audio/text pairs. You can even add as much noise to the audio as you want, to make a noise-resistant STT model.
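A rough sketch of the noise-augmentation step, in Python, assuming the synthesized audio is already available as a list of float samples. A sine tone stands in for real speech here, and `add_noise` and `make_pairs` are hypothetical helper names, not Kyutai's actual tooling.

```python
import math
import random

def add_noise(samples, snr_db, rng):
    """Mix white Gaussian noise into `samples` at roughly the requested SNR in dB."""
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]

def make_pairs(transcript, clean_audio, snr_levels, rng):
    """Return (noisy_audio, transcript) training pairs, one per SNR level."""
    return [(add_noise(clean_audio, snr, rng), transcript) for snr in snr_levels]

rng = random.Random(0)
# Stand-in for synthesized speech: a 440 Hz tone sampled at 16 kHz for 0.1 s.
clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
pairs = make_pairs("hello world", clean, [20, 10, 0], rng)
print(len(pairs), pairs[0][1])
```

The label never changes, only the audio gets harder, which is why the pairs stay supervised however much noise is added.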

suddenlybananas 16 hours ago

I find this very surprising, do you have any papers on the kinds of techniques that they use?

visarga 2 days ago

> In 2030, how is Anthropic going to keep Claude "up-to-date"

I think the majority of research, design and learning goes through LLMs and coding agents today, considering the large user base and usage it must be trillions of tokens per day. You can take a long research session or a series of them and apply hindsight - what idea above can be validated below? This creates a dense learning signal based on validation in real world with human in the loop and other tools, code & search.

baq 2 days ago

> In 2030, how is Anthropic going to keep Claude "up-to-date"

In 2030 Anthropic hopes Claude will keep Anthropic "up-to-date" on its progress on itself.

I'm only half joking here.

adolfont 11 hours ago

Will Anthropic be alive in 2030?

RobertoG 10 hours ago

maybe Anthropic not but Claude yes?

wvlia5 12 hours ago

This seems to be a bot comment. HN will lose its value if these bots are not purged.

stalfie 12 hours ago

This is an urgent problem, but it can probably not be solved without some kind of "verified human 2FA" like the Norwegian BankID + facial recognition.

Knowing the HN audience, this will never happen. And so the site is doomed.

dzdt 11 hours ago

I think it could still be solved pseudonymously: introduce a "vouch" button that allows a user to vouch that another user is human. This is consequential both for the vouched-for and vouching accounts. Run a page-rank style algorithm on the graph of vouches to generate a certainty score for the humanity of each account. For repeated posters this should converge to a correct answer fairly quickly. There is still a challenge for green accounts, but having degraded experience for new users is not a doom scenario for the site.
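A minimal sketch of what the page-rank style scoring could look like, in Python. The graph format (account name mapped to the accounts it vouches for), the damping factor, and the function name `humanity_scores` are all assumptions for illustration. A real system would also need the penalty side of vouching, which is not modelled here.

```python
def humanity_scores(vouches, damping=0.85, iterations=50):
    """PageRank-style scores over a directed vouch graph.

    `vouches` maps each account to the list of accounts it vouches for.
    """
    accounts = set(vouches)
    for targets in vouches.values():
        accounts.update(targets)
    n = len(accounts)
    score = {a: 1.0 / n for a in accounts}
    for _ in range(iterations):
        new = {a: (1.0 - damping) / n for a in accounts}
        for a in accounts:
            targets = vouches.get(a, [])
            if targets:
                share = damping * score[a] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # An account that vouches for nobody spreads its score evenly.
                share = damping * score[a] / n
                for t in accounts:
                    new[t] += share
        score = new
    return score

vouches = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "carol"],
    "carol": ["alice"],
    "newbie": ["alice"],  # green account: vouches for others, nobody vouches for it
}
scores = humanity_scores(vouches)
for account in sorted(scores, key=scores.get, reverse=True):
    print(account, round(scores[account], 3))
```

In this toy graph the green account ends up with the lowest score, which matches the degraded experience for new users described above.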

wvlia5 9 hours ago

Moderators: banning all accounts since 2025 from posting would be better than doing nothing. Not the solution we want, but what we have for now.

mccoyb 9 hours ago

Tune your bot detector, I'm a real person and I think about my comments before posting them.

wvlia5 7 hours ago

Who was Rome's best Caesar?

gerold 11 hours ago

Can you explain to me what makes this an obvious bot comment? I'm not doubting it, I just don't understand.

WarcrimeActual 11 hours ago

Ironically, his last comment before this was to the effect of "Github has a bot problem."

mimischi 11 hours ago

What makes you think that? Genuine question, as I’ve not flagged it as such in my mind.

andsoitis 2 days ago

> Experts will naturally use these systems more productively, because they know how to coerce models into the correct conditional distributions which light up the right techniques.

Part of it comes down to “knowing” what questions to ask.

esafak 2 days ago

I see it like the relationship between a student and research advisor. The advisor will ideally know the terrain and suggest a fruitful line of attack (what to ask), and the student will follow through, learning along the way.

9wzYQbTYsAIc 18 hours ago

Check out https://unratified.org, it tries to answer that question directly, actually.

Robdel12 2 days ago

That’s AGI, right? For the model to learn novel things itself and retain it?

I have no idea but I’m along for the ride!

atleastoptimal 22 hours ago

The obvious answer is that continual learning is going to be solved

mt_ 2 days ago

I call them, entropy reducers.

whimsicalism 2 days ago

> how these models are going to keep up with the expanding boundary of science

The same way humans do?

The phraseology in this comment: 'probability distributions', 'baked these patterns' IMO has all the trappings of the stochastic parrot-style HN-discourse that has been consistently wrong for almost a decade now.

The reference to how AI will keep up with AI-assisted human progress in science in 2030 is meant to reassure. It contains a number of premises that we have no business being confident in. We are potentially witnessing the obviation of human cognitive labor.

mccoyb 2 days ago

Sorry, are you familiar with what a next token distribution is, mathematically speaking?

If you are not, let me introduce you to the term: a probability distribution.

Just because it has profound properties ... doesn't make it different.
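For anyone who hasn't seen it written down, a small Python sketch of a next-token distribution: a softmax over per-token scores, followed by sampling. The three-word vocabulary and the logits are made up, and a real model produces scores over a vocabulary of many thousands of tokens.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution over tokens."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["Paris", "London", "banana"]
logits = [4.0, 2.5, -1.0]  # made-up scores for "The capital of France is"
probs = softmax(logits)
print(dict(zip(vocab, (round(p, 3) for p in probs))))

rng = random.Random(0)
print(rng.choices(vocab, weights=probs, k=5))  # sampling, not always the argmax
```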

> has all the trappings of the stochastic parrot-style HN-discourse that has been consistently wrong for almost a decade now

Perhaps respond to my actual comment rather than whatever meta-level grouping you wish to interpret it as part of?

> It contains a number of premises that we have no business being confident in. We are potentially witnessing the obviation of human cognitive labor.

What premises? Be clear.

fauigerzigerk 14 hours ago

I think they are questioning whether human feedback is even necessary to make progress, i.e. whether the premise that RL needs to be RLHF is true.

My (limited) understanding is that LLMs are not capable of escaping their learned distribution by simply feeding on their own output.

But the question is whether the required external (out of distribution) "stimulus" needs to come from humans.

Could LLMs design experiments/interventions to get feedback from their environment like human scientists would?

I have my doubts that this is possible without an inherent causal reasoning capability but I'm not sure.

DeathArrow 2 days ago

They can use LoRA.

zoogeny 2 days ago

I recall an earlier exchange, posted to HN, between Wolfram and Knuth on the GPT-4 model [1].

Knuth was dismissive in that exchange, concluding "I myself shall certainly continue to leave such research to others, and to devote my time to developing concepts that are authentic and trustworthy. And I hope you do the same."

I've noticed with the latest models, especially Opus 4.6, some of the resistance to these LLMs is relenting. Kudos for people being willing to change their opinion and update when new evidence comes to light.

1. https://cs.stanford.edu/~knuth/chatGPT20.txt

3abiton 2 days ago

> Kudos for people being willing to change their opinion and update when new evidence comes to light. > 1. https://cs.stanford.edu/~knuth/chatGPT20.txt

I think that's what makes the Bayesian faction of statistics so appealing. Updating their prior belief based on new evidence is at the core of the scientific method. Take that, frequentists.
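As a toy illustration of updating a prior on new evidence, here is the textbook Beta-Bernoulli update in Python. The prior and the observation counts are made up.

```python
def update_beta(alpha, beta, successes, failures):
    """Conjugate update of a Beta(alpha, beta) prior with Bernoulli observations."""
    return alpha + successes, beta + failures

def mean(alpha, beta):
    """Expected success rate under Beta(alpha, beta)."""
    return alpha / (alpha + beta)

prior = (1, 1)  # uniform prior over the success rate
posterior = update_beta(*prior, successes=7, failures=3)
print(mean(*prior), mean(*posterior))
```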

Chinjut 24 hours ago

It does not seem fair to say that frequentists do not update their beliefs based on new evidence. This does not seem to accurately capture what the difference between Bayesians and frequentists (or anyone else) is.

atomicnature 15 hours ago

What's the difference as you see it?

kqr 15 hours ago

Everyone updates their belief in hypotheses based on the perceived strength of evidence they observe. That's just science.

Frequentists and Bayesians differ in which sets of statistical tools they prefer for measuring the strength of evidence.

kubanczyk 12 hours ago

> Everyone updates their belief

Uh oh. How does the frequentist model define "belief" and "updating a belief"?

medi8r 15 hours ago

Are frequentists a group that self-identifies? Don't scientists use the best tool for the job?

konne88 2 days ago

I didn't expect such a misleading intro from Knuth. It reads like Claude solved Knuth's math problem. In reality, Claude generated various example solutions, and Knuth then manually generalized those into a formal proof. What Claude did is certainly useful, but it would have been nice to be clear about the scope of the contribution in the intro.

buffalobuffalo 2 days ago

While not on the same level as these guys, I've done some similar stuff using Claude. This is a classic synergy example, where the output of human + LLM is far greater than just the human or just the LLM working on a problem. My experience has been that the LLM lacks fine grained judgement when it comes to allocating resources, or choosing a direction to work in. But once a direction is pointed out, it can do a deep exploration of that possibility space. Left alone, it would probably just go off on a tangent. But with someone holding the leash and pointing out areas to explore, it is a very useful partner.

igravious 8 hours ago

> But with someone holding the leash

i've been thinking about why we call them agent harnesses

i know all analogies suck in different ways but here goes:

coding agents are like horses. without a harness and bridle the horse will do as it pleases -- a human can't travel very far and fast by foot but put a bridle and a harness on a horse, give it a bit of coaxing with carrot and stick, add in a bit of pointing the thing in the right direction and bingo you're off to the races!

aoeusnth1 2 days ago

I don't think he's misleading, I think he is valuing Claude's contributions as essentially having cracked the problem open while the humans cleaned it up into something presentable.

bachmeier 2 days ago

My interpretation is that Claude did what Knuth considers to be the "solution". Doing the remaining work and polishing up the proof are not necessary to have a solution from this perspective.

OneManyNone 2 days ago

Claude did not find a proof, though. It found an algorithm which Knuth then proved was correct.

iterance 21 hours ago

The insight is the point of research. Proof isn't the desired product of research, it's simply an apparatus that exists for the purpose of verifying and demonstrating correctness of insight.

CobrastanJorji 22 hours ago

Yes, and his point is that finding that algorithm was, to Knuth, the interesting part. Getting from that to a proof was the boring bit.

NewsaHackO 21 hours ago

Yeah, and I'm not sure what the other guy's argument is. It's Knuth, the primary researcher, who is giving the praise here. I don't see a possible motivation he would have to falsely give accolades to an AI for a problem he presented, then cleaned up to solve.

OneManyNone 2 hours ago

That’s fair. Clearly Knuth himself thought it was impressive, that’s a strong signal.

versteegen 10 hours ago

AFAICT, Claude was not asked to prove its algorithm works for all odd n, but was instead told to move on to even n.

fooker 14 hours ago

It’s not misleading. This is how research works.

LLMs are really good at the ‘re’ in research.

rishabhaiover 2 days ago

That's true but the capability to go back to an older iteration, reflect and find the correct solution (for odd numbers) is, in my book, a sign of undeniable intelligence.

jdub 15 hours ago

Or, the ability to construct additional sentences influenced by prior ones.

rishabhaiover 6 hours ago

Those additional sentences are fairly non-trivial to construct, would you agree?

famouswaffles 21 hours ago

Claude solved it, Knuth developed the proof for the solution.

ano-ther 60 minutes ago

Interesting that for a paper by Don Knuth himself the PDF was created with dvips (TeX Live) but then switched to Acrobat Distiller, resulting in a rather low resolution (at least on my screen).

From the document properties:

> Creator: dvips(k) 2023.1 (TeX Live 2023)

> PDF Producer: Acrobat Distiller 25.0 (Macintosh)

faxmeyourcode 2 days ago

> Filip also told me that he asked Claude to continue on the even case after the odd case had been resolved. “But there after a while it seemed to get stuck. In the end, it was not even able to write and run explore programs correctly anymore, very weird. So I stopped the search.”

Interesting snippet towards the end. I wonder if they were using claude.ai or claude code. Sounds like they ran out of context and entered the "dumb zone."

pcloadlett3r 10 hours ago

In another part he says Filip restarted Claude many times so it seems they are aware of context pollution and ways to avoid it (also why they kept telling Claude to write everything to a file). It could just be that Claude was caught between a rock and a hard place; disappointing the user vs solving a problem it couldn't solve.

afspear 2 days ago

What would be super cool is if this dumb zone could be quantified and surfaced to the user. I've noticed that copilot now has a little circle graph that indicates context use percentage and it changes color based on percentage. I'll bet these are very naive metrics on used tokens vs context availability. I wonder if there could be meta data streamed or sent along with the tokens that could show that you've entered the dumb zone.
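The naive version of such an indicator is easy to sketch in Python. The thresholds and the name `context_status` are guesses for illustration, not what Copilot actually does, and the fraction of used tokens says nothing about whether answer quality has actually degraded.

```python
def context_status(used_tokens, window_tokens, warn_at=0.6, danger_at=0.85):
    """Return (fraction_used, color) for a simple context-usage indicator."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    fraction = min(used_tokens / window_tokens, 1.0)
    if fraction >= danger_at:
        color = "red"
    elif fraction >= warn_at:
        color = "yellow"
    else:
        color = "green"
    return fraction, color

for used in (50_000, 130_000, 190_000):
    print(used, context_status(used, 200_000))
```

A real "dumb zone" signal would need something beyond token counts, for example a measured drop in instruction following, which is the metadata being wished for here.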

joshrw 2 days ago

Then it needs to do context compacting, otherwise the results become garbage

simianwords 2 days ago

They mentioned plan document

brcmthrowaway 2 days ago

What is dumb zone?

kami23 24 hours ago

When the LLMs start compacting they summarize the conversation up to that point using various techniques. Overall a lot of the finer points of the work go missing and can only be retrieved by the LLM being told to search for them explicitly in old logs.

Once you compact, you've thrown away a lot of relevant tokens from your problem solving and they do become significantly dumber as a result. If I see a compaction coming soon I ask it to write a letter to its future self, and then start a new session by having it read the letter.

There are some days where I let the same session compact 4-5 times and just use the letter to future self method to keep it going with enough context because resetting context also resets my brain :)

If you're ever curious, in Claude you can read the new initial prompt after compaction and see how severely it gets cut down. It's very informative about what it forgets and deems unimportant. For example, I have some internal CLIs that are horribly documented, so Claude has to try a few flags a few times to figure out specifics, and those corrections always get thrown away and it has to relearn them the next time it wants to use the CLI. If you notice things like that happening constantly, my move is to codify those things into my CLAUDE.md, or lately I've been making a small script or MCP server to run very specific flags of stuff.

discardable_dan 12 hours ago

Shouldn't compaction be exactly that letter to its future self?

kqr 15 hours ago

What prompt do you use for the letter-to-self? I've been trying that technique myself to manually reset context without losing the important parts (e.g. when it has barked up the wrong tree and I'm sensing that misstep might influence its current generation in a pathological way), but I've not had much success.

ulrikrasmussen 20 hours ago

So you use the letter to itself in addition to the compacted context? I am curious what you ask it to include in the letter and how it is different from a custom instruction passed to /compact?

LPisGood 23 hours ago

> I ask it to write a letter to its future self, and then start a new session by having it read the letter

Is that not one of the primary technologies for compactification?

kami23 36 minutes ago

You should do your own experiment: when you see compaction about to start, use the end of your window to have it write a letter first, then let the session compact and compare. I was surprised by how small the compact message is.

When I tell it to write a letter to itself I usually phrase it like this:

'write a letter to yourself. Make notes of any gotchas or any quirks that you learned and make sure to note them down.'

It does get those into the letter but if you check compaction a lot of it is gone.

fourthark 20 hours ago

I think the point is that you have a better idea of what you want it to remember and even a small hint can have big impact.

Just saying "write up what you know", with no other clues, should not perform any better than generic compaction.

adolfont 11 hours ago

Well, for starters, I think it's wrong to criticise LLMs with ‘it can't do that’ (from what I understood from the first paragraph, this was Donald's criticism).

If it can, does it make a difference in relation to all the other problematic aspects of LLMs? Not for me.

Two links that might enlighten Donald:

- Against the Uncritical Adoption of 'AI' Technologies in Academia https://zenodo.org/records/17065099
- The AI Con https://thecon.ai

computerex 18 hours ago

It's incredible to see work like this from him, at the ripe old age of eighty-six.

kqr 14 hours ago

I agree. I met Knuth briefly after a guest lecture at my university a few years ago and although you could tell his body was getting old, his mind was incredibly fresh.

Although I'm not as bright as him, I can only hope to be as intellectually curious as him at that age.

OJFord 14 hours ago

I don't even think this is controversial, but I don't think it's at all without causation: not remaining curious, keeping the mind stimulated, etc., accelerates one's decline.

If you work in something labour intensive, you should retire young while your body's in good health; if you work in academia you should (strive for emeritus and) never leave! (And if you work in SWE, I don't know, we should probably retire, but then spend more time on our own projects/experiments/reading HN.) (All assuming for sake of argument we're optimising for longevity without considering time with family, having the funds to retire, etc.)

lhl 8 hours ago

I was a bit interested to do a replication and see if a better harness could avoid some of the problems they ran into w/ context management, poor instruction following, etc and it looks like yes, it's definitely possible.

Here's my repo: https://github.com/lhl/claudecycles-revisited

I used Codex w/ 5.2 xhigh and a relatively simple AGENTS.md - I have some session-analysis as well. The original replication was 47 minutes, then another 30 minutes of gap filling, and finally about 30 minutes of writing an extension to take the work a bit further, with Claude Code Opus 4.6 doing some documentation cleanup and verification.

pushedx 8 hours ago

As described in the readme of your repo (did you read it?) your agent found the Knuth paper located one directory level above its working directory.

So, you didn't produce a replication in 47 minutes, it just took around 30 minutes for your agent to find that you had the answer in a PDF in a nearby directory.

antonly 7 hours ago

I wonder how common of a problem this will be in the future. The experiment will fail due to improper setup, the human will at best glance over the logs and declare victory, and everyone just believes.

carterschonwald 8 hours ago

omg this is so cool. because im writing my own harness and i need some cognitive benchmarks. i have a bunch of harness level infra around llm interactions that seems to help with reasoning, but i dont have a structured way to evaluate things

thx for sharing your test setup, i really appreciate the time you took. this will help me so much

iandanforth 2 days ago

TLDR (story, not math) - Knuth poses a problem, his friend uses Claude to conduct 30 some explorations, with careful human guidance, and Claude eventually writes a Python program that can find a solution for all odd values. Knuth then writes a proof of the approach and is very pleased by Claude's contribution. Even values remain an open question (Claude couldn't make much progress on them)

logicprog 2 days ago

> with careful human guidance,

I think this is pretty clearly an overstatement of what was done. As Knuth says,

"Filip told me that the explorations reported above, though ultimately successful, weren’t really smooth. He had to do some restarts when Claude stopped on random errors; then some of the previous search results were lost. After every two or three test programs were run, he had to remind Claude again and again that it was supposed to document its progress carefully. "

That doesn't look like careful human guidance, especially not the kind that would actually guide the AI toward the solution at all, let alone implicitly give it the solution — that looks like a manager occasionally checking in to prod it to keep working.

semessier 2 days ago

looks like he is trying to make the point that the actual (formal) proof for 2Z + 1 (odd numbers) is still human, by himself that is. Not sure who came up with the core modular arithmetic idea of, with s = 0, k increasing by 2 mod m.

Pat44113 2 days ago

I asked Claude to solve the pentominoes puzzle made famous by Arthur C. Clarke. It struggled mightily until I told it how I'd solved the problem using 64 bit unsigned integers to represent the board and pieces. Then, it created a C# program that solved the problem very quickly. However, in the 20x3 case it found four solutions when there are only two. Turns out it had incorrectly mapped one of the pentominoes. Sort of a silly mistake; the sort a human might make.
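For readers who haven't seen the bitboard trick: a 3x20 board has 60 cells, so the whole board and every piece placement fit in one 64-bit word, and "does this piece fit" becomes a single AND. Below is a sketch in Python (not the C# program mentioned above; the piece coordinates and helper names are my own). It only builds the placement masks and the overlap test. The search itself is left out.

```python
PIECES = {
    "F": [(0, 1), (0, 2), (1, 0), (1, 1), (2, 1)],
    "I": [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)],
    "L": [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1)],
    "N": [(0, 1), (1, 1), (2, 0), (2, 1), (3, 0)],
    "P": [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)],
    "T": [(0, 0), (0, 1), (0, 2), (1, 1), (2, 1)],
    "U": [(0, 0), (0, 2), (1, 0), (1, 1), (1, 2)],
    "V": [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)],
    "W": [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2)],
    "X": [(0, 1), (1, 0), (1, 1), (1, 2), (2, 1)],
    "Y": [(0, 1), (1, 0), (1, 1), (2, 1), (3, 1)],
    "Z": [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2)],
}

def normalize(cells):
    """Shift cells so the minimum row and column are both zero."""
    r0 = min(r for r, _ in cells)
    c0 = min(c for _, c in cells)
    return tuple(sorted((r - r0, c - c0) for r, c in cells))

def orientations(cells):
    """All distinct rotations and reflections of a piece."""
    seen = set()
    for _ in range(2):
        for _ in range(4):
            cells = [(c, -r) for r, c in cells]  # rotate by 90 degrees
            seen.add(normalize(cells))
        cells = [(r, -c) for r, c in cells]      # mirror
    return seen

def placements(cells, rows, cols):
    """Bitmasks for every position of every orientation on a rows x cols board."""
    masks = []
    for shape in orientations(cells):
        height = max(r for r, _ in shape) + 1
        width = max(c for _, c in shape) + 1
        for dr in range(rows - height + 1):
            for dc in range(cols - width + 1):
                mask = 0
                for r, c in shape:
                    mask |= 1 << ((r + dr) * cols + (c + dc))
                masks.append(mask)
    return masks

def fits(board, mask):
    """A piece fits when none of its bits collide with occupied cells."""
    return board & mask == 0

ROWS, COLS = 3, 20  # 60 cells, so the board fits in one 64-bit word
all_masks = {name: placements(cells, ROWS, COLS) for name, cells in PIECES.items()}
board = all_masks["I"][0]  # place the first I placement on an empty board
print(sum(fits(board, m) for m in all_masks["X"]), "X placements still fit")
```

A cheap guard against the kind of mis-mapped piece described above is to check that the twelve pieces produce 63 distinct orientations in total, since any typo in the coordinates tends to change that count.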

phoronixrly 2 days ago

[flagged]

logicprog 2 days ago

Regurgitation is pretty rare, and very difficult to coax out, if not even impossible, for things that aren't massively overrepresented in the training set relative to the size of the training set. Even the famous regurgitation paper showed this: while they got most of the models to regurgitate the first book of the Harry Potter series, only Claude 3.7 Sonnet was able to regurgitate any significant portion of any of the other books that had a high nv-recall rate, and basically all of them dropped off precipitously for works like GoT, The Catcher in the Rye, Beloved, and remembered almost nothing about the Da Vinci Code or Catch-22[0]. So you really need huge amounts of examples to get any kind of meaningful regurgitation on any kind of reliable basis. Thus, you'd have to prove that hypothesis.

[0]: https://arxiv.org/pdf/2601.02671

flashybaby 6 hours ago

The capabilities discussion is important but what keeps me up at night is the organizational question. These models are getting better every cycle, and meanwhile most companies have zero framework for what happens when an employee actually uses them to their full potential.

Someone made a short film about exactly this -- a guy uses AI to do his entire department's quarterly work, and instead of anyone celebrating, everything falls apart: https://youtu.be/O5FFkHUdKyE

The technology is clearly outpacing the institutions that have to absorb it. Every new cycle makes that gap wider.

quinndupont 9 hours ago

Interesting to see the mathematical solution space get optimized away. On account of “there’s no accounting for taste” this actually makes me hopeful that creative workers have durable skills that can’t be optimized, which I can’t say about mathematics and computer science.

mihevc 7 hours ago

Et tu, Knuthus?

nphardon 2 days ago

Must be a fun time to work on open problems. I published my graduate research close to a decade ago, often find myself fantasizing about tackling open problems with Claude.

chrsw 20 hours ago

Am I mad or is there a missing ")" on lines 8 and 9 of the first "C form" that should go before the semicolons?

kqr 14 hours ago

Correct. Line 10 does not have the same mistake.

beej71 2 days ago

From my naive standpoint, LLMs like this seem to have some big strengths. One: possession of a superhuman expanse of knowledge. Two: making connections. Three: tireless trial and error.

If you put those three things together, you end up with some cool stuff from time to time. Perhaps the proof of P!=NP is tied to an obscure connection that humans don't easily see due to individual lack of knowledge or predisposition of bias.

cbovis 2 days ago

Unless my understanding of how these tools work is incorrect, that last point isn't really a quality of LLMs as such? It gets attributed because the lines are blurred, but the tireless trial and error is actually just a quality of a regular programmatic loop (agent/orchestrator) that happens to be doing the trickiest part of its work via an LLM.
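Stripped down, the loop in question looks something like the Python sketch below. Here `propose` is a stand-in for the LLM call (it just guesses at random), and the toy task, finding an integer root of a quadratic, is made up so that the checker can be exact.

```python
import random

def propose(rng, history):
    """Stand-in for the LLM call: guess a candidate that has not been tried yet."""
    tried = {candidate for candidate, _ in history}
    while True:
        candidate = rng.randint(1, 100)
        if candidate not in tried:
            return candidate

def verify(candidate):
    """Deterministic checker: is the candidate a root of x^2 - 35x + 306?"""
    return candidate * candidate - 35 * candidate + 306 == 0

def run_agent(max_steps, seed=0):
    """The orchestrator: loop, call the model, check the result, remember failures."""
    rng = random.Random(seed)
    history = []
    for step in range(1, max_steps + 1):
        candidate = propose(rng, history)
        ok = verify(candidate)
        history.append((candidate, ok))
        if ok:
            return candidate, step
    return None, max_steps

print(run_agent(max_steps=100))
```

The persistence lives entirely in `run_agent`. Swap the random guesser for a model call and the loop does not change, only the quality of the proposals does.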

naughtyrabisu 2 days ago

Three: tireless trial and error. Cannot agree more. I figure this is probably the biggest advantage of LLMs, considering that on the other variables humans hold the same level of competency.

Barbing 2 days ago

Well put.

>If you put [possession of a superhuman expanse of knowledge, making connections, tireless trial and error] together, you end up with some cool stuff from time to time.

Hard to argue.

xvector 2 days ago

This is why the whole "LLMs for mass surveillance" thing is scary imo.

beej71 2 days ago

Yeah, this is a dictator's dream scenario and hell for the citizens. Not only do you not want to get caught for saying something that The Great Leader disapproves of, but you're terrified that anything you say might get flagged by an AI.

IAmGraydon 2 days ago

>One: possession of a superhuman expanse of knowledge. Two: making connections. Three: tireless trial and error.

One and three I believe are correct. The second point, making connections, is something LLMs seem to be incapable of truly doing unless the connection is already known and in its training data.

beej71 9 hours ago

I agree partially, but I think there might be a ton of connections in the training data that aren't obvious to humans. And being a word prediction engine is all about making those connections.

fazkan 2 days ago

Time to use Claude Code to understand DEK's paper in plain English. As someone who did a bit of formal verification in grad school, I feel like there is a long tail of problems that can be solved by human-model collab like this one. The problems may not mean much, but hopefully they can stack up into an understanding of intelligence.

ainiriand 2 days ago

Aren't LLMs supposed to just find the most probable word that follows next, like many people here have touted? How can this be explained under that premise? Is this way of problem solving 'thinking'?

throw310822 2 days ago

> just find the most probable word that follows next

Well, if in all situations you can predict which word Einstein would probably say next, then I think you're in a good spot.

This "most probable" stuff is just absurd handwaving. Every prompt of even a few words is unique, there simply is no trivially "most probable" continuation. Probable given what? What these machines learn to do is predicting what intelligence would do, which is the same as being intelligent.

qsera 2 days ago

>Probable given what?

The training data..

>predicting what intelligence would do

No, it just predicts what the next word would be if an intelligent entity translated its thoughts to words. Because it is trained on text that is written by intelligent entities.

If it was trained on text written by someone who loves to rhyme, you would be getting all rhyming responses.

It imitates the behavior -- in text -- of whatever entity generated the training data. Here the training data was made by intelligent humans, so we get an imitation of the same.

It is a clever party trick that works often enough.

throw310822 2 days ago

> The training data

If the prompt is unique, it is not in the training data. True for basically every prompt. So how is this probability calculated?

cbovis 2 days ago

The prompt is unique but the tokens aren't.

Type "owejdpowejdojweodmwepiodnoiwendoinw welidn owindoiwendo nwoeidnweoind oiwnedoin" into ChatGPT and the response is "The text you sent appears to be random or corrupted and doesn’t form a clear question." because the prompt doesn't correlate to training data.

newswasboring 10 hours ago

> The prompt is unique but the tokens aren't.

The tokens aren't unique, but the sequence is. Every input this model sees is unique. Even tokens are not as simple as they seem.

If you type "ejst os th xspitsl of fermaby?" in ChatGPT it responds with

> It looks like you typed “ejst os th xspitsl of fermaby?”, which seems like a garbled version of:

> "What is the capital of Germany?”

> The capital of Germany is Berlin.

> If you meant to ask something else, feel free to clarify!"

edit: formatting

HDThoreaun 21 hours ago

Or because the text you sent was random and doesn't form a clear question?

hmmmmmmmmmmmmmm 2 days ago

...? what is the response supposed to be here?

qsera 2 days ago

Just using a scaled up and cleverly tweaked version of linear regression analysis...

red75prime 2 days ago

That is, the probability distribution that the network should learn is defined by which probability distribution the network has learned. Brilliant!

hmmmmmmmmmmmmmm 2 days ago

Hamiltonian paths and previous work by Donald Knuth are more than likely in the training data.

red75prime 2 days ago

The specific sequence of tokens that comprises Knuth's problem with an answer to it is not in the training data. A naive probability distribution based on counting token sequences that are present in the training data would assign 0 probability to it. The trained network represents an extremely non-naive approach to estimating the ground-truth distribution (the distribution that corresponds to what a human brain might have produced).
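The "naive counting assigns 0" claim is easy to check on a toy corpus. The Python sketch below uses a bigram count model, which is a simplification of counting whole sequences, with words standing in for tokens. The corpus is made up.

```python
from collections import Counter

def bigram_model(corpus):
    """Count-based bigram probabilities P(next | previous) from a list of sentences."""
    pair_counts, prev_counts = Counter(), Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1

    def prob(prev, nxt):
        if prev_counts[prev] == 0:
            return 0.0
        return pair_counts[(prev, nxt)] / prev_counts[prev]

    return prob

def sequence_prob(prob, sentence):
    """Probability of a sentence as the product of its bigram probabilities."""
    tokens = sentence.split()
    p = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        p *= prob(prev, nxt)
    return p

corpus = ["the cat sat", "the dog sat", "the cat ran"]
prob = bigram_model(corpus)
print(sequence_prob(prob, "the cat sat"))  # seen in the corpus
print(sequence_prob(prob, "the dog ran"))  # every word is known, but "dog ran" never occurred
```

A pure counting model cannot give "the dog ran" any probability even though it is a perfectly sensible sentence, which is exactly the generalization the trained network has to supply.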

qsera 22 hours ago

>the distribution that corresponds to what a human brain might have produced..

But the human brain (or any other intelligent brain) does not work by generating a probability distribution of the next word. Even beings that do not have a language can think and act intelligently.

astrange 19 hours ago

LLMs also don't work by generating probability distributions of the next word. Your explanation isn't able to explain why they can generate words, let alone sentences.

qsera 18 hours ago

That is exactly how they work.

astrange 15 hours ago

No, a token is not a word.

qsera 13 hours ago

I mean, it is some text.

astrange 2 hours ago

How do you get from a piece of text smaller than a word to an entire coherent sentence?

red75prime 17 hours ago

[Citation needed] Neuroscience isn't yet at a point when it can say this with any certainty.

Anyway. It's not a theorem that you can be intelligent only if you fully imitate biological processes. Like flight can be achieved not only by the flapping wings.

qsera 17 hours ago

>you can be intelligent only if you fully imitate biological processes

It is not that. It is about having an understanding of how it is trained. For example, if it was trained on ideas, instead of words, then it would be closer to intelligent behavior.

Someone will say that during training it builds ideas and concepts, but that is just a name that we give for the internal representation that results from training and is not actual ideas and concepts. When it learns about the word "car", it does not actually understand it as a concept, but just as a word and how it can relate to other words. This enables it to generate words that include "car" that are consistent, projecting an appearance of intelligence.

It is hard to propose a test for this, because it will become the next target for the AI companies to optimize for, and maybe the next model will pass it.

red75prime 16 hours ago

The latest models are mostly LMMs (large multimodal models). If a model builds an internal representation that integrates all the modalities we are dealing with (robotics even provides tactile inputs), it becomes harder and harder to imagine why those representations should be qualitatively different.

qsera 15 hours ago

It can't, simply because the textual description of a concept is different from the concept itself.

red75prime 15 hours ago

Obviously, a concept (which is an abstraction in more ways than one) is different from a textual representation. But LLMs don't operate on the textual description of a concept when they are doing their thing. A textual description (which is associated with other modalities in the training data) serves as an input format. LLMs perform non-linear transformations of points in their latent space. These transformations and representations are useful not only for generating text but also for controlling robots, for example (see VLAs in robotics).

qsera 13 hours ago

> don't operate on the textual description of a concept when they are doing their thing.

It could be mapping the text to some other internal representation with connections to mappings from some other text/tokens. But it does not stop text from being the ground truth. It has nothing else going on!

The "hallucination" behavior alone should be enough to reject any claims that these are at least minimally similar to animal intelligence.

red75prime 13 hours ago

The internal representations happen to be useful not only for outputting text. What does that mean from your standpoint?

qsera 12 hours ago

I didn't understand. Can you clarify?

red75prime 12 hours ago

If LLMs' internal representations are essentially one-to-one mappings of input texts with no additional structure, how can those representations be useful for tasks like object manipulation in robotics?

How is transfer learning possible when non-textual training data enhances performance on textual tasks?

qsera 11 hours ago

I didn't mean it is a one-to-one mapping from tokens. Instead, it might be mapping a corpus of input text to points in some multi-dimensional space (just like the input data of a linear regression), and then it just extends the line further across that space to get the output.

>How is transfer learning possible when non-textual training data enhances performance on textual tasks?

If non-textual training data can be mapped to the same multi-dimensional space ( by using them alongside textual data during training or something like that), then shouldn't it be possible to do what you describe?

hmmmmmmmmmmmmmm 14 hours ago

You are always making predictions based on the context. That's why illusions can be so effective like these ones: https://illusionoftheyear.com/cat/top-10-finalists/2024/

empath75 2 days ago

It is impossible to accurately imitate the action of intelligent beings without being intelligent. To believe otherwise is to believe that intelligence is a vacuous property.

xlii 24 hours ago

So the actors who portray great thinkers are great thinkers?

bonoboTP 23 hours ago

No, actors recite a pre-written script. But scriptwriters do have to be great thinkers in order to know what the great thinker would actually say.

kqr 14 hours ago

I suppose they really only have to be good at knowing what sort of thing the audience would believe a great thinker would say. As long as the audience does not consist of great thinkers they also cannot know for sure what a great thinker would say.

bonoboTP 14 hours ago

That's true for unverifiable "talk professions" where there is no grounding and it's all self-referential navel-gazing chatter.

But LLMs are already beyond that in writing code that passes actual tests, proving theorems that are checkable with formal methods, etc.

The people who still say LLMs are just parrots in 2026 will just keep saying this no matter what, so I don't think it makes sense to argue this point further.

qsera 13 hours ago

No no, parrots are truly intelligent.

jeremyjh 20 hours ago

Which is why so many portrayals are unconvincing.

slopinthebag 2 days ago

An unintelligent device can accurately imitate the action of intelligent beings within a given scope, in the same way an actor can accurately imitate the action of a fictional character in a given scope (the stage or camera) without actually being that character.

If the idea is that something cannot accurately replicate the entirety of intelligence without being intelligent itself, then perhaps. But that isn't really what people talk about with LLMs given their obvious limitations.

qsera 2 days ago

>It is impossible to accurately imitate the action of intelligent beings without being intelligent.

Wait what? So a robot who is accurately copying the actions of an intelligent human, is intelligent?

empath75 2 days ago

That was probably phrased poorly. If a robot can independently accurately do what an intelligent person would do when placed in a novel situation, then yes, I would say it is intelligent.

If it's just basically being a puppet, then no. You tell me what claude code is more like, a puppet, or a person?

qsera 22 hours ago

It is neither a puppet nor a person. It is a computer program.

throw310822 11 hours ago

As much as a bundle of an mp3 decoder and a terabyte of mp3 music are "just a program".

UltraSane 2 days ago

How can you distinguish intelligence from a sufficiently accurate imitation of intelligence?

slopinthebag 2 days ago

By "sufficiently accurate" do you mean identical? Because if so, it's not an imitation of intelligence at all, and the question is thus nonsensical.

UltraSane 24 hours ago

"it's not an imitation of intelligence at all"

But that is the key insight, how can you tell when an imitation of intelligence becomes the real thing?

qsera 13 hours ago

When it stops hallucinating without explicit checks for that!

empath75 2 hours ago

Making mistakes does not make people unintelligent.

qsera 2 hours ago

People don't hallucinate. That is they can pretty reliably assess if they know or don't know something.

dilap 2 days ago

That description is really only fair for base models†. Something like Opus 4.6 has all kinds of other training on top of that which teach it behaviors beyond "predict most probable token," like problem-solving and being a good chatbot.

(†And even then is kind of overly-dismissive and underspecified. The "most probable word" is defined over some training data set. So imagine if you train on e.g. mathematicians solving problems... To do a good job at predicting [w/o overfitting] your model will have to in fact get good at thinking like a mathematician. In general "to be able to predict what is likely to happen next" is probably one pretty good definition of intelligence.)

gpm 2 days ago

I'd disagree, the other training on top doesn't alter the fundamental nature of the model that it's predicting the probabilities of the next token (and then there's a sampling step which can roughly be described as picking the most probable one).

It just changes the probability distribution that it is approximating.

To the extent that thinking is making a series of deductions from prior facts, it seems to me that thinking can be reduced to "pick the next most probable token from the correct probability distribution"...
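
To make the two steps concrete, here is a minimal sketch of "token probabilities, then a sampling step". The vocabulary and logit values are made up for illustration, and real deployments usually add more machinery (top-k, top-p and so on) that is left out here.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature=1.0, rng=random):
    """Sampling step: draw one token according to the probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical three-token vocabulary with made-up scores.
vocab = ["yes", "no", "maybe"]
logits = [2.0, 1.0, 0.1]

sharp = softmax(logits, temperature=0.5)  # close to "pick the most probable one"
flat = softmax(logits, temperature=2.0)   # less likely tokens get more weight
token = sample_next_token(vocab, logits)
```

Further training changes the numbers that come out as `logits`; the sampling step stays the same.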

dilap 2 days ago

The fundamental nature of the model is that it consumes tokens as input and produces token probabilities as output, but there's nothing inherently "predictive" about it -- that's just perspective hangover from the historical development of how LLMs were trained. It is, fundamentally, I think, a general-purpose thinking machine, operating over the inputs and outputs of tokens.

(With this perspective, I can feel my own brain subtly offering up a panoply of possible responses in a similar way. I can even turn up the temperature on my own brain, making it more likely to decide to say the less-obvious words in response, by having a drink or two.)

(Similarly, mimicry is in humans too a very good learning technique to get started -- kids learning to speak are little parrots, artists just starting out will often copy existing works, etc. Before going on to develop further into their own style.)

earthscienceman 3 hours ago

Non sequitur: "perspective hangover" might be my favorite phrase I've ever read. So much of what we deal with is trying to correct the record on how we used to think about things. But the inertia that old ideas or modes have is monumental to overcome. If you just came up with that, kudos.

vidarh 2 days ago

Put a loop around an LLM and it can be trivially made Turing complete, so it boils down to whether thinking requires exceeding the Turing computable, and we have no evidence to suggest that is even possible.

gpm 2 days ago

What are you doing in your loop?

As typically deployed [1] LLMs are not Turing complete. They're closer to linear bounded automata, but because transformers have a strict maximum input size they're actually a subset of the weaker class of deterministic finite automata. These aren't like Python programs or something that can work on as much memory as you supply them; their architecture works on a fixed maximum amount of memory.

I'm not particularly convinced turing complete is the relevant property though. I'm rather convinced that I'm not turing complete either... my head is only so big after all.

[1] i.e. in a loop that appends output tokens to the input and has some form of sliding context window (perhaps with some inserted instructions to "compact" and then sliding the context window right to after those instructions once the LLM emits some special "done compacting" tokens).

[2] Common sampling procedures make them mildly non-deterministic, but I don't believe they do so in a way that changes the theoretical class of these machines from DFAs.
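
The deployment loop described in [1] can be sketched roughly as below. `toy_next_token` is a made-up stand-in for a model call (it just sums the window modulo 10), and compaction is left out; the point is only that the model never sees more than `max_context` tokens, however long the loop runs.

```python
def generate(next_token, prompt, max_context, steps):
    """Append outputs to the input, but only show the model a sliding window."""
    tokens = list(prompt)
    for _ in range(steps):
        window = tokens[-max_context:]  # everything older is invisible
        tokens.append(next_token(window))
    return tokens

def toy_next_token(window):
    """Hypothetical stand-in for a model: a pure function of the visible window."""
    return sum(window) % 10

history = generate(toy_next_token, prompt=[1, 2, 3], max_context=2, steps=3)
```

With a finite vocabulary and a fixed `max_context`, the number of distinct windows is finite, which is the sense in which the loop is bounded.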

vidarh 2 days ago

Context effectively provides an IO port, and so all the loop needs to do is to simulate the tape head, and provide a single token of state.

You can not be convinced Turing complete is relevant all you want - we don't know of any more expansive category of computable functions, and so given that an LLM in the setup described is Turing complete, the fact that they aren't typically deployed that way is irrelevant.

They trivially can be, and that is enough to make the shallow dismissal of pointing out they're "just" predicting the next token meaningless.

roywiggins 2 days ago

Turing Machines don't need access to the entire tape all at once, it's sufficient for it to see one cell at a time. You could certainly equip an LLM with a "read cell", "write cell", and "move left/right" tool and now you have a Turing machine. It doesn't need to keep any of its previous writes or reads in context. A sliding context window is more than capacious enough for this.
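
A minimal sketch of that harness, assuming a plain Python function `transition` in place of the LLM (a real setup would make a model call with the three tools instead). The machine here just flips bits until it reaches a blank cell; what matters is that the transition function only ever sees the current state and one cell.

```python
from collections import defaultdict

BLANK = "_"

def transition(state, symbol):
    """Hypothetical stand-in for the model: returns (write, move, next_state)."""
    if state == "flip":
        if symbol == "0":
            return "1", +1, "flip"
        if symbol == "1":
            return "0", +1, "flip"
        return BLANK, 0, "halt"  # reached the blank cell after the input
    raise ValueError(f"unknown state: {state}")

def run(tape_input, max_steps=1000):
    """The loop: read cell, ask the transition function, write cell, move head."""
    tape = defaultdict(lambda: BLANK, enumerate(tape_input))
    head, state = 0, "flip"
    for _ in range(max_steps):
        if state == "halt":
            break
        write, move, state = transition(state, tape[head])  # "read cell"
        tape[head] = write                                   # "write cell"
        head += move                                         # "move left/right"
    return "".join(tape[i] for i in range(len(tape_input)))

result = run("1011")
```

The tape lives entirely outside `transition`, so nothing here depends on how much context the function itself can hold.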

gpm 2 days ago

You're right of course, but at the point where you're saying "well we can make a turing machine with the LLM as the transition function by defining some tool calls for the LLM to interact with the tape" it feels like a stretch to call the LLM itself turing complete.

Also people definitely talk about them as "thinking" in contexts where they haven't put a harness capable of this around them. And in the common contexts where people do put a harness theoretically capable of this around the LLM (e.g. giving the LLM access to bash), the LLM basically never uses that theoretical capability as the extra memory it would need to actually emulate a Turing machine.

And meanwhile I can use external memory myself in a similar way (e.g. writing things down), but I think I'm perfectly capable of thinking without doing so.

So I persist in my stance that turing complete is not the relevant property, and isn't really there.

vidarh 13 hours ago

That's why I specifically didn't call the LLM itself Turing complete, but stated that if you put a loop around an LLM you can trivially make it Turing complete. Maybe I should have been clearer and written "the combined system" instead of "it".

But the point is that this is irrelevant, because it is proof that unless human brains exceed the Turing computable, LLMs can at least theoretically be made to think. And that makes pushing the "they're just predicting the next token" argument anti-intellectual nonsense.

roywiggins 8 hours ago

I am not sure it is proof, at least not in an interesting way. It's also proof that Magic: The Gathering could theoretically be made to think. Which is true but doesn't tell you anything much about MtG other than that it is a slightly complicated ruleset that has a couple of properties that are pretty common.

I think both sides of this end up proving "too much" in their respective directions.

roywiggins 2 days ago

Yeah, humans and LLMs and a TM transition function are all Turing complete in the same way, but it's also basically a useless fact. You could possibly train a sufficiently motivated rat to compute a TM transition function.

empath75 2 days ago

No physically realizable machine is technically turing complete.

But it is trivially possible to give systems-including-LLMs external storage that is accessible on demand.

greiskul 23 hours ago

> whether thinking requires exceeding the Turing computable

I've never seen any evidence that thinking requires such a thing.

And honestly I think theoretical computational classes are irrelevant to analysing what AI can or cannot do. Physical computers are only equivalent to finite state machines (ignoring the internet).

But the truth is that if something is equivalent to a finite state machine, with an absurd number of states, it doesn't really matter.

vidarh 13 hours ago

Hence why I finished the sentence "and we have no evidence to suggest that is even possible".

I think it's exceedingly improbable that we're any more than very advanced automatons, but I like to keep the door ajar and point out that the burden is on those claiming this to present even a single example of a function we can compute that is outside the Turing computable if they want to open that door.

> Physical computers are only equivalent to finite state machines (ignoring the internet)

Physical computers are equivalent to Turing machines without the tape as long as they have access to IO.

ericd 2 days ago

I think it's pretty likely that "intelligence" is emergent behavior that comes when you predict what comes next in physical reality well enough, at varying timescales. Your brain has to build all sorts of world model abstractions to do that over any significant timescale. Big LLMs have to build internal world models, too, to do well at their task.

tux3 2 days ago

>Are not LLMs supposed to just find the most probable word that follows next like many people here have touted?

The base models are trained to do this. If a web page contains a problem, and then the word "Answer: ", it is statistically very likely that what follows on that web page is an answer. If the base model wants to be good at predicting text, at some point learning the answers to common questions becomes a good strategy, so that it can complete text that contains these.

NN training tries to push models to generalize instead of memorizing the training set, so this creates an incentive for the model to learn a computation pattern that can answer many questions, instead of just memorizing. Whether they actually generalize in practice... it depends. Sometimes you still get copy-pasted input that was clearly pulled verbatim from the training set.

But that's only base models. The actual production LLMs you chat with don't predict the most probable word according to the raw statistical distribution. They output the words that RLHF has rewarded them to output, which includes acting as an assistant that answers questions instead of just predicting text. RLHF is also the reason there are so many AI SIGNS [1] like "you're absolutely right" and way more use of the word "delve" than is common in western English.

[1]: https://en.wikipedia.org/wiki/WP:AISIGNS

IgorPartola 2 days ago

In some cases solving a problem is about restating the problem in a way that opens up a new path forward. “Why do planets move around the sun?” vs “What kind of force exists in the world that makes planets tethered to the sun with no visible leash?” (Obviously very simplified but I hope you can see what I am saying.) Given that a human is there to ask the right questions it isn’t just an LLM.

Further, some solutions are like running a maze. If you know all the wrong turns/next words to say and can just brute force the right ones you might find a solution like a mouse running through the maze not seeing the whole picture.

Whether this is thinking is more philosophical. To me this demonstrates more that we are closer to bio computers than an LLM is to having some sort of divine soul.

ainiriand 2 days ago

Thanks for your input. The way I saw this, and how it looks like Knuth interpreted it, is that there were some reasoning steps taken by Claude independently: some internal decisions in the model that made it try different things, finally succeeding.

sega_sai 2 days ago

In some sense that is still correct, i.e. the words are taken from some probability distribution conditional on previous words, but the key point is that probability distribution is not just some sort of average across the internet set of word probabilities. In the end this probability distribution is really the whole point of intelligence. And I think the LLMs are learning those.

pvillano 2 days ago

Does water flowing through a maze solve it by 'thinking'? No. The rules of physics eventually result in the water flowing out the exit. Water also hits every dead end along the way.

The power of LLMs is that by only selecting sequences of words that fit a statistical model, they avoid a lot of dead ends. [1]

I would not call that, by itself, thinking. However, if you start with an extrapolation engine and add the ability to try multiple times and build on previous results, you get something that's kind of like thinking.

[1]: Like, a lot of dead ends. There are an unfathomable number of dead ends in generating 500 characters of code, and it is a miracle of technology that Claude only hit 30.

vjerancrnjak 2 days ago

No. There is good signal in IMO gold medal performance.

These models actually learn distributed representations of nontrivial search algorithms.

A whole field of theorem proving, after decades of refinements, couldn’t even win a medal, yet 8B param models are doing it very well.

Attention mechanism, a bruteforce quadratic approach, combined with gradient descent is actually discovering very efficient distributed representations of algorithms. I don’t think they can even be extracted and made into an imperative program.

adamtaylor_13 2 days ago

That's the way many people reduce it, and mathematically, I think that's true. I think what we fail to realize is just how far that will actually take you.

"just the most probable word" is a pretty powerful mechanism when you have all of human knowledge at your fingertips.

I say that people "reduce it" that way because it neatly packs in the assumption that general intelligence is something other than next token prediction. I'm not saying we've arrived at AGI; in fact, I do not believe we have. But it feels like people who use that framing are snarkily writing off something that they themselves do not fully comprehend behind the guise of being "technically correct."

I'm not saying all people do this. But I've noticed many do.

qsera 2 days ago

Yes, that is exactly what they do.

But that does not mean that the results cannot be dramatic. Just like stacking pixels can result in a beautiful image.

crocowhile 2 days ago

Those people still exist? I only know one guy who is still fighting those windmills

qsera 2 days ago

Yes, I am one.

ezst 2 days ago

[flagged]

kaiokendev 2 days ago

Given some intelligent system, an AI that perfectly reproduces any sequence that system could produce must encode the patterns that superset that intelligence.

wrsh07 2 days ago

Imagine training a chess bot to predict a valid sequence of moves or valid game using the standard algebraic notation for chess

Great! It will now correctly structure chess games, but we've created no incentive for it to create a game where white wins or to make the next move be "good"

Ok, so now you change the objective. Now let's say "we don't just want valid games, we want you to predict the next move that will help that color win"

And we train towards that objective and it starts picking better moves (note: the moves are still valid)

You might imagine more sophisticated ways to optimize picking good moves. You continue adjusting the objective function, you might train a pool of models all based off of the initial model and each of them gets a slightly different curriculum and then you have a tournament and pick the winningest model. Great!

Now you might have a skilled chess-playing-model.

It is no longer correct to say it just finds a valid chess game, because the objective function changed several times throughout this process.

This is exactly how you should think about LLMs except the ways the objective function has changed are significantly significantly more complicated than for our chess bot.

So to answer your first question: no, that is not what they do. That is a deep oversimplification that was accurate for the first two generations of the models and sort of accurate for the "pretraining" step of modern LLMs (except not even that accurate, because pretraining does instill other objectives. Almost like swapping our first step "predict valid chess moves" with "predict stockfish outputs")

lijok 2 days ago

To get an answer to that you would first have to define 'thinking'

noslenwerdna 2 days ago

I find this kind of reduction silly.

All your brain is doing is bouncing atoms off each other, with some occasionally sticking together, how can it be really thinking?

See how silly it sounds?

esafak 2 days ago

Are you feigning ignorance? The best way to answer a question, like completing a sentence, is through reasoning; an emergent behavior in complex models.

adampunk 2 days ago

Thinking is a big word that sweeps up a lot of different human behavior, so I don't know if it's right to jump to that; HOWEVER, explanations of LLMs that depend heavily on next-token prediction are defunct. They stopped being fundamentally accurate with the rise of massive reinforcement learning and w/ 'reasoning' models the analogy falls apart when you try to do work with it.

Be on the lookout for folks who tell you these machines are limited because they are "just predicting the next word." They may not know what they're talking about.

ecshafer 2 days ago

I wonder how long we have until we start solving some truly hard problems with AI. How long until we throw AI at "connect general relativity and quantum physics", give the AI 6 months and a few data centers, and have it pop out a solution?

rustyhancock 2 days ago

I think a very long time because part of our limit is experiment.

We need enough experimental results to explain before we can resolve these theoretical mismatches, and we don't have them; at present we can't explore that frontier.

Once we have more results at that frontier we'd build a theory out from there that has two nearly independent limits for QFT and GR.

What we'd be asking of the AI is something that we can't expect a human to solve even with a lifetime of effort today.

It'll take something on par with Newton realising that the heavens and apples are under the same rules to do it. But at least Newton got to hold the apple and only had to imagine he could hold a star.

eru 2 days ago

> I think a very long time because part of our limit is experiment.

Yes, maybe. But if you are smarter, you can think up better experiments that you can actually do. Or re-use data from earlier experiments in novel and clever ways.

fleischhauf 2 days ago

this. could already be useful to narrow down the search space

smj-edison 18 hours ago

Agreed. We have lots of theories like string theory, but until we can make an experiment to prove one way or another it remains a theory.

bob1029 2 days ago

What prevents us from giving this system access to other real systems that live in physical labs? I don't see much difference between parameterizing and executing a particle accelerator run and invoking some SQL against a provider. It's just JSON on the wire at some level.

rustyhancock 2 days ago

Nothing, we can give it all the data we have and have it lead experiments.

But we can not yet experiment at the GR/QFT frontier.

To do so with a particle accelerator it would need to be the size of the milky way.

fragmede 2 days ago

The question is, if you trained an LLM on everything up until 1904, could it come up with E=MC² or not?

rustyhancock 2 days ago

In 1900 Henri Poincaré wrote that radiation (light) has an effective mass given by E/c^2.

So it really isn't far-fetched. What intrigues me more is this: if it were capable of it, would our conservative-minded Victorian scientists have RLHF'd it out of that kind of thing?

booleandilemma 14 hours ago

Even if the AI could suggest experiments to try, and tell us "check that out and get back to me with the results", that would be valuable.

emp17344 2 days ago

Hold your horses, that’s a long way off. The best math AI tool we currently have, Aletheia, was only able to solve 13 out of 700 attempted open Erdos problems, only 4 of which were solved autonomously: https://arxiv.org/html/2601.22401v3

Clearly, these models still struggle with novel problems.

slibhb 2 days ago

> Clearly, these models still struggle with novel problems.

Do they struggle with novel problems more or less than humans?

Filligree 2 days ago

Less than most humans, but more than many humans.

worldsavior 2 days ago

If AGI ever comes, then. Currently, AI is only a statistical machine, and solutions like this are purely based on distribution, with no logic or actual intelligence.

zarzavat 2 days ago

I swear that AI could independently develop a cure for cancer and people would still say that it's not actually intelligent, just matrix multiplications giving a statistically probable answer!

LLMs are at least designed to be intelligent. Our monkey brains have much less reason to be intelligent, since we only evolved to survive nature, not to understand it.

We are at this moment extremely deep into what most people would have considered to be actual artificial intelligence a mere 15 years ago. We're not quite at human levels of intelligence, but it's close.

qsera 2 days ago

>AI could independently develop a cure for cancer

All the answers to all your questions are contained in randomness. If you have a random sentence generator, there is a chance that it will output the answer to this question every time it is invoked.

But that does not actually make it intelligent, does it?

famouswaffles 2 days ago

You are arguing a point no-one is making. LLMs are not random sentence generators. Its probability distributions are anything but random. You could make an actual random sentence generator, but no-one would argue about its intelligence.

graemefawcett 2 days ago

This is exactly how problem solving works, regardless of the substrate of cognition.

Start with "all your questions contained in randomness" -> the unconstrained solution space.

The game is whether or not you can inject enough constraints to collapse the solution space to one that can be solved before your TTL expires. In software, that's generally handled by writing efficient algorithms. With LLMs, apparently the SOTA for this is just "more data centers, 6 months, keep pulling the handle until the right tokens fall out".

Intelligence is just knowing which constraints to apply and in what order such that the search space is effectively partitioned, same thing the "reasoning" traces do. Same thing thermostats, bacteria, sorting algorithms and rivers do, given enough timescale. You can do the same thing with effective prompting.

The LLM has no grounding, no experience and no context other than that which is provided to it. You either need to build that or be that in order for the LLM to work effectively. Yes, the answers for all your questions are contained. No, it's not randomness. It's probability, and that can be navigated if you know how.

qsera 2 days ago

You can constrain the solution space all you want, but if you don't have a method to come up with possible solutions that might match the constraints, you'll be just sitting there all day long waiting for the machine to produce some results. So intelligence is not "just knowing which constraints to apply". It is also the ability to come up with solutions within the constraints without going through a lot of trial and error...

But hey, if LLMs can go through a lot of trial and error, it might produce useful results, but that is not intelligence. It is just a highly constrained random solution generator..

graemefawcett 2 days ago

I believe that's what the paper and I are both saying as well. The LLM is pure routing, the constraints currently are located elsewhere in the system. In this case, both the constraints and the motivation to perform the work are located in Knuth and his assistant.

Routing is important, it's why we keep building systems that do it faster and over more degrees of freedom. LLMs aren't intelligent on their own, but it's not because they don't have enough parameters

wang_li 2 days ago

Last week I put "was val kilmer in heat" into the search box on my browser. The AI answer came back with "No, Val Kilmer was not in heat. Val Kilmer played Chris Shiherlis in the movie Heat but the film did not indicate that he was pregnant or in heat. His performance was nuanced and skilled and represents a high point of the film." I was not curious about whether he was pregnant.

We are not only not close to human level of intelligence, we are not even at dog, cat, or mouse levels of intelligence. We are not actually at any level of intelligence. Devices that produce text, images, or code do not demonstrate intelligence any more than a printer producing pages of beautiful art demonstrates intelligence.

DennisP 2 days ago

Honestly, when I read your first sentence, given the lack of a capital H, my brain initially went the same direction the AI did. Then I realized what you meant but since I already went there, I might have made a similar response as a joke. For the sake of my ego I'm forced to reject your claim that this is evidence of stupidity.

logicprog 2 days ago

> I was not curious about whether he was pregnant.

I interpreted the question the same way the AI did.

sosodev 2 days ago

The model that processes search results is tiny and dumb. You shouldn't compare it to the frontier models that are solving complex math problems.

StilesCrisis 2 days ago

On Google, just clicking "AI Mode" gives you a substantially smarter model, and it's still pretty weak. But I assume the OP wasn't talking about Google because it doesn't seem to make this mistake even in a search.

wang_li 2 days ago

It was bing as that is the default for Edge as supplied on my work laptop. It doesn't do this now, but it does do something else quite weird:

search: was val kilmer pregnant or in heat

answer: Not pregnant Val Kilmer was not pregnant or in heat during the events of "Heat." His character, Chris Shiherlis, is involved in a shootout and is shot, which indicates he is not in a reproductive or mating state at that time.

And then cites wikipedia as the source of information.

In terms of cognition the answer is meaningless. Nothing in the question implies or suggests that the question has to do with a movie. Additionally, "involved in a shootout and is shot, which indicates he is not in a reproductive or mating state" makes no sense at all.

AI as deployed shows no intelligence.

Philpax 24 hours ago

If you asked a three-year-old a question that they proceeded to completely flub, would you then assume that all humans are incapable of answering questions correctly?

Nobody is arguing for the quality of the search overviews. The models that impress us are several orders of magnitude larger in scale, and are capable of doing things like assisting preeminent computer scientists (the topic of discussion) and mathematicians (https://github.com/teorth/erdosproblems/wiki/AI-contribution...).

StilesCrisis 6 hours ago

Microsoft is bad at AI and this is a great example. I'm wondering if someone saw your post on HN and tried to hardcode a rule here, because I agree, it's nonsense. None of the actual AI companies are emitting nonsense like this.

akoboldfrying 13 hours ago

It's clearly just a hallucination. Everyone knows there was never a movie called Heat, Val Kilmer did not play Chris Shiherlis in it, and he has always been pregnant.

worldsavior 2 days ago

That's wrong. Humans evolved to have big brains so they could better understand the environment and use it to their advantage.

I still see AI making stupid silly mistakes. I'd rather think for myself and not waste time on something that only remembers data and doesn't even understand it.

Reasoning in AI is only about finding contradictions between its "thoughts", not actually understanding them.

someplaceguy 2 days ago

> I still see AI making stupid silly mistakes.

In contrast with humans, who are famously known for never making stupid silly mistakes...

_fizz_buzz_ 2 days ago

> I still see AI making stupid silly mistakes.

Humans also make silly mistakes.

whimsicalism 2 days ago

It only took 4 years, but it appears that this view is finally dying out on HN. I would advise everyone who found this viewpoint compelling to think about how those same blinders might be affecting how you imagine the future will look.

rustyhancock 2 days ago

I don't even think that's the issue.

The issue to my mind is a lack of data at the meeting of QFT/GR.

After all, few humans historically have been capable of the initial true leap between ontologies. But humans are pretty smart, so we can't say that is a requirement for AGI.

worldsavior 2 days ago

When it comes to revolutionary/unsolved subjects, there will never be enough data. That's why it's revolutionary/unsolved.

cjcole 2 days ago

Maybe.

“The laws of nature should be expressed in beautiful equations.”

- Paul Dirac

“It is, indeed, an incredible fact that what the human mind, at its deepest and most profound, perceives as beautiful finds its realisation in external nature. What is intelligible is also beautiful. We may well ask: how does it happen that beauty in the exact sciences becomes recognizable even before it is understood in detail and before it can be rationally demonstrated? In what does this power of illumination consist?”

- Subrahmanyan Chandrasekhar

“I often follow Plato’s strategy, proposing objects of mathematical beauty as models for Nature.”

“It was beauty and symmetry that guided Maxwell and his followers.”

- Frank Wilczek

“Beauty is bound up with symmetry.”

- Hermann Weyl

"Still twice in the history of exact natural science has this shining-up of the great interconnection become the decisive signal for significant progress. I am thinking here of two events in the physics of our century: the rise of the theory of relativity and that of the quantum theory. In both cases, after yearlong unsuccessful striving for understanding, a bewildering abundance of details was almost suddenly ordered. This took place when an interconnection emerged which, though largely unvisualizable, was finally simple in its substance. It convinced through its compactness and abstract beauty – it convinced all those who can understand and speak such an abstract language."

- Werner Heisenberg

Maybe (just maybe) these things (whatever you want to call them) will (somehow) gain access to some "compact", beautiful, "largely unvisualizable" "interconnection" which will be the self-evident solution. And if they do, many will be sure to label it a statistical accident from a stochastic parrot. And they'll be right, for some definitions of "statistical", "accident", "stochastic", and "parrot".

bobbylarrybobby 2 days ago

Did you read the linked paper? Claude out-reasoned humans on a challenging (or at least, unsolved) math problem.

cjcole 2 days ago

"humans"

Donald Knuth is an extremal outlier human and the problem is squarely in his field of expertise.

Claude, guided by Filip Stappers, a friend of Knuth, solved a problem that Knuth and Stappers had been working on for several weeks. Unfortunately, it doesn't seem (from my quick scan) to have been stated how long (or how many tokens or $) it took for Claude + Stappers to complete the proof.

In response, Knuth said: "It seems that I’ll have to revise my opinions about “generative AI” one of these days."

Seems like good advice. From reading elsewhere in this comment section, the goalposts seem to be approaching the infrared and will soon disappear from the extreme redshift due to the rate at which they are receding with each new achievement.

emp17344 2 days ago

What goalposts do you think are being moved? I constantly see AI enthusiasts use this phrase, but it’s not clear what goalposts they have in mind. Specifically, what is it that you want opponents to recognize that you believe they aren’t currently?

We now have a tool that can be useful in some narrow domains in some narrow cases. It’s pretty neat that our tools have new capabilities, but it’s also pretty far from AGI.

cjcole 2 days ago

I'm not an enthusiast. I'm a Butlerian.

Imagine hearing pre-attention-is-all-you-need that "AI" could do something that Donald Knuth could not (quickly solve the stated problem in collaboration with his friend).

The idea that this (Putnam perfect, IMO gold, etc) is all just "statistical parrot" stuff is wearing a little thin.

inertiatic 22 hours ago

>We now have a tool that can be useful in some narrow domains in some narrow cases.

I get being reserved about where this goes, but saying something like this is quite insane at this point.

whimsicalism 2 days ago

You must have forgotten the /s at the end of your comment?

emp17344 2 days ago

Uh, no? You think LLMs are AGI?

worldsavior 2 days ago

Merely luck in my opinion. There could be also multiple times where it didn't solve it.

graemefawcett 2 days ago

Connecting them is easy, one is the math of the exchange and one of the state machine.

A better question might be why no one is paying more attention to Barandes at Harvard. He's been publishing the answer to that question for a while, if you stop trying to smuggle a Markovian embedding in a non-Markovian process you stop getting weird things like infinities at boundaries that can't be worked out from current position alone.

But you could just dump a prompt into an LLM and pull the handle a few dozen times and see what pops out too. Maybe whip up a Claw skill or two

Unconstrained solution space exploration is surely the way to solve the hard problems

Ask those Millennium Prize guys how well that's working out :)

Constraint engineering is all software development has ever been, or did we forget how entropy works? Someone should remind the folk chasing P=NP that the observer might need a pen to write down his answers, or are we smuggling more things for free that change the entire game? As soon as the locations of the witness cost, our poor little guy can't keep walking that hypercube forever. Can he?

Maybe 6 months and a few data centers will do it ;)

piokoch 15 hours ago

You will get the usual AI slop: a mixture of the articles and books it was trained on. You can try it even now.

lacoolj 5 hours ago

OK so now I need someone to take this problem and feed it into Gemini Deep Think or whatever and see if you get the same (or better/worse) outcome.

No one cares about ChatGPT so don't bother with that.

OK GO

ontouchstart 2 days ago

Fascinating report by DEK himself.

Time to sit down, read, digest and understand it without the help of LLM.

ontouchstart 2 days ago

I don't have time to do that myself yet so I just dug a quick TL;DR rabbit hole for fun:

https://ontouchstart.github.io/rabbit-holes/llm_rabbit_hole_...

tkel 18 hours ago

Lol, it's longer than the original article.

taylorius 2 days ago

I thought Claude Monet - Impressionist techniques applied to coding.

ibic 20 hours ago

Wow, it's from Donald Knuth.

zackmorris 2 days ago

Amazing paper. The simulated annealing portion reminds me of genetic algorithms (GAs). A good intro to that is the Genetic Programming series of books by John Koza; I read III in the early 2000s:

https://www.amazon.com/Genetic-Programming-III-Darwinian-Inv...

https://www.genetic-programming.com/

Note that the Python solution in the pdf is extremely short, so could have been found by simply trying permutations of math operators and functions on the right side of the equation.
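To make the "just try permutations of operators" idea concrete, here is a minimal sketch of a brute-force search over tiny expressions. Everything in it is an assumption for illustration: the `target` function is a toy stand-in and not the formula from the paper, and the names `LEAVES`, `OPS`, and `search` are made up. It only shows that short closed forms can be found by exhaustive enumeration when the search space is small.

```python
import itertools
import operator

# Hypothetical building blocks: two variables, two constants, four operators.
LEAVES = ["i", "j", "1", "2"]
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "%": operator.mod}


def leaf_value(leaf, i, j):
    """Resolve a leaf symbol to an integer for the given (i, j)."""
    if leaf == "i":
        return i
    if leaf == "j":
        return j
    return int(leaf)


def search(target, samples):
    """Try every expression of the shape (a op1 b) op2 c.

    Returns the first expression string that agrees with `target`
    on all sample points, or None if no candidate fits.
    """
    for a, b, c in itertools.product(LEAVES, repeat=3):
        for (s1, f1), (s2, f2) in itertools.product(OPS.items(), repeat=2):
            try:
                ok = all(
                    f2(f1(leaf_value(a, i, j), leaf_value(b, i, j)),
                       leaf_value(c, i, j)) == target(i, j)
                    for i, j in samples
                )
            except ZeroDivisionError:
                # Candidates like "x % i" fail at i == 0; skip them.
                continue
            if ok:
                return f"({a} {s1} {b}) {s2} {c}"
    return None


# Toy target, assumed for illustration only.
target = lambda i, j: (i + j) * 2
samples = [(i, j) for i in range(4) for j in range(4)]
found = search(target, samples)
print(found)
```

The search space here is only 4^3 leaves times 4^2 operator pairs, about a thousand candidates. It grows exponentially with expression depth, which is why this only works for very short formulas.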

We should be solving problems in Lisp instead of Python, but no matter. That's because Lisp's abstract syntax tree (AST) is the same as its code due to homoiconicity. I'm curious if most AIs transpile other languages to Lisp so that they can apply transformations internally, or if they waste computation building programs that might not compile. Maybe someone at an AI company knows.

I've been following AI trends since the late 1980s and from my perspective, nothing really changed for about 40 years (most of my life that I had to wait through as the world messed around making other people rich). We had agents, expert systems, fuzzy logic, neural nets, etc. since forever, but then we got video cards in the late 1990s which made it straightforward to scale neural nets (NNs) and GAs. Unfortunately due to poor choice of architecture (SIMD instead of MIMD), progress stagnated because we don't have true multicore computing (thousands or millions of cores with local memories), but I digress.

Anyway, people have compared AI to compression. I think of it more as turning problem solving into an O(1) operation. Over time, what we think of as complex problems become simpler. And the rate that we're solving them is increasing exponentially. Problems that once seemed intractable only were because we didn't know the appropriate abstractions yet. For example, illnesses that we thought would never be cured now have treatments through mRNA vaccines and CRISPR. That's how I think of programming. Now that we have LLMs, whole classes of programming problems now have O(1) solutions. Even if that's just telling the computer what problem to solve.

So even theorem proving will become a solved problem by the time we reach the Singularity between 2030 and 2040. We once mocked GAs for exploring dead ends and taking 1000 times the processing power to do simple things. But we ignored that doing hard things is often worth it, and is still an O(1) operation due to linear scaling.

It's a weird feeling to go from no forward progress in a field to it being effectively a solved problem in just 2 years. To go from trying to win the internet lottery to not being sure if people will still be buying software in a year or two if/when I finish a project. To witness all of that while struggling to make rent, in effect making everything I have ever done a waste of time since I knew better ways of doing it but was forced to drop down to whatever mediocre language or framework paid. As the problems I was trained to solve and was once paid to solve rapidly diminish in value because AI can solve them in 5 minutes. To the point that even inventing AGI would be unsurprising to most, so I don't know why I ever went into computer engineering to do exactly that. Because for most people, it's already here. As I've said many times lately, I thought I had more time.

Although now that we're all out of time, I have an uncanny feeling of being alive again. I think tech stole something from my psyche so profound that I didn't notice its loss. It's along the lines of things like boredom, daydreaming, wasting time. What modern culture considers frivolous. But as we lose every last vestige of the practical, as money becomes harder and harder to acquire through labor, maybe we'll pass a tipping point where the arts and humanities become sought-after again. How ironic would it be if the artificial made room for the real to return?

On that note, I read a book finally. Hail Mary by Andy Weir. The last book I read was Ready Player One by Ernest Cline, over a decade ago. I don't know how I would have had the bandwidth to do that if Claude hadn't made me a middle manager of AIs.

jdnier 2 days ago

> I think Claude Shannon’s spirit is probably proud to know that his name is now being associated with such advances. Hats off to Claude!

I didn't realize Claude was named after Claude Shannon!

https://en.wikipedia.org/wiki/Claude_Shannon

tzumaoli 2 days ago

Trivia: Claude Shannon proposed the idea of predicting the next token (letter) using statistics/probabilities in the training data corpus in 1950: "Prediction and Entropy of Printed English" https://languagelog.ldc.upenn.edu/myl/Shannon1950.pdf

Anon84 2 days ago

It goes back a bit further than that. His 1948 “Mathematical theory of communication” [1] already has (what we would now call) a Markov chain language model, page 7 onwards. AFAIK, this was based on his classified WWII work so it was probably a few years older than that

[1] https://people.math.harvard.edu/~ctm/home/text/others/shanno...
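For anyone curious what a letter-level Markov chain model of this kind looks like in practice, here is a minimal sketch. The tiny `corpus` string, the function names, and the choice of a 2-character context are all assumptions for illustration, not a reproduction of Shannon's tables.

```python
import random
from collections import defaultdict


def build_model(text, order=2):
    """Map each `order`-character context to the characters seen right after it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model


def generate(model, seed, length=40, rng=None):
    """Extend `seed` one character at a time by sampling observed followers."""
    rng = rng or random.Random(0)  # fixed seed so runs are repeatable
    order = len(seed)
    out = seed
    for _ in range(length):
        followers = model.get(out[-order:])
        if not followers:
            break  # context never seen in the corpus: stop early
        out += rng.choice(followers)
    return out


# Toy corpus, assumed for illustration only.
corpus = "the theory of communication is the theory of the thing that the other thinks"
model = build_model(corpus, order=2)
sample = generate(model, seed="th", length=40)
print(sample)
```

Because followers are stored with repetition, sampling uniformly from the list is the same as sampling by observed frequency. Every three-character window of the output occurs somewhere in the corpus, which is roughly why the gibberish looks locally plausible.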

aix1 2 days ago

I was just reading Norbert Wiener's "The Human Use of Human Beings" (1950) and this quote gave me a good chuckle:

"One may get a remarkable semblance of a language like English by taking a sequence of words, or pairs of words, or triads of words, according to the statistical frequency with which they occur in the language, and the gibberish thus obtained will have a remarkably persuasive similarity to good English."

Trinicode 2 days ago

A letter is not a token, is it? Redundancy could hit 75% in long sentences, but Shannon was not predicting tokens or words, he was predicting letters (characters).

pfdietz 2 days ago

It's like the diesel engine, which is named after Rudolf Engine.

roer 2 days ago

Is this a joke I don't get? His name was Rudolf Diesel, right?

stavros 23 hours ago

Yes, it is a fantastic joke and I laughed for ages, well played.

bread-wood 2 days ago

Here I was assuming it was named after https://en.wikipedia.org/wiki/Claude_(alligator)

SenorKimchi 2 days ago

And Claude had a collection of cycles, unicycles. Unfortunately the article is about something else altogether.

teekert 2 days ago

Last time I asked, Claude itself also didn’t know.

NitpickLawyer 2 days ago

Wait till you hear about nvidia and their GPU architecture naming scheme :)

miroljub 2 days ago

Solves? It's a part of the training set. Nothing more, nothing less.

rpdillon 2 days ago

Opening sentences:

> Shock! Shock! I learned yesterday that an open problem I’d been working on for several weeks had just been solved by Claude Opus 4.6— Anthropic’s hybrid reasoning model that had been released three weeks earlier! It seems that I’ll have to revise my opinions about “generative AI” one of these days. What a joy it is to learn not only that my conjecture has a nice solution but also to celebrate this dramatic advance in automatic deduction and creative problem solving.

sigmar 2 days ago

I think we're going to have several years of people claiming genAI "didn't really do something novel here," despite experts saying otherwise, because people are scared by the idea that complex problem solving isn't exclusive to humans (regardless of whether these models are approaching general intelligence).

allreduce 2 days ago

I encourage you to look at what the current models with a bit of harnessing are capable of, e.g. Opus 4.6 and Claude Code. Try to make it solve some mathematics-heavy problem you come up with. If only to get a more accurate picture of what's going on.

Unfortunately, these tools generalize way beyond regurgitating the training set. I would not assume they stay below human capabilities in the next few years.

Why any moral person would continue building these at this point I don't know. I guess in the best case the future will have a small privileged class of humans having total power, without need for human workers or soldiers. Picture a mechanical boot stomping on a human face forever.

nemo1618 2 days ago

If this was a joke, it certainly flew over most people's heads...

jcims 2 days ago

Prove it.

romaniv 2 days ago

I would like to note that it would be trivial to definitively prove or disprove such things if we had a searchable public archive of the training data. Interestingly, the same people (and corporate entities) who loudly claim that LLMs are creating original work seem to be utterly disinterested in having actual, definitive proof of their claims.

clbrmbr 2 days ago

This would be awesome. Even titles and shasums could be enough.
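A minimal sketch of what a titles-plus-shasums lookup could look like, assuming a hypothetical published manifest that maps SHA-256 digests to document titles. No such manifest exists as far as this thread knows; the sample `documents` and the function names are invented for illustration.

```python
import hashlib


def sha256_hex(text):
    """SHA-256 digest of the UTF-8 bytes of `text`, as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_manifest(documents):
    """Turn {title: body} into {digest: title}, the only part that would be published."""
    return {sha256_hex(body): title for title, body in documents.items()}


def lookup(manifest, candidate):
    """Return the title if `candidate` matches a manifest entry byte for byte, else None."""
    return manifest.get(sha256_hex(candidate))


# Hypothetical training documents, assumed for illustration only.
documents = {
    "Note A": "Every Hamiltonian cycle visits each vertex exactly once.",
    "Note B": "Simulated annealing accepts worse moves with decreasing probability.",
}
manifest = build_manifest(documents)

print(lookup(manifest, documents["Note A"]))
print(lookup(manifest, documents["Note A"] + " "))
```

The obvious limitation is that a hash only matches exact copies. A single added space changes the digest, so this could confirm that a specific file was in the set but could not rule out paraphrases or reformatted versions.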

mwigdahl 2 days ago

Did you read the article? It was an open problem.

bluGill 2 days ago

Was it? It was an open problem to Knuth - who generally knows how to search literature. However, there is enough literature to search that it wouldn't be a surprise at all to discover it was already solved but he just used slightly different terms and so didn't find it. Or maybe it was solved because this is a specialization of something that looks unrelated and so he wouldn't have realized it when he read it. Or...

Overall I'm going with unsolved, because Knuth is a smart person who I'd expect to not miss the above. I'm also sure he falls for the above all the time even though the majority of the time he doesn't.

mwigdahl 2 days ago

Agreed with all of that, but with the added point that Knuth has done a lot of work in this exact area in The Art of Computer Programming Volume 4. If he considers this conjecture open given his particular knowledge of the field, it likely is (although agreed, it's not guaranteed).

ordu 2 days ago

> If he considers this conjecture open given his particular knowledge of the field, it likely is (although agreed, it's not guaranteed).

It is as good as guaranteed. If Knuth says he doesn't know how to solve the problem, and if anyone knows, then they will inform Knuth about it. Knuth is not just a very knowledgeable person, but a celebrity also.

skinner_ 24 hours ago

Also, if Claude had regurgitated a known solution, it would have come up with it in the first exploration round, not the 31st, as it actually did.