Meanwhile you have multiple Fields Medalists (Tao, Gowers) saying they're very impressed by LLMs' mathematical reasoning, something that the stochastic parrots thesis (if it has any empirically predictive content at all) would predict was impossible. I doubt Tao and Gowers thought much of LLMs a few years ago either. But they changed their minds. Who do you want to listen to?
I think it’s time to retire the Stochastic Parrots metaphor. A few years ago a lot of us didn’t think LLMs would ever be capable of doing what they can do now. I certainly didn’t. But new methods of training (RLVR) changed the game and took LLMs far beyond merely reducing cross-entropy on huge corpora of text. And so we changed our opinions. A shame Emily Bender hasn't changed hers.
Sigh.
Did you read TFA? This is precisely one of the non-questions that she answers.
Renaissance Philanthropies is a front for VC companies.
They never publish the allocated computational resources, prior art, or any novel algorithms used in the LLMs. For all we know, all the accounts known to work on math stunts get 20% of total compute.
In other words, they ignore prior art, do not investigate, and just celebrate if they get a vibe math result. It isn't science, it is a disgrace.
Not only would it be a leap to suggest that people automatically lose their integrity by taking funds for projects they believe are useful, especially after involvement with adjacent fields; are you also suggesting that merely being impressed by a fund is enough to dismiss their views?
You also have no evidence that Renaissance Philanthropies is a front for VC companies. All news coverage indicates that they seek to be an alternative for high net worth individuals engaging in philanthropy.
Many people discovering Erdős results, engaging in Olympiads, etc., are doing so with publicly available models and publish the resources used in the process.
It's rare to read an author who can directly face Brandolini's Law of misinformation asymmetry and not only hold his own against the bullshit but overcome it.
LLMs certainly use something similar, except they take text as input. LLMs, especially those used for marketing stunts, have far more computing power available than any theorem prover ever had. They probably do random restarts if a proof fails, which amounts to partially brute-forcing the search.
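To make the "random restarts as partial brute force" point concrete, here is a minimal sketch of what such a loop might look like. Everything in it is hypothetical: `attempt_proof`, the tactic pool, and the restart budget are made up for illustration, and a real pipeline would re-sample a candidate proof from the LLM on each restart and verify it with a checker such as Lean rather than cycling through a fixed list.

```python
# Hypothetical sketch of restart-based proof search; all names and numbers are illustrative.

MAX_RESTARTS = 32                          # assumed budget; real figures aren't published
TACTICS = ["simp", "linarith", "ring", "nlinarith", "omega"]   # illustrative tactic pool

def attempt_proof(goal: str, attempt: int) -> bool:
    """One sampled attempt. A real system would draft a proof with the LLM
    (different seed/temperature each restart) and verify it with a proof
    checker; here the toy 'model' just cycles through a fixed tactic pool."""
    candidate = TACTICS[attempt % len(TACTICS)]
    return candidate == "ring"             # toy: pretend only `ring` closes this goal

def prove_with_restarts(goal: str) -> tuple[bool, int]:
    # Restart from scratch after every failure: given enough compute this is
    # effectively a partial brute-force search over the model's outputs.
    for attempt in range(MAX_RESTARTS):
        if attempt_proof(goal, attempt):
            return True, attempt + 1
    return False, MAX_RESTARTS

print(prove_with_restarts("(a + b)^2 = a^2 + 2*a*b + b^2"))   # -> (True, 3)
```

With a large enough restart budget, even a weak proposal model will eventually stumble onto a checkable proof for easy goals, which is why compute budgets matter when judging these results.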
Lawrence Paulson correctly complained about some of the hype that Lean/LLMs are getting.
ACL2 even uses formulaic text output that describes the proof in human language, despite being all in Common Lisp and not a mythical clanker.
They do not think; they use old, well-established algorithms, or perhaps novel ones that were added.
They act as a learned proposal mechanism on top of hard search: suggesting relevant lemmas and tactics, turning intent into formal steps, and ranking branches based on trained knowledge.
Maybe a kind of learned "intuition engine", built from a large corpus of mathematical text, that still has to pass a formal checker. This is not really something we've had to this extent before.
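A rough sketch of that division of labour, under stated assumptions: `propose_tactics` stands in for the learned model that ranks candidate steps, `apply_tactic` stands in for the formal checker that accepts or rejects them, and the lookup table and tactic names are invented for illustration rather than taken from any real system.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class ProofState:
    goals: tuple[str, ...]          # open goals; empty tuple means the proof is done

def propose_tactics(state: ProofState) -> list[tuple[str, float]]:
    """Stand-in for the learned 'intuition engine': an LLM would score candidate
    lemmas/tactics for the current goal. Here it is a fixed toy table."""
    table = {"n + 0 = n":     [("simp", 0.9), ("rfl", 0.4)],
             "a * b = b * a": [("ring", 0.8), ("simp", 0.3)]}
    return sorted(table.get(state.goals[0], [("sorry", 0.0)]), key=lambda t: -t[1])

def apply_tactic(state: ProofState, tactic: str) -> ProofState | None:
    """Stand-in for the formal checker (e.g. a proof assistant's kernel):
    a suggestion only counts if the checker accepts the resulting step."""
    ok = tactic in ("simp", "ring", "rfl")
    return ProofState(state.goals[1:]) if ok else None

def search(state: ProofState, depth: int = 8) -> bool:
    # Best-first search: the model ranks the branches, the checker prunes them.
    if not state.goals:
        return True
    if depth == 0:
        return False
    for tactic, _score in propose_tactics(state):
        nxt = apply_tactic(state, tactic)
        if nxt is not None and search(nxt, depth - 1):
            return True
    return False

print(search(ProofState(("n + 0 = n", "a * b = b * a"))))   # -> True
```

The point of the sketch is only the architecture: the learned part narrows the branching factor, but nothing it proposes becomes a theorem until the checker accepts it.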
> They do not think
That claim seems less useful, unless “think” is defined in a way that predicts some difference in capability. If the objection is that LLMs are not conscious, fine, but that doesn't say much about whether they can help produce correct formal proofs.
Modelling text describing the world is not modelling (some aspect) of the world?
Modelling the probability that a reader likes or dislikes a piece of text is not modelling (some aspect) of a reader's state of mind?