
>> 1) Next token prediction can itself be argued to be a task that requires reasoning

That is wishful thinking popularised by Ilya Sutskever and Greg Brockman of OpenAI to "explain" why LLMs are a different class of system than smaller language models or other predictive models.

I'm sorry to say that (John Mearsheimer voice) that's simply not a serious argument. Take a multivariate regression model that predicts blood pressure from demographic data (age, sex, weight, etc.). You can train a pretty accurate model for that kind of task if you have enough data (a few thousand data points). Does that model need to "reason" about human behaviour in order to be good at predicting BP? Nope. All it needs is a lot of data. That's how statistics works. So why is it different for a predictive model of BP than for a next-token prediction model? The only answer seems to be "because language is magickal and special" - but there's never any attempt to explain why, in terms of sequence prediction, language is special. Unless the, er, reasoning is that humans can produce language, humans can reason, LLMs can produce language, therefore LLMs can reason; which obviously doesn't follow.
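
To make "all it needs is a lot of data" concrete, here's a toy sketch in plain NumPy (every number in it is made up for illustration): an ordinary least-squares fit on synthetic demographics predicts BP about as well as the noise allows, with nothing resembling reasoning anywhere in the pipeline.

    # Toy sketch: "predict BP from demographics" with ordinary least squares.
    # All data is synthetic and purely illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 5000
    age = rng.uniform(20, 80, n)
    sex = rng.integers(0, 2, n)                    # 0/1 encoding
    weight = rng.normal(80, 15, n)

    # Made-up "ground truth": BP is a noisy linear function of the inputs.
    bp = 90 + 0.5 * age + 4 * sex + 0.3 * weight + rng.normal(0, 8, n)

    X = np.column_stack([np.ones(n), age, sex, weight])
    coef, *_ = np.linalg.lstsq(X, bp, rcond=None)  # fit by least squares

    pred = X @ coef
    rmse = np.sqrt(np.mean((bp - pred) ** 2))
    print(f"RMSE: {rmse:.1f}")                     # ~8, i.e. near the noise floor

The fit is accurate because the statistics are easy, not because anything in it "understands" blood pressure.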

But I have to guess here, because neither Sutskever nor Brockman has ever tried to explain why next-token prediction needs reasoning (or, more precisely, "understanding", the term they have used).




  > That is wishful thinking popularised by Ilya Sutskever
Ilya and Hinton have claimed even crazier things.

  | to understand next token prediction you must understand the causal reality
This is objectively false; physics has known it to be wrong for centuries. You can probably reason out a weaker case yourself: I'm sure you can make accurate predictions about some things without fully understanding them.

But the stronger version is the entire difficulty of physics and causal modeling: distinguishing a confounding variable from a true cause is very, very hard. Yet you can still make accurate predictions without access to the underlying causal graph.
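
A toy sketch of that last point (variables and coefficients made up): regress y on a variable x that does not cause it at all and merely shares a hidden confounder with it, and the predictions are still good.

    # Toy sketch: accurate prediction without the causal graph.
    # x does NOT cause y; both are driven by a hidden confounder.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    confounder = rng.normal(0, 1, n)            # hidden common cause
    x = confounder + rng.normal(0, 0.3, n)      # observed, non-causal
    y = 2 * confounder + rng.normal(0, 0.3, n)  # outcome of interest

    # Simple regression of y on x alone.
    slope = np.cov(x, y)[0, 1] / np.var(x)
    intercept = y.mean() - slope * x.mean()
    pred = slope * x + intercept

    r2 = 1 - np.var(y - pred) / np.var(y)
    print(f"R^2: {r2:.2f}")                     # ~0.9, despite no x -> y arrow

Good prediction, zero causal understanding - which is exactly the gap in question.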


Hinton and Sutskever are victims of their own success: they can say whatever they like and nobody dares criticise them, or tell them how they're wrong.

I recently watched a video of Sutskever speaking to some students - not sure where, and I can't dig out the link now. To summarise, he told them that the human brain is a biological computer. He repeated this a couple of times, then said that this is why we can create a digital computer that can do everything a brain can.

This is the computational theory of mind, reduced to a pin-point with all context removed. Two seconds of thought suffice to show how that doesn't work: if a digital computer can do everything the brain can do, because the brain is a biological computer, then how come the brain can't do everything a digital computer can do? Is it possible that two machines can be both computers, and still not equivalent in every sense of the term? Nooooo!!! Biological computers!! AGI!!

Those guys really need to stop and think about what they're talking about before someone notices what they're saying and the entire field becomes a laughing stock.


> Two seconds of thought suffice to show how that doesn't work: if a digital computer can do everything the brain can do, because the brain is a biological computer, then how come the brain can't do everything a digital computer can do? Is it possible that two machines can be both computers, and still not equivalent in every sense of the term? Nooooo!!! Biological computers!! AGI!!

Another two seconds of thought would suffice to answer that: because you can freely change neither the hardware nor the software of the brain, the way you can with computers.

Obviously, Angry Birds on the phone can't do everything digital computers can do, but that doesn't mean a smartphone isn't a digital computer.


Another two seconds of thought might have told you that only a magic genie can "freely" change hardware and software capability.

Humans have to work within whatever constraints accompany being physical things with physical bodies trying to invent software and hardware in the physical world.


I'm fine with calling the brain a computer; "computer" is a very vague term. But yes, I agree that the conclusion does not necessarily follow. It's possible, just not necessarily so.


> Take a multivariate regression model that predicts blood pressure from demographic data (age, sex, weight, etc.). You can train a pretty accurate model for that kind of task if you have enough data (a few thousand data points). Does that model need to "reason" about human behaviour in order to be good at predicting BP? Nope. All it needs is a lot of data. That's how statistics works. So why is it different for a predictive model of BP than for a next-token prediction model?

For one, because the goal function for the latter is "predict output that makes sense to humans", in the fully broad, fully general sense of that statement.

It's not just one thing, like parse grocery lists, XOR write simple code, XOR write a story, XOR infer sentiment, XOR be a lossy cache for Wikipedia. It's all of them, separate or together, plus much more, plus correctly handling humor, sarcasm, surface-level errors (e.g. typos, naming), implied rules, shorthands, deep errors (think of a user being confused and using terminology wrong; LLMs handle that fine), and an uncountable number of other things (because language is special, see below). It's quite obvious this is a different class of thing than a narrowly specialized model like a BP predictor.

And yes, language is special. Despite Chomsky's protestations to the contrary, it's not really formally structured; all the grammar, syntax and vocabulary are merely classifications of high-level patterns that tend to occur (though the invention of print and public education definitely strengthened them). Any experience of learning a language, or of actually talking to other people, makes it obvious that grammar and vocabulary are neither necessary nor sufficient for communication. At the same time, though, once established, the particular choices become another dimension that packs meaning (as becomes apparent when pondering, e.g., why some books or articles seem better than others).

Ultimately, language is not a set of easy patterns you can learn (or code symbolically!) - it's a dance people do when communicating, whose structure is fluid and bound by the reasoning capabilities of humans. Being able to reason this way is required to communicate with real humans in real, generic scenarios. Now, this isn't proof that LLMs can do it, but the degree to which they excel at it is at least a strong suggestion that they qualitatively could be.



