
In other words, they learn the game, not how to play games.





They memorize the answers, not the process to arrive at the answers.

They learn the value of specific actions in specific contexts based on the rewards they received during their play time. Specific actions and specific contexts are not transferable for various reasons. John noted that varying frame rates and variable latency between action and effect really confuse the models.

Okay, so fuzz the frame rate and latency? That feels very easy to fix.
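For what it's worth, "fuzzing" the timing would look something like domain randomization. A minimal sketch, assuming a Gymnasium-style Atari environment; the wrapper name, skip range, and lag probability are made up for illustration, not anything from Carmack's actual setup:

    import random
    import gymnasium as gym

    class TimingJitter(gym.Wrapper):
        """Randomize action-to-effect timing: each chosen action is held for
        a random number of frames, and sometimes "arrives late" so the
        previous action is applied instead. Numbers are purely illustrative."""

        def __init__(self, env, skip_range=(2, 6), lag_prob=0.25):
            super().__init__(env)
            self.skip_range = skip_range
            self.lag_prob = lag_prob
            self.prev_action = 0  # NOOP in Atari action spaces

        def step(self, action):
            if random.random() < self.lag_prob:
                # Simulate input latency: apply last step's action instead.
                action, self.prev_action = self.prev_action, action
            else:
                self.prev_action = action

            total_reward = 0.0
            for _ in range(random.randint(*self.skip_range)):
                obs, reward, terminated, truncated, info = self.env.step(action)
                total_reward += reward
                if terminated or truncated:
                    break
            return obs, total_reward, terminated, truncated, info

Whether an agent trained behind a wrapper like this actually transfers any better is, of course, the open question.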

Good point, you should write to John Carmack and let him know you've figured out the problem.

This has been disproven so many times... They clearly do both. You can trivially prove this yourself.

> You can trivially prove this yourself.

Given the long list of dead philosophers of mind, if you have a trivial proof, would you mind providing a link?


Just go and ask ChatGPT or Claude something that can't possibly be in its training set. Make something up. If it is only memorising answers then it will be impossible for it to get the correct result.

A simple nonsense programming task would suffice. For example: "write a Python function to erase every character from a string unless either of its adjacent characters are also adjacent to it in the alphabet. The string only contains lowercase a-z"

That task isn't anywhere in its training set so they can't memorise the answer. But I bet ChatGPT and Claude can still do it.
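For the record, here's one plausible reading of that made-up task (keep a character only when a neighbouring character in the string is next to it in the alphabet); the function name and this exact interpretation are my own, so an LLM's answer might reasonably differ:

    def keep_alphabet_neighbors(s: str) -> str:
        """Keep a character only if the character immediately before or
        after it in the string is adjacent to it in the alphabet
        (e.g. 'c' next to 'b' or 'd'). Assumes s is lowercase a-z only."""
        def adjacent(a: str, b: str) -> bool:
            return abs(ord(a) - ord(b)) == 1

        kept = []
        for i, ch in enumerate(s):
            prev_ok = i > 0 and adjacent(ch, s[i - 1])
            next_ok = i < len(s) - 1 and adjacent(ch, s[i + 1])
            if prev_ok or next_ok:
                kept.append(ch)
        return "".join(kept)

    # 'a'/'b' and 'c'/'d' keep each other; 'z' and 'x' have no
    # alphabet-adjacent neighbours in the string and are erased.
    print(keep_alphabet_neighbors("abzxcd"))  # -> "abcd"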

Honestly this is sooooo obvious to anyone that has used these tools, it's really insane that people are still parroting (heh) the "it just memorises" line.


LLMs don't "memorize" concepts like humans do. They generate output based on token patterns in their training data. So instead of having to be trained on every possible problem, they can still generate output that solves it by referencing the most probable combination of tokens for the specified input tokens. To humans this seems like they're truly solving novel problems, but it's merely a trick of statistics. These tools can reference and generate patterns that no human ever could. This is what makes them useful and powerful, but I would argue not intelligent.

> To humans this seems like they're truly solving novel problems

Because they are. This is some crazy semantic denial. I should stop engaging with this nonsense.

We have AI that is kind of close to passing the Turing test and people still say it's not intelligent...


Depending on the interviewer, you could make a non-AI program pass the Turing test. It's quite a meaningless exercise.

Obviously I mean for a sophisticated interviewer. Not nonsense like the Loebner prize.

The Turing test is confined to chatting via a textual interface.

These machines are only able to output text.

It seems hard to think they could reasonably fool any -normal- person.

Tech only feels like magic if you don't know how it works


> Because they _are_.

Not really. Most of those seemingly novel problems are permutations of existing ones, like the one you mentioned. A solution is simply a specific permutation of tokens in the training data which humans are not able to see.

That's not to say the permutation can't be something that previously didn't exist, or even something that's actually correct, but those scenarios are much rarer.

None of this is to say that these tools can't be useful, but thinking that this is intelligence is delusional.

> We have AI that is kind of close to passing the Turing test and people still say it's not intelligent...

The Turing test was passed arguably decades ago. It's not a test of intelligence. It's an _imitation game_ where the only goal is to fool humans into thinking they're having a text conversation with another human. LLMs can do this very well.


People who say that LLMs memorize stuff are just as clueless as those who assume that there's any reasoning happening.

They generate statistically plausible answers (to put it simply) based on the training set and weights they have.


What if that’s all we’re doing, though?

Most of us definitely do :)

Or we do it most of the time :)


It’s really easy: go to Claude and ask it a novel question. It will generally reason its way to a perfectly good answer even if there is no direct example of it in the training data.

When LLMs come up with answers to questions that aren't directly exampled in the training data, that's not proof at all that they reasoned their way there; they can very much still be pattern matching, with no insight into the actual computation that generated the answer.

If we were taking a walk and you asked me to explain a mathematical concept I have not actually studied, I am fully capable of hazarding a casual guess within seconds, based on the other topics I have studied. This is the default approach of an LLM, except with much greater breadth and recall of studied topics than I, as a human, have.

This would be very different from a scenario where we sat down in a library and I applied the various concepts and theorems I already knew to make inferences, built upon them, and then derived an understanding by reasoning through the steps I took (often after backtracking from several dead ends) before providing the explanation.

If you ask an LLM to explain its reasoning, it's unclear whether it just guessed the explanation and reasoning too, or whether that was actually the set of steps it took to get to the first answer it gave you. This is why LLMs are able to correct themselves after claiming strawberry has 2 r's: when providing (i.e. guessing again) their explanations, they make more "relevant" guesses.


I'm not sure what "just guessed" means here. My experience with LLMs is that their "guesses" are far more reliable than a human's casual guess. And, as you say, they can provide cogent "explanations" of their "reasoning." Again, you say they might be "just guessing" at the explanation, but what does that really mean if the explanation is cogent and seems to provide at least a plausible explanation for the behavior? (By the way, I'm sure you know that plenty of people think that human explanations for their behavior are also mere narrative reconstructions.)

I don't have a strong view about whether LLMs are really reasoning -- whatever that might mean. But the claim I was responding to is that LLMs have simply memorized all the answers. That is clearly not true under any normal meaning of those words.


LLMs clearly don't reason in the same way that humans or SMT solvers do. That doesn't mean they aren't reasoning.

How do you know it’s a novel question?

You have probably seen examples of LLMs doing the "mirror test", i.e. identifying themselves in screenshots and referring to the screenshot in the first person. That is a genuinely novel question, as an "LLM mirror test" wasn't a concept that existed before about a year ago.

Elephant mirror tests existed, so it doesn’t seem all that novel when the word “elephant” could just be substituted for the word “LLM”?

The question isn't about universal novelty, but whether the prompt/context is novel enough such that the LLM answering competently demonstrates understanding. The claim of parroting is that the dataset contains a near exact duplicate of any prompt and so the LLM demonstrating what appears to be competence is really just memorization. But if an LLM can generalize from an elephant mirror test to an LLM mirror test in an entirely new context (showing pictures and being asked to describe it), that demonstrates sufficient generalization to "understand" the concept of a mirror test.

How do you know it’s the one generalizing?

Likely there has been at least one text that already does that for, say, dolphin mirror tests or chimpanzee mirror tests.


It's not exactly difficult to come up with a question that's so unusual the chance of it being in the training set is effectively zero.

And as any programmer will tell you: they immediately devolve into "hallucinating" answers, not trying to actually reason about the world. Because that's what they do: they create statistically plausible answers even if those answers are complete nonsense.

Can you provide some examples of these genuinely unique questions?

I'm not sure what you mean by "genuinely." But in the coding context LLMs answer novel questions all the time. My codebase uses components and follows patterns that an LLM will have seen before, but the actual codebase is unique. Yet, the LLM can provide detailed explanations about how it works, what bugs or vulnerabilities it might have, modify it, or add features to it.

It must not have existed prior in any text database whatsoever.

It certainly wasn't. The codebase is thousands of lines of bespoke code that I just wrote.

Yet pretty much every line in it was written similarly somewhere else before, explanations included, and is somehow part of the massive data set it was trained on.

So far I have asked the AI some novel questions and it came up with novel answers full of hallucinated nonsense, since it copied some similarly named setting or library function and replaced part of its name with something I was looking for.


And this training data somehow includes an explanation of how these individual lines (with variable names unique to my application) work together in my unique combination to produce a very specific result? I don't buy it.

And...

> pretty much

Is it "pretty much" or "all"? The claim that the LLM has simply memorized all of its responses seems to require "all."


yeahhhh why isn't there a training structure where you play 5000 games, and the reward function is based on doing well in all of them?

I guess it's a totally different level of control: instead of immediately choosing a certain button to press, you need to set longer-term goals. "Press whatever sequence over this time I need to do to end up closer to this result."

There is some kind of nested, multidimensional thing to train on here instead of immediate limited choices.
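What's being described is basically multi-task RL: one policy, episodes drawn from a pool of games, so the only way to score well on average is to do well in all of them. A toy sketch of the loop; make_env, policy, and update are placeholders supplied by the caller, not any real library's API:

    import random

    GAMES = ["Pong", "Breakout", "SpaceInvaders"]  # ...up to 5000 in principle

    def train(policy, make_env, update, episodes=10_000):
        """Train one policy across many games. Each episode comes from a
        randomly chosen game, so the reward signal averages over all of them."""
        for _ in range(episodes):
            env = make_env(random.choice(GAMES))
            obs, _ = env.reset()
            done, trajectory = False, []
            while not done:
                action = policy.act(obs)
                obs, reward, terminated, truncated, _ = env.step(action)
                trajectory.append((obs, action, reward))
                done = terminated or truncated
            update(policy, trajectory)  # any standard RL update rule

The hard part, presumably, isn't writing the loop but getting the policy to learn something that transfers rather than 5000 separate value functions.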


Well yeah... If you only ever played one game in your life you would probably be pretty shit at other games too. This does not seem very revealing to me.

I am decent at chess but barely know how the pieces in Go move.

Of course, this is because I have spent a lot of time TRAINING to play chess and basically none training to play Go.

I am good on guitar because I started training young but can't play the flute or piano to save my life.

Most complicated skills have basically no transfer or carry over other than knowing how to train on a new skill.


But the point here is: if I gave you a guitar with one string more or fewer, or a differently shaped guitar, you could play it.

If I gave you a chess set with dwarf-themed pieces and different-colored squares, you could play immediately.


I don't think that's true. If you'd only ever played Doom, I think you could play, say, Counter-Strike or Half-Life and be pretty good at it, and I think Carmack is right that it's pretty interesting that this doesn't seem to be the case for AI models.


