Disclaimer: I am very well aware this is not a valid test or indicative of anything else. I just thought it was hilarious.
When I asked the usual "How many 'r' in strawberry" question, it gets the right answer and argues with itself until it convinces itself that it's 2. It counts properly, and then keeps telling itself that can't be right.
Skynet sends Terminator to eradicate humanity, the Terminator uses this as its internal reasoning engine... "instructions unclear, dick caught in ceiling fan"
It's funny because this simple exercise shows all the problems I have with the reasoning models: they produce long reasoning that takes too much time to verify and still can't be trusted.
I may be looking at this too deeply, but I think this suggests that the reasoning is not always utilized when forming the final reply.
For example, IMMEDIATELY, in its first section of reasoning where it starts counting the letters:
> R – wait, is there another one? Let me check again. After the first R, it goes A, W, B, E, then R again, and then Y. Oh, so after E comes R, making that the second 'R', and then another R before Y? Wait, no, let me count correctly.
1. During its counting process, it repeatedly finds 3 "r"s (at positions 3, 8, and 9)
2. However, its intrinsic knowledge that "strawberry" has "two Rs" keeps overriding this direct evidence
3. This suggests there's an inherent weight given to the LLM's intrinsic knowledge that takes precedence over what it discovers through step-by-step reasoning
To me that suggests an inherent weight (unintended pun) given to its "intrinsic" knowledge, as opposed to what is presented during the reasoning.
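For reference, a trivial Python check of what the letter-by-letter count actually gives (matching the positions 3, 8, and 9 it keeps finding):

```python
word = "strawberry"

# 1-based positions of every 'r' in the word
positions = [i + 1 for i, ch in enumerate(word) if ch.lower() == "r"]

print(positions)       # [3, 8, 9]
print(len(positions))  # 3
```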
Strawberry is "difficult" not because the reasoning is difficult, but because tokenization doesn't let the model reason at the level of characters. That's why it has to work so hard and doesn't trust its own conclusions.
Yeah, but it clearly breaks down the spelling correctly in its reasoning, e.g. a letter per line. So it gets past the tokenization barrier, but still gets hopelessly confused.
How excellent for a quantized 27GB model (the Q6_K_L GGUF quantization type uses 8 bits per weight in the embedding and output layers, since they're sensitive to quantization).
I wonder if the reason the models have problems with this is that their tokens aren't the same as our characters. It's like asking someone who can speak English (but doesn't know how to read) how many R's there are in strawberry. They are fluent in English audio tokens, but not written tokens.
The way LLMs get it right by counting the letters, then change their answer at the last second, makes me feel like there might be a large amount of text somewhere in the dataset (e.g. a Reddit thread) that repeats over and over that there is the wrong number of Rs. We've seen many weird glitches like this before (e.g. a specific Reddit username that would crash ChatGPT).
The amazing thing continues to be that they can ever answer these questions correctly.
It's very easy to write a paper in the style of "it is impossible for a bee to fly" for LLMs and spelling. The incompleteness of our understanding of these systems is astonishing.
Is that really true? Like, the data scientists making these tools are not confident why certain patterns are revealing themselves? That’s kind of wild.
Yeah that’s my understanding of the root cause. It can also cause weirdness with numbers because they aren’t tokenized one digit at a time. For good reason, but it still causes some unexpected issues.
I believe DeepSeek models do split numbers up into digits, and this provides a large boost to ability to do arithmetic. I would hope that it's the standard now.
Could be the case; I'm not familiar with their specific tokenizers. IIRC Llama 3 tokenizes numbers in chunks of three digits. That seems better than arbitrarily sized chunks with BPE, but still kind of odd: the embedding layer has to learn the semantics of 1000 different number tokens, some of which overlap in meaning in some cases and not in others, e.g. 001 vs 1.
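A toy sketch of the two schemes (not any real tokenizer's actual rules, just the idea):

```python
def digit_tokens(s: str) -> list[str]:
    # One token per digit (the scheme attributed to DeepSeek above).
    return list(s)

def chunked_tokens(s: str, size: int = 3) -> list[str]:
    # Groups of up to `size` digits (roughly the Llama-3-style scheme mentioned
    # above; real tokenizers differ in details like chunking direction).
    return [s[i:i + size] for i in range(0, len(s), size)]

print(digit_tokens("1234567"))    # ['1', '2', '3', '4', '5', '6', '7']
print(chunked_tokens("1234567"))  # ['123', '456', '7']

# With chunking, '001' and '1' end up as distinct tokens whose meanings overlap
# only sometimes, which the embedding layer has to learn separately.
print(chunked_tokens("001"), chunked_tokens("1"))  # ['001'] ['1']
```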
This was my first prompt after downloading too, and I got the same thing. Just spinning again and again on its gut instinct that there must be 2 R's in strawberry, despite the counting always being correct. It just won't accept that the word is spelled that way and that its logic is correct.
That gut-feeling approach is very human-like. You have a bias, and even when the facts say you are wrong, you think there must be a mistake, because your original bias is so strong.
Maybe we need a dozen LLMs with different biases. Let them try to convince the main reasoning LLM that it’s wrong in various ways.
Or just have an LLM that is trained on some kind of critical thinking dataset where instead of focusing on facts it focuses on identifying assumptions.
1/3 chance you picked the door with the car, 2/3 chance it's behind one of the other two doors.
These probabilities don't change just because one of the doors is subsequently opened.
So, Monty now opens one of the other 2 doors and the car isn't there, but there is still a 2/3 chance that it's behind ONE of those 2 other doors, and having eliminated one of them, this means there's a 2/3 chance it's behind the other one!!
So, do you stick with your initial 1/3 chance of being right, or go with the other closed door that you NOW know (new information!) has a 2/3 chance of being right?!
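If anyone wants to check this empirically, here's a minimal simulation sketch of the standard rules (the host knows where the car is and never opens it):

```python
import random

def play(num_doors: int = 3, switch: bool = True) -> bool:
    doors = range(num_doors)
    car = random.randrange(num_doors)
    pick = random.randrange(num_doors)

    # The host opens every door except your pick and one other,
    # never revealing the car. That one other closed door is the car
    # whenever your pick was a goat, otherwise a random goat.
    if pick == car:
        other = random.choice([d for d in doors if d != pick])
    else:
        other = car

    return (other if switch else pick) == car

trials = 100_000
for switch in (False, True):
    wins = sum(play(3, switch) for _ in range(trials))
    print(f"switch={switch}: {wins / trials:.3f}")  # ~0.333 staying, ~0.667 switching
```

Running the same sketch with num_doors=100 gives roughly 0.01 for staying and 0.99 for switching, which is the 100-door version discussed below.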
If you get to pick one and he opens 98 of the remaining ones, obviously you would switch to the remaining one you didn't pick, since 99/100 times the winning door will be in his set.
On the initial choice, yes. But on the second choice, that other door is a single door carrying the sum of the odds of the other 99 doors. So your second choice would be to keep the door you initially chose (1/100) or select the other door (99/100).
Remember, the host always knows which is the correct door, and if you selected incorrectly on the initial choice they will ALWAYS select the correct door for the second choice.
I thought it would be obvious that I’m not arguing the statistical facts, but the idea that “it is easier to think about” the 100 doors scenario. There is simply no straightforward explanation that works for laypeople.
I think the issue most laypeople have is accepting that the host opening a door changes the odds of winning, because he knows where the prize is.
I think the easiest way to demonstrate that this is true is to play the same game with two doors, except the host doesn't open the other door if it has the prize behind it. This makes it obvious that the act of opening the door changes the probability of winning, because if the host opens the other door, you now have 100% chance of winning if you don't switch. Similarly, if they don't open the other door, you have a 0% chance of winning, and should switch. It's the fact that the host knows and chooses that is important.
It's only once you get over that initial hurdle that the 100 door game becomes "obvious". You know from the two door example that the answer isn't 50/50, and so the only answer that makes sense is that the probability mass gets concentrated in the other door.
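A related way to make that concrete (a hedged sketch of the classic "ignorant host" variant, not the two-door game described above): compare a host who knowingly avoids the car with one who opens an unchosen door at random. Conditioned on a goat being revealed, the ignorant host leaves you at 50/50, while the knowing host leaves switching at 2/3.

```python
import random

def trial(host_knows: bool):
    car = random.randrange(3)
    pick = random.randrange(3)
    others = [d for d in range(3) if d != pick]

    if host_knows:
        # The knowing host always opens a goat door
        # (which goat he picks doesn't affect the result).
        opened = others[0] if others[0] != car else others[1]
    else:
        # The ignorant host opens a random unchosen door and may reveal the car.
        opened = random.choice(others)
        if opened == car:
            return None  # game spoiled; discard this round

    stay_wins = pick == car
    switch_wins = next(d for d in others if d != opened) == car
    return stay_wins, switch_wins

def rates(host_knows: bool, trials: int = 100_000):
    results = [r for r in (trial(host_knows) for _ in range(trials)) if r is not None]
    stay = sum(s for s, _ in results) / len(results)
    switch = sum(w for _, w in results) / len(results)
    return stay, switch

print(rates(host_knows=True))   # roughly (0.33, 0.67)
print(rates(host_knows=False))  # roughly (0.50, 0.50)
```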
It's probably easier for most people not to think of them as two remaining doors, but as two remaining sets. Originally, with one hundred doors and the goal object behind only one of them, there is a 1/100 probability it is behind the initially chosen door, which comprises one set, and a 99/100 probability it is behind one of the doors in the set of not-originally-chosen doors. If 98 of the 99 doors in that second set are then ruled out as hiding the goal object, this does not change the 99/100 probability that the goal object is behind a door in that set; it just means it isn't behind one of the opened doors.
Chasing this tangent a bit -- I have never been happy with the Monty Hall problem as posed.
To me the problem is that it is posed as a one-shot question. If you were in this actual situation, how do you know that Monty is not deliberately trying to make you lose? He could, for example, have just let you open the first door you picked, revealing the goat. But if he instead chose to ask you to switch, maybe that is a big hint that you picked the right door the first time?
If the game is just "you will pick a door, he will reveal another door, and then you can choose to switch", then clearly the "usual" answer is correct: always switch, because the only way switching loses is if you guessed correctly the first time (1/3).
But if the game is "try to find the car while the host tries to make you lose" then you should never switch. His ideal behavior is that if you pick the door with the goat then he gives you the goat; if you pick the door with the car then he tries to get you to switch.
If his desire is for the contestant to lose, then formally he can't do better than winning 2/3 of the time, which he gets by simply opening the door they chose. In practice, opening the contestant's door whenever it hides a goat and offering a switch whenever it hides the car can do slightly better than 2/3, because some contestants, unaware of his strategy and objectives, might choose to switch.
If his objective is more subtle -- increasing suspense or entertainment value or getting a kick out of people making a self-destructive choice or just deciding whether he likes a contestant -- then I'm not sure what the metrics are or what an optimal strategy would be in those cases.
Given that his motives are opaque and given no history of games upon which to even inductively reason, I don't think you can reach any conclusion about whether switching is preferable. Given the spread of possibilities I would tend to default to 50/50 for switch/no-switch, but I don't have a formal justification for this.
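Going back to the simple adversarial case above, here's a rough sketch under the stated assumption that the host opens your door immediately when you picked a goat and only offers a switch when you picked the car; it just confirms that against this host a switcher always loses, while a non-switcher keeps the baseline 1/3:

```python
import random

def adversarial_round(contestant_switches: bool) -> bool:
    car = random.randrange(3)
    pick = random.randrange(3)

    if pick != car:
        # The host simply opens the contestant's door: instant loss.
        return False

    # The contestant picked the car; the host opens a goat door and offers a switch.
    return not contestant_switches  # switching away from the car always loses

trials = 100_000
for switches in (False, True):
    wins = sum(adversarial_round(switches) for _ in range(trials))
    print(f"always switch={switches}: win rate {wins / trials:.3f}")
# never switch: ~0.333, always switch: 0.000
```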
I think it's great that you can see the actual chain of thought behind the model, not just the censored one from OpenAI.
It strikes me that it's both so far from getting it correct and also so close. I'm not an expert, but it feels like it could be just an iteration away from being able to reason through a problem like this. Which, if true, is an amazing step forward.
How long until we get to the point where models know that LLMs get this wrong, and that it is an LLM, and therefore answers wrong on purpose? Has this already happened?
(I doubt it has, but there ARE already cases where models know they are LLMs, and therefore make the plausible but wrong assumption that they are ChatGPT.)
My understanding is that the model does not "know" it is an LLM. It is prompted (in the app's system prompt) or trained during RLHF to answer that it is an LLM.
I think there is an inherent weight associated with the intrinsic knowledge as opposed to the reasoning steps, since intrinsic knowledge can override reasoning.
I feel like one round of RL could potentially fix "short circuits" like these. It seems to be convinced that a particular rule isn't "allowed," when it's totally fine. Wouldn't that mean that you just have to fine tune it a bit more on its reasoning path?
If I asked you, "Hey, how many Rs in strawberry?", you're going to tell me 2, because the likelihood is I'm asking about the ending Rs. That's at least how I'd interpret the question without the "LLM test" clouding my vision.
Same if I asked about "gullible": I'd say "it's a double L after the u".
This is from a small model. 32B and 70B answer this correctly. "Arrowroot" too. Interestingly, 32B's "thinking" is a lot shorter and it seems to be more "sure". Could be because it's based on Qwen rather than LLaMA.
My models are both 4 bit. But yeah, that could be - small models are much worse at tolerating quantization. That's why people use LoRA to recover the accuracy somewhat even if they don't need domain adaptation.
How would they build guardrails for this? In CFD (physical simulation with ML), they talk about using physics-informed models instead of purely statistical ones. How would they make language models that are informed by formal rules and concepts of English?
https://gist.github.com/IAmStoxe/1a1e010649d514a45bb86284b98...