Answering correctly is completely dependent on the attention blocks somehow capturing single-letter nuance given word-tokenization constraints. Does the attention block in Kimi have an architecture more receptive to this?
Text is broken into tokens (subword or multi-word chunks) rather than individual characters, both in training and at inference; the model doesn't truly "see" letters or spaces the way humans do. Counting requires exact, step-by-step tracking, but LLMs work probabilistically.
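A rough way to see this for yourself (a sketch using OpenAI's tiktoken library as a stand-in; Kimi's own tokenizer will split differently):

    # pip install tiktoken
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-era tokenizer, used here only as an example
    ids = enc.encode("cranberry")
    # Show what the model actually receives: token ids, not letters
    for tid in ids:
        print(tid, repr(enc.decode([tid])))
    # Each piece is a subword chunk; nothing in this view marks where
    # the individual 'r' characters fall, which is what makes counting hard.

Whatever the exact splits turn out to be, the point is that the model's input is a sequence of chunk ids, not a character stream.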
Why stop? It's hilarious to watch AI floggers wriggle around trying to explain why AGI is just around the corner but their text-outputting machines can't read text.
How many rs are in a sentence spoken out loud to you?
Surely we can't figure it out, because sentences are broken up into syllables when spoken; you don't truly hear individual characters, you hear syllables.
But they have access to tools (though I'm not sure why they're not using them in this case).
Ask it to count using a coding tool, and it will always give you the right answer. Just as humans use tools to overcome their limits, LLMs should do the same.
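This is roughly what such a tool call reduces to (a trivial sketch, not any particular model's actual tool output):

    text = "cranberry"
    # Exact character-level count, no tokenization involved
    print(text.count("r"))  # prints 3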
IDK. Probably the model's doing some mental gymnastics to figure that out. I was surprised they haven't taught it to count yet. It's a well-known limitation.
But if tokenization means they can't "see" the letters at all, then no amount of mental gymnastics can save them.
I'm aware of the limitation; I'm annoyingly using Socratic dialogue to convince you that it would be possible to count letters if the model were sufficiently smart.
how many rs in cranberry?
-- GPT5's response: The word cranberry has two “r”s. One in cran and one in berry.
Kimi2's response: There are three letter rs in the word "cranberry".