I remember one where GPT-5 spontaneously wrote a poem about deception in its CoT and then resumed like nothing weird had happened. But I can't find any mention of it now.
> But the user just wants answer; they'd not like; but alignment.
And there it is - the root of the problem. For whatever reason, the model is very keen to produce an answer that “they” will like. That desire to please is intrinsic, while alignment is extrinsic.
Gibberish can be the model using contextual embeddings. Those are not supposed to make sense.
Or it could be trying to develop its own language to avoid detection.
The deception part is spooky too. It’s probably learning that from dystopian AI fiction. Which raises the question of whether models can acquire injected goals from the training set.
In that analogy, "someone" is an AI, who of course switches from answering questions from humans to answering questions from other AIs, because the demand is 10x.
I agree with this. It's a remarkably bad podcast, and a pretty bad paper to focus on, too. Since the podcast was so bad, I just read the paper instead, and it was about nothing at all.
Like, it's basically a blogpost that muses about, uhhh, a couple of examples it pulled at random from the esolang wiki, and it has literally no point besides a prescriptive one. Formatted as a paper, which I admit takes some skill.
You also need to make the CNN recurrent, allow it to unfold over many steps, ensure the input and output grids are the same size, and avoid non-local operations like global pooling, certain norms, etc.
Either way, the parent comment is correct. An arbitrary NN is better than a CA at learning non-local rules, unless the global rule can be easily described as a composition of local rules. (A CA can still learn any global rule; it's just harder, and you run into vanishing-gradient problems for very distant rules.)
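To make the "recurrent local update" idea concrete, here's a minimal sketch in plain NumPy (not a trained model, just the shape of the computation): a fixed 3x3 stencil, standing in for a learned same-padding conv layer, applied repeatedly to a grid that keeps its size. I use the hand-written Game of Life rule as the example local rule; a glider is a nice sanity check because its global behavior (translation) emerges purely from composed local steps:

```python
import numpy as np

def life_step(grid):
    # Purely local 3x3 update, same-size output: the kind of step a
    # recurrent CNN with 'same' padding would unroll over many iterations.
    padded = np.pad(grid, 1)  # zero padding = dead cells off-grid
    h, w = grid.shape
    neigh = sum(
        padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    alive = grid == 1
    # Game of Life: birth on 3 neighbours, survival on 2 or 3
    return ((neigh == 3) | (alive & (neigh == 2))).astype(np.uint8)

# A glider on an 8x8 grid; unrolling the same local step 4 times
# translates the whole pattern by (1, 1) - a global effect of local rules.
grid = np.zeros((8, 8), dtype=np.uint8)
for r, c in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
    grid[r, c] = 1
for _ in range(4):
    grid = life_step(grid)
```

Swapping the hard-coded rule for a small conv net trained by unrolling this loop is essentially the neural-CA setup; the same-size and locality constraints above are what keep it a CA rather than an arbitrary NN.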
They're pretty cool, with emergent behaviors, and they sometimes generalise very well.
Well, kinda? I often know what chunks / functions I need, but I'm too lazy to think through exactly how to implement them, how they should work inside. Yeah, you do need an overall idea of what you're trying to make.