Before we get models that we can’t possibly understand, before they are complex enough to hide their chain of thought (CoT) from us, we need them to have a baseline understanding that destroying the world is bad.
It may feel like the company censoring users at this stage, but there will come a stage where we’re no longer really driving the bus. That’s what this stuff is ultimately for.
Most humans seem to understand it, more or less. For the ones that don't, there are generally enough who do that we're eventually able to stop them.
I think that's the best shot here as well. You want the first AGIs and the most powerful AGIs and the most common AGIs to understand it. Then when we inevitably get ones that don't, intentionally or unintentionally, the more-aligned majority can help stop the misaligned minority.
Whether that actually works, who knows. But it doesn't seem like anyone has come up with a better plan yet.
This is more like saying the aligned humans will stop the unaligned humans from causing deforestation and climate change... they might, but the amount of environmental damage we've caused in the meantime is catastrophic.