
Why do you think that AGI necessitates modification of the model during use? Couldn’t all the insights the model gains be contained in the context given to it?


Because time marches on and with it things change.

You could maybe accomplish this if you could fit all new information into context, or with cycles of compression, but that's kind of a crazy ask. There's too much new information, even considering compression. It certainly wouldn't allow for exponential growth (I'd expect sub-linear).

I think a lot of people greatly underestimate how much new information is created every day. It's hard to see if you're not working on any research and watching how incremental but constant improvement compounds. But try just looking at whatever company you work for. Do you know everything that people did that day? It takes more time to generate information than to process it, so that's on your side, but do you really think you could keep up? Maybe at a very high level, but in that case you're missing a lot of information.

Think about it this way: if that could be done, then LLMs wouldn't need training or fine-tuning, because you could do everything through prompting.


The specific instance doesn’t need to know everything happening in the world at once to be AGI though. You could feed the trained model different contexts based on the task (and even let the model tell you what kind of raw data it wants) and it could still hypothetically be smarter than a human.

I’m not saying this is a realistic or efficient method to create AGI, but I think the argument "model is static once trained -> model can't be AGI" is fallacious.


I think that makes a lot of assumptions about the size of the data and what can be efficiently packed into prompts. Even if we assume all info in a prompt is weighted equally while in context, and that the model compresses information into the prompt before it falls out of context, you're going to run into compounding effects pretty quickly.

You're right, you don't technically need infinite context, but we're still talking about exponential growth, and I don't think that effectively changes anything.



Like I already said, the model can remember stuff as long as it’s in the context. LLMs can obviously remember stuff they were told or output themselves, even a few messages later.


AGI needs to genuinely learn and build new knowledge from experience, not just generate creative outputs based on what it has already seen.

LLMs might look “creative”, but they are just remixing patterns from their training data and what is in the prompt. They can't actually update themselves or remember new things after training, as there is no ongoing feedback loop.

This is why you can’t send an LLM to medical school and expect it to truly “graduate”. It cannot acquire or integrate new knowledge from real-world experience the way a human can.

Without a learning feedback loop, these models are unable to interact meaningfully with a changing reality or fulfill the expectation we have of an AGI: contributing to new science and technology.


I agree that this is kind of true with a plain chat interface, but I don’t think that’s an inherent limit of an LLM. I think OpenAI actually has a memory feature where the LLM can specify data it wants to save and can then access later. I don’t see why this in principle wouldn’t be enough for the LLM to learn new data as time goes on. All possible counterarguments seem related to scale (of memory and context size), not the principle itself.
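
To make that concrete, here is a minimal sketch of the save-then-retrieve pattern I have in mind. The function names, the JSONL file, and the keyword lookup are hypothetical illustrations, not OpenAI's actual memory API; a real system would presumably use embeddings and proper retrieval:

  # Hypothetical sketch of an LLM "memory" tool: the model asks to save a note,
  # and relevant notes get prepended to later prompts. Not any vendor's real API.
  import json

  MEMORY_FILE = "memories.jsonl"

  def save_memory(note: str) -> None:
      # Called when the model decides something is worth remembering.
      with open(MEMORY_FILE, "a") as f:
          f.write(json.dumps({"note": note}) + "\n")

  def recall_memories(query: str, limit: int = 5) -> list[str]:
      # Naive keyword recall; real systems would use embeddings or search.
      hits = []
      try:
          with open(MEMORY_FILE) as f:
              for line in f:
                  note = json.loads(line)["note"]
                  if any(word.lower() in note.lower() for word in query.split()):
                      hits.append(note)
      except FileNotFoundError:
          pass
      return hits[:limit]

  # At inference time, recalled notes are stuffed into the prompt, so the
  # "learning" lives outside the frozen weights.
  save_memory("User's paper deadline is March 3rd.")
  prompt = "When is the user's deadline?"
  full_prompt = "\n".join(recall_memories(prompt)) + "\n\n" + prompt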

Basically, I wouldn’t say that an LLM can never become AGI due to its architecture. I’m also not saying that LLMs will become AGI (I have no clue), but I don’t think the architecture itself makes it impossible.


LLMs lack mechanisms for persistent memory, causal world modeling, and self-referential planning, all core requirements for AGI. Their transformer architecture is static, which fundamentally constrains dynamic reasoning and adaptive learning.

So yeah, AGI is impossible with today’s LLMs. But at least we got to watch Sam Altman and Mira Murati drop their voices an octave onstage and announce “a new dawn of intelligence” every quarter. Remember Sam Altman’s $7 trillion?

Now that the AGI party is over, it’s time to sell those NVDA shares and prepare for the crash. What a ride it was. I’m grabbing the popcorn.


  > the model can remember stuff as long as it’s in the context.

You would need an infinite context or compression.

Also, you might be interested in this theorem:

https://en.wikipedia.org/wiki/Data_processing_inequality
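
Roughly, the theorem says post-processing can't create information: for a Markov chain X -> Y -> Z (think: the world -> a compressed context -> the model's output), the output can't carry more information about the world than the context does. Mapping it onto context compression is my gloss, but the statement itself is standard, with I denoting mutual information:

  X \to Y \to Z \;\Longrightarrow\; I(X;Z) \le I(X;Y)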


> You would need an infinite context or compression

Only if AGI required infinite knowledge, which it doesn’t.


You're right, but compounding effects get out of hand pretty quickly. There's a certain point where finite is not meaningfully different from infinite, and that threshold is a lot lower than you're accounting for. There's only so much compression you can do, so even if that new information is not that large, it'll be huge in no time. Compounding functions are a whole lot of fun... try running something super small, like only 10 GB of new information a day, and see how quickly that grows: you're in the terabyte range well before you're halfway into the year.
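
A quick back-of-the-envelope in Python, just to show the accumulation; the 10 GB/day figure, the ~1M-token context, and the 100x compression factor are all assumptions for illustration:

  # Back-of-the-envelope: daily new information vs. a fixed context window.
  # All figures below are illustrative assumptions, not measurements.
  GB = 1024 ** 3
  TB = 1024 ** 4

  new_info_per_day = 10 * GB        # assumed 10 GB of genuinely new info per day
  context_bytes = 1_000_000 * 4     # ~1M-token context at ~4 bytes/token (assumed)
  compression = 100                 # very generous compression factor (assumed)

  for day in (1, 30, 100, 182):
      raw = day * new_info_per_day
      compressed = raw / compression
      print(f"day {day:3d}: raw {raw / TB:5.2f} TB, "
            f"compressed {compressed / GB:6.1f} GB, "
            f"fits in context: {compressed <= context_bytes}")

Even a single day at 100x compression blows past a million-token context, and by half a year you're sitting on roughly 1.8 TB of raw material.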


This seems kind of irrelevant? Humans have General Intelligence while having a context window of, what, 5MB, to be generous. Model weights only need to contain the capacity for abstract reasoning and querying relevant information. That they currently hold real-world information at all is kind of an artifact of how models are trained.


  > Humans have General Intelligence while having a context window

Yes, but humans also have more than a context window. They also have more than memory (weights). There are a lot of things humans have besides memory. For example, human brains are not a static architecture: new neurons as well as pathways (including between existing neurons) are formed and destroyed all the time. This doesn't stop either; it continues throughout life.

I think your argument makes sense but oversimplifies the human brain. Once we start considering that complexity, the argument no longer holds. It's also why a lot of AGI research is focused on things like "test-time learning" or "active learning", not to mention many other areas, including dynamic architectures.


For starters, if it were superintelligent it would eventually make discoveries. New discoveries were, by definition, not in the original training set. The model needs to be trained on the new discovery to make use of it in the future.

As it is, it has to keep "rediscovering" the same thing each and every time, no matter how many inferences you run.



