> GPT‑5 is a unified system . . .

OK

> . . . with a smart and fast model that answers most questions, a deeper reasoning model for harder problems, and a real-time router that quickly decides which model to use based on conversation type, complexity, tool needs, and explicit intent (for example, if you say “think hard about this” in the prompt).
So that's not really a unified system, then; it's just supposed to appear as if it is.
This looks like they're not training a single big model but have instead gone off to develop specialized sub-models and are attempting to gloss over them with yet another model. That's what you resort to only when end-to-end training has become too expensive for you.
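To make the "router" part concrete: as described, it amounts to a classifier sitting in front of two endpoints. Here's a minimal sketch of that idea, purely for illustration; the model names, thresholds, and the needs_reasoning heuristic are my own assumptions, and the real router is presumably a learned model rather than hand-written rules.

```python
# Hypothetical sketch of a prompt router: inspect the request, then
# dispatch to a fast model or a deeper reasoning model. Everything
# here is illustrative, not OpenAI's actual implementation.

def needs_reasoning(prompt: str) -> bool:
    # Crude stand-ins for "complexity, tool needs, and explicit intent".
    explicit = any(cue in prompt.lower() for cue in ("think hard", "step by step"))
    looks_complex = len(prompt) > 2000 or "prove" in prompt.lower()
    return explicit or looks_complex

def route(prompt: str) -> str:
    # In practice the router would itself be a model; here it's a rule.
    return "reasoning-model" if needs_reasoning(prompt) else "fast-model"

print(route("What's the capital of France?"))    # fast-model
print(route("Think hard about this proof ..."))  # reasoning-model
```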
I know this is just arguing semantics, but wouldn't you call it a unified system since it has a single interface that automatically interacts with different components? It's not a unified model, but it seems correct to call it a unified system.
Altman et al. have been saying that the many-model interface in ChatGPT is confusing to users and that they want to move to a unified system that exposes one model which routes based on the task, rather than depending on users understanding how and when to do that themselves. Presumably this is what they've been discussing for some time. I don't know that it was intended to mean they would be working toward some unified inference architecture and model, although I'm sure goalposts will be moved to ensure it's insufficient.
He's the boss of the researchers so he knows more than them /s
But seriously tho, what the parent is saying isn't a deep insight: it makes sense from a business perspective to consolidate your products into one so you don't confuse users.
So OpenAI is in the business of GPT wrappers now? I'm guessing their open model is an escape hatch for those who wanted a "plain" model, though from my systematic testing it's not much better than Kimi K2.
The API lets you directly choose the model you want. Automatic thinking is a ChatGPT feature, since ChatGPT has always been a “GPT wrapper” in that sense.
> While GPT‑5 in ChatGPT is a system of reasoning, non-reasoning, and router models, GPT‑5 in the API platform is the reasoning model that powers maximum performance in ChatGPT. Notably, GPT‑5 with minimal reasoning is a different model than the non-reasoning model in ChatGPT, and is better tuned for developers. The non-reasoning model used in ChatGPT is available as gpt-5-chat-latest.
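For anyone who wants to pin this down themselves, the API exposes these as separate choices. A rough sketch, assuming the current OpenAI Python SDK; the prompts and parameter values are just to illustrate the distinction the docs draw:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The reasoning model, dialed down: gpt-5 with minimal reasoning effort.
resp = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "minimal"},
    input="Summarize this changelog in two sentences: ...",
)
print(resp.output_text)

# The non-reasoning chat model used in ChatGPT, exposed separately.
chat = client.chat.completions.create(
    model="gpt-5-chat-latest",
    messages=[{"role": "user", "content": "Summarize this changelog in two sentences: ..."}],
)
print(chat.choices[0].message.content)
```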
Too expensive maybe, or just not effective anymore, as they've used up the available training data. New data is generated slowly and is massively poisoned with AI-generated content, so it might be useless.
That's a lie people repeat because they want it to be true.
People evaluate dataset quality over time. There's no evidence that datasets from 2022 onwards perform any worse than ones from before 2022. There is some weak evidence of an opposite effect, causes unknown.
It's easy to make "model collapse" happen in lab conditions - but in real world circumstances, it fails to materialize.
> This looks like they're not training a single big model but have instead gone off to develop specialized sub-models and are attempting to gloss over them with yet another model. That's what you resort to only when end-to-end training has become too expensive for you.
The corollary to the bitter lesson strikes again: any hand-crafted system will outperform any general system for the same budget, by a wide margin.
The bitter lesson doesn't say that you can't split your solution into multiple models. It says that learning from more data via scaled compute will outperform humans injecting their own assumptions about the task into models.
A broad generalization like "there are two systems of thinking: fast and slow" doesn't necessarily fall into this category. The transformer itself (plus the choice of positional encoding etc.) contains inductive biases about modeling sequences. The router is presumably still learned with a fairly generic architecture.
Sure, all of machine learning involves making assumptions. The bitter lesson in a practical sense is about minimizing these assumptions, particularly those that pertain to human knowledge about how to perform a specific task.
I don't agree with your interpretation of the lesson if you say it means to make no assumptions. You can try to model language with just a massive fully connected network to be maximally flexible, and you'll find that you fail. The art of applying the lesson is separating your assumptions that come from "expert knowledge" about the task from assumptions that match the most general structure of the problem.
"Time spent thinking" is a fundamental property of any system that thinks. To separate this into two modes: low and high, is not necessarily too strong of an assumption in my opinion.
I completely agree with you regarding many specialized sub-models where the distinction is arbitrary and informed by human knowledge about particular problems.
so many people at my work need it to just switch. they just leave it on 4o. you can still set the model yourself if you want, but this will for sure improve the quality of output for my non-technical workmates who are confused by model selection.
I'm a technical person who has yet to invest the time in learning proper model selection, too. This will be good for all users who don't bring AI to the forefront of their attention and simply use it as a tool.
I say that as a VIM user who has been learning VIM commands for decades. I understand more than most how important it is to invest in one's tools. But I also understand that only so much time can be invested in sharpening the tools when we have actual work to do with them. Using the LLMs as a fancy autocomplete, but leaving the architecture up to my own NS (natural stupidity), has shown the default models to be more than adequate for my needs.
> The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation
Is it though? To me it seems like performance gains are slowing down and additional computation in AI comes mostly from insane amounts of money thrown at it.
Yes, a custom hand-crafted model will always outperform a general statistical model when given the same compute budget. Given that we've basically saturated the power grid at this point, we may have to do the unthinkable and start thinking again.
We already did this for object/face recognition; it works, but it's not the way to go. It's the way to go only if you don't have enough compute power (and data, I suspect) for an end-to-end network.
No, it's what you do if your model architecture is capped out on its ability to profit from further training. Hand-wrapping a bunch of sub-models stands in for models that can learn that kind of substructure directly.
You could train that architecture end-to-end, though. You just have to run both models and backprop through both of them in training. Sort of like a mixture of experts, but with two very different experts.
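Mechanically that's straightforward as long as the routing decision stays differentiable. A toy sketch in PyTorch, with two made-up "experts" and a learned gate, just to show that gradients reach the router and both experts in a single training pass; none of this reflects how GPT-5 is actually built.

```python
import torch
import torch.nn as nn

class TwoExpertRouter(nn.Module):
    """Toy end-to-end trainable router over two very different experts."""

    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.fast = nn.Linear(d_in, d_out)            # stand-in "fast" expert
        self.deep = nn.Sequential(                    # stand-in "reasoning" expert
            nn.Linear(d_in, 4 * d_in), nn.ReLU(), nn.Linear(4 * d_in, d_out)
        )
        self.gate = nn.Linear(d_in, 2)                # learned router

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Soft routing: a weighted mix of both experts, so backprop flows
        # through the gate and both experts. (At inference you could
        # harden this to an argmax over the gate.)
        w = torch.softmax(self.gate(x), dim=-1)       # shape (batch, 2)
        return w[:, :1] * self.fast(x) + w[:, 1:] * self.deep(x)

model = TwoExpertRouter(d_in=16, d_out=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(8, 16), torch.randn(8, 4)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()   # gradients flow through the router and both experts
opt.step()
```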
I do agree that the current evolution is moving further and further away from AGI and more toward a spectrum of niche specialisation.
It feels less and less likely that AGI is even possible with the data we have available. The one unknown is quantum computing: if we manage to get usable quantum computers, I am curious what that will do to AI.