> It is frequently suggested that once one of the AI companies reaches an AGI threshold, they will take off ahead of the rest.
This seems to be a result of using overly simplistic models of progress. After a company makes a breakthrough, the next breakthrough requires exploring many more paths, and it is much easier for competitors to catch up than to find a breakthrough. Even if you get lucky and find the next breakthrough before everyone catches up, they will probably catch up before you find the one after that. A runaway only happens if, after each breakthrough, finding the next one is easier than it is for everyone else to catch up.
Consider the following game:
1. N parties take turns rolling a D20. Anyone who rolls a 20 gets 1 point.
2. Any party that is 1 or more points behind only needs to roll a 19 or higher to get a point. That is, being behind gives you a slight advantage in catching up.
Although points keep accumulating, the players end up with nearly identical scores.
I ran a simulation of this game for 10,000 turns with 5 players:

Game 1: [852, 851, 851, 851, 851]
Game 2: [827, 825, 827, 826, 826]
Game 3: [827, 822, 827, 827, 826]
Game 4: [864, 863, 860, 863, 863]
Game 5: [831, 828, 836, 833, 834]
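For reference, here is a minimal Python sketch of that simulation. It is my reconstruction, not the original code, and it assumes the standings update after every individual roll:

```python
import random

def play(num_players=5, turns=10_000):
    """Simulate the catch-up game: a 20 scores a point, but any
    player behind the current leader scores on a 19 or 20 instead."""
    scores = [0] * num_players
    for _ in range(turns):
        for i in range(num_players):
            # Players behind the leader get the easier threshold.
            threshold = 19 if scores[i] < max(scores) else 20
            if random.randint(1, 20) >= threshold:
                scores[i] += 1
    return scores

for game in range(1, 6):
    print(f"Game {game}: {play()}")
```

The catch-up rule is mild (19 versus 20), yet it is enough to keep all five scores within a few points of each other.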
Supposedly the idea was that once you get closer to AGI, it starts exploring these breakthrough paths for you, providing a positive feedback loop. Hence the expected exponential explosion in power.
But yes, so far it feels like we are in the latter stages of the innovation S-curve for transformer-based architectures. The exponential may be out there, but reaching it probably requires jumping onto a new S-curve.
> Supposedly the idea was that once you get closer to AGI, it starts exploring these breakthrough paths for you, providing a positive feedback loop.
I think it does let you start exploring the paths faster, but the search space you need to cover grows even faster. You can do research twice as fast, but you need to do ten times as much research, and your competition can quickly catch up because they know which path works.
Basically, what we have done over the last few years is notice the neural scaling laws and drive them to their logical conclusion. Those laws are power laws, which are not quite as bad as logarithmic laws, but you would still expect most of the big gains early on, followed by diminishing returns.
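To make the diminishing returns concrete, here is a small illustration; the exponent of 0.05 is an assumed, illustrative magnitude, not a measured value:

```python
# Illustrative power-law scaling: loss ~ compute^(-0.05).
# The exponent is assumed for illustration, not taken from any paper.
for compute in [1, 10, 100, 1_000, 10_000]:
    loss = compute ** -0.05
    print(f"compute {compute:>6}x  ->  loss {loss:.3f}")
```

Each 10x in compute multiplies the loss by the same constant factor (about 0.89 here), so every additional order of magnitude buys less absolute improvement than the last.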
Barring a grey-swan event of groundbreaking algorithmic innovation, I don't see how we get out of this. I suppose some of those diminishing returns could still be large enough to bridge the gap to an AI that can meaningfully recursively improve itself, but I personally don't see it.
At the moment, I would say everything is progressing exactly as expected and will continue to do so until it doesn't. Whether or when that happens is not predictable.
Do you consider GPT itself and reasoning models to be two grey swan events? I expect another of similar magnitude within two years for sure. I think we are searching for such ideas more efficiently than before, with more compute and funding.
I would say GPT itself is less an event and more the culmination of decades of research and development in algorithms, hardware, and software. Of course, to some degree, this is true for any novel development. In this case, the convergence of advances in GPUs, software that utilizes them well while allowing very high levels of abstraction, and algorithms that can scale is something I'm not sure we will see again so quickly. All this preexisting research is a kind of resource that will be completely exploited at some point. And then the only thing that can drive you forward is truly novel ideas. Reasoning models were a fairly obvious next step too, as the concepts of System 1 and System 2 thinking have been around for a while.
You are completely right that the compute and funding right now are unprecedented. I don't feel confident making any predictions.
You are forgetting that we are talking about AI. That AI will be used to speed up progress on making the next, better AI, which will be used to speed up progress on making the next, better AI, and so on.
Consider the research work required for five breakthroughs in series: 1, 2, 16, 8, 128, where each breakthrough doubles your research power.
If you start at a research rate of 1, you get the first breakthrough after 1/1 = 1 year. Then you get the second breakthrough after 2/2 = 1 year, the third after 16/4 = 4 years, the fourth after 8/8 = 1 year, and the fifth after 128/16 = 8 years.
If it only takes one year for a competitor to learn your breakthrough, they can catch up despite the fact that your research rate is doubling after every breakthrough.
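A quick sketch of that arithmetic, using the same assumed costs and doubling rule as above:

```python
# Toy model: breakthrough costs as given above, research rate
# doubling after each breakthrough is achieved.
costs = [1, 2, 16, 8, 128]   # work required per breakthrough
rate = 1                     # research done per year
elapsed = 0
for i, cost in enumerate(costs, 1):
    years = cost / rate
    elapsed += years
    print(f"breakthrough {i}: {cost}/{rate} = {years:g} year(s), total {elapsed:g}")
    rate *= 2
```

The gaps between breakthroughs come out to 1, 1, 4, 1, and 8 years, so a competitor who copies each breakthrough in one year is never left behind for long.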