It's interesting to consider the counterfactual: what does it look like when large projects are run poorly? The answer is often:
* Poorly defined goals / definition of success
* Overly complex plans, executed slowly
* A focus on issues that aren't the real bottleneck
* Large cost and time overruns
* Project is eventually cancelled
I've had the interesting experience of watching the same type of "transformation" project run twice at similar companies. In the first case, the project got so bogged down that I genuinely updated toward believing it wasn't achievable. In the second, a much smaller team made incredible progress, pushing on all the key points with the right planning, and I learned some lessons I wish I'd known on take 1.
GitHub used to publish some pretty interesting postmortems. Maybe they still do. IIRC they were struggling to scale their SQL database and were starting to hit its limits. It's a tough position to be in: you either have to do a massive migration to a data layer with very different semantics, or you have to keep desperately squeezing out performance and skirting the edge of outages with a DB that was never really meant to handle what you're doing with it now.
The OpenAI blog post on "scaling" Postgres to their current size has much the same flavor, although I think they're handling it better than GitHub appears to be.
I'd be surprised by this: GitHub pretty famously used Vitess, and I doubt each shard is too big for modern hardware. Based on previous reporting [0], they're running out of space in the main data center and new management is determined to move to Azure in a hurry. I'd bet that these outages are a combination of a worsening capacity crunch in the old data center and…well, Azure.
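For intuition, here's a minimal sketch of the core idea behind Vitess-style hash sharding (illustrative Python only, not Vitess's actual routing code; the shard names and key are made up):

```python
# Illustrative hash-based shard routing, the core idea behind Vitess-style
# sharding; not Vitess's actual implementation.
import hashlib

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(owner_id: int) -> str:
    # Hash the sharding key so rows spread roughly evenly across shards.
    digest = hashlib.sha256(str(owner_id).encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

# Every query keyed on the same owner_id routes to the same shard,
# so each shard stays small enough for a single primary to handle.
print(shard_for(42))
```

If each shard is comfortably within what one modern machine can serve, the database itself shouldn't be the thing falling over, which is why a capacity crunch seems like the more plausible culprit.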
I think most large platforms eventually split the tools out, because you can indeed get MUCH better CI/CD, ticket management, documentation, etc. from dedicated platforms for each. However, when you're just starting out, the cognitive overhead and cost of signing up for and connecting multiple services are a lot higher than using all the tools bundled (initially for free) with your repo.
I mean, yeah, probably, but OpenAI also literally can't afford to give this away for free. They are losing a lot of money. Open-source AI will continue to be a thing, and they will have to compete to give you something better than what you can do yourself.
OpenAI is far from the stage of "grinding out more and more profits for investors." It's more like the stage of "most serious observers doubt it can continue as a going concern."
Wait, you're completely skipping the emergence of reasoning models, though? 4.5 was slower and moderately better than 4o; o3 was dramatically stronger than 4o; and GPT-5 was basically a light iteration on that.
What's happening now is training models on long-running, tool-using tasks that take hours at a time. The latest models, like 4.6 and 5.3, are starting to make good on this. If you're not using models that are wired into tools and allowed to iterate for a while, you're not seeing the current frontier of capabilities.
(E.g., if you're just using models for general-knowledge Q&A, then sure, there's only so much better you can get at that, and models tapered off there long ago. But the vision is to use agents to perform a substantial fraction of white-collar work, there are well-defined research programmes to get there, and there is steady progress.)
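To make "wired into tools and allowed to iterate" concrete, here's a minimal sketch of an agent loop, assuming the OpenAI Python SDK's chat-completions tool-calling interface; the run_shell tool, the model name, and the prompt are illustrative choices:

```python
# Minimal agent loop: the model requests a tool call, sees the result,
# and iterates until it stops calling tools. Assumes the OpenAI Python SDK;
# run_shell, the prompt, and the iteration cap are illustrative choices.
import json
import subprocess
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "run_shell",
        "description": "Run a shell command and return its combined output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

def run_shell(command: str) -> str:
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

messages = [{"role": "user", "content": "Run the test suite and fix any failures."}]
for _ in range(20):  # cap iterations so a confused model can't loop forever
    resp = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:
        break  # no more tool calls: the model considers the task done
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": run_shell(**args),
        })
print(msg.content)
```

The recent training push is about making models reliable inside loops like this for hours at a stretch, which is a different axis of progress than one-shot Q&A quality.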
> Wait, you're completely skipping the emergence of reasoning models, though?
o1 was something like 16-18 months ago. o3 was kinda better, and GPT 5 was considered a flop because it was basically just o3 again.
I've used all the latest models in tools like Claude Code and Codex, and I guess I'm just not seeing the improvement? I'm not even working on anything particularly technically complex, but I still have to constantly babysit these things.
Where are the long-running tasks? Cursor's browser that didn't even compile? Claude's C compiler that had gcc as an oracle and still performs worse than gcc without any optimizations? Yeah, I'm completely unimpressed at this point, given the promises these people have been making for years now. I'm not surprised that, given enough constraints, they can kinda sorta dump out some code that resembles something else in their training data.