That SWE-bench chart with the mismatched bars (52.8% somehow appearing larger th...

kkukshtel · 2025-08-07T19:39:04 1754595544

You're sort of glossing over the part where this can now be leveraged as a cost-efficient agentic model that performs better than o3. Nobody used o3 for sw agent tasks due to costs and speed, and this now substantially seems to both improve on o3 AND be significantly cheaper than Claude.

synapsomorphy · 2025-08-07T19:46:38 1754595998

o3's cost was sliced by 80% a month or so ago and is also cheaper than Claude (the output is even cheaper than GPT-5). It seems more cost efficient but not by much.

BoorishBears · 2025-08-07T20:22:09 1754598129

This feels revisionist: no one used it because it wasn't as good.

extr · 2025-08-08T05:04:50 1754629490

O3 is fantastic at coding tasks, until today it was smartest model in existence. But it works only in few shot conversational scenarios, it's not good at agentic harnesses.

m3kw9 · 2025-08-08T04:09:05 1754626145

You can use o3 for coding on plus plan almost unlimited or till they throttle

withinboredom · 2025-08-08T10:04:22 1754647462

not anymore

m3kw9 · 2025-08-10T13:37:31 1754833051

what do you mean? For CLI or web codex?

slashdave · 2025-08-07T21:07:39 1754600859

GPT-5 had to be released, in any form. This announcement was not the product of a breakthrough, but the consequence of a business requirement.

zmmmmm · 2025-08-07T22:23:52 1754605432

this is the real answer

it has to be released because it's not much better and OpenAI needs the team to stop working on it. They have serious competition now and can't afford to burn time / money on something that isn't shifting the dial.

IceDane · 2025-08-07T19:41:25 1754595685

The whole presentation was full of completely broken bar charts. Not even just the typical "let's show 10% of the y axis so that a 5% increase looks like 5x" but stuff like the deception eval showing gpt5 vs o3 as 50 vs 47, but the 47 is 3x as big, and then right next to it we have 9 vs 87, more reasonably sized.

It's like no one looked at the charts, ever, and they just came straight from.. gpt2? I don't think even gpt3 would have fucked that up.

I don't know any of those people, but everyone that has been with OAI for longer than 2 years 1.5m bonuses, and somehow they can't deliver a bar chart with sensible at axes?

throwaway_2898 · 2025-08-07T22:33:37 1754606017

TBH Claude Code max pro's performance on coding has been abhorrent(bad at best). The core of the issue is that the plan produced will more often than not use humans as verifiers(correctness, optimality and quality control). This is a fundamentally bad way to build systems that need to figure out if their plan will work correctly, because an AI system needs to test many plans quickly in a principled manner(it should be optimal and cost efficient).

So you might get that initial MvP out the door quickly, but when the complexity grows even just a little bit, you will be forced to stop and look at the plan and try to get it to develop it saying things like: "use Design agent to ultrathink about the dependencies of the current code change on other APIs and use TDD agent to make sure tests are correct in accordance with the requirements I stated" and then one finds that even the all the thinking there are bugs that you will have to fix.

Source: I just tried max pro on two client python projects and it was horrible after week 2.

z7 · 2025-08-07T19:20:37 1754594437

>The actual benchmark improvements are marginal at best

GPT-5 demonstrates exponential growth in task completion times:

https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...

hk__2 · 2025-08-07T20:07:18 1754597238

What do you mean? A single data point cannot be exponential. What the blog post say is that the ability to solve tasks of all LLMs is exponential over time, and GPT-5 fits in that curve.

z7 · 2025-08-07T22:49:05 1754606945

Yes, but the jump in performance from o3 is well beyond marginal while also fitting an exponential trend, which undermines the parent's claim on two counts.

adammarples · 2025-08-07T23:15:33 1754608533

Actually a single data point fits a huge range of exponential functions.

usaar333 · 2025-08-08T00:04:23 1754611463

No it doesn't. If it were even linear compared to o1 -> o3, we'd be at 2.43 hours. Instead we're only at 2.29.

Exponential would be at 3.6 hours

rrrrrrrrrrrryan · 2025-08-07T19:03:41 1754593421

I suspect the vast majority of OpenAI's users are only using ChatGPT, and the vast majority of those ChatGPT users are only using the free tier.

For all of them, getting access to full-blown GPT-5 will probably be mind-blowing, even if it's severely rate-limited. OpenAI's previous/current generation of models haven't really been ergonomic enough (with the clunky model pickers) to be fully appreciated by less tech-savvy users, and its full capabilities have been behind a paywall.

I think that's why they're making this launch is a big deal. It's just an incremental upgrade for the power users and the people that are paying money, but it'll be a step-change in capability to everyone else.

mlsu · 2025-08-07T20:41:36 1754599296

They are selling "AGI"

replacing huge swathes of the white collar workforce

"incremental upgrade for power users" is not at all what this house of cards is built on

Sabinus · 2025-08-07T23:44:26 1754610266

They are selling AGI to investors, but they're just selling intelligence to subscribers. And they just made the intelligence cheaper and better.

m3kw9 · 2025-08-08T04:11:21 1754626281

I’m very seen ppl minds blown on free tier previous to 5. It’s basically 4o which is pretty good for normies

samsullivan · 2025-08-07T22:35:29 1754606129

Thats why they need to pay 300k for a slide designer https://openai.com/careers/creative-lead-presentation-design...