
LiveBench (which I like because it tries very hard to avoid contamination) ranks Sonnet 3.5 second only to o1 (which is totally expected).



LiveCodeBench has DeepSeek R1 at #3, after o1-high and o1-medium: https://livecodebench.github.io/leaderboard.html


That's more of a LeetCode bench than a real-world coding bench.


That's R1-preview released a while back - the real R1 is even better.


No, Sonnet 3.5 is #7 on LiveBench, even below DeepSeek V3.


The parent comment was talking about coding specifically, not the average score. I see o1 at 69.69, and Claude 3.5 Sonnet at 67.13.


o1's score looks like exactly what I would expect Elon Musk to aim for with Grok's benchmarks



