LiveBench (which I like because it tries very hard to avoid contamination) ranks...

parav · 2025-01-20T17:37:35 1737394655

LiveCodeBench has DeepSeek R1 at #3, after o1-high and o1-medium: https://livecodebench.github.io/leaderboard.html

usaar333 · 2025-01-21T05:53:27 1737438807

That's more of a LeetCode bench than a real-world coding bench.

svantana · 2025-01-21T08:38:57 1737448737

That's R1-preview released a while back - the real R1 is even better.

behnamoh · 2025-01-20T16:49:49 1737391789

No, Sonnet 3.5 is #7 on LiveBench, even below DeepSeek V3.

thegeomaster · 2025-01-20T16:59:49 1737392389

The parent comment was talking about coding specifically, not the average score. I see o1 at 69.69, and Claude 3.5 Sonnet at 67.13.

sebastiennight · 2025-01-21T12:40:20 1737463220

o1's score looks like exactly what I would expect Elon Musk to aim for with Grok's benchmarks