Hacker News
thegeomaster 5 months ago | on: DeepSeek-R1
LiveBench (which I like because it tries very hard to avoid contamination) ranks Sonnet 3.5 second only to o1 (which is totally expected).
parav 5 months ago
LiveCodeBench has DeepSeek-R1 at #3, after o1-high and o1-medium:
https://livecodebench.github.io/leaderboard.html
usaar333 5 months ago
That's more of a LeetCode benchmark than a real-world coding benchmark.
svantana 5 months ago
That's R1-preview, released a while back; the real R1 is even better.
behnamoh 5 months ago
No, Sonnet 3.5 is #7 on LiveBench, even below DeepSeek V3.
thegeomaster 5 months ago
The parent comment was talking about coding specifically, not the average score. I see o1 at 69.69, and Claude 3.5 Sonnet at 67.13.
sebastiennight 5 months ago
o1's score looks like exactly what I would expect Elon Musk to aim for with Grok's benchmarks