
Lots to be excited about here - in particular, a new architecture that allows subquadratic scaling of memory needs for long context; it looks like 128k+ context is now officially available in a local model. The charts make it look like, if you have the RAM, the model is pretty good out to 350k or so(!) with RoPE scaling.
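For intuition on where the memory savings come from: if I'm reading the report right, Gemma 3 interleaves local sliding-window attention layers with occasional global-attention layers, so only the global layers need a KV cache that grows with the full context. A rough back-of-the-envelope sketch in Python - the layer count, head dims, 5:1 local/global ratio, and 1024-token window are my assumptions, not official numbers:

    # KV-cache estimate: full global attention vs. interleaved local/global.
    # All shapes below are illustrative assumptions, not Gemma 3's real config.
    def kv_cache_gb(ctx, layers=34, kv_heads=8, head_dim=128,
                    window=1024, local_per_global=5, bytes_per=2):
        per_token = 2 * kv_heads * head_dim * bytes_per   # K and V, fp16
        n_global = layers // (local_per_global + 1)       # 1 global layer in 6
        n_local = layers - n_global
        full = layers * ctx * per_token                   # every layer caches all tokens
        mixed = (n_global * ctx + n_local * min(ctx, window)) * per_token
        return full / 1e9, mixed / 1e9

    print(kv_cache_gb(128_000))  # ~ (17.8, 2.7) GB: local layers cap at the window

The point being: most layers' cache stops growing at the window size, so long contexts mostly cost you the handful of global layers.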

In addition, it tests well on Chatbot Arena, with an Elo significantly above yesterday's best open model, Qwen 2.5 72B. It also has some pretty interesting properties suggesting it has not spent much of its weight space on memorization, hopefully implying that capacity went to cognition and conceptual stuff instead.

And, oh, it also does vision and 140 languages.

This seems like one worth downloading and keeping; Gemma models have at times not quite performed to benchmark, but I'd guess from all this that this one will be a useful, strong local model for some time. I'm curious about its coding and tool-use abilities, and how easy it is to fine-tune for those.

Thanks for open sourcing this, DeepMind team! It looks great.

Gemma is made by Google, not DeepMind.

edit: Sorry, I forgot DeepMind is Google's AI R&D arm; I misread it as DeepSeek in your comment.


Job postings for working on Gemma are under DeepMind in London: https://boards.greenhouse.io/deepmind/jobs/6590957

Hah, no worries - when I read your comment I was like “dang, how did I mix up DeepSeek and Google?” Then I read your edit.

That’s Google DeepMind to you

Can you link to how you fine-tune it? Does it produce a LoRA?
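Not the person you're asking, but for reference: the usual route is Hugging Face's peft library, which does produce a small LoRA adapter rather than a full model copy. A minimal sketch - the checkpoint id and hyperparameters are placeholders, not a recommendation:

    # Minimal LoRA fine-tune setup with transformers + peft.
    # The checkpoint id and hyperparameters are illustrative assumptions.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    model_id = "google/gemma-3-4b-it"  # placeholder checkpoint name
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    lora = LoraConfig(
        r=8,                                   # rank of the low-rank updates
        lora_alpha=16,                         # scaling applied to the updates
        target_modules=["q_proj", "v_proj"],   # adapt the attention projections
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # only adapter weights are trainable

    # ...train with transformers.Trainer or your own loop, then:
    model.save_pretrained("gemma-lora-adapter")  # writes just the adapter weights

So yes: what you end up with is a LoRA adapter you load (or merge) on top of the base weights.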


