TLDR:
1. 1B is text-only; 4B, 12B, and 27B are vision + text. Trained on up to 14T tokens.
2. 128K context length, extended from 32K via further training; the 1B model stays at 32K.
3. Attention soft-capping removed, replaced with QK-norm (sketch below).
4. 5 local sliding-window attention layers for every 1 global attention layer (see the mask sketch after the list).
5. Sliding-window size of 1024 tokens.
6. RL post-training uses BOND, WARM, and WARP (weight-averaging sketch below).
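A minimal PyTorch sketch of the QK-norm idea, assuming RMSNorm applied to the projected queries and keys (Gemma applies it per head after splitting heads; this single-head version, and all names in it, are illustrative rather than the actual Gemma code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    """Single-head attention using QK-norm instead of logit soft-capping."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        # Normalizing q and k bounds the dot-product logits directly,
        # removing the need for a tanh soft-cap on attention scores.
        self.q_norm = nn.RMSNorm(dim, eps=eps)  # requires PyTorch >= 2.4
        self.k_norm = nn.RMSNorm(dim, eps=eps)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.q_norm(self.q_proj(x))
        k = self.k_norm(self.k_proj(x))
        v = self.v_proj(x)
        scores = (q @ k.transpose(-2, -1)) * self.scale
        return F.softmax(scores, dim=-1) @ v
```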
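And a sketch of the 5:1 local/global layer pattern with the 1024-token window, expressed as boolean attention masks (the helper names and the exact placement of the global layers are assumptions for illustration):

```python
import torch

WINDOW = 1024   # sliding-window size
PATTERN = 6     # 5 local layers, then 1 global layer

def is_global_layer(layer_idx: int) -> bool:
    # Every 6th layer (indices 5, 11, 17, ...) attends globally.
    return (layer_idx + 1) % PATTERN == 0

def attention_mask(seq_len: int, layer_idx: int) -> torch.Tensor:
    """Boolean mask where entry (i, j) is True if query i may attend to key j."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    causal = j <= i
    if is_global_layer(layer_idx):
        return causal
    # Local layers see only the last WINDOW tokens (current token included).
    return causal & (i - j < WINDOW)
```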
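On the RL side: BOND is Best-of-N Distillation, WARM is Weight-Averaged Reward Models, and WARP is Weight-Averaged Rewarded Policies. WARM and WARP both center on merging the weights of multiple trained models; below is only the core uniform-averaging operation, a hedged sketch rather than either actual algorithm (which add EMA anchors, SLERP merging, best-of-N sampling, etc.):

```python
import copy
import torch
import torch.nn as nn

def average_weights(models: list[nn.Module]) -> nn.Module:
    """Uniformly average the parameters of same-architecture models."""
    merged = copy.deepcopy(models[0])
    with torch.no_grad():
        for name, param in merged.named_parameters():
            # Stack the matching parameter from every model and take the mean.
            stacked = torch.stack([dict(m.named_parameters())[name] for m in models])
            param.copy_(stacked.mean(dim=0))
    return merged
```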