And have you measured the trade-off that could come with embracing such a large ...

alekandreev · 2025-03-12T14:56:30 1741791390

Yes we have measured the tradeoff. We don't see a drop of perplexity in English when introducing multilingual, and there is a slight drop in some English language-specific evals (~1%).

Workaccount2 · 2025-03-12T14:00:51 1741788051

There are enough small model teams competing that I fell confident one of them will try this, and if it just sticking to english gives a large boost, the others will be forced to follow suite.

It would also kind of suck for non-english speakers, because it will just be another feather in the hat of "English eats the world".

mdp2021 · 2025-03-12T15:00:51 1741791651

Some numbers to try and make an idea: if I understand correctly, Gemma3 uses a fixed (in its versions by size) vocabulary 256k entries big; the smallest 1B version has ~300M embedding parameters and ~700M non-embedding parameters; the largest 27B version has ~5x embedding parameters and ~35x non-embedding parameters.

Multilingualism covering 140 languages is quite a big feat. Gemma3 apparently aims to be compact and efficient. The two goals and features put together raise questions. You wonder for example how much does such extensive multilingualism impact the above numbers, on a benchmark of similar results. It may e.g. be a general question to wonder how much multilingualism complicates an embedding space (owing e.g. to omographic collisions), and the question becomes more prominent when you crammed 140 languages in one model.

> non-english speakers

You would produce more specialized models (where it makes sense): Eng; Eng-Fra-Esp-Deu; Man-Can... For a billion weights per model it could probably be financially acceptable.