Hey, I'm from the Gemma team. There are a couple of angles to your question.
We do care about prompted instructions, like JSON schemas, and it's something we eval for and encourage you to try. Here's an example from Gemma 2 to guide folks looking to do what it sounds like you're interested in:
https://www.youtube.com/watch?v=YxhzozLH1Dk
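To make that concrete in code, here's a minimal sketch of the prompt-level approach (no constrained decoding; the model is simply asked to conform). It assumes the Hugging Face transformers library and the google/gemma-2-2b-it checkpoint; the schema and question are just illustrative:

```python
# Prompt-level JSON-schema following: embed the schema in the prompt and
# ask the model to reply with JSON only. Nothing enforces conformance here;
# this is the instruction-following behavior the evals mentioned above test.
from transformers import pipeline

generator = pipeline("text-generation", model="google/gemma-2-2b-it")

schema = """\
{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "year": {"type": "integer"}
  },
  "required": ["name", "year"]
}"""

prompt = (
    "Reply with a single JSON object that matches this JSON schema, "
    f"and nothing else:\n{schema}\n\n"
    "Question: which probe first flew past Pluto, and in what year?"
)

out = generator([{"role": "user", "content": prompt}], max_new_tokens=128)
# With chat-style input, generated_text holds the whole conversation;
# the last message is the assistant's reply.
print(out[0]["generated_text"][-1]["content"])
```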
The Ollama feature is the long-standing llama.cpp machinery (GBNF grammars) that constrains output tokens. It's great; I've used it to get structured output from models as small as 1B.
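For anyone who hasn't seen that mechanism up close, here's a minimal sketch via llama-cpp-python, which exposes the same llama.cpp grammar engine; the model path and grammar are my own placeholders:

```python
# Grammar-constrained decoding with llama.cpp's GBNF grammars, via the
# llama-cpp-python bindings. The sampler masks out any token that would
# violate the grammar, so the output always parses.
from llama_cpp import Llama, LlamaGrammar

# Tiny GBNF grammar forcing a {"answer": "..."} object.
grammar = LlamaGrammar.from_string(r'''
root   ::= "{" ws "\"answer\"" ws ":" ws string ws "}"
string ::= "\"" [^"]* "\""
ws     ::= [ \t\n]*
''')

llm = Llama(model_path="gemma-2-2b-it.Q4_K_M.gguf")  # placeholder GGUF path
out = llm(
    "Answer in JSON: what is the capital of France?",
    grammar=grammar,
    max_tokens=64,
)
print(out["choices"][0]["text"])
```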
But there's a stark difference in quality compared to, say, Phi-4's native tool-calling.
If Gemma 3 is natively trained on tool-calling, i.e., y'all are benchmarking on, say, the Berkeley Function Calling Leaderboard, that'd be great to know out here.
Tangentially, github.com/ochafik is a Googler who landed an excellent overhaul of llama.cpp's tool-calling; he might be worth reaching out to (if you're not working with him already!).
Multilinguality was a big focus in Gemma 3. Give it a try!
And for structured output, Gemma works well with many such libraries, for example the one built into Ollama:
https://github.com/ollama/ollama/blob/main/docs/api.md#struc...
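As a concrete illustration, here's a minimal sketch against the endpoint that page documents, assuming a local Ollama server with a Gemma model pulled; the model tag and schema are illustrative:

```python
# Ollama structured outputs: pass a JSON schema in the "format" field and
# the server constrains decoding so the reply conforms to it.
import json
import requests

schema = {
    "type": "object",
    "properties": {
        "capital": {"type": "string"},
        "population_millions": {"type": "number"},
    },
    "required": ["capital", "population_millions"],
}

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "gemma3",  # assumption: adjust to the tag you pulled
        "messages": [{"role": "user", "content": "Tell me about France."}],
        "format": schema,
        "stream": False,
    },
    timeout=120,
)
print(json.loads(resp.json()["message"]["content"]))
```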
In short, you should have all the functionality you need!