
I believe (and some other people on the internet with more LLM knowledge believe too) that open-source local models are the future. Big models behind an API and chat interface, as OpenAI offers, will probably keep their niche too, but they are very costly, they are not AGI, and they will not be in the near future. On the other hand, with the rise of NPU chips and small models, you can have your own assistant on your phone, using your own data, almost instantaneously and at almost no cost. Whoever builds the best open-source model will win this race, and as the winner you will be able to set the standard. It's basically why we have Linux on servers, not Windows, and why, even though browsers are free, every tech giant still ships one.





I’m curious to hear more about phone-local assistants. I had rather assumed only the latest hardware (iPhone 15+; not sure about the Android side) could do local inference. Is there a way to get something going on hardware a couple of years old?

FUTO voice typing runs locally on my Galaxy S20, so yes. There are also SPAs (single-page web apps) that claim to load models locally, which I have but haven't tried. And there are small models; one I know of is 380M parameters, rather than 15B or 800B... See the sketch below for what running a model that size looks like.
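
A sub-1B model like that will run on a plain CPU with off-the-shelf tooling. A minimal sketch using the Hugging Face transformers library; the model name below is just an illustrative small model, not necessarily the 380M one mentioned:

    # A minimal sketch, assuming the transformers library is installed.
    # The model name is an example of a sub-1B model, chosen for illustration.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="Qwen/Qwen2.5-0.5B-Instruct",  # ~0.5B params, small enough for CPU
        device=-1,  # -1 = run on CPU, no accelerator needed
    )

    result = generator("Hello, my phone can", max_new_tokens=20)
    print(result[0]["generated_text"])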

> Is there a way to get something going on hardware a couple years old?

Tensor accelerators are a very recent thing, and mobile GPU/WebGPU support is also recent. RAM was also limited; 4 GB was the barrier for a long time.

So the model has to run on the CPU and fit within 4 GB, or even 2 GB.
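
As a sketch of what "CPU-only, within 2-4 GB" looks like in practice, here is a hypothetical example with llama-cpp-python; the file path and model are placeholders for any small quantized GGUF, not a specific recommendation:

    # A minimal sketch, assuming llama-cpp-python is installed and a small
    # quantized GGUF file has been downloaded (the path below is hypothetical).
    from llama_cpp import Llama

    llm = Llama(
        model_path="./tinyllama-1.1b-q4_k_m.gguf",  # hypothetical ~0.7 GB file
        n_ctx=512,    # small context window to keep RAM usage low
        n_threads=4,  # a typical core count for an older mobile CPU
    )

    out = llm("Q: What is the capital of France? A:", max_tokens=16)
    print(out["choices"][0]["text"])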

Oh, I forgot one important thing: mobile CPUs from a couple of years ago were also weak (the exception being the iPhone/iPad, by the way).

But if you had a gaming phone (or an iPhone), which at that time was comparable to a notebook, you could run something like Llama-2 quantized to 1.8 GB at about 2 tokens per second. Not very impressive, but it could work.
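
That 1.8 GB figure is consistent with a quick back-of-the-envelope estimate: memory is roughly parameter count times bits per weight, divided by 8 (ignoring some file overhead for embeddings and quantization scales):

    # Rough memory estimate for a quantized model (a sketch, not exact:
    # real GGUF files add overhead for embeddings and scale factors).
    params = 7e9          # Llama-2 7B
    bits_per_weight = 2   # aggressive ~2-bit quantization
    gb = params * bits_per_weight / 8 / 1e9
    print(f"{gb:.2f} GB")  # ~1.75 GB, close to the 1.8 GB mentioned above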


Unfortunately, I can't remember when the median mobile CPU became comparable in performance to business notebooks.

I think Apple entered the race for speed with the iPhone X and iPad 3. For Android, things are even worse; it looks like the median device reached notebook speed around the Qualcomm Snapdragon 6xx.



