I believe (and some other people on the internet with more knowledge of LLMs believe too) that open-source local models are the future.
Big models served via API and chat, like OpenAI is doing, will probably still have their niche, but they are very costly, they are not AGI, and they will not be in the near future.
On the other hand, with the rise of NPU chips and small models, you can have your own assistant on your phone, using your own data, almost instantaneously and at almost no cost. Whoever builds the best open-source model will win this race, and the winner will be able to set the standard.
Basically, this is why we have Linux on servers rather than Windows, and why, even though browsers are free, every tech giant still ships one.
I’m curious to hear more about phone-local assistants. I rather assumed only the latest hardware (iPhone 15+, not sure on the Android side) could do local inference. Is there a way to get something going on hardware a couple years old?
> Is there a way to get something going on hardware a couple years old?
Tensor accelerators are a very recent thing, and GPU/WebGPU support is also recent.
RAM was also limited; 4GB was the barrier for a long time.
So the model has to run on the CPU and fit within 4GB, or even 2GB.
Oh, I forgot one important thing: mobile CPUs from a couple of years ago were also weak (the exception, btw, being iPhone/iPad).
But if you have a gaming phone (or an iPhone), which at the time was comparable to a notebook, it may run something like Llama-2 quantized to 1.8GB at about 2 tokens per second. Not very impressive, but it could work.
Unfortunately, I can't remember when the median mobile CPU became comparable in performance to business notebooks.
I think Apple entered the race for speed with the iPhone X and iPad 3. For Android things are even worse; it looks like the median device reached notebook speed around the Qualcomm Snapdragon 6xx.
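To make the CPU-only path concrete, here is a minimal sketch using the llama-cpp-python bindings. The model file name is a placeholder; any roughly 2GB quantized GGUF would do, and the same idea runs under Termux on Android:

    import time
    from llama_cpp import Llama  # pip install llama-cpp-python

    # Placeholder path: any ~2GB quantized GGUF, e.g. a heavily quantized Llama-2-7B
    MODEL_PATH = "llama-2-7b.Q2_K.gguf"

    # CPU-only inference: no GPU layers, small context to stay within a few GB of RAM
    llm = Llama(model_path=MODEL_PATH, n_ctx=1024, n_threads=4, n_gpu_layers=0)

    prompt = "Explain in one sentence why quantization helps on phones."
    start = time.time()
    out = llm(prompt, max_tokens=64)
    elapsed = time.time() - start

    n_tokens = out["usage"]["completion_tokens"]
    print(out["choices"][0]["text"].strip())
    print(f"{n_tokens / elapsed:.1f} tokens/sec")  # roughly 2 tok/s on an older phone-class CPU

The numbers are only for orientation; actual speed depends heavily on thread count, quantization level, and memory bandwidth.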
FUTO voice typing runs locally on my Galaxy 20, so, yes. There are also SPAs that claim to load models locally, which I have but haven't tried. And there are small models; one I know of is 380M parameters, rather than 15B or 800B...
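For a rough sense of why parameter count matters so much here, a back-of-the-envelope estimate of weight memory alone (ignoring KV cache and runtime overhead):

    # Rough weight-memory estimate: params * bytes per weight.
    # Ignores KV cache, activations and runtime overhead.
    def weight_gb(params: float, bits_per_weight: int) -> float:
        return params * bits_per_weight / 8 / 1e9

    for params, label in [(380e6, "380M"), (7e9, "7B"), (15e9, "15B")]:
        for bits in (16, 4):
            print(f"{label} params @ {bits}-bit: {weight_gb(params, bits):.2f} GB")

    # Approximate output: 380M fits in 0.19-0.76 GB, 7B needs 3.5-14 GB,
    # 15B needs 7.5-30 GB; only the small models fit on a 4GB phone.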
Those are certainly benefits, but it's most likely a prophylactic move.
LLMs will be (are?) a critical piece of infrastructure. Commoditizing that infrastructure ensures that firms like Google and Meta won't be dependent on any other firm (OpenAI) for access to it.
Meta in particular has had this issue wrt Ads on iOS. And Google wrt paying Apple to be the default search engine.
See also: Joel Spolsky's famous Strategy Letter V [0].
Unfortunately, this is a well-known business model. The best-known example was the Eclipse IDE, which killed all the small IDE businesses. Another example is MySQL from Oracle.
Yes, the idea is to make basically free something that small and medium businesses could otherwise survive on and grow into something big, creating a big death valley between small and big businesses.
The only exception is tiny businesses living in tiny niches, but for them it is nearly impossible to cross the gap from tiny to big.
And you should understand that "open models" are in reality open-weight models: they do not disclose the sources they were trained on, so the community cannot remake the model from scratch.
Headhunting is surely important, but big businesses are typically so financially powerful that they can just buy talent.
- Headhunting via reputation is really important for small businesses, because they are typically very limited financially.
Medium businesses sit between small and big, but as I said at the beginning, making some strategic things free creates a death valley, so it becomes very hard to stay medium.
Reputation is a good thing for everyone, but again, top corporations are powerful out of proportion to their size, so in many cases it is relatively cheap for them to just maintain a neutral reputation; they don't need to spend much on whitewashing.
There is certain demand, and if you don't do anything, it will be captured by competitors and you lose control. This is especially important for Google, as they see LLMs as a significant portion of their future Cloud business and probably want a smooth, exclusive transition path to their proprietary models.