What about TPUs? They can be more efficient than Nvidia GPUs, a huge amount of inference is done on them, and while they are not literally being sold to the public, the technology should be influencing Nvidia's next steps, just as AMD influenced Intel.
TPUs can be more efficient, but they are quite difficult to program for efficiently (difficult to saturate). That is why Google tends to sell TPU services rather than raw access to TPUs: they control the whole stack, so they can get good utilization. GPUs are easier to work with.
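To make the saturation point concrete, here's a minimal JAX sketch (my own illustration, not anything from Google's stack): XLA handles the TPU codegen for you, but how well you saturate the chip depends on your shapes lining up with the matrix unit (MXU), which as I understand it is a 128x128 systolic array on most TPU generations. The shapes below are just examples.

```python
import jax
import jax.numpy as jnp

@jax.jit
def matmul(a, b):
    # XLA lowers this to MXU ops; shapes that aren't multiples of the
    # tile size get padded, and the padding lanes do dead work.
    return a @ b

key = jax.random.PRNGKey(0)

# Dimensions are multiples of 128: MXU-friendly, high utilization.
a = jax.random.normal(key, (4096, 4096))
b = jax.random.normal(key, (4096, 4096))
c = matmul(a, b)

# Inner dimension 1000 is not a multiple of 128, so it gets padded
# (to 1024 here), and a slice of the matrix unit computes zeros.
a2 = jax.random.normal(key, (4096, 1000))
b2 = jax.random.normal(key, (1000, 4096))
c2 = matmul(a2, b2)
```

The point being: if your workload maps cleanly onto big, well-tiled matmuls, the TPU is great; if it doesn't, a chunk of the chip sits idle, which is part of why Google prefers to keep the stack under its own control.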
I think the software side of the story is underestimated. Nvidia has a big moat there and huge community support.
My understanding is that all of Google's AI is trained and run on quite old but well-designed TPUs. For a while the issue was that developing these AI models still required flexibility, and customised hardware like TPUs couldn't accommodate that.
Now that the model architecture has settled into something a bit more predictable, I wouldn't be surprised if we saw a little more specialisation in the hardware.