This misconception is repeated time and time again; software support for their datacenter-grade hardware is just as bad. I've had the displeasure of using the MI50, MI100 (a lot), and MI210 (very briefly). All three are supposedly enterprise-grade computing hardware, and yet it was a pathetic experience, with a myriad of disconnected components that had to be patched and married to a very specific kernel version to get ANY kind of LLM inference going.
Now, the last of it I bothered with was 9 months ago; enough is enough.
What a load of nonsense. MI210 effectively hit the market in 2023, similarly to H100. We're talking about a datacenter-grade card that's two years old, and it's already "ancient history"?
The quantity of people on this site now that care about GPUs all of a sudden because of the explosion of LLMs, who fail to understand that GPUs are _graphics_ processors that are designed for _graphics_ workloads is insane. It almost feels like the popular opinion here is that graphics is just dead and AMD and NVIDIA should throw everything else they do in the bin to chase the LLM bag.
AMD make excellent graphics hardware, and the graphics tools are also fantastic. AMD's pricing and market positioning can be questionable but the hardware is great. They're not as strong with machine learning tasks, and they're in a follower position for tensor acceleration, but for graphics they are very solid.
The quantity of people on this site now that think they understand modern GPUs because back in the day they wrote some OpenGL...
1. Both AMD and NVIDIA have "tensorcore" ISA instructions (i.e. real silicon/data-path, not emulation) which have zero use case in graphics
2. Ain't no one playing video games on MI300/H100 etc and the ISA/architecture reflects that
> but for graphics they are very solid.
Hmmm I wonder if AMD's overfit-to-graphics architectural design choices are a source of friction as they now transition to serving the ML compute market... Hmmm I wonder if they're actively undoing some of these choices...
AMD isn't overfit to graphics. AMD's GPUs were friendly to general purpose compute well before Nvidia was. Hardware-wise anyway. AMD's memory access system and resource binding model was well ahead of Nvidia for a long time. When Nvidia was stuffing resource descriptors into special palettes with addressing limits, AMD was fully bindless under the hood. Everything was just one big address space, descriptors and data.
Nvidia 15 years ago was overfit to graphics. Nvidia just made smarter choices, sold more hardware and re-invested their winnings into software and improving their hardware. Now they're just as good at GPGPU with a stronger software stack.
AMD has struggled to be anything other than a follower in the market and has suffered quite a lot as a result, even in graphics. Mesh shaders in DX12 were the result of NVIDIA dictating a new execution model that was very favorable to their new hardware, while AMD had already had a similar (but not perfectly compatible) system, called primitive shaders, since Vega.
This feels backwards to me when GPUs were created largely because graphics needed lots of parallel floating point operations, a big chunk of which are matrix multiplications.
When I think of matrix multiplication in graphics I primarily think of transforms between spaces: moving vertices from object space to camera space, transforming from camera space to screen space, ... This is a big part of the math done in regular rendering and needs to be done for every visible vertex in the scene - typically in the millions in modern games.
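To make that concrete, here's a rough sketch of the object space -> camera space -> screen space chain in plain Python. All the helper names (`translation`, `perspective`, `to_screen`) are made up for illustration, and the projection follows the common OpenGL-style convention (camera looking down -z), which is an assumption on my part; a real renderer does this per vertex in a vertex shader.

```python
import math

def mat_mul(a, b):
    """Product of two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """Apply a 4x4 matrix to a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def translation(tx, ty, tz):
    """Hypothetical helper: homogeneous translation matrix."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def perspective(fov_y_deg, aspect, near, far):
    """Hypothetical helper: OpenGL-style perspective projection (assumed convention)."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [[f / aspect, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
            [0.0, 0.0, -1.0, 0.0]]

def to_screen(vertex, mvp, width, height):
    """Object-space (x, y, z) -> pixel coordinates via clip space and NDC."""
    x, y, _z, w = mat_vec(mvp, [vertex[0], vertex[1], vertex[2], 1.0])
    ndc_x, ndc_y = x / w, y / w            # perspective divide
    return ((ndc_x + 1.0) * 0.5 * width,   # NDC [-1, 1] -> [0, width]
            (1.0 - ndc_y) * 0.5 * height)  # flip y: screen origin is top-left

# Object sits at the world origin; camera is 5 units back along +z.
model = translation(0.0, 0.0, 0.0)
view = translation(0.0, 0.0, -5.0)
proj = perspective(60.0, 800 / 600, 0.1, 100.0)
mvp = mat_mul(proj, mat_mul(view, model))  # combined once, reused for every vertex

screen_xy = to_screen((0.0, 0.0, 0.0), mvp, 800, 600)
```

Note that the three matrices get multiplied together once and then applied to each vertex, so the per-vertex work is a single small 4x4 matrix-vector product, repeated millions of times. That's quite a different shape of workload from the large back-to-back matrix multiplications in a neural network.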
I suppose the difference here is that DLSS is a case where you primarily do large numbers of consecutive matrix multiplications with little other logic, since it's more ANN code than graphics code.
You could argue it's all the nice GPU debugging tools nVidia provides that make GPU programming accessible.
There are so many potential bottlenecks (normally just memory access patterns, but without tools to verify you have to design and run manual experiments).