It is odd that he talks about Larrabee so much, but doesn’t mention the Xeon Phis. (Or is it Xeons Phi?)
> As a general trend, CPU designs are diverging into those optimizing single-core performance (performance cores) and those optimizing power efficiency (efficiency cores), with cores of both types commonly present on the same chip. As E-cores become more prevalent, algorithms designed to exploit parallelism at scale may start winning, incentivizing provision of even larger numbers of increasingly efficient cores, even if underpowered for single-threaded tasks.
I’ve always been slightly annoyed by the concept of E cores, because they are so close to what I want, but not quite there… I want, like, throughput cores. Let’s take E cores, give them their AVX-512 back, and give them higher throughput memory. Maybe try and pull the Phi trick of less OoO capabilities but more threads per core. Eventually the goal should be to come up with an AVX unit so big it kills iGPUs, haha.
I've always wondered if you could use iGPU compute cores with unified memory as "transparent" E-cores when needed.
Something like OpenCL/CUDA except it works with pthreads/goroutines and other (OS) kernel threading primitives, so code doesn't need to be recompiled for it. Ideally the OS scheduler would know how to split the work, similar to how E-core and P-core scheduling works today.
I don't do HPC professionally, so I assume I'm ignorant of why this isn't possible.
It is an instance of Larrabee in the same sense as AMD Zen 4 is an instance of Larrabee.
The "Larrabee New Instructions" are an instruction set that was designed before AVX, and its first hardware implementation was also introduced before AVX, in 2010 (AVX launched in 2011, with Sandy Bridge).
Unfortunately, while the hardware design of Sandy Bridge, with the inferior AVX ISA, was done by the Intel A team, the hardware implementations of Larrabee were done by some C or D teams. Those teams were also not able to design new CPU cores for it and had to reuse obsolete x86 cores, initially a Pentium core and later an Atom Silvermont core, onto which the Larrabee instructions were grafted.
The "Larrabee New Instructions" were renamed to the "Many Integrated Cores" ISA, then to AVX-512, while passing through 3 generations of chips: Knights Ferry, Knights Corner and Knights Landing. A fourth generation, Knights Mill, was intended only for machine learning/AI applications. The successor of Knights Landing was Skylake Server, where the AVX-512 ISA came to standard Xeons, marking the disappearance of Xeon Phi.
Already in 2013, Intel Haswell added to AVX a few of the more important instructions that were included in the Larrabee New Instructions but missing from AVX, e.g. fused multiply-add and gather instructions. The 3-address FMA format also came to AVX from Larrabee, replacing the initial 4-address specification; this caused problems for AMD, who had implemented a 4-address format in Bulldozer.
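To make the 3-address vs. 4-address distinction concrete, here is a minimal toy sketch. It is not real ISA code: the register names and the `fma3`/`fma4` helpers are made up for illustration, and plain Python arithmetic does not model the single rounding of a true fused operation. It only shows that the 3-address form has to overwrite one of its sources, while the 4-address form does not.

```python
# Toy register-file model. Register names and helper functions are
# hypothetical and exist only to illustrate operand counts.

def fma4(regs, dst, a, b, c):
    """4-address form: dst = a * b + c; no source register is overwritten."""
    regs[dst] = regs[a] * regs[b] + regs[c]


def fma3(regs, dst_and_a, b, c):
    """3-address form: the destination doubles as the first source."""
    regs[dst_and_a] = regs[dst_and_a] * regs[b] + regs[c]


# 4-address: the result lands in r0 and r1 keeps its old value.
regs4 = {"r0": 0.0, "r1": 2.0, "r2": 3.0, "r3": 4.0}
fma4(regs4, "r0", "r1", "r2", "r3")

# 3-address: the result overwrites r1, so a copy is needed if r1 is reused.
regs3 = {"r1": 2.0, "r2": 3.0, "r3": 4.0}
fma3(regs3, "r1", "r2", "r3")
```

As far as I know, the real 3-operand encoding compensates with several operand orderings (the 132/213/231 variants), so the compiler can choose which source gets clobbered.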
With each generation up to Skylake Server, some of the original Larrabee instructions were deleted, on the assumption that they might be needed only for graphics, which was no longer the intended market. However, a few of those instructions were really useful for some applications I am interested in, e.g. computations with big numbers, so I regret their disappearance.
Since Skylake Server, there have been no other instruction removals, with the exception of those introduced by Intel Tiger Lake, which are now supported only by AMD Zen 5. A few days ago Intel committed to keeping complete compatibility in the future with the ISA implemented today by Granite Rapids, so there will be no other instruction deletions.
> It is an instance of Larrabee in the same sense as AMD Zen 4 is an instance of Larrabee.
This is an odd claim. Clearly Xeon Phi is the shipping version of Larrabee, while Zen 4 is a completely different chip design that happens to run AVX-512. The first shipping Xeon Phi (Knights Corner) used the exact same P54C cores as Larrabee, while, as you point out, later versions of Xeon Phi switched to Atom.
It is extremely common to refer to all of these as Larrabee. For example, the Ian Cutress article on the last Xeon Phi chip was entitled "The Larrabee Chapter Closes: Intel's Final Xeon Phi Processors Now in EOL" [1]. Pat Gelsinger's recent interview at GTC [2] also refers to Larrabee. The section from around 44:00 has a discussion of workloads becoming more dynamic, and at 53:36 there's a section on Larrabee proper.
I think it is not right to say that Larrabee and Phi are as distant as Larrabee and Zen. But they did retreat a bit from the “graphics card”-like functionality, and scaled back the ambitions to become something a bit more familiar.