The title is missing "Latency", which would surface many other results when searching. My go-to is this one[0], because it's plain text and shows "Syscall" and "Context switch".
Latency numbers every programmer should know
L1 cache reference ......................... 0.5 ns
Branch mispredict ............................ 5 ns
L2 cache reference ........................... 7 ns
Mutex lock/unlock ........................... 25 ns
Main memory reference ...................... 100 ns
Syscall on Intel 5150 ...................... 105 ns
Compress 1K bytes with Zippy ............. 3,000 ns = 3 µs
Context switch on Intel 5150 ............. 4,300 ns = 4 µs
Send 2K bytes over 1 Gbps network ....... 20,000 ns = 20 µs
SSD random read ........................ 150,000 ns = 150 µs
Read 1 MB sequentially from memory ..... 250,000 ns = 250 µs
Round trip within same datacenter ...... 500,000 ns = 0.5 ms
Read 1 MB sequentially from SSD* ..... 1,000,000 ns = 1 ms
Disk seek ........................... 10,000,000 ns = 10 ms
Read 1 MB sequentially from disk .... 20,000,000 ns = 20 ms
Send packet CA->Netherlands->CA .... 150,000,000 ns = 150 ms
* Assuming ~1 GB/sec SSD
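The table is handy for back-of-envelope math. A minimal sketch (the figures are the table's rough 2010s-era numbers, not measurements of any real machine; the dict keys and helper name are my own):

```python
# Back-of-envelope helper built from the latency table above.
# These are the table's rough figures, not measurements of a real machine.
NS = {
    "l1_ref": 0.5,
    "branch_mispredict": 5,
    "l2_ref": 7,
    "mutex": 25,
    "main_memory_ref": 100,
    "syscall": 105,
    "compress_1k": 3_000,
    "context_switch": 4_300,
    "send_2k_1gbps": 20_000,
    "ssd_random_read": 150_000,
    "read_1mb_memory": 250_000,
    "dc_round_trip": 500_000,
    "read_1mb_ssd": 1_000_000,
    "disk_seek": 10_000_000,
    "read_1mb_disk": 20_000_000,
    "ca_nl_ca": 150_000_000,
}

def estimate_ms(op: str, count: int = 1) -> float:
    """Rough latency in milliseconds for `count` repetitions of `op`."""
    return NS[op] * count / 1_000_000

# e.g. reading 1 GB sequentially from SSD = 1024x the 1 MB figure
print(estimate_ms("read_1mb_ssd", 1024))  # 1024.0 ms, i.e. about a second
```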
I don't get how expressing these numbers in time units is useful.
I've been a developer for embedded systems in the telecom industry for nearly two decades now, and until today I had never met anyone using anything other than "cycles" or "symbols"... except, obviously, for the mean RTT US<->EU.
Then why not just use qualifiers, from slowest to fastest? You might not know this, but you can develop bare-metal solutions for HPC that are used in several industries, like telecommunications. Calculations based on cycles are totally accurate whatever the number of cores...
> Then why not just use qualifiers, from slowest to fastest?
Because whether something is 5x slower or 5000x slower matters. Is it better to wait for 10 I/Os, make 10,000 random memory accesses, or do a network transaction? We can figure out the cost of memory, memory bandwidth, etc., but we also need to consider latency.
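The parent's three options can be made concrete with the table's rough figures (a hedged back-of-envelope, not a benchmark; constant names are my own):

```python
# Back-of-envelope comparison using the rough figures from the table above
# (not measurements; real numbers vary widely across hardware).
DISK_SEEK_NS = 10_000_000      # one random disk I/O (seek-dominated)
MEMORY_REF_NS = 100            # one uncached main-memory reference
DC_ROUND_TRIP_NS = 500_000     # one round trip within the same datacenter

ten_ios = 10 * DISK_SEEK_NS              # 100,000,000 ns = 100 ms
memory_10000x = 10_000 * MEMORY_REF_NS   #   1,000,000 ns =   1 ms
one_network_rt = DC_ROUND_TRIP_NS        #     500,000 ns = 0.5 ms

# The options differ by factors of ~2x and ~100x -- exactly the kind of
# distinction a slowest-to-fastest qualifier would erase.
print(ten_ios, memory_10000x, one_network_rt)
```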
I've done plenty of work counting cycles; but it's a lot harder and less meaningful now. Too many of the things here happen in different clock domains. While it was a weekly way to look at problems for me a couple of decades ago, now I employ it for far less: perhaps once a year.
> Calculations based on cycles are totally accurate whatever the number of cores...
No, they're not, because cores contend for resources. We contend for resources within a core (hyperthreading, L1 cache). We contend for resources within the package (L2+ cache lines and thermal management). And we contend for memory buses, I/O, and networks. These things can sometimes happen in parallel with other work, and sometimes we have to block for them, and often this is nondeterministic. In turn, the cycle counts for doing anything within the larger system are really nondeterministic.
Counting cycles works great for determining execution time on a small embedded system or a 1980s-1990s computer, or for a trivial single-threaded loop running by itself on a 2020s computer. But most of the time now we need to account for how much of some other scarce resource we're using (cache, memory bandwidth, network bandwidth, a lock, power dissipated in the package, etc.), and think about how various kinds of latencies measured in different clock domains compose.
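The nondeterminism is easy to see empirically. A quick sketch: time an identical memory-touching loop many times with a wall-clock timer and look at the spread (the absolute numbers depend entirely on the host; only the variance is the point, and the function and buffer size here are my own choices):

```python
# Illustration of why a fixed cycle/time count for the "same" operation is
# unreliable on a modern machine: the spread across identical runs comes from
# caches, frequency scaling, interrupts, and scheduling.
import time

def touch(buf: bytearray, step: int = 64) -> int:
    """Sum one byte per `step` bytes (stride ~ a typical cache line)."""
    total = 0
    for i in range(0, len(buf), step):
        total += buf[i]
    return total

buf = bytearray(1_000_000)
samples = []
for _ in range(50):
    t0 = time.perf_counter_ns()
    touch(buf)
    samples.append(time.perf_counter_ns() - t0)

print(f"min={min(samples)} ns  max={max(samples)} ns  "
      f"spread={max(samples) / min(samples):.1f}x")
```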
Not to take away from your point, but I'd argue that counting cycles is usually misleading even for small embedded systems now. It's very difficult to build a system where cycles aren't equally squishy these days.
Things like Cortex-M -- stuff's deterministic. Sure, we might have caches on the high end (M55/M85), and contention for resources with DMA, but we can reason about them pretty well.
A few years ago I was generating NTSC overlay video waveforms with SPI from a cortex-M4 while controlling flight dynamics and radio communications on the same processor. RMS Jitter on the important tasks was ~20 nanoseconds-- 3-4 cycles, about a factor of 100x better than the requirement.
But I guess you're right: you could also consider something like a dual-core Cortex-A57 quite small, where all the above complaints are true.
Because it's something very different. I was expecting standalone numbers that would hint to the user something is wonky if they showed up in unexpected places - numbers like 255 or 2147483647.
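Those "wonky" standalone numbers are integer-width boundaries, which is easy to sketch (the lookup table and function name here are my own illustration, not from the comment):

```python
# Sketch of the "wonky number" idea: values like 255 or 2147483647 showing up
# in data usually mean an integer-width boundary was hit somewhere upstream.
SUSPICIOUS = {
    2**8 - 1: "uint8 max (255)",
    2**15 - 1: "int16 max (32767)",
    2**16 - 1: "uint16 max (65535)",
    2**31 - 1: "int32 max (2147483647)",
    2**32 - 1: "uint32 max (4294967295)",
}

def explain(n: int) -> str:
    """Flag a value that sits exactly on a common integer boundary."""
    return SUSPICIOUS.get(n, "nothing obviously wonky")

print(explain(2147483647))  # int32 max (2147483647)
print(explain(42))          # nothing obviously wonky
```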