Floating point rounding errors are still deterministic. Parallelism dynamics can impact results, but those are not specific to LLMs.

Here's something that isn't deterministic:

   a = 0.1, b = 0.2, c = 0.3
   a * (b * c) = 0.006
   (a * b) * c = 0.006000000000000001
If you are running these operations in parallel, you can't guarantee which of those orders the operations will complete in.

When you're running models on a GPU (or any other architecture that runs a whole bunch of matrix operations in parallel), you can't guarantee the order of the operations.
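To make that concrete, here is a small Python sketch (assuming ordinary IEEE-754 doubles, i.e. plain CPython floats) that sums the same three values in every order a parallel reduction might happen to use:

    from itertools import permutations

    # Floating point addition is not associative, so the final sum
    # depends on the order in which the values are accumulated.
    values = (0.1, 0.2, 0.3)

    results = {sum(order) for order in permutations(values)}
    print(results)  # typically {0.6, 0.6000000000000001}

A reduction that accumulates partial results in whatever order they happen to arrive is effectively picking one of those orderings at random.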


The order of completion doesn't necessarily influence the overall result of a parallelized computation; it depends on how the results are aggregated. For example, to reduce floating point error when summing a list of floating point numbers, you can add a sorting step before the summation and then accumulate from the lowest values up to the highest. Then it doesn't matter which value is computed first: you need all of them before you can sort, and once they are sorted, the result is always the same for the same input values.

So completion order is a completely orthogonal issue, or at least can be made one, as the sketch below illustrates.
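A minimal sketch of that idea in plain Python (assuming the partial results are ordinary floats collected into a list before the reduction):

    import random

    def deterministic_sum(values):
        # Gather everything first, sort, then accumulate from the lowest
        # value upwards: the completion order of the inputs can no longer
        # affect the result.
        total = 0.0
        for v in sorted(values):
            total += v
        return total

    values = [random.uniform(0.0, 1.0) for _ in range(10_000)]
    shuffled = list(values)
    random.shuffle(shuffled)  # simulate a different completion order

    assert deterministic_sum(values) == deterministic_sum(shuffled)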

Even libraries like TensorFlow can be made to give reproducible results by setting the corresponding seeds for the underlying libraries. I have done that myself, speaking from experience in a machine learning setting.
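Roughly, that setup looks something like the following (a sketch, not a complete recipe: the exact knobs vary by TensorFlow version, and enable_op_determinism only exists in newer releases; older ones used the TF_DETERMINISTIC_OPS=1 environment variable instead):

    import os
    import random

    import numpy as np
    import tensorflow as tf

    SEED = 42

    # Seed every source of randomness the run touches.
    os.environ["PYTHONHASHSEED"] = str(SEED)
    random.seed(SEED)
    np.random.seed(SEED)
    tf.random.set_seed(SEED)

    # Ask TF to use deterministic (often slower) kernels for ops that
    # would otherwise rely on nondeterministic GPU reductions.
    tf.config.experimental.enable_op_determinism()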
