From the paper, "Notably, for multiplying two 4 × 4 matrices, applying the algor...

lf88 · 2025-05-14T19:38:23 1747251503

By googling "4x4 matrices multiplication 48" I ended up on this discussion on math.stackexchange https://math.stackexchange.com/questions/578342/number-of-el... , where in 2019 someone stated "It is possible to multiply two 4×4 matrix A,B with only 48 multiplications.", with a link to a PhD thesis. This might mean that the result was already known (I still have to check the outline of the algorithm).

alexnovikov · 2025-05-15T17:14:29 1747329269

One of the authors here. We are aware of the Winograd scheme, but note that it only works over commutative rings, which means that it's not applicable recursively to larger matrices (and doesn't correspond to a rank 48 factorization of the <4,4,4> matrix multiplication tensor). The MathOverflow answer had a mistake corrected in the comments by Benoit Jacob.

More details: the Winograd scheme computes (x1+ y2 )(x2+ y1 ) + (x3+y4)(x4+y3)-Ai-Bj, and relies on y2y1 (that comes from expanding the first brackets) cancelling with y1y2 in Bj=y1y2 + y3y4. This is fine when working with numbers, but if you want to apply the algorithm recursively to large matrices, on the highest level of recursion you're going to work with 4x4 block matrices (where each block is a big matrix itself), and for matrices Y2Y1 != Y1Y2 (for general matrices).

Here is a website that tracks fastest (recursively applicable) matrix multiplication algorithms for different matrix sizes, and it stands at 49: https://fmm.univ-lille.fr/4x4x4.html

UPD: s/fields/rings/ and fixed equation rendering

robinhouston · 2025-05-14T20:53:59 1747256039

From some conversations on Twitter, it seems plausible that the rank-48 decomposition of the 4×4 matrix multiplication tensor really is new; and that perhaps where things have gone awry is attempting to summarise this result in a more lay-friendly manner: the algorithm in that post apparently doesn't constitute or imply a rank-48 tensor decomposition.

On the other side, it's claimed here that an algorithm that uses only 46 multiplications has been known since 1970: https://mathstodon.xyz/@fredrikj/114508287537669113

looofooo0 · 2025-05-15T09:58:20 1747303100

Ironically their AI can cite the relevant paper with 46 steps if asked: https://gemini.google.com/share/b0d5d6a76c87

wbhart · 2025-05-14T21:44:53 1747259093

As already noted in a post by fdej further down, Waksman's algorithm from 1970, which works over the complex numbers, requires only 46 multiplications (and I guess, divisions by 2, which may or may not be relevant depending on your actual ring).

nemonemo · 2025-05-14T22:27:06 1747261626

The answer says "For rings in which division by 2 is permitted". Is there the same constraint for AlphaEvolve's algorithm?

Edit2: Z_2 has characteristics 2.

Edit: AlphaEvolve claims it works over any field with characteristic 0. It appears Waksman's could be an existing work. From the AlphaEvolve paper: "For 56 years, designing an algorithm with fewer than 49 multiplications over any field with characteristic 0 was an open problem. AlphaEvolve is the first method to find an algorithm to multiply two 4 × 4 complex-valued matrices using 48 multiplications."

fdej · 2025-05-14T22:36:09 1747262169

If you don't want to allow division by 2 then there is Winograd's algorithm from 1967 which works over any commutative ring and uses 48 multiplications for 4 x 4.

wbhart · 2025-05-14T22:47:17 1747262837

Z_2 has characteristic 2, not 0.

nemonemo · 2025-05-14T23:07:27 1747264047

Thank you. Updated my comment again.

mik09 · 2025-05-20T09:47:54 1747734474

So, did LLM (namely Gemini-Flash) helepd with the combinatorial optimization process? I'm sure not all of their discoveries (one on kissing numbers, etc.) have previous solutions in some other form, but yeah these findings looks more like very large combinatorial optimization tasks.

jackpirate · 2025-05-14T19:11:27 1747249887

It seems like you have some misconceptions about Strassen's alg:

1. It is a standard example of the divide and conquer approach to algorithm design, not the dynamic programming approach. (I'm not even sure how you'd squint at it to convert it into a dynamic programming problem.)

2. Strassen's does not require complex valued matrices. Everything can be done in the real numbers.

pontus · 2025-05-14T21:03:19 1747256599

I think the OP was pointing out that the reason Strasssen's algorithm works is that it somehow uncovered a kind of repeated work that's not evident in a simple divide and conquer approach. It's by the clever definition of the various submatrices that this "overlapping" work can be avoided.

In other words, the power of Strasssens algorithm comes from a strategy that's similar to / reminiscent of dynamic programming.

kenjackson · 2025-05-14T19:16:58 1747250218

I think the original poster was referring to the AlphaEvolve variant of Strassen's, not the standard Strassen (with respect to complex values).

meindnoch · 2025-05-15T08:49:39 1747298979

>What I find interesting is that it seems the extra savings requires complex values. There must be something going on in the complex plane that is again over-counting with the naive approach.

"The rank of a tensor depends on the field over which the tensor is decomposed. It is known that some real tensors may admit a complex decomposition whose rank is strictly less than the rank of a real decomposition of the same tensor."

https://en.wikipedia.org/wiki/Tensor_rank_decomposition#Fiel...

Zr01 · 2025-05-15T07:42:46 1747294966

A complex multiplication is "worth" at least 3 real multiplications.

alexnovikov · 2025-05-16T08:05:00 1747382700

Fair point! A single complex multiplication `(a+bi)(c+di)` indeed requires at least 3 real multiplications to be implemented.

However, when researchers (and systems like AlphaEvolve in this context) analyze fast matrix multiplication algorithms like Strassen's, the primary goal is usually to improve the asymptotic complexity (and understand the space of these algorithms better). This complexity is determined by the number of multiplications in the field over which the matrices are defined. * For real matrices, we count real scalar multiplications. * For complex-valued matrices (as in the 4x4 example where AlphaEvolve found a solution with 48 scalar multiplications), "scalar multiplication" refers to a complex scalar multiplication.

The key is that these are the operations you recurse on. Additions, or the constant factor cost of implementing one field's multiplication, don't change the exponent in the `N^(log_base(multiplications))` complexity. They are constant factors.

Of course, for practical performance on a specific 4x4 matrix, one would absolutely dive into the real operations, additions, memory layout, etc., but making 4x4 matrix multiplication practically faster on some particular hardware was not the focus in this section. (We do improve practical implementation of large matrix multiplications on the target hardware in the “Enhancing AI training and inference” section of the blog post.)

(Disclaimer: one of the authors.)

Davidzheng · 2025-05-15T00:58:12 1747270692

Are you sure the saving needs complex values? I think their algorithm works over any char 0 field. Probably needs to just divide by some divisor of 4!=24 if I had to guess.

meindnoch · 2025-05-15T08:47:14 1747298834

Their decomposition of the (4,4,4) matrix multiplication tensor is explicitly listed in their Colab notebook, and it contains complex numbers.