Which doesn't imply anything about how fast the operations can be implemented for fixed-size numbers.
Also, did you even read the article you linked? The exact complexity of multiplication is an open question, so it's absurd to say that it "has the same big-O complexity".
Your last point there is nonsense. We can reduce division to multiplication, so we can say division is O(M(n)), where M(n) is the complexity of multiplication, even if we don't know what M(n) is. Note that big-O is an upper asymptotic bound, not an exact asymptotic bound (that's big-Theta).
In fact, one can also show that M(n) is O(D(n)) (where D(n) is the time needed to divide n bit numbers). So they really are big-theta of each other.
(See Aho, Hopcroft, and Ullman, The Design and Analysis of Computer Algorithms, 1976, section 8.2)
As explained in the Wikipedia article, we use the Newton–Raphson method with O(log n) steps to find the reciprocal of the divisor, then we multiply this reciprocal by the dividend. Each step of the reciprocal iteration requires a single multiplication, with the precision doubling at each step. So the steps take M(1), M(2), M(4), ..., M(n/2), M(n) time. Since multiplication isn't sublinear, M(cn) ≥ c·M(n) for c ≥ 1 and "most" n (very roughly); thus, M(n) + M(n/2) + M(n/4) + ... ≤ M(n) + 1/2·M(n) + 1/4·M(n) + ... ≤ 2·M(n). Adding the final multiplication, we get an upper bound of 3·M(n), or O(M(n)).
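A minimal Python sketch of the scheme, using exact integer fixed-point arithmetic. For simplicity it runs every Newton step at full precision (so it doesn't realize the M(1) + M(2) + ... + M(n) savings, just the iteration itself), and the iteration count and correction loop are my own simplifications:

```python
def newton_reciprocal(d, n):
    """Approximate floor(2^(2n)/d) via Newton's iteration x <- x*(2 - d*x).
    Precision roughly doubles each step, so O(log n) steps suffice."""
    k = 2 * n                               # fixed-point scale: 2^k
    x = 1 << (k - d.bit_length())           # crude first guess, <= 1/d
    for _ in range(k.bit_length() + 2):     # O(log n) iterations
        x = (x * ((2 << k) - d * x)) >> k   # a couple of multiplications per step
    return x

def divide(a, d):
    """Integer division via reciprocal-then-multiply, with a small
    correction loop for the last few bits of approximation error."""
    n = max(a.bit_length(), d.bit_length())
    q = (a * newton_reciprocal(d, n)) >> (2 * n)
    while (q + 1) * d <= a:                 # fix any underestimate
        q += 1
    while q * d > a:                        # fix any overestimate
        q -= 1
    return q
```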
See the book referenced. Basically, if you implement division by Newton-Raphson, using multiplication as a subroutine, the complexity is O(M(n)).
In the other direction, one can use division to do multiplication by: (1) doing reciprocal with division, (2) doing squaring with reciprocal, and (3) doing multiplication with squaring. All these operations (division, multiplication, reciprocal, and squaring) turn out to be equivalent in complexity up to a constant factor. All this is asymptotically in n, the number of bits.
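The identities behind that chain are easy to check. A toy Python illustration (exact `Fraction`s stand in for the careful fixed-point integer bookkeeping the real reduction needs):

```python
from fractions import Fraction

def square_via_reciprocal(x):
    """x^2 from reciprocals, addition, and subtraction:
    1/x - 1/(x+1) = 1/(x^2 + x), so x^2 = 1/(1/x - 1/(x+1)) - x."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    return 1 / (1 / x - 1 / (x + 1)) - x

def multiply_via_squaring(a, b):
    """a*b = ((a+b)^2 - a^2 - b^2) / 2: multiplication from squaring,
    addition, subtraction, and one halving (a shift, for integers)."""
    return (square_via_reciprocal(a + b)
            - square_via_reciprocal(a)
            - square_via_reciprocal(b)) / 2
```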
You don't understand what "reduce to" means in this context. In the intended meaning (X reduces to Y means that if we can do Y in O(T(n)) time, we can do X in O(T(n)) time), those reductions aren't known. We can't reduce division to a constant number of additions of the same size.
Adding on to this, and somewhat related: binary addition, even with carry lookahead, is a multi-clock-cycle process for non-trivial word sizes. Arithmetic is not trivial by any means. We all just take it for granted.
By "non-trivial word sizes," I presume you mean larger than a single word on most computers? My understanding is that modern BDD/ZDD techniques have gone a long way toward showing how to do the entire operation in a single pass for most word sizes, such that propagating the carry information isn't really how it's done anymore. (This is a genuine question, btw. It has been a long time since my VLSI classes back in college.)
I'm talking about constructing boolean circuits. The usual parallel prefix algorithm yields a circuit of depth O(log n) with O(n log n) gates. Using some trickery, however, this can be reduced to O(n) gates, with a constant factor increase in depth.
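For concreteness, here's a small Python model of a parallel-prefix (Kogge–Stone-style) adder. It's written as sequential loops, but each pass of the `while` corresponds to one layer of the O(log n)-depth circuit; this plain version uses O(n log n) combine cells, without the gate-count trickery mentioned above:

```python
def prefix_add(a, b, n):
    """Add two n-bit numbers with the parallel-prefix carry idea:
    per-bit (generate, propagate) pairs, combined by an associative
    operator in a log-depth scan."""
    g = [(a >> i & 1) & (b >> i & 1) for i in range(n)]  # generate
    p = [(a >> i & 1) ^ (b >> i & 1) for i in range(n)]  # propagate
    d = 1
    while d < n:                      # O(log n) levels
        # Descending i, so g[i-d]/p[i-d] still hold the previous level.
        for i in range(n - 1, d - 1, -1):
            g[i] = g[i] | (p[i] & g[i - d])
            p[i] = p[i] & p[i - d]
        d <<= 1
    carry = [0] + g                   # carry into bit i+1 spans bits 0..i
    s = 0
    for i in range(n):                # sum bit: a XOR b XOR carry-in
        s |= ((a >> i & 1) ^ (b >> i & 1) ^ carry[i]) << i
    s |= carry[n] << n                # final carry-out
    return s
```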
Not for cryptographically useful groups like elliptic curves. Scalar multiplication is defined as repeated addition and can be made fast by double-and-add. The inverse operation, the discrete logarithm ("division" here), is believed to be computationally infeasible for large curves, even though it's computable in principle.
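A toy sketch of the asymmetry, with made-up curve parameters far too small for real use: computing k·P by double-and-add takes O(log k) group operations, while recovering k from P and k·P (the discrete log) has no known comparably fast algorithm.

```python
# Toy short-Weierstrass curve y^2 = x^3 + a*x + b over GF(p).
p, a, b = 97, 2, 3
O = None                       # point at infinity (group identity)

def add(P, Q):
    """Standard affine point addition with the chord/tangent formulas."""
    if P is O: return Q
    if Q is O: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return O               # P + (-P) = identity
    if P == Q:                 # tangent slope for doubling
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:                      # chord slope for distinct points
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    return (x3, (m * (x1 - x3) - y1) % p)

def scalar_mul(k, P):
    """k*P by double-and-add: O(log k) group operations."""
    R = O
    while k:
        if k & 1:
            R = add(R, P)
        P = add(P, P)
        k >>= 1
    return R
```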
Some operations have much more complex inverses, in a computational sense, so I don't think it's so simple. The key to showing this is to reduce the problem to its inverse: https://en.wikipedia.org/wiki/Reduction_(complexity)
https://en.wikipedia.org/wiki/Computational_complexity_of_ma...