Why is division so much more complex than other arithmetic operations? (scicomp.stackexchange.com)
112 points by rongenre on Aug 11, 2023 | hide | past | favorite | 93 comments



Division has the same big-O complexity as multiplication.

https://en.wikipedia.org/wiki/Computational_complexity_of_ma...


Which doesn't imply anything about how fast the operations can be implemented for fixed size numbers.

Also, did you even read the article you linked? The lowest complexity for multiplication is an open question, so it's absurd to say that division "has the same big-O complexity".


Your last point there is nonsense. We can reduce division to multiplication, so we can say division is O(M(n)), where M(n) is the complexity of multiplication, even if we don't know what M(n) is. Note that big-O is an upper asymptotic bound, not an exact asymptotic bound (that's big-Theta).

In fact, one can also show that M(n) is O(D(n)) (where D(n) is the time needed to divide n bit numbers). So they really are big-theta of each other.

(See Aho, Hopcroft, and Ullman, The Design and Analysis of Computer Algorithms, 1976, section 8.2)


>> We can reduce division to multiplication

How? I mean sure, if taking the reciprocal is O(1) which it certainly is not.


As explained in the Wikipedia article, we use the Newton–Raphson method with O(log n) steps to find the reciprocal of the divisor, then we multiply this reciprocal by the dividend. Each step of the reciprocal method requires a single multiplication, with the precision doubling at each step. So the steps take M(1), M(2), M(4), ..., M(n/2), M(n) time. Since multiplication isn't sublinear, M(cn) ≥ c·M(n) for c ≥ 1 and "most" n (very roughly); thus, M(n) + M(n/2) + M(n/4) + ... ≤ M(n) + 1/2·M(n) + 1/4·M(n) + ... ≤ 2·M(n). Adding the final multiplication, we get an upper bound of 3·M(n), or O(M(n)).
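A minimal floating-point sketch of that iteration (hypothetical newton_divide, using the classic 48/17 - 32/17·d starting guess, valid for d scaled into (0.5, 1]):

```cpp
#include <cassert>
#include <cmath>

// Sketch: divide a/d by computing 1/d with the Newton-Raphson iteration
// x <- x * (2 - d*x); the relative error roughly squares each step.
// Assumes d has been scaled into (0.5, 1]; real implementations scale
// by a power of two first and undo it afterwards.
double newton_divide(double a, double d) {
    double x = 48.0 / 17.0 - (32.0 / 17.0) * d;  // classic linear initial guess
    for (int i = 0; i < 5; ++i)                  // 5 doublings exceed double precision
        x = x * (2.0 - d * x);
    return a * x;
}
```

With arbitrary-precision numbers, each iteration would instead be run at successively doubled precision, which is where the M(n) + M(n/2) + M(n/4) + ... telescoping bound above comes from.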


See the book referenced. Basically, if you implement division by Newton-Raphson, using multiplication as a subroutine, the complexity is O(M(n)).

In the other direction, one can use division to do multiplication by: (1) doing reciprocal with division, (2) doing squaring with reciprocal, and (3) doing multiplication with squaring. All these operations (division, multiplication, reciprocal, and squaring) turn out to be equivalent in complexity up to a constant factor. All this is asymptotically in n, the number of bits.
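The second direction can be illustrated numerically; this is a toy sketch over doubles (hypothetical function names, and it breaks when a+b or a-b lands on 0 or ±1):

```cpp
#include <cassert>
#include <cmath>

// Squaring from reciprocals: 1/(x-1) - 1/(x+1) = 2/(x^2 - 1),
// so x^2 = 2 / (1/(x-1) - 1/(x+1)) + 1   (invalid for x = +-1).
double square_via_reciprocal(double x) {
    return 2.0 / (1.0 / (x - 1.0) - 1.0 / (x + 1.0)) + 1.0;
}

// Multiplication from squaring: a*b = ((a+b)^2 - (a-b)^2) / 4.
double multiply_via_squaring(double a, double b) {
    return (square_via_reciprocal(a + b) - square_via_reciprocal(a - b)) / 4.0;
}
```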


And we can reduce multiplication to additions, so something does not add up.


Not to a constant number of additions (or, more precisely, to a set of additions whose sizes sum to O(n)); no, it is not known how to do that.


> We can reduce division to multiplication

And I guess we can reduce division to multiplication too.


I guess this is some sort of remainder joke? But I don't get it.


and to subtraction or multiplication and further to counting


You don't understand what "reduce to" means in this context. In the intended meaning (where X reduces to Y means that if we can do Y in O(T(n)) time, we can do X in O(T(n)) time) those aren't known. We can't reduce division to a constant number of additions of the same size.


Adding on to this, and somewhat related: binary addition, even with lookahead, is a multi-clock-cycle process for non-trivial word sizes. Arithmetic is not trivial by any means. We all just take it for granted.


By "non-trivial word sizes," I presume you mean larger than a single word of most computers? My understanding is that modern BDD/ZDD techniques have gone a long way to showing how to do the entire operation in a single pass for most sized numbers. Such that propagating the carry information isn't really how it is done, anymore. (This is a genuine question, btw. I have been a long way from my VLSI classes back in college.)


my ignorant imagination says to look for the sweet spot between a lookup table and chunks as large as possible.


Addition can be performed with circuits of depth O(log n), where n is the number of bits, and with a little more work also of size O(n).


I'm confused. Since O(log n) is faster than O(n) why would O(n) require a little more work? Did I misread something or did you miswrite?


I'm talking about constructing boolean circuits. The usual parallel prefix algorithm yields a circuit of depth O(log n) with O(n log n) gates. Using some trickery, however, this can be reduced to O(n) gates, with a constant factor increase in depth.

https://en.wikipedia.org/wiki/Prefix_sum https://en.wikipedia.org/wiki/Adder_(electronics)
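In software, the same parallel-prefix idea looks roughly like this (a Kogge-Stone-style sketch over 32-bit words; hardware would run the log-depth combine rounds in parallel rather than in a loop):

```cpp
#include <cassert>
#include <cstdint>

// Carry computation as a parallel prefix over (generate, propagate) pairs:
// g marks bits that generate a carry, p marks bits that propagate one.
// Five combine rounds cover 32 bits, mirroring an O(log n)-depth circuit.
uint32_t prefix_add(uint32_t a, uint32_t b) {
    uint32_t g = a & b;
    uint32_t p = a ^ b;
    for (int shift = 1; shift < 32; shift <<= 1) {
        g |= p & (g << shift);  // carries from lower bits ride through propagators
        p &= p << shift;        // merge adjacent propagate spans
    }
    uint32_t carries = g << 1;  // the carry INTO bit i is generated strictly below i
    return (a ^ b) ^ carries;   // sum bit = propagate ^ carry-in
}
```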


Constant depth requires unbounded fan-in, which may not be available in practice.


Yes, I was talking about using gates of bounded fan-in.


Not for a cryptographically useful group like elliptic curves. Multiplication is defined via repeated addition and can be made fast by repeated doubling and adding. The inverse function, division (i.e., the discrete logarithm), is computationally infeasible for large curves.


Sorry, I should have specified this was about integer multiplication.


It should: division is just reverse multiplication.


Some operations have much more complex inverses, in a computational sense, so I don't think it's so simple. The key to showing this is to reduce the problem to its inverse: https://en.wikipedia.org/wiki/Reduction_(complexity)


But "reversing" here is division, and anyway, an operation and its "reverse" do not necessarily have symmetric complexities.

It is very easy to multiply an integer by itself, not so much to compute the square root of an integer.

It is very easy to multiply 3x3x5, not so much to find the prime factors of 45.


Apple’s M1 has a monstrously fast divide instruction.

There are some benchmarks here: https://ridiculousfish.com/blog/posts/benchmarking-libdivide...

I noticed it is very fast when I was experimenting with algorithms for unbiased bounded random numbers https://dotat.at/@/2022-04-20-really-divisionless.html - the fancy nearly and really divisionless algorithms had much less advantage on an Apple M1 than on an old Intel CPU.


It's actually not more "complicated" in the sense of operation count. You divide in the standard algorithm by: shifting your divisor so it's within one bit of the dividend (or one "place" if you're doing non-binary math), subtracting (or doing a one-digit divide for human numbers), and then shifting the result left by one place and repeating until you're out of divisor.

Multiplication is completely symmetric: shift off the lowest bit (digit), multiply (which is just logical AND for binary), add to the result, shift the result right, and repeat.

The reason multiplication is faster than division isn't about complexity, it's about dependency. The division steps depend on the previous iteration, whereas multiplication can be parallelized.


It's a slight tangent, but in the general realm of division being complicated, divide by zero is a longstanding annoyance. Does anyone know (or can refer to) a reasonable definition of +,-,*,/,% (in their usual meanings) on rational numbers such that A op B always evaluates to a rational?

A/1 divide 0/1 being stored as 1/0 and then propagated around seems reasonable but I keep putting off the bookwork of finding out what arithmetic relations still hold at that point. I think one loses multiply by zero folding to zero, for example A * 0 is now only 0 for A known to have non-zero denominator.

Context is a language with arbitrary precision rationals as the number type, and all I'm really looking for is a way to declare that the type of division is a rational for any rational arguments. Searching for this mostly turns up results on the naturals or integers which I'm not particularly interested in.

Long shot but worth asking. Thanks


> Does anyone know (or can refer to) a reasonable definition of +,-,*,/,% (in their usual meanings) on rational numbers such that A op B always evaluates to a rational?

Given the usual definition of multiplication (and division as its inverse), this is not possible:

Assume some multiplicative inverse 0⁻¹ of 0 exists. Then, writing · for multiplication, since

a·0 = b·0

for all rational a and b, we have

a·0·0⁻¹ = b·0·0⁻¹

and therefore

a = b

for all rational a and b, which is absurd.


As for extending the rationals by 0⁻¹ and giving up properties, a·0 = 0 can still hold for all a ≠ 0⁻¹.

To get an idea of what operations do and don't make sense when extending the rationals in the way you propose, define ∞ as our 0⁻¹ and refer to

https://en.wikipedia.org/wiki/Projectively_extended_real_lin...

substituting ℚ and ℚ̂ for ℝ and ℝ̂.

For a similar construction that maintains ordering properties, see

https://en.wikipedia.org/wiki/Extended_real_number_line#Arit...

Finally, note that the complex number equivalent to your construction is really interesting and useful:

https://en.wikipedia.org/wiki/Riemann_sphere

https://www-users.cse.umn.edu/~arnold/moebius/
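As a sketch of how arithmetic in such a projective extension might look in code (a hypothetical PRat type, not taken from any of the linked pages; the genuinely undefined combinations like 0/0 and 0·∞ stay undefined):

```cpp
#include <cassert>
#include <cstdlib>
#include <numeric>

// Rationals extended with a single point at infinity, encoded as 1/0.
// Multiplication and division stay total except for the undefined
// combinations (0/0, 0 * infinity), which abort in this sketch.
struct PRat { long num, den; };

PRat normalize(long n, long d) {
    if (n == 0 && d == 0) std::abort();  // 0/0 has no value even projectively
    long g = std::gcd(n, d);
    if (g != 0) { n /= g; d /= g; }
    if (d < 0 || (d == 0 && n < 0)) { n = -n; d = -d; }  // canonical signs
    return PRat{n, d};
}
PRat pr_mul(PRat a, PRat b) { return normalize(a.num * b.num, a.den * b.den); }
PRat pr_div(PRat a, PRat b) { return normalize(a.num * b.den, a.den * b.num); }
```

Here pr_div(1/1, 0/1) yields {1, 0}, the infinite point, and multiplying that by any nonzero rational keeps it at infinity, matching the projective rules.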


The Riemann sphere has always existed beyond my grasp of mathematics. I remember the maths undergraduates being excited about it. Thank you for the references, it looks like I need to learn more before proceeding with certainty.


In ordinary arithmetic and algebra, there isn't any reasonable definition of n/0 because if it is allowed, then one can prove contradictory things.

Outside of proof systems/pure math, like in software you write, you can define n/0 to be anything you like, you just can't depend on being able to derive mathematically consistent results.

There are also more exotic math systems. Infinity is not literally a number (that would lead to contradictions), but mathematicians can happily deal with number systems augmented with infinity as an extra member; one just has to step carefully, since it still isn't a number. The "number system" has one non-number member requiring special handling.

Similarly there are modern approaches to infinitesimals where there's a thing "e" which is not zero, but e^2 is zero. That is not normal arithmetic but when handled carefully it can be useful.


The standard arithmetic classes of Isabelle/HOL show that you can extend division with the equation `x / 0 = 0` and things (seemingly) work out.

https://lawrencecpaulson.github.io/2021/12/01/Undefined.html

https://www.hillelwayne.com/post/divide-by-zero/

https://xenaproject.wordpress.com/2020/07/05/division-by-zer...

You'll need to do some thinking/proof to find out if it works for you.


Huh. That's surprising but quite persuasive, thank you for the references. I like the general view that a non-axiomatic divide can't introduce unsoundness so define it however is useful.

It has somewhat kicked the can down the road to defining the multiplicative inverse, but that's still a good step forward. Thank you


Not a very satisfying answer (feels like if a satisfying answer to this question was possible, maths would look very different) but I think the trick to not getting annoyed by it is to shift your perspective rather than try and redefine maths. It's not that division is a badly behaved operator; it's that division isn't an operator. It's a shorthand for multiplication by the reciprocal. Since the reciprocal of 0 doesn't exist, we can no more multiply by it than we can compute 5 + the sound of one hand clapping. And we don't fret over how the addition operator's inadequacy there is unsatisfying!


It's worth noting that you can generally divide much faster if you know the divisor ahead of time, because you can turn it into a multiplication problem. But doing this takes some effort, as compared to turning subtraction into addition which is trivial.


It's worth noting that this asymmetry between addition and multiplication happens because our usual representations of numbers (i.e. on a piece of paper or in a computer) favour addition over multiplication. You can come up with different representations of rationals; for example, you could store prime exponents as integers, so the number 20 gets stored as (2,0,1,0...) because it's 2^2 × 3^0 × 5^1, or 5/6 would be (-1, -1, 1,...). With a representation like this, or just by taking logs if you like, you can come up with a representation where multiplication and division are very cheap but addition and subtraction are more expensive.
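A toy sketch of that representation (a hypothetical Factored type mapping primes to exponents, with negative exponents encoding the denominator):

```cpp
#include <assert.h>
#include <map>

// 20 = 2^2 * 5^1 is {{2,2},{5,1}}; 5/6 = 2^-1 * 3^-1 * 5^1 is {{2,-1},{3,-1},{5,1}}.
// Multiplying/dividing just adds/subtracts exponents; addition has no cheap rule here.
using Factored = std::map<int, int>;

Factored multiply(const Factored& a, const Factored& b) {
    Factored r = a;
    for (auto [p, e] : b) { r[p] += e; if (r[p] == 0) r.erase(p); }
    return r;
}
Factored divide(const Factored& a, const Factored& b) {
    Factored r = a;
    for (auto [p, e] : b) { r[p] -= e; if (r[p] == 0) r.erase(p); }
    return r;
}
```

For example, 20 × 5/6 = 50/3 = 2 × 5^2 / 3, i.e. multiply({{2,2},{5,1}}, {{2,-1},{3,-1},{5,1}}) gives {{2,1},{3,-1},{5,2}}.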


Nitpick, 50 is 2^1 × 5^2, not the other way around. But anyway, thanks a ton for this post, I felt my brain shift at a new way of thinking about numbers, seriously. I have to wonder what serious and useful implementations of this method might have been done somewhere.


By the way, if you're wondering whether anyone has done anything useful with the method of describing numbers as prime exponents I mentioned, the answer is yes, depending on your definition of "useful".

The encoding in terms of prime exponents is known as the Gödel encoding, and it's a fairly important step in proving things like Gödel's incompleteness theorems and the undecidability of the Halting problem.

Essentially it's a one-to-one map between natural numbers and finite sequences of natural numbers (since you can encode (a,b,c,d...) as 2^a × 3^b × 5^c × 7^d etc.), which turns out to be a mathematically convenient thing to have.


Thanks :) I edited my comment so the arithmetic is (hopefully) right now


Stated very broadly, I think of the mathematical “design space” as the set {operations, representations}.

Classic examples include:

- time domain / frequency domain

- lookup table / functional form

- traditional binary representation / gray codes


Before mechanical calculators, large tables of logarithms were computed by hand and used for exactly this purpose.


Makes me wonder... Is it known whether there is some intermediate representation that makes both fast?


I don't think that anything exists which is going to be better than the normal representations we use in anything except fantastically niche situations.


Yeah you store the log of every number.


Could you explain a little more or post a reference?


If

    A × B = C 
then

    log A + log B = log C

And if

    D^E = F
then

    E × log D = log F
Mostly a tongue-in-cheek comment, but if you don't know the log rules at all, check out log tables [0]. You'll see that if you use log values then multiplication and division become addition and subtraction, and are equally simple.

Relies on you either having stored the needed data or efficiently calculating logarithms and antilogarithms.

https://www.cuemath.com/algebra/log-table/
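In code, the log-table trick is just this (a toy sketch using the math library in place of a printed table):

```cpp
#include <assert.h>
#include <cmath>

// Multiply by adding logs, divide by subtracting them, then exponentiate back.
// A printed log table replaces std::log/std::exp with lookups.
double multiply_via_logs(double a, double b) { return std::exp(std::log(a) + std::log(b)); }
double divide_via_logs(double a, double b)   { return std::exp(std::log(a) - std::log(b)); }
```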


That's essentially how IEEE 754 floating-point works. You can think of it as a piecewise linear approximation of the base-2 logarithm.

For positive values, the exponent (upper) bits give the integer part of the logarithm and the significand (lower) bits approximate the fractional part (i.e., the mantissa).

In other words, you can do:

    #include <bit>       // std::bit_cast (C++20)
    #include <cassert>
    #include <cstdint>

    float approxLog2( float value )
    {
        assert( value > 0.0f );
        auto bits = std::bit_cast< int32_t >( value );
        auto exponent = ( bits >> 23 ) - 127;           // unbias the 8-bit exponent
        auto significand = bits & ( ( 1 << 23 ) - 1 );  // low 23 significand bits
        return exponent + significand / static_cast< float >( 1 << 23 );
    }
This will be exact at powers of two, and within 0.0860748291 of the true value in between (always being slightly too low).


Ah, I get it now. All we need is a logarithm oracle and we're good to go :)


Actually, everything will work out fine if we just could pick the optimal algorithm for each situation. How? Use an oracle to pick the right algorithm!


Indeed, with lookup tables, if you've seen one, you've seen them all


Unfortunately, if you've seen them all, you also need to store them all :-)


I was going to add a less eloquent comment along these lines, but you got here first!


Please share it if you can recall. As someone who just figured out what Base-2 and Base-10 meant, I’m eager for all sorts of Mathematica.


Dr. Douglas Jones wrote a pretty great article on the subject, https://homepage.cs.uiowa.edu/~jones/bcd/divide.html. Depending on the divisor it can be a bit messy, since reciprocal multiplication sometimes needs extra precision. For example, on an 8-bit machine, x/6 == (x*42>>8), which is nice because 42 fits in 8 bits. But x/105 == (x*313>>15) needs 9 bits to hold 313. On some systems (thinking 8-bit AVR), it'd probably be faster to branch on the 3 cases rather than perform a full 16-bit multiplication.


To dive into this a bit more without using too much math jargon: under a modulus, every integer division is a trivial integer multiplication, but the number you multiply by has to be figured out beforehand.

The above doesn't hold for factors of the modulus itself, but with a power-of-two modulus those divisions are just right shifts, so that part is easy.

E.g. under mod 35, if I want to divide by 3 I can just multiply by 12, since 12x3 = 1 under mod 35.

E.g. under mod 35, if I want to divide by 11 I can just multiply by 16, since 16x11 = 1 under mod 35.

I can do this for any number that doesn't share factors with 35, since there's always a multiplier for that number that gives 1. I could also construct base-2 examples similarly. It's called the multiplicative inverse.
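That multiplier can be found with the extended Euclidean algorithm; a sketch (hypothetical mod_inverse, valid only when gcd(a, m) = 1):

```cpp
#include <assert.h>
#include <stdint.h>

// Extended Euclid: track a Bezout coefficient so that old_s * a + t * m = gcd(a, m).
// When the gcd is 1, old_s (mod m) is the multiplicative inverse of a.
int64_t mod_inverse(int64_t a, int64_t m) {
    int64_t old_r = a, r = m, old_s = 1, s = 0;
    while (r != 0) {
        int64_t q = old_r / r;
        int64_t tr = r; r = old_r - q * r; old_r = tr;
        int64_t ts = s; s = old_s - q * s; old_s = ts;
    }
    // old_r is now gcd(a, m); the caller must ensure it is 1 for an inverse to exist
    return ((old_s % m) + m) % m;
}
```

This reproduces the examples above: mod_inverse(3, 35) is 12 and mod_inverse(11, 35) is 16.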


To clarify, there are two separate notions of division-as-multiplication discussed here.

Modular division is straightforward to convert to a multiplication; however, this is mainly for specialized applications like cryptography and number theory. It's uncommon to want a "division" where dividing by 2 can make the value larger.

Ordinary flooring division can also be converted to a multiplication problem when the dividend is bounded. This uses the "magic number" approach where you multiply by a rounded scaled reciprocal, and then do some work to correct the error from rounding.


Can you explain what you mean by:

>It's worth noting that you can generally divide much faster if you know the divisor ahead of time

Division is just multiplication by 1/x, so I don't understand.


For flooring integer division with a bounded dividend and known constant divisor, the idea is to multiply by a scaled reciprocal. For example with 32-bit division by 3, you might precompute 2^32 / 3, multiply by that, and then flooring-divide by 2^32 (right shift).

It gets tricky because 2^32/3 is not an integer, so you must round it. This introduces error; the idea is to show that the error is wiped out by the flooring divide. I did the monster math here:

https://ridiculousfish.com/blog/posts/labor-of-division-epis...
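For the concrete d = 3 case, this boils down to something like the following (0xAAAAAAAB = ceil(2^33 / 3); the rounding error provably washes out in the shift for every 32-bit x):

```cpp
#include <assert.h>
#include <stdint.h>

// Unsigned divide-by-3 as a multiply and shift; this is the magic-number
// form compilers typically emit for x / 3u on 64-bit targets.
uint32_t div3(uint32_t x) {
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABull) >> 33);
}
```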


Exactly. If you know you’re going to later be dividing by 23.7, you can precompute 1/23.7 and store that as a constant. Then you can later multiply by it.


Yes, but 1/x is just another division, so it only helps to speed up the final division if you're able to calculate 1/x beforehand.


For integer division, you may find that rounding is not identical after multiplying by the reciprocal.


Doesn't the risk of lack of precision apply to both floating point and fixed-point? I'd think the reciprocal would need to be exactly represented in the type, or to have more bits than the original type to produce a correct result in the last bit (or two bits?).

Also, with integers, a signed right shift is rounding down (towards negative infinity), whereas the division operator/instruction in many languages/hardware is rounding towards 0.

To adjust the rounding, you'd add 2^amount - 1 to negative values before shifting (the standard correction compilers emit, using the sign bit to build the mask). Let's say that 'x' is a signed long, and a signed long has 64 bits, then:

result = (x + ((x >> 63) & ((1L << amount) - 1))) >> amount;


> Doesn't the risk of lack of precision apply to both floating point and fixed-point? I'd think the reciprocal would need to be exactly represented in the type, or to have more bits than the original type to produce a correct result in the last bit (or two bits?).

Yes, but this was historically okay on an x87 FPU which had more precise representation than the common external formats.


He is talking about the fact that compilers can turn division by a constant into multiplications and bit shifts which are much faster.

Here's an example: https://godbolt.org/z/8vG637fb5

It compiles x/6 to some mad multiplication, bitshift and add.


How do you calculate 1/x without using division?


Not being snarky, at the simplest level by knowing x ahead of time you can change your code from y = n/2 into y = n * 0.5. Not wildly important by itself, but in a loop - maybe… I suspect compilers optimize for that already.


I still don't get it. How do you know that 1/x = 0.abc... without performing the division? I mean in general case, not in something matching binary tricks. Such as 1/3 for example. Unless you mean you somehow know the value of 1/x ahead of time. But where does it come from?


If you know the divisor x ahead of time you can pre-compute 1/x at compile time, so that your actual compiled code never does the division--only your compiler does. Your actual compiled code just does the multiplication by a pre-computed constant (and the compiled code doesn't have to know that that constant was pre-computed as the reciprocal of x).


Ah, thanks for clarifying.


Often, you can amortize a division or reciprocal by calculating it once and then reusing it. Frequently the divisor is dynamic, but reused locally.

For example, if you want to normalize a 3D vector you could do:

    mag = sqrt(x*x + y*y + z*z)
    x /= mag
    y /= mag
    z /= mag
That's three divisions with the same divisor. But you could instead do:

    invMag = 1.0 / sqrt(x*x + y*y + z*z)
    x *= invMag
    y *= invMag
    z *= invMag
There's still a single division (or reciprocal) done here. But you've eliminated at least the other two. (And it's even better if you have an rsqrt function.)


so basically just precompute all the answers you will need and then just look up the values later?


There’s no lookup needed if the value is constant



I enjoyed the economics answer and comment by Mark Booth:

> If generic floating point division were more important to modern CPU's then it might make sense to dedicate enough silicon area to make it single cycle, however most chip makers have obviously decided that they can make better use of that silicon by using those gates for other things. ..

> CPU manufacturers would only dedicate a large swath of silicon to a low latency floating point divide unit if they couldn't use that silicon more effectively to speed up the CPU in other ways. Generally speaking though, having more long latency FDIVs are a more efficient use of silicon than fewer shorter latency FDIVs


But you'd have to cram so many levels logic into that single cycle that you'd have to decrease the clock frequency to meet timing constraints, meaning that code that is not dominated by division would slow down. And a multi-cycle division can often happen in parallel with other operations.


If you simply try to take a design and make it take a single cycle while using the same area, yes. But usually you can trade off area (and power) for latency, by essentially having redundant logic throughout the operation. That's the point being made: the CPU manufacturers don't really consider it worth pushing this tradeoff (which is often quite non-linear).


Remember the Cray-1. It had no division instruction but could perform reciprocals. Division was performed as 1 reciprocal and 2 or 3 multiplications (depending on the precision needed). Thus division was only a little more complex than multiplication. (A reciprocal instruction took about twice the time of a multiply.)

See https://www.ed-thelen.org/comp-hist/CRAY-1-HardRefMan/CRAY-1...

So really, it's tradition that division (which is needed relatively rarely) is not performed quickly in hardware. The old time versus silicon tradeoff strikes again - you can have it quick but it will be expensive, how much do you want it?

Re the original question - (almost) anything can be done in enough silicon. Both multiplication and division could be done as very fast "ROM" lookups, but that would use an enormous amount of silicon.


I find it interesting that the arithmetic operations which are more difficult for school children are also more difficult for computers (addition/subtraction < multiplication < division).


Unless you’re talking about floating point in which case it’s multiplication < addition < division. But your point stands.


Multiplication/division include addition as a special case, at least if you allow adding 1:

a+b = a + a(b/a) = a(1 + (b/a))


Division is a variation of the packing problem, which (without constraints) is NP-hard.


If you set up a logarithmic number system, division would be easy but addition and subtraction would be very difficult; it's worth noting that difficulty is a function of the representation.


I assume for most programming languages there is no speed difference in writing

   y = x/5.0
vs.

   y = 0.2 * x
Is that true?


Yes, but computing an integer into its reciprocal (which is what you're doing in decimal) is the computation that makes it more complex; you're just offloading that from the machine to your head. Try doing this with x/31: show me the operation to convert 31 into its reciprocal in decimal and you'll see where the complexity lies.


Division is just the inverse of multiplication, which is another multiplication.


conceptually yes, computationally no. For starters, there are no tricky rules to observe when multiplying, but when dividing? oh boy, we best not put zeros wherever we please or things break real quick. On top of that, you ever multiplied integers into non-integers? I haven't. Real easy to get really ugly numbers REAL fast if you're not careful when dividing. Add in floating point? forget about it. Wanna multiply to some big number using only small primes? easy! wanna divide down to the smallest prime factors? good luck, buddy. Division is a whole different ball game. Cryptography relies on that fact.


The result you get can be reversed with multiplication, yes. But the operation of division is not simply a reversal of the operation of multiplication. Subtraction is the exact opposite operation of addition. Division is multiplication by a reciprocal.


Entropy.

Kind of like recovering the creamer from your coffee, only with symbols.


Probably because add, sub, and mul are int -> int and div is int -> real. The fundamental operation maps from ℵ0 to ℵ1, which the other operations don't do. Intuitively, the complexity comes from this fact.


1. We're discussing floored integer division here, which outputs an integer, not something more general like a real.

2. Even if we were discussing exact division, the ratio of two integers is a rational. The rationals are countable. More generally, any image of a countable set is countable, and the cartesian product of two countable sets is countable; there is no way you could get something uncountable from something like this.

3. Aleph_1 is not (necessarily) the same thing as the cardinality of the reals, which is 2^(aleph_0). The statement that these are the same is known as the continuum hypothesis and is undecidable in standard mathematics.

...and, also, none of this really relates to the complexity of performing integer division, even ignoring the fact that these statements are false.


It’s not closed on the reals


Computers can't represent reals, and it's closed on IEEE floating point values (with Inf and NaN).



