You'd think every computer should be able to divide two numbers, but early microprocessors didn't have division instructions. The Intel 8086 (1978) was one of the first with division. Let's look at how it implemented division and why division is so hard.

8 Apr 2023 at 15:43 | Open on oldbytes.space

11 comments

Ken Shirriff

Computers can divide by performing long division, just like grade school except using binary. This needs a subtract-and-shift loop. For early microprocessors, you'd implement the loop in assembly code. The 8086 implemented the loop in microcode, which was much faster and more convenient.

8 Apr 2023 at 15:44 | Open on oldbytes.space
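
To make the subtract-and-shift loop concrete, here is a minimal Python sketch of binary long division. It is an illustration of the general technique, not the 8086's microcode or any particular processor's routine; the function name `divide_unsigned` and the default 16-bit width are assumptions chosen for the example.

```python
def divide_unsigned(dividend: int, divisor: int, bits: int = 16) -> tuple[int, int]:
    """Binary long division: produces one quotient bit per loop iteration.

    Hypothetical helper for illustration; returns (quotient, remainder).
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if not 0 <= dividend < (1 << bits) or not 0 < divisor < (1 << bits):
        raise ValueError("operands must be unsigned and fit in the given width")

    quotient = 0
    remainder = 0
    # Walk the dividend from its most significant bit down to bit 0.
    for i in range(bits - 1, -1, -1):
        # Shift: bring the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        # Subtract: if the divisor fits, keep the difference and record a 1 bit.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder
```

For example, `divide_unsigned(100, 7)` returns `(14, 2)`. The loop always runs once per bit regardless of the operands, which is why this style of division costs so many cycles.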

Ken Shirriff

Many CPUs use microcode internally: a level of code even lower than machine code. Microcode specifies each step of a machine instruction. Each 8086 micro-instruction is 21 bits long, performing a data move and an action in parallel. Microcode is low-level & hard to understand.

8 Apr 2023 at 15:45 | Open on oldbytes.space

Ken Shirriff

Here's the microcode for the division loop inside the 8086. It does a lot of subtracts and bit rotates (rotate left through carry, RCL). An internal 4-bit counter loops through the bits. The photo shows the counter on the die.

8 Apr 2023 at 15:46 | Open on oldbytes.space
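
A rough register-level model may help show how rotates through carry and subtracts combine into a divide. The Python sketch below is not a transcription of the 8086 microcode: the names `rcl16` and `divide_32_by_16`, the `high`/`low` register pair, and the exact ordering of steps are assumptions for illustration. It divides a 32-bit dividend by a 16-bit divisor in 16 iterations, the number of steps a 4-bit counter can count.

```python
MASK16 = 0xFFFF

def rcl16(value: int, carry: int) -> tuple[int, int]:
    """Rotate a 16-bit value left through carry; returns (new_value, new_carry)."""
    new_carry = (value >> 15) & 1
    return ((value << 1) & MASK16) | carry, new_carry

def divide_32_by_16(high: int, low: int, divisor: int) -> tuple[int, int]:
    """Divide the 32-bit value high:low by a 16-bit divisor.

    Hypothetical model: the remainder builds up in `high` while quotient
    bits are rotated into `low`. Returns (quotient, remainder).
    """
    if divisor == 0 or high >= divisor:
        # The quotient would not fit in 16 bits (or the divisor is zero).
        raise ZeroDivisionError("divide error")

    carry = 0
    for _ in range(16):
        # Rotate the previous quotient bit into `low`; its top bit moves to carry.
        low, carry = rcl16(low, carry)
        # Rotate that dividend bit into `high`; carry now holds bit 16 of the
        # 17-bit partial remainder.
        high, carry = rcl16(high, carry)
        # Trial subtract: the divisor fits if the 17-bit remainder is large enough.
        if carry or high >= divisor:
            high = (high - divisor) & MASK16
            carry = 1  # quotient bit for this step
        else:
            carry = 0
    # One last rotate brings the final quotient bit into `low`.
    low, carry = rcl16(low, carry)
    return low, high
```

Sharing one register pair for the dividend, quotient, and remainder is the design point of this sketch: each rotate frees a bit position in `low` just as a new quotient bit becomes available.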

Ken Shirriff

Dividing two signed (positive or negative) integers uses more microcode. This microcode makes the divisor and dividend positive, but keeps track of the final sign in an internal flag called F1. After dividing, the quotient's sign is adjusted according to F1.

8 Apr 2023 at 15:46 | Open on oldbytes.space
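
The sign-handling approach described above can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the 8086's actual routine: `divide_signed` is a hypothetical name, the `negate_quotient` variable only plays the role the post gives to the F1 flag, and `divmod` on the magnitudes stands in for the unsigned division loop. Giving the remainder the dividend's sign is x86 behavior that the post does not mention.

```python
def divide_signed(dividend: int, divisor: int) -> tuple[int, int]:
    """Signed division built on top of an unsigned divide.

    Hypothetical helper for illustration; returns (quotient, remainder)
    with the quotient truncated toward zero.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")

    # Remember the result's sign before discarding the operand signs
    # (the role the post describes for the internal F1 flag).
    negate_quotient = (dividend < 0) != (divisor < 0)
    # Assumption beyond the post: the remainder follows the dividend's sign.
    negate_remainder = dividend < 0

    # Unsigned divide on the magnitudes (stand-in for the shift-and-subtract loop).
    quotient, remainder = divmod(abs(dividend), abs(divisor))

    # Fix up the signs afterward.
    if negate_quotient:
        quotient = -quotient
    if negate_remainder:
        remainder = -remainder
    return quotient, remainder
```

For example, `divide_signed(-7, 2)` returns `(-3, -1)`, whereas Python's built-in `divmod(-7, 2)` floors the quotient and returns `(-4, 1)`.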

Ken Shirriff

Later chips use a faster algorithm called SRT. It uses a table to estimate quotient bits two or four at a time. Intel's Pentium chip (1993) missed a few table entries so it occasionally got the answer wrong, the famous FDIV bug. Replacing the bad chips cost Intel $475 million.

8 Apr 2023 at 15:47 | Open on oldbytes.space

Ken Shirriff

Division on the 8086 was very, very slow: up to 184 clock cycles due to all the looping. Modern Intel processors are much faster, but division is still slow compared to addition or multiplication. While you can now multiply every clock cycle, divisions need 6-10 clock cycles.

8 Apr 2023 at 15:47 | Open on oldbytes.space

Ken Shirriff

This 8086 die photo shows the main functional blocks. The Arithmetic/Logic Unit (ALU) performs the subtractions and shifts. Microcode is in the ROM at the right. I removed the metal and polysilicon layers for this image so you can see the silicon transistors underneath.

8 Apr 2023 at 15:48 | Open on oldbytes.space

Graham Spookyland🎃/Polynomial

@kenshirriff hah, I never knew it was 1.3337

8 Apr 2023 at 18:55 | Open on chaos.social

Urethramancer🐀

@kenshirriff $475 million is either a firing offence, or a very expensive way to learn to be careful.

9 Apr 2023 at 0:25 | Open on toot.cat

Minoru Saba

@kenshirriff Thanks for jogging distant memories of microcode programming the instruction set for a prototype minicomputer for a subsidiary of a British conglomerate now long gone. Vaguely remember that implementing multiplication was relatively easy; trying to reduce the microcode steps in the division loop to make it go faster was hard.

10 Apr 2023 at 1:08 | Open on toad.social

Dave Bittner

@kenshirriff I have a vague recollection of the 6809 having an advantage over the 6502 when it came to being able to divide more quickly. Interesting stuff!

8 Apr 2023 at 16:06 | Open on hachyderm.io