Ken Shirriff

Intel launched the Pentium processor in 1993. Unfortunately, dividing sometimes gave a slightly wrong answer, the famous FDIV bug. Replacing the faulty chips cost Intel $475 million. I reverse-engineered the circuitry and can explain the bug. 1/9

A die photo of the Pentium processor with the main functional blocks labeled including the caches, instruction fetch and decode, integer execution, and floating point. The image consists of complex patterns of rectangular regions in reddish and brownish colors. The image zooms in on a small part of the floating point unit giving a detail of an adder and PLA circuit.
Ken Shirriff

The Pentium uses a division algorithm called SRT. It generates two bits at a time, making division twice as fast as a one-bit-at-a-time divider. SRT's secret is that quotient digits can be negative: -2, -1, 0, 1, 2. A 2048-entry table gives the digit for a particular divisor and remainder. Unfortunately, 5 entries (red) were wrong. 2/9

A large table with 2048 entries in a 16 by 128 grid. Most of the entries are 0, but there are two sloped bands of 1's, and two sloped bands of 2's. Five cells with 0's are highlighted in red. The axes are labeled in binary fractions. The X axis is from 1 to 2 and the Y axis is from -8 to 8.
Zooming in on the table shows two of the bad entries highlighted in red.
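To make the negative-digit idea concrete, here is a minimal Python sketch of radix-4 division with quotient digits in {-2, -1, 0, 1, 2}, in the spirit of SRT. It is not the Pentium's implementation: the function name srt_radix4_divide is hypothetical, it picks each digit from the exact partial remainder (using Fraction) instead of looking it up in a 2048-entry table, and it assumes both operands are normalized to [1, 2). Each step produces one radix-4 digit, i.e. two quotient bits.

```python
from fractions import Fraction


def srt_radix4_divide(x, d, steps=12):
    """Radix-4, SRT-style division sketch with digits in {-2, -1, 0, 1, 2}.

    Assumes 1 <= x < 2 and 1 <= d < 2 (normalized mantissas). Digits are
    chosen with exact arithmetic here; hardware uses a lookup table instead.
    """
    x, d = Fraction(x), Fraction(d)
    r = x / 4  # scaled so the bound |r| <= 2d/3 holds before the first step
    digits = []
    for _ in range(steps):
        # Redundancy bound for the digit set {-2..2}; the recurrence keeps it true.
        assert abs(r) <= Fraction(2, 3) * d
        # Choose the digit closest to 4r/d, clamped to the allowed range.
        q = max(-2, min(2, round(4 * r / d)))
        r = 4 * r - q * d  # partial remainder recurrence
        digits.append(q)
    # Digit i is worth 4**-(i+1); the factor 4 undoes the initial x/4 scaling.
    quotient = 4 * sum(Fraction(q, 4 ** (i + 1)) for i, q in enumerate(digits))
    return digits, quotient


digits, q = srt_radix4_divide(Fraction(3, 2), Fraction(5, 4))
print(digits, float(q))
```

Because digits can be negative, the quotient needs one final subtraction (positive digits minus negative digits) to become ordinary binary. The slack in the digit set is what lets real hardware pick a digit from only a few leading bits of the remainder and divisor.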
Ken Shirriff

The table is stored in a circuit called a PLA (Programmable Logic Array). A PLA stores logic equations in two grids of transistors: the "AND plane" and the "OR plane". Logic equations are defined by putting a transistor (or not) at each grid point. This is much more compact than a ROM: 112 rows instead of 2048. 3/9

A closeup die image showing the PLA (Programmable Logic Array). I removed the metal layers to reveal the silicon transistors underneath, which appear as dark brown regions. The image is labeled showing the AND plane transistors in a grid, the OR plane transistors in a smaller grid, and driver circuitry in the middle.
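A PLA computes each output as a sum of products: the AND plane forms product terms (rows) from the inputs, and the OR plane ORs selected rows together. The Python sketch below models only that logic; the names are hypothetical, and it ignores the physical details (physical PLAs are often built as NOR-NOR arrays with inverted signals). The example implements 3-input majority with 3 rows, whereas a ROM needs one entry per input combination (8 here). Rows that cover many entries at once are why 112 rows can encode a 2048-entry table.

```python
from itertools import product
from typing import Dict, List, Sequence

# One AND-plane row: maps input index -> required bit value.
# Inputs absent from the dict are not tested (no transistor at that grid point).
Term = Dict[int, int]


def eval_pla(inputs: Sequence[int], and_plane: List[Term],
             or_plane: List[List[int]]) -> List[int]:
    # AND plane: a row is active only if every input it tests matches.
    rows = [all(inputs[i] == v for i, v in term.items()) for term in and_plane]
    # OR plane: each output is the OR of the rows wired to it.
    return [int(any(rows[r] for r in wired)) for wired in or_plane]


# Majority of three inputs: three product terms instead of an 8-entry ROM.
MAJ_AND = [{0: 1, 1: 1}, {0: 1, 2: 1}, {1: 1, 2: 1}]
MAJ_OR = [[0, 1, 2]]

for bits in product((0, 1), repeat=3):
    print(bits, eval_pla(bits, MAJ_AND, MAJ_OR))
```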
Andrew Zonenberg

@kenshirriff I kinda want to see if you can patch the fdiv bug with a FIB edit now...

Aaron Sawdey, Ph.D.

@azonenberg @kenshirriff The question I had was, if those 16 entries had been specified correctly in the input to the code that derived the PLA equations ... would that still have fit in the same size (112 rows) of PLA? If not, you'd need more than a FIB to fix this.

Andrew Zonenberg

@acsawdey @kenshirriff Yep, that's exactly the question. How extensive the edits are.

Aaron Sawdey, Ph.D.

@azonenberg @kenshirriff hadn't considered that, yeah maybe it fits but you have to change some large percentage of the logic terms.

Andrew Zonenberg

@acsawdey @kenshirriff The other thing is, you can't FIB a transistor into being.

It's easy (ish) to FIB a metal rom in either direction, and to delete a transistor in an active-programmed ROM.

But you can't make new ones.

Ken Shirriff

@azonenberg I'd have to study the PLA equations carefully to see if zapping a few transistors would expand the "2" region enough to cover the missing cells. Without looking, I'd give it 50-50 odds of working since it depends on the exact bit patterns.

Ken Shirriff

@azonenberg I did some analysis and yes, you could patch the fdiv bug with about 6 FIB edits. By removing transistors, you can expand existing PLA terms to cover the missing table entries. What makes it work is that the unused table entries don't need to be 0, so you have a lot of flexibility. If you needed to change just the bad entries, you'd be stuck.
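Why removing transistors can only help here: deleting an AND-plane transistor stops a row from testing one input, so the product term becomes true for more input combinations, never fewer. The minimal Python sketch below (hypothetical helper names, logical model only) shows a term doubling its coverage when one literal is dropped. Since the change can only add cells, it is safe only when the extra cells are unused table entries.

```python
from itertools import product
from typing import Dict, Set, Tuple

# One AND-plane row: input index -> required bit; untested inputs are omitted.
Term = Dict[int, int]


def covered(term: Term, n_inputs: int) -> Set[Tuple[int, ...]]:
    """All input combinations for which the product term is true."""
    return {bits for bits in product((0, 1), repeat=n_inputs)
            if all(bits[i] == v for i, v in term.items())}


def remove_literal(term: Term, index: int) -> Term:
    """Model deleting one AND-plane transistor: the row stops testing that input."""
    return {i: v for i, v in term.items() if i != index}


term = {0: 1, 1: 0, 2: 1}
wider = remove_literal(term, 1)
print(sorted(covered(term, 3)))   # [(1, 0, 1)]
print(sorted(covered(wider, 3)))  # [(1, 0, 1), (1, 1, 1)]
```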

Ken Shirriff

I studied the transistor grids under a microscope and extracted the pattern. From this, I reverse-engineered the lookup table for division. The photos show a small part of the grids. A transistor is formed by a polysilicon line crossing doped silicon. No crossing, no transistor. 4/9

A closeup of the left side of the PLA showing the "OR plane". Horizontal lines of doped silicon and polysilicon are tightly packed. If the silicon is extended to pass under the polysilicon, a transistor is formed. The transistor, if turned on, will pull one of the output lines to ground. In this photo, I removed the metal layers so the output lines are indicated with vertical lines.
A closeup of the AND plane on the right side of the PLA. I partially removed the bottom metal layer, so horizontal metal lines are visible in regions. The circuit is arranged in a grid with transistors at some points but not others.
Ken Shirriff

Smart mathematicians figured out the Pentium's division algorithm and the missing entries in 1995 by examining the pattern of errors. But I can confirm it in silicon. Moreover, I see 16 missing entries in the table, not just 5, but by luck 11 of them don't cause errors. 5/9

Ken Shirriff

Intel claimed the bug was due to an error in a script to download the entries into the PLA. But due to the 16 missing entries, I think they made a mathematical error in constructing the table, misjudging the effect of a 7-bit adder. Here's the adder, just above the PLA. 6/9

A closeup of the adder and test circuitry just above the division PLA. I removed the metal layers to show the silicon and polysilicon. Transistors are visible as dark regions. The circuitry is mostly organized as repeating blocks, one for each bit. At the top are 8 blocks for the 8-bit adder's sum, generate, and propagate signals. (Only 7 bits of the adder are used.) Below, complex carry lookahead circuitry computes carries in parallel to make addition fast. Below that, 8 XOR gates apply the carries. Next, multiplexers select values for testing, fed into an 11-bit shift register (LFSR) and a 13-bit shift register to test the PLA. At the bottom, larger transistors (including bipolar ones) implement drivers to send signals throughout the adder and to the rest of the processor.
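The generate / propagate / carry-lookahead structure can be sketched in Python as follows. This is a generic textbook carry-lookahead adder, not a transcription of the Pentium's circuit, and the function names are hypothetical. Each carry is written as a flat OR of AND terms over the generate (g) and propagate (p) bits, so no carry waits on another; the final XOR with p plays the role of the XOR gates that apply the carries.

```python
def carry_into(i, g, p, cin):
    """Carry into bit i, expressed only in terms of g, p, and cin.

    c_i = g[i-1] | p[i-1]&g[i-2] | ... | p[i-1]&...&p[0]&cin
    Nothing here depends on another carry, so hardware can evaluate every
    carry in parallel instead of rippling from bit to bit.
    """
    c = cin
    for k in range(i):
        c &= p[k]  # cin propagated through bits 0..i-1
    for j in range(i):
        term = g[j]  # carry generated at bit j ...
        for k in range(j + 1, i):
            term &= p[k]  # ... and propagated through bits j+1..i-1
        c |= term
    return c


def cla_add(a, b, width=8, cin=0):
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate
    total = 0
    for i in range(width):
        total |= (p[i] ^ carry_into(i, g, p, cin)) << i  # sum bit = p XOR carry
    return total, carry_into(width, g, p, cin)  # (sum, carry out)


print(cla_add(200, 100))  # -> (44, 1), since 300 = 256 + 44
```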
Ken Shirriff

You'd expect that Intel fixed the problem by adding the 5 missing entries. Instead, they filled *all* the unused entries with 2's. This made the table easier to store in a PLA, shrinking it by 1/3. The fixed PLA has lots of unused rows at the bottom. 7/9

A closeup of the PLA circuit for the fixed Pentium showing numerous unused rows at the bottom.
Ken Shirriff

Intel said the FDIV bug was unimportant, but the public disagreed. Newspapers and TV discussed the bug. Intel claimed the bug would happen every 27,000 years; IBM said every 24 days and stopped selling Pentiums. Intel gave in and replaced Pentiums at a cost of $475 million. 8/9

Screenshot of a New York Times article in the front of the business section titled "Flaw Undermines Accuracy of Pentium Chips."
Ken Shirriff

I hope to have a blog post with more details on the Pentium FDIV bug soon. Until then, you can read about the Pentium Navajo rug: oldbytes.space/@kenshirriff/11
9/9

Ted Spence

@kenshirriff fascinating! Love to hear about the PLA space saving techniques.

Mark T. Tomczak

@kenshirriff I remember this happening.

There was this odd little movie Intel put together that was advertainment for the whole project; for some reason, I saw it at the local science museum on the big IMAX screen when I was, what, eight?

The plot, hilariously, revolved around aliens trying to disrupt human technological progress by... Messing with the chip blueprint before it's fabricated. They're caught out by the hero-kids who save the day.

We always thought it was a wild coincidence that IRL the chip went into production with a significant design flaw analogous to the one in the fiction.

Jo

@kenshirriff

But I made that bug happen a bunch of times in Lotus 1-2-3 (I don't think I had Excel at the time) when I was a kid.

So pretty often if you tried! And I remember getting a clockspeed upgrade (60 -> 90 MHz iirc) when Intel sent us a new CPU.

Ken Shirriff

@ElsaPreme The Pentium division bug is deterministic, so you can make it happen all day long if you do a particular division. The lesser-known 386 multiplication bug, on the other hand, was a circuitry issue that depended on the voltage, frequency, and temperature, so it was unpredictable.
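Because the bug is deterministic, a single division makes a reliable test. The best-known example divides 4195835 by 3145727: the correct quotient is about 1.333820449136, while flawed Pentiums were widely reported to return about 1.333739068902, so the classic check x - (x/y)*y came out as 256 instead of 0. A minimal Python sketch of that check (the function names are hypothetical); on any correct IEEE 754 floating-point unit it should report no bug.

```python
def fdiv_residual(x=4195835.0, y=3145727.0):
    """Classic FDIV check: zero (or tiny) on a correct FPU.

    Flawed Pentiums were widely reported to give 256 for these operands.
    """
    return x - (x / y) * y


def has_fdiv_bug():
    return abs(fdiv_residual()) > 1.0


print(4195835.0 / 3145727.0, fdiv_residual(), has_fdiv_bug())
```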

John Carlsen 🇺🇸🇳🇱🇪🇺

@kenshirriff

I was buying computers for the video game developer I worked at. One department used 3D Studio, and we saw the effects of the Pentium defect clearly on the screen.

At first Intel downplayed the problem, saying nobody would be affected. Then they said they'd replace CPUs only for affected customers. Ultimately, everyone could get a replacement.

Fortunately, we were in Austin and I had been buying from Dell, which dispatched someone to our session office to replace our Pentiums.

Years later, I interviewed a job candidate who had been at Intel when the problem occurred. He described that someone simply made a mistake, but the person assigned to check their work neglected to do the job, and the manager above neglected to make sure it was done. Apparently the person who made the honest mistake was spared, but the checker and a line of managers to nearly the top were all fired for dereliction of duty.

Dr. Juande Santander-Vela

@johnlogic @kenshirriff that was surprisingly just for a big corporation, if true!

John Carlsen 🇺🇸🇳🇱🇪🇺

@juandesant @kenshirriff

Yes; I was impressed with this interviewee's story alleging that Intel had had an internal lightning strike.

Ken Shirriff

@johnlogic I've talked with a few people who worked on the Pentium and I don't think anyone got fired over it. In "The Pentium Chronicles", the error is blamed on a flawed formal proof that misled the testers into thinking a change was safe.

penguin42

@kenshirriff I can imagine they didn't want to move any other block, given that may have meant re-laying stuff out and having to do some timing checking etc., so expanding the PLA might have been hard

James Just James

@kenshirriff Fascinating details, thanks! Is there any chance there could have been an intentional reason to create this bug? For example, would it weaken any encryption algorithms at the time, make the chips cheaper to produce, make money on stock shorts for the uncovered failure or any other scenario that was on purpose?

Solarbird :flag_cascadia:

@kenshirriff oooooo that's a neat bit of trivia

Cool work, thanks for sharing :D

Brett Wilson

@kenshirriff I went to a talk c. 2003 from a sr. Intel person who gave a description I have not heard anywhere else:

There was a die size push and somebody said "I can make the divider smaller and here is a mathematical proof that it's correct." People were so impressed by the proof they didn't notice that it was wrong and didn't care that the space savings were inconsequential (he described it as "taking out Missouri doesn't make the US smaller" 😂).

This meshes nicely with your missing entries.

John Carlsen 🇺🇸🇳🇱🇪🇺

@kenshirriff this reminds me of some old HP PCBs I once bought at a surplus shop. They had rows and columns of traces on top and bottom, and diodes placed to form a ROM pattern.

On the semiconductor, I would expect that these would be MOS transistors each with one side connected to its base, making each effectively a diode.

William D. Jones

@kenshirriff I implemented restoring/non-restoring dividers earlier this year for an HDL library.

They are both naive impls; the restoring divider is better on Just About Every metric. But the SRT algorithm is based on the non-restoring division by extending quotient digits from (-1, 1) to (-2, -1, 0, 1, 2), among other choices. So I keep the non-restoring divider around so I can use it as a base for whenever I implement SRT.

See here for derivations if interested: smolarith.readthedocs.io/en/la
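For readers following the restoring / non-restoring / SRT progression, here is a minimal Python sketch of integer non-restoring division (not code from the linked library; the name is hypothetical). Every step emits a quotient digit of +1 or -1, never 0, and a single correction at the end fixes a negative remainder. SRT widens that digit set, e.g. to {-2, -1, 0, 1, 2}, which allows the digit to be chosen from an estimate of the remainder.

```python
def nonrestoring_divide(n, d, width):
    """Non-restoring division of 0 <= n < 2**width by d > 0.

    Returns (quotient, remainder, digits) where digits are +1/-1, MSB first.
    """
    assert d > 0 and 0 <= n < (1 << width)
    ds = d << width  # divisor aligned above the dividend bits
    r = n
    digits = []
    for _ in range(width):
        if r >= 0:
            digits.append(1)
            r = 2 * r - ds  # subtract when the remainder is non-negative
        else:
            digits.append(-1)
            r = 2 * r + ds  # add back instead of "restoring"
    # Convert the +1/-1 digits into an ordinary integer.
    q = sum(digit << (width - 1 - i) for i, digit in enumerate(digits))
    if r < 0:  # single final correction step
        q -= 1
        r += ds
    return q, r >> width, digits


print(nonrestoring_divide(7, 2, 3)[:2])  # -> (3, 1)
```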

Robin Green

@cr1901 @kenshirriff
I'd recommend getting the book "Digital Arithmetic" by Ercegovac et al. for a great chapter on high radix division/sqrt that goes through radix-2, radix-4 and radix-8 quotient digit selection table generation using P-D diagrams, fully worked with the quotient in both adder and carry-save form (slightly different tables).

William D. Jones

@fatlimey @kenshirriff That will have to wait until I re-derive how to do square roots by hand again. For the 4th time.

Would be nice to commit it to memory and my ego refuses to look it up :D.

Robin Green

@cr1901 @kenshirriff The way to think about it is the quotient is trying to be the square of your current output, and the difference between that and your input guides selection of your next digit.
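That idea can be made concrete with a bit-by-bit integer square root, sketched below in Python (hypothetical name; a simple radix-2 version, not the higher-radix table-driven method in the book). It tracks rem = n - root**2 and keeps each trial bit only if adding it would not push the square past n, much like remainder-guided digit selection in division.

```python
def isqrt_digit_recurrence(n):
    """Integer square root of n >= 0, one result bit per step.

    Returns (root, rem) with root**2 <= n < (root + 1)**2 and rem = n - root**2.
    """
    root, rem = 0, n
    bit = 1 << (n.bit_length() // 2)  # at least as large as the root's top bit
    while bit:
        cost = (2 * root + bit) * bit  # (root + bit)**2 - root**2
        if rem >= cost:  # the remainder decides whether this bit is set
            rem -= cost
            root += bit
        bit >>= 1
    return root, rem


print(isqrt_digit_recurrence(1000))  # -> (31, 39), since 31**2 = 961
```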

Val Packett 🧉

@kenshirriff omg flag of argentina !! But with extra green bars 0.o

linear cannon

@kenshirriff@oldbytes.space huh, neat! i note that this looks similar to (but not quite the same as) the balanced base representation which is found sometimes in algorithms used in prime number hunting to perform arithmetic on large numbers

base 10 in the balanced representation would instead have digits from -5 to 4, for example. my understanding is that this reduces the likelihood of having to deal with cascading carry operations, which saves time
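A small Python sketch of converting an integer to balanced base 10 (digits -5 to 4) and back; the function names are hypothetical, and the same code with base=3 gives balanced ternary (digits -1 to 1).

```python
def to_balanced(n, base=10):
    """Digits of n, least significant first, each in [-(base // 2), base - base // 2 - 1]."""
    lo = -(base // 2)
    digits = []
    while n != 0:
        d = (n - lo) % base + lo  # remainder shifted into the balanced range
        digits.append(d)
        n = (n - d) // base       # exact, since n - d is a multiple of base
    return digits or [0]


def from_balanced(digits, base=10):
    return sum(d * base ** i for i, d in enumerate(digits))


print(to_balanced(1995))  # -> [-5, 0, 0, 2], i.e. 2000 - 5
```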

Clark Breyman (he/him)

@kenshirriff - How does one acquire the skills to even start doing this? How does one then use those skills to be able to afford to do this :)?

Ken Shirriff

@clark Learning to do this is mostly a matter of patience and reading old VLSI books. Also, you need a metallurgical microscope, which shines light down through the lens. A regular biological microscope won't work since the light comes from below.

DougMerritt (log😅 = 💧log😄)

@kenshirriff @clark
Ken no doubt has a long list of "old VLSI books" to recommend, but undoubtedly one should start with the extremely accessible (IMHO) classic by Mead and Conway:

Mead–Conway VLSI chip design revolution
en.wikipedia.org/wiki/Mead%E2%

"Introduction to VLSI Systems" by Carver Mead, Lynn Conway; 1979 (or 1980?)
ISBN-10: 0201043580 ISBN-13: 978-0201043587
amazon.com/Introduction-VLSI-S

Archive.org borrow-able online copy
archive.org/details/introducti

I imagine it's on the other usual online book sources as well.

curved-ruler

@kenshirriff
Maybe this was the origin of the joke

An Intel and a Motorola chip talk to each other
Motorola: What is 3*6?
Intel: 47
Motorola: Wrong.
Intel: But I was fast, wasn't I?

Anders Norén

@curved_ruler
How many Intel engineers does it take to change a light bulb? 1.9999999875.
@kenshirriff

Simon Tatham

@kenshirriff at university I had an account on one of the first dual-processor Linux boxes. Of its CPUs, one had that FDIV bug, and one didn't.

That meant a userland process could detect when it was migrated between CPUs, by doing a hardware division operation that the two CPUs would answer differently.

The admin of the machine spent a lot of time in ytalks with Linus Torvalds, because Linux was just starting to develop its SMP support at the time, and he could provide useful statistical data!

Third spruce tree on the left

@simontatham @kenshirriff this whole thread was fascinating but this tidbit is the icing on the cake.

Stylus

@kenshirriff As always, thanks for covering such topics!

Is there any implementation of the buggy fdiv algorithm in a high level language, so that you could run a numeric program and get the "buggy pentium" result rather than the correct result?

When I hear that 5 of 2048 table entries are wrong, that makes me expect that 5 out of 2048 results would be wrong if the table is consulted just once per division (and isn't it consulted multiple times?)

Is there an intuitive way to understand why the proportion of wrong results (about 1 in 8.77 billion double precision operand pairs according to one source) is so small relative to 5/2048?

Ken Shirriff

@stylus It's kind of complicated and depends on a very unlikely sequence of carries. The bad cells are almost but not quite impossible to reach. The divider uses a carry save adder, which holds the carry bits instead of propagating them. If these bits are just right, you hit the bug.
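A minimal Python sketch of the general carry-save mechanism (hypothetical names; the exact bits the Pentium's 7-bit adder combines are not modeled). A carry-save adder keeps a sum word and a carry word whose total is the true value. Adding only the top bits of each word is fast, but it ignores carries out of the low bits, so the estimate is either exact or one unit low, and only particular bit patterns produce the low case.

```python
def carry_save_add(a, b, c):
    """Add three non-negative numbers without propagating carries.

    Returns (s, carry) with a + b + c == s + carry: s holds the per-bit sums,
    carry holds the per-bit carries, shifted into the next position.
    """
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def top_bits_estimate(s, carry, k):
    """Add only the bits above position k of each word (a short, fast add)."""
    return (s >> k) + (carry >> k)


s, carry = carry_save_add(0b1011_0110, 0b0110_1101, 0b0011_1011)
exact_top = (s + carry) >> 4
estimate = top_bits_estimate(s, carry, 4)
print(exact_top, estimate)  # the estimate is exact or one unit low
```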

John Francis

@kenshirriff I received a nice 486DX2 at the start of uni, and had the luck of being a poor student during the Pentium's prime years. By the time I was working and had money for a new computer, the Pentium II was out, and I was happily edge-slotting it into a new system.

Andy Kaplan-Myrth :mapleleaf:

@kenshirriff At a textiles exhibit at the National Gallery of Canada in Ottawa last weekend, I saw this piece by Navajo/Dine artist Marilou Schultz, who was contracted by Intel to weave a replica of the Pentium CPU in 1994.

Full transcript of the info card is in the Alt text.

It was intended for a publicity campaign in which the Silicon Valley company proposed — not for the first time — affinities between Native American aesthetics and advanced technologies.

Photo of a woven wallhanging, mostly shades of browns and ochres, made of seemingly random vertical and horizontal lines, with a dotted border.

The info card beside it said:

MARILOU SCHULTZ

Navajo/Diné, b. 1954

Replica of a Chip 1994
wool

In 1994, the Intel Corporation commissioned Schultz to weave a replica of their Pentium microprocessor using the traditional techniques she learned as a child on the Navajo/Diné reservation. It was intended for a publicity campaign in which the Silicon Valley company proposed — not for the first time — affinities between Native American aesthetics and advanced technologies. Specifically, Intel aligned the expertise of the skilled textile makers with the dexterity of the Indigenous female workforce it planned to hire to assemble circuit boards in a factory newly constructed on Navajo/Diné land.

American Indian Science and Engineering Society
Chip35

@kenshirriff I think I wrote an ML program com file using debug to test for that in less than 10 bytes. The windows patch couldn't stop it.

Moise

@kenshirriff i dont understand half of it but its rlly impressive :blobcatspace:
