Intel launched the Pentium processor in 1993. Unfortunately, dividing sometimes gave a slightly wrong answer, the famous FDIV bug. Replacing the faulty chips cost Intel $475 million. I reverse-engineered the circuitry and can explain the bug. 1/9

A die photo of the Pentium processor with the main functional blocks labeled including the caches, instruction fetch and decode, integer execution, and floating point. The image consists of complex patterns of rectangular regions in reddish and brownish colors. The image zooms in on a small part of the floating point unit giving a detail of an adder and PLA circuit.

6 December at 16:48 | Open on oldbytes.space

54 comments

Ken Shirriff

The Pentium uses a division algorithm called SRT. It generates two bits at a time, making division twice as fast. SRT's secret is that quotient digits can be negative: -2, -1, 0, 1, 2. A 2048-entry table gives the digit for a particular divisor and remainder. Unfortunately, 5 entries (red) were wrong. 2/9

A large table with 2048 entries in a 16 by 128 grid. Most of the entries are 0, but there are two sloped bands of 1's, and two sloped bands of 2's. Five cells with 0's are highlighted in red. The axes are labeled in binary fractions. The X axis is from 1 to 2 and the Y axis is from -8 to 8.

Zooming in on the table shows two of the bad entries highlighted in red.
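A minimal sketch of the radix-4 SRT recurrence, showing how negative digits keep the remainder bounded. It assumes the dividend and divisor are normalized to [1, 2) and picks each digit exactly with Python's `fractions.Fraction`. The real Pentium instead reads the digit from the 2048-entry table using a few leading bits of the divisor and partial remainder, so this illustrates the arithmetic, not Intel's circuit. The function name `srt_radix4_divide` is made up for illustration.

```python
from fractions import Fraction


def srt_radix4_divide(x, d, steps=12):
    """Radix-4 SRT division sketch with quotient digits in {-2, -1, 0, 1, 2}.

    x and d are Fractions normalized to [1, 2). Digits are chosen exactly here;
    hardware would read them from a lookup table instead.
    """
    assert 1 <= x < 2 and 1 <= d < 2
    r = x / 4  # pre-scale so that |r| <= (2/3) * d holds from the start
    digits = []
    for _ in range(steps):
        # Nearest integer to 4r/d, clamped to the redundant digit set -2..2.
        q = max(-2, min(2, round(4 * r / d)))
        r = 4 * r - q * d  # next partial remainder
        assert abs(r) <= Fraction(2, 3) * d  # the remainder stays bounded
        digits.append(q)
    # Undo the x/4 pre-scaling: quotient = 4 * sum(q_j * 4**-j).
    quotient = 4 * sum(Fraction(q, 4 ** (j + 1)) for j, q in enumerate(digits))
    return digits, quotient


digits, q = srt_radix4_divide(Fraction(3, 2), Fraction(5, 4))
print(digits[:6])  # [1, 1, -1, 1, -1, 1]
print(float(q))    # close to 1.2
```

Because any digit from -2 to 2 is allowed, the remainder can stay within ±2/3 of the divisor even if a neighboring digit is chosen. That slack is what lets hardware select digits from an approximate remainder with a small table rather than a full-width comparison.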

6 December at 16:52 | Open on oldbytes.space

Ken Shirriff

The table is stored in a circuit called a PLA (Programmable Logic Array). A PLA stores logic equations in two grids of transistors: the "AND plane" and the "OR plane". Logic equations are defined by putting a transistor (or not) at each grid point. This is much more compact than a ROM: 112 rows instead of 2048. 3/9

A closeup die image showing the PLA (Programmable Logic Array). I removed the metal layers to reveal the silicon transistors underneath, which appear as dark brown regions. The image is labeled showing the AND plane transistors in a grid, the OR plane transistors in a smaller grid, and driver circuitry in the middle.
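To make the AND-plane/OR-plane idea concrete, here is a purely logical model of a tiny PLA that ignores the electrical details (in the chip, transistors pull lines to ground). Each AND-plane row is a product term written as a (mask, value) pair, and bits outside the mask are don't-cares, which is why one row can stand in for many ROM words. The two terms, their outputs, and the function name `pla_eval` are invented for illustration.

```python
# Toy PLA model. Each AND-plane row is a product term (mask, value): the row
# fires when the input bits selected by mask equal value; other bits are ignored.
AND_PLANE = [
    (0b1100, 0b0100),  # hypothetical term: inputs 01xx
    (0b1111, 0b1011),  # hypothetical term: exactly 1011
]
# OR plane: which output bits each row drives.
OR_PLANE = [0b01, 0b10]


def pla_eval(inputs, and_plane, or_plane):
    """Return the OR of the output masks of every row whose term matches inputs."""
    out = 0
    for (mask, value), outputs in zip(and_plane, or_plane):
        if (inputs & mask) == value:
            out |= outputs
    return out


print([pla_eval(i, AND_PLANE, OR_PLANE) for i in range(16)])
# [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 2, 0, 0, 0, 0]
```

Here two rows describe all 16 input combinations, where a ROM would store one word per combination. At larger scale, the same effect lets 112 rows stand in for 2048 table entries.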

6 December at 16:56 | Open on oldbytes.space

Andrew Zonenberg

@kenshirriff I kinda want to see if you can patch the fdiv bug with a FIB edit now...

6 December at 16:58 | Open on ioc.exchange

Aaron Sawdey, Ph.D.

@azonenberg @kenshirriff The question I had was, if those 16 entries had been specified correctly in the input to the code that derived the PLA equations ... would that still have fit in the same size (112 rows) of PLA? If not, you'd need more than a FIB to fix this.

6 December at 17:05 | Open on fosstodon.org

Andrew Zonenberg

@acsawdey @kenshirriff Yep, that's exactly the question. How extensive the edits are.

6 December at 17:10 | Open on ioc.exchange

Aaron Sawdey, Ph.D.

@azonenberg @kenshirriff hadn't considered that, yeah maybe it fits but you have to change some large percentage of the logic terms.

6 December at 17:10 | Open on fosstodon.org

Andrew Zonenberg

@acsawdey @kenshirriff The other thing is, you can't FIB a transistor into being.

It's easy (ish) to FIB a metal rom in either direction, and to delete a transistor in an active-programmed ROM.

But you can't make new ones.

6 December at 17:13 | Open on ioc.exchange

Ken Shirriff

@azonenberg I'd have to study the PLA equations carefully to see if zapping a few transistors would expand the "2" region enough to cover the missing cells. Without looking, I'd give it 50-50 odds of working since it depends on the exact bit patterns.

6 December at 17:17 | Open on oldbytes.space

Ken Shirriff

@azonenberg I did some analysis and yes, you could patch the fdiv bug with about 6 FIB edits. By removing transistors, you can expand existing PLA terms to cover the missing table entries. What makes it work is that the unused table entries don't need to be 0, so you have a lot of flexibility. If you needed to change just the bad entries, you'd be stuck.
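The don't-care trick can be seen in a toy model. In the AND plane, a transistor on an input line makes that bit part of a row's product term. Deleting it, which is the kind of change a FIB edit can make, drops the bit from the term, so the row matches more inputs. The bit patterns and helper names below are hypothetical, not taken from the Pentium's PLA.

```python
def covered(mask, value, nbits=4):
    """Inputs (as integers) matched by the product term (mask, value)."""
    return {i for i in range(2 ** nbits) if (i & mask) == value}


def drop_literal(mask, value, bit):
    """Model deleting the AND-plane transistor that checks `bit`: the term stops caring about it."""
    keep = ~(1 << bit)
    return mask & keep, value & keep


# Hypothetical row matching 011x; removing the check on bit 2 widens it to 0x1x.
before = covered(0b1110, 0b0110)
after = covered(*drop_literal(0b1110, 0b0110, 2))
print(sorted(before), sorted(after))  # [6, 7] [2, 3, 6, 7]
```

The edit only helps if the newly covered inputs (2 and 3 here) are entries whose value doesn't matter, or already need this output. Otherwise widening the row would introduce new errors, which is why the unused table entries matter.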

16 December at 21:46 | Open on oldbytes.space

Ken Shirriff

I studied the transistor grids under a microscope and extracted the pattern. From this, I reverse-engineered the lookup table for division. The photos show a small part of the grids. A transistor is formed by a polysilicon line crossing doped silicon. No crossing, no transistor. 4/9

A closeup of the left side of the PLA showing the "OR plane". Horizontal lines of doped silicon and polysilicon are tightly packed. If the silicon is extended to pass under the polysilicon, a transistor is formed. The transistor, if turned on, will pull one of the output lines to ground. In this photo, I removed the metal layers so the output lines are indicated with vertical lines.

A closeup of the AND plane on the right side of the PLA. I partially removed the bottom metal layer, so horizontal metal lines are visible in regions. The circuit is arranged in a grid with transistors at some points but not others.

6 December at 17:00 | Open on oldbytes.space

Ken Shirriff

Smart mathematicians figured out Pentium's division algorithm and the missing entries in 1995 by examining the pattern of errors. But I can confirm it in silicon. Moreover, I see 16 missing entries in the table, not just 5, but 11 of them don't cause errors due to luck. 5/9

6 December at 17:00 | Open on oldbytes.space

Ken Shirriff

Intel claimed the bug was due to an error in a script to download the entries into the PLA. But due to the 16 missing entries, I think they made a mathematical error in constructing the table, misjudging the effect of a 7-bit adder. Here's the adder, just above the PLA. 6/9

A closeup of the adder and test circuitry just above the division PLA. I removed the metal layers to show the silicon and polysilicon. Transistors are visible as dark regions. The circuitry is mostly organized as repeating blocks, one for each bit. At the top are 8 blocks for the 8-bit adder's sum, generate, and propagate signals. (Only 7 bits of the adder are used.) Below, complex carry lookahead circuitry computes carries in parallel to make addition fast. Below that, 8 XOR gates apply the carries. Next, multiplexers select values for testing, fed into an 11-bit shift register (LFSR) and a 13-bit shift register to test the PLA. At the bottom, larger transistors (including bipolar ones) implement drivers to send signals throughout the adder and to the rest of the processor.
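The test circuitry above includes an LFSR, a shift register whose input is an XOR of some of its own bits. As a hedged illustration only, here is an 11-bit Fibonacci LFSR using the standard maximal-length taps for x^11 + x^9 + 1. The Pentium's actual taps and how it feeds the PLA are not described in the thread, so both are assumptions, as are the function names. With maximal-length taps, the register visits all 2047 nonzero states before repeating, which makes it a cheap pattern generator for testing.

```python
def lfsr11_sequence(seed=1):
    """Yield successive states of an 11-bit Fibonacci LFSR.

    Assumed taps: x^11 + x^9 + 1, a standard maximal-length choice; these are
    not known to be the taps the Pentium uses.
    """
    state = seed & 0x7FF
    if state == 0:
        raise ValueError("an all-zero LFSR never leaves zero")
    while True:
        yield state
        bit = (state ^ (state >> 2)) & 1     # XOR of the two tapped bits
        state = (state >> 1) | (bit << 10)   # shift, feeding the new bit in at the top


def lfsr_period(seed=1):
    """Number of steps until the register returns to its starting state."""
    gen = lfsr11_sequence(seed)
    start = next(gen)
    for count, state in enumerate(gen, start=1):
        if state == start:
            return count


print(lfsr_period())  # 2047, every nonzero 11-bit state
```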

6 December at 17:05 | Open on oldbytes.space

Ken Shirriff

You'd expect that Intel fixed the problem by adding the 5 missing entries. Instead, they filled *all* the unused entries with 2's. This made the table easier to store in a PLA, shrinking it by 1/3. The fixed PLA has lots of unused rows at the bottom. 7/9

A closeup of the PLA circuit for the fixed Pentium showing numerous unused rows at the bottom.

6 December at 17:08 | Open on oldbytes.space

Ken Shirriff

Intel said the FDIV bug was unimportant, but the public disagreed. Newspapers and TV discussed the bug. Intel claimed the bug would happen every 27,000 years; IBM said every 24 days and stopped selling Pentiums. Intel gave in and replaced Pentiums at a cost of $475 million. 8/9

Screenshot of a New York Times article in the front of the business section titled "Flaw Undermines Accuracy of Pentium Chips."

6 December at 17:10 | Open on oldbytes.space

Ken Shirriff

I hope to have a blog post with more details on the Pentium FDIV bug soon. Until then, you can read about the Pentium Navajo rug: https://oldbytes.space/@kenshirriff/113063183366751314 9/9

6 December at 17:13 | Open on oldbytes.space

Ted Spence

@kenshirriff fascinating! Love to hear about the PLA space saving techniques.

6 December at 17:17 | Open on indieweb.social

Mark T. Tomczak

@kenshirriff I remember this happening.

There was this odd little movie Intel put together that was advertainment for the whole project; for some reason, I saw it at the local science museum on the big IMAX screen when I was, what, eight?

The plot, hilariously, revolved around aliens trying to disrupt human technological progress by... messing with the chip blueprint before it's fabricated. They're caught out by the hero-kids who save the day.

We always thought it was a wild coincidence that IRL the chip went into production with a significant design flaw analogous to the one in the fiction.

6 December at 20:18 | Open on mastodon.fixermark.com

bijram

@kenshirriff Impressive work!

9 December at 7:01 | Open on graz.social

ElsaPreme

@kenshirriff

But I made that bug happen a bunch of times in Lotus 1-2-3 (I don't think I had Excel at the time) when I was a kid.

So pretty often if you tried! And I remember getting a clock speed upgrade (60 -> 90 MHz iirc) when Intel sent us a new CPU.

6 December at 17:19 | Open on chaosfem.tw

Ken Shirriff

@ElsaPreme The Pentium division bug is deterministic, so you can make it happen all day long if you do a particular division. The lesser-known 386 multiplication bug, on the other hand, was a circuitry issue that depended on the voltage, frequency, and temperature, so it was unpredictable.
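Because the result depends only on the operands, one division pair is enough to test for the bug. The sketch below uses 4195835 / 3145727, the pair widely circulated in 1994. On a correct FPU the residual is exactly zero, while affected Pentiums were reported to return 256. The function name `fdiv_check` is made up for illustration.

```python
def fdiv_check(x=4195835.0, y=3145727.0):
    """Compute x - (x / y) * y in IEEE 754 double precision.

    With a correctly rounded quotient, the product rounds back to x for this
    pair, so the result is 0.0; flawed Pentiums were reported to return 256.
    """
    return x - (x / y) * y


print(fdiv_check())  # 0.0 on hardware with a correct divider
```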

6 December at 17:36 | Open on oldbytes.space

John Carlsen 🇺🇸🇳🇱🇪🇺

@kenshirriff

I was buying computers for the video game developer I worked at. One department used 3D Studio, and we saw the effects of the Pentium defect clearly on the screen.

At first Intel downplayed the problem, saying nobody would be affected. Then they said they'd replace CPUs only for affected customers. Ultimately, everyone could get a replacement.

Fortunately, we were in Austin and I had been buying from Dell, which dispatched someone to our session office to replace our Pentiums.

Years later, I interviewed a job candidate who had been at Intel when the problem occurred. He described that someone simply made a mistake, but the person assigned to check their work neglected to do the job, and the manager above neglected to make sure it was done. Apparently the person who made the honest mistake was spared, but the checker and a line of managers to nearly the top were all fired for dereliction of duty.

6 December at 17:40 | Open on sfba.social

Dr. Juande Santander-Vela

@johnlogic @kenshirriff that was surprisingly just for a big corporation, if true!

6 December at 17:55 | Open on astrodon.social

John Carlsen 🇺🇸🇳🇱🇪🇺

@juandesant @kenshirriff

Yes; I was impressed with this interviewee's story alleging that Intel had had an internal lightning strike.

6 December at 17:59 | Open on sfba.social

Kevin Karhan :verified:

@johnlogic @kenshirriff at least some #consequences were taken then...

6 December at 18:03 | Open on infosec.space

Ken Shirriff

@johnlogic I've talked with a few people who worked on the Pentium and I don't think anyone got fired over it. In "The Pentium Chronicles", the error is blamed on a flawed formal proof that misled the testers into thinking a change was safe.

6 December at 18:55 | Open on oldbytes.space

penguin42

@kenshirriff I can imagine they didn't want to move any other block, given that may have meant re-laying stuff out and having to do some timing checking etc, so expanding the PLA might have been hard

6 December at 17:39 | Open on mastodon.org.uk

James Just James

@kenshirriff Fascinating details, thanks! Is there any chance there could have been an intentional reason to create this bug? For example, would it weaken any encryption algorithms at the time, make the chips cheaper to produce, make money on stock shorts for the uncovered failure or any other scenario that was on purpose?

7 December at 9:46 | Open on mastodon.social

Solarbird :flag_cascadia:

@kenshirriff oooooo that's a neat bit of trivia

Cool work, thanks for sharing :D

7 December at 1:43 | Open on mastodon.murkworks.net

Brett Wilson

@kenshirriff I went to a talk c. 2003 from a sr. Intel person who gave a description I have not heard anywhere else:

There was a die size push and somebody said "I can make the divider smaller and here is a mathematical proof that it's correct." People were so impressed by the proof they didn't notice that it was wrong and didn't care that the space savings were inconsequential (he described it as "taking out Missouri doesn't make the US smaller" 😂).

This meshes nicely with your missing entries.

11 December at 0:08 | Open on sfba.social

John Carlsen 🇺🇸🇳🇱🇪🇺

@kenshirriff this reminds me of some old HP PCBs I once bought at a surplus shop. They had rows and columns of traces on top and bottom, and diodes placed to form a ROM pattern.

On the semiconductor, I would expect that these would be MOS transistors each with one side connected to its base, making each effectively a diode.

6 December at 17:47 | Open on sfba.social

William D. Jones

@kenshirriff I implemented restoring/non-restoring dividers earlier this year for an HDL library.

They are both naive impls; the restoring divider is better on Just About Every metric. But the SRT algorithm is based on the non-restoring division by extending quotient digits from (-1, 1) to (-2, -1, 0, 1, 2), among other choices. So I keep the non-restoring divider around so I can use it as a base for whenever I implement SRT.

See here for derivations if interested: https://smolarith.readthedocs.io/en/latest/impl.html#division
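For comparison with SRT, a textbook non-restoring divider for unsigned integers can be sketched as below. It is a generic illustration with a made-up function name, not the smolarith implementation linked above. Each step subtracts the divisor if the partial remainder is non-negative and adds it if negative, so no separate restore step is needed. SRT generalizes this by allowing more quotient digits than ±1.

```python
def nonrestoring_divide(n, d, bits):
    """Unsigned integer division by the non-restoring method.

    Returns (quotient, remainder) for 0 <= n < 2**bits and d > 0.
    """
    r, q = 0, 0
    for i in range(bits - 1, -1, -1):
        next_bit = (n >> i) & 1
        if r >= 0:
            r = 2 * r + next_bit - d  # previous step succeeded: try subtracting
        else:
            r = 2 * r + next_bit + d  # previous step overshot: add back while shifting
        q = (q << 1) | (1 if r >= 0 else 0)  # quotient bit from the remainder's sign
    if r < 0:
        r += d  # one final correction gives a non-negative remainder
    return q, r


print(nonrestoring_divide(200, 7, 8))  # (28, 4)
```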

6 December at 16:57 | Open on mastodon.social

Robin Green

@cr1901 @kenshirriff
I'd recommend getting the book "Digital Arithmetic" by Ercegovac et al. for a great chapter on high radix division/sqrt that goes through radix-2, radix-4 and radix-8 quotient digit selection table generation using P-D diagrams, fully worked with the quotient in both adder and carry-save form (slightly different tables).

6 December at 21:31 | Open on mastodon.gamedev.place

William D. Jones

@fatlimey @kenshirriff That will have to wait until I re-derive how to do square roots by hand again. For the 4th time.

Would be nice to commit it to memory and my ego refuses to look it up :D.

6 December at 22:15 | Open on mastodon.social

Robin Green

@cr1901 @kenshirriff The way to think about it is the quotient is trying to be the square of your current output, and the difference between that and your input guides selection of your next digit.
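A binary version of that idea is the classic bit-by-bit integer square root. It keeps a running remainder equal to the input minus the square of the root bits chosen so far, and sets the next root bit whenever the remainder can absorb it. This is a generic sketch with a hypothetical function name, not the digit-selection-table method from "Digital Arithmetic".

```python
def isqrt_digit_by_digit(n):
    """Integer square root of n >= 0, deciding one binary digit per step."""
    if n < 0:
        raise ValueError("n must be non-negative")
    one = 1
    while one * 4 <= n:
        one *= 4  # highest power of four not exceeding n
    root = 0
    # Invariant: n holds the original input minus the square of the root bits
    # chosen so far; that remainder decides whether the next bit can be 1.
    while one:
        if n >= root + one:
            n -= root + one
            root = (root >> 1) + one  # accept a 1 digit
        else:
            root >>= 1                # the digit is 0
        one >>= 2
    return root


print(isqrt_digit_by_digit(1000))  # 31
```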

6 December at 23:04 | Open on mastodon.gamedev.place

Val Packett 🧉

@kenshirriff omg flag of argentina !! But with extra green bars 0.o

6 December at 17:48 | Open on social.treehouse.systems

linear cannon

@kenshirriff@oldbytes.space huh, neat! i note that this looks similar to (but not quite the same as) the balanced base representation which is found sometimes in algorithms used in prime number hunting to perform arithmetic on large numbers

base 10 in the balanced representation would instead have digits from -5 to 4, for example. my understanding is that this reduces the likelihood of having to deal with cascading carry operations, which saves time
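A short sketch of converting an integer to balanced base-10 digits. For an even base the digit set cannot be perfectly symmetric. This version assumes the range -5 through 4, though -4 through 5 works equally well. The function name `to_balanced` is hypothetical.

```python
def to_balanced(n, base=10):
    """Digits of n, least significant first, each in [-(base // 2), base - base // 2 - 1]."""
    lo = -(base // 2)  # -5 for base 10, so digits run from -5 to 4
    digits = []
    while n != 0:
        d = (n - lo) % base + lo  # remainder shifted into [lo, lo + base - 1]
        digits.append(d)
        n = (n - d) // base       # exact division: n - d is a multiple of base
    return digits or [0]


print(to_balanced(7))     # [-3, 1], i.e. 1*10 - 3
print(to_balanced(1994))  # [4, -1, 0, 2], i.e. 2000 - 10 + 4
```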

6 December at 18:01 | Open on nya.social

GhostOnTheHalfShell

@kenshirriff

It’s all about the Pentiums baby.

6 December at 20:05 | Open on masto.ai

europlus :autisminf:

@kenshirriff ha! Literal edge cases! 🤣

6 December at 20:50 | Open on social.europlus.zone

Clark Breyman (he/him)

@kenshirriff - How does one acquire the skills to even start doing this? How does one then use those skills to be able to afford to do this :)?

6 December at 17:06 | Open on hachyderm.io

Ken Shirriff

@clark Learning to do this is mostly a matter of patience and reading old VLSI books. Also, you need a metallurgical microscope, which shines light down through the lens. A regular biological microscope won't work since the light comes from below.

6 December at 17:19 | Open on oldbytes.space

DougMerritt (log😅 = 💧log😄)

@kenshirriff @clark
Ken no doubt has a long list of "old VLSI books" to recommend, but undoubtedly one should start with the extremely accessible (IMHO) classic by Mead and Conway:

Mead–Conway VLSI chip design revolution
https://en.wikipedia.org/wiki/Mead%E2%80%93Conway_VLSI_chip_design_revolution

"Introduction to VLSI Systems" by Carver Mead, Lynn Conway; 1979 (or 1980?)
ISBN-10: 0201043580 ISBN-13: 978-0201043587
https://www.amazon.com/Introduction-VLSI-Systems-Carver-Mead/dp/0201043580

Archive.org borrow-able online copy
https://archive.org/details/introductiontovl00mead

I imagine it's on the other usual online book sources as well.

6 December at 17:53 | Open on mathstodon.xyz

curved-ruler

@kenshirriff
Maybe this was the origin of the joke

An Intel and a Motorola chip talk to each other
Motorola: What is 3*6?
Intel: 47
Motorola: Wrong.
Intel: But I was fast, wasn't I?

6 December at 17:44 | Open on mastodon.gamedev.place

Anders Norén

@curved_ruler
How many Intel engineers does it take to change a light bulb? 1.9999999875.
@kenshirriff

6 December at 22:58 | Open on mastodon.nu

Simon Tatham

@kenshirriff at university I had an account on one of the first dual-processor Linux boxes. Of its CPUs, one had that FDIV bug, and one didn't.

That meant a userland process could detect when it was migrated between CPUs, by doing a hardware division operation that the two CPUs would answer differently.

The admin of the machine spent a lot of time in ytalks with Linus Torvalds, because Linux was just starting to develop its SMP support at the time, and he could provide useful statistical data!

6 December at 19:47 | Open on hachyderm.io

Third spruce tree on the left

@simontatham @kenshirriff this whole thread was fascinating but this tidbit is the icing on the cake.

6 December at 20:01 | Open on mas.to

Stylus

@kenshirriff As always, thanks for covering such topics!

Is there any implementation of the buggy fdiv algorithm in a high level language, so that you could run a numeric program and get the "buggy pentium" result rather than the correct result?

When I hear that 5 of 2048 table entries are wrong, that makes me expect that 5 out of 2048 results would be wrong if the table is consulted just once per division (and isn't it consulted multiple times?)

Is there an intuitive way to understand why the proportion of wrong results (about 1 in 8.77 billion double precision operand pairs according to one source) is so small relative to 5/2048?

6 December at 20:07 | Open on social.afront.org

Ken Shirriff

@stylus It's kind of complicated and depends on a very unlikely sequence of carries. The bad cells are almost but not quite impossible to reach. The divider uses a carry save adder, which holds the carry bits instead of propagating them. If these bits are just right, you hit the bug.
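To make the carry-save point concrete, here is a generic sketch. The Pentium's actual widths and bit positions are not modeled, and the helper names are hypothetical. One layer of full adders leaves a sum vector and a carry vector. A small adder that looks only at the high bits of each can come out lower than the true total, because carries that the low bits would eventually produce are never seen.

```python
def carry_save_add(a, b, c):
    """One layer of full adders: returns (sum_bits, carry_bits) with a + b + c == s + cy."""
    s = a ^ b ^ c                            # per-bit sum, no carry propagation
    cy = ((a & b) | (a & c) | (b & c)) << 1  # per-bit carry, shifted to its weight
    return s, cy


def top_estimate(s, cy, drop_bits):
    """Add only the high bits of each vector, as a short, fast adder would."""
    return ((s >> drop_bits) + (cy >> drop_bits)) << drop_bits


s, cy = carry_save_add(0b0111, 0b0001, 0)
print(s + cy, top_estimate(s, cy, 2))  # 8 4: the carry out of the low bits is missed
```

The estimate is never high and is low by less than two units of the lowest kept bit. So which table cell gets consulted depends on exactly how the hidden low-order carries line up, not just on the true remainder.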

6 December at 20:21 | Open on oldbytes.space

John Francis

@kenshirriff I received a nice 486Dx2 at the start of uni, and had the luck of being a poor student during the Pentium's prime years. By the time I was working and had money for a new computer, the Pentium II was out, and I was happily edge-slotting it into a new system.

6 December at 21:06 | Open on cosocial.ca

SpaceLifeForm

@kenshirriff

Intel was just getting started.

https://en.m.wikipedia.org/wiki/Pentium_F00F_bug

6 December at 22:09 | Open on infosec.exchange

Andy Kaplan-Myrth :mapleleaf:

@kenshirriff At a textiles exhibit at the National Gallery of Canada in Ottawa last weekend, I saw this piece by Navajo/Dine artist Marilou Schultz, who was contracted by Intel to weave a replica of the Pentium CPU in 1994.

Full transcript of the info card is in the Alt text.

It was intended for a publicity campaign in which the Silicon Valley company proposed — not for the first time — affinities between Native American aesthetics and advanced technologies.