Inside Intel's first product: the 3101 RAM chip held just 64 bits
This article looks inside the 3101 chip and explains how it works. I received two 3101 chips from Evan Wasserman and used a microscope to take photos of the tiny silicon die inside.4 Around the outside of the die, sixteen black bond wires connect pads on the die to the chip's external pins. The die itself consists of silicon circuitry connected by a metal layer on top, which appears golden in the photo. The thick metal lines through the middle of the chip power the chip. The silicon circuitry has a grayish-purple color, but it largely covered by the metal layer. Most of the chip contains a repeated pattern: this is the 16x4 array of storage cells. In the upper left corner of the chip, the digits "3101" in metal identify the chip, but "Intel" is not to be found.
Die photo of the Intel 3101 64-bit RAM chip. Click for a larger image.
Overview of the chip
The 3101 chip is controlled through its 16 external pins. To select one of the chip's 16 words of memory, the address in binary is fed into the chip through the four address pins (A0 to A3). Memory is written by providing the 4-bit value on the data input pins (D1 to D4). Four data output pins (O1 to O4) are used to read memory; these pins are inverted as indicated by the overbar. The chip has two control inputs. The chip select pin (CS) enables or disables the chip. The write enable pin (WE) selects between reading or writing the memory. The chip is powered with 5 volts across the Vcc and ground pins.
The diagram below shows how the key components of the 3101 are arranged on the die. The RAM storage cells are arranged as 16 rows of 4 bits. Each row stores a word, with bits D1 and D2 on the left and D3 and D4 on the right. The address decode logic in the middle selects which row of storage is active, based on the address signals coming from the address drivers at the top. At the bottom, the read/write drivers provide the interface between the storage cells and the data in and out pins.
Block diagram of the 3101 RAM chip.
Transistors
Transistors are the key components in a chip. The 3101 uses NPN bipolar transistors, different from the MOS transistors used in modern memory chips. The diagram below shows one of the transistors in the 3101 as it appears on the die. The slightly different tints in the silicon indicate regions that have been doped to form N and P type silicon with different semiconductor properties. The cross-section diagram illustrates the internal structure of the transistor. On top (black) are the metal contacts for the collector (C), emitter (E), and base (B). Underneath, the silicon has been doped to form the N and P regions that make up the transistor.
A key innovation of the 3101 was using Schottky transistors (details), which made the 3101 almost twice as fast as other memory chips.5 In the cross section, note that the base's metal contact touches both the P and N regions. You might think this shorts the two regions together, but instead a Schottky diode is formed where the metal contacts the N layer.6
The structure of an NPN Schottky transistor inside the Intel 3101 chip.
The 3101 also used many multiple-emitter transistors. While a multiple-emitter transistors may seem strange, they are common in bipolar integrated circuits, especially TTL logic chips. A multiple-emitter transistor simply has several emitter regions embedded in the base region. The die photo below shows one of these transistors with the collector on the left, followed by the base and two emitters.
A multiple-emitter transistor from the Intel 3101 chip.
Driving the data output pins requires larger, high-current transistors. The image below shows one of these transistors. The central rectangle is the base, surrounded by the C-shaped emitter in the middle and the large collector on the outside. Eight of these high-current transistors are also used to drive the internal address select lines.
For the high-current output, the Intel 3101 chip uses larger transistors.
Diodes
While examining the 3101 chip, I was surprised by the large number of diodes on the chip. Eventually I figured out that the chip used DTL (diode-transistor logic) for most of its logic rather than TTL (transistor-transistor logic) that I was expecting. The diagram below shows one of the diodes on the chip. I believe the chip builds diodes using the standard technique of connecting an NPN transistor as a diode.
Resistors
The die photo below shows several resistors on the 3101 die. The long, narrow snaking regions of p-type silicon provide resistance. Resistors in integrated circuits are inconveniently large, but are heavily used in the 3101 for pull-up and pull-down resistors. At the right is a square resistor, which has low resistance because it is very wide.7 It is used to route a signal under the metal layer, rather than functioning a resistor per se.
Resistors inside the 3101 chip.
The static RAM cell
Now that I've explained the individual components of the chip, I'll explain how the circuitry is wired together for storage. The diagram below shows the cell for one bit of storage with the circuit diagram overlaid. Each cell consists of two multi-emitter transistors (outlined in red) and two resistors (at the top). The horizontal and vertical wiring connects cells together. This circuit forms a static RAM cell, basically a latch that can be in one of two states, storing one data bit.
Before explaining how this storage cell works, I'll explain a simpler latch circuit, below. This circuit has two transistors cross-connected so if one transistor is on, it forces the other off. In the diagram, the left transistor is on, which keeps the right transistor off, which keeps the left transistor on. Thus, the circuit will remain in this stable configuration. The opposite state—with the left transistor off and the right transistor on—is also stable. Thus, the latch has two stable configurations, allowing it to hold a 0 or a 1.
To make this circuit usable—so the bit can be read or modified—more complex transistors with two emitters are used. One emitter is used to select which cell to read or write, while the other emitter is used for the read or write data. This yields the schematic below, which matches the storage cell die photo diagram above.
Multiple storage cells are combined into a grid to form the memory memory. One word of memory consists of cells in the same row that share select lines. All the cells in a column store the same bit position; their data lines are tied together. (The bias line provides a voltage level to all cells in the memory.8 )
Note that unlike the simplified cell, the circuit above doesn't have an explicit ground connection; to be powered, it requires a low input on either the select or data/bias lines. There are three cases of interest:
- Unselected: If the negative row select line is low, current flows out through the row select line. The data and bias lines are unaffected by this cell.
- Read: If the negative row select line is higher than the data and bias lines, current will flow out the data line if the left transistor is on, and out the bias line if the right transistor is on. Thus, the state of the cell can be read by examining the current on the data line.
- Write: If the negative row select line is higher and the data and bias lines have significantly different voltages, the transistor on the lower side will switch on, forcing the cell into a particular state. This allows a 0 or 1 to be written to the cell.
Thus, by carefully manipulating the voltages on the select lines, data lines and the bias line, one row of memory can be read or written, while the other cells hold their current value without influencing the data line. The storage cell and the associated read/write circuitry are essentially analog circuits rather than digital since the select, data, and bias voltages must be carefully controlled voltages rather than logic levels.
The address decode logic
The address decode circuitry determines which row of memory cells is selected by the address lines.11 The interesting thing about this circuitry is that you can easily see how it works just by looking at the die photo. The address driver circuitry sends the four address signals along with their complements on eight metal traces through the chip. Each storage row has a four-emitter transistor. In each row you can see four black dots, which are the connections between emitters and address lines. A row will be selected if all the emitter inputs are high.9 A dot on an address line (e.g. A0) will "match" a 1, while a dot on the complemented address line (e.g. A0) will match a 0, so each row matches a unique four-bit address. In the die photo below, you can see the decoding logic counting down in binary for rows 15 down to 11;10 the remainder of the circuit follows the same pattern.
Some systems that used the 3101
The 64-bit storage capacity of the 3101 was too small for a system's main memory, but the chip had a role in many minicomputers. For example, the Burroughs D Machine was a military computer (and the source of the chips I examined). It used core memory for its main storage, but a board full of 3101 chips provided high-speed storage for its microcode. The Xerox Alto used four 3101 chips to provide 16 high-speed registers for the CPU, while the main memory used slower DRAM chips. Interdata used 3101 chips in many of its 16- and 32-bit minicomputers up until the 1980s.12
The 3101 was also used in smaller systems. The Diablo 8233 terminal used them as RAM.13 The Datapoint 2200 was a "programmable terminal" that held its processor stack in fast 3101 chips rather than the slow main memory which was built from Intel 1405 shift registers.
How I created the die photos
To get the die photos, I started with two chips that I received thanks to Evan Wasserman and John Culver. The pins on the chips had been crushed in the mail, but this didn't affect the die photos. The chips had two different lot numbers that indicate they were manufactured a few months apart. Strangely, the metal lids on the chips were different sizes and the dies were slightly different. For more information, see the CPU Shack writeup of the 3101.
Popping the metal lid off the chips was easy—just a tap with a hammer and chisel. This revealed the die inside.
Bitcoin mining on a vintage Xerox Alto: very slow at 1.5 hashes/second
Bitcoin mining on a vintage Xerox Alto computer.
The Alto was a revolutionary computer designed at Xerox PARC in 1973 to investigate personal computing. It introduced high-resolution bitmapped displays, the GUI, Ethernet and laser printers to the world, among other things. In the photo above, the Alto computer is in the lower cabinet. The black box is the 2.5 megabyte disk drive. The Alto's unusual portrait display and an early optical mouse are on top.
How Bitcoin mining works
Bitcoin, a digital currency that can be transmitted across the Internet, has attracted a lot of attention lately. The Bitcoin system can be thought of as a ledger that keeps track of who owns which bitcoins, and allows them to be transferred from one person to another. The revolutionary feature of Bitcoin is there's no central machine or authority keeping track of things. Instead, the "blockchain" is stored across thousands of machines on the Internet, and the system works with nobody in charge.
To ensure everyone agrees on which transactions are valid, Bitcoin uses a process called mining—about every 10 minutes a block of outstanding transactions is mined, which makes the block "official". Bitcoin mining is designed to take an insanely huge amount of computational effort to mine a block, so nobody can take over the mining process. Miners compete against each other, generating trillions and trillions of "hashes" until someone finds a lucky one that succeeds in mining a block. It's hard to visualize just how difficult the hashing process is: finding a valid hash is less likely than finding a single grain of sand out of all the sand on Earth.
Bitcoin mining is based on cryptography, with a "hash function" that converts a block into an essentially random hash value. If the hash starts with 17 zeros,1 the block is successfully mined and is sent into the Bitcoin network. Most of the time the hash isn't successful, so the miner will modify the block slightly and try again, over and over billions of times. About every 10 minutes someone will successfully mine a block, and the process starts over. It's kind of like a lottery, where miners keep trying until someone "wins".
As a side-effect, mining adds new bitcoins to the system. For each block mined, miners currently get 12.5 new bitcoins (currently worth about 30,000ドル) as well as fees, which encourages miners to do the hard work of mining blocks. With the possibility of receiving 30,000ドル every 10 minutes, miners invest in datacenters full of specialized mining hardware using huge amounts of electricity2 .
The diagram above shows what actually goes into a block that is mined. The yellow part is the block header (which gets hashed), and it is followed by the transactions that go into the block. Each block contains the hash of the previous block, causing all the blocks to be linked together forming the blockchain. You can see that for the block above, the hash is successful because it starts with lots of zeros: 0000000000000000e067a478024addfecdc93628978aa52d91fabd4292982a50. The "Merkle root" is a hash of all the transactions that go into the block; this ensures that none of the mined transactions can be changed. The nonce is an arbitrary number; each attempt at mining the block changes the nonce.
To summarize the mining process: you collect new Bitcoin transactions and create a header (as in the diagram above). You do the cryptographic hash of the block. If by some incredible chance the result starts with 17 zeros you send the block into the Bitcoin network and "win" 30,000ドル in bitcoin. Otherwise, you change the nonce and try again. Probably none of the nonce values will work, so you change something else in the header and start over. If someone else succeeds in mining the block, you start over with the new previous block hash and new transactions.
I've simplified a lot of details above. For in-depth information on Bitcoin and mining, see my articles Bitcoins the hard way and Bitcoin mining the hard way.
The SHA-256 hash algorithm used by Bitcoin
Next, I'll discuss the hash function used in Bitcoin, which is based on a standard cryptographic hash function called SHA-256.3 The SHA-256 algorithm is so simple you can literally do it by hand, but it manages to scramble the data entirely unpredictably. The SHA-256 hash algorithm takes input blocks of 512 bits (i.e. 64 bytes), combines the data cryptographically, and generates a 256-bit (32 byte) output.
The SHA-256 algorithm consists of a simple round repeated 64 times. The diagram below shows one round, which takes eight 4-byte inputs, A through H, performs a few operations, and generates new values for A through H. As can be seen from the diagram above, only A and E are changed in a round, while the others are just shifted over. Even so, after 64 rounds the input data will be completely scrambled, generating the unpredictable hash output.
The SHA-256 algorithm is pretty simple, about a page of pseudocode and can be easily implemented on a computer, even one as old as the Alto, using simple arithmetic and logic operations.5
The dark blue boxes mix up the values in non-linear ways that are hard to analyze cryptographically. (If you could figure out a mathematical shortcut to generate successful hashes, you could take over Bitcoin mining.) The Ch "choose" box chooses bits from F or G, based on the value of input E. The Σ "sum" boxes rotate the bits of A (or E) to form three rotated versions, and then sum them together. The Ma "majority" box looks at the bits in each position of A, B, and C, and selects 0 or 1, whichever value is in the majority. The red boxes perform 32-bit addition, generating new values for A and E. The input data enters the algorithm through the Wt values. The Kt values are constants defined for each round.4
Implementing SHA-256 in BCPL
I implemented SHA-256 in BCPL, a programming language that was a precursor to C. It's a lot like C with syntax changes, except the only type is 16-bit words. My SHA-256 code is in sha256.bcpl. The snippet below (the choose function) will give you an idea of what BCPL looks like. Each value is two words; BCPL does array access with !1 instead of [1]. Like C++, comments are indicated with a double slash. Unlike C, BCPL uses words for xor and not.
// ch := (e and f) xor ((not e) and g) ch!0 = (e!0 & f!0) xor ((not e!0) & g!0) ch!1 = (e!1 & f!1) xor ((not e!1) & g!1)
The mining is done in bitcoin.bcpl: it creates a Bitcoin header (from hardcoded values), substitutes the nonce, and calls the SHA-256 code to hash the header twice. One interesting feature of the code is the structure definition for the Bitcoin header in BCPL (below), similar to a C struct. It defines a two word field for version, 16 words for prevHash, and so forth; compare with the Bitcoin structure diagram earlier.
Interestingly, ^1,16 indicates an array with indices from 1 to 16 inclusive.
BCPL is not 0-indexed or 1-indexed, but lets you start array indices at arbitrary values.7
structure HEADER: [ version^1,2 word prevHash^1,16 word merkleRoot^1,16 word timestamp^1,2 word bits^1,2 word nonce^1,2 word ]
The line shows how structures are accessed in BCPL; it initializes one word of the header, using the slightly strange BCPL syntax.
>>HEADER sort of casts the header variable to the HEADER structure described earlier. Then .prevhash^1 accesses the first word of the prevhash field.
Also note that #13627 is an octal value; BCPL inconveniently doesn't include hex constants.6
header>>HEADER.prevHash^1 = #13627
The screenshot below shows the output as the program runs. The number on the left is each nonce in sequence as it is tested. The long hex number on the right is the resulting hash value. Each nonce results in a totally different hash value, due to the cryptographic hash algorithm.
On the Alto screen, each line shows a nonce value and the resulting hash.
Performance
The Alto can hash about 1.5 blocks per second, which is exceedingly slow by Bitcoin standards. At this speed, mining a single block on the Alto would take about 5000 times the age of the universe The electricity would cost about 2x10^16 dollars. And you'd get 12.5 bitcoins (₿12.5) worth about 30,000ドル. Obviously, mining Bitcoin on a Xerox Alto isn't a profitable venture.
In comparison, a USB stick miner performs 3.6 billion hashes per second. The Alto cost about 12,000ドル to manufacture in 1973 (about 60,000ドル in current dollars), while the stick miner costs 200ドル. The Alto used hundreds of watts, while the stick minter uses about 4 watts. The enormous difference in performance is due to huge increase in computer speed since the 1970s described by Moore's law as well as the giant speed gain from custom Bitcoin mining hardware.
The Alto wasn't a particularly fast machine, performing about 400,000 instructions per second. The Alto's instruction set lacks many of the operations you'd find on a modern processor. For instance, the SHA-256 algorithm makes heavy use of Boolean operations including exclusive-OR and OR. These are pretty basic instructions that you'd find on even something as primitive as the 6502, but the Alto doesn't have them. Instead, these operations are implemented with an inefficient subroutine call that does a sequence of operations with the same effect.
In addition, SHA-256 heavily uses bit shift and rotate operations. Modern processors typically have a "barrel shifter" that lets you shift by as many bits as you want in one step. The Alto's shift instructions, on the other hand, only shift a single bit. Thus, to shift by, say 10 bits, the Alto code calls a subroutine that performs 10 separate shift instructions. The result is a shift operation is much slower than you might expect.
You can see the Alto's arithmetic-logic board below. The Alto didn't use a microprocessor but instead built a CPU from simple TTL chips. You can see that even providing single-bit shifts required 8 separate chips—it's not surprising that the Alto doesn't do more complex shift operations.
I should point out that I'm not trying to write the best possible mining code for the Alto, and there are plenty of optimizations that one could do.8 For instance, writing the code in microcode would speed it up considerably, but Alto microcode is very hard to understand, let along write. My blog post on generating the Mandelbrot set on the Alto discussed Alto performance optimizations in detail, so I won't say more about optimization here.
Conclusion
The screenshot below shows a successful hash, ending in a bunch of zeros9 . (I also display an image to show off the Alto's high-resolution bitmapped display.) Since the Alto would take well beyond the lifetime of the universe to find a successful hash, you might wonder how I found this. For this demonstration I simply used as input a block that had been successfully mined in the past, specifically block #286819. Thus, the algorithm succeeded quickly, but since it was an old block, I didn't make any money off it.
My code is on github if you want to look at BCPL code or try it out.
Notes and references
-
At current difficulty, about 1 in 3x10^21 hashes will be successful at mining a block; a valid hash must start with approximately 17 zeros. The mining difficulty changes periodically to keep the time between blocks at approximately 10 minutes. As mining hardware gets faster, the difficulty factor is automatically updated to make mining more difficult so miners can never really catch up. ↩
-
A while back I estimated that Bitcoin mining uses about as much electricity as the entire country of Cambodia. One paper puts mining's energy consumption comparable to Ireland's electricity usage. ↩
-
Bitcoin uses "double SHA-256" which simply consists of applying the SHA-256 function twice. ↩
-
The K constants used in the SHA-256 algorithm are provided by the NSA. You might worry that the NSA cleverly designed these constants to provide a backdoor. However, to prove that these are just arbitrary random constants, the NSA simply used the cube roots of the first 64 primes. ↩
-
While SHA-256 is easy to implement, that's not the case for all the cryptography used by Bitcoin. To create a Bitcoin transaction, the transaction must be signed with elliptic curve cryptography. This requires 256-bit modular arithmetic, which is about as hard as it sounds. Even a simple implementation is 1000 lines of C. I decided that porting this to BCPL was too much effort for me. ↩
-
I wrote a simple Python script to convert the many 32-bit hexadecimal constants used by SHA-256 to 16-bit octal constants. It's a good thing that hex has almost entirely replaced octal, as it is much better. ↩
-
Some people claim that BCPL arrays are 0-based. However, arrays in BCPL structures can start at an arbitrary value. I start with 1 because that's what the Alto code typically does. (This caused me no end of confusion until I realized the indices weren't zero-based.) ↩
-
The code could be made 33% faster by taking advantage of an interaction between SHA-256 and the Bitcoin header structure. Bitcoin performs a SHA-256 hash twice on the 80-byte header. Since SHA-256 only handles 64 bytes at a time, the first hash requires two SHA-256 cycles. The second hash takes a third SHA-256 cycle. However, when mining, the only thing that changes from one attempt to the next is the nonce value in the header. It happens to be in the second half of the header, which means the SHA-256 cycle performed on the first half of the header can be done once and then reused. Thus, the double SHA-256 hash can be done with two SHA-256 cycles instead of three. Bitcoin mining usually performs this optimization, but I left it out of the code to make the code less confusing. ↩
-
You might wonder why Bitcoin successful hashes start with a bunch of zeros, while the displayed hash ends with a bunch of zeros. The reason is that Bitcoin reverses the byte order of the SHA-256 output. If you look closely, you'll see that the displayed hash matches the hash in the Bitcoin block diagram if you reverse bytes (i.e. pairs of hex digits). ↩