This article has been updated, see Version 2
Cell Architecture Explained (Version 1) - Updates, Clarifications and Missing Bits
This is to clarify a few points which have appeared in discussions and correct any errors. Where necessary the original has been modified to make the points below clearer or correct them.
Updates
14/05/05: Version 2.0, covering the real chip, should be ready shortly (this version covers the patent). I believe my guess for the PS3 layout (see below) is wrong.
The ISSCC conference is starting in the US on 6th Feb and more information is expected to be made public on the Cell shortly thereafter. This article will be updated once I've read everything.
Some details of the real Cells are out now at Electronics Weekly and EETimes and many other places. Hannibal gets back into form and does a good write up here: Part1 and Part2.
For the uber techies out there, there is a very detailed article at Real World Tech.
I'm updating this as I get more details so some things may change:
Note: Acronyms have changed but they haven't made up their mind what to!
APU = SPE or SPU.
PU = PPE.
Multiple simultaneous operating systems including Linux.
Apple may be involved, as VMX (aka AltiVec) was an Apple initiative. With this the PPE core should boot OS X with little modification, though it will need some optimisation.
Looks like some architectural changes have been made since the patent application:
To cache or not to cache
I'm going to go out on a limb with a theory here (so, no change there then...):
I think the designers have been very clever and made the local storage look like a cache to the other SPEs. By doing this they have made SPE to SPE transfers possible without going to RAM, which will speed up stream processing considerably. I think they have implemented cache logic but it is separate from the local storage.
IF they've done this they will have got a local storage and the functionality of a cache without compromising either.
I (don't) have the POWER
The PowerPC PPE is not a POWER5 derivative. This came as something of a surprise, but not 100%, as I had suggested in Part 1 that IBM could always have something different up their sleeves. The PPE is said to be a dual issue in-order PowerPC with VMX. This is the opposite of other modern PowerPCs, which are now complex out-of-order CPUs.
A simple design fits perfectly within the design of the rest of the Cell. The simplicity of this core means it will run some stuff slowly but some tasks scale with clock speed and these will run faster. Simpler VMX stuff should also go faster.
Interestingly the PPE is said to be "dual issue", this would appear to lend some credence to the rumour about the XBox 2 a while back which mentioned it having dual issue PowerPC CPUs. Further rumours indicate the PPE core may be derived from the forthcoming PowerPC 300 series.
Clarifications
Arstechnica ran a piece critiquing this article. Given its content and tone I felt I should write a rebuttal. I have not seen this sort of criticism from anywhere else so I assume this is a combination of a misunderstanding (see "Technical Analysis" below) and someone having a really bad day.
I'm too Enthusiastic
I am very enthusiastic about this technology and I believe I have good reason to be.
I wrote this because, after the recent announcements were made, it was clear that not many people knew how the Cell operated. I had already read the original Sony patent once and had a good idea of how it worked. So, with nothing better to do, I decided to dig it out and write an explanation.
Whilst going over it again I was quite amazed at the aggressiveness of the design. I had recently been reading about Cray's designs and this appears to be very heavily influenced by them. This is important because Cray was very successful in constantly making the world's fastest computers; other companies threw hundreds of people and millions of dollars at trying to beat him but they could never do it. If this influence has carried through to the final designs they are going to be monsters. The reported 4.6GHz clock figure indicates to me that this has happened.
Even if the Cell can only attain a fraction of its potential processing power it's still going to be the fastest CPU on the market by a very wide margin.
Cell Will Demolish The PC On The First Day
One point that has been made to me is that claiming the Cell will demolish the PC on the first day is inviting criticism.
This claim is based on the fact that the similar technology in GPUs is already demolishing general purpose CPUs by margins of hundreds of percent. This is not just theory; it is happening in real life applications today. GPUs are used in a variety of ways to accelerate different research applications, and they can be used for a lot more than graphics.
The above claim may seem shocking to many PC users but if you are aware of what GPUs are capable of you'll not find it surprising at all. Your graphics chip can already outgun your CPU by a massive margin. You could consider Cell a more general purpose version of the same technology.
Technical Analysis?
A few people have been making comments about the quality of this "Analysis". If you read the first 4 paragraphs at the beginning of part 1 it's pretty clear that this is not a technical analysis.
Numbers
All numbers are either direct quotes or are derived from the patent or the 4.6GHz figure. These are the only figures I have to go on so I couldn't use anything else. This should also be clear given what I wrote at the beginning of part 1.
Cell Equivalent To 5 Dual Opterons
Slashdot liked the article so much they posted it twice. Unfortunately, the first time they used the dodgiest quote in the article...
This comparison is based on the Cell reaching its theoretical maximum computing power. We will not know IF this is possible until the chip becomes available and even then it will need a "perfect" application which can use all the APUs at full power simultaneously.
SETI in 5 Minutes
This is something of a "calculated guess", again based on the theoretical maximum computing power being achieved (in 4 Cells). It makes a lot of assumptions which may be in error so I will not be the least bit surprised if this figure is miles out...
I've added the actual calculation to part 5.
I've thought about this a bit more and wondered how to optimise an FFT (which SETI uses extensively) for a Cell. It looks to me like Cell will be a very good architecture for running FFT based applications.
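To show why I think an FFT suits this sort of design, here is a minimal sketch of a textbook radix-2 FFT in Python. It is not SETI's code and it is not Cell code. The point is only that each step splits the work into two halves which don't depend on each other. The idea that each half could be handed to a different APU is my assumption, and here the halves simply run one after the other.

```python
import cmath


def fft(samples):
    """Recursive radix-2 decimation-in-time FFT.

    The number of samples must be a power of two.
    """
    n = len(samples)
    if n == 1:
        return list(samples)

    # The even- and odd-indexed halves are independent sub-problems.
    # In this sketch they run sequentially; handing each half to a
    # separate APU is an assumption, not something shown here.
    even = fft(samples[0::2])
    odd = fft(samples[1::2])

    out = [0j] * n
    for k in range(n // 2):
        # Combine step: one multiply and two adds per output pair,
        # the kind of regular arithmetic vector units are good at.
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out
```

For example, `fft([1, 1, 1, 1])` puts all the energy in the first output bin and leaves the other three at zero.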
The Cell Compiler Will Magically Make The Code Parallel
This is not true and I didn't say this. You still have to break up problems into software Cells. I still cannot figure out why people think I said this.
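To make the point concrete, here is a rough sketch in Python of what "breaking up the problem yourself" looks like. Everything in it is hypothetical: the function names are mine, the 8 workers just stand in for APUs, and Python threads are only used to show the structure (they will not give a real speed up for this kind of work). No compiler does the splitting; the programmer writes it.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, chunk_count):
    """Cut the data into chunk_count nearly equal, independent pieces."""
    size, extra = divmod(len(data), chunk_count)
    chunks, start = [], 0
    for i in range(chunk_count):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The small, self-contained job each worker runs on its own chunk."""
    return sum(x * x for x in chunk)


def run_in_pieces(data, worker_count=8):
    """Split by hand, run each piece on a worker, then merge by hand."""
    chunks = split_into_chunks(data, worker_count)
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)
```

Note that the splitting and the final merge are both explicit code written by the programmer, which is exactly the work that does not happen by magic.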
IBM processor
This article was based on the original Sony patent. There have been other patents but I have not looked at them in detail; I understand there may be caching mechanisms to allow transfers between the different processing units.
The CPUs have been designed by STI (Sony / Toshiba / IBM). The first CPUs should be available from IBM this year.
Maximum Memory Size
While the patent specifies 8 bank controllers with 8 MB each it does not specify the maximum memory of an individual Cell. So 64 MB is an "ideal" rather than an absolute. I suspect the Cells in the PS3 may be hardwired for 64 MB but not others.
Memory Protection And Paging
The APUs will work on specific chunks of RAM set aside specifically. They can either access it or not, nothing more. There is no paging system for the APUs as there is no need for it.
The PU will likely have a normal memory protection system with paging as it'll be dealing with more complex operations (i.e. running an OS).
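The APU rule above ("they can either access it or not, nothing more") can be sketched as a simple yes/no check. This is a toy model in Python under my own assumptions: the `Sandbox` class, its fields and the addresses are all made up for illustration and are not taken from the patent.

```python
class Sandbox:
    """A block of RAM set aside for one APU (hypothetical model)."""

    def __init__(self, owner_apu, base, size):
        self.owner_apu = owner_apu  # the only APU allowed in
        self.base = base            # first address of the block
        self.size = size            # length of the block in bytes

    def allows(self, apu_id, address):
        """Return True or False: no page tables, no swapping, no faults."""
        in_range = self.base <= address < self.base + self.size
        return apu_id == self.owner_apu and in_range


# Usage: APU 0 owns 256 bytes starting at 0x1000.
box = Sandbox(owner_apu=0, base=0x1000, size=0x100)
inside = box.allows(0, 0x1000)    # owner, address inside the block
intruder = box.allows(1, 0x1000)  # a different APU
outside = box.allows(0, 0x1100)   # owner, but past the end of the block
```

Compare this with a paged system, where an access can also trigger a translation or a page being brought in from disk. Here the answer is only ever yes or no.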
Replacing PCs / Emulation
A single Cell will not emulate a top end PC at the same level of performance. What it will be able to emulate is a low end PC. This is useful because many heavy processing tasks can be off-loaded to the APUs, so you won't need a high end PC. This will not happen for every possible task but will be a pretty common scenario.
The emulation itself may not work on the APUs unless the instructions are vectorisable or are already in vector form (MMX, SSE).
You Program These Things In Assembly?
No, you don't program virtual machines directly either, the compiler handles that for you. This will work the same way, but for real speed you'll probably want assembler level control.
IBM already has Cell compilers in beta.
This is just hype like the PS2 chip
From reading the patent on the PS2 chip/s, the vector processing engines in the PS2 are powerful but also somewhat specialised. The Cell contains general purpose vector units, that is, anything which can be vectorised will run. According to one reader the PS2 processors could have been useful for general purpose vector processing but Sony never made them available. The difference this time is the Cell will not just be used in PS3s.
Untested, Radical New Architecture
Actually it's not. Look up the Cray X-MP or Y-MP and you'll see another parallel vector processor. The difference is they've put this on a single chip with a massively higher clock rate. What is new is the way the programs (apulets) are distributed and the fact we've never seen anything as powerful as this on the desktop.
Missing Bits
Cell Vs Java
I explained the differences in part 2 but I thought of this afterwards (of course):
What would happen if you were to combine them?
The result would be an application which would work on (almost) every Platform / OS and take full advantage of the Cells.
Going further, this could lead to applications in which the GUI and logic is written in a scripting language and the heavy processing in a more complex language. Such applications will be quick and easy to develop and automatically cross platform.
Disappointed!
This thing gets views by lots and lots of 133t haX0Rz and nobody notices the hidden diagram! I was worried about bandwidth so I removed this from the article but it was still sitting in the same dir for anyone to pick up. In the event surprisingly little bandwidth was used so, here it is:
14/05/05: I'm *not* expecting to see 4 Cells in the PS3. 1 is much more likely, 2 if you're lucky.
Guess at the PS3 architecture diagram
Introduction and Index
Part 1: Inside The Cell
Part 2: Again Inside The Cell
Part 3: Cellular Computing
Part 4: Cell Vs the PC
Part 5: Conclusion and References
Updated: 25/01/05 (1)
© Nicholas Blachford 2005.