Re: [PLUG] Hardware question
On Monday, June 9, 2003, at 08:07 AM, Paul wrote:
> Forge wrote:
>> Doesn't a dual CPU system require twice the amount of RAM?
> No, not at all.

Actually, the answer is: it depends on the workload.

> Really? So two CPUs share the same RAM? If both are working under
> the same load, will they split their memory use in half? In other
> words, if you have 512MB of RAM, will each CPU only have 256MB to
> work with? (Not that that isn't a good amount of RAM.)
There are, historically, two primary schemes for multiple-CPU
utilization: tightly coupled and loosely coupled. In a loosely coupled
environment, a process is assigned to a single CPU and stays there for
its entire life. In a tightly coupled scenario, each INSTRUCTION can be
executed on the next available CPU. Note that this is a hardware
feature and has nothing to do with threading, which is a software
feature.
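
To make the loosely coupled idea concrete: on Linux you can pin a
process to one CPU yourself with sched_setaffinity(). This is just a
minimal, Linux-specific sketch of the concept, not something the
FreeBSD or Tru64 schedulers of the day exposed this way:

/* Pin the calling process to CPU 0: the "loosely coupled" model,
 * where a process lives out its life on a single CPU.
 * Linux-specific sketch; compile with: gcc -o pin pin.c
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    cpu_set_t mask;

    CPU_ZERO(&mask);            /* start with an empty CPU set */
    CPU_SET(0, &mask);          /* allow only CPU 0            */

    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        exit(EXIT_FAILURE);
    }

    printf("Bound to CPU 0 for the rest of this process's life.\n");
    /* ... do the real work here; the scheduler won't migrate us ... */
    return 0;
}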
In FreeBSD, as Apple used it back in the Rhapsody days, portions of the
kernel code were always executed on one CPU. So if you needed to do
I/O, it didn't matter which CPU was executing your program; it had to
wait for the CPU that handled I/O to come free before the I/O could get
done. I've been told this deficiency has been "fixed," but I don't
know. (It's one of the issues in the microkernel vs. monolithic kernel
debate.)
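
If you want a feel for why that hurts, here's a toy sketch (my own
illustration, not FreeBSD's actual code): pretend every I/O request
has to grab one global lock. No matter how many CPUs the box has, the
I/O still gets done one request at a time.

/* Toy illustration of a "giant" I/O lock -- NOT real kernel code.
 * Threads can compute in parallel on any CPU, but all the pretend
 * I/O funnels through one mutex, so it is serialized anyway.
 * Compile with: gcc -o biglock biglock.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t io_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    long id = (long)arg;

    /* CPU-bound work could run here, genuinely in parallel ... */

    /* ... but the "I/O" queues up behind the single global lock. */
    pthread_mutex_lock(&io_lock);
    printf("thread %ld doing its I/O, one at a time\n", id);
    sleep(1);                      /* pretend the I/O takes a while */
    pthread_mutex_unlock(&io_lock);

    return NULL;
}

int main(void)
{
    pthread_t t[4];
    long i;

    for (i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return 0;
}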
Resources are shared based upon the compiler's capabilities and the
OS's multi-threading features. Today, if you "roll your own," the
optimization for your hardware setup pretty much happens automagically.
If you use pre-compiled binaries, "ya pays yer money and ya takes yer
chances." Some apps can be "optimized" much more than others by simple
virtue of what it is that they do. Others cannot be optimized at all.
And with others, it's simply not worth the effort to optimize them.
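
One concrete example of what I mean by "roll your own": a loop like
the one below is exactly the kind of code that benefits. Build it
yourself with aggressive, machine-specific flags (say, -O3 plus the
right -march/-mcpu for your box) and the compiler can unroll it,
schedule it for your pipeline, maybe use SIMD; a generic pre-built
binary gets the lowest-common-denominator version. The loop and the
flags are just my illustration:

/* A dense numeric loop: the kind of code that machine-specific
 * compilation actually helps.  An app that mostly sits waiting on
 * the disk or the network gains almost nothing from the same flags,
 * which is why some apps "optimize" well and others don't.
 */
void saxpy(int n, float a, const float *x, float *y)
{
    int i;

    for (i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}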
All this goes back to what made a given "super-computer" super. The
reality was, each of the so-called super-computers was pretty close to
the others in raw crunch power. It was only when you invoked the
particular feature and optimized your program to utilize it that the
"super" part of the computer was used -- some were array processors,
some vector processors, etc. The nasty thing was, if you optimized your
code for a vector processor, you had to re-optimize it for an array
processor, which normally meant extensive re-coding.
This is not unlike the concepts behind a Beowulf cluster as compared to
a Digital VMS or Tru64 cluster. They are both "clusters," but they have
entirely different functionality and concepts behind them. VMS clusters
aimed at robustness and zero downtime. The original VMS cluster in
Maynard, Mass. ran for something like 30+ years without ever being shut
down for any reason. Every single piece of hardware in the cluster was
replaced and the OS upgraded many times (including the transition from
VAX to Alpha), but the cluster kept running, completely available to
the user community 24x7x365x30.
With proprietary Unix systems, additional CPUs provide a roughly linear
power increase up to 4 CPUs, at which point you have about 3.5 times
the processing power of a single CPU. The hardware and kernel designs
"just do it." I don't know how the Linux kernel fits into this equation
today. However, beyond 4 CPUs the numbers change radically by vendor
and chip architecture. (I've seen numbers as low as 4.5x for an 8-CPU
box!)
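
The shape of those numbers is roughly what Amdahl's Law predicts (my
gloss; nobody upthread mentioned it): if even a small fraction of the
work is inherently serial, each added CPU buys you less. A quick
back-of-the-envelope:

/* Back-of-the-envelope Amdahl's Law: speedup = 1 / ((1 - p) + p / n),
 * where p is the fraction of the work that can run in parallel and
 * n is the CPU count.  With p = 0.95 you get about 3.5x at 4 CPUs.
 * Real machines do even worse beyond that, because bus and memory
 * contention aren't modeled here at all.
 * Compile with: gcc -o amdahl amdahl.c
 */
#include <stdio.h>

static double speedup(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void)
{
    int cpus[] = { 1, 2, 4, 8, 16 };
    int i;

    for (i = 0; i < 5; i++)
        printf("%2d CPUs -> %.2fx (assuming p = 0.95)\n",
               cpus[i], speedup(0.95, cpus[i]));
    return 0;
}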
Also with proprietary Unix (at least with Tru64 Unix from
Digital/Compaq/HP) it is quite simple to allocate a fixed amount of
memory to individual CPUs. It's called memory partitioning. However, it
is a feature associated with VLM systems - Very Large Memory - which
tend to cost big bucks.
With today's hardware designs, basically the Alpha and POWER4, SMP
technology is now at the chip level. SPARC, PA-RISC and IA64 would like
to be able to do this but cannot. If the rumors are true that IA64
circa 2005 is really "Alpha Inside," then it will too. Once the SMP
technology is reduced to the chip level, the associated memory and I/O
management and access technologies are also, of necessity, transformed.
NUMA (Non-Uniform Memory Access) suddenly becomes a BIG issue. Grace
Hopper loved to hold up a 12-inch piece of copper wire while intoning,
"This is a nanosecond." So in a VLM system, the time to access
physically close memory is less than the time needed to access
"distant" memory. The end result is that you have very different memory
management problems, and hence models, in serious SMP systems than you
find in single-processor boxes.
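
On Linux, one way a program copes with "close" vs. "distant" memory is
to ask for memory on a particular node; libnuma exposes that. This is
my own Linux example of the idea, not how Tru64 or VMS handled it:

/* Minimal NUMA-awareness sketch using Linux's libnuma (the same
 * near/far memory idea, not Tru64's memory partitioning).
 * Allocate a buffer on a specific memory node so the CPUs attached
 * to that node see it as "close" (fast) memory.
 * Compile with: gcc -o numa_demo numa_demo.c -lnuma
 */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    size_t size = 64 * 1024 * 1024;     /* 64 MB */
    void *buf;

    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this box\n");
        return 1;
    }

    printf("highest memory node: %d\n", numa_max_node());

    /* Memory physically attached to node 0: "close" for node 0's
     * CPUs, "distant" for everyone else's. */
    buf = numa_alloc_onnode(size, 0);
    if (buf == NULL) {
        fprintf(stderr, "allocation on node 0 failed\n");
        return 1;
    }

    /* ... touch the memory and do the work on node 0's CPUs ... */

    numa_free(buf, size);
    return 0;
}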
The other thing to remember: there is a VERY big difference between the
way a multi-user time-sharing system works and a system dedicated to a
single application. You can optimize/maximize a single application
seven-ways-from-Tuesday to really get maximum performance every time
you run it on a given hardware configuration. But that takes work. With
a multi-user system, you don't have consistent or predictable resources
or demands with which to begin your optimizations. So you make some
gross assumptions and let-er-rip. You trade off the work necessary to
fine-tune for maximum performance against behavior that is merely good
enough for whatever mix of users and jobs shows up.

High Performance Technical Computing, aka Super Computing, is fun, but
it is a very different animal than mail and web serving.
T.T.F.N.
William H. Magill
# Beige G3 - Rev A motherboard - 768 Meg
# Flat-panel iMac (2.1) 800MHz - Super Drive - 768 Meg
# PWS433a [Alpha 21164 Rev 7.2 (EV56)- 64 Meg]- Tru64 5.1a
magill@mcgillsociety.org
magill@acm.org
magill@mac.com
_________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.netisland.net/mailman/listinfo/plug-announce
General Discussion -- http://lists.netisland.net/mailman/listinfo/plug