Re: [PLUG] Hardware question
On Monday, June 9, 2003, at 08:07 AM, Paul wrote:
> Forge wrote:
>> Doesn't a dual CPU system require twice the amount of RAM?
> No, not at all.

Actually, the answer is: it depends on the workload.

> Really? So two CPUs share the same RAM? If both are working under
> the same load, will they split their memory use in half? In other
> words, if you have 512MB of RAM, will each CPU only have 256MB to
> work with? (Not that that isn't a good amount of RAM.)
There are, historically, two primary schemes for multiple-CPU
utilization: tightly coupled and loosely coupled. In a loosely coupled
environment, a process is assigned to a single CPU and stays there for
its entire life. In a tightly coupled scenario, each INSTRUCTION can be
executed on the next available CPU. Note that this is a hardware
feature and has nothing to do with threading, which is a software
feature.
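
To make the loosely coupled idea concrete: on Linux you can pin a
process to one CPU yourself with sched_setaffinity(). This is just a
minimal, Linux-specific sketch of the concept, not something the
FreeBSD or Tru64 schedulers of the day exposed this way:

/* Pin the calling process to CPU 0: the "loosely coupled" model,
 * where a process lives out its life on a single CPU.
 * Linux-specific sketch; compile with: gcc -o pin pin.c
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    cpu_set_t mask;

    CPU_ZERO(&mask);            /* start with an empty CPU set */
    CPU_SET(0, &mask);          /* allow only CPU 0            */

    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        exit(EXIT_FAILURE);
    }

    printf("Bound to CPU 0 for the rest of this process's life.\n");
    /* ... do the real work here; the scheduler won't migrate us ... */
    return 0;
}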
In FreeBSD, as Apple used it back in the Rhapsody days, portions of the
kernel code were always executed on one CPU. So if you needed to do
I/O, it didn't matter which CPU was executing your program; it had to
wait for the CPU that handled I/O to come free before the I/O could get
done. I've been told this deficiency has been "fixed," but I don't
know. (It's one of the issues in the microkernel vs. monolithic kernel
debate.)
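
If you want a feel for why that hurts, here's a toy sketch (my own
illustration, not FreeBSD's actual code): pretend every I/O request
has to grab one global lock. No matter how many CPUs the box has, the
I/O still gets done one request at a time.

/* Toy illustration of a "giant" I/O lock -- NOT real kernel code.
 * Threads can compute in parallel on any CPU, but all the pretend
 * I/O funnels through one mutex, so it is serialized anyway.
 * Compile with: gcc -o biglock biglock.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t io_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    long id = (long)arg;

    /* CPU-bound work could run here, genuinely in parallel ... */

    /* ... but the "I/O" queues up behind the single global lock. */
    pthread_mutex_lock(&io_lock);
    printf("thread %ld doing its I/O, one at a time\n", id);
    sleep(1);                      /* pretend the I/O takes a while */
    pthread_mutex_unlock(&io_lock);

    return NULL;
}

int main(void)
{
    pthread_t t[4];
    long i;

    for (i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return 0;
}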
Resources are shared based upon the compiler's capabilities and the
OS's multi-threading features. Today, if you "roll your own," the
optimization for your hardware setup pretty much happens automagically.
If you use pre-compiled binaries, "ya pays yer money and ya takes yer
chances." Some apps can be "optimized" much more than others by simple
virtue of what it is that they do. Others cannot be optimized at all.
And with others, it's simply not worth the effort to optimize them.
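
One concrete example of what I mean by "roll your own": a loop like
the one below is exactly the kind of code that benefits. Build it
yourself with aggressive, machine-specific flags (say, -O3 plus the
right -march/-mcpu for your box) and the compiler can unroll it,
schedule it for your pipeline, maybe use SIMD; a generic pre-built
binary gets the lowest-common-denominator version. The loop and the
flags are just my illustration:

/* A dense numeric loop: the kind of code that machine-specific
 * compilation actually helps.  An app that mostly sits waiting on
 * the disk or the network gains almost nothing from the same flags,
 * which is why some apps "optimize" well and others don't.
 */
void saxpy(int n, float a, const float *x, float *y)
{
    int i;

    for (i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}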
All this goes back to what made a given "super-computer" super. The
reality was, each of the so-called super-computers was pretty close to
the others in raw crunch power. It was only when you invoked the
particular feature and optimized your program to utilize it that the
"super" part of the computer was used -- some were array processors,
some vector processors, etc. The nasty thing was, if you optimized your
code for a vector processor, you had to re-optimize it for an array
processor, which normally meant extensive re-coding.
This is not unlike the concepts behind a Beowulf cluster as compared to
a Digital VMS or Tru64 cluster. They are both "clusters," but they have
entirely different functionality and concepts behind them. VMS clusters
aimed at robustness and zero downtime. The original VMS cluster in
Maynard, Mass. ran for something like 30+ years without ever being shut
down for any reason. Every single piece of hardware in the cluster was
replaced and the OS upgraded many times (including the transition from
VAX to Alpha), but the cluster kept running, completely available to
the user community 24x7x365x30.
With proprietary Unix systems, additional CPUs provide a roughly linear
power increase up to 4 CPUs, at which point you have about 3.5 times
the processing power of a single CPU. The hardware and kernel designs
"just do it." I don't know how the Linux kernel fits into this equation
today. However, beyond 4 CPUs the numbers change radically by vendor
and chip architecture. (I've seen numbers as low as 4.5x for an 8-CPU
box!)
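
The shape of those numbers is roughly what Amdahl's Law predicts (my
gloss; nobody upthread mentioned it): if even a small fraction of the
work is inherently serial, each added CPU buys you less. A quick
back-of-the-envelope:

/* Back-of-the-envelope Amdahl's Law: speedup = 1 / ((1 - p) + p / n),
 * where p is the fraction of the work that can run in parallel and
 * n is the CPU count.  With p = 0.95 you get about 3.5x at 4 CPUs.
 * Real machines do even worse beyond that, because bus and memory
 * contention aren't modeled here at all.
 * Compile with: gcc -o amdahl amdahl.c
 */
#include <stdio.h>

static double speedup(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void)
{
    int cpus[] = { 1, 2, 4, 8, 16 };
    int i;

    for (i = 0; i < 5; i++)
        printf("%2d CPUs -> %.2fx (assuming p = 0.95)\n",
               cpus[i], speedup(0.95, cpus[i]));
    return 0;
}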
Also with proprietary Unix (at least with Tru64 Unix from
Digital/Compaq/HP) it is quite simple to allocate a fixed amount of
memory to individual CPUs. It's called memory partitioning. However, it
is a feature associated with VLM systems - Very Large Memory - which
tend to cost big bucks.
With today's hardware designs, basically the Alpha and POWER4, SMP
technology is now at the chip level. SPARC, PA-RISC and IA64 would like
to be able to do this but cannot. If the rumors are true that IA64
circa 2005 is really "Alpha Inside," then it will too. Once the SMP
technology is reduced to the chip level, the associated memory and I/O
management and access technologies are also, of necessity, transformed.
NUMA (Non-Uniform Memory Access) suddenly becomes a BIG issue. Grace
Hopper loved to hold up a 12-inch piece of copper wire while intoning,
"This is a nanosecond." So in a VLM system, the time to access
physically close memory is less than the time needed to access
"distant" memory. The end result is that you have very different memory
management problems, and hence models, in serious SMP systems than you
find in single-processor boxes.
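
On Linux, one way a program copes with "close" vs. "distant" memory is
to ask for memory on a particular node; libnuma exposes that. This is
my own Linux example of the idea, not how Tru64 or VMS handled it:

/* Minimal NUMA-awareness sketch using Linux's libnuma (the same
 * near/far memory idea, not Tru64's memory partitioning).
 * Allocate a buffer on a specific memory node so the CPUs attached
 * to that node see it as "close" (fast) memory.
 * Compile with: gcc -o numa_demo numa_demo.c -lnuma
 */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    size_t size = 64 * 1024 * 1024;     /* 64 MB */
    void *buf;

    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this box\n");
        return 1;
    }

    printf("highest memory node: %d\n", numa_max_node());

    /* Memory physically attached to node 0: "close" for node 0's
     * CPUs, "distant" for everyone else's. */
    buf = numa_alloc_onnode(size, 0);
    if (buf == NULL) {
        fprintf(stderr, "allocation on node 0 failed\n");
        return 1;
    }

    /* ... touch the memory and do the work on node 0's CPUs ... */

    numa_free(buf, size);
    return 0;
}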
The other thing to remember: there is a VERY big difference between the
way a multi-user time-sharing system works and a system dedicated to a
single application. You can optimize/maximize a single application
seven-ways-from-Tuesday to really get maximum performance every time
you run it on a given hardware configuration. But that takes work. With
a multi-user system, you don't have consistent or predictable resources
or demands with which to begin your optimizations. So you make some
gross assumptions and let-er-rip. You trade off the work necessary to
fine-tune for maximum performance against behavior that is merely good
enough for whatever mix of users and jobs shows up.

High Performance Technical Computing, aka Super Computing, is fun, but
it is a very different animal than mail and web serving.
T.T.F.N.
William H. Magill
# Beige G3 - Rev A motherboard - 768 Meg
# Flat-panel iMac (2.1) 800MHz - Super Drive - 768 Meg
# PWS433a [Alpha 21164 Rev 7.2 (EV56)- 64 Meg]- Tru64 5.1a
magill@mcgillsociety.org
magill@acm.org
magill@mac.com
_________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.netisland.net/mailman/listinfo/plug-announce
General Discussion -- http://lists.netisland.net/mailman/listinfo/plug