
Thursday, April 11, 2013

Solaris: Massive Internet Scalability


[SPARC processor, courtesy Oracle SPARC T5/M5 Kick-Off]
Abstract:
Computing systems started with single processors. As requirements grew, multiple processors were lashed together using SMP (Symmetric Multi-Processing) to add computing power to a single system, breaking work into processes and threads - but the transition to multi-threaded computing was a long one. For problems where SMP would not scale, MPP (Massively Parallel Processing) platforms lashed whole systems together with special software to load-balance jobs. Because MPP platforms were very difficult to program for general-purpose applications, massively multi-core and multi-threaded processors began to appear. Oracle recently released the SPARC T5 processor and systems - an SMP platform scaling to massive counts of sockets, cores, and threads in a single chassis - leveraging existing multi-threaded software and reducing the need for MPP in real-world applications, while placing tremendous pressure upon the Operating System layer.

[SPARC logo, courtesy SPARC.org]
SPARC Growth Rate:
With the movement to massively threaded software, the SPARC processors entered a period of rapid growth in cores and threads:
SPARC   Cores   GHz   Threads   Sockets   Total-Cores   Total-Threads
T1      8       1.4   32        1         8             32
T2      8       1.6   64        1         8             64
T2+     8       1.6   64        4         32            256
T3      16      1.6   128       4         64            512
T4      8       3.0   64        4         32            256
T5      16      3.6   128       8         128           1024
M5      6       3.6   48        32        192           1536
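(The Total columns follow directly from the others: Total-Threads = Cores x threads-per-core x Sockets, where the T1 ran 4 threads per core and every later part runs 8 - e.g., T1: 8 x 4 x 1 = 32; T5: 16 x 8 x 8 = 1,024.)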

The movement to massively threaded processors meant that applications needed to be rewritten to take advantage of the new, higher throughput. Certain applications (i.e. web servers) were already well suited to this workload - but many were not.

[DTrace infrastructure and providers]
Application Challenges:
The movement to massively threaded software, to take advantage of the higher overall throughput offered by the new processor technology, was difficult for application programmers. Technologies such as DTrace were added to advanced operating systems such as Solaris to assist developers and systems administrators in pinpointing their code hot spots for later rewrite.
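As a concrete illustration, a stock DTrace one-liner can sample user stacks across every CPU and report the hottest call paths (the process name "poller" here is hypothetical):

    dtrace -n 'profile-997 /execname == "poller" && arg1/ { @[ustack()] = count(); }
        tick-30s { trunc(@, 10); printa(@); exit(0); }'

The profile provider fires 997 times per second per CPU, ustack() aggregates on the user stack at each sample, and after 30 seconds the ten hottest stacks are printed - pointing directly at the code worth rewriting for the new processors.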

When the SPARC T4 was released, the S3 core brought a feature called the "Critical Thread API" to assist application programmers who could not remove certain single-thread bottlenecks: the core can dedicate its resources to a single hot thread (sacrificing throughput) to push through a hot spot. The T4 (and T5) S3 core was also clocked higher than its predecessors, boosting single-threaded workloads over previous processors even at the same core and thread counts, and its out-of-order instruction execution further sped up single-threaded applications.
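Solaris exposes this to applications through scheduling classes rather than a dedicated system call: per Oracle's description of the critical-threads optimization, a thread placed in the fixed-priority (FX) class at priority 60 or above is treated as critical, and the dispatcher will try to give it exclusive use of a core. A minimal C sketch of marking the calling thread critical - error handling elided, and the FX field values reflect my reading of the class interfaces rather than Oracle's code:

    #include <string.h>
    #include <sys/types.h>
    #include <sys/procset.h>
    #include <sys/priocntl.h>
    #include <sys/fxpriocntl.h>

    /*
     * Place the calling LWP in the FX class at priority 60, the level at
     * which the Solaris critical-threads optimization treats a thread as
     * critical. Sketch only: error handling elided, privilege required.
     */
    static int
    make_self_critical(void)
    {
            pcinfo_t  pcinfo;
            pcparms_t pcparms;
            fxparms_t *fx = (fxparms_t *)pcparms.pc_clparms;

            (void) strcpy(pcinfo.pc_clname, "FX");
            if (priocntl(0, 0, PC_GETCID, (caddr_t)&pcinfo) == -1L)
                    return (-1);              /* look up the FX class id */

            pcparms.pc_cid = pcinfo.pc_cid;
            fx->fx_upri    = 60;              /* 60+ marks the thread critical */
            fx->fx_uprilim = 60;
            fx->fx_tqsecs  = 0;
            fx->fx_tqnsecs = FX_TQDEF;        /* default time quantum */

            return (priocntl(P_LWPID, P_MYID, PC_SETPARMS,
                (caddr_t)&pcparms) == -1L ? -1 : 0);
    }

The same effect can be had from the shell with priocntl(1), e.g. priocntl -s -c FX -m 60 -p 60 -i pid <pid>.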

The SPARC T4 and T5 processors finally offered application developers a no-compromise processor. For the heaviest single-threaded workloads, Oracle released the SPARC M5 processor, driving single-threaded work to ever larger scales without having to rely upon systems produced by long-time SPARC partner & competitor Fujitsu.


[Solaris logo, courtesy Sun Microsystems]
Operating System Challenges:

A single system scaling to 192 cores and 1536 threads offers incredible challenges to Operating System designers. Steve Sistare from Oracle discusses some of these challenges in a Part 1 article and solutions in a Part 2 article. Some of the challenges overcome by Solaris included:
CPU scaling issues include:
• increased lock contention at higher thread counts (a sharded-counter sketch of the standard fix follows below)
• O(NCPU) and worse algorithms

Memory scaling issues include:
• working sets that exceed VA translation caches
• unmapping translations in all CPUs that access a memory page
• O(memory) algorithms
• memory hotspots

Device scaling issues include:
• O(Ndevice) and worse algorithms
• system bandwidth limitations
• lock contention in interrupt threads and service threads
Clearly, the Solaris engineering team at Oracle was up to the tasks created for it by the Oracle SPARC engineering team. Innovation from Sun Microsystems continues under Oracle. It will take years for other Operating System vendors to "catch up".
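None of this is Solaris source, but the flavor of the lock-contention work is easy to illustrate: a single mutex-protected counter serializes all 1,536 hardware threads, while sharding the counter makes the hot update path contention-free and leaves an O(NCPU) walk only on the rare read. A hypothetical C sketch of the pattern:

    #include <pthread.h>
    #include <stdint.h>

    #define NSHARDS 128     /* ideally one shard per CPU; 128 for the sketch */

    /* One counter shard per cache line, so updates never false-share. */
    struct shard {
            pthread_mutex_t lock;
            uint64_t        count;
    } __attribute__((aligned(64)));

    static struct shard shards[NSHARDS];

    void
    counter_init(void)
    {
            for (int i = 0; i < NSHARDS; i++)
                    (void) pthread_mutex_init(&shards[i].lock, NULL);
    }

    /* Hot path: thread "self" only ever touches its own shard. */
    void
    counter_add(unsigned self, uint64_t n)
    {
            struct shard *s = &shards[self % NSHARDS];

            (void) pthread_mutex_lock(&s->lock);
            s->count += n;
            (void) pthread_mutex_unlock(&s->lock);
    }

    /* Cold path: O(NSHARDS) aggregation, acceptable because reads are rare. */
    uint64_t
    counter_read(void)
    {
            uint64_t total = 0;

            for (int i = 0; i < NSHARDS; i++) {
                    (void) pthread_mutex_lock(&shards[i].lock);
                    total += shards[i].count;
                    (void) pthread_mutex_unlock(&shards[i].lock);
            }
            return (total);
    }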
Network Management Applications:

In the realm of Network Management, many polling applications use threads to scale, since network communication to edge devices is latency-bound - making the SPARC "T" processors an excellent choice in carrier-based environments.
The data returned by the massively multi-threaded pollers needs to be placed in a database in a consistent fashion. This posed a problem during the device "discovery" process, which is normally single-threaded and experienced massive slow-downs under the "T" processors - until the T4 was released. With processors like the SPARC T4 and SPARC T5, Network Management applications gain the proverbial "best of both worlds": massive hardware thread scalability for the pollers, and excellent single-threaded throughput through discovery bottlenecks via the "Critical Thread API."
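To make the poller shape concrete, here is a hypothetical pthreads fan-out: each thread spends nearly all of its wall-clock time blocked on the network (simulated below with usleep()), so hundreds of pollers happily share a far smaller number of cores:

    #include <pthread.h>
    #include <unistd.h>

    #define NDEVICES 4096   /* hypothetical managed-device count */
    #define NPOLLERS  256   /* poller threads, mostly blocked on I/O */

    /* Stand-in for one blocking SNMP poll: ~200 ms of network round-trip. */
    static void
    poll_one(int device)
    {
            (void) device;
            (void) usleep(200000);
    }

    /* Thread k polls devices k, k+NPOLLERS, k+2*NPOLLERS, ... */
    static void *
    poller(void *arg)
    {
            for (long dev = (long)arg; dev < NDEVICES; dev += NPOLLERS)
                    poll_one((int)dev);
            return (NULL);
    }

    int
    main(void)
    {
            pthread_t tid[NPOLLERS];

            for (long i = 0; i < NPOLLERS; i++)
                    (void) pthread_create(&tid[i], NULL, poller, (void *)i);
            for (int i = 0; i < NPOLLERS; i++)
                    (void) pthread_join(tid[i], NULL);
            return (0);
    }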

The latest SPARC platforms are optimal platforms for massive Network Management applications. There is no other platform on the planet which compares to SPARC for managing "The Internet".

Sunday, December 5, 2010

CoolThreads UltraSPARC and SPARC Processors


[UltraSPARC T3 Micrograph]


Abstract:

Processor development takes an immense quantity of time, to architect a high-performance solution, and an uncanny vision of the future, to project market demand and acceptance. In 2005, Sun embarked on a bold path moving toward many cores and many threads per core. Since the purchase of Sun by Oracle, Sun's internal SPARC road map has been clarified.


[UltraSPARC T1 Micrograph]
Generation 1: UltraSPARC T1
A new family of SPARC processors was announced by Sun on 2005 November 14.
  • Single die
  • Single socket
  • 64 bits
  • 4, 6, 8 integer cores
  • 4, 6, 8 crypto cores
  • 4 threads/core
  • 1 shared floating point core
  • 1.0 GHz - 1.4 GHz clock speed
  • 279 million transistors
  • 378 mm2
  • 90 nm CMOS (TI)
  • 1 JBUS port
  • 3 Megabyte Level 2 Cache
  • 1 Integer ALU per Core
  • ??? Memory Controllers
  • 6 Stage Integer Pipeline per Core
  • No embedded Ethernet into CPU
  • Crypto Algorithms: ???
The platform was designed as a front-end server for web applications. With its massive number of cores, it was designed to provide web-tier performance similar to existing quad-socket systems while leveraging a single socket.

To understand how ground-breaking this advancement was: at the time, most processors were single-core, with the occasional dual-core part whose cores were glued together through a more expensive process referred to as a multi-chip module (driving higher software licensing costs on those platforms).


Generation 2: UltraSPARC T2
The next generation of the CoolThreads processor was announced by Sun in 2007 August.
  • Single die
  • Single Socket
  • 64 bits
  • 4, 6, 8 integer cores
  • 4, 6, 8 crypto cores
  • 4, 6, 8 floating point units
  • 8 threads/core
  • 1.2 GHz - 1.6 GHz clock speed
  • 503 million transistors
  • 342 mm2
  • 65 nm CMOS (TI)
  • 1 PCI Express port (1.0 x8)
  • 4 Megabyte Level 2 Cache
  • 2 Integer ALU per Core
  • 4x Dual Channel FBDIMM DDR2 Controllers
  • 8 Stage Integer Pipeline per Core
  • 2x 10 GigabitEthernet on-CPU ports
  • Crypto Algorithms: DES, Triple DES, AES, RC4, SHA1, SHA256, MD5, RSA-2048, ECC, CRC32
This processor was designed for higher compute-intensive requirements and incredibly efficient network capacity. The platform made an excellent front-end server for applications as well as Middleware, with the ability to do 10 Gigabit wire-speed encryption with virtually no CPU overhead.
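The "virtually no CPU overhead" comes from driving the per-core crypto units through the Solaris Cryptographic Framework, which applications reach via the standard PKCS#11 API (libpkcs11). A heavily abridged sketch of handing an AES-CBC encryption to the framework - all error checking elided, and whether the operation actually lands on the hardware unit depends on the configured providers:

    #include <string.h>
    #include <security/cryptoki.h>   /* Solaris PKCS#11; link with -lpkcs11 */

    /*
     * Encrypt one buffer (length a multiple of 16) with AES-CBC through
     * the Solaris Cryptographic Framework; on an UltraSPARC T2 the
     * framework can route this to the on-core crypto unit. Sketch only.
     */
    CK_RV
    aes_cbc_encrypt(CK_BYTE key16[16], CK_BYTE iv[16],
        CK_BYTE *in, CK_ULONG inlen, CK_BYTE *out, CK_ULONG *outlen)
    {
            CK_SESSION_HANDLE sess;
            CK_OBJECT_HANDLE  key;
            CK_SLOT_ID        slot;
            CK_ULONG          nslots   = 1;
            CK_OBJECT_CLASS   keyclass = CKO_SECRET_KEY;
            CK_KEY_TYPE       keytype  = CKK_AES;
            CK_MECHANISM      mech     = { CKM_AES_CBC, iv, 16 };
            CK_ATTRIBUTE      tmpl[]   = {
                    { CKA_CLASS,    &keyclass, sizeof (keyclass) },
                    { CKA_KEY_TYPE, &keytype,  sizeof (keytype)  },
                    { CKA_VALUE,    key16,     16                },
            };

            (void) C_Initialize(NULL);
            (void) C_GetSlotList(CK_TRUE, &slot, &nslots);  /* first slot */
            (void) C_OpenSession(slot, CKF_SERIAL_SESSION, NULL, NULL, &sess);
            (void) C_CreateObject(sess, tmpl, 3, &key);     /* session key */
            (void) C_EncryptInit(sess, &mech, key);
            return (C_Encrypt(sess, in, inlen, out, outlen));
    }

Kernel consumers such as IPsec and the kernel SSL proxy use the same framework, which is how wire-speed crypto is reached without application changes.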

Competitors had started to build single-die dual-core CPU's, offering quad-core processors by gluing two dual-core dies into a Multi-Chip Module.


[UltraSPARC T2 Micrograph]
Generation 3: UltraSPARC T2+
Sun quickly released the first SMP-capable CoolThreads processor, the UltraSPARC T2+, in 2008 April.
  • Single die
  • 1-4 Sockets
  • 64 bits
  • 4, 6, 8 integer cores
  • 4, 6, 8 crypto cores
  • 4, 6, 8 floating point units
  • 8 threads/core
  • 1.2 GHz - 1.6 GHz clock speed
  • 503 million transistors
  • 342 mm2
  • 65 nm CMOS (TI)
  • 1 PCI Express port (1.0 x8)
  • 4 Megabyte Level 2 Cache
  • 2 Integer ALU per Core
  • 2x? Dual Channel FBDIMM DDR2 Controllers
  • 8? Stage Integer Pipeline per Core
  • No embedded Ethernet into CPU
  • Crypto Algorithms: DES, Triple DES, AES, RC4, SHA1, SHA256, MD5, RSA-2048, ECC, CRC32
This processor allowed the T processor series to move beyond Tier 0 web engines and Middleware into the Application tier. Architects started to understand the benefits of this platform entering the Database tier. This was the first CoolThreads processor to scale past 1 and up to 4 sockets.

By this time, the competition really started to understand that Sun had properly predicted the future of computing. The drive toward single-die quad-core chips had started, with hex-core Multi-Chip Modules being predicted.


Generation 4: SPARC T3
The market became nervous with Oracle purchasing Sun. The first Oracle-branded, SMP-capable CoolThreads processor, the SPARC T3, was launched in 2010 September.
  • Single die
  • 1-4 Sockets
  • 64 bits
  • 16 integer cores
  • 16 crypto cores
  • 16 floating point units
  • 8 threads/core
  • 1.67 GHz clock speed
  • ??? million transistors
  • 377 mm2
  • 40 nm
  • 2x PCI Express port (2.0 x8)
  • 6 Megabyte Level 2 Cache
  • 2 Integer ALU per Core
  • 4x DDR3 SDRAM Controllers
  • 8? Stage Integer Pipeline per Core
  • 2x 10 GigabitEthernet on-CPU ports
  • Crypto Algorithms: DES, 3DES, AES, RC4, SHA1, SHA256/384/512, Kasumi, Galois Field, MD5, RSA to 2048 key, ECC, CRC32
This processor was more than what the market was anticipating from Oracle. It took all the features of the T2 and T2+ and combined them into the new T3, with an increase in overall features. No longer did the market need to choose between multiple sockets or embedded 10 GigE interfaces - this chip had it all, plus double the cores.

Immediately before this release, the competition was shipping single-die hex-core CPU's and octal-core parts built by gluing dies together in multi-chip modules. The T3 was a substantial upgrade over the competition, offering double the cores on a single die.


Generation 5: SPARC T4
Oracle indicated in December 2010 that they had thousands of these processors in the lab and predicted the processor would be released at the end of 2011.

After the announcement, a separate press release indicated the processor would have a renovated core for higher single-threaded performance, but the socket would offer half the cores.

Most vendors are projected to have 8-core processors available (through Multi-Chip Modules) by the time the T4 is released, but only the T4 should be on a single piece of silicon during this period.


[2010-12 SPARC Solaris Roadmap]
Generation 6: SPARC T5

Some details on the T5 were announced with the T4: it will use the renovated T4 core on a 28 nm process and return to 16 cores per socket. This may be the first CoolThreads T processor able to scale from 1 to 8 sockets. It is projected to appear in early 2013.

Some vendors are projecting 12-core processors on the market using Multi-Chip Module technology, but when the T5 is released, it should still be the market leader with 16 cores per socket.

Network Management Connection

Consolidating network management stations in a globalized environment works very well with the CoolThreads T-Series processors. Consolidating multiple slower SPARC platforms onto single- and dual-socket T-series systems has worked well over the past half decade.

While most network management polling engines will scale linearly with these highly-threaded processors, there are some operations which are bound to single threads. These types of processes include event correlation, startup time, and synchronization after a discovery in a large managed topology.

The market will welcome the enhanced T4 processor core and the T5 processor when they are released.

Monday, February 8, 2010

IBM Power 7 and eDRAM Cache




Welcome IBM to the world of 64 Bit Octal-Core Computing!

On February 8th, 2010, Timothy Prickett Morgan wrote about the IBM Power 7 chip launch in The Register: "Sparc T 64-threaded T2 and T2+... quad-core, eight-threaded Tukwilas... the Power7 chip has 32 threads".

It is nice to see that the trail first blazed by the 32-threaded first-generation OpenSPARC T1 is being followed by IBM Power and Intel Itanium, both applying different technology to compete with Sun's second- and third-generation 64-threaded OpenSPARC processors.

Possible Architecture Trade-offs of eDRAM in Cache

Timothy Prickett Morgan also wrote, "The effect of this eDRAM on the Power7 design, and its performance, is two-fold. First, by adding the L3 cache onto the chip..."

The use of embedded DRAM to reduce transistor count, squeeze in more cores, and reduce latency was a great idea, even with the refresh logic added onto the chip!

Every benefit comes with drawbacks, yet the media's discourse on the possible trade-offs has been silent, which confuses me.

Static RAM has traditionally been beneficial to chip manufacturers, since it offers fast, regular access to the memory cells without waiting for a slow refresh signal to propagate across the RAM. It is interesting that no one (and I mean NO ONE) is talking about the impact on performance of the CPU cores needing to wait for refresh on the eDRAM.

I wonder what the ratio of performance hit to reduction in latency was in moving to eDRAM?

Multi-ported Static RAM allows for fast (simultaneous) access from multiple cores into cache. With multi-process heavy workloads, where data in the cache may not be simultaneously accessed from different cores or hardware strands, eDRAM may be a good fit. With software multi-threaded heavy workloads, where the data in the cache will be accessed simultaneously by multiple cores and hardware strands, eDRAM may suffer in comparison to multi-ported SRAM, due to excessive, inefficient re-loads from main memory and inefficient sharing.

I wonder what the benefit-to-performance-hit ratio in throughput of moving to eDRAM is under various real-world workloads, where multi-threaded applications need to share the instructions & data in the cache?

I wonder if the performance of eDRAM will be as linear as SRAM as the processors get loaded up? (This reminds me of the Intel 50 MHz 80486 vs. Intel 66 MHz (33 MHz bus) 80486 trade-off from years past...)

Connection to Network Management

Network Management traditionally deals with extremely highly threaded workloads. Managing tens of thousands of devices with hundreds of thousands of managed resources often requires thousands of threads in a single process, and very regular (1-5 minute) polling intervals require tremendous throughput.
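To put hypothetical numbers on that: 100,000 managed resources on a 5-minute polling cycle is roughly 333 polls per second; if each SNMP request spends about one second mostly idle on the wire, roughly 333 polls are in flight at any instant. Those in-flight polls map naturally onto a few hundred hardware threads, each blocked on I/O rather than burning a core - exactly the workload shape highly threaded processors are built for.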

The use of Power 7 in these managed-device-facing, highly threaded workloads is yet to be measured - it may be one of the most fabulous chips on the market for the network management space, or it may be mediocre. Power is not a substantial player in the Network Management world, so I would not really expect engineers to tune the CPU for this type of workload.

I would expect that engineers tuned Power for the Database market. Network Management does have long-term data storage requirements, so this may be a very good back-end platform.

Conclusion

The move to eDRAM by IBM is very interesting, almost as interesting as OpenSPARC moving to highly threaded octal cores many years ago.

Will other vendors emulate IBM in the move to eDRAM cache, the same way IBM, Intel, and AMD are moving to 64 bit octal-core as OpenSPARC did years ago?

UPDATE!!!

Another article has come out to discuss the use of eDRAM by IBM.

"First in the chain is the 32KB L1 data cache, which has seen its latency cut in half, from four cycles in the POWER6 to two cycles in POWER7. Then there's the 256KB L2, the latency of which has dropped from 26 cycles in POWER6 to eight cycles in POWER7—that's quite a reduction, and will help greatly to mitigate the impact of the shared L3's increased latency.

The POWER7's L3 is its most unique feature, and, at 32MB, it's positively gigantic. IBM was able to cram such a large L3 onto the chip by making it out of embedded DRAM (eDRAM) instead of the usual SRAM. This decision cost the cache a few cycles of latency..."
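A hedged back-of-envelope on those numbers: average memory access time is roughly AMAT = t(L1) + m(L1) x (t(L2) + m(L2) x (t(L3) + m(L3) x t(DRAM))). Take the quoted 2-cycle L1 and 8-cycle L2, and assume (hypothetically) 5% L1 and 20% L2 local miss rates with 300-cycle main memory. A 16MB SRAM L3 at 20 cycles missing 40% of the time gives AMAT = 2 + 0.05 x (8 + 0.2 x (20 + 0.4 x 300)) = 3.8 cycles; a 32MB eDRAM L3 at 25 cycles whose extra capacity cuts the miss rate to 25% gives 2 + 0.05 x (8 + 0.2 x (25 + 0.25 x 300)) = 3.4 cycles. Under these invented numbers the slower-but-larger eDRAM cache still wins on average - presumably IBM's bet - though the refresh question raised above is not captured by a simple AMAT model.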

Monday, June 1, 2009

OpenSolaris 2009.06 Release - What's On The Horizon?


Sun has announced OpenSolaris 2009.06, as have third-party news organizations like The Register.

It is nice to see more features getting bundled into OpenSolaris!

OpenSolaris has offered robust kernel- and file-system-integrated CIFS for some time (something no other operating system has done as well, besides Windows) - a beautiful thing for integrating Solaris, Linux, and Windows environments onto a single underlying file system.

Since OpenSolaris is the core infrastructure which the Sun Storage platforms are based upon, adding faster networking and processor enhancements (both CPU throughput as well as power efficiency) provides performance boosts for Sun integrated storage systems.

In the area of integrated storage systems, being able to release OpenSolaris under UltraSPARC T1/T2/T2+ means being able to leverage the octal crypto engines both for encrypted network transfers of storage data and for encrypted disk reads/writes. Additional performance on encryption from client to disk would give a great boost to the storage line. If Sun decided that this would be of interest to the U.S. DoD, an UltraSPARC T OpenStorage product would be sensible.

Seeing the inclusion of SPARC Rock code seems to indicate that the next-generation silicon is moving forward; otherwise, programmers would not have wasted their time including code for a processor that would not be released (or that would undergo another set of silicon revisions).

Also, seeing OpenSolaris boot under SPARC is a good indication that Solaris 11 is right around the corner, since OpenSolaris is basically the Solaris 11 release. The GUI install integration of OpenSolaris for SPARC is tantalizing - this would possibly make Solaris 11 one release away.

The only other thing the market would want, on a future wish-list, is full clustering integrated into ZFS (with Sun's acquisition of the Lustre HPC clustered file system, it is just a matter of time.) One would hope the market will not have to wait until Solaris 12 to run a zpool command sequence to configure a clustered ZFS file system! :-(

What does this have to do with Network Management?

With wide-spread network management, the need to hold historical data on ever-larger networks drives the need for substantial and redundant storage. Technologies like ZFS enable this.

The re-engineering of the TCP/IP stack in OpenSolaris is a tremendous boon to network management infrastructure. SNMP can be leveraged more effectively on managed servers, TCP/IP stacks will be faster with better QoS on management servers, and the integrated hardware acceleration in UltraSPARC T2 processors will substantially increase performance for network management systems running multiple virtual machines.

With Sun historically targeting the Telecommunications Industry, it is good to see this focus has not deviated as Sun has reached out to Storage (the purchase of StorageTek, the Open Storage initiative, etc.) - and the convergence of these silos benefits all communities.

Friday, March 20, 2009

IBM: Where SUN Has Been Competitive

There have been rumors about IBM purchasing SUN for a number of days now.

Where has SUN been competitive, where IBM would want to purchase them?

Since 2007, SUN has dominated SPECweb2006 performance in 1 socket - it takes 4-socket systems to edge out a 1-socket SUN T2 processor - and of course, a 2, 3, or 4 socket T2+ system crushes competing results by an order of magnitude.

For most of 2008, SUN was at the top of the list for CINT2006 Rates in 1 socket.

Since 2008 and so far into 2009, SUN has been in the top 3 for CINT2006 Rates in 2 sockets:
http://spec.org/cgi-bin/osgresults?conf=rint2006&op=fetch&proj-COMPANY=256&proj-SYSTEM=256&proj-CORES=256&proj-CHIPS=256&critop-CHIPS=0&crit-CHIPS=2&proj-CORESCHP=256&proj-THREADS=0&proj-CPU=0&proj-CPU_MHZ=0&proj-CPUCHAR=0&proj-NCPUORD=0&proj-PARALLEL=0&proj-BASEPTR=0&proj-PEAKPTR=0&proj-CACHE1=0&proj-CACHE2=0&proj-CACHE3=0&proj-OCACHE=0&proj-MEMORY=0&proj-OS=0&proj-FS=0&proj-COMPILER=0&proj-HWAVAIL=0&crit2-HWAVAIL=Jan&proj-SWAVAIL=0&crit2-SWAVAIL=Jan&proj-COPIES=256&proj-PEAK=256&proj-BASE=256&proj-400PEAK=0&proj-400BASE=0&proj-401PEAK=0&proj-401BASE=0&proj-403PEAK=0&proj-403BASE=0&proj-429PEAK=0&proj-429BASE=0&proj-445PEAK=0&proj-445BASE=0&proj-456PEAK=0&proj-456BASE=0&proj-458PEAK=0&proj-458BASE=0&proj-462PEAK=0&proj-462BASE=0&proj-464PEAK=0&proj-464BASE=0&proj-471PEAK=0&proj-471BASE=0&proj-473PEAK=0&proj-473BASE=0&proj-483PEAK=0&proj-483BASE=0&proj-LICENSE=0&proj-TESTER=0&proj-SPONSOR=0&proj-TESTDAT=0&crit2-TESTDAT=Jan&proj-PUBLISH=256&critop-PUBLISH=-1&crit2-PUBLISH=Mar&crit-PUBLISH=2009&proj-UPDATE=0&crit2-UPDATE=Jan&dups=0&duplist=COMPANY&duplist=SYSTEM&duplist=CORES&duplist=CHIPS&duplist=CORESCHP&duplist=THREADS&duplist=CPU&duplist=PARALLEL&duplist=BASEPTR&duplist=PEAKPTR&duplist=CACHE1&duplist=CACHE2&duplist=CACHE3&duplist=OCACHE&duplist=COPIES&dupkey=PUBLISH&latest=Dec-9999&sort1=PEAK&sdir1=-1&sort2=SYSTEM&sdir2=1&sort3=CORESCHP&sdir3=1&format=tab

Since 2008 and so far into 2009, SUN has the top CINT2006 Rates in 4 sockets, even pulling away from the quad hex-core Intel processors by 25%.

For most of 2008, SUN was at the top of the list for CFP2006 Rates in 1 socket.

For all of 2008 and so far through 2009, SUN has been at the top of the list for CFP2006 Rates in 2 sockets.

Since the end of 2008 and so far through 2009, SUN has been at the top of the list for CFP2006 Rates in 4 sockets.

SPARC has clearly been very competitive FOR YEARS, in a field where POWER had decided not even to compete.

SUN has been very competitive in the Applications arena. OpenOffice is a SUN-led open-source project, which IBM rebranded as Lotus Symphony and SUN rebrands as StarOffice.

SUN has been very competitive in the Application Foundation arena. Most cross-platform enterprise applications are written in Java.

SUN has been very competitive in Tape Storage. SUN and IBM are basically the only games in town for substantial tape storage. The American Government would probably demand a spin-off of something in the case of a merger.

SUN has been very competitive in the OS arena. There is nothing in the market like open-source Solaris; only Windows has a more complete CIFS kernel API set than OpenSolaris. Only OpenSolaris offers a full-featured file system like ZFS (other OS vendors, like Apple, are starting to port and adopt pieces into Mac OS X). Systems administrators being able to trace third-party applications programmatically using DTrace is unheard of in the industry.

SUN has been very competitive in the Ultra-Thin Client arena. With third-party manufacturers making ultra-thin laptops, SUN making ultra-thin desktops, significant power-consumption savings from these units (better than any competing platform), and demonstrated savings across tens of thousands of users (SUN's policy of "eat your own dogfood," as well as the U.S. Department of Defense), SUN is in a very unique position to serve Solaris, Windows, and Linux applications securely over WiFi, Ethernet, and Fiber with zero desktop management (unless you consider unplugging and throwing out your box and moving to a different station to pick up where you left off a problem!)

Considering the size of SUN and the resources of the competition, SUN has done fairly well, even if they have not been able to compete everywhere as effectively as those companies with deeper pockets such as IBM.
