UltraSPARC T1

UltraSPARC T1
Produced: 2005 - current
Max. CPU clock: 1.0 GHz to 1.4 GHz
Instruction set: SPARC V9
Cores: 4, 6, 8

Sun Microsystems' UltraSPARC T1 microprocessor, known until its 14 November 2005 announcement by its development codename "Niagara", is a multithreading, multicore CPU. Designed to lower the energy consumption of server computers, the CPU typically uses 72 W of power at 1.4 GHz.

The T1 is a new-from-the-ground-up SPARC microprocessor implementation that conforms to the UltraSPARC Architecture 2005 specification and executes the full SPARC V9 instruction set. Sun has produced two previous multicore processors (UltraSPARC IV and IV+), but UltraSPARC T1 is its first microprocessor that is both multicore and multithreaded. The processor is available with four, six or eight CPU cores, each core able to handle four threads concurrently. Thus the processor is capable of processing up to 32 threads concurrently.

Like high-end Sun SMP systems, the UltraSPARC T1 can be partitioned. Several cores can be set aside to run a single process or a group of processes or threads, while the other cores handle the rest of the processes on the system.

Cores

Figure: UltraSPARC T1 pipeline

The UltraSPARC T1 was designed from scratch as a multithreaded, special-purpose processor, and thus introduces a whole new architecture for obtaining high performance. Rather than trying to make each core as intelligent and optimized as possible, Sun's goal was to run as many concurrent threads as possible and to maximize utilization of each core's pipeline.

The T1's cores are less complex than those of current high-end processors, which allows 8 cores to fit on the same die. The cores feature neither out-of-order execution nor a sizable amount of cache. Single-thread processors depend heavily on large caches for their performance because each cache miss results in a wait while the data is fetched from main memory. Making the cache larger reduces the probability of a cache miss, but the impact of each miss remains the same.

The T1 cores largely side-step the issue of cache misses by multithreading. Each core is a barrel processor, meaning it switches between available threads each cycle. When a long-latency event occurs, such as a cache miss, the thread is taken out of rotation while the data is fetched into cache in the background. Once the long-latency event completes, the thread is made available for execution again. Sharing the pipeline among multiple threads may make each thread slower, but the overall throughput (and utilization) of each core is much higher. It also means that the impact of cache misses is greatly reduced, and the T1 can maintain high throughput with a smaller amount of cache. The cache no longer needs to be large enough to hold all or most of the "working set", just the recent cache misses of each thread.
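The thread rotation described above can be illustrated with a small simulation. This is a minimal Python sketch under stated assumptions, not Sun's actual thread-select logic: the names `HwThread` and `simulate_core` are hypothetical, threads are picked in simple round-robin order, and every thread misses the cache on a fixed schedule with a fixed miss latency.

```python
from dataclasses import dataclass


@dataclass
class HwThread:
    """One hardware thread context (hypothetical model, not Sun's design)."""
    tid: int
    stalled_until: int = 0  # first cycle at which the thread is runnable again
    issued: int = 0         # instructions issued so far


def simulate_core(num_threads, cycles, miss_every, miss_latency):
    """Issue one instruction per cycle from the next runnable thread.

    miss_every:   a thread misses the cache on every Nth instruction (assumed).
    miss_latency: cycles a thread stays out of rotation after a miss (assumed).
    Returns the fraction of cycles in which the pipeline issued an instruction.
    """
    threads = [HwThread(tid) for tid in range(num_threads)]
    busy_cycles = 0
    next_slot = 0  # rotation pointer

    for cycle in range(cycles):
        # Search for a runnable thread, starting at the rotation pointer.
        for offset in range(num_threads):
            t = threads[(next_slot + offset) % num_threads]
            if t.stalled_until <= cycle:
                t.issued += 1
                busy_cycles += 1
                if t.issued % miss_every == 0:
                    # Long-latency event: take the thread out of rotation.
                    t.stalled_until = cycle + 1 + miss_latency
                next_slot = (t.tid + 1) % num_threads
                break
        # If no thread was runnable, the pipeline idles for this cycle.

    return busy_cycles / cycles


single = simulate_core(num_threads=1, cycles=10_000, miss_every=10, miss_latency=30)
four = simulate_core(num_threads=4, cycles=10_000, miss_every=10, miss_latency=30)
print(f"1 thread: {single:.2f}, 4 threads: {four:.2f}")
```

With these assumed parameters a single thread keeps the pipeline busy a quarter of the time (10 issue cycles followed by 30 stalled cycles), while four threads sharing the same pipeline keep it busy for a noticeably larger fraction, even though each individual thread progresses more slowly.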

Benchmarks demonstrate that this approach has worked very well on commercial (integer), multithreaded workloads such as Java application servers, Enterprise Resource Planning (ERP) application servers, email servers (such as Lotus Domino), and web servers. These benchmarks suggest that each core in the UltraSPARC T1 is more powerful than the circa-2001, single-core, single-threaded UltraSPARC III, and that in a chip-to-chip comparison the T1 significantly outperforms other processors on multithreaded integer workloads.

At the time of its release in December 2005, a single-chip, eight-core, 32-thread, 1.2 GHz UltraSPARC T1 server performed similarly to a two-socket, four-core, eight-thread, 1.9 GHz IBM POWER5 server and to a four-socket, eight-core, sixteen-thread, 3.0 GHz Intel Xeon "Paxville MP" server, and exceeded the performance of a four-socket, four-core, four-thread, 1.6 GHz Intel Itanium server. Arguably, this made the UltraSPARC T1 the world's most powerful general-purpose commercial server processor for multithreaded commercial workloads.

Studies by Intel show that even under full load, a typical x86 server CPU is idle 50 to 60% of the time.[verification needed] This is due to cache misses, which all CPU architectures suffer from: the CPU must wait for data to arrive from RAM. That is also why modern CPUs have larger caches, complex prefetch logic, and so on. CPUs belonging to the T1 family are far less affected by this problem. As soon as a T1 thread stalls due to a cache miss, the T1 switches to another thread in the next clock cycle and continues to do work while waiting for the data to return for the stalled thread. On a typical modern CPU, a thread switch takes much longer than one clock cycle. This is why a T1 can work 95% of the time and wait for data only 5% of the time. Compare this to an x86 CPU at 3 GHz: because the x86 CPU can only work at half speed due to cache misses, it can be compared to a 1.5 GHz CPU working at full speed. A single T1 thread, by contrast, is comparable to an Intel Pentium 3 CPU at 1 GHz in terms of computing power.
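The comparison above amounts to scaling a clock rate by the fraction of cycles spent on useful work. The following Python sketch reproduces that arithmetic. The function name is hypothetical, the 50% and 95% figures are the ones quoted above, and the 1.2 GHz T1 clock is taken from the server comparison earlier in this section. It ignores differences in work done per cycle, so it is a rough illustration rather than a performance model.

```python
def effective_clock_ghz(clock_ghz, busy_fraction):
    """Clock rate scaled by the fraction of cycles spent doing useful work."""
    if not 0.0 <= busy_fraction <= 1.0:
        raise ValueError("busy_fraction must be between 0 and 1")
    return clock_ghz * busy_fraction


# Figures quoted in the text; the model itself is an illustrative assumption.
x86_effective = effective_clock_ghz(3.0, 0.50)  # 3 GHz CPU, busy half the time
t1_effective = effective_clock_ghz(1.2, 0.95)   # 1.2 GHz T1, busy 95% of the time

print(f"x86: {x86_effective:.2f} GHz effective")
print(f"T1:  {t1_effective:.2f} GHz effective per core pipeline")
```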

The T1 is slow on single-threaded work but excels on multithreaded work. A common mistake in testing is failing to load the T1 fully. Tests typically use a small data set, 1 GB or so, in which case an x86 CPU easily outperforms the T1. When the machine is heavily loaded with large amounts of data, however, the T1 easily outperforms the x86 CPU, because the x86 CPU stalls while the T1 continues to work. As load increases, the T1's performance degrades an order of magnitude more slowly than that of the x86 CPU. The T1 must therefore be loaded heavily to show its true potential.

Systems

Figure: Sun Fire T1000 server

The T1 processor can be found in the following products from Sun and Fujitsu Computer Systems:

Target market

The UltraSPARC T1 microprocessor is unique in its strengths and weaknesses, and as such is targeted at specific markets. Rather than being used for high-end number-crunching and ultra-high performance applications, the chip is targeted at network-facing high-demand servers, such as high-traffic web servers, and mid-tier Java, ERP, and CRM application servers, which often utilize a large number of separate threads. One of the limitations of the T1 design is that a single floating point unit (FPU) is shared between all 8 cores, making the T1 unsuitable for applications performing a lot of floating point mathematics. However, since the processor's intended markets do not typically make much use of floating-point operations, Sun does not expect this to be a problem. Sun provides a tool for analysing an application's level of parallelism and use of floating point instructions to determine if it is suitable for use on a T1 or T2 platform.[1]

In addition to web and application tier processing, the UltraSPARC T1 may be well suited for smaller database applications which have a large user count. One customer has published results showing that a MySQL application running on an UltraSPARC T1 server ran 13.5 times faster than on an AMD Opteron server.[2]

Virtualization

The T1 is the first SPARC processor that supports the Hyper-Privileged execution mode. The SPARC Hypervisor runs in this mode, and it can partition a T1 system into up to 32 Logical Domains, each of which can run an operating system instance.

Currently, Solaris and Linux are supported, and FreeBSD support is under development.[3]

Software licensing issues

Traditionally, vendors of commercial software suites such as Oracle Database charged their customers based on the number of processors the software runs on. In early 2006, Oracle changed its licensing model by introducing the processor factor. With a processor factor of 0.25 for the T1, an 8-core T2000 requires only a 2-CPU license.[4]

In Q3 2006, IBM introduced the concept of Processor Value Unit (PVU) pricing. Each core of the T1 counts as 30 PVUs instead of the default value of 100 PVUs per core.[5]
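The two licensing schemes reduce to simple per-core arithmetic. The Python sketch below works through the figures quoted above; the function and constant names are hypothetical, and rounding a fractional Oracle result up to a whole license is an assumption not stated in this article.

```python
import math
from fractions import Fraction

T1_ORACLE_FACTOR = Fraction(1, 4)  # processor factor of 0.25 quoted above
T1_IBM_PVU_PER_CORE = 30           # PVUs per T1 core quoted above
DEFAULT_IBM_PVU_PER_CORE = 100     # default PVUs per core quoted above


def oracle_processor_licenses(cores, factor=T1_ORACLE_FACTOR):
    """Cores times processor factor, rounded up (rounding is an assumption)."""
    return math.ceil(cores * factor)


def ibm_pvus(cores, pvu_per_core=T1_IBM_PVU_PER_CORE):
    """Total Processor Value Units for a chip with the given core count."""
    return cores * pvu_per_core


print(oracle_processor_licenses(8))            # 8-core T2000
print(ibm_pvus(8))                             # 8-core T1 at 30 PVUs per core
print(ibm_pvus(8, DEFAULT_IBM_PVU_PER_CORE))   # same core count at the default rate
```

Using `Fraction` keeps the factor exact, so the rounding step is not affected by floating-point error.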

Weaknesses

The T1 is only available in uniprocessor systems, limiting vertical scalability in large enterprise environments; Sun has announced that the follow-on "Victoria Falls" processor will address this.[6]

Application tuning

Leveraging the massive amount of thread-level parallelism (TLP) available on the CoolThreads platform can require different application development techniques than those used for traditional server platforms. Utilizing TLP in applications is key to getting good performance. Sun has published a number of Sun BluePrints to assist application programmers in developing and deploying software on T1 or T2-based CoolThreads servers. The main article, Developing and Tuning Applications on UltraSPARC T1 Chip Multithreading Systems,[7] addresses issues for general application programmers. There is also a BluePrints article on using the Cryptographic Accelerator Units on the T1 and T2 processors.[8]
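One common way to expose thread-level parallelism is to split independent requests across a worker pool sized to the number of hardware threads. The following Python sketch illustrates the idea only and is not taken from the BluePrints articles: `handle_request` and `serve` are hypothetical names, and the sketch assumes the operating system exposes each hardware thread as a logical CPU. In CPython, threads do not run Python bytecode in parallel, so CPU-bound work would normally use `ProcessPoolExecutor` instead; the structure is the same.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id):
    """Hypothetical stand-in for independent per-request work."""
    return request_id * request_id


def serve(requests, workers=None):
    """Run independent requests on a pool sized to the logical CPU count."""
    if workers is None:
        # Assumption: one logical CPU per hardware thread.
        workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves the order of the input requests.
        return list(pool.map(handle_request, requests))


results = serve(range(5))
print(results)
```

The key design point is that the requests share no state, so adding workers increases throughput without adding synchronization.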

Case studies

A wide range of applications were optimized on the CoolThreads platform, including Symantec Brightmail AntiSpam,[9] Oracle's Siebel applications,[10] and the Sun Java System Web Proxy Server.[11] Sun also documented its experience in moving its own online store onto a T2000 server cluster,[12] and has published two articles on web consolidation on CoolThreads using Solaris Containers.[13][14]

Sun has an application performance tuning page for a range of open source applications, including MySQL, PHP, gzip, and ImageMagick.[15] Proper optimization for CoolThreads systems can result in significant gains: when the Sun Studio compiler is used with the recommended optimization settings, MySQL performance improves by 268% compared to using just the -O3 flag.

"Rock"

The UltraSPARC T1 is designed for single-CPU systems only and is not capable of SMP. Future Sun CMT UltraSPARC processors such as Rock will support multi-chip server architectures. The Rock processor targets traditional data-facing workloads such as databases. As such, it is seen as the logical follow-on to Sun's SMP processors such as UltraSPARC IV, rather than a replacement for the UltraSPARC T1 or T2.

Rock also targets floating point workloads, unlike UltraSPARC T1. Sun has publicly disclosed a feature in the Rock processor called hardware scout, which uses multithreaded hardware to perform prefetching.

Rock is the world's first general purpose processor with hardware transactional memory.

UltraSPARC T2

Formerly known by the codename Niagara 2, the follow-on to the UltraSPARC T1 supports eight threads per core, and each core has its own FPU.

UltraSPARC T2 Plus

In February 2007, Sun announced at its annual analyst summit that its third-generation simultaneous multithreading design, code-named Victoria Falls, was taped out in October 2006. A two-socket server (2 RU) will have 128 threads, 16 cores, and a 65× performance improvement over UltraSPARC III.[6]

At the Hot Chips 19 conference, Sun announced that Victoria Falls will be in 2-way and 4-way servers. Thus, a single 4-way SMP server will support 256 concurrent hardware threads.[16]
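The thread counts quoted in this article follow from multiplying sockets, cores per socket, and threads per core. A minimal Python sketch of that arithmetic is shown below; the function name is hypothetical, and the figure of eight cores per Victoria Falls socket is inferred from the 16 cores quoted for the two-socket server.

```python
def hardware_threads(sockets, cores_per_socket, threads_per_core):
    """Total concurrent hardware threads in a server."""
    return sockets * cores_per_socket * threads_per_core


# UltraSPARC T1: one socket, eight cores, four threads per core.
print(hardware_threads(1, 8, 4))
# Victoria Falls, two-socket server: eight threads per core.
print(hardware_threads(2, 8, 8))
# Victoria Falls, four-socket server.
print(hardware_threads(4, 8, 8))
```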

In April 2008, Sun released 2-way UltraSPARC T2 Plus servers, the SPARC Enterprise T5140 and T5240.

In October 2008, Sun released the 4-way UltraSPARC T2 Plus SPARC Enterprise T5440 server.[17]

Niagara 3

In October 2006, Sun disclosed that Niagara 3 will be built with a 45 nm process.[citation needed] According to an article in The Register from June 2008, the processor will have 16 cores with 16 threads each.

Open design

On March 21, 2006, Sun made the UltraSPARC T1 processor design available under the GNU General Public License via the OpenSPARC project. The published information includes:

  • Verilog source code of the UltraSPARC T1 design;
  • Verification suite and simulation models;
  • ISA specification (UltraSPARC Architecture 2005);
  • The Solaris 10 OS simulation images.

References

  1. "cooltst: Cool Threads Selection Tool". Workload Characterization blog. Sun Microsystems. April 6, 2006. http://blogs.sun.com/WCP/entry/cooltst_cool_threads_selection_tool. Retrieved on 2008-05-30.
  2. Thomas Rampelberg; Jason J. W. Williams (2006-05-09). "Cruisin' with a T2k" (PDF). DigiTar. p. 6. http://blogs.digitar.com/media/2/T2000_Experience.pdf. Retrieved on 2007-02-07.
  3. "FreeBSD/sun4v Project". http://www.freebsd.org/platforms/sun4v.html. Retrieved on 2007-04-09.
  4. "Multi-core Processors: Impact On Oracle Processor Licensing" (PDF). Oracle. http://www.oracle.com/corporate/pricing/multicore_faq.pdf. Retrieved on 2007-08-12.
  5. "Processor Value Unit Licensing for Distributed SW". IBM. http://www-142.ibm.com/software/sw-lotus/services/cwepassport.nsf/wdocs/pvu_table_for_customers. Retrieved on 2007-08-11.
  6. Fowler, John (February 6, 2007). "Growth by Design" (PDF). Sun Microsystems. p. 21. http://www.sun.com/events/sas2007/docs/09_fowler_sas_07.pdf. Retrieved on 2007-02-07.
  7. "Developing and Tuning Applications on UltraSPARC T1 Chip Multithreading Systems". Sun BluePrints Online. Sun Microsystems. http://www.sun.com/blueprints/1205/819-5144.pdf. Retrieved on 2008-01-09.
  8. "Using the Cryptographic Accelerators in the UltraSPARC T1 and T2 Processors". Sun BluePrints Online. Sun Microsystems. http://www.sun.com/blueprints/0306/819-5782.pdf. Retrieved on 2008-01-09.
  9. "Tuning Symantec Brightmail AntiSpam on UltraSPARC T1 and T2 Processor-Powered Servers". Sun BluePrints Online. Sun Microsystems. http://www.sun.com/blueprints/1006/820-0132.pdf. Retrieved on 2008-01-09.
  10. "Optimizing Oracle's Siebel Applications on Sun Fire Servers with CoolThreads Technology". Sun BluePrints Online. Sun Microsystems. http://www.sun.com/blueprints/0607/820-2218.pdf. Retrieved on 2008-01-09.
  11. "Sun's High-Performance and Reliable Web Proxy Solution". Sun BluePrints Online. Sun Microsystems. http://www.sun.com/blueprints/0607/820-2142.pdf. Retrieved on 2008-01-09.
  12. "Consolidating the Sun Store onto Sun Fire T2000 Servers". Sun BluePrints Online. Sun Microsystems. October 2007. http://www.sun.com/blueprints/1205/819-5148.pdf. Retrieved on 2008-01-09.
  13. "Deploying Sun Java Enterprise System 2005-Q4 on the Sun Fire T2000 Server Using Solaris Containers". Sun BluePrints Online. Sun Microsystems. http://www.sun.com/blueprints/0806/819-7663.pdf. Retrieved on 2008-01-09.
  14. "Web Consolidation on the Sun Fire T1000 using Solaris Containers". Sun BluePrints Online. Sun Microsystems. http://www.sun.com/blueprints/1205/819-5149.pdf. Retrieved on 2008-01-09.
  15. "Application Performance Tuning". Sun Microsystems. http://wikis.sun.com/display/AppPerfTuning/Application+Performance+Tuning+Home. Retrieved on 2008-01-09.
  16. Phillips, Stephen (August 21, 2007). "Victoria Falls: Scaling Highly-Threaded Processor Cores" (PDF). Sun Microsystems. p. 24. http://www.opensparc.net/pubs/preszo/07/HC19.sphillips.v1.pdf. Retrieved on 2007-08-24.
  17. "Sun and Fujitsu's SPARC Enterprise T5440 Server Redefines Midrange Enterprise Computing with Industry-Leading Price Points, Power Management and Multiple World Record Benchmarks". Sun Microsystems. October 13, 2008. http://www.sun.com/aboutsun/pr/2008-10/sunflash.20081013.1.xml. Retrieved on 2008-10-13.
