Conversations with Oracle Innovators

SPARC T4 Deep Dive With Rick Hetherington

By Diana Reichardt

Rick Hetherington, Oracle’s vice president of hardware development, manages a team of architects and performance analysts who design Oracle’s M- and T-series processors. In this interview, Hetherington describes the technical details of the new SPARC T4 processor and explains why he thinks it is going to be an eye-opener for the industry.

Q: Why are customers interested in the SPARC T4?

A: With the SPARC T4 processor we developed a brand-new core, which we call the S3 core. The aim was to develop a processor core that would provide high single-thread performance while also addressing the needs of applications that benefit from the high efficiency and throughput of multi-threaded cores. This core design meets the wide range of demands common in modern data centers. It's a more traditional out-of-order core that runs at a much higher clock frequency, 3 GHz.

Q: Can you explain what changed in the SPARC T4 design?

A: In 2006-2007, we were looking at a heterogeneous approach, that is, two different types of core designs that together would address a wide range of applications. But we decided that we could meet this demand with a single S3 core design. We call it dynamically threaded, heterogeneous, or even general purpose.

Q: How is the SPARC T4 different from the previous SPARC T-series processors?

A: The previous SPARC T-series processors (SPARC T1 through SPARC T3) used a single-issue core that was really devoted to driving throughput. It had a very short pipeline with no speculation and that allowed us to get really high levels of throughput efficiency. But we sacrificed single-thread performance as a result. Now with this new core we can get the best of both worlds: it's a single pipeline that is dual-issue, supporting one through eight threads.

The SPARC T3 has 16 cores and each core has two independent pipelines that are single instruction issue and in-order. Single issue means that one instruction can be scheduled down the pipeline at a time. In-order means we execute and retire instructions in program order.

In contrast, the SPARC T4 is much more speculative, with a very effective branch predictor, a higher clock rate, and a deeper pipeline, and the S3 core issues two instructions per clock. So, within the issue constraints, we can launch two instructions simultaneously to exploit instruction-level parallelism.

In addition, the SPARC T4 executes instructions out of program order, meaning that if a load instruction misses in the cache, we can continue to execute instructions provided they do not depend on the outstanding load. This is a fairly common processor design technique for extracting memory-level parallelism.
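
To make the effect of a cache miss concrete, here is a minimal sketch of a toy issue model in Python. It is not a model of the S3 core: the instruction names, the 100-cycle miss latency, and the `simulate` function are all assumptions chosen for illustration, and real out-of-order cores add a finite instruction window, register renaming, and much else. The sketch only shows that when a stalled instruction no longer blocks the ones behind it, two independent load misses can overlap.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    name: str
    latency: int        # cycles from issue until the result is available
    deps: tuple = ()    # names of instructions whose results are needed

def simulate(program, out_of_order, issue_width=1):
    """Return the cycle at which the last result becomes available."""
    done = {}                 # name -> cycle when the result is ready
    pending = list(program)   # instructions not yet issued, in program order
    cycle = 0
    while pending:
        issued = 0
        for instr in list(pending):
            if issued == issue_width:
                break
            ready = all(d in done and done[d] <= cycle for d in instr.deps)
            if ready:
                done[instr.name] = cycle + instr.latency
                pending.remove(instr)
                issued += 1
            elif not out_of_order:
                break   # in-order: a stalled instruction blocks those behind it
        cycle += 1
    return max(done.values())

# Hypothetical program: two independent loads that both miss (100 cycles each),
# each followed by one instruction that consumes the loaded value.
program = [
    Instr("load_a", 100),
    Instr("use_a", 1, ("load_a",)),
    Instr("load_b", 100),
    Instr("use_b", 1, ("load_b",)),
]

in_order_cycles = simulate(program, out_of_order=False)
out_of_order_cycles = simulate(program, out_of_order=True)
print(in_order_cycles, out_of_order_cycles)
```

In this toy model the in-order run serializes the two misses, while the out-of-order run issues the second load during the first miss, so the two latencies overlap almost completely.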

Q: And what would you say is similar between the SPARC T3 and SPARC T4?

A: We intended to get the SPARC T4 to market as quickly as possible. So we developed a new core, but we taped it out using the same process technology as the SPARC T3. We are using the same system-on-chip (SOC) components for the T4 that we used on the T3, and that includes the memory controllers, the coherence fabric, the I/O interconnect, and the network interface units. By keeping these components identical, along with the power and thermal demands, we were able to use the same systems and get to market much faster.

Q: The SPARC T3 has 16 cores and the SPARC T4 has 8 cores. How do we derive better performance with fewer cores?

A: From a throughput perspective, the total aggregate throughput of 8 cores in a SPARC T4 is essentially equal to the total aggregate throughput of 16 cores in the SPARC T3. The higher clock rate and improved 'smart' factors result in more compute capability per core. Smart factors include things like a 128 KB private L2 cache per core, out-of-order execution, dual issue, and more aggressive speculation.
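
The per-core implication of that answer can be worked out directly. The short sketch below uses only the figures quoted above (16 cores versus 8 cores, and roughly equal aggregate throughput); the function name and the exact 1.0 ratio are illustrative assumptions, not measured data.

```python
def per_core_speedup(old_cores, new_cores, aggregate_ratio=1.0):
    """Per-core throughput gain, given the new/old aggregate chip throughput."""
    return aggregate_ratio * old_cores / new_cores

# 16 SPARC T3 cores vs. 8 SPARC T4 cores at roughly equal chip throughput
t4_vs_t3 = per_core_speedup(old_cores=16, new_cores=8)
print(f"Each T4 core delivers about {t4_vs_t3:.1f}x the throughput of a T3 core")
```

In other words, matching the chip-level throughput with half the cores implies that each core does roughly twice the work.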

Q: Is there a virtualization capability required to run different types of applications on the SPARC T4?

A: It is not strictly necessary, but in a virtualized cloud environment, having a SPARC processor that can handle any type of application assigned to it is obviously a huge advantage for customers. The SPARC T4 supports Oracle VM for SPARC (previously known as LDOMS) and Oracle Solaris Zones, but so does the SPARC T3. Virtualization addresses utilization; however, the task of exploiting this dynamic threading capability belongs to the Oracle Solaris scheduler. A new feature called critical thread recognizes threads marked with the highest priority and assigns each of them to a core by itself.
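
The placement idea behind the critical thread feature can be sketched with a toy model. The Python below is an illustration only, not the Oracle Solaris scheduler's algorithm: `place_threads`, the thread names, and the round-robin policy for ordinary threads are all hypothetical. It simply gives each thread flagged as critical a core of its own and spreads the remaining threads across the other cores.

```python
def place_threads(threads, num_cores=8):
    """Toy placement: threads is a list of (name, is_critical) pairs.

    Returns {core_index: [thread names]}, with each critical thread alone
    on its own core and the rest shared round-robin across remaining cores.
    """
    critical = [name for name, is_critical in threads if is_critical]
    normal = [name for name, is_critical in threads if not is_critical]
    if len(critical) > num_cores or (normal and len(critical) == num_cores):
        raise ValueError("not enough cores to isolate every critical thread")

    placement = {core: [] for core in range(num_cores)}
    for core, name in enumerate(critical):
        placement[core].append(name)          # exclusive core

    shared = list(range(len(critical), num_cores))
    for i, name in enumerate(normal):
        placement[shared[i % len(shared)]].append(name)
    return placement

# Hypothetical workload: one critical thread plus 14 ordinary workers.
threads = [("critical_thread", True)] + [(f"worker{i}", False) for i in range(14)]
placement = place_threads(threads)
for core, names in placement.items():
    print(core, names)
```

With eight cores, the critical thread gets core 0 to itself and can run at full single-thread speed, while the 14 workers share the remaining seven cores as throughput threads.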

Q: Can we talk about Oracle Solaris 11 and the SPARC T4 now?

A: Oracle Solaris 11 will allow customers to take full advantage of the new features in the SPARC T4. We already talked about the critical thread capability. I should also mention that we enhanced the cryptographic functions on the T4 by embedding accelerators within the core that are driven by a new set of instructions. Oracle Solaris 11 is fully optimized to deliver the very best crypto performance, and it will offer much faster and safer software upgrades as well as faster reboot times. Then there is application mobility, which allows customers to migrate apps between Solaris 11 servers. And finally, there is very low performance overhead with Oracle Solaris Zones. It all adds up to higher performance, improved security, increased availability, and easier management.

Q: Can you provide an example of a single-threaded application that Oracle customers are using today that will benefit from the new SPARC T4?

A: A perfect example is the Packet Publisher feature in Coherence. This single thread writes out requested data to the client. With our collect/analyzer tool, it is easy to identify as the thread that limits throughput. But by assigning it the highest priority recognized by the Oracle Solaris scheduler, we see improvements in overall throughput of up to 20%.

Q: Will current SPARC/Solaris customers be able to run their existing applications on the new SPARC T4-based systems?

A: Yes, all the traditional apps that have been running on SPARC-based systems, and the entire Oracle stack, will run, and run better, on a SPARC T4 system.

Q: Where are we in terms of delivering the T4?

A: We are currently in a production ramp of 1.2 silicon. That means we will ship product with only two revs of the upper metal layers and go into production with the initial base layers. This is really quite extraordinary for a processor of this complexity. We have the best processor and silicon design engineers anywhere. And we are obviously very excited about the new T4 and the next generation of T-series systems based on the S3 core.

Q: Can you comment on how Oracle is executing on its SPARC roadmap?

A: Well, I have been with the "Niagara" family for almost 10 years. We started with just a couple of people back in 2002. And we revolutionized the industry when we first understood the strength of Solaris and SPARC in terms of our chip-multithreaded capabilities. What we were attempting to do all along was increase the thread performance and throughput as we developed the roadmap. And finally, I think we found a way of having the best of both worlds. I think the SPARC T4 really is going to be an eye-opener for the industry.