Using busstat to Monitor Performance Counters for UltraSPARC T2 Plus External Coherency Hub Architecture

Sree Vemuri, July 2009

Architecture of External Coherency Hub (UltraSPARC T2 Plus Crossbar)

This tech tip describes the external coherency hub architecture based on the UltraSPARC T2 Plus Crossbar (XBR), also known as "Zambezi" (see the Zambezi Architecture blog entry). For more information, see the references section below; for example, the white paper Sun SPARC Enterprise T5440 Server Architecture depicts "UST2 Plus XBR" in Figure 10.

This coherency bridge architecture was introduced in the Sun SPARC Enterprise T5440 Server, a quad-socket server that holds up to four UltraSPARC T2 Plus processors. The motherboard uses four "Zambezi" chips (ASICs from Texas Instruments) to connect the processors. The architecture is supported beginning with the Solaris 10 5/08 OS.

With busstat, you can monitor the performance counters for the chips used in the UltraSPARC T2 Plus XBR architecture. The performance counter registers for the Zambezi ASICs are provided in three areas:

  • Link port unit (LPU)
  • General purpose design block (GPD)
  • Address serialization unit (ASU)

LPU Performance Instrumentation

The LPU is responsible for receiving and sending messages over the snoop link for a given port and contains the link framing unit (LFU) and input port sub-blocks.

The events counted by the LPU performance register are as follows:

  • 0x00 = None
  • 0x01 = Clock cycles
  • 0x02 = Cycles in which c2c data was received from Port X
  • 0x03 = Cycles in which memory data was received from Port X
  • 0x04 = Cycles in which WB data was received from Port X
  • 0x05 = Cycles in which NC (non-CSR) data was received from Port X
  • 0x06 = Cycles in which c2c data was received from Port Y
  • 0x07 = Cycles in which memory data was received from Port Y
  • 0x08 = Cycles in which WB data was received from Port Y
  • 0x09 = Cycles in which NC (non-CSR) data was received from Port Y
  • 0x0A = Cycles in which c2c data was received from Port Z
  • 0x0B = Cycles in which memory data was received from Port Z
  • 0x0C = Cycles in which WB data was received from Port Z
  • 0x0D = Cycles in which NC (non-CSR) data was received from Port Z
  • 0x0E = Cycles in which a TID for a WB was retired
  • 0x0F = Cycles in which a TID for an INV was retired
  • 0x10 = Cycles in which a TID for an RTD was retired
  • 0x11 = Cycles in which a TID for an RTO was retired
  • 0x12 = Cycles in which a TID for an RTS was retired
  • 0x13 = Cycles in which an IO_WRM egress message was sent
  • 0x14 = Cycles in which an IO_RD egress message was sent
  • 0x15 = Cycles in which a WB egress message was sent
  • 0x16 = Cycles in which an INV egress message was sent
  • 0x17 = Cycles in which an RTO egress message was sent
  • 0x18 = Cycles in which an RTD egress message was sent
  • 0x19 = Cycles in which an RTS egress message was sent
  • 0x1A = Cycles in which there were no WB credits available
  • 0x1B = Cycles in which there were no read/inv credits available
  • 0x1C = Cycles in which a Cache Hit snoop response was received
  • 0x1D = Cycles in which a Cache Miss snoop response was received
  • 0x1E = Cycles in which an NDR response was received
  • 0x1F = Cycles in which a WB_ACK response was received
  • 0x20 = Cycles in which a READ/INV type snoop response was received
  • 0x21 = Cycles in which a MISS snoop response was received
  • 0x22 = Cycles in which a WB_HIT snoop response was received
  • 0x23 = Cycles in which a HIT_S snoop response was received
  • 0x24 = Cycles in which a HIT_O snoop response was received
  • 0x25 = Cycles in which a HIT_M snoop response was received
  • 0x26 = Count of the number of CRC errors
  • 0x27 = Count of the number of replays sent
  • 0x28 = Count of the number of replays received
  • 0x29 = Count of the number of link retrainings
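
The data-received events 0x02 through 0x0D in the list above follow a regular layout: three ports (X, Y, Z), each with four data types in the same order. As a minimal sketch, the following Python helper decodes such an event code into its port and data type. The function name and the returned tuple are illustrative assumptions and are not part of busstat.

```python
# Hypothetical helper (not part of busstat): decode the LPU
# "data received" event codes 0x02-0x0D listed above.
PORTS = ("X", "Y", "Z")
DATA_TYPES = ("c2c", "memory", "WB", "NC (non-CSR)")

FIRST_CODE = 0x02
LAST_CODE = 0x0D


def decode_lpu_data_event(code):
    """Return (port, data_type) for an LPU data-received event code."""
    if not FIRST_CODE <= code <= LAST_CODE:
        raise ValueError(f"0x{code:02X} is not a data-received event")
    offset = code - FIRST_CODE
    # Four consecutive codes per port, in the order c2c, memory, WB, NC.
    port = PORTS[offset // len(DATA_TYPES)]
    data_type = DATA_TYPES[offset % len(DATA_TYPES)]
    return port, data_type


if __name__ == "__main__":
    for code in range(FIRST_CODE, LAST_CODE + 1):
        port, kind = decode_lpu_data_event(code)
        print(f"0x{code:02X}: {kind} data received from Port {port}")
```

Codes outside the 0x02-0x0D range (TID retirement, egress messages, credits, snoop responses, and link error counts) do not follow this pattern, so the helper rejects them rather than guessing.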

GPD Performance Instrumentation

The general purpose design block of Zambezi includes the configuration and status controller, low pin count interface controller, Joint Test Action Group (JTAG) controller, debug port, and other miscellaneous functions.

The events counted by the GPD performance register are as follows:

  • 0x00 = None
  • 0x01 = Clock cycles (that is, duration count)

ASU Performance Instrumentation

The address serialization unit in Zambezi is designed to ensure that at most one request is outstanding for a given address.

The events counted by the ASU performance register are as follows:

  • 0x00 = None
  • 0x01 = Clock cycles (that is, duration count)
  • 0x02 = ASU incoming cacheable request packet count
  • 0x03 = ASU FR_ACK count (that is, outgoing cacheable request packet count)
  • 0x04 = ASU pending transaction (that is, CAM hit) count
  • 0x05 = ASU wakeup request transaction dequeue count
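
Because event 0x01 is a duration count, one way to compare samples taken over different intervals is to divide another event's count by the clock-cycle count gathered over the same interval. The sketch below assumes that you have already collected both counts yourself and that they cover the same period. The function name and the sample numbers are made up for illustration and do not come from a real system.

```python
# Hypothetical helper: express an event count relative to elapsed clock cycles.
def events_per_cycle(event_count, clock_cycles):
    """Return event_count divided by clock_cycles (both from the same interval)."""
    if clock_cycles <= 0:
        raise ValueError("clock_cycles must be positive")
    return event_count / clock_cycles


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    incoming_requests = 250_000   # event 0x02: incoming cacheable request packets
    clock_cycles = 1_000_000      # event 0x01: clock cycles (duration count)
    rate = events_per_cycle(incoming_requests, clock_cycles)
    print(f"{rate:.3f} incoming cacheable requests per cycle")
```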

Extending from each UltraSPARC T2 Plus processor are four independent coherence planes. There are four Zambezi hubs in the system, each handling a single coherence plane. Each Zambezi ASIC is connected to each of the four UltraSPARC T2 Plus processors over four separate point-to-point serial coherence links. Because planes are independent, there are no connections between the Zambezi chips. Each Zambezi provides four LPUs, one GPD, and one ASU. LPU0-3, GPD0, and ASU0 belong to Zambezi0; LPU4-7, GPD1, and ASU1 belong to Zambezi1; and so on.
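
The numbering described above (four LPUs, one GPD, and one ASU per Zambezi) can be sketched as a small Python mapping from a unit instance to the hub that owns it. The function names are hypothetical, and the sketch assumes the numbering continues in the same pattern for Zambezi2 and Zambezi3.

```python
# Hypothetical helpers illustrating the unit numbering described above.
LPUS_PER_ZAMBEZI = 4  # each Zambezi provides four LPUs, one GPD, and one ASU


def zambezi_for_unit(kind, instance):
    """Return the Zambezi hub number that owns the given unit instance."""
    if instance < 0:
        raise ValueError("instance must be non-negative")
    kind = kind.upper()
    if kind == "LPU":
        # LPU0-3 -> Zambezi0, LPU4-7 -> Zambezi1, and so on.
        return instance // LPUS_PER_ZAMBEZI
    if kind in ("GPD", "ASU"):
        # One GPD and one ASU per hub, numbered like the hub itself.
        return instance
    raise ValueError(f"unknown unit kind: {kind}")


def unit_counts(num_zambezi):
    """Return how many units of each kind a system with num_zambezi hubs has."""
    return {
        "LPU": LPUS_PER_ZAMBEZI * num_zambezi,
        "GPD": num_zambezi,
        "ASU": num_zambezi,
    }


if __name__ == "__main__":
    print(zambezi_for_unit("LPU", 5))
    print(unit_counts(4))
```

For a system with four Zambezi hubs, `unit_counts(4)` yields 16 LPUs, 4 GPDs, and 4 ASUs.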

Usage Example

On a four-way Sun SPARC Enterprise T5440 Server, busstat lists 16 LPU, 4 GPD, and 4 ASU Zambezi performance counters.

For More Information

Here are some additional resources:

