busstat to Monitor Performance Counters for UltraSPARC T2 Plus External Coherency Hub Architecture
Sree Vemuri, July 2009
This tech tip discusses the external coherency hub architecture based on UltraSPARC T2 Plus Crossbar, also known as "Zambezi" (see the Zambezi Architecture blog entry). For more information, see the references section below (for example, the white paper Sun SPARC Enterprise T5440 Server Architecture depicts "UST2 Plus XBR" in Figure 10).
This coherency bridge architecture was introduced in the Sun SPARC Enterprise T5440 Server, a quad-socket server that supports up to four UltraSPARC T2 Plus processors. The motherboard uses four "Zambezi" chips (ASICs from Texas Instruments) to connect the processors. This architecture is available beginning with the Solaris 10 5/08 OS.
With busstat, you can monitor the performance counters for the chips used in the UltraSPARC T2 Plus XBR architecture. The performance counter registers for the Zambezi ASICs are provided in three areas: the LPU, the general purpose design (GPD) block, and the address serialization unit (ASU).
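As a rough illustration of how a busstat invocation for one of these counters might be assembled, the following Python sketch builds the argument list for the -w option, which takes a device instance followed by picN=event pairs and an optional interval and count. The device name "lpu0" and the event names used here are hypothetical placeholders, not names confirmed by this article; on a real system, busstat -l lists the available devices and busstat -e lists the events each one supports.

```python
def busstat_watch_args(device, events, interval=None, count=None):
    """Build an argv list for `busstat -w`, assigning one event per pic in order."""
    spec = device
    for pic, event in enumerate(events):
        spec += f",pic{pic}={event}"
    args = ["busstat", "-w", spec]
    if interval is not None:
        args.append(str(interval))
        # A count is only meaningful when an interval is also given.
        if count is not None:
            args.append(str(count))
    return args


# Hypothetical device and event names, for illustration only.
example = busstat_watch_args("lpu0", ["clock_cycles", "crc_errors"], interval=10, count=3)
print(" ".join(example))
```

The sketch only formats the command line; it does not run busstat or check that the device supports the requested events.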
The LPU is responsible for receiving and sending messages over the snoop link for a given port and contains the link framing unit (LFU) and input port sub-blocks.
The events counted by the LPU performance register are as follows:
0x00 = None
0x01 = Clock cycles
0x02 = Cycles in which c2c data was received from Port X
0x03 = Cycles in which memory data was received from Port X
0x04 = Cycles in which WB data was received from Port X
0x05 = Cycles in which NC (non-CSR) data was received from Port X
0x06 = Cycles in which c2c data was received from Port Y
0x07 = Cycles in which memory data was received from Port Y
0x08 = Cycles in which WB data was received from Port Y
0x09 = Cycles in which NC (non-CSR) data was received from Port Y
0x0A = Cycles in which c2c data was received from Port Z
0x0B = Cycles in which memory data was received from Port Z
0x0C = Cycles in which WB data was received from Port Z
0x0D = Cycles in which NC (non-CSR) data was received from Port Z
0x0E = Cycles in which a TID for a WB was retired
0x0F = Cycles in which a TID for an INV was retired
0x10 = Cycles in which a TID for an RTD was retired
0x11 = Cycles in which a TID for an RTO was retired
0x12 = Cycles in which a TID for an RTS was retired
0x13 = Cycles in which an IO_WRM egress message was sent
0x14 = Cycles in which an IO_RD egress message was sent
0x15 = Cycles in which a WB egress message was sent
0x16 = Cycles in which an INV egress message was sent
0x17 = Cycles in which an RTO egress message was sent
0x18 = Cycles in which an RTD egress message was sent
0x19 = Cycles in which an RTS egress message was sent
0x1A = Cycles in which there were no WB credits available
0x1B = Cycles in which there were no read/inv credits available
0x1C = Cycles in which a Cache Hit snoop response was received
0x1D = Cycles in which a Cache Miss snoop response was received
0x1E = Cycles in which an NDR response was received
0x1F = Cycles in which a WB_ACK response was received
0x20 = Cycles in which a READ/INV type snoop response was received
0x21 = Cycles in which a MISS snoop response was received
0x22 = Cycles in which a WB_HIT snoop response was received
0x23 = Cycles in which a HIT_S snoop response was received
0x24 = Cycles in which a HIT_O snoop response was received
0x25 = Cycles in which a HIT_M snoop response was received
0x26 = Count of the number of CRC errors
0x27 = Count of the number of replays sent
0x28 = Count of the number of replays received
0x29 = Count of the number of link retrainings
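Event codes 0x02 through 0x0D in the list above follow a regular pattern: three ports (X, Y, Z), each with four data types (c2c, memory, WB, NC) in the same order. The following Python sketch encodes that pattern. The function and constant names are illustrative only and are not part of busstat or any Solaris interface.

```python
PORTS = ("X", "Y", "Z")
DATA_KINDS = ("c2c", "memory", "WB", "NC")
FIRST_PORT_DATA_EVENT = 0x02  # "c2c data was received from Port X"


def lpu_port_data_event(port, kind):
    """Return the LPU event code for data of the given kind received from a port."""
    return (FIRST_PORT_DATA_EVENT
            + len(DATA_KINDS) * PORTS.index(port)
            + DATA_KINDS.index(kind))


def describe_port_data_event(code):
    """Map an event code in the range 0x02..0x0D back to (port, data kind)."""
    offset = code - FIRST_PORT_DATA_EVENT
    if not 0 <= offset < len(PORTS) * len(DATA_KINDS):
        raise ValueError(f"0x{code:02X} is not a port data event")
    port_index, kind_index = divmod(offset, len(DATA_KINDS))
    return PORTS[port_index], DATA_KINDS[kind_index]


print(hex(lpu_port_data_event("Y", "WB")))
print(describe_port_data_event(0x0B))
```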
The general purpose design (GPD) block of Zambezi includes the configuration and status controller, low pin count interface controller, Joint Test Action Group (JTAG) controller, debug port, and other miscellaneous functions.
The events counted by the GPD performance register are as follows:
0x00 = None
0x01 = Clock cycles (that is, duration count)
The address serialization unit (ASU) in Zambezi is designed to ensure that at most one request is outstanding for a given address.
The events counted by the ASU performance register are as follows:
0x00 = None
0x01 = Clock cycles (that is, duration count)
0x02 = ASU incoming cacheable request packet count
0x03 = ASU FR_ACK count (that is, outgoing cacheable request packet count)
0x04 = ASU pending transaction (that is, CAM hit) count
0x05 = ASU wakeup request transaction dequeue count
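Because event 0x01 is a duration count, one plausible use is to normalize another ASU counter by the clock cycles counted over the same interval. The Python sketch below holds the ASU event codes from the list above in a lookup table and computes such a rate. The helper name and the sample numbers are made up for illustration, and treating events per cycle as a useful metric is an assumption rather than guidance from this article.

```python
ASU_EVENTS = {
    0x00: "None",
    0x01: "Clock cycles (duration count)",
    0x02: "Incoming cacheable request packets",
    0x03: "FR_ACK (outgoing cacheable request packets)",
    0x04: "Pending transactions (CAM hits)",
    0x05: "Wakeup request transaction dequeues",
}


def events_per_cycle(event_count, cycle_count):
    """Normalize an event count by the clock cycles counted over the same interval."""
    if cycle_count <= 0:
        raise ValueError("cycle_count must be positive")
    return event_count / cycle_count


# Made-up sample: 500 CAM hits observed over 10,000 clock cycles.
rate = events_per_cycle(500, 10_000)
print(f"{ASU_EVENTS[0x04]}: {rate:.4f} per cycle")
```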
Extending from each UltraSPARC T2 Plus processor are four independent coherence planes. There are four Zambezi hubs in the system, each handling a single coherence plane. Each Zambezi ASIC is connected to each of the four UltraSPARC T2 Plus processors over four separate point-to-point serial coherence links. Because planes are independent, there are no connections between the Zambezi chips. Each Zambezi provides four LPUs, one GPD, and one ASU. LPU0-3, GPD0, and ASU0 belong to Zambezi0; LPU4-7, GPD1, and ASU1 belong to Zambezi1; and so on.
On a four-way Sun SPARC Enterprise T5440 Server, busstat lists 16 LPU, 4 GPD, and 4 ASU Zambezi performance counters.
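The numbering scheme described above (LPU0-3, GPD0, and ASU0 on Zambezi0, and so on) can be sketched in Python as follows. The function names are illustrative, and the sketch assumes the four-way configuration; it says nothing about which processor a given LPU connects to.

```python
ZAMBEZI_COUNT = 4      # one hub per coherence plane on a four-way T5440
LPUS_PER_ZAMBEZI = 4   # each Zambezi provides four LPUs, one GPD, and one ASU


def zambezi_units(zambezi):
    """Return the LPU, GPD, and ASU instance numbers that belong to one Zambezi."""
    if not 0 <= zambezi < ZAMBEZI_COUNT:
        raise ValueError("no such Zambezi")
    first_lpu = zambezi * LPUS_PER_ZAMBEZI
    return {
        "lpu": list(range(first_lpu, first_lpu + LPUS_PER_ZAMBEZI)),
        "gpd": zambezi,
        "asu": zambezi,
    }


def zambezi_for_lpu(lpu):
    """Return the Zambezi index that owns a given LPU instance number."""
    if not 0 <= lpu < ZAMBEZI_COUNT * LPUS_PER_ZAMBEZI:
        raise ValueError("no such LPU")
    return lpu // LPUS_PER_ZAMBEZI


# Totals across all hubs: 16 LPU, 4 GPD, and 4 ASU counters.
totals = {
    "lpu": ZAMBEZI_COUNT * LPUS_PER_ZAMBEZI,
    "gpd": ZAMBEZI_COUNT,
    "asu": ZAMBEZI_COUNT,
}

print(zambezi_units(1))
print(zambezi_for_lpu(7))
print(totals)
```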
Here are some additional resources: