by Bruce Evans, Julia Harper, and Terry Whatley, March 2012
SPARC T-Series systems have power-saving features designed into the hardware and software. These features allow you to reduce server power consumption, which leads to a cost reduction for environmental cooling and reduced power usage by other infrastructure components. The SPARC T-Series power management (PM) interfaces make it easy to manage these PM features.
This article covers the following topics:
This section describes power management policies, power capping, and device power management in Oracle Solaris 10.
There are two power management policies: performance and elastic. When the performance policy is enabled, all hardware power states are set to full power (unless power capping is enabled, as described in the next section). When the elastic policy is enabled, hardware power states are selected based on system utilization.
System power consumption can be reduced by tens to hundreds of watts depending on the system configuration. For example, on a SPARC T4-4 server with 256 gigabytes of memory, we measured a savings of 200 watts (17% of full power).
Use the performance policy when:
Use the elastic policy when:
You can set a power consumption limit for the system.
Use a power cap when:
Power capping works with either the performance policy or the elastic policy.
In Oracle Solaris 10, device power management allows you to configure when to apply low power states to idle devices.
Note: This feature is no longer available in Oracle Solaris 11.
Use device power management when:
Power saved by using device power management is in addition to the power savings from using the elastic policy or power capping.
Below is a summary of the interfaces that allow you to enable the power management features you want. See the appendix for more details on how to access and configure each interface.
The PM policy is managed in ILOM under the /SP/powermgmt target. There are several ways to view or change the policy.
Log in to the ILOM SP as root. Show and set the current policy, as follows. (The ILOM prompt is ->.)
-> show /SP/powermgmt policy
-> set /SP/powermgmt policy=elastic
-> set /SP/powermgmt policy=performance
This section shows how to set a PM policy using the ILOM BUI.
In a web browser, go to https://<SP-IP-address>, and log in as user root.
Figure 1. ILOM BUI
On a management system with network access to the ILOM SP, use the snmpget and snmpset commands shown below to read and set the PM policy using the SNMP MIB called SUN-HW-CTRL-MIB.
snmpget -v2c -cprivate <SP-IP-address> sunHwCtrlPowerMgmtPolicy.0
snmpset -v2c -cprivate <SP-IP-address> sunHwCtrlPowerMgmtPolicy.0 = 3
snmpset -v2c -cprivate <SP-IP-address> sunHwCtrlPowerMgmtPolicy.0 = 4
On a management system with network access to the ILOM SP, use the ipmitool commands shown below to read and set the PM policy. This requires version 1.8.9 or later of ipmitool. Provide the root password for the SP when prompted by the tool.
ipmitool -I lan -H <SP-IP-address> -U root sunoem cli "show /SP/powermgmt"
ipmitool -I lan -H <SP-IP-address> -U root sunoem cli "set /SP/powermgmt policy=performance"
ipmitool -I lan -H <SP-IP-address> -U root sunoem cli "set /SP/powermgmt policy=elastic"

Figure 2. Oracle Enterprise Manager Ops Center
The power cap is managed in ILOM under the /SP/powermgmt/budget target. There are several ways to view or change the budget. See the appendix for details about additional properties for advanced control.
First, log in to the SP ILOM as root. Then use the following commands.
-> show /SP/powermgmt/budget
-> show /SP/powermgmt actual_power
-> set /SP/powermgmt/budget pendingpowerlimit=400
-> set /SP/powermgmt/budget commitpending=true
-> set /SP/powermgmt/budget activation_state=enabled
This section shows how to set a power cap budget using the ILOM BUI.
In a web browser, go to https://<SP-IP-address>, and log in.
Figure 3. ILOM BUI
On a management system with network access to the ILOM SP, use the snmpget and snmpset commands shown below to read and set the power cap using the SNMP MIB called SUN-HW-CTRL-MIB.
snmpget -v2c -cprivate <SP-IP-address> sunHwCtrlPowerMgmtBudget.0
snmpget -v2c -cprivate <SP-IP-address> sunHwCtrlPowerMgmtBudgetPowerlimit.0
snmpset -v2c -cprivate <SP-IP-address> sunHwCtrlPowerMgmtBudgetPendingPowerlimit.0 = 500
snmpset -v2c -cprivate <SP-IP-address> sunHwCtrlPowerMgmtBudgetCommitPending.0 = true
snmpset -v2c -cprivate <SP-IP-address> sunHwCtrlPowerMgmtBudget.0 = enabled
On a management system with network access to the ILOM SP, use the ipmitool commands shown below to read and set the power cap. Version 1.8.9 or later of ipmitool supports the sunoem cli command. Provide the root password for the SP when prompted by the tool.
ipmitool -I lan -H <SP-IP-address> -U root sunoem cli "show /SP/powermgmt/budget"
ipmitool -I lan -H <SP-IP-address> -U root sunoem cli "set /SP/powermgmt/budget pendingpowerlimit=400"
ipmitool -I lan -H <SP-IP-address> -U root sunoem cli "set /SP/powermgmt/budget commitpending=true"
ipmitool -I lan -H <SP-IP-address> -U root sunoem cli "set /SP/powermgmt/budget activation_state=enabled"
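The three set commands above (stage a pending limit, commit it, then enable the budget) lend themselves to scripting. Below is a minimal Python sketch of how the command lines could be assembled. It is not part of the original article: the function name power_cap_commands and the example address are hypothetical, and the sketch only builds the argument lists. Running them would still require supplying the SP root password when ipmitool prompts for it.

```python
def power_cap_commands(sp_address, watts, user="root"):
    """Build the ipmitool invocations that stage, commit, and enable a power cap.

    Returns a list of argument lists, in the order they should be run.
    """
    if watts <= 0:
        raise ValueError("power limit must be a positive number of watts")

    # Common prefix taken from the commands shown in this article.
    base = ["ipmitool", "-I", "lan", "-H", sp_address, "-U", user, "sunoem", "cli"]

    steps = [
        f"set /SP/powermgmt/budget pendingpowerlimit={watts}",  # stage the new limit
        "set /SP/powermgmt/budget commitpending=true",          # apply the staged value
        "set /SP/powermgmt/budget activation_state=enabled",    # turn capping on
    ]
    return [base + [step] for step in steps]


# Example with a placeholder address (192.0.2.10 is a documentation-only IP).
for argv in power_cap_commands("192.0.2.10", 400):
    print(" ".join(argv[:-1]), f'"{argv[-1]}"')
```

Each returned list can be passed to subprocess.run as-is, which avoids shell quoting problems with the quoted sunoem cli argument.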
Device PM is managed via the host CLI, using the pmconfig(1M) command and the /etc/power.conf(4) file. See the appendix for more details on how to configure device power management.
Edit the /etc/power.conf file. See power.conf(4) for detailed information.
autopm {enable | disable}
system-threshold {always-on | <idle_time>}
device-thresholds <physical_path1> {<idle_time> | always-on}
...
device-thresholds <physical_pathx> {<idle_time> | always-on}
cpu-threshold <idle_time>
Apply the settings in power.conf with this command:
pmconfig
Below are some examples of how to take advantage of the SPARC T-Series power management features.
Enable the elastic policy every evening at 5 p.m. and disable it at 7 a.m. on weekdays. Leave the elastic policy enabled during the weekend. This is accomplished with a ksh script and crontab for the time scheduling. The script uses the SNMP interface to communicate with the ILOM SP to either get or set the PM policy for one or more systems.
Here is a link to the policy.ksh script.
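The scheduling rule in this example is simple enough to express as a single function. The following Python sketch is an illustration only and is not the policy.ksh script. The function name desired_policy is hypothetical, and the sketch assumes that "disable the elastic policy" means switching to the performance policy, since those are the only two policies described in this article.

```python
from datetime import datetime


def desired_policy(now):
    """Return the PM policy that the schedule in this example calls for.

    Performance is used on weekdays from 7 a.m. up to (not including) 5 p.m.
    Elastic is used at all other times, including the whole weekend.
    """
    is_weekday = now.weekday() < 5          # Monday is 0, Sunday is 6
    in_daytime_window = 7 <= now.hour < 17  # 7:00 through 16:59
    return "performance" if is_weekday and in_daytime_window else "elastic"


if __name__ == "__main__":
    print(desired_policy(datetime.now()))
```

A cron job could call such a function at 7 a.m. and 5 p.m. and then pass the result to one of the interfaces described earlier.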
Enable power capping every weekday from 3 p.m. to 7 p.m. during the summer months of June, July, and August because electricity rates go up for high-demand hours. This is accomplished with a Perl script and crontab for the time scheduling. The script uses the SNMP interface to communicate with the ILOM SP to either configure and enable or disable the power cap for one or more systems.
Here is a link to the pwrcap.pl script.
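The time window for this example can be checked the same way. The sketch below is written in Python for illustration and is not the pwrcap.pl script. The function name power_cap_wanted is hypothetical, and the window boundaries are taken directly from the description above.

```python
from datetime import datetime

SUMMER_MONTHS = {6, 7, 8}  # June, July, August


def power_cap_wanted(now):
    """Return True when the schedule in this example calls for a power cap.

    The cap applies on weekdays from 3 p.m. up to (not including) 7 p.m.
    during June, July, and August.
    """
    in_summer = now.month in SUMMER_MONTHS
    is_weekday = now.weekday() < 5       # Monday is 0, Sunday is 6
    in_peak_hours = 15 <= now.hour < 19  # 15:00 through 18:59
    return in_summer and is_weekday and in_peak_hours


if __name__ == "__main__":
    print("enable cap" if power_cap_wanted(datetime.now()) else "disable cap")
```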
When the power management policy is set to the performance policy, enable device power management, so that at least some type of power management is being done.
/etc/power.conf must be edited to specify the devices that are to be managed and their thresholds.
Once power.conf is edited to specify the devices, the only change that needs to be made to the file is to turn device power management on or off.
All these tasks are accomplished with a Python script and crontab for the time scheduling. The script will enable or disable autopm in the /etc/power.conf file.
Here is a link to the devicepm.py script.
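To give an idea of what toggling autopm involves, here is a minimal Python sketch. It is not the devicepm.py script referenced above. The function name set_autopm is hypothetical, and the sketch assumes the autopm entry appears on a line of its own, as in the power.conf syntax shown earlier. It works on the file contents as a string, so it can be tried without touching /etc/power.conf.

```python
import re

# Matches an uncommented "autopm <value>" line. Only spaces and tabs are
# allowed around the keyword, so the match never spans more than one line.
_AUTOPM_LINE = re.compile(r"^([ \t]*autopm[ \t]+)\S+(.*)$", re.MULTILINE)


def set_autopm(conf_text, enable):
    """Return conf_text with autopm set to 'enable' or 'disable'.

    If no autopm entry exists, one is appended at the end.
    """
    value = "enable" if enable else "disable"
    new_text, count = _AUTOPM_LINE.subn(
        lambda m: m.group(1) + value + m.group(2), conf_text
    )
    if count == 0:
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += f"autopm {value}\n"
    return new_text


sample = "system-threshold always-on\nautopm disable\n"
print(set_autopm(sample, True), end="")
```

After the modified contents are written back to /etc/power.conf, pmconfig still has to be run for the change to take effect.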
This section provides a glossary, describes various hardware states and power management software, and provides an example of power consumption for the performance policy versus the elastic policy.
| Term | Description |
|---|---|
| Chip multithreading (CMT) | A hardware technology that uses chip multiprocessing and hardware multithreading to allow simultaneous execution of multiple software threads on a processor. |
| CKE (ClocK Enable) | A HW clock signal that HW components use for timing and operation. |
| Core | A HW unit on a CPU chip that contains strands. |
| HV (Hypervisor) | The hyper-privileged firmware in sun4v. |
| ILOM (Integrated Lights Out Manager) | The firmware on the SP. |
| Management Information Base (MIB) | A collection of objects that describe an SNMP manageable entity. |
| PPFE (Pre-charge Power-down Fast Exit) | A memory idle state mode with short latency to exit the idle state. |
| PPSE (Pre-charge Power-down Slow Exit) | A memory idle state mode with long latency to exit the idle state. |
| SNMP (Simple Network Management Protocol) | An internet standard protocol for managing devices on IP networks. |
| SP (Service Processor) | A small circuit board-based computer that includes both hardware and software to help control a larger computer system. |
| Strand | An execution unit within a core on the CPU chip. |
There are several hardware features that can be applied to put components into lower power states:
Power management is provided by Oracle VM Server for SPARC, which makes use of the hardware power states to manage power. Below is a description of the behavior of Oracle VM Server for SPARC 2.2.
When elastic policy is enabled, Oracle VM Server for SPARC uses the Clock Cycle Skip feature to keep the CPU utilization of each guest within a target utilization range. Oracle VM Server for SPARC adjusts the cycle skip ratio to a level sufficient to address the utilization needs of all guests sharing the CPU chip. Oracle VM Server for SPARC also disables cores that have no strands assigned to guests.
When elastic policy is enabled, Oracle VM Server for SPARC determines when to apply PPSE or PPFE to groups of DIMMs based on the utilization of those DIMMs. If utilization falls below the target range, Oracle VM Server for SPARC puts the DIMMs in PPSE mode. If utilization rises above the range, the DIMMs are put into PPFE mode. The same mode (PPFE or PPSE) must be applied to all DIMMs whose memory addresses are interleaved together. On a SPARC T4, there are two memory interleaves:
The memory interleave is determined by the power management policy in effect when the system is booted from a powered-off state. The elastic policy causes DIMMs to be interleaved into two groups per CPU, whereas the performance policy interleaves all DIMMs under a CPU into one group. Utilization is less likely to fall below the target in the larger DIMM group, so memory power management will tend to be applied less often.
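The mode selection rule described above can be written as a small decision function. The Python sketch below is a simplified model and not the Oracle VM Server for SPARC implementation. The function name next_group_mode is hypothetical, and the target range bounds used in the example are made-up values, because this article does not state the actual range.

```python
def next_group_mode(current_mode, utilization, target_low, target_high):
    """Pick the idle mode for one interleave group of DIMMs.

    The same mode applies to every DIMM in the group, so the decision is
    made once per group from the group's utilization (0.0 to 1.0).
    """
    if utilization < target_low:
        return "PPSE"    # below the range: slow-exit mode
    if utilization > target_high:
        return "PPFE"    # above the range: fast-exit mode
    return current_mode  # inside the range: leave the mode unchanged


# Hypothetical target range of 20% to 60% utilization.
print(next_group_mode("PPFE", 0.05, 0.20, 0.60))
```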
Oracle VM Server for SPARC uses the Clock Cycle Skip feature to achieve a power cap. When power capping is enabled, the service processor monitors the system power consumption and tells Oracle VM Server for SPARC whether it must reduce power or it is allowed to increase power. Based on this notification, Oracle VM Server for SPARC either increases or decreases the cycle skip level of the CPU chips in the system. When a clock cycle skip level is selected by power capping, CPU power management cannot set a higher level for either the performance or the elastic policy.
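The feedback loop described above can be pictured with a toy model. The Python sketch below only illustrates the idea and is not how the service processor or Oracle VM Server for SPARC is implemented. The function name next_skip_level, the one-step adjustment, and the number of skip levels are all assumptions, because this article does not document them.

```python
def next_skip_level(level, measured_watts, limit_watts, max_level):
    """Return the cycle skip level for the next control interval.

    A higher level means more clock cycles are skipped, so less power is used.
    """
    if measured_watts > limit_watts:
        # Over the limit: the system must reduce power.
        return min(level + 1, max_level)
    # At or under the limit: the system is allowed to increase power.
    return max(level - 1, 0)


# Walk through a few hypothetical power readings against a 500 watt limit.
level = 0
for watts in (520, 510, 495, 480):
    level = next_skip_level(level, watts, 500, max_level=4)
    print(watts, level)
```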
We measured a SPARC T4-4 system with 256 gigabytes of memory, comparing the power consumption and behavior of the performance policy to the elastic policy. Figure 4 shows the results as a data warehousing workload is started and stopped on the system.
/SYS/VPS.
Figure 4. Performance Policy Versus Elastic Policy
Figure 4 shows that when the performance policy is enabled, this system consumes about 1160 watts of power when idle. When the elastic policy is enabled, about 200 watts, or 17% of the system power, can be saved. Figure 4 also shows that when the workload is run while the elastic policy is enabled, the system quickly adjusts the power states of the hardware components to provide full performance. No watts are saved with the elastic policy while the workload runs, because it takes advantage of the full performance provided.
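The percentage quoted above follows directly from the two measured values. The short Python sketch below reproduces the arithmetic; the function name savings_percent is hypothetical, and the wattages are the approximate figures from this measurement.

```python
def savings_percent(full_watts, saved_watts):
    """Return the saved power as a percentage of full power."""
    return 100.0 * saved_watts / full_watts


idle_performance_watts = 1160  # idle power with the performance policy
saved_watts = 200              # reduction observed with the elastic policy

percent = savings_percent(idle_performance_watts, saved_watts)
idle_elastic_watts = idle_performance_watts - saved_watts
print(f"{percent:.1f}% saved, about {idle_elastic_watts} W at idle with elastic")
```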
This appendix provides details about how to access and configure each interface.
This section shows the policy and power cap ILOM properties and guidelines for setting the power cap properties.
Show the /SP/powermgmt target.
-> show /SP/powermgmt
/SP/powermgmt
Targets:
budget
powerconf
Properties:
actual_power = 272
permitted_power = 1045
allocated_power = 1045
available_power = 1200
threshold1 = 0
threshold2 = 0
policy = performance
Commands:
cd
set
show
The policy property shows the current power management policy.
Show the /SP/powermgmt/budget target. This shows the power capping configuration.
-> show /SP/powermgmt/budget
/SP/powermgmt/budget
Targets:
Properties:
activation_state = disabled
status = ok
powerlimit = 500 (watts)
timelimit = 100
violation_actions = none
min_powerlimit = 332
pendingpowerlimit = 500 (watts)
pendingtimelimit = 10
pendingviolation_actions = none
commitpending = (Cannot show property)
Commands:
cd
set
show
The powerlimit, timelimit, and violation_actions properties are the applied (active) values. These properties are applied to the system whenever activation_state is enabled. The properties starting with pending are values you can configure and then apply by setting commitpending to true. Once committed, the pending values become the applied values.
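The relationship between the pending and applied properties can be modeled in a few lines. The Python sketch below is a mental model only and does not talk to ILOM. The class name Budget and the method commit_pending are hypothetical, while the field names mirror the ILOM properties listed above.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Simplified model of the /SP/powermgmt/budget properties."""
    powerlimit: int
    timelimit: int
    violation_actions: str
    pendingpowerlimit: int
    pendingtimelimit: int
    pendingviolation_actions: str
    activation_state: str = "disabled"

    def commit_pending(self):
        """Model of commitpending=true: pending values become applied values."""
        self.powerlimit = self.pendingpowerlimit
        self.timelimit = self.pendingtimelimit
        self.violation_actions = self.pendingviolation_actions


# Start from the values in the listing above, then stage and commit a new limit.
budget = Budget(500, 100, "none", 500, 10, "none")
budget.pendingpowerlimit = 400
budget.commit_pending()
print(budget.powerlimit, budget.timelimit)
```

In this model, as in the description above, changing a pending value has no effect on the applied value until the commit step runs.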
The powerlimit determines the point at which the system enables power-savings features to cap power. The system caps power whenever the power consumption exceeds the powerlimit, and it removes the cap when power falls below the powerlimit. The powerlimit can be set to any value between the min_powerlimit property and the /SP/powermgmt allocated_power property. The best way to choose a meaningful powerlimit is to observe the power consumed by the system at idle and when running your workload, and then choose a value between idle power and close to or just above your normal workload power consumption.
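The allowed range and the selection guideline above can be captured in two small helpers. This Python sketch is illustrative only. The function names check_powerlimit and suggest_powerlimit and the margin parameter are hypothetical, and the example wattages for the workload are made up. The bounds 332 and 1045 come from the sample ILOM output shown earlier.

```python
def check_powerlimit(powerlimit, min_powerlimit, allocated_power):
    """Raise ValueError unless powerlimit lies within the allowed range."""
    if not min_powerlimit <= powerlimit <= allocated_power:
        raise ValueError(
            f"powerlimit {powerlimit} is outside {min_powerlimit}..{allocated_power}"
        )
    return powerlimit


def suggest_powerlimit(workload_watts, min_powerlimit, allocated_power, margin_watts=0):
    """Suggest a limit at or just above the observed workload power.

    The result is clamped to the range that the system accepts.
    """
    candidate = workload_watts + margin_watts
    return max(min_powerlimit, min(candidate, allocated_power))


# Hypothetical observation: the workload draws about 480 watts.
print(suggest_powerlimit(480, 332, 1045, margin_watts=20))
```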
The timelimit is expressed in seconds. This property's default value is tuned for best behavior. Unless there is a specific requirement to set it lower, do not change it. Setting the timelimit to zero (0) tells the system to proactively cap power so that it never exceeds the powerlimit (this is called Hard Cap in the BUI). This value is not supported on SPARC systems and causes the system to generate a violation action as soon as the timelimit is activated.
When the system cannot reduce power to the powerlimit within the specified timelimit, the limit is violated. The system continues to apply the maximum power capping it can when it is in this state. The violation_actions property, which can be set to either none or hardpoweroff, determines the action the system takes when the limit is violated. Rarely will you want to set it to hardpoweroff, because that will power off the system if:
The powerlimit is violated for longer than the timelimit.
The timelimit is zero (0) and the powerlimit is less than allocated_power.
If the timelimit is zero, the system will not boot until the powerlimit is set at or above the allocated_power value.
-> set /SP/powermgmt/budget pendingpowerlimit=<watts>
-> set /SP/powermgmt/budget pendingtimelimit=<seconds> # Recommended to leave at default (10)
-> set /SP/powermgmt/budget pendingviolation_actions=none # Recommended to leave at default (none)
-> set /SP/powermgmt/budget commitpending=true
The SNMP interface is based on the open source net-SNMP client. The snmpget and snmpset commands are standard Oracle Solaris commands, which are documented in the snmpget man page.
In order to use SNMP to manage ILOM power management settings, you need to configure ILOM to accept SNMP commands. See "Configuring SNMP Settings in Oracle ILOM" in the ILOM documentation for details.
The MIB to use is SUN-HW-CTRL-MIB.
Create a mibs directory to put all .mib files in, for example, /.snmp/mibs.
-> cd /SP/services/snmp/mibs
-> set dump_uri=scp://<user>:<passwd>@<IP_snmp_host>/<full_path_to_mib_zip_file>
unzip <full_path_to_mib_zip_file>
Create an snmp.conf config file. The recommendation is to put this file in the parent directory for the MIBs, for example, /.snmp.
Put the following lines in snmp.conf:
mibs ALL
mibdirs +/.snmp/mibs
For details, see the snmpd man page or visit SNMP_CONFIG.
Set the ENV variable to SNMPCONFPATH="/.snmp/mibs".
The net-snmp commands snmpget and snmpset need to be available from your SNMP client. Set $PATH to include the path to these SNMP commands, for example:
PATH=$PATH:/usr/sfw/bin
The SNMP object names for policy and power capping and their allowed values are in the .mib file, SUN-HW-CTRL-MIB.mib. You will need to supply the object name and values to the snmpget and snmpset commands.
snmpset -v2c -cprivate <SP-IP-address> sunHwCtrlPowerMgmtBudgetPendingTimelimit.0 = 20000
snmpset -v2c -cprivate <SP-IP-address> sunHwCtrlPowerMgmtBudgetPendingTimelimitActions.0 = none
The IPMI interface is an open industry standard for management of server systems. See the following links for more information on IPMI.
Your server might already have ipmitool installed. Version 1.8.9 or later is required for setting the power management policy and power capping. Check your version using the following command:
/usr/sbin/ipmitool -V
You can download the most recent version at this download page.
If you had to download a newer version, follow the README to build and install the new ipmitool. You must provide a password, either when prompted by the tool or in a file provided to the tool. See the man page for details.
In order to use IPMI to manage ILOM power management settings, you need to configure ILOM to accept IPMI commands. See "Before You Begin — ILOM and IPMItool Requirements" in the ILOM documentation for details.
Here are some details on how to set the power management policy and a power cap:
ipmitool has a raw mode where hex bytes are specified for the command parameters. You can use raw mode to set a power cap. See "Manage ILOM Power Budget Interfaces (IPMItool)" in the ILOM documentation for details.
The sunoem cli command option is a simpler way to set a power cap or power management policy. The results of using the sunoem cli command are the same as the results from using the ILOM command line interface.
ipmitool -I lan -H <SP-IP-address> -U root sunoem cli "set /SP/powermgmt/budget pendingtimelimit=20"
ipmitool -I lan -H <SP-IP-address> -U root sunoem cli "set /SP/powermgmt/budget pendingviolation_actions=none"
Device power management is managed via the CLI command pmconfig and the file /etc/power.conf.
Edit /etc/power.conf to set the options for device PM.
system-threshold {<idle_threshold> | always-on}
If idle_threshold is specified, its time value is used as the default idle threshold for all devices that do not have an overriding idle threshold set.
If always-on is specified, there is no default system idle threshold, and all devices that do not have an overriding idle threshold set will be left at full power.
Use disable to turn off all device PM:
autopm {enable | disable}
device-thresholds <physical_path> {<idle_threshold> | always-on}
See power.conf(4) for more complex configuration.
If idle_threshold is specified, it overrides the system idle threshold for the specific device named.
If always-on is specified, this device is not power managed.
Here is an example for keeping a boot disk always powered on and nonboot disks power managed:
device-thresholds /dev/dsk/c1t0d0s0 always-on # boot disk
device-thresholds /pci@8,600000/scsi@4/ssd@w210000203700c3ee,0 15s # 15 seconds idle threshold
device-thresholds /dev/dsk/c1t3d0s0 1m # 1 minute idle threshold
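When many disks are involved, entries like these can be generated rather than typed by hand. The Python sketch below is one hypothetical way to do that. The function name device_threshold_lines is not from this article, and the threshold check (a number with an optional h, m, or s suffix, or always-on) is an assumption based on the examples shown here, so consult power.conf(4) for the authoritative syntax.

```python
import re

# Assumed threshold format, inferred from the examples in this article.
_THRESHOLD = re.compile(r"^(always-on|\d+[hms]?)$")


def device_threshold_lines(thresholds):
    """Render device-thresholds entries from a {device_path: threshold} mapping."""
    lines = []
    for path, threshold in thresholds.items():
        if not _THRESHOLD.match(threshold):
            raise ValueError(f"unexpected threshold {threshold!r} for {path}")
        lines.append(f"device-thresholds {path} {threshold}\n")
    return "".join(lines)


print(device_threshold_lines({
    "/dev/dsk/c1t0d0s0": "always-on",  # boot disk stays at full power
    "/dev/dsk/c1t3d0s0": "1m",         # nonboot disk, 1 minute idle threshold
}), end="")
```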
See the Oracle Enterprise Manager Ops Center Quick Start Guide for a discussion about how to configure Oracle Enterprise Manager Ops Center so it will discover and manage your system. Also see the full Oracle Enterprise Manager Ops Center documentation set.
Here are URLs for the resources referenced earlier in this document:
net-SNMP client: http://www.net-snmp.org/
snmpget man page: http://docs.oracle.com/cd/E19253-01/816-5166/6mbb1kqi4/index.html
SNMP_CONFIG: http://www.net-snmp.org/docs/man/snmp_config.html
ipmitool download page: http://ipmitool.sourceforge.net/
Revision 1.0, 02/27/2012