Monitoring Effective CPU Speed Using the clockrate.py Script
A Python script that can be used to monitor the effective speed of CPUs.
by Julia Harper
Published July 2013
This example script and its purpose of this script is discussed in the article How to Save Power on SPARC T5 and SPARC M5 Servers.
The clockrate.py Python script shown in Listing 1, which was written by an Oracle performance engineer, monitors the effective speed of the CPUs by comparing the tick rate against both the wall clock time and an initial per-CPU clock rate collected via the following kstat command:
kstat -p cpu_info:::current_clock_Hz
Note: The following script might require additional customization and testing by a system administrator. Oracle provides no support or guarantees.
#!/usr/bin/env python
import os
import sys
import time
import errno
import ctypes
import exceptions
import subprocess
__doc__ = '''
measure a SPARC CPU's clock rate
Usage: clockrate.py [options]
Options:
-h, --help show this help message and exit
--chip use the first active CPU found for each chip (default)
--core use the first active CPU found for each core
--allcpus report clock rate for all active CPUs
--cpu=CPUS report clock rate for specified CPUs, regardless of state
-i INTERVAL clock rate sample interval in seconds (default=1.0)
-c SAMPLECOUNT number of samples to collect (default=-1)
-t THRESHOLD report clock rates below threshold (default=100)
This script will cycle through the CPUs indicated and measure the clock
rate in MHz for each CPU. For --chip and --core, a representative CPU
is chosen by locating the first CPU within the chip or core grouping
this either on-line or in the no-intr state. If --allcpus is
specified, all CPUs whose state is on-line or no-intr will be
reported.
Individual CPUs can be specified by identifier using the '--cpu=CPUS'
option. CPUS can be specified using the following methods:
CPU Id List:
--cpu=0,1,4,10,24,4
CPU Id Range:
--cpu=12-30
All CPUs:
--cpu=all
Note: ranges and lists of CPU identifiers can be mixed together, e.g.:
--cpu=0-5,10,11,30-35,7
This is equivalent to:
--cpu=0,1,2,3,4,5,7,10,11,30,31,32,33,34,35
The default options when nothing is specified on the command line are:
clockrate.py --chip -i 1.0 -c -1 -t 100
Warning: Given the single threaded nature of interpreted languages,
python specifically, this script will induce a load on the
host which may be undesirable for long-term observation.
HISTORY
21 Jun 2013 - Version 1.6
- fixed bug in sample_loop
reference sampleInterval value before assignment
20 Jun 2013 - Version 1.5
- changed --cpu command line option parsing to allow embedded ranges
- added --version command line switch
- added cpu id list de-duplicating and sorting to CPU.query function,
- added optional argument to CPU.query: strictChecking
- added MissingCPU exception, thrown by CPU.query when strictChecking
is True and the number of requested cpuids is greater than the
number of found CPUs.
19 Jun 2013 - Version 1.4
- fixed command line specification of list of CPUs to monitor
- changed --cpu to --cpu=CPUS,
CPUS is a comma-delimited list of CPU ids or the string "all"
- added --allcpu option, equivalent to --cpu=all
- added parse_cmdline function
function uses deprecated optparse module
18 Jun 2013 - Version 1.3
- fixed state filtering in CPU chip & core class methods
- changed command-line options and handling (again)
22 Apr 2013 - Version 1.2
- added chip/core filter to command-line processing
- added chip/core class methods to CPU class
- added state filtering to CPU.query classmethod
- added sample_loop function
- added timestamp function
- added chip and core identifiers to output
- added poweradm detection
- removed sampling loop from __main__
- changed command-line options and handling
19 Apr 2013 - Version 1.1
- added CError exception, subclass of OSError
- added LIBC class
- added CPU class, child class of LIBC
- changed SPARCRegister now a child class of LIBC
- changed all exceptions raised from calling libc functions to CError
- removed _libc global data
- removed reported_clockrates function
- removed bind function
- removed measure_clockrate function
18 Apr 2013 - Version 1.0
- initial release
'''
Author = "erik.oshaughnessy@oracle.com"
Version = "1.6"
##
## Constants lifted from various C header files, makes calling the
## C foreign functions a little more friendly.
##
P_PID = 0 # sys/procset.h
P_MYID = -1 # sys/procset.h
PS_NONE = -1 # sys/pset.h
PS_QUERY = -2 # sys/pset.h
PBIND_NONE = -1 # sys/processor.h
PBIND_QUERY = -2 # sys/processor.h
PROT_RWX = 0x1 | 0x2 | 0x4 # sys/mman.h
SC_NPROCESSORS_ONLN = ctypes.c_int32(15) # unistd.h
##
## Classes
##
class CError(exceptions.OSError):
'''
Raised when foreign functions called via ctypes return errors.
Will retrieve errno from ctypes via ctypes.get_errno() and then
determine the proper errno string via os.strerror(). The optional
msg argument can be any string or value the user supplies.
'''
def __init__(self, msg=None):
'''
'''
err = ctypes.get_errno()
super(CError, self).__init__(err, os.strerror(err), msg)
class MissingCPU(Exception):
'''
Raised when the found set of CPUs does not match the requested
set of CPU identifiers.
'''
def __init__(self,missingIDs,foundIDs=None):
self.missingIDs = ','.join([str(x) for x in missingIDs])
self.foundIDs = ','.join([str(x) for x in foundIDs])
def __str__(self):
return "CPUs were not found: %s" % self.missingIDs
class LIBC(object):
'''
Provides a foundation object to minimize the number of duplicated
libc handles.
'''
_libc = ctypes.CDLL('libc.so', use_errno=True)
_libc.gethrtime.restype = ctypes.c_uint64 # uint64_t gethrtime(void)
_libc.valloc.restype = ctypes.c_void_p # void *valloc(size_t)
class SPARCRegister(LIBC):
'''
Provides Pythonic access to a SPARC CPU register. The class is
initialized with a ctypes.c_uint32() array of valid SPARC machine
instructions that comprise a function whose call signature in C
would look like:
void func(uint64_t *)
The instructions for the function are copied into a page-aligned
buffer which has had it's page permissions modified to
read/write/execute using mprotect(2) from libc.
Finally, a python-callable foreign function is constructed from a
pointer to the page-aligned buffer.
The user-provided instructions are expected to perform whatever
actions necessary to retrieve the desired SPARC register contents
and store the contents at the address provided in the function
argument. Instructions must be position-independent.
Yes, it's magic.
'''
def __init__(self, getterInstructions):
'''
getterInstructions is an array of ctypes.c_uint32's initialized
with SPARC machine instructions.
e.g.
G0_ASM = (ctypes.c_uint32 * 4)(0x81c3e008, # retl
0xc0720000, # stx %g0, [%o0]
0x00000000, # illtrap 0
0x00000000) # illtrap 0
GlobalZero = SPARCRegister(G0_ASM)
if GlobalZero.value != 0:
raise Exception("%g0 != 0")
'''
bufsiz = ctypes.sizeof(getterInstructions)
self._ibuf = self._libc.valloc(bufsiz)
if self._ibuf is None:
raise CError("valloc")
if self._libc.mprotect(self._ibuf, bufsiz, PROT_RWX) != 0:
raise CError("mprotect")
ctypes.memmove(self._ibuf, getterInstructions, bufsiz)
prototype = ctypes.CFUNCTYPE(None, ctypes.POINTER(ctypes.c_uint64))
fptr = ctypes.cast(self._ibuf, prototype)
self._getter = prototype(fptr)
def __del__(self):
'''
Free the memory allocated via libc's valloc(3C).
'''
self._libc.free(self._ibuf)
@property
def value(self):
'''
Executes a foreign function call to fetch a register value.
'''
v = ctypes.c_uint64()
self._getter(ctypes.byref(v))
return v.value
class Tick(SPARCRegister):
'''
%tick is a SPARCv9 per-CPU 63-bit register that counts CPU clock
cycles. Values obtained from the %tick register on one CPU cannot
be compared against values obtained from another CPU's %tick
register.
This class provides access to the %tick register via the 'value'
property. When the value property is accessed, %tick is fetched
for the CPU that the python instance is currently executing on.
It is highly recommended that callers bind the python process to a
target CPU before attempting to make repeated samples of the %tick
register.
e.g.
tick = Tick()
if ctypes.CDLL("libc.so").processor_bind(0, -1, TARGETCPU, None) == 0:
while True:
print tick.value
'''
## The following array, _ASM, contains SPARC machine instructions
## which implement a leaf-function that will read the %tick register
## of the CPU the process is executing on and then writes the 64-bit
## value obtained to the address supplied by the caller. The illtrap
## provides insurance that the function returns to the caller, otherwise
## the process will abend when the illtrap instruction is executed with
## a message similar to:
##
## Illegal instruction (core dumped)
##
_ASM = (ctypes.c_uint32 * 4)(0x93410000, # rd %tick, %o1
0x81c3e008, # retl
0xd2720000, # stx %o1, [%o0]
0) # illtrap 0
def __init__(self):
super(Tick, self).__init__(self._ASM)
@property
def value(self):
'''
Returns the %tick register contents for the CPU that the
calling process is executing on currently.
'''
##
## I didn't need to over-ride the value method here, but
##
return super(Tick, self).value
class CPU(LIBC):
'''
Provides access to per-CPU data.
'''
_tick = Tick()
@classmethod
def query(cls, cpuids=None, states=None,strictChecking=False):
'''
Returns a list of CPU objects initialized from the
output of '/usr/bin/kstat -p cpu_info'. The cpuids
argument can be used to filter the list of CPUs returned
( if it is not None and is a list of integer CPU ids ).
'''
if cpuids and len(cpuids) == 0:
cpuids = None
if cpuids is not None:
cpuids = sorted(set(cpuids)) # de-dup, sort cpuids list
p = subprocess.Popen(['/usr/bin/kstat', '-p', 'cpu_info'],
stdout=subprocess.PIPE)
cpus = {}
for line in p.stdout:
(keys, sep, val) = line.strip().partition('\t')
(module, instance, name, key) = keys.split(':')
try:
if int(instance) not in cpuids:
continue
except TypeError:
pass # cpuids is None, find all CPUs
cpu = cpus.setdefault(instance, CPU(instance))
# First try to treat val as an integer
# Next treat val as float
# Finally, treat val as string
try:
setattr(cpu, key, int(val))
except ValueError:
try:
setattr(cpu, key, float(val))
except ValueError:
setattr(cpu, key, val)
cpus[instance] = cpu
if states:
for cpu in cpus.values():
if cpu.state in states:
continue
cpus[cpu.cpuid] = None
del(cpu)
if cpuids and strictChecking and len(cpus) != len(cpuids):
askedfor = set(cpuids)
found = set([x.cpuid for x in cpus.values()])
common = askedfor.intersection(found)
notfound = askedfor.difference(common)
if len(notfound):
raise MissingCPU(notfound,common)
return sorted(cpus.values())
@classmethod
def chips(cls, cpus=None, states=None):
'''
Sorts CPUs into buckets organized by chip_id. Returns a dictionary
of CPU arrays structured thusly:
{ chip_id: [CPU(), ...], ... }
'''
chips = {}
if cpus is None:
cpus = cls.query(states=states)
else:
if states:
for cpu in cpus.values():
if cpu.state in states:
continue
cpus[cpu.cpuid] = None
del(cpu)
for cpu in cpus:
l = chips.setdefault(cpu.chip_id, [])
l.append(cpu)
chips[cpu.chip_id] = l
return chips
@classmethod
def cores(cls, cpus=None, states=None):
'''
Sorts CPUs into buckets organized by core_id. Returns a dictionary
of CPU arrays structured thusly:
{ core_id: [CPU(), ...], ... }
'''
cores = {}
if cpus is None:
cpus = cls.query(states=states)
else:
if states:
for cpu in cpus.values():
if cpu.state in states:
continue
cpus[cpu.cpuid] = None
del(cpu)
for cpu in cpus:
l = cores.setdefault(cpu.core_id, [])
l.append(cpu)
cores[cpu.core_id] = l
return cores
def __init__(self, cpuid):
self.cpuid = int(cpuid)
self._psetid = PS_NONE
def __repr__(self):
return 'CPU(id=%s,chip=%s,core=%s' % (self.cpuid,self.chip_id,self.core_id)
def __str__(self):
'''
CPU pretty printing
'''
s = []
s.append('CPU: %s' % (self.cpuid))
attrnames = self.__dict__.keys()
attrnames.remove('cpuid')
for k in sorted(attrnames):
s.append('\t%s = %s' % (k, self.__dict__[k]))
return '\n'.join(s)
def __cmp__(self, other):
'''
Compares CPUs by their cpuid attribute. This has the side-effect
of ordering CPUs by chip and core identifiers as well.
'''
return cmp(self.cpuid, other.cpuid)
@property
def psetid(self):
'''
The processor set id (psetid) that this CPU belongs to.
'''
psetid = ctypes.c_int(PS_NONE)
r = self._libc.pset_assign(PS_QUERY,
self.cpuid,
ctypes.byref(psetid))
if r != 0:
raise CError("pset query failed for cpu %d" % (self.cpuid))
return psetid.value
@property
def active(self):
'''
Returns True if the calling process is executing on this CPU,
False otherwise. The assumption is the calling process is
bound to this CPU. Using getcpuid() isn't entirely
fool-proof, but does not require a system call/trap like
pset_assign(PS_QUERY) would.
'''
return self.cpuid == self._libc.getcpuid()
@property
def reported_clockrate(self):
'''
The clockrate in megahertz (MHz) reported to Solaris by firmware
and recorded in the cpu_info:: kstats.
'''
try:
return self._clockrate
except AttributeError:
self._clockrate = self.current_clock_Hz / 1e6
return self._clockrate
def measured_clockrate(self, sampleIntervalInSeconds=1.0):
'''
The clockrate in megahertz (MHz) as measured using the %tick register.
'''
hrt0 = self._libc.gethrtime()
t0 = self._tick.value
time.sleep(sampleIntervalInSeconds)
t1 = self._tick.value
hrt1 = self._libc.gethrtime()
deltaTicks = t1 - t0
deltaSeconds = (hrt1 - hrt0) / 1e9
return (deltaTicks / deltaSeconds) / 1e6
def bind_process(self, curPset=PS_NONE):
'''
Binds the calling process to this CPU's processor set and then
to the processor. Returns a tuple of the previous
bindings. If the target CPU belongs to a processor set (pset),
the calling process is first bound to the target pset and then
bound to the CPU.
'''
prevCPU = ctypes.c_int32(PBIND_NONE)
prevPset = ctypes.c_int32(PS_NONE)
r = self._libc.pset_bind(self.psetid,
P_PID,
P_MYID,
ctypes.byref(prevPset))
if r is not 0:
raise CError('failed to bind to pset %d' % self.psetid)
r = self._libc.processor_bind(P_PID,
P_MYID,
self.cpuid,
ctypes.byref(prevCPU))
if r is not 0:
raise CError('failed to bind process to CPU %d' % self.cpuid)
return (prevCPU.value, prevPset.value)
def unbind_process(self):
'''
Undoes the processor binding and then the processor set binding.
'''
r = self._libc.processor_bind(P_PID, P_MYID, PBIND_NONE, None)
if r is not 0:
raise CError('failed to unbind process from CPU %d' % self.cpuid)
r = self._libc.pset_bind(PS_NONE, P_PID, P_MYID, None)
if r is not 0:
raise CError('failed to unbind process from pset %d' % self.psetid)
def poweradm(disable=False):
'''
Determine if the Solaris poweradm(1m) facility is active by parsing
the output of poweradm.
'''
p = subprocess.Popen(['/usr/sbin/poweradm', 'show'],
stdout=subprocess.PIPE)
for l in p.stdout:
if l.find('enabled') != -1:
return True
return False
def timestamp(epochTime=None):
'''
Returns an epoch-precision 24-hour clock timestamp with the format:
YYYY-MM-DD hh:mm:ss
'''
if epochTime is None:
epochTime = time.time()
return time.strftime('%Y-%m-%d %H:%M:%S',
time.localtime(epochTime))
def sample_loop(cpus,
interval=1.0,
sampleCount=-1,
reportThreshold=100.0):
'''
Samples the clock rate of the list of specified cpus. The
sampling is accomplished via round-robin scheduling and is
dictated by the interval argument. The sampleCount argument
allows the caller to specify how many samples to collect. The
default value of -1 will allow sampling to continue until a
keyboard interrupt is encountered or the process is killed. The
reportThreshold argument directs the function to only report clock
rates when they are at or less than given threshold.
'''
if interval > 1:
reportInterval = interval - 1.0
sampleInterval = 1.0
else:
reportInterval = 0.0
sampleInterval = interval
# reportInterval == how long to wait between sweeps
# sampleInterval == how long to sample each CPU during a sweep
sampleIntervalInSeconds = sampleInterval / len(cpus)
header = '%s %4s %4s %4s %4s %6s %6s %6s t=%3.0f' % (timestamp(),
'PSET',
'CHIP',
'CORE',
'CPU',
'Mhz(r)',
'Mhz(m)',
'Mhz(%)',
reportThreshold)
fmt = '%s %4d %4d %4d %4d %6.0f %6.0f %5.0f%%'
doBind = len(cpus) > 1
try:
if len(cpus) == 1:
cpus[0].bind_process()
while sampleCount != 0:
sampleCount -= 1
print header
for cpu in cpus:
if doBind:
cpu.bind_process()
mhz_r = cpu.reported_clockrate
mhz_m = cpu.measured_clockrate(sampleIntervalInSeconds)
ratio = (mhz_m / mhz_r) * 100.0
if int(ratio) <= reportThreshold:
print fmt % (timestamp(),
cpu.psetid,
cpu.chip_id,
cpu.core_id,
cpu.cpuid,
mhz_r,
mhz_m,
ratio)
cpu.unbind_process()
if reportInterval and sampleCount != 0:
time.sleep(reportInterval)
except KeyboardInterrupt:
print '\n'
##
## Command-line option parsing follows
##
ACTIVE_CPUS = ['on-line', 'no-intr'] # CPU states
def get_chips(option, opt_str, value, parser):
'''
optparse callback function which populates the specified option destination
key with a list of CPUs representing each chip found.
'''
setattr(parser.values,option.dest,
sorted([x[0] for x in CPU.chips(states=ACTIVE_CPUS).values()]))
def get_cores(option, opt_str, value, parser):
'''
optparse callback function which populates the specified option destination
key with a list of CPUs representing each core found.
'''
setattr(parser.values,option.dest,
sorted([x[0] for x in CPU.cores(states=ACTIVE_CPUS).values()]))
def get_cpus(option, opt_str, value, parser):
'''
optparse callback function which populates the specified option destination
key with a list of CPUs whose composition is dependent on the values given
on the command line:
[ 'all', comma delimited list, range, mix of lists and ranges ]
'''
if value is None or value.lower() in ['all']:
cpus = CPU.query(states=ACTIVE_CPUS)
else:
ids = []
for things in value.split(','):
if '-' in things:
(start,stop) = things.split('-')
ids.extend([x for x in range(int(start),int(stop)+1)])
else:
ids.append(int(things.strip()))
try:
cpus = CPU.query(ids,strictChecking=True)
except MissingCPU, e:
print '%s: Error.' % parser.get_prog_name(), e
exit(-1)
setattr(parser.values,option.dest,cpus)
def print_version(option, opt_str, value, parser):
'''
Display the current contents of Version and exit.
'''
print '%s version %s' % (parser.get_prog_name(),Version)
print 'See "pydoc %s" for more information.' % (parser.get_prog_name())
exit(0)
def parse_cmdline():
'''
Returns a dictionary of options specified on the command line.
Uses optparse which is a deprecated module but present in
S11uX. This can be replaced with argparse at a later date.
'''
import optparse
parser = optparse.OptionParser(prog='clockrate')
parser.add_option('-v','--version',
action='callback',callback=print_version,
help='The current version of this script (version %s)' % Version )
parser.add_option('--chip',dest='cpus',
action='callback',callback=get_chips,
help='use the first active CPU found for each chip (default)')
parser.add_option('--core',dest='cpus',
action='callback',callback=get_cores,
help='use the first active CPU found for each core')
parser.add_option('--allcpus',dest='cpus',
action='callback',callback=get_cpus,
help='report clock rate for all active CPUs')
parser.add_option('--cpu',dest='cpus',type='string',
action='callback',callback=get_cpus,
help='report clock rate for specified CPUs')
parser.add_option('-i',type='float',dest='interval',default=1.0,
help='clock rate sample interval in seconds (default=1.0)')
parser.add_option('-c',type='int',dest='sampleCount',default=-1,
help='number of samples to collect (default=-1)')
parser.add_option('-t',type='int',dest='threshold',default=100,
help='report clock rates below threshold (default=100)')
parser.epilog = 'See "pydoc %s" for further explanation of these options.'
parser.epilog = parser.epilog % parser.get_prog_name()
(options,args) = parser.parse_args()
return options
##
## main
##
if __name__ == '__main__':
if poweradm():
print "WARNING - poweradm(1m) is currently ENABLED."
print " verify with 'poweradm show'"
print " disable with 'poweradm set administrative-authority=none'"
print "re-enable with 'poweradm set administrative-authority=default'"
print "re-enable with 'poweradm set administrative-authority=platform'"
print "re-enable with 'poweradm set administrative-authority=smf'"
options = parse_cmdline()
if options.cpus is None:
options.cpus = sorted([x[0] for x in CPU.chips(states=ACTIVE_CPUS).values()])
sample_loop(options.cpus,
options.interval,
options.sampleCount,
options.threshold)
Listing 1