By Darryl Gove, Compiler Performance Engineering, Sun Microsystems, September 29, 2006
The SHADE library is an emulator for SPARC hardware. An emulator is software that 'pretends' to be a processor, so that the application runs on the emulator, which in turn runs on the real hardware. Emulators are often used where the original hardware is no longer available, where the application needs to run on new hardware with a different processor, or where the target hardware differs from the development platform.
The particular advantage of using SHADE is that it is possible to write an analysis tool which gathers information from the application being emulated. The SHADE library comes with some example analysis tools which track things like the number of instructions executed or how often each type of instruction is executed. A more advanced analysis tool might look at the cache misses that the application encounters for a given cache structure.
The SHADE tools (for Solaris OS releases 9 and 10 on SPARC processors) are available from the Cool Tools site. The file is downloaded as a compressed tar file, and when it is decompressed it will have a directory structure as follows:
The include files are located in the inc directory, and the library files are located in the lib directory. There are a number of examples located in the eg directory, and documentation is provided in the
For the purposes of this article it is assumed that the SHADE code has been installed in
SHADE works by emulating a number of instructions from the target application and recording each instruction in a buffer. This set of recorded instructions is often referred to as a trace (of the run of the application). The SHADE library does all the work of emulating the application; once it has gathered a trace of instructions, it hands this trace over to the 'analyzer'. The analyzer has to be written to take the trace generated by the SHADE library and analyze it.
SHADE ships with the source code to several analyzers in the eg subdirectory. The simplest analyzer provides an instruction count; a more complex one is a cache simulator. At the heart of a SHADE analyzer is a simple loop that iterates over records from a trace of the application. This loop takes each individual record and does whatever processing is necessary.
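The shape of that central loop can be sketched without the SHADE headers. The following is only an illustration: demo_record, demo_stats and demo_process are hypothetical names invented for this sketch, and the record layout is an assumption rather than the real SHADE trace record.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a trace record; NOT the real SHADE layout. */
typedef struct {
    unsigned long pc;   /* address of the traced instruction */
    int is_load;        /* 1 if the instruction was a load   */
    int is_store;       /* 1 if the instruction was a store  */
} demo_record;

/* Statistics accumulated by the (hypothetical) analyzer. */
typedef struct {
    long instructions;
    long loads;
    long stores;
} demo_stats;

/* The core analyzer loop: visit each record once and update the totals. */
void demo_process(const demo_record *trace, size_t n, demo_stats *stats)
{
    assert(stats != NULL);
    for (size_t i = 0; i < n; i++) {
        stats->instructions++;
        if (trace[i].is_load)
            stats->loads++;
        if (trace[i].is_store)
            stats->stores++;
    }
}
```

A real analyzer would replace the body of the loop with its own processing, for example feeding each address into a cache model.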
Internally, SHADE works by splitting the application into short snippets of code. These snippets do two things. First, they do the same thing as the original application would have done. Second, they record, in the trace record, what happened as they executed the code.
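As a rough illustration of this idea, the sketch below shows a single hand-written 'snippet' that performs an addition and also logs it. It is not how SHADE generates code internally; demo_trace, demo_entry and demo_add_snippet are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_TRACE_MAX 16

/* Hypothetical trace entry: where the instruction was and what it produced. */
typedef struct {
    unsigned long pc;
    long result;
} demo_entry;

/* Hypothetical fixed-size trace buffer. */
typedef struct {
    demo_entry entries[DEMO_TRACE_MAX];
    size_t used;
} demo_trace;

/* One "snippet": it performs the original work and records what happened. */
long demo_add_snippet(demo_trace *trace, unsigned long pc, long a, long b)
{
    long result = a + b;                    /* 1: same effect as the original add */

    assert(trace->used < DEMO_TRACE_MAX);   /* sketch only: no buffer flushing */
    trace->entries[trace->used].pc = pc;    /* 2: record what was executed */
    trace->entries[trace->used].result = result;
    trace->used++;

    return result;
}
```

The caller sees the same result as an uninstrumented add, while the trace buffer gains one entry per executed snippet.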
There are a number of routines which have to be provided when writing a SHADE analyzer:
The shadeuser_initialize function gets called before the program being traced is loaded. This gives the analyzer the opportunity to select which instructions are to be traced, and also to set up any necessary data structures.
The shadeuser_analyze function gets called repeatedly until the application being traced ends. Each time this function is called, it should ask SHADE to get the next set of traced instructions.
The shadeuser_report routine is called after the application being traced has exited, or when a signal requesting a report is received.
The shadeuser_terminate routine is called after the application being traced has terminated and allows the analyzer to perform any necessary clean up.
The shadeuser_analusage routine needs to return usage information for the analyzer.
The shadeuser_analversion character string needs to be defined to identify the analyzer.
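The order in which these routines are invoked can be modelled with a small driver. This is a sketch of the lifecycle as described above, not SHADE's implementation: the demo_analyzer callback table, demo_drive and the return-value conventions are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback table. The real entry points are the shadeuser_*
 * symbols described above; this struct only models the calling order. */
typedef struct {
    int  (*initialize)(void *state);  /* nonzero means "cannot start"     */
    int  (*analyze)(void *state);     /* nonzero means "call me again"    */
    void (*report)(void *state);
    int  (*terminate)(void *state);   /* result becomes the exit status   */
} demo_analyzer;

/* Drive one analyzer through its whole lifecycle. */
int demo_drive(const demo_analyzer *a, void *state)
{
    assert(a != NULL);
    if (a->initialize(state) != 0)
        return -1;
    while (a->analyze(state) != 0)
        ;                             /* keep going until the run ends */
    a->report(state);
    return a->terminate(state);
}

/* A tiny example analyzer that pretends the run lasts a few batches. */
typedef struct {
    int batches_left;
    int analyze_calls;
    int reports;
} demo_state;

static int demo_init(void *p)
{
    demo_state *s = (demo_state *)p;
    s->analyze_calls = 0;
    s->reports = 0;
    return 0;
}

static int demo_analyze(void *p)
{
    demo_state *s = (demo_state *)p;
    s->analyze_calls++;
    if (s->batches_left > 0)
        s->batches_left--;
    return s->batches_left > 0;
}

static void demo_report(void *p)
{
    ((demo_state *)p)->reports++;
}

static int demo_terminate(void *p)
{
    (void)p;
    return 0;
}

const demo_analyzer demo_example = {
    demo_init, demo_analyze, demo_report, demo_terminate
};
```

Calling demo_drive(&demo_example, &state) with batches_left set to 3 invokes the analyze callback three times, then the report callback once, and finally returns the value from the terminate callback.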
Two of these items are not related to tracing the application: the string shadeuser_analversion, which must be assigned the name of the analyzer, and the routine shadeuser_analusage, which outputs usage information for the analyzer. In this case we will call the analyzer 'trace' and there are no user configurable parameters.
const char shadeuser_analversion[] = "trace ";
There are two steps to defining what data is to be collected.
The first step is to define the trace record type that is to be used to store the information. In this case we are using the SHADE_SPARCV9_TRACE record type. This type is defined in the shade_sparcv9.h header file and includes the tr_i record, which holds a text representation of the instruction that will be used when the trace is printed out. There are other trace records defined in the shade_sparcv9.h header file which can hold information about the values held in the source and destination registers. There is also a simpler trace record called SHADE_TRACE, which has space for trace information such as the PC of the instruction, or the Effective Address (EA) of a load or store instruction.
The second step is to set up SHADE so that it delivers just the information that is required by the analyzer.
shade_setopt determines how the SHADE library should handle forks and execs. With the SHADE_OPT_FORKNOTIFY setting, the SHADE analyzer will be notified when the program being traced forks, and SHADE will also fork a new process to trace the process forked by the program being traced.
It is necessary to tell SHADE the size of the trace record using the shade_trsize call. The variable shade_trace_t is equivalent to the previously declared
The next step is to select which instructions are to be traced, and what is to be recorded about those instructions.
First of all, the routine shade_iset_newclass is called to select the set of instructions to be traced. This routine takes a list of the instruction types to trace. The list is terminated by the value -1. For the purposes of this program, it is necessary to select all instructions using the SHADE_ICLASS_ANY specifier. However, a different analyzer might choose to trace just loads and stores, or some other subset of the instructions.
shade_tset_new is used to select the data that will be recorded in the trace records. Again the routine takes a list of all the data to be recorded, and terminates the list with the value -1. For the purposes of tracing the executed instructions it is necessary to record the instruction that was executed (SHADE_SPARCV9_TRCTL_I) and the address of that instruction (SHADE_TRCTL_PC). Using other specifiers it is possible to record other data, such as the effective address of memory operations.
The shade_trctl interface is used to inform SHADE of these decisions. This interface also requires a specifier to determine whether instructions should be traced only when they are executed, only when they are annulled, or when they are either executed or annulled. The specifier SHADE_TRI_ISSUED tells SHADE to trace instructions both when they are executed and when they are annulled.
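The convention of passing a list of values terminated by -1 can be illustrated in plain C with a variadic function. This sketch does not reproduce the SHADE routines or their signatures; demo_set_new, demo_set_has and the DEMO_CLASS_* constants are hypothetical, and representing the set as a bit mask is an assumption made for brevity.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical instruction classes, used only in this sketch. */
enum {
    DEMO_CLASS_LOAD   = 0,
    DEMO_CLASS_STORE  = 1,
    DEMO_CLASS_BRANCH = 2
};

/* Build a bit set from a list of small non-negative ints.
 * The list must be terminated by -1, for example:
 *     demo_set_new(DEMO_CLASS_LOAD, DEMO_CLASS_STORE, -1)            */
unsigned demo_set_new(int first, ...)
{
    unsigned set = 0;
    va_list ap;

    va_start(ap, first);
    for (int v = first; v != -1; v = va_arg(ap, int)) {
        assert(v >= 0 && v < 32);     /* must fit in the 32-bit mask */
        set |= 1u << v;
    }
    va_end(ap);
    return set;
}

/* Test whether a class is a member of the set. */
int demo_set_has(unsigned set, int v)
{
    return (int)((set >> v) & 1u);
}
```

Forgetting the terminating -1 in a call like this is undefined behaviour, because the function keeps reading arguments that were never passed. The same care applies to any interface that uses a sentinel-terminated argument list.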
shadeuser_analyze gets called repeatedly by the SHADE library to fetch more instructions from the running application. This routine needs to call the shade_run routine to get a fresh set of trace records, and should then process those records. The simplest form of this routine would just count the total number of instructions executed. An example of doing this is shown in the next figure.
The routine declares a local array of records which is filled by the call to the SHADE routine shade_run. This routine returns the number of records placed in the array as the result of the call.
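The pattern of filling a local array and adding the returned record count to a running total can be sketched with a mock record source. The names demo_run, demo_source, demo_rec and demo_count are hypothetical stand-ins; in particular demo_run is not shade_run and its signature is an assumption made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BATCH 4   /* records fetched per call; arbitrary for the sketch */

/* Hypothetical trace record holding only the instruction address. */
typedef struct {
    unsigned long pc;
} demo_rec;

/* Hypothetical source of records, standing in for the emulated program. */
typedef struct {
    long remaining;          /* instructions still to be "executed" */
    unsigned long next_pc;   /* address of the next instruction     */
} demo_source;

/* Fill buf with up to max records and return how many were written.
 * A return value of 0 means the traced program has finished.        */
size_t demo_run(demo_source *src, demo_rec *buf, size_t max)
{
    size_t n = 0;
    while (n < max && src->remaining > 0) {
        buf[n++].pc = src->next_pc;
        src->next_pc += 4;   /* SPARC instructions are 4 bytes long */
        src->remaining--;
    }
    return n;
}

/* Count every traced instruction by fetching records batch by batch. */
long demo_count(demo_source *src)
{
    demo_rec buf[DEMO_BATCH];   /* local array refilled on each call */
    long total = 0;
    size_t n;

    while ((n = demo_run(src, buf, DEMO_BATCH)) > 0)
        total += (long)n;
    return total;
}
```

With a source of 10 instructions and a batch size of 4, the loop sees batches of 4, 4 and 2 records before the source reports that it is finished.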
In practice a little more complexity is needed to handle forking of the traced application. When the application forks, the variable total is copied into the new process, so it must be reset in the new process but not in the parent. An example of doing this can be found in icount.c, which is shipped with SHADE.
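The effect of a fork on a running total can be shown with a small POSIX sketch. This is not the code from icount.c; demo_total and demo_fork_and_count are hypothetical names, and passing the child's count back through its exit status is only a convenience for keeping the example short.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Running instruction count, copied into the child by fork(). */
static long demo_total = 0;

/* Count parent_before instructions, fork, and let the child count
 * child_work instructions of its own. Returns the child's final count
 * (passed back through its exit status), or -1 on error.             */
int demo_fork_and_count(int parent_before, int child_work)
{
    assert(child_work >= 0 && child_work < 256);  /* must fit in an exit status */

    demo_total += parent_before;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        demo_total = 0;            /* child: discard the inherited count */
        demo_total += child_work;
        _exit((int)demo_total);
    }

    /* Parent: its own total is untouched by what the child does. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Without the reset in the child, the child would report the parent's instructions in addition to its own.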
The objective for the analyzer being written here is to print out the traced instructions. There are routines available in the SPIX library (which ships as part of SHADE) to convert the information recorded in the trace records into disassembly.
spix_sparc_iop returns the opcode for the given assembly language instruction. This is fed into the routine spix_sparc_dis32 together with the assembly language instruction and its address. This routine writes the disassembly of the instruction into a buffer. If the instruction is successfully disassembled, it is printed out.
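To give a feel for what a disassembler does with an instruction word, the sketch below decodes the fields of a SPARC format 3 instruction and prints a register-to-register add. It does not use or imitate the SPIX routines; demo_decode, demo_dis and demo_fields are hypothetical names, and only this one instruction form is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of a SPARC format 3 (arithmetic/logical) instruction word. */
typedef struct {
    unsigned op;    /* bits 31:30 */
    unsigned rd;    /* bits 29:25 */
    unsigned op3;   /* bits 24:19 */
    unsigned rs1;   /* bits 18:14 */
    unsigned i;     /* bit  13    */
    unsigned rs2;   /* bits  4:0  */
} demo_fields;

/* Split a 32-bit instruction word into its format 3 fields. */
demo_fields demo_decode(uint32_t insn)
{
    demo_fields f;
    f.op  = (insn >> 30) & 0x3;
    f.rd  = (insn >> 25) & 0x1f;
    f.op3 = (insn >> 19) & 0x3f;
    f.rs1 = (insn >> 14) & 0x1f;
    f.i   = (insn >> 13) & 0x1;
    f.rs2 = insn & 0x1f;
    return f;
}

/* Write the name of integer register r (0..31) into out, e.g. "%g1". */
static void demo_reg(unsigned r, char out[4])
{
    static const char bank[] = "goli";   /* globals, outs, locals, ins */
    out[0] = '%';
    out[1] = bank[r >> 3];
    out[2] = (char)('0' + (r & 7));
    out[3] = '\0';
}

/* Disassemble "add rs1, rs2, rd" only. Returns the length of the text,
 * or -1 if the word is any other instruction.                          */
int demo_dis(uint32_t insn, char *buf, size_t len)
{
    demo_fields f = demo_decode(insn);
    char rs1[4], rs2[4], rd[4];

    if (f.op != 2 || f.op3 != 0 || f.i != 0)
        return -1;                        /* not a register-register add */

    demo_reg(f.rs1, rs1);
    demo_reg(f.rs2, rs2);
    demo_reg(f.rd, rd);
    return snprintf(buf, len, "add %s, %s, %s", rs1, rs2, rd);
}
```

For the instruction word 0x86004002 this writes "add %g1, %g2, %g3" into the buffer. A full disassembler handles every opcode and the immediate forms as well.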
The two remaining routines are shadeuser_report and shadeuser_terminate. shadeuser_report is called either when all the traced applications have completed running, or when a signal is received by SHADE. This routine should produce a report of the results so far. In this particular case there is no report to produce, as the output is produced during the run. Clean up after the application terminates is done in the shadeuser_terminate routine. The return value from this routine is used as the return value from the SHADE analyzer.
The complete source code for the example appears at the end of this article. The build instructions are as follows:
$ cc -I/export/home/shade-32/inc -xO2 -DSPIX_ADDR=64 -o trace \
Here is the kind of output that this analyzer produces:
% trace -- ls
There is a weakness in this analyzer, as written. For performance reasons, the analyzer gets a block of records from the SHADE library. If the application fails, then SHADE may not return all the instructions up until the failing instruction. If it is important to return the trace up until the point of failure, then the analyzer can be modified to return just one instruction at a time.
Tracing the execution of a program is a relatively simple task, but it does demonstrate the fact that we can see the exact sequence of instructions being executed. It is also possible to retrieve information from the SHADE library about the effective address of a load or store instruction, or even what data is fetched from or stored to memory.
Here is the complete example:
#define SPIX_ADDR 64