The Sun Studio Binary Code Optimizer

   
By Sheldon Lobo, Compiler Technology And Performance Engineering, Sun Microsystems, November 30, 2005  

Improving binary performance is a frequent request from customers. These requests usually come from end customers of Sun systems or even performance, benchmarking and production groups of large independent software vendors (ISVs). The common theme is the non-availability of the original source code. Without a re-compile, it is usually a hard, time-consuming and costly endeavor to meaningfully improve binary performance. Sometimes system tweaks to a non-optimized system will do the trick, but often a complete system upgrade is necessary.

The Binary Optimizer is a tool that improves binary performance without the need for system changes or upgrades. It modifies the binary, rewriting its instructions to produce better-optimized code. It can also instrument the binary for profile collection. When data from such a profile training run is fed back to the Binary Optimizer, significant performance improvements may be achieved. This is especially true for binaries that were not built with high levels of optimization, were built without profile data, or were built with profile data that is not representative of the end customer's unique workload.

What is the Binary Optimizer?





A Quick-Start Guide


  • The binary must be compiled and linked with optimizations (-O or -xOn) and a special compiler option, -xbinopt=prepare.
  • The resulting binary should be instrumented for profile collection using the -binstrument flag.
  • Run the application with one or more representative workloads.
  • Optimize the binary with this profile data via the -buse flag.

% cc -O -xbinopt=prepare -o myapp *.c
% binopt -binstrument myapp
% myapp < input_data
% binopt -buse myapp

Why Use the Binary Optimizer?

The global optimizations performed by the Binary Optimizer usually show greater performance improvements on large applications. We see the following potential users of binary optimization technology:

End Users on SPARC Platforms:

Experienced users of Sun systems (for example, database administrators) are often looking for ways to improve binary performance. For such users, ready to go that extra mile to tune binaries they receive from software vendors, the Binary Optimizer is an ideal tool. And, it is available for free as part of the Sun Studio toolkit.

For the software vendor, the necessary step to follow is:

  • The vendor ships a binary, app, built with the -xbinopt=prepare flag:
    % cc -O -xbinopt=prepare -o app *.c

For the end user, the following steps will optimize the binary for their specific workload:
  • Instrument the binary, using binopt.
  • Run the instrumented binary on a representative workload.
  • Use binopt again to optimize the binary with the collected runtime data.

For example, the end user optimizes app using binopt:

% binopt -binstrument -bdata=datafile -o app.instr app
% app.instr < input_data
% binopt -buse -bdata=datafile -o app.opt app

Software Vendors:

The Binary Optimizer performs optimizations that are not normally performed by the compiler. Hence by including the Binary Optimizer in the build process, a better performing production binary may be obtained.

The steps necessary to create a production binary with the Binary Optimizer are:

  • Compile the application with the -xbinopt=prepare flag.
  • Instrument the resulting binary for profile collection using the -binstrument flag of binopt.
  • Run the application with one or more representative workloads.
  • Optimize the binary using the collected profile data and the binopt -buse flag.

Example:

% cc -xO4 -xbinopt=prepare -o app *.c
% binopt -binstrument -bdata=datafile -o app.instr app
% app.instr < input_data
% binopt -buse -bdata=datafile -o app.opt app

Note that if you already use profile feedback (the -xprofile=collect and -xprofile=use compiler flags) to build the application, it may be easier to use the -xlinkopt compiler flag in the build, rather than the Binary Optimizer, to obtain similar optimizations.

Performance With binopt

We see significant performance improvements on large applications when the Binary Optimizer is used. This is especially true for applications that are not built with profile feedback or are built with feedback that does not truly represent the end customer's workload. In these situations, a 10% or more performance gain is not unheard of.

The user must also be aware that using the Binary Optimizer causes an increase in size of the binary. This is due to the fact that optimized code is cached in a new segment in the binary. On large applications, an increase in size of up to 1.8x is seen.
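To see how much a particular binary actually grew, the file sizes before and after optimization can be compared. The sketch below assumes a POSIX shell with wc and awk available. size_ratio is a hypothetical helper name, not part of binopt, and app and app.opt are the placeholder file names used elsewhere in this article.

```sh
#!/bin/sh
# Sketch: print how many times larger the optimized binary is than the
# original. size_ratio is a hypothetical helper, not a Sun Studio command.
size_ratio() {
    [ -f "$1" ] && [ -f "$2" ] || {
        echo "usage: size_ratio ORIGINAL OPTIMIZED" >&2
        return 2
    }
    # wc -c may pad its output with blanks on some systems, so strip them.
    orig_bytes=$(wc -c < "$1" | tr -d ' ')
    new_bytes=$(wc -c < "$2" | tr -d ' ')
    # Refuse to divide by zero if the original file is empty.
    [ "$orig_bytes" -gt 0 ] || { echo "empty file: $1" >&2; return 1; }
    awk -v a="$orig_bytes" -v b="$new_bytes" 'BEGIN { printf "%.2f\n", b / a }'
}
```

Calling size_ratio app app.opt prints the growth factor as a decimal, which can then be compared with the 1.8x figure quoted above.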

The Binary Optimizer runtime is usually a fraction of the build time of the entire binary. For large applications, where the build time is usually several hours, binopt runtime can be measured in minutes. For example, building a well known database application from source takes over 5 hours. Performing binary optimizations on the resultant binary takes 8 minutes.

Optimization Levels

The -blevel=1 optimization level is the default level of optimization for binopt(1). At this level, code ordering and control flow optimizations are performed. While ordering code, functions may be split to optimize I-cache performance.

At the -blevel=2 optimization level, data-flow information is constructed and more aggressive optimizations are performed. These include inlining, address simplification and load instruction optimizations. Usually better performance is derived from using this higher level of optimization. The tradeoff is an increased binopt runtime.

At -blevel=0, no optimizations are performed.
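The level is chosen on the binopt command line. The sketch below assumes that -blevel can be combined with -buse and -bdata in a single invocation, which this article implies but does not show. optimize_at_level and the app.opt<level> output naming are hypothetical.

```sh
#!/bin/sh
# Sketch: optimize a prepared binary at a chosen -blevel (0, 1, or 2).
# Assumption: -blevel may be given together with -buse and -bdata.
optimize_at_level() {
    level=$1
    app=$2
    data=$3
    case $level in
        0|1|2) ;;                 # the three levels described above
        *) echo "unsupported -blevel: $level" >&2; return 2 ;;
    esac
    binopt -buse -bdata="$data" -blevel="$level" -o "$app.opt$level" "$app"
}
```

Running optimize_at_level 1 app datafile and optimize_at_level 2 app datafile would produce two candidate binaries whose performance and binopt runtime can be compared directly.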

Profile Instrumentation

Collecting and using a profile of the execution characteristics of a binary is crucial to making effective use of the Binary Optimizer. Instrumenting a binary and executing a training run to collect the data is relatively easy when using this tool. A single command line instruments the binary. The instrumented binary may be freely copied to a potentially different run machine – it is self contained, and no dependencies need to be maintained. After the training run is complete, a file containing the profile data is created. Accumulation of profile data from multiple training runs is another useful feature – the user just needs to specify the pre-existing data file on the -binstrument command line.
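As a sketch of accumulating a profile over several workloads, the script below instruments the binary once, runs it on each input file, and then optimizes with the combined data. It assumes that every run of the instrumented binary adds to the data file named by -bdata, as the description above suggests. train_accumulate and the workload file names are hypothetical.

```sh
#!/bin/sh
# Sketch: one instrumentation step, several training runs, one optimization.
# Assumption: each training run accumulates into the same -bdata file.
train_accumulate() {
    app=$1
    data=$2
    shift 2                        # remaining arguments are workload inputs
    binopt -binstrument -bdata="$data" -o "$app.instr" "$app" || return 1
    for input in "$@"; do
        "./$app.instr" < "$input" || return 1
    done
    binopt -buse -bdata="$data" -o "$app.opt" "$app"
}
```

For example, train_accumulate app datafile workload1 workload2 would feed two workloads into a single profile before producing app.opt.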

When collecting a profile for an application that consists of several executables and/or shared objects, every binary for which optimization is planned needs to be instrumented. In the example below, the executable app has a dependency on the shared object x.so. As demonstrated, both binaries need to be instrumented and optimized separately.

% binopt -binstrument -bdata=app.data -o app.instr app
% binopt -binstrument -bdata=x.so.data -o x.so x.so
% app.instr < input_data
% binopt -buse -bdata=app.data -o app.opt app
% binopt -buse -bdata=x.so.data -o x.so x.so

 

Debugging

The Binary Optimizer maintains full compatibility with tools that statically or dynamically examine a binary (analyzer(1), dbx(1), pstack(1), etc.). The symbol tables are updated to reflect all transformations. Mangled symbol names are often assigned (see the example below), which are automatically de-mangled when displayed by the Studio tools.

If the prepared binary was built for debugging (with the -g compiler option), debugging information is automatically propagated to the binary, instead of leaving it in the object file by default. When such a binary is optimized by binopt, the available debugging information is updated to reflect the transformations performed.

Example

Here is a small example to help understand how the Binary Optimizer transforms the binary.

In the code below, there are three functions main(), add() and sub(). The frequently executed parts of the code are denoted by the red rectangles, while the less frequently executed code is colored green. The layout of the optimized binary is shown on the right hand side. Here are some of the characteristics of the new binary:

  • The optimized code is placed in a new segment of the binary (named “Optimized code” in Figure 1 below).
  • Functions may be split while laying out code. Here, function main() is split, and the hot fragment that is not the entry point is given the mangled name _$o1cexhO0.main().
  • The original functions are given new mangled names (_$r1.main(), _$b1.add(), _$b1.sub()).

Figure 1: Typical code layout from the binary optimizer

 

Additional Details

-xbinopt=prepare Considerations:

The -xbinopt=prepare compiler flag, when used to build a binary, adds certain information to the binary that allows it to be transformed by the Binary Optimizer. This information describes the location of the executable code, points out control flow structures like function boundaries and switch tables, and provides data flow information about the code. This data is stored in a new ELF section named .annotate. This additional information in the binary results in a 5% increase in size, on average. There is no noticeable build time impact when this flag is used.

In addition, prepared binaries built for debugging (with the -g compiler option) have an additional size increase due to the presence of debugging information. On average we see a 50% increase in binary file size when compared to a debuggable binary built without the -xbinopt=prepare option.
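Because the preparation data lives in the .annotate section, listing a binary's section headers is one way to check whether it was built with -xbinopt=prepare. The sketch below assumes the Solaris elfdump(1) utility, whose -c option dumps section headers. is_prepared is a hypothetical helper, and matching the section name in the output text is an assumption about the output format.

```sh
#!/bin/sh
# Sketch: succeed if the binary still carries a .annotate section.
# Assumption: elfdump -c prints each section name in its header listing.
is_prepared() {
    elfdump -c "$1" 2>/dev/null | grep -q '\.annotate'
}
```

A build script might use it as: is_prepared app || echo "app was not built with -xbinopt=prepare".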

Profile Instrumentation (-binstrument) Considerations:

While doing a training run to collect binary profile information, the user will notice a slowdown in application performance. This is to be expected since there is an overhead associated with recording the execution count profile of the executable code. Usually we see a 2.5 to 3x slowdown in application performance.

There is also an increase in binary file size associated with adding instrumentation code. We usually see a 2.5x increase in binary size due to profile instrumentation.
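These typical factors can be turned into a rough budget for a training run. The sketch below simply applies the 2.5x size factor and the 2.5x to 3x slowdown quoted above using integer shell arithmetic. training_budget is a hypothetical helper, and the results are estimates only, since actual overhead varies by application.

```sh
#!/bin/sh
# Sketch: estimate instrumented binary size and training-run time from the
# typical factors quoted in the text (2.5x size, 2.5x to 3x slowdown).
training_budget() {
    size_bytes=$1                          # size of the original binary
    run_seconds=$2                         # normal runtime of the workload
    instr_size=$((size_bytes * 5 / 2))     # about 2.5x the original size
    slow_low=$((run_seconds * 5 / 2))      # 2.5x slowdown
    slow_high=$((run_seconds * 3))         # 3x slowdown
    echo "instrumented size: about $instr_size bytes"
    echo "training run: about $slow_low to $slow_high seconds"
}
```

For a 1000-byte binary whose workload normally takes 60 seconds, training_budget 1000 60 reports about 2500 bytes and about 150 to 180 seconds.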

-bfinal Usage:

As mentioned above, a binary that may be optimized by the Binary Optimizer must be prepared using the -xbinopt=prepare compiler flag. This results in additional information being placed in an ELF section in the binary. When creating a final binary that is to be deployed on the run systems, and on which no future optimizations are planned, the -bfinal option may be used to strip the -xbinopt=prepare information from the resultant binary. This flag may be used to prevent users of the binary from making any further modifications to it. For example:

% cc -xO4 -xbinopt=prepare -o app *.c
% binopt -binstrument -bdata=datafile -o app.instr app
% app.instr < input_data
% binopt -buse -bdata=datafile -bfinal -o app.opt app

Handling Modules Not Built With -xbinopt=prepare

If the binary contains a combination of legacy code and newly created code (built with -xbinopt=prepare), the Binary Optimizer may still be gainfully employed. The Binary Optimizer optimizes only that code that was built with the -xbinopt=prepare compiler option, leaving the legacy code untouched.
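As a sketch of such a mixed build, new sources are compiled with -xbinopt=prepare and then linked together with an existing object file. build_mixed is a hypothetical helper, legacy.o stands for any object compiled without the flag, and the script assumes source file names that contain no spaces.

```sh
#!/bin/sh
# Sketch: compile new sources with -xbinopt=prepare, then link them with a
# legacy object that was built without it.
build_mixed() {
    out=$1
    legacy_obj=$2
    shift 2                        # remaining arguments are new C sources
    objs=
    for src in "$@"; do
        cc -O -xbinopt=prepare -c "$src" || return 1
        # cc -c writes the object file into the current directory.
        objs="$objs $(basename "$src" .c).o"
    done
    # $objs is left unquoted on purpose so it splits into one word per object.
    cc -O -xbinopt=prepare -o "$out" $objs "$legacy_obj"
}
```

With a build such as build_mixed app legacy.o new1.c new2.c, binopt would then be expected to optimize only the code that came from new1.c and new2.c.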

Conflicts

The Binary Optimizer has some restrictions.

It will not optimize binaries built as follows:

  • With the -xprofile=collect compiler option.
  • With the -xlinkopt compiler option.
  • With the -s compiler option or stripped using the strip(1) tool.

Within a binary that it can otherwise optimize, binopt skips the following code:

  • Code compiled with the -xF compiler option.
  • The template code portion of a C++ application.

The Binary Optimizer also does not optimize those parts of the executable code in a binary that were derived from assembly language files. As mentioned earlier, code derived from object files compiled without the -xbinopt=prepare flag is not optimized either. On the other hand, the presence of assembly code or legacy object code in a binary does not prevent binopt(1) from optimizing the remainder of the binary.

Future Updates

The Sun Studio 11 release includes the Binary Optimizer, binopt. Updates in future releases may include new functionality and more optimizations. Also, several of the restrictions and conflicts will be addressed. Stay tuned!

 
About the Author
Sheldon Lobo is a staff engineer in the SPARC compiler backend team. He works primarily on developing Sun's object and binary file optimization and analysis tools.
 

(Last updated December 1, 2005)
 