How to Optimize the Parallel Performance of Applications

Oracle Solaris Studio 12.3

by Darryl Gove, December 2011


This article describes techniques for using Oracle Solaris Studio software to generate parallel applications, including automatic parallelization, support for OpenMP directives, and support for the POSIX threads API.




Introduction

Whatever the execution environment, as you exploit parallelism you must ensure that your code is correct and produces predictable results. In addition to the compiler features for generating parallel code, this article describes how to use the Oracle Solaris Studio Thread Analyzer to check parallel code for correctness.


Most processors today, SPARC and x86 alike, are equipped with multiple cores and can run multiple simultaneous threads of execution. Many systems also contain several multicore processors. To derive as much value and performance as possible from these platforms, your applications need to take advantage of the multiple cores and threads of execution.

The Oracle Solaris operating system provides an efficient, scalable threading model and a smart scheduler that delivers processor resources to applications across a variety of deployment options:

  • Virtualization systems, such as Oracle VM Server for x86 and Oracle VM Server for SPARC, let multiple operating system instances share a single physical system.
  • Oracle Solaris Containers allow multiple isolated execution environments within a single operating system instance.
  • Threaded applications can take advantage of multiple cores on multicore processors and multisocket systems.

Using Automatic Parallelization

Much existing code was written without assuming parallel threads of execution. Oracle Solaris Studio compilers can parallelize such code automatically, letting an application run on multiple threads without requiring you to specify how. Loops in particular are often good opportunities: when the iterations of a loop are independent of one another, work that previously ran serially can be divided among multiple threads.

You can use the following compiler flags with Oracle Solaris Studio compilers to govern automatic parallelization behavior:

  • Use the -xautopar compiler flag to instruct the compiler to look for loops that can be safely parallelized in the code.
  • Use the -xloopinfo compiler flag to generate information about the loops the compiler has parallelized.
  • Use the -xreduction compiler flag, together with -xautopar, to find and parallelize reduction operations that take a range of values and output a single value, such as summing all the values in an array.

Also, set the OMP_NUM_THREADS environment variable at runtime to control the number of threads for code that is parallelized using automatic parallelization or the OpenMP compiler flags.

Using OpenMP Compiler Flags

Support for OpenMP 3.1 in Oracle Solaris Studio means that the compilers can recognize OpenMP directives (pragmas) in the source code and use them to build a parallel version of the application. As with automatic parallelization, the compiler generates the threading code, so you don't have to manage the threads yourself.

OpenMP is an incremental approach to parallelization with potentially fine granularity. You can place directives around specific loops to be parallelized while leaving other loops untouched. Another distinct advantage of this approach is that you can build both a serial and a parallel version of the application from exactly the same code base, which can be helpful for debugging.

Use the following OpenMP-related compiler flags with Oracle Solaris Studio:

  • Enable OpenMP by using the -xopenmp compiler flag. OpenMP directives are recognized only when this flag is used.
  • Use the -xvpara compiler flag to report potential parallelization issues.
  • Use the -xloopinfo compiler flag to direct the compiler to provide the details of which loops were parallelized.

As with automatic parallelization, set the OMP_NUM_THREADS environment variable at runtime to control the desired number of threads for code that is parallelized using OpenMP compiler flags or automatic parallelization. The default number of threads is two.

Support for POSIX Threads

By programming to the POSIX threads API, you have complete control over how your application creates and uses threads. The POSIX threads (or Pthreads) specification is a POSIX standard that defines a thread API as a set of C programming language types, functions, and constants. Oracle Solaris Studio compilers support the POSIX threads programming model.

For information about the Pthreads API, see the pthread.h(3HEAD) man page in man pages section 3: Library Interfaces and Headers, part of the Oracle Solaris 11 reference manuals.

Using the Thread Analyzer

The Oracle Solaris Studio Thread Analyzer is designed to help ensure multithreaded application correctness. Specifically, the Thread Analyzer can help detect, analyze, and debug the following situations, which can arise in multithreaded applications:

  • Data races can cause incorrect or unpredictable results, and the race itself can occur arbitrarily far away from where the problem appears. A data race occurs when all of the following conditions hold:
    • Two or more threads in a single process concurrently access the same memory location.
    • At least one of the threads is accessing the memory location for writing.
    • The threads are not using any exclusive locks to control their accesses to that memory.
  • Deadlock conditions occur when one thread is blocked waiting on a resource held by a second thread, while the second thread is blocked waiting on a resource held by the first (or an equivalent situation with more threads involved).

To detect data races and deadlocks, compile the code, run it under the control of the collect command, and load the resulting experiment into the Thread Analyzer, as follows:

  1. Compile the application with the -xinstrument=datarace compiler flag. It is also recommended that you set the -g flag and do not specify an optimization level, so that line number and call stack information is reported correctly.

    Or, if you have existing binaries that have been compiled with the Oracle Solaris Studio 12.3 compiler, they can be instrumented using the command discover -i datarace -o a.out.instrumented a.out.

  2. Use the collect -r all command to run the resulting application and create an experiment that detects both data races and deadlocks during execution. The resulting experiment is named tha.1.er.

    % collect -r all <app> <params>

    Alternatively, use the following command to run the application and create only a data race detection experiment:

    % collect -r race <app> <params>
    

    Or use the following command to run the application code and create only a deadlock detection experiment:

    % collect -r deadlock <app> <params>
    
  3. Finally, use the command tha tha.1.er to load the experiment into the Thread Analyzer and identify the data race and deadlock conditions. Figure 1 shows the Races tab of the Thread Analyzer.

Figure 1. Data race conditions can be identified through the Thread Analyzer.

The Thread Analyzer can also help identify individual lines of source code that are associated with race conditions (Figure 2).

Figure 2. Individual lines of source code associated with data race conditions can be identified using the Thread Analyzer.

For More Information

For an exhaustive description of compiler flags and options, see the complete Oracle Solaris Studio product documentation at http://oracle.com/technetwork/server-storage/solarisstudio/documentation/oss123-docs-1357739.html.


Revision 1.0, 12/13/2011
