OpenMP Support in Sun Studio Compilers and Tools
By Nawal Copty, Scalable Systems Group, Sun Microsystems, December 13, 2005
OpenMP is a specification for a set of compiler directives, library routines, and environment variables that can be used to express multi-threaded shared-memory parallelism in C, C++, and Fortran programs. OpenMP is fast becoming the standard paradigm for parallelizing applications. With a relatively small amount of coding effort, programmers can obtain scalable performance for their applications on a shared-memory multi-processor system.
This paper presents an overview of the OpenMP model of computation, and describes OpenMP support in the Sun Studio compilers and tools. In addition, the paper reports on the performance of the SPEC OMP2001 benchmarks and outlines directions for future work.
- Compiler directives,
- Runtime library routines, and
- Environment variables
#pragma, !$omp, c$omp, *$omp
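The three components can be seen side by side in a minimal C sketch. It assumes a compiler with OpenMP enabled (for example, with -xopenmp); the helper name team_size_report is hypothetical, and the _OPENMP guards are an added convenience so that the same file also builds serially.

```c
#include <stdio.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns the number of threads that executed the
   parallel region and prints the environment setting that influences it. */
int team_size_report(void)
{
    int team_size = 1;  /* value used when compiled without OpenMP */

    /* Environment variable: OMP_NUM_THREADS sets the default team size. */
    const char *env = getenv("OMP_NUM_THREADS");
    printf("OMP_NUM_THREADS = %s\n", env ? env : "(unset)");

    /* Compiler directive: the block below is executed by every thread. */
    #pragma omp parallel
    {
        /* Only the master thread writes the shared result. */
        #pragma omp master
        {
#ifdef _OPENMP
            /* Runtime library routine: query the size of the current team. */
            team_size = omp_get_num_threads();
#endif
        }
    }
    return team_size;
}
```

Calling team_size_report() from main with OMP_NUM_THREADS set to different values shows the environment variable changing the result without recompiling.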
The OpenMP Execution Model
omp_set_num_threads
Parallel, DO/for, Sections
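The fork-join behavior behind omp_set_num_threads and the Parallel construct can be sketched in a few lines of C. This is an illustration under assumptions: the helper name count_team_members is hypothetical, and the runtime is free to supply fewer threads than requested.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: forks a team of at most `requested` threads and
   returns how many threads actually took part in the parallel region. */
int count_team_members(int requested)
{
    int members = 0;

#ifdef _OPENMP
    omp_set_num_threads(requested);  /* request a team size for the next region */
#else
    (void)requested;                 /* serial build: a single thread runs the block */
#endif

    /* Fork: the master thread creates a team; every member runs the block. */
    #pragma omp parallel
    {
        /* Each thread counts itself exactly once. */
        #pragma omp atomic
        members++;
    }
    /* Join: after the implicit barrier only the master thread continues. */

    return members;
}
```

A call such as count_team_members(4) returns a value between 1 and 4, depending on how many threads the runtime makes available.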
OpenMP Support in Sun Studio Software
-xopenmp
Compiler Support
__mt_MasterFunction_, __mf_par_001, n, id, __mt_WorkSharing_
Automatic Parallelization
-xautopar
Runtime Library Support
__mt_MasterFunction_, __mt_EndOfTask_Barrier_, pthread_create, slave_startup_function, omp_set_nested
Tools Support
Automatic Scoping of Variables
Static Error Checking
-vpara, -xvpara, -XlistMP
Runtime Error Checking
OpenMP Debugging
-xopenmp=noopt -g
Performance Analysis
Compiler Commentary
-g, er_src
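The names __mf_par_001 and __mt_MasterFunction_ point to the common outlining scheme, in which the body of a parallel region is moved into a separate function that a runtime entry point then hands to the threads. The sketch below imitates that idea in plain C and should be read as an assumption, not as Sun's implementation: outlined_body, fake_master_function, and share_args are hypothetical names, the signatures of the real routines are not given here, and the threads are simulated by a serial loop.

```c
/* Hypothetical argument block: carries the shared variables that the
   outlined function needs from the enclosing scope. */
struct share_args {
    const float *a;
    const float *b;
    float *c;
    int n;
};

/* Hypothetical outlined body of a parallel loop c[i] = a[i] + b[i].
   Each thread id works on its own contiguous chunk of iterations. */
static void outlined_body(void *arg, int id, int nthreads)
{
    struct share_args *s = (struct share_args *)arg;
    int chunk = (s->n + nthreads - 1) / nthreads;  /* ceiling division */
    int lo = id * chunk;
    int hi = (lo + chunk < s->n) ? lo + chunk : s->n;
    int i;                                         /* private to this call */

    for (i = lo; i < hi; i++)
        s->c[i] = s->a[i] + s->b[i];
}

/* Hypothetical stand-in for the runtime entry point. A real runtime would
   dispatch the body to worker threads; this one calls it once per id. */
static void fake_master_function(void (*body)(void *, int, int),
                                 void *arg, int nthreads)
{
    int id;
    for (id = 0; id < nthreads; id++)
        body(arg, id, nthreads);
}

/* Roughly what the transformed caller could look like after outlining. */
void vector_add_outlined(const float *a, const float *b, float *c,
                         int n, int nthreads)
{
    struct share_args s = { a, b, c, n };
    fake_master_function(outlined_body, &s, nthreads);
}
```

Passing the shared variables through one argument block keeps the outlined function's signature fixed, which is one plausible reason a runtime would adopt this shape.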
| Name | Application Area | Language |
| 311.wupwise | Quantum chromodynamics | Fortran |
| 313.swim | Shallow water modeling | Fortran |
| 315.mgrid | Multi-grid solver | Fortran |
| 317.applu | Partial differential equations | Fortran |
| 321.equake | Earthquake modeling | C |
| 325.apsi | Air pollutants | Fortran |
| 327.gafort | Genetic algorithm | Fortran |
| 329.fma3d | Crash simulation | Fortran |
| 331.art | Neural network simulation | C |
| 318.galgel | Fluid dynamics analysis | Fortran |
| 332.ammp | Computational chemistry | C |
Table 1: Applications in the SPEC OMP2001 Benchmark
- In June 2003, Sun Microsystems announced a world record for the SPEC OMPL2001 suite (peak performance) on a Sun Fire 15K server configured with 72 UltraSPARC III Cu processors. The Sun Fire 15K server was the first server to break the 200,000 mark, with a score of 213,466.
- In February 2004, Sun Microsystems established a new world record for the SPEC OMPL2001 suite (peak performance) on a Sun Fire E25K server configured with 72 UltraSPARC IV processors. The Sun Fire E25K system achieved a SPEC OMPL2001 peak score of 316,182.
- In March 2005, Sun Microsystems announced new world-record SPEC OMPM2001 results in the two- and four-thread categories. The peak result of 12,434 on the Sun Fire V40z server in a four-CPU configuration outperformed the scores reported for other commercially available compilers by up to 43 percent.
Figure 6: Scaling of OMPL2001 (Base) on Sun Fire 6800
Figure 7: Scaling of OMPL2001 (Base) on Sun Fire 15K
- New OpenMP features. Sun continues to track changes in the OpenMP Specification and play an active part in its evolution.
- OpenMP-specific optimizations for improved performance. These include compiler optimizations, such as removal of redundant barriers, as well as optimizations in the runtime library for reducing the overhead of parallelization.
- Enhanced tools support. Sun continues to invest in tools that aid the programmer in writing, debugging, and analyzing the performance of OpenMP programs. These tools include enhanced automatic scoping for OpenMP programs, data race detection, and interactive tools to assist programmers in parallelizing their applications.
- Architectural support. Sun continues to enhance the performance of its OpenMP implementation. Work in this area includes improving performance on Non-Uniform Memory Access (NUMA) machines and on Chip Multi-threading (CMT) architectures.
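To make "removal of redundant barriers" concrete, the following C sketch shows a barrier that a programmer can remove by hand with the nowait clause. It only illustrates what a redundant barrier is and does not describe how the Sun compiler detects one; the helper name fill_two_arrays is hypothetical.

```c
/* Hypothetical helper: fills two independent arrays in one parallel region. */
void fill_two_arrays(int *x, int *y, int n)
{
    int i;

    #pragma omp parallel private(i)
    {
        /* The second loop never touches x, so the implicit barrier that
           normally ends this loop is redundant; nowait removes it. */
        #pragma omp for nowait
        for (i = 0; i < n; i++)
            x[i] = 2 * i;

        /* This loop keeps its implicit barrier, so both arrays are
           complete before the parallel region ends. */
        #pragma omp for
        for (i = 0; i < n; i++)
            y[i] = i * i;
    }
}
```

Because the two loops write to different arrays, threads that finish their share of the first loop can start the second one immediately.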
APPENDIX A.1: Parallel Directive Example
omp_set_num_threads, omp_set_dynamic
Fortran – PARALLEL Directive Example:
      PROGRAM HELLO
      USE OMP_LIB
      INTEGER TID
      CALL OMP_SET_DYNAMIC (.FALSE.)
      CALL OMP_SET_NUM_THREADS (10)
!$OMP PARALLEL PRIVATE (TID)
      ! Obtain thread ID.
      TID = OMP_GET_THREAD_NUM()
      ! Print thread ID.
      PRINT *, 'Hello World from thread = ', TID
!$OMP END PARALLEL
      END
C/C++ – PARALLEL Directive Example:
#include <stdio.h>
#include <omp.h>

int main(void)
{
    int tid;

    omp_set_dynamic(0);
    omp_set_num_threads(10);

    #pragma omp parallel private(tid)
    {
        /* Obtain thread ID. */
        tid = omp_get_thread_num();
        /* Print thread ID. */
        printf ("Hello World from thread = %d\n", tid);
    }
}
APPENDIX A.2: DO/for Directive Example
Fortran – DO Directive Example:
      PROGRAM VECTOR_ADD
      USE OMP_LIB
      PARAMETER (N=100)
      INTEGER N, I
      REAL A(N), B(N), C(N)
      CALL OMP_SET_DYNAMIC (.FALSE.)
      CALL OMP_SET_NUM_THREADS (20)
      ! Initialize arrays A and B.
      DO I = 1, N
         A(I) = I * 1.0
         B(I) = I * 2.0
      ENDDO
      ! Compute values of array C in parallel.
!$OMP PARALLEL SHARED(A, B, C), PRIVATE(I)
!$OMP DO
      DO I = 1, N
         C(I) = A(I) + B(I)
      ENDDO
!$OMP END PARALLEL
      PRINT *, C(10)
      END
C/C++ – For Directive Example:
#include <stdio.h>
#include <omp.h>

#define N 100

int main(void)
{
    float a[N], b[N], c[N];
    int i;

    omp_set_dynamic(0);
    omp_set_num_threads(20);

    /* Initialize arrays a and b. */
    for (i = 0; i < N; i++)
    {
        a[i] = i * 1.0;
        b[i] = i * 2.0;
    }

    /* Compute values of array c in parallel. */
    #pragma omp parallel shared(a, b, c) private(i)
    {
        #pragma omp for
        for (i = 0; i < N; i++)
            c[i] = a[i] + b[i];
    }

    printf ("%f\n", c[10]);
}
APPENDIX A.3: Parallel Sections Example
Fortran – SECTIONS Directive Example:
      PROGRAM SECTIONS
      USE OMP_LIB
      INTEGER SQUARE
      INTEGER X, Y, Z, XS, YS, ZS
      CALL OMP_SET_DYNAMIC (.FALSE.)
      CALL OMP_SET_NUM_THREADS (3)
      X = 2
      Y = 3
      Z = 5
!$OMP PARALLEL
!$OMP SECTIONS
!$OMP SECTION
      XS = SQUARE(X)
      PRINT *, "ID = ", OMP_GET_THREAD_NUM(), "XS =", XS
!$OMP SECTION
      YS = SQUARE(Y)
      PRINT *, "ID = ", OMP_GET_THREAD_NUM(), "YS =", YS
!$OMP SECTION
      ZS = SQUARE(Z)
      PRINT *, "ID = ", OMP_GET_THREAD_NUM(), "ZS =", ZS
!$OMP END SECTIONS
!$OMP END PARALLEL
      END

      INTEGER FUNCTION SQUARE(N)
      INTEGER N
      SQUARE = N*N
      END
C/C++ – SECTIONS Directive Example:
#include <stdio.h>
#include <omp.h>

int square(int n);

int main(void)
{
    int x, y, z, xs, ys, zs;

    omp_set_dynamic(0);
    omp_set_num_threads(3);

    x = 2;
    y = 3;
    z = 5;

    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                xs = square(x);
                printf ("id = %d, xs = %d\n", omp_get_thread_num(), xs);
            }
            #pragma omp section
            {
                ys = square(y);
                printf ("id = %d, ys = %d\n", omp_get_thread_num(), ys);
            }
            #pragma omp section
            {
                zs = square(z);
                printf ("id = %d, zs = %d\n", omp_get_thread_num(), zs);
            }
        }
    }
}

int square(int n)
{
    return n*n;
}
Nawal Copty is a staff engineer in the Scalable Systems Group, and OpenMP project lead.