Tip: How Many Threads Does It Take?

   
Sometimes we can observe an OpenMP program using a different number of threads each time it is run. Why does that happen?

For example, here is a program that appears in the OpenMP User's Guide to demonstrate nested parallelism. In it, teams of more than one thread execute nested parallel regions:

#include <omp.h>
#include <stdio.h>

/* One thread of each team reports the size of that team. */
void report_num_threads(int level)
{
        #pragma omp single
        {
                printf("Level %d: number of threads in the team - %d\n",
                        level, omp_get_num_threads());
        }
}

int main(void)
{
        omp_set_dynamic(0);     /* do not let the runtime adjust team sizes */
        #pragma omp parallel num_threads(2)
        {
                report_num_threads(1);
                #pragma omp parallel num_threads(2)
                {
                        report_num_threads(2);
                        #pragma omp parallel num_threads(2)
                        {
                                report_num_threads(3);
                        }
                }
        }
        return 0;
}

Compiling and running this program with nested parallelism enabled produces the following output:

% setenv OMP_NESTED TRUE
% a.out
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2

At level 1, a team of two threads is created. Each of those threads then forms its own team of two at level 2, and so on.

Compare this with the result of running the same program with nested parallelism disabled:

% setenv OMP_NESTED FALSE
% a.out
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 1
Level 3: number of threads in the team - 1
Level 2: number of threads in the team - 1
Level 3: number of threads in the team - 1
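
The number of lines in the two runs above follows from how many teams exist at each level, because the single construct makes exactly one thread per team print. Here is a minimal sketch of that counting. It assumes every region requests the same team size and, when nesting is enabled, receives it. The function teams_at_level is a hypothetical helper for illustration, not part of OpenMP.

```c
/* Number of teams at a given nesting level (1-based) when every parallel
 * region requests team_size threads. Hypothetical helper, illustration only.
 *
 * nested != 0: every thread of a team starts its own inner team, so the
 *              count multiplies by team_size at each level.
 * nested == 0: only the outermost region forks; each of its threads then
 *              runs every inner region as a team of one. */
static long teams_at_level(int level, int team_size, int nested)
{
    long teams = 1;

    if (level <= 1)
        return 1;               /* a single outermost team */
    if (!nested)
        return team_size;       /* one serialized team per outer thread */
    for (int i = 1; i < level; i++)
        teams *= team_size;
    return teams;
}
```

With a team size of 2, this gives 1, 2, and 4 teams at levels 1 to 3 when nesting is enabled (seven lines of output) and 1, 2, and 2 teams when it is disabled (five lines).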

 

The User's Guide goes on to demonstrate how setting the SUNW_MP_MAX_POOL_THREADS environment variable controls the number of threads in the pool:

The thread pool consists of only non-user threads that the runtime library creates. It does not include the master thread or any thread created explicitly by the user's program. If this environment variable is set to zero, the thread pool will be empty and all parallel regions will be executed by one thread.

The following example shows that a parallel region can get fewer threads if there are not sufficient threads in the pool. The code is the same as above. The number of threads needed for all the parallel regions to be active at the same time is eight. The pool needs to contain at least seven idle threads. If we set SUNW_MP_MAX_POOL_THREADS to 5, two of the four inner-most parallel regions may not be able to get all the slave threads they ask for. One possible result is shown below.

% setenv OMP_NESTED TRUE
% setenv SUNW_MP_MAX_POOL_THREADS 5
% a.out
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 1
Level 3: number of threads in the team - 1
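
The figures quoted from the guide (eight threads in total, seven in the pool) can be checked with a little arithmetic. The sketch below assumes every nested region requests the same team size. The function names are hypothetical and exist only for this illustration.

```c
/* Threads required for all regions to be active at once when each of
 * `levels` nested regions requests `team_size` threads. */
static long threads_needed(int team_size, int levels)
{
    long n = 1;

    for (int i = 0; i < levels; i++)
        n *= team_size;
    return n;
}

/* Pool threads required: the master thread is not part of the pool. */
static long pool_threads_needed(int team_size, int levels)
{
    return threads_needed(team_size, levels) - 1;
}

/* How many threads the pool falls short by for a given pool limit
 * (the value given to SUNW_MP_MAX_POOL_THREADS). */
static long pool_shortfall(int pool_max, int team_size, int levels)
{
    long missing = pool_threads_needed(team_size, levels) - pool_max;

    return missing > 0 ? missing : 0;
}
```

For three levels with teams of two, threads_needed returns 8 and pool_threads_needed returns 7. With a pool limit of 5, the shortfall is 2, which matches the two innermost regions that ran with a team of one.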

 

But running the same program again with the same settings may produce the following output:

% a.out
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 1
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2

Note that this run reports seven level 3 threads (2+1+2+2), not the six (2+2+1+1) shown in the previous run. Is this a bug or expected behavior, and how can it be explained?

Well, note that the program can have at most eight (2x2x2) level 3 threads. Depending on how the operating system schedules the threads, a user may see six, seven, or eight level 3 threads.

At level 2, there are four threads: T1, T2, T3, and T4. Each wants to create a parallel region with a team of two threads. The maximum number of threads in this process is six (SUNW_MP_MAX_POOL_THREADS+1), and four of them are already in use at level 2, so only two threads are available as slave threads at level 3.

Suppose T1, T2, T3, and T4 try to acquire slave threads at the same time, and T1 and T2 each get one but T3 and T4 do not. Then there are 2+2+1+1=6 level 3 threads. Now suppose T1 and T2 each get one, and T1 finishes its parallel region and returns its slave thread to the pool just as T3 tries to acquire one. T3 may then get the thread that T1 returned. If T4 does not get one, there are 2+2+2+1=7 level 3 threads. If T4 is also able to get the thread returned by T2, there are 2+2+2+2=8 level 3 threads. Any of these scenarios is possible, depending on the timing of the events and on how the operating system schedules the threads.
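
The three scenarios can be replayed with a small bookkeeping model. It is a deliberately simplified sketch, not how the runtime library is implemented. It assumes each level 2 thread tries once to acquire one slave thread, and that a finished region returns its slave thread to the pool. The event type and the function level3_threads are hypothetical names introduced for this illustration.

```c
#include <stddef.h>

#define LEVEL2_THREADS 4        /* T1..T4, indexed 0..3 */

typedef enum { ACQUIRE, RELEASE } event_kind;

typedef struct {
    event_kind kind;
    int thread;                 /* index of the level 2 thread, 0..3 */
} event;

/* Replays events in order against a pool holding `idle` free threads and
 * returns the total number of level 3 threads. Every level 2 thread is
 * itself a member of its level 3 team; it adds one more if its ACQUIRE
 * succeeds. A RELEASE returns a held thread to the pool. */
static int level3_threads(int idle, const event *ev, size_t n)
{
    int got[LEVEL2_THREADS] = {0};      /* did Ti ever obtain a slave thread? */
    int holding[LEVEL2_THREADS] = {0};  /* does Ti hold one right now? */
    int total = LEVEL2_THREADS;

    for (size_t i = 0; i < n; i++) {
        int t = ev[i].thread;

        if (t < 0 || t >= LEVEL2_THREADS)
            continue;                   /* ignore malformed events */
        if (ev[i].kind == ACQUIRE) {
            if (!got[t] && idle > 0) {
                idle--;
                got[t] = holding[t] = 1;
                total++;
            }
        } else if (holding[t]) {
            holding[t] = 0;
            idle++;
        }
    }
    return total;
}

/* The three orderings described above. */
static const event all_at_once[] = {
    {ACQUIRE, 0}, {ACQUIRE, 1}, {ACQUIRE, 2}, {ACQUIRE, 3}
};
static const event t1_done_early[] = {
    {ACQUIRE, 0}, {ACQUIRE, 1}, {RELEASE, 0}, {ACQUIRE, 2}, {ACQUIRE, 3}
};
static const event t1_t2_done_early[] = {
    {ACQUIRE, 0}, {ACQUIRE, 1}, {RELEASE, 0}, {ACQUIRE, 2},
    {RELEASE, 1}, {ACQUIRE, 3}
};
```

Replaying the three event lists with two idle pool threads yields 6, 7, and 8 level 3 threads respectively. The program and the pool limit are identical in all three cases, and only the order of events changes.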

And that is why the User's Guide uses the phrase "one possible result".


(Page last updated May 3, 2005)
 