The Challenge of Race Conditions in Parallel Programming

   
By Liang T. Chen, Sun Microsystems, July 21, 2006  
This article discusses general and data race condition issues that arise in parallel programming. Data race condition problems are common, while rarer, but harder, general race problems can also occur. Some data race conditions are not harmful and can be permitted in parallel software for performance reasons. Furthermore a race condition might be the symptom of a deeper design problem in the program code. A simple and actual parallel partitioning program is used to illustrate these various race condition issues and how to avoid them.

Introduction

[1]race conditions

datageneral



Partitioning Program Example



partition_main.cpp

1.  // global declaration
2.  #include <stdio.h>
3.  #include <math.h>
4.  #include <pthread.h>
5.  #include "element.h"
6.  #include "container.h"
7.  #define NGRPS 30
8.
9.  int object_count = 0;
10. element* object_array;
11. container group_array[NGRPS];
12. int total_count = 0;
13.
14. void* collect(void* arg)
15. {
16.    int j;
17.
18.    int group_id = *((int *) arg);
19.    int group_count = 0;
20.
21.    attribute group_attribute = get_group_attribute(group_id);
22.    container group_container = group_array[group_id];
23.    for (j = 0; j < object_count; j++) {
24.       element current_object = object_array[j];
25.       if (current_object.collectFlag == true) continue; // this flag is initialized to false
26.       if (current_object.matchAttribute( group_attribute)) {
27.          current_object.collectFlag = true;
28.          group_container.add( current_object);
29.          group_count++;
30.       }
31.    }
32.    total_count += group_count;
33.    return NULL;
34. }
35.
36.
37. int main(int argc, char** argv)
38. {
39.    int i;
40.    pthread_t pids[NTHRS -1];
41.
42.    object_count = process_input_data(argv[1], &object_array);
43.
44.    for (i = 0; i < NTHRS; i++) {
45.       pthread_create(&pids[i], NILL, collect, (void*) &i);
46.    }
47.
48.    if (total_count != object_count) {
49.       printf(" the collected object count %d doesn't match the original object count %d\n",
50.          total_count, object_count);
51.    }
52. }

Data Race Condition Problem

[2][3]data race condition

total_count += group_counttotal_count

[4,5]

Benign Data Race Condition

collect

collect_flagcollect_flag



collect_flag,collect_flag,collect_flag,



General Race Condition

general race condition[6][7,8].

partition_shuffle.cpp
1.  void shuffle_objects( container* source, container* destination, element* target_objects ) {
2.
3.  // remove target objects from source container
4.     mutex_lock();
5.     source.remove_array( target_objects );
6.     mutex_unlock();
7.
8.  // Here is a transitory state which may cause general race condition
9.
10. // add target objects to source container
11.    mutex_lock();
12.    destination.add_array( target_objects);
13.    mutex_unlock();
14. }

remove_arrayadd_array

[9]

partition_shuffle.cpp (continued)

20. void shuffle_components(container* component_source, container* component_destination,
21.    elements* target_components)
22. {
23.
24.    shuffle_objects(component_source, component_destination, target_components);
25. // Here is another transitory state which may cause general race condition
26.    pins* target_pins = get_child_pins(target_components);
27.    container* pin_source = get_pin_container( component_source);
28.    container* pin_destination = get_pin_container(component_destination);
29.    shuffle_objects(pin_source, pin_destination, target_pins);
30. }


shuffle_components

as well as

Understanding The Real Cause Of A Race Condition Problem

group_container.add(current_object)

argarg



void*

partition_omp.cpp

1.  // global declaration
2.  #include <stdio.h>
3.  #include <math.h>
4.  #include <omp.h>
5.  #include "element.h"
6.  #include "container.h"
7.  #define NGRPS 30
8.
9.  int object_count = 0;
10. element* object_array;
11. container group_array[NGRPS];
12. int total_count = 0;
13.
14. int collect(int group_id) // return the collected group count
15. {
16.    int j;
17.    int group_count = 0;
18.
19.    attribute group_attribute = get_group_attribute(group_id);
20.    container group_container = group_array[group_id];
21.    for (j = 0; j < object_count; j++) {
22.       element current_object = object_array[j];
23.       if (current_object.collectFlag == true) continue; // this flag is initialized to false
24.       if (current_object.matchAttribute( group_attribute)) {
25.          current_object.collectFlag = true;
26.          group_container.add( current_object);
27.          group_count++;
28.       }
29.    }
30.    return group_count;
31. }
32.
33.
34. int main(int argc, char** argv)
35. {
36.    int i;
37. #ifdef _OPENMP
38.    omp_set_num_threads( NTHRS );
39.    omp_set_dynamic(0);
40. #endif
41.
42.    object_count = process_input_data(argv[1], &object_array);
43.
44. #pragma omp parallel for
45.    for (i = 0; i < NTHRS; i++) {
46.       total_count += collect( i );
47.    }
48.
49.    if (total_count != object_count) {
50.       printf(" the collected object count %d doesn't match the original object count %d\n",
50.          total_count, object_count);
51.     }
52. }


Summary

[10][11]



12][13]14]

  • Adopt a higher design abstraction model such as OpenMP to reduce the risk of tedious implementation causing race condition.
  • Use passing-by-data-value instead of passing-by-pointer to communicate between the threads and processes.
  • Design the data structure to limit the global variable usage and restrict accesses of a shared variable by multiple threads.
  • Analyze the program state, input set, and run-time environment to decide if a race condition is a real program bug. It may be a compromise for better performance.
  • Understand the real cause of a race condition. Focus on fixing the real program bug instead of fixing a race condition at the surface level.
  • Fully understand the program execution states and state transitions. When a thread or a process causes a transitory state occurring on some program objects, avoid another concurrent thread or process operating on the same objects in parallel.
  • Implement auxiliary internal state checking mechanisms to verify the program state integrity at each operating stage of the program.

References

  1. "Parallel Programming Introduction", http://www.mhpcc.edu/training/workshop/parallel_intro/MAIN.html
  2. "POSIX Threads Programming", http://www.llnl.gov/computing/tutorials/pthreads
  3. "OpenMP home page", http://www.openmp.org
  4. "The Message Passing Interface (MPI) standard", http://www-unix.mcs.anl.gov/mpi/
  5. "Open MPI: Open Source High Performance Computing", http://www.open-mpi.org/
  6. "What are Race Conditions? Some Issues and Formalization", Robert H.B. Netzer and Barton P. Miller, In ACM Letters on Programming Languages and Systems, Vol. 1 No. 1 March 1992
  7. "Transactional Memory Online", http://www.cs.wisc.edu/trans-memory/
  8. "Transactional Memory Coherence and Consistence", Lance Hammond, Vicky Wong, Mike Chen, Brian D. Carlstrom, John D. Davis, Ben Hertzberg, Manohar K. Prahu, Honggo Wijaya, Christos Kozyrakis and Kunle Olukotun, http://ogun.stanford.edu/~kunle/publications/tcc_stmcs2006.pdf
  9. "OpenMP Parallelization Pragmas for C and C++", http://www.tacc.utexas.edu/services/userguides/pgi/pgiws_ug/pgi32u12.htm
  10. "Throughput Computing Main Page", http://www.sun.com/processors/throughput/
  11. "Developing Applications For Parallel Computing", Liang T. Chen, link
  12. "Sun Studio Express: Data Race Detection Tool", link
  13. "Intel Threading Tools", http://www.intel.com/cd/software/products/asmo-na/eng/threading/tcwin/index.htm
  14. "Valgrind's Tool Suite", http://valgrind.org/info/tools.html

Acknowledgements

The author would like to thank his Sun colleagues on the Data Race Detection Tool project who inspired this article. Some of the data race conditions discussed in this article were taken from real problems encountered in DRDT testing. The author also wants to thank his son David Chen for comments on the initial draft, his Sun colleagues Yuan Lin and Vijay Tatkar for their helpful reviews, and Richard Friedman for the final editing.

 
Rate and Review
Tell us what you think of the content of this page.
Excellent   Good   Fair   Poor  
Comments:
Your email address (no reply is possible without an address):
Sun Privacy Policy

Note: We are not able to respond to all submitted comments.
Left Curve
System Administrator
Right Curve
Left Curve
Developer and ISVs
Right Curve
Left Curve
Related Products
Right Curve
solaris-online-forum-banner