By Ann Rice, April 2011 (updated June 2016)
While a traditional UNIX® process contains a single thread of control, multithreading separates a process into many execution threads, each of which runs independently. Multithreading your code has a number of benefits, but it can also introduce bugs that might be difficult to find. This article suggests ways of avoiding such bugs in your code as well as strategies for finding these bugs using the dbx command-line debugger.
Multithreading your code can help in the following areas.
Improving Application Responsiveness
Any program in which many activities are not dependent upon each other can be redesigned so that each independent activity is defined as a thread. For example, the user of a multithreaded GUI does not have to wait for one activity to complete before starting another.
Using Multiprocessors Efficiently
Typically, applications that express concurrency requirements with threads need not take into account the number of available processors. The performance of the application improves transparently with additional processors because the operating system takes care of scheduling threads for the number of processors that are available. When multicore processors and multithreaded processors are available, a multithreaded application's performance scales appropriately because the cores and threads are viewed by the OS as processors.
Numerical algorithms and applications with a high degree of parallelism, such as matrix multiplications, can run much faster when implemented with threads on a multiprocessor.
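As a minimal sketch of that idea, the following C code multiplies two small matrices with POSIX threads, giving each thread its own band of rows. The sizes (N, NTHREADS) and the names multiply_band() and parallel_multiply() are assumptions for illustration; a real program would size the thread count to the available processors and use much larger matrices.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical sizes chosen for illustration. */
#define N 4          /* matrices are N x N */
#define NTHREADS 2   /* rows are split into NTHREADS bands */

static double a[N][N], b[N][N], c[N][N];

/* A band of rows [first_row, last_row) of the result matrix. */
struct band { int first_row, last_row; };

/* Each worker computes its own rows of c = a * b. Bands never overlap,
   so no two threads write the same element and no lock is needed. */
static void *multiply_band(void *arg)
{
    const struct band *bd = arg;
    for (int i = bd->first_row; i < bd->last_row; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    return NULL;
}

/* Splits the rows among the threads, then waits for every thread that
   was created. Returns 0 on success, -1 if a thread could not start. */
static int parallel_multiply(void)
{
    pthread_t tid[NTHREADS];
    struct band bands[NTHREADS];
    int rows_per_thread = N / NTHREADS;
    int created = 0;

    for (int t = 0; t < NTHREADS; t++) {
        bands[t].first_row = t * rows_per_thread;
        /* The last band picks up any leftover rows. */
        bands[t].last_row = (t == NTHREADS - 1) ? N : (t + 1) * rows_per_thread;
        if (pthread_create(&tid[t], NULL, multiply_band, &bands[t]) != 0)
            break;
        created++;
    }
    for (int t = 0; t < created; t++)
        pthread_join(tid[t], NULL);
    return created == NTHREADS ? 0 : -1;
}
```

The code never asks how many processors exist; the operating system decides where the threads run, which is the transparency described above.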
Improving Program Structure
Many programs are more efficiently structured as multiple independent or semi-independent units of execution instead of as a single, monolithic thread. For example, a non-threaded program that performs many different tasks might need to devote much of its code just to coordinating the tasks. When the tasks are programmed as threads, the code can be simplified. Multithreaded programs, especially programs that provide service to multiple concurrent users, can be more adaptive to variations in user demands than single-threaded programs.
Using Fewer System Resources
Programs that use two or more processes that access common data through shared memory are applying more than one thread of control.
However, each process has a full address space and operating environment state. The cost of creating and maintaining this large amount of state information makes each process much more expensive than a thread in both time and space.
In addition, the inherent separation between processes can require a major effort by the programmer. This effort includes handling communication between the threads in different processes, or synchronizing their actions. When the threads are in the same process, communication and synchronization become much easier.
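To illustrate how simple in-process communication can be, here is a minimal sketch in C using POSIX threads: several threads update one shared counter directly, with a single mutex as the only synchronization, and no shared-memory segment has to be set up. The names NWORKERS, INCREMENTS, add_many(), and run_workers() are hypothetical.

```c
#include <pthread.h>

#define NWORKERS 4
#define INCREMENTS 100000

/* Every thread in the process sees this variable directly. */
static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker adds to the shared counter under the mutex. */
static void *add_many(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts the workers, waits for them, and returns the final count. */
static long run_workers(void)
{
    pthread_t tid[NWORKERS];
    int created = 0;

    for (int t = 0; t < NWORKERS; t++) {
        if (pthread_create(&tid[t], NULL, add_many, NULL) != 0)
            break;
        created++;
    }
    for (int t = 0; t < created; t++)
        pthread_join(tid[t], NULL);
    return counter;
}
```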
Combining Threads and RPC
By combining threads and a remote procedure call (RPC) package, you can exploit nonshared-memory multiprocessors, such as a collection of workstations. This combination distributes your application relatively easily and treats the collection of workstations as a multiprocessor.
For example, one thread might create additional threads. Each of these children could then place a remote procedure call, invoking a procedure on another workstation. Although the original thread has merely created threads that are now running in parallel, this parallelism involves other computers.
The Message Passing Interface (MPI) might be a more effective approach to achieving parallelism in applications that run across distributed systems. See http://www-unix.mcs.anl.gov/mpi/ for more information about MPI.
The Oracle Message Passing Toolkit includes Open MPI Message Passing Interface (OMPI), which is an open source implementation of MPI.
The following list points out some of the more frequent oversights that can cause bugs in multithreaded programs:
- Using UNIX signal handlers instead of the sigwait(2) model for handling asynchronous signals.
- Using setjmp() and longjmp(), and then long-jumping away without releasing the mutex locks.
- Failing to re-evaluate the conditions after returning from a call to *_cond_wait() or *_cond_timedwait().
- Forgetting that default threads are created PTHREAD_CREATE_JOINABLE and must be reclaimed with pthread_join(3C). Note that pthread_exit(3C) does not free up its storage space.
Multithreaded programs, especially those containing bugs, often behave differently in two successive runs, given identical inputs, because of differences in the thread scheduling order.
In general, multithreading bugs are statistical instead of deterministic. Tracing is usually a more effective method of finding order of execution problems than is breakpoint-based debugging.
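Two items from the list of oversights, re-checking the condition after *_cond_wait() returns and reclaiming joinable threads with pthread_join(3C), can be illustrated with a short POSIX threads sketch. The one-slot mailbox and the names consumer(), put(), and handoff() are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* A hypothetical one-slot mailbox guarded by box_lock. */
static pthread_mutex_t box_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t box_cond = PTHREAD_COND_INITIALIZER;
static bool box_full = false;
static int box_value;

static void *consumer(void *arg)
{
    int *out = arg;
    pthread_mutex_lock(&box_lock);
    /* Re-check the predicate in a loop: pthread_cond_wait() can return
       spuriously, or after another thread has already emptied the box. */
    while (!box_full)
        pthread_cond_wait(&box_cond, &box_lock);
    *out = box_value;
    box_full = false;
    pthread_mutex_unlock(&box_lock);
    return NULL;
}

static void put(int value)
{
    pthread_mutex_lock(&box_lock);
    box_value = value;
    box_full = true;
    pthread_cond_signal(&box_cond);
    pthread_mutex_unlock(&box_lock);
}

/* Threads are joinable by default, so the creator reclaims the consumer
   with pthread_join(). Passing &received (on this function's stack) is
   safe only because we join before returning. */
static int handoff(int value)
{
    pthread_t tid;
    int received = -1;

    if (pthread_create(&tid, NULL, consumer, &received) != 0)
        return -1;
    put(value);
    pthread_join(tid, NULL);
    return received;
}
```

Because the predicate is checked before waiting, the sketch also works when put() runs before the consumer reaches pthread_cond_wait(); in that case the loop body never executes.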
Oracle Solaris Dynamic Tracing (DTrace) is a comprehensive dynamic tracing facility built into the Oracle Solaris operating system. You can use DTrace to examine the behavior of your multithreaded program. DTrace inserts probes into running programs to collect data at points in the execution path that you specify. The collected data can be examined to determine problem areas. See the Oracle Solaris Dynamic Tracing Guide and the DTrace User's Guide for more information about using DTrace.
Oracle Developer Studio includes the Thread Analyzer tool. This tool lets you analyze the execution of a multithreaded program. It can detect multithreaded programming errors such as data races or deadlocks in code written using the POSIX thread API, the Oracle Solaris thread API, OpenMP directives, or a mix of these technologies. See the Thread Analyzer User's Guide for more information.
dbx Debugger
Oracle Developer Studio includes the dbx command-line debugger, an interactive, source-level debugging tool.
When it detects a multithreaded program, dbx tries to load libthread_db.so, a special system library for thread debugging located in /usr/lib. dbx is synchronous; when any thread or lightweight process (LWP) stops, all other threads and LWPs sympathetically stop. (An LWP is a thread in the Oracle Solaris kernel that executes kernel code and system calls.) This behavior is sometimes referred to as the “stop the world” model.
Setting Breakpoints in Multithreaded Code
You can set breakpoints in multithreaded code using the stop command, trace command, or when command. The basic syntax of these commands is:
stop event-specification [ modifier ]
trace event-specification [ modifier ]
when event-specification [ modifier ] { command; ... }
Two thread-specific events were added in Oracle Developer Studio 11 dbx:
The thr_create [thread_id] event occurs when a thread, or a thread with the specified thread_id, has been created. For example, in the following stop command, the thread ID t@1 refers to the creating thread, while the thread ID t@5 refers to the created thread.
(dbx) stop thr_create t@5 -thread t@1
The thr_exit event occurs when a thread has exited. To capture the exit of a specific thread, use the -thread option of the stop command as follows:
(dbx) stop thr_exit -thread t@5
Understanding Thread Creation Activity
You can get an idea of how often your application creates and destroys threads by using the thr_create event and thr_exit event as in the following example:
(dbx) trace thr_create
(dbx) trace thr_exit
(dbx) run
trace: thread created t@2 on l@2
trace: thread created t@3 on l@3
trace: thread created t@4 on l@4
trace: thr_exit t@4
trace: thr_exit t@3
trace: thr_exit t@2
The application created three threads. Note that the threads exited in the reverse order of their creation, which might indicate that, had the application created more threads, they would accumulate and consume resources.
To get more interesting information, you could try the following in a different session:
(dbx) when thr_create { echo "XXX thread $newthread created by $thread"; }
XXX thread t@2 created by t@1
XXX thread t@3 created by t@1
XXX thread t@4 created by t@1
The output shows that all three threads were created by thread t@1, which is a common multithreading pattern.
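If you want a small target program for experimenting with trace thr_create and when thr_create, the following C sketch has the same shape: one thread creates three workers and then joins them. It uses POSIX pthread_create() rather than thr_create(); we assume dbx reports both through the same thr_create event on Oracle Solaris, and the thread IDs you see may differ from the output above. The names boss(), worker(), and struct task are hypothetical.

```c
#include <pthread.h>

#define NWORKERS 3

/* Hypothetical per-worker task: square the worker's index. */
struct task { int index; long result; };

static void *worker(void *arg)
{
    struct task *t = arg;
    t->result = (long)t->index * t->index;
    return NULL;
}

/* When called from main(), the creating thread is t@1 under dbx, so every
   worker is created by the same thread. Returns the sum of the results. */
static long boss(void)
{
    pthread_t tid[NWORKERS];
    struct task tasks[NWORKERS];
    int created = 0;
    long total = 0;

    for (int i = 0; i < NWORKERS; i++) {
        tasks[i].index = i + 1;
        if (pthread_create(&tid[i], NULL, worker, &tasks[i]) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++) {
        pthread_join(tid[i], NULL);   /* reclaim each joinable worker */
        total += tasks[i].result;
    }
    return total;
}
```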
Suppose you want to debug thread t@3 from its outset. You could stop the application at the point that thread t@3 is created as follows:
(dbx) stop thr_create t@3
(dbx) run
t@1 (l@1) stopped in tdb_event_create at 0xff38409c
0xff38409c: tdb_event_create : retl
Current function is main
216 stat = (int) thr_create(NULL, 0, consumer, q, tflags, &tid_cons2);
(dbx)
If your application occasionally spawns a new thread from thread t@5 instead of thread t@1, you could capture that event as follows:
(dbx) stop thr_create -thread t@5
See Setting Event Specifications in the Debugging a Program With dbx manual for a complete list of event specifications. Bear in mind that the event you specify might occur in more than one thread, so your program might hit the breakpoint many times. You can specify a thread_id or lwp_id as a modifier to the stop command and the trace command. The action associated with the event is then executed only if the thread or LWP that caused the event matches the thread_id or lwp_id. However, the specific thread or LWP you have in mind might be assigned a different thread_id or lwp_id from one execution of the program to the next.
Stepping Through Multithreaded Code
dbx supports two basic single-step commands: next and step, plus two variants of step, called step up and step to. Both the next command and the step command let the program execute one source line before stopping again. The basic difference between the next and step commands is in how they handle function calls. If the line executed contains a function call:
- The next command allows the call to be executed and stops at the following line (“steps over” the call).
- The step command stops at the first line in the called function (“steps into” the call).
The syntax of the next command is:
next [ n ] [ -sig signal ] [ thread_id ] [lwp_id ]
The syntax of the step command is:
step [ n ] [ up ] [ -sig signal ] [ thread_id ] [lwp_id ] [ to function ]
To step one line in the current thread or LWP, type:
next
or
step
To step multiple (n) lines in the current thread or LWP, type:
next n
or
step n
To step one line in a different thread, type:
step thread_id
For example:
(dbx) step t@2
In multithreaded programs, when a function call is stepped into or stepped over, all LWPs are implicitly resumed for the duration of that function call in order to avoid deadlock.
You can pass a thread_id or lwp_id to the next command or the step command, thereby changing the current thread or LWP. However, doing so defeats this deadlock avoidance measure. To avoid deadlocks, it is safer to change the current thread or LWP with the thread command or lwp command, and then use the next command or step command to step in the new current thread or LWP.
Whenever you give a command that steps a single thread or LWP, you need to be aware of potential deadlocks. If the thread that continues executing needs to acquire a lock that is held by a thread that has not resumed execution, your program deadlocks. If such a deadlock occurs, you can break it only by typing ctrl-C and then resuming all threads.
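The following hypothetical C sketch shows how this situation can arise. run_scenario() holds state_lock while the updater() thread is blocked in pthread_mutex_lock(). If you were stopped with updater() as the current thread and stepped or continued only that thread, it could never acquire the lock, because the thread holding it was not resumed. Resuming all threads lets run_scenario() release the lock. The function and variable names are illustrative only.

```c
#include <pthread.h>

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_state = 0;

/* Worker that needs state_lock. Stepping only this thread while the
   creating thread still holds the lock would hang here. */
static void *updater(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&state_lock);
    shared_state += 1;
    pthread_mutex_unlock(&state_lock);
    return NULL;
}

/* Holds the lock while the worker starts, then releases it so the
   worker can proceed. Returns the final state (always 11). */
static int run_scenario(void)
{
    pthread_t tid;

    pthread_mutex_lock(&state_lock);          /* creator now holds the lock */
    if (pthread_create(&tid, NULL, updater, NULL) != 0) {
        pthread_mutex_unlock(&state_lock);
        return -1;
    }
    shared_state = 10;                        /* work done under the lock */
    pthread_mutex_unlock(&state_lock);        /* only now can updater() run */
    pthread_join(tid, NULL);
    return shared_state;
}
```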
To step up and out of the current function in the current thread or LWP, type:
step up
or
step up lwp_id
To step into a specified function at the current source line, type:
step to function_name
To step into the last function called as determined by the assembly code for the current source line, type:
step to
To deliver a signal while stepping, you can add -sig signal to any of the above next and step commands.
Resuming Execution
To resume execution of your multithreaded program after hitting a breakpoint or after single-stepping through your code, use the cont command. For multithreaded programs, the syntax is:
cont [ at line ] [ thread_id | lwp_id ] [ -sig signal ]
To continue execution of all threads, type:
cont
To continue execution of a specific thread or LWP, type:
cont thread_id
or
cont lwp_id
For example:
(dbx) cont l@3
To continue execution at a specific source line, type:
cont at line_number thread_id
or
cont at line_number lwp_id
To continue execution with a specific signal, you can add -sig signal to any of the above cont commands. Whenever you give a command that resumes a single thread or LWP, you need to be aware of potential deadlocks. If the thread that continues executing needs to acquire a lock that is held by a thread that has not resumed execution, your program deadlocks. If such a deadlock occurs, you can break it only by typing ctrl-C and then resuming all threads.
Viewing the Threads List
To view the threads list, use the threads command. The syntax is:
threads [ -all ] [ -mode [ all|filter ] [ auto|manual ] ]
The threads command displays the thread information shown in the following example:
(dbx) threads
t@1 a l@1 ?() running in main()
t@2 ?() asleep on 0xef751450 in_swtch()
t@3 b l@2 ?() running in sigwait()
t@4 consumer() asleep on 0x22bb0 in _lwp_sema_wait()
*>t@5 b l@4 consumer() breakpoint in Queue_dequeue()
t@6 b l@5 producer() running in _thread_start()
(dbx)
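For native code, each line of information displayed by the threads command is composed of the following:
- An asterisk (*) indicates that an event requiring user attention, usually a breakpoint, has occurred in this thread. An 'o' instead of an asterisk indicates that a dbx internal event has occurred.
- An arrow (>) denotes the current thread.
- t@number, the thread id, refers to a particular thread. The number is the thread_t value passed back by thr_create.
- b l@number or a l@number means the thread is bound to or active on the designated LWP, meaning the thread is actually runnable by the operating system.
- The start function of the thread as passed to thr_create. A ?() means that the start function is not known.
- The thread state (see Table 1).
- The function that the thread is currently executing.
Table 1 Thread and LWP States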
State | Description |
---|---|
suspended | The thread has been explicitly suspended. |
runnable | The thread is runnable and is waiting for an LWP as a computational resource. |
zombie | When a non-detached thread exits (thr_exit()), it is in a zombie state until it has been reaped through the use of thr_join(). THR_DETACHED is a flag specified at thread creation time (thr_create()); a detached thread cannot be joined, and its resources are reclaimed without a join. |
asleep on syncobj | The thread is blocked on the given synchronization object. Depending on what level of support libthread.so and libthread_db.so provide, syncobj might be as simple as a hexadecimal address or something with more information content. |
active | The thread is active on an LWP, but dbx cannot access the LWP. |
unknown | dbx cannot determine the state. |
lwpstate | A bound or active thread state has the state of the LWP associated with it. |
running | The LWP was running but was stopped in synchrony with some other LWP. |
syscall num | The LWP stopped on an entry into the given system call number. |
syscall return num | The LWP stopped on an exit from the given system call number. |
job control | The LWP stopped due to job control. |
LWP suspended | The LWP is blocked in the kernel. |
single stepped | The LWP has just completed a single step. |
breakpoint | The LWP has just hit a breakpoint. |
fault num | The LWP has incurred the given fault number. |
signal name | The LWP has incurred the given signal. |
process sync | The process to which this LWP belongs has just started executing. |
LWP death | The LWP is in the process of exiting. |
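To observe the zombie state in practice, you might stop a program like the following just before its pthread_join() call and run threads -all. This is a sketch using POSIX threads with hypothetical names (short_lived(), make_threads()); it creates one joinable thread, which must be reaped with pthread_join(), and one detached thread, created with PTHREAD_CREATE_DETACHED, which cannot be joined.

```c
#include <pthread.h>

/* Thread body that exits immediately. */
static void *short_lived(void *arg)
{
    (void)arg;
    return NULL;
}

/* Returns 0 if both threads were created. */
static int make_threads(void)
{
    pthread_t joinable, detached;
    pthread_attr_t attr;

    /* Default attributes: joinable, so after it exits the thread remains
       (a zombie) until pthread_join() reaps it. */
    if (pthread_create(&joinable, NULL, short_lived, NULL) != 0)
        return -1;

    /* Detached: its storage is reclaimed without a join. */
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    int rc = pthread_create(&detached, &attr, short_lived, NULL);
    pthread_attr_destroy(&attr);

    /* A breakpoint on the next line is a good place to try threads -all. */
    pthread_join(joinable, NULL);
    return rc;
}
```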
To print the list of all known threads, type:
threads
The output of this command might be:
*> t@1 a l@1 ?() signal SIGINT in _XFlushInt()
t@2 b l@2 ?() running in _signotifywait()
t@3 b l@3 ?() running in _lwp_sema_wait()
t@4 ?() sleep on (unknown) in _reap_wait()
To print threads normally not printed (zombies), type:
threads -all
The output of this command might be:
*> t@1 a l@1 ?() signal SIGINT in _XFlushInt()
t@2 b l@2 ?() running in _signotifywait()
t@3 b l@3 ?() running in _lwp_sema_wait()
t@4 ?() sleep on (unknown) in _reap_wait()
t@5 myThread() zombie in in
By default, the threads command runs in filter mode, meaning that hidden threads and zombie threads are not printed. To print all threads including hidden threads and zombies, type:
threads -mode all
Displaying, Changing, Suspending, or Resuming the Current Thread
The thread command lists or changes the current thread. The syntax is:
thread [ -blocks ] [ -blockedby ] [ -info ] [ -hide ] [ -unhide ] [ -suspend ] [ -resume ] thread_id
To change the current thread, type:
thread thread_id
To print everything known about the current thread or given thread, type:
thread -info [thread_id]
For example, this command might produce the following output:
thread -info t@4
Thread t@4 (0xfe60bd70) at priority 127
state: asleep on (unknown)
base function: 0x0: 0x00000000() stack: 0xfe60bd70[1047920]
flags: DETACHED|DAEMON
masked signals: HUP INT QUIT ILL TRAP ABRT EMT FPE BUS SEGV SYS PIPE ALRM TERM USR1 USR2 CLD PWR WINCH
URG POLL TSTP CONT TTIN TTOU VTALRM PROF XCPU XFSZ WAITING FREEZE THAW LOST RTMIN RTMIN+1
RTMIN+2 RTMIN+3 RTMIN+4 RTMIN+5 RTMIN+6 RTMIN+7
Currently inactive in _reap_wait
To print all locks held by the current thread or given thread that are blocking other threads, type:
thread -blocks [thread_id]
To show which synchronization object the current thread or given thread is blocked by, if any, type:
thread -blockedby [thread_id]
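These commands are most useful when two threads each hold a lock that the other needs. The hypothetical C sketch below shows the usual fix for that lock-ordering deadlock: transfer() always acquires the two account locks in the same global order (lower id first), so threads transferring in opposite directions cannot each hold one lock while waiting for the other. Without that ordering, running thread -blocks and thread -blockedby on each stuck thread would reveal the cycle. The account structure and function names are assumptions for illustration.

```c
#include <pthread.h>

/* Two hypothetical accounts, each guarded by its own lock. */
struct account { pthread_mutex_t lock; long balance; int id; };

static struct account acct_a = { PTHREAD_MUTEX_INITIALIZER, 100, 1 };
static struct account acct_b = { PTHREAD_MUTEX_INITIALIZER, 100, 2 };

/* Locks are always taken lower id first, whatever the transfer direction. */
static void transfer(struct account *from, struct account *to, long amount)
{
    struct account *first = from->id < to->id ? from : to;
    struct account *second = from->id < to->id ? to : from;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    from->balance -= amount;
    to->balance += amount;
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}

struct transfer_args { struct account *from, *to; int times; };

static void *transfer_loop(void *arg)
{
    struct transfer_args *t = arg;
    for (int i = 0; i < t->times; i++)
        transfer(t->from, t->to, 1);
    return NULL;
}

/* Runs opposite-direction transfers concurrently; returns the total
   balance, or -1 if a thread could not be created. */
static long run_transfers(int times)
{
    pthread_t t1, t2;
    struct transfer_args ab = { &acct_a, &acct_b, times };
    struct transfer_args ba = { &acct_b, &acct_a, times };

    if (pthread_create(&t1, NULL, transfer_loop, &ab) != 0)
        return -1;
    if (pthread_create(&t2, NULL, transfer_loop, &ba) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return acct_a.balance + acct_b.balance;
}
```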
To suspend the current thread or given thread, type:
thread -suspend [thread_id]
To resume (unsuspend) the current thread or given thread, type:
thread -resume [thread_id]
To hide the current thread or given thread so that it will not be displayed by the threads command, type:
thread -hide [thread_id]
To unhide the current thread or given thread so that it will be displayed by the threads command, type:
thread -unhide [thread_id]
Displaying LWP Information
Normally, you need not be aware of LWPs, but there are times when thread-level queries cannot be completed. In these cases, you can use the lwp command and lwps command to show information about LWPs.
The syntax of the lwp command is:
lwp [ -info ] lwp_id
To list the current LWP, type:
lwp
For example:
lwp l@3
To change the current LWP, type:
lwp lwp_id
To display the name, home, and masked signals of the current LWP, type:
lwp -info
For example, this command might produce the following output:
lwp -info l@2
l@2 running in _signotifywait()
masked signals are:
To list all LWPs in the current process, use the lwps command:
lwps
The lwps command displays the LWP information shown in the following example:
(dbx) lwps
l@1 running in main()
l@2 running in sigwait()
l@3 running in _lwp_sema_wait()
*>l@4 breakpoint in Queue_dequeue()
l@5 running in _thread_start()
(dbx)
Each line of the LWP list contains the following:
- l@number refers to a particular LWP.
- in function_name() identifies the function that the LWP is currently executing.
Runtime Checking Multithreaded Applications
Runtime checking (RTC) in dbx supports multithreaded applications. Along with each access checking error report, RTC prints the ID of the thread on which the error occurred. The leak report generated by RTC includes the leaks from all the threads in the program.
Potential Problems With Dynamic Function Calls
If you are accustomed to using dynamic function calls from dbx when debugging single-threaded programs, take care in using the same technique when debugging multithreaded code.
dbx allows you to use function calls in expressions. For example, the following command forces the target program to call foo():
print foo()
Forcing a function call can be useful because it lets you use the program code to examine the state of the program.
You can use the when
command to stop execution at particular locations in the program, print data, and then resume execution:
when in bar {print var;}
If you combine these two examples, as in the following command, you are stopping the program at various locations, forcing it to call a function, and then continuing execution:
when in bar {print foo();}
If you give dbx such a command, you must be sure that calling foo() at the times you stop execution in bar() does not interfere with your program's intended execution.
If your program is multithreaded, it is more difficult to predict when it is safe to force a call to foo(). One thread might be stopped in bar(), which you know is safe, but other threads might be in the process of modifying data that foo() relies on.
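One way to make such a forced call less risky is to write the helper so that it never blocks. In the hypothetical C sketch below, debug_items() reads data guarded by data_lock but uses pthread_mutex_trylock() instead of pthread_mutex_lock(): if another thread, stopped by dbx along with everything else, holds the lock, the helper returns -1 rather than waiting on a lock that might not be released until that thread resumes. The names are illustrative, and whether a blocking version would actually hang depends on which threads dbx lets run during the call.

```c
#include <pthread.h>

/* Hypothetical shared data that worker threads update under data_lock. */
static pthread_mutex_t data_lock = PTHREAD_MUTEX_INITIALIZER;
static int items_queued = 0;

/* Helper intended to be called from dbx. Returns a consistent snapshot,
   or -1 if the lock is held (the data may be in the middle of an update). */
static int debug_items(void)
{
    if (pthread_mutex_trylock(&data_lock) != 0)
        return -1;
    int snapshot = items_queued;
    pthread_mutex_unlock(&data_lock);
    return snapshot;
}
```

From dbx you might then use when in bar {print debug_items();}; a result of -1 means the data was being modified at that stop and should not be trusted.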