Libraries in C++

Part V of Libraries, Linking, Initialization, and C++ Series

By Darryl Gove and Stephen Clamage, May 2011

Part I - Introduction to Libraries and Linking
Part II - Resolving Symbols in Libraries
 

C++ Header Files and Libraries

In C++, it is very common to have functionality defined in header files. Functions defined in header files can end up in all the objects that include those header files. If multiple libraries include the same header file, those functions can be defined in multiple libraries.

The linker will usually pick a single definition and bind to it. If different references to the symbol in different parts of the program bind to different definitions, the program violates the C++ One-Definition Rule, which says that each symbol should be uniquely defined in an application.

However, in the presence of multiple definitions of a symbol, the linker might not be able to determine the best version to bind to. If the linker chooses the wrong version, it can cause circular dependencies in the libraries, and circular dependencies can cause problems with applications that have complex initialization requirements.

The same problem can occur with C code, but the problem is more prevalent with C++ because its language rules allow the compiler to automatically generate the functions it requires.
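As a minimal sketch of what "automatically generate" means here, consider a class with no hand-written special member functions. The class name Record and the helper duplicate() are hypothetical and are not part of this article's example. Because the class has a std::string member, the compiler must generate a non-trivial default constructor, copy constructor, and destructor for it, and in typical implementations these generated functions can be emitted in any object file that uses them.

```cpp
#include <string>
#include <type_traits>

// Hypothetical class for illustration: no constructor, destructor, or
// copy operations are written by hand.
struct Record
{
  std::string name;
};

// The std::string member means the compiler-generated special member
// functions do real work, so they are not trivial.
static_assert(!std::is_trivially_default_constructible<Record>::value,
              "default constructor is compiler-generated and non-trivial");
static_assert(!std::is_trivially_copy_constructible<Record>::value,
              "copy constructor is compiler-generated and non-trivial");
static_assert(!std::is_trivially_destructible<Record>::value,
              "destructor is compiler-generated and non-trivial");

// Returning by value calls the implicitly generated copy constructor.
Record duplicate(const Record& r)
{
  return r;
}
```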

Example

Suppose the functionality required to support some statistics gathering is implemented in a class named Cstat. This is implemented using a header file that contains the definition for the class and a library that contains the implementation. The library also has an object of this class called stats. This is shown in Listing 1.

Listing 1: Implementation of the Cstat Class
 $ more stat.hpp
 #include <iostream>
 #include <string>
 class Cstat
 {
   std::string data;
   public:
   Cstat() : data("data") { }
   void show() {std::cout<<data;}
 };
 void ready();
  
 $ more stat.cpp
 #include "stat.hpp"
 
 Cstat stats;
 
 void ready()
 {
   stats.show();
 }

Notice that the member functions of the class, including the constructor, are defined in the header file, while the non-member function ready() and the object stats are defined in the .cpp file. The steps for compiling this library and inspecting the functions that it contains are shown in Listing 2.

Note that the tool nm provides the option -C, which prints the demangled function names. Other ways of interpreting mangled names include the utility dem, which takes a mangled name and prints the equivalent demangled name, and the tool c++filt, which outputs a demangled version of the input piped into it.
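Demangling can also be done from within a program. The sketch below assumes a GCC or Clang toolchain that follows the Itanium C++ ABI, whose mangling scheme differs from the Oracle Solaris Studio scheme that appears in this article's listings. The helper name demangle() is hypothetical; abi::__cxa_demangle is the ABI-provided routine it wraps.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI
// mangled name, or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled)
{
  int status = 0;
  // With a null output buffer, __cxa_demangle allocates the result
  // with malloc, so the caller must release it with free.
  char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || text == nullptr)
  {
    return mangled;
  }
  std::string result(text);
  std::free(text);
  return result;
}
```

Under this assumption, demangle("_ZN5Cstat4showEv") yields the readable name Cstat::show().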

Listing 2: Building and Inspecting the stats Library
  $ CC -G -Kpic -o libstat.so stat.cpp
  $ nm -C libstat.so|grep GLOB|grep -v UND
  [52]    |      3640|       236|FUNC |GLOB |0    |9      |Cstat::Cstat #Nvariant 1()
  [71]    |      3640|       236|FUNC |GLOB |0    |9      |Cstat::Cstat()
  ...
  [63]    |     69936|         4|OBJT |GLOB |0    |17     |stats
  [78]    |      3368|        60|FUNC |GLOB |0    |9      |void Cstat::show()
  [62]    |      3448|        56|FUNC |GLOB |0    |9      |void ready() 

The init section for the library contains the call to the constructor for the stats object. When the library gets loaded, this section constructs the object.
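The effect of this initialization code can be seen in a single source file. In the following sketch, the class Tracker, the object tracker, and the function tracker_value() are hypothetical stand-ins for Cstat, stats, and ready(). The object is defined at namespace scope, so its constructor runs as part of start-up initialization, before main() is entered.

```cpp
#include <string>

// Hypothetical stand-in for Cstat.
class Tracker
{
  std::string data;
public:
  Tracker() : data("data") { }
  const std::string& value() const { return data; }
};

// Namespace-scope object: its constructor is called by initialization
// code that runs before main(), not by any explicit call in the program.
Tracker tracker;

// By the time main() can call this, tracker has been constructed.
std::string tracker_value()
{
  return tracker.value();
}
```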

The situation gets more complex when a second library uses this library. Listing 3 shows code for another library, libdata.so, that includes the header file from libstat.so. This library defines a Cdata class, which contains a Cstat object. The library libdata.so also has an object of the Cdata class, which is called data.

Listing 3: Library that Uses libstat.so
 $ more data.hpp
 void notready();
 
 $ more data.cpp
 #include "stat.hpp"
 
 class Cdata
 {
   Cstat stats;
   public:
   Cdata() { ready(); }
 };
 
 Cdata data;
 
 void notready()
 {
 }

The object data will be constructed by the initialization section when the library is loaded, and this object contains a Cstat object. When the Cdata object is constructed, it will call ready() in libstat.so. Therefore, it is important that libstat.so be initialized before libdata.so is initialized.
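The order of events inside the Cdata constructor matters here: a member object is constructed before the body of the enclosing constructor runs. The following sketch records that order. The names Inner, Outer, events(), and construction_order() are hypothetical and stand in for Cstat, Cdata, and the calls made in this example.

```cpp
#include <string>
#include <vector>

// Hypothetical event log shared by the classes below.
std::vector<std::string>& events()
{
  static std::vector<std::string> log;
  return log;
}

// Plays the role of Cstat.
struct Inner
{
  Inner() { events().push_back("Inner constructed"); }
};

// Plays the role of Cdata: it contains an Inner member, and its
// constructor body does further work (ready() in the article).
struct Outer
{
  Inner inner;
  Outer() { events().push_back("Outer body runs"); }
};

// Constructs one Outer and returns the recorded sequence of events.
std::vector<std::string> construction_order()
{
  events().clear();
  Outer o;
  (void)o;
  return events();
}
```

The member's constructor is therefore called as part of constructing every Cdata object, which is why libdata.so needs a definition of Cstat::Cstat() at all.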

The process of compiling the library and inspecting the symbols that it exports is shown in Listing 4.

Listing 4: Compiling and Inspecting libdata.so
  $ CC -g -G -Kpic -o libdata.so data.cpp -L. -R'$ORIGIN' -lstat 
  $ nm -C libdata.so|grep GLOB|grep -v UND
  ...
  [79]    |      3832|        64|FUNC |GLOB |0    |9      |Cdata::Cdata #Nvariant 1()
  [83]    |      3832|        64|FUNC |GLOB |0    |9      |Cdata::Cdata()
  [60]    |      3912|       124|FUNC |GLOB |0    |9      |Cstat::Cstat #Nvariant 1()
  [76]    |      3912|       124|FUNC |GLOB |0    |9      |Cstat::Cstat()
  ...
  [81]    |     70132|         4|OBJT |GLOB |0    |17     |data
  [71]    |      3504|        12|FUNC |GLOB |0    |9      |void notready()

In addition to its own functions, this library defines the constructor for the Cstat class. This symbol is therefore defined in both libstat.so and libdata.so. Hence, the run-time linker is free to choose either definition, which can cause a problem. This issue can be demonstrated by writing an application that uses libdata.so. Such an application is shown in Listing 5.

Listing 5: Application that Uses libdata.so
 $ more main.cpp
 #include "data.hpp"
 
 int main()
 {
   notready();
 }

When run, this application “seg faults.” The output from LD_DEBUG=init in Listing 6 shows the sequence of events that leads to this problem.

Listing 6: Compiling and Running the Application
  $ CC -o main main.cpp -L. -R'$ORIGIN' -ldata
  $ LD_DEBUG=init ./main
  ... 
  28006: 1: calling .init (from sorted order): /codes/library/stl2/libstat.so
  28006: 1: 
  28006: 1: calling .init (dynamically triggered): /codes/library/stl2/libdata.so
  28006: 1: 
  28006: 1: warning: calling /library/stl2/libstat.so whose init has not completed
  28006: 1: 
  Segmentation Fault (core dumped)

The problem sequence is that libstat.so is being initialized, but this triggers the initialization section for libdata.so. libdata.so calls into libstat.so, which has not yet completed initialization, and at this point, the program seg faults. One red flag in this instance is the warning that the init section for libstat.so has not yet completed.

The environment setting LD_DEBUG=bindings can be used to examine the exact symbol that is being bound from libstat.so to libdata.so. It is this symbol that causes the initialization failure, as shown in Listing 7.

Listing 7: Using LD_DEBUG=bindings to Examine Symbols
  $ LD_DEBUG=bindings ./main 2>&1 |grep libdata |grep libstat |c++filt
  07510: 1: binding file=/codes/library/stl2/libstat.so to file=/codes/library/stl2/libdata.so: symbol `Cstat::Cstat()'
  07510: 1: binding file=/codes/library/stl2/libdata.so to file=/codes/library/stl2/libstat.so: symbol `void ready()'

The output shows that the symbol Cstat::Cstat() is bound from libstat.so to libdata.so. This is the Cstat constructor. So rather than using its own definition of the Cstat constructor, libstat.so is calling the definition provided by libdata.so.

To further confirm the sequence of events, we can inspect the call stack using dbx, as shown in Listing 8. The command where -l lists the current call stack together with the libraries where each function resides.

Listing 8: Using dbx to Inspect the Call Stack
  $ dbx - core
  Current function is Cstat::show
      7     void show() {std::cout<<data;}
  (dbx) where -l
    [1] libCstd.so.1:std::operator<< 
  <char,std::char_traits<char>,std::allocator<char> >(0xff0eb170, 0xff1111f4, 
  0xff160f0c, 0x0, 0xff0e5560, 0xff0e77d0), at 0xff06b85c 
  =>[2] libstat.so:Cstat::show(this = 0xff1111f4), line 7 in "stat.hpp"
    [3] libstat.so:ready(), line 8 in "stat.cpp"
    [4] libdata.so:Cdata::Cdata(this = 0xff1711f4), line 7 in "data.cpp"
    [5] libdata.so:__SLIP.INIT_A(), line 10 in "data.cpp"
    [6] libdata.so:__STATIC_CONSTRUCTOR(), line 10 in "data.cpp"
  ...

Stack frames 2, 3, and 4 show the constructor for Cdata calling the routine ready(), which in turn calls Cstat::show(). The application seg faults in Cstat::show() when it writes to cout, because the member variable data has not yet been initialized by the constructor of the object stats.
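The underlying hazard, code running before the constructor of an object it depends on, can be modeled safely in one source file. In this sketch all names are hypothetical. A plain int flag is used instead of reading the unconstructed object itself, because the flag is initialized to zero before any constructor runs, so reading it early is well defined.

```cpp
// Plain int: set to 0 before any constructor in the program runs.
int stat_constructed = 0;

// Plays the role of Cstat: its constructor marks the object as usable.
struct Stat
{
  Stat() { stat_constructed = 1; }
};

// Plays the role of code that runs too early and depends on Stat.
struct Early
{
  bool saw_stat;
  Early() : saw_stat(stat_constructed != 0) { }
};

// Within one source file, objects are constructed in definition order,
// so early is constructed before late_stat.
Early early;
Stat late_stat;

// Reports what the early code observed during its own construction.
bool early_saw_stat() { return early.saw_stat; }

// Reports the state once all initialization has completed.
bool stat_ready_now() { return stat_constructed != 0; }
```

The early code observes that the object it depends on does not exist yet. In the article's example the equivalent code goes on to use the std::string member anyway, which is what produces the segmentation fault.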

We can use truss to examine the exact call sequence, as shown in Listing 9.

Listing 9: Using truss to View the Call Sequence
  $ truss -u libstat,libdata,a.out ./main
  ...
  /1@1:   -> libstat:_init(0x0, 0x0, 0xfefd2a40, 0x1)
  /1@1:     -> libdata:_init(0x0, 0x0, 0xfefd2a40, 0x1)
  /1@1:       -> libstat:__1cFready6F_v_(0xff3511f4, 0x0, 0x0, 0x0)
  /1:         Incurred fault #6, FLTBOUNDS  %pc = 0xFF26B85C
  /1:           siginfo: SIGSEGV SEGV_MAPERR addr=0xFFFFFFF8
  /1:         Received signal #11, SIGSEGV [default]
  /1:           siginfo: SIGSEGV SEGV_MAPERR addr=0xFFFFFFF8
    

All this information gives us an explanation of what happened. The core problem is that we have multiple definitions for the constructor of Cstat. At run time, the linker first encounters the definition in libdata.so, so this becomes the function that libstat.so calls when it needs to construct a Cstat object. The linker correctly identifies libstat.so as the first library to initialize, but during this initialization, the library needs to call the constructor for a Cstat object. This constructor is located in libdata.so, so the initialization code for libdata.so needs to be run first. This code calls back into libstat.so, which has not completed initialization, and it is this final bit of code that causes the application to seg fault. This sequence of events is illustrated in Figure 1.

Figure 1. Initialization Sequence of Problem Application

The Role of -g in This Example

In this example, a contributing factor was the presence of -g on the compile line. In the absence of optimization flags, the -g flag stops the compiler from inlining some of the functions. If the functions were inlined, they would no longer be called, and, in this instance, the problem with picking the wrong version would not occur.

However, this workaround exists only for this example. A more complex application could exhibit the same problem for both the optimized and debug builds of the application. So, although it might be tempting to think that this problem is somehow caused by -g, the problem is really just demonstrated by -g. The problem can occur in other situations that are more difficult to debug.

As an example of how this might happen in applications that are more complex, consider a function that is declared inline. The function might not be inlined at sites where the resulting code exceeds some complexity measure. In other situations, a function whose address is needed will be generated out of line even if it is also inlined at some call sites. A virtual function will always be generated out of line, because its address is needed in the virtual table. Consequently, there are a number of reasons why multiple definitions of functions could persist in an application and its libraries.
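The following sketch shows two of these cases. All names in it are hypothetical. Taking the address of an inline function requires an out-of-line copy to exist, and a virtual function called through a base-class reference is reached through the virtual table, so an out-of-line copy is needed there as well.

```cpp
// Declared inline, but its address is taken below, so an out-of-line
// copy must exist for the pointer to refer to.
inline int add_one(int x)
{
  return x + 1;
}

typedef int (*IntFn)(int);

// Returns the address of the inline function.
IntFn get_add_one()
{
  return &add_one;
}

// Virtual functions defined inside the class are implicitly inline,
// but their addresses are stored in the virtual table.
struct Shape
{
  virtual ~Shape() { }
  virtual int sides() const { return 0; }
};

struct Triangle : Shape
{
  int sides() const override { return 3; }
};

// The call is dispatched through the virtual table at run time.
int sides_of(const Shape& s)
{
  return s.sides();
}
```

If a header like this is included by several libraries, each library can end up with its own global definition of add_one() and of the virtual functions, which is the situation described for Cstat::Cstat() in this article.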

Summary of Recommendations

  • Use LD_DEBUG=init to examine the initialization of the libraries loaded by an application.
  • Use LD_DEBUG=bindings to examine how symbols are resolved between libraries.
Revision 1, 05/19/2011