Avoiding Linking Problems

Part IV of the Libraries, Linking, Initialization, and C++ Series

By Darryl Gove and Stephen Clamage, May 2011

Part I - Introduction to Libraries and Linking
Part II - Resolving Symbols in Libraries


Duplicate Symbols

The same symbol can be defined in multiple libraries. In this case, the runtime linker typically binds references to the first definition of that symbol it encounters (although later we will discuss how direct binding can cause different libraries to bind to different instances of the same symbol name). This can be demonstrated with a test program in which two libraries define the same function, as shown in Listing 1.

Listing 1: Libraries Containing Duplicate Symbols
 $ more main.c
 #include <stdio.h>

 void f(void);

 int main(void)
 {
   f();
   printf("In main\n");
   return 0;
 }

 $ more lib1.c
 #include <stdio.h>

 void f()
 {
   printf("In library 1\n");
 }

 $ more lib2.c
 #include <stdio.h>

 void f()
 {
   printf("In library 2\n");
 }

When the application is linked with lib1.so listed first on the link line, the call to f() resolves to the definition in lib1.so. When it is linked with lib2.so listed first, the definition in lib2.so is executed instead. This is shown in Listing 2.

Listing 2: Symbols Resolve to the First Encountered Definition
 $ cc -G -Kpic lib1.c -o lib1.so
 $ cc -G -Kpic lib2.c -o lib2.so
 $ cc -o main main.c -L. -R'$ORIGIN' -l1 -l2
 $ ./main
 In library 1
 In main
 $ cc -o main main.c -L. -R'$ORIGIN' -l2 -l1
 $ ./main
 In library 2
 In main
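The first-definition-wins rule can be modeled in a few lines of C. The sketch below is a toy model, not the runtime linker's implementation: `struct object` and `resolve()` are hypothetical names, and each shared object is reduced to a file name plus a list of the symbols it defines. Searching the objects in link-line order reproduces the two runs in Listing 2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a shared object: its file name and a
 * NULL-terminated list of the global symbols it defines. */
struct object {
    const char *name;
    const char *const *defs;
};

/* Search the objects in order and return the first one that defines sym,
 * or NULL if none does. Later definitions of the same name are never
 * reached, which is why swapping -l1 and -l2 changes the result. */
const struct object *resolve(const struct object *order, size_t n,
                             const char *sym)
{
    assert(order != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        for (const char *const *d = order[i].defs; *d != NULL; d++) {
            if (strcmp(*d, sym) == 0)
                return &order[i];
        }
    }
    return NULL;
}
```

With lib1.so before lib2.so in the array, resolve() returns lib1.so for f; reversing the array returns lib2.so. The real search list is richer than a flat array (by default the executable itself is searched before its dependencies), so this model only captures the effect of link-line order.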

Notice that there are no warnings at either link time or run time. The debugging tool lari can identify these duplicate symbols, as shown in Listing 3.

Listing 3: Using lari to Identify Duplicate Symbols
 $ lari ./main
 [2:0]: f(): codes/library/lib1.so
 [2:1E]: f(): codes/library/lib2.so

The output shows that there are two definitions of f(); the first number in each [2:...] prefix is the number of instances of the symbol. The definition in lib2.so has been bound to, as indicated by [2:1E]: one binding, where the letter E indicates that the binding is from an external object. The definition in lib1.so has no bindings, as indicated by [2:0].
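The instance count that lari reports amounts to asking how many objects define a given name. As a rough illustration of that check (not how lari is implemented), the hypothetical function `count_definitions()` below counts the definitions of a name across a set of objects, again reduced to a hypothetical `struct object`; any count above 1 flags a duplicate of the kind shown in Listing 1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a shared object: its file name and a
 * NULL-terminated list of the global symbols it defines. */
struct object {
    const char *name;
    const char *const *defs;
};

/* Count how many of the n objects define sym (each object counted once). */
size_t count_definitions(const struct object *objs, size_t n, const char *sym)
{
    size_t count = 0;

    assert(objs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        for (const char *const *d = objs[i].defs; *d != NULL; d++) {
            if (strcmp(*d, sym) == 0) {
                count++;
                break;
            }
        }
    }
    return count;
}

/* Return 1 if sym is defined in more than one object, 0 otherwise. */
int is_duplicate(const struct object *objs, size_t n, const char *sym)
{
    return count_definitions(objs, n, sym) > 1;
}
```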

Another way of checking for duplicate symbols is to examine which symbols each library defines. The nm utility can list the symbols that are global and not undefined, as shown in Listing 4. The elfdump utility with the -s flag also reports symbol information.

Listing 4: Using nm to Determine the Symbols a Library Defines
 $ nm ./lib1.so | grep GLOB | grep -v UNDEF
 [47]    |     66300|       0|OBJT |GLOB |0    |13     |_DYNAMIC
 [45]    |     66452|       0|OBJT |GLOB |0    |15     |_edata
 [41]    |     66452|       0|OBJT |GLOB |0    |16     |_end
 [40]    |       690|       0|OBJT |GLOB |0    |10     |_etext
 [39]    |       660|      12|FUNC |GLOB |0    |8      |_fini
 [48]    |     66228|       0|OBJT |GLOB |0    |11     |_GLOBAL_OFFSET_TABLE_
 [42]    |       648|      12|FUNC |GLOB |0    |7      |_init
 [44]    |       672|       4|OBJT |GLOB |0    |9      |_lib_version
 [38]    |     66236|       0|OBJT |GLOB |0    |12     |_PROCEDURE_LINKAGE_TABLE_
 [43]    |       592|      56|FUNC |GLOB |0    |6      |f
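Piping nm through grep matches the text GLOB or UNDEF anywhere on a line. A stricter alternative is to split each line on its | separators and test the binding and section-index columns directly. The sketch below assumes the column layout shown in Listing 4 (index, value, size, type, binding, other, section index, name) and that undefined symbols carry UNDEF in the section-index column, as the grep -v UNDEF filter implies; `nm_field()` and `defined_global()` are hypothetical helpers.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy the '|'-separated field at position idx (0-based) of an nm output
 * line into buf, with surrounding whitespace trimmed.
 * Returns 0 on success, -1 if the field is missing or does not fit. */
static int nm_field(const char *line, int idx, char *buf, size_t len)
{
    const char *p = line;
    const char *end;
    size_t n;

    assert(line != NULL && buf != NULL && len > 0);
    for (int i = 0; i < idx; i++) {
        p = strchr(p, '|');
        if (p == NULL)
            return -1;
        p++;                      /* step past the separator */
    }
    end = strchr(p, '|');
    if (end == NULL)
        end = p + strlen(p);      /* the last field runs to end of line */
    while (p < end && isspace((unsigned char)*p))
        p++;
    while (end > p && isspace((unsigned char)end[-1]))
        end--;
    n = (size_t)(end - p);
    if (n >= len)
        return -1;
    memcpy(buf, p, n);
    buf[n] = '\0';
    return 0;
}

/* Return 1 if the line describes a global symbol defined by this object
 * (binding is GLOB and the section index is not UNDEF), storing the
 * symbol name in name; return 0 otherwise. */
int defined_global(const char *line, char *name, size_t len)
{
    char bind[16];
    char shndx[16];

    if (nm_field(line, 4, bind, sizeof bind) != 0 ||
        nm_field(line, 6, shndx, sizeof shndx) != 0 ||
        nm_field(line, 7, name, len) != 0)
        return 0;
    return strcmp(bind, "GLOB") == 0 && strcmp(shndx, "UNDEF") != 0;
}
```

Calling defined_global() on each line of nm output (read, for example, with fgets from a popen stream) gives the names each library defines, and comparing those lists across libraries flags candidates such as f. Names such as _DYNAMIC, _etext, and _init in Listing 4 are linker-generated and are likely to appear in every shared object, so a real comparison would need to ignore them.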

The real problem with multiply defined symbols is that the runtime linker resolves the symbol to whichever definition it finds first. This means that the run-time behavior of the application can change, without any warning, depending on the order in which the libraries are linked and loaded, as Listing 2 shows. This is a critical problem, as we will discover in later sections.

Circular Dependencies

Another potential problem is circular dependencies, where one library depends on functionality provided by a second library, and the second library in turn relies on functionality provided by the first. A circular dependency can arise in two ways:

  • The first is a genuine dependency: the developer has written the libraries so that each is expected to call into the other.
  • The second is accidental: multiple libraries define symbols with the same name, and the runtime linker resolves a library's references to the definition in the first-loaded dependency rather than to the library's own definition.

The main problem with deliberate circular dependencies is that the libraries cannot all be linked without leaving some dependencies unresolved at link time. The application in Listing 5 has a circular dependency between lib1.so and lib2.so: lib1.so relies on lib2.so to provide a definition of g2(), and lib2.so relies on lib1.so to provide a definition of g1().

Listing 5: Application Containing a Deliberate Circular Dependency Between Libraries
 $ more main.c
 #include <stdio.h>

 void f1(void);

 int main(void)
 {
   f1();
   printf("In main\n");
   return 0;
 }

 $ more lib1.c
 #include <stdio.h>

 void g2();

 void f1()
 {
   g2();
   printf("In library 1\n");
 }

 void g1()
 {
   printf("Back in library 1\n");
 }

 $ more lib2.c
 #include <stdio.h>

 void g1();

 void g2()
 {
   g1();
   printf("In library 2\n");
 }

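Because each library needs a symbol from the other, whichever library we build first cannot be linked with -z defs, which turns any unresolved symbol into a fatal link error. To make progress, we must link one of the two libraries without -z defs, allowing unresolved symbols; the second library can then be linked with -z defs against the first. The process is shown in Listing 6.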

Listing 6: Linking in the Presence of Circular Dependencies
$ cc -G -Kpic -o lib1.so lib1.c -z defs -L. -R'$ORIGIN' -lc
 Undefined                       first referenced
  symbol                             in file
 g2                                  lib1.o
 ld: fatal: symbol referencing errors. No output written to lib1.so
 $ cc -G -Kpic -o lib1.so lib1.c  -L. -R'$ORIGIN' -lc
 $ cc -G -Kpic -o lib2.so lib2.c -z defs -L. -R'$ORIGIN' -l1 -lc
 $ cc -o main main.c -L. -R'$ORIGIN' -l1 -l2
 $ ./main
 Back in library 1
 In library 2
 In library 1
 In main
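Why the first library fails with -z defs can be modeled as a membership test: every symbol an object references must be defined by the object itself or by one of the dependencies on its link line. The sketch below is a toy model under that assumption (it ignores weak symbols, symbol versioning, and lazy loading); `struct object`, `in_list()`, and `first_unresolved()` are hypothetical names. Applied to Listing 5, lib1.so linked against libc alone leaves g2 unresolved, while lib2.so linked against lib1.so and libc resolves every reference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of an object being built: the symbols it
 * defines and the symbols it references, both NULL-terminated lists. */
struct object {
    const char *name;
    const char *const *defs;
    const char *const *refs;
};

/* Return 1 if sym appears in the NULL-terminated list, 0 otherwise. */
static int in_list(const char *const *list, const char *sym)
{
    for (; *list != NULL; list++)
        if (strcmp(*list, sym) == 0)
            return 1;
    return 0;
}

/* In the spirit of -z defs: return the first symbol that obj references
 * but that neither obj nor any of its n dependencies defines, or NULL if
 * every reference resolves. */
const char *first_unresolved(const struct object *obj,
                             const struct object *const *deps, size_t n)
{
    assert(obj != NULL);
    for (const char *const *r = obj->refs; *r != NULL; r++) {
        int found = in_list(obj->defs, *r);
        for (size_t i = 0; i < n && !found; i++)
            found = in_list(deps[i]->defs, *r);
        if (!found)
            return *r;
    }
    return NULL;
}
```

In this model libc is reduced, hypothetically, to a single definition of printf; the point is only that no build order lets both libraries pass the check, because each one's references can be satisfied only by the other.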

Having circular dependencies means that at least one library cannot be linked using best practices (here, -z defs), so genuine unresolved-symbol problems in that library could go unnoticed until run time. When the application is run, circular dependencies can also make it difficult to guarantee the correct initialization order for the libraries.

Best Practices for Avoiding Circular Dependencies by Design

When an application is composed of multiple libraries (apart from system libraries), best practice is to ensure that any two libraries, libA.so and libB.so, are either independent or hierarchical. That is, if libA.so uses something from libB.so, libB.so does not use anything from libA.so. Furthermore, this hierarchical rule must extend to all the libraries in the application, so that no chain of dependencies leads back to the library where it started.

Using the notation libA.so->libB.so to mean that libA.so uses something from libB.so, suppose we have this set of dependencies:

 libA.so->libB.so->libC.so
 libC.so->libA.so

This cycle of dependencies must be broken, for example by ensuring that libC.so does not use anything from libA.so. The libraries must be organized into one or more hierarchies. For this example, we must have the following hierarchy:

 Layer 1: libA.so
 Layer 2: libB.so
 Layer 3: libC.so
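Checking the hierarchical rule amounts to checking that the "uses" graph contains no cycles. A minimal sketch, assuming the dependencies have already been collected into an adjacency matrix (gathering them is outside the scope of the sketch): a depth-first search that reports a cycle when it reaches a library that is already on the current path. `has_cycle()`, `visit()`, and `MAX_LIBS` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LIBS 16   /* hypothetical upper bound for this sketch */

enum { WHITE, GREY, BLACK };  /* unvisited, on current path, finished */

/* Depth-first search from lib; return 1 if a cycle is reachable. */
static int visit(int uses[][MAX_LIBS], size_t n, size_t lib, int color[])
{
    color[lib] = GREY;
    for (size_t dep = 0; dep < n; dep++) {
        if (!uses[lib][dep])
            continue;
        if (color[dep] == GREY)
            return 1;             /* dep is already on the path: a cycle */
        if (color[dep] == WHITE && visit(uses, n, dep, color))
            return 1;
    }
    color[lib] = BLACK;
    return 0;
}

/* Return 1 if the dependency graph of n libraries contains a cycle.
 * uses[a][b] is nonzero when library a uses something from library b. */
int has_cycle(int uses[][MAX_LIBS], size_t n)
{
    int color[MAX_LIBS] = { WHITE };

    assert(n <= MAX_LIBS);
    for (size_t lib = 0; lib < n; lib++)
        if (color[lib] == WHITE && visit(uses, n, lib, color))
            return 1;
    return 0;
}
```

With edges libA.so->libB.so and libB.so->libC.so the graph is acyclic; adding libC.so->libA.so makes has_cycle() return 1, which is exactly the cycle that must be broken.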

If the circularity cannot be removed by a simple restructuring of the code, it is probably necessary to pull the shared code out into another library at a lower level of the hierarchy. In this example, libC.so uses something from libA.so. That code should be extracted and placed into a new library, libD.so:

 Layer 1: libA.so
 Layer 2: libB.so
 Layer 3: libC.so
 Layer 4: libD.so

The requirement is that libD.so does not use anything from libA.so, libB.so, or libC.so.
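A declared hierarchy such as the four layers above can also be checked mechanically. The sketch assumes the numbering used here (Layer 1 at the top, with each library using only libraries in higher-numbered layers) and the same kind of adjacency matrix as a cycle check would use; `layering_ok()` and `MAX_LIBS` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LIBS 16   /* hypothetical upper bound for this sketch */

/* Return 1 if every dependency points down the hierarchy: whenever
 * library a uses library b, b must sit in a higher-numbered layer than a.
 * uses[a][b] is nonzero when a uses something from b; layer[i] is the
 * layer number assigned to library i. */
int layering_ok(int uses[][MAX_LIBS], const int layer[], size_t n)
{
    assert(n <= MAX_LIBS);
    for (size_t a = 0; a < n; a++)
        for (size_t b = 0; b < n; b++)
            if (uses[a][b] && layer[b] <= layer[a])
                return 0;         /* edge goes sideways or upward */
    return 1;
}
```

Because every permitted edge strictly increases the layer number, any graph that passes this check is also free of cycles. For the refactored example it accepts libA.so->libB.so->libC.so->libD.so and rejects any edge from libD.so back up to libA.so, libB.so, or libC.so.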

Ideally, circular dependencies should be factored out of the application. However, a further concern is that the build process and the compiler can themselves create duplicate symbols, which can in turn introduce circular dependencies into an application. This will be discussed in a later article.

Summary of Recommendations

  • Applications should avoid defining duplicate copies of the same symbol. This problem can be detected using the tool lari on the executable.
  • Applications should not contain circular dependencies. Libraries should either not depend on each other, or they should be part of a well-defined hierarchy.

Revision 1, 05/10/2011