Introduction to Libraries and Linking

Part I of Libraries, Linking, Initialization, and C++ Series

By Darryl Gove and Stephen Clamage, May 2011

Part I - Introduction to Libraries and Linking
Part II - Resolving Symbols in Libraries

When an application starts, the run-time linker is responsible for loading all the libraries the application requires. The linker uses smart algorithms to determine the order in which the libraries are loaded, but some coding styles can cause problems and result in unanticipated behavior.

The first article in this series describes how to link libraries so that applications are able to locate them at run time.

Linking to Libraries

Suppose we have an application that depends on two libraries. The source code for a simple application and libraries is shown in Listing 1.

Listing 1: Application and Libraries
 $ more main.c
 #include <stdio.h>
 void f1(void);   /* defined in lib1.c */
 void f2(void);   /* defined in lib2.c */
 int main(void)
 {
     f1();
     f2();
     printf("In main\n");
     return 0;
 }
 $ more lib1.c
 #include <stdio.h>
 void f1(void)
 {
     printf("In library 1\n");
 }
 $ more lib2.c
 #include <stdio.h>
 void f2(void)
 {
     printf("In library 2\n");
 }

To build the libraries, we compile with the -G flag, which tells the compiler to produce a shared library, and the -Kpic flag, which tells the compiler to generate position-independent code. Position-independent code allows the library to work when placed at any location in memory, which in turn allows the same library image to be shared between multiple processes. In addition, position-independent code reduces the number of relocations that the run-time linker must perform, which speeds up loading. The steps to produce the two libraries are shown in Listing 2.

Listing 2: Compiling Two Libraries
 $ cc -G -Kpic lib1.c -o lib1.so -z text
 $ cc -G -Kpic lib2.c -o lib2.so -z text

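The linker flag -z text causes the linker to report an error if any relocations remain against the read-only text segment, which happens when the object files contain code that is not position independent. Listing 3 shows the result of omitting -Kpic.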

Listing 3: Attempting to Build a Library Using Nonrelocatable Object Files
 $ cc -G lib2.c -o lib2.so -z text
 Text relocation remains                         referenced
     against symbol                  offset      in file
 .rodata1 (section)                  0x14        lib2.o
 .rodata1 (section)                  0x18        lib2.o
 printf                              0x1c        lib2.o
 ld: fatal: relocations remain against allocatable but non-writable sections

When building the main application, we need to tell the compiler to link to these two shared libraries. The flag to link to a library is -l: the option -l1 asks the linker to search for a library named lib1.so. The first attempt at linking might produce the results shown in Listing 4.

Listing 4: Attempt at Linking Without Specifying Location of Libraries
 $ cc -o main main.c -l1 -l2
 ld: fatal: library -l1: not found
 ld: fatal: library -l2: not found
 ld: fatal: File processing errors. No output written to main

The linker does not know where to find the libraries, so it is unable to link to them. It is tempting to list the libraries explicitly on the link line, as shown in Listing 5.

Listing 5: Explicitly Listing Libraries in the Link Command
 $ cc -o main main.c lib1.so lib2.so
 $ ./main
 ld.so.1: main: fatal: lib1.so: open failed: No such file or directory

Explicitly listing the libraries allows the link step to succeed, but the application fails at run time because the run-time linker has not been told where to find them. The obvious, but bad, workaround is to use LD_LIBRARY_PATH to specify where the libraries are found. Listing 6 shows this poor approach.

Listing 6: Using LD_LIBRARY_PATH to Mask Bad Development Practices
 $ export LD_LIBRARY_PATH=`pwd`
 $ ./main
 In library 1
 In library 2
 In main

This is a poor choice because the application now depends on environment settings to function correctly, so it must be launched from a script that sets up the environment. It also opens up the potential for the program to conflict with other similarly coded applications, or even for the application to pick up similarly named libraries from other applications. The environment variable LD_LIBRARY_PATH is useful during development to test alternative library implementations, but it should not be used as part of a production environment.
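
For the development-time use mentioned above, one way to keep the setting contained is to apply LD_LIBRARY_PATH to a single command instead of exporting it into the whole session. The following is a minimal sketch of that shell behavior, not a recommended deployment technique. The directory /tmp/testlibs is a hypothetical placeholder, and the child shell merely stands in for the application under test.

```shell
# Start from a clean state so the effect is easy to see.
unset LD_LIBRARY_PATH

# Prefix assignment: the variable exists only in the environment of this
# one command. /tmp/testlibs is a hypothetical directory of test libraries,
# and the child shell stands in for the application being tested.
child_value=$(LD_LIBRARY_PATH=/tmp/testlibs sh -c 'printf "%s" "$LD_LIBRARY_PATH"')
echo "child sees:  $child_value"

# The invoking shell is unaffected, so later commands run normally.
echo "parent sees: ${LD_LIBRARY_PATH:-<unset>}"
```

Because nothing is exported, other programs started from the same shell do not inherit the test setting.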

Linking Correctly

The correct approach to linking is to use the -L option, to specify where the linker can find the libraries at link time, and the -R option, to specify where the application can find them at run time. Listing 7 shows how -L and -R can be set so the application builds and runs correctly without needing LD_LIBRARY_PATH.

Listing 7: Using -L and -R to Set Compile-Time and Run-Time Library Paths
 $ cc -o main main.c -L. -R. -l1 -l2
 $ ./main
 In library 1
 In library 2
 In main

The compile command in Listing 7 specifies that the linker search the current directory both at compile time and at run time to locate the required libraries. At compile time, the build process can control the current directory, so this works. At run time, it is not possible to control the current directory and, as a result, this approach can fail, as shown in Listing 8.

Listing 8: Using Current Directory to Specify Run-Time Library Location Can Fail
 $ codes/library/main
 ld.so.1: main: fatal: lib1.so: open failed: No such file or directory
 Killed
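
The failure in Listing 8 arises because "." is interpreted relative to the current working directory of the process, not relative to the location of the executable. The sketch below imitates that lookup with plain files in a scratch directory. The empty files only stand in for main and lib1.so, and no real linking takes place.

```shell
# Build a scratch tree that mirrors the article's layout: codes/library/
# Empty placeholder files stand in for the executable and the library.
scratch=$(mktemp -d)
mkdir -p "$scratch/codes/library"
touch "$scratch/codes/library/main" "$scratch/codes/library/lib1.so"

# A run path of "." is looked up relative to the current directory.
# Launched from inside codes/library, "./lib1.so" exists:
cd "$scratch/codes/library"
if [ -f ./lib1.so ]; then inside=found; else inside=missing; fi

# Launched from two levels up, "." names a different directory and the
# same lookup fails, even though the library has not moved:
cd "$scratch"
if [ -f ./lib1.so ]; then outside=found; else outside=missing; fi

echo "from codes/library: $inside"
echo "from scratch root:  $outside"

# Clean up the scratch tree.
cd / && rm -rf "$scratch"
```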

Another way of fixing this problem would be to use absolute paths to the locations of the libraries. This works very well for the system libraries, since these appear in exactly the same place on every system. It does not work as well for applications and libraries that are delivered to users. A user might not have permission to write to the chosen directory, there might be conflicting versions of the application installed on the system, and so on.

A better solution is to use relative paths to specify the location of the libraries. This can be achieved using the $ORIGIN token at link time. This token tells the run-time linker that the run-time path is specified relative to the location of the module.

An example of this approach is shown in Listing 9. In this case, the $ORIGIN token is used to indicate that the two libraries can be found in the same directory as the executable. The same relative path approach can be used to specify where a library might locate the libraries it depends on.

Listing 9: Using the $ORIGIN Token to Indicate Run-Time Location of Libraries
 $ cc -o main main.c -L. -R'$ORIGIN' -l1 -l2
 $ cd ../..
 $ codes/library/main
 In library 1
 In library 2
 In main

Escaping the $ORIGIN Token

One complication with using the $ORIGIN token is that it needs to be escaped so that the shell does not process it. The exact escape sequence might depend on what shell or shell version is used. The common sequence of using single quotes is shown in Listing 9. The situation is complicated in a makefile, where two levels of escaping are necessary in order for the token to be correctly presented to the linker, as shown in Listing 10.

Listing 10: Using the $ORIGIN Token in a Makefile
RUNPATH = -R \$$ORIGIN
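
The shell half of this escaping can be checked without involving the linker. The sketch below assumes a POSIX shell in which no variable named ORIGIN is set, and it uses printf as a stand-in for the compiler command line so that the argument the compiler would receive can be inspected.

```shell
# Make sure no ORIGIN variable exists, so any expansion yields an empty string.
unset ORIGIN

# Single quotes: the shell passes the token through untouched.
single=$(printf '%s' '-R$ORIGIN')

# Double quotes: the shell expands $ORIGIN, here to nothing, and the token is lost.
double=$(printf '%s' "-R$ORIGIN")

# Backslash inside double quotes: the dollar sign is kept literally.
escaped=$(printf '%s' "-R\$ORIGIN")

echo "single:  $single"
echo "double:  $double"
echo "escaped: $escaped"
```

In a makefile, make first reduces $$ to a single $, and the shell then applies the rules above to what remains, which is why Listing 10 needs both the doubled dollar sign and the backslash.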

Real-World Applications

The examples presented so far, with the application and libraries in the current directory, are not very realistic. Real-world applications are typically either installed in a dedicated directory tree or scattered around standard locations for applications and libraries. The paper "Using and Redistributing Solaris Studio Libraries in an Application" discusses best practices for distributing libraries that come with the compiler, but the same principles apply to libraries you provide with your application.
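
As one illustration of a dedicated directory tree, suppose the executable is installed in a bin directory and its libraries in a sibling lib directory. A run path of $ORIGIN/../lib would then let the run-time linker find the libraries wherever the tree is installed. The sketch below only imitates that substitution in the shell. The function expand_origin and the myapp layout are hypothetical names introduced for illustration, and the real expansion is performed by the run-time linker.

```shell
# Hypothetical helper: imitate how a run path entry beginning with
# $ORIGIN resolves for a given module (executable or library).
expand_origin() {
    module=$1     # path to the module that carries the run path
    runpath=$2    # run path entry as it would be passed to -R
    origin=$(cd "$(dirname "$module")" && pwd)   # directory holding the module
    case $runpath in
        '$ORIGIN'*)
            rest=${runpath#\$ORIGIN}             # text following the token
            printf '%s%s\n' "$origin" "$rest" ;;
        *)
            printf '%s\n' "$runpath" ;;          # no token: used as written
    esac
}

# Hypothetical installation tree: myapp/bin/main and myapp/lib/lib1.so.
# Empty files stand in for the real executable and library.
root=$(mktemp -d)
mkdir -p "$root/myapp/bin" "$root/myapp/lib"
touch "$root/myapp/bin/main" "$root/myapp/lib/lib1.so"

# The link step would record the entry with something like:
#   cc -o main main.c -L../lib -R'$ORIGIN/../lib' -l1 -l2
libdir=$(expand_origin "$root/myapp/bin/main" '$ORIGIN/../lib')
echo "libraries searched in: $libdir"

# The resolved directory contains the library regardless of where the
# tree was created or which directory the command is run from.
if [ -f "$libdir/lib1.so" ]; then result=found; else result=missing; fi
echo "lib1.so: $result"

# Clean up the scratch tree.
rm -rf "$root"
```

The same entry keeps working if the whole myapp tree is moved, because only the relative relationship between bin and lib is recorded.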

Summary of Best Practices

  • Use -L to specify the path to where the libraries can be found at compile time.
  • Use -R to specify the location of the libraries at run time.
  • Use the token $ORIGIN to specify a relative path for the libraries' location. This avoids the need to have a hard-coded location where the libraries can be found.
       
Revision 1.0, 04/22/2011