Using Symbol Scoping to Avoid Linking Issues

Part VII of Libraries, Linking, Initialization, and C++ Series

By Darryl Gove and Stephen Clamage, July 2011

Part I - Introduction to Libraries and Linking
Part II - Resolving Symbols in Libraries


The default scope for symbols is to be globally visible. This means the symbols defined in one library can be seen and used by other libraries and executables. The nm command coupled with grep is one way of identifying globally scoped symbols. Listing 1 shows the process of compiling a library and then examining the symbol definition.

Listing 1: Global Symbol Scoping
$ CC -g -G -Kpic -o libstat.so stat.cpp
$ nm libstat.so|grep __1cFCstatEshow6M_v_
[Index]   Value      Size      Type  Bind  Other Shndx   Name
...
[87]    |      3848|        60|FUNC |GLOB |0    |9      |__1cFCstatEshow6M_v_

The symbol __1cFCstatEshow6M_v_ is a function with global binding. Note that the Other column contains the value zero, which indicates that the function has the default scope. The Other column can have alternative values that indicate different levels of scope: 2 for hidden scope and 3 for protected scope. If elfdump is used to examine the symbols, it reports the scoping in a more readable format of D for default, H for hidden, and P for protected, as shown in Listing 2.

Listing 2: Using elfdump to Examine Scoping
$ elfdump  libstat.so |grep __1cFCstatEshow6M_v_
[71]  0x00000f90 0x0000003c  FUNC GLOB  D  0 .text          __1cFCstatEshow6M_v_
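
The mapping between the numeric Other column printed by nm and the letters printed by elfdump can be captured in a small helper. The following is only a sketch: the function names scope_letter and scope_name are hypothetical, and only the three values named in the text above (0, 2, and 3) are mapped. Any other value is reported as unknown rather than guessed.

```cpp
// Sketch: translate the value shown in nm's "Other" column into the
// single-letter code described for elfdump (D, H, P).
// Only the values discussed in the article are handled; '?' marks the rest.
constexpr char scope_letter(unsigned other)
{
  return other == 0 ? 'D'   // default scope
       : other == 2 ? 'H'   // hidden scope
       : other == 3 ? 'P'   // protected (symbolic) scope
       : '?';               // value not covered by the article
}

// The same mapping, spelled out as a word.
constexpr const char* scope_name(unsigned other)
{
  return other == 0 ? "default"
       : other == 2 ? "hidden"
       : other == 3 ? "protected"
       : "unknown";
}
```

For the symbol in Listing 1, scope_letter(0) yields the D that elfdump shows in Listing 2.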

The next sections discuss how different scope levels can be used to restrict symbol visibility.
 

Using Symbolic Binding to Restrict Symbol Scope

Symbolic binding corresponds to protected scope. If a library contains a function of symbolic scope, that function will be used by the library, and the function will also be available for other libraries to use if they wish. A good way of thinking about this is that the function is exported for other libraries to use, but the library itself is required to use its own definition of the function.

We can set this to be the default level of scoping for all the functions in the library using the flag -xldscope=symbolic. Listing 3 shows the effect of compiling libstat.so with this flag. The Other column shows the value 3, which indicates protected scope; however, the function is still global.

Listing 3: Using Symbolic Scoping
$ CC -g -G -Kpic -o libstat.so stat.cpp -xldscope=symbolic
$ nm libstat.so|grep __1cFCstatEshow6M_v_
[87]    |      3824|        60|FUNC |GLOB |3    |9      |__1cFCstatEshow6M_v_

With this scoping as the default, the problem application works. This can be observed using truss, as shown in Listing 4.

Listing 4: Using truss to Observe Initialization of an Application
$ truss -u libstat,libdata,a.out ./main
/1@1:   -> libstat:_init(0x0, 0x0, 0xfefd2a40, 0x1)
/1@1:   <- libstat:_init() = 0
/1@1:   -> libdata:_init(0x0, 0x0, 0xfefd2a40, 0x1)
/1@1:     -> libstat:__1cFready6F_v_(0xff3511f4, 0x0, 0x0, 0x0)

There are two things to note here. First, the initialization of libstat.so does not call libdata.so, which avoids the problem where the initialization of libdata.so gets triggered before the initialization of libstat.so has completed. Second, the initialization of libstat.so does not appear to call into any libraries. This is actually an artifact of the way that truss works. Because truss cannot interpose on the calls within the libstat.so library, it does not see the call into Cstat::Cstat().

This illustrates a side effect of symbolic scoping: it is not possible to interpose on all calls to a library-provided function. The calls between libraries are visible, but those within the library are not. This means that functionality cannot be reliably replaced by interposing on the symbols. See the section Interposing on Functions for details.

As with direct binding, symbolic binding can result in the definition of multiple copies of the same object or function.
 

Using Hidden Scope to Hide Symbols

It is also possible to hide symbols using -xldscope=hidden. This means that the symbol is bound within the library, but it is not visible outside of the library. If we recompile libstat.so with hidden scoping, we can see that the symbols become local, as shown in Listing 5.

Listing 5: Compiling with Hidden Scope
$ CC -g -G -Kpic -o libstat.so stat.cpp -xldscope=hidden
$ nm libstat.so|grep __1cFCstatEshow6M_v_
[40]    |      3728|        60|FUNC |LOCL |2    |9      |__1cFCstatEshow6M_v_

With this change, the default scoping becomes local, which means that the symbols are not visible to other libraries or the executable. These effects are shown in Listing 6.

Listing 6: Effect of Declaring the Default Scoping as Hidden
$ ./main
ld.so.1: main: fatal: relocation error: file /codes/library/stl2/libdata.so: symbol __1cFready6F_v_: referenced symbol not found
Killed

The advantage of hidden scoping is that the symbols are available only for the internal use of the library. They take no part in the interface of the library. This reduces the chance of a problem with multiply defined symbols, and it also simplifies the work that the run-time linker has to perform, which can reduce application startup time.

However, to really leverage hidden symbol scoping, it needs to be used with global and symbolic scoping, as discussed next.
 

Using Scoping to Produce the Minimal Interfaces for a Library

We have discussed scoping at the level of the library or executable. However, it is desirable to actually perform this scoping at the function level. In this way, it is possible to identify those functions that need to be exported and those that need to be imported. The functions that do not need to be exported or imported can then be hidden, producing libraries with minimal interfaces.

We can scope individual functions or variables using the specifiers __global, __symbolic, or __hidden. Functions or variables that need to be imported by a library must be declared as __global. Those that we are exporting from a library should be declared as __symbolic, unless they need to be interposed upon, or there could be multiple definitions of the symbol and it is important that only a single definition be used; in those cases, use __global. We can then use -xldscope=hidden on the command line to hide all the other symbols.

This introduces one difficulty for our header files. If we have a function that is exported by a library, and that function is declared in the header file, then when another library includes that header file, we need to scope the function as __global, but when the header is included by the library itself, we want to declare the function as __symbolic. The best way around this is to use a #define to control the scoping. The modifications to the test program to build with minimal scope are shown in Listing 7.

Listing 7: Source Code Modified to use Minimal Scoping
$ more stat.cpp
#include <iostream>

#define BUILD_STAT_LIBRARY /*Indicating a library build*/
#include "stat.hpp"

Cstat stats;

void ready()
{
  stats.show();
}

$ more stat.hpp
#include <iostream>

#ifdef BUILD_STAT_LIBRARY
#define SCOPE __symbolic /*Building library, export symbol*/
#else
#define SCOPE __global   /*Building other modules, import symbol*/
#endif

class Cstat
{
  std::string data;
  public:
  SCOPE Cstat() { data = "data"; }
  SCOPE void show() {std::cout<<data;}
};

SCOPE void ready();

$ more data.cpp
#include "stat.hpp"

class Cdata
{
  Cstat stats;
  public:
  __symbolic Cdata() { ready(); }
};

Cdata data;

__symbolic void notready()
{
}

$ more data.hpp
__global void notready();

$ more main.cpp
#include "data.hpp"

int main()
{
  notready();
}

The program can be compiled with the default of hidden scoping, as shown in Listing 8.

Listing 8: Compiling with Hidden Scoping as Default
$ CC -g -G -Kpic -o libstat.so stat.cpp -xldscope=hidden
$ CC -g -G -Kpic -o libdata.so data.cpp -xldscope=hidden -L. -R'$ORIGIN' -lstat
$ CC -g -o main main.cpp -xldscope=hidden -L. -R'$ORIGIN' -ldata
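
In Listing 7, only stat.hpp uses the #define technique; data.cpp and data.hpp spell out __symbolic and __global directly. The same pattern could be applied to libdata.so with its own macro, so that one header serves both the library and its clients. The following is a minimal sketch, not part of the original test program: the names BUILD_DATA_LIBRARY and DATA_SCOPE are hypothetical, and the empty fallback for compilers that do not define __SUNPRO_CC is an assumption added so the header can be read by other compilers. The header and source are shown as a single file for brevity.

```cpp
// --- data.cpp (sketch) ---
#define BUILD_DATA_LIBRARY  /* hypothetical macro: marks a build of libdata.so */

// --- data.hpp (sketch; would normally be a separate file) ---
#if defined(__SUNPRO_CC)            /* predefined by the Sun/Oracle C++ compiler */
#  ifdef BUILD_DATA_LIBRARY
#    define DATA_SCOPE __symbolic   /* building the library: export the symbol */
#  else
#    define DATA_SCOPE __global     /* building other modules: import the symbol */
#  endif
#else
#  define DATA_SCOPE                /* assumption: no specifier on other compilers */
#endif

DATA_SCOPE void notready();

// --- back in data.cpp ---
// The definition carries no specifier of its own, in the same way that
// ready() is defined in stat.cpp in Listing 7.
void notready()
{
}
```

Using a distinct macro per library avoids the case where a library that includes another library's header while building itself accidentally marks the imported symbols as __symbolic.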

There is a subtle difference between using default scoping and individual scoping. Symbolic scoping applied as the default with -xldscope=symbolic does not scope undefined functions or variables. However, placing the __symbolic keyword on a declaration scopes the symbol even if it is not defined. This means that __symbolic can be applied only to those functions and variables that are defined in the library. The linker reports an error if __symbolic scoping is applied to a symbol that is not defined, as shown in Listing 9.

Listing 9: __symbolic Scoping Applied to an Undefined Function
$ more lib.cpp
__symbolic extern int value;

void myfunction()
{
  value=0;
}
$ CC -g -G -Kpic -o liblib.so lib.cpp 
Undefined                       first referenced
 symbol                             in file
value                               lib.o  (symbol scope specifies local binding)
ld: fatal: symbol referencing errors. No output written to liblib.so

Compilers since Sun Studio 9 implement the flag -qoption ccfe -xldscoperef=global to force undefined symbols to have global scoping. Listing 10 shows the use of this flag to force the undefined variable value to have global scope and to resolve the linking error.

Listing 10: Using a Flag to Give Undefined Symbols Global Scope
$ more lib.cpp
__symbolic extern int value;

void myfunction()
{
  value=0;
}
$ CC -g -qoption ccfe -xldscoperef=global -G -Kpic -o liblib.so lib.cpp 
$ nm liblib.so | grep value
[61]    |         0|       0|NOTY |GLOB |0    |UNDEF  |value

Interposing on Functions

Sometimes, you intend to allow a function in a library to be replaced, or interposed on, by a different version supplied by an application. A classic example is the set of functions malloc(), realloc(), calloc(), and free() in the basic C library. If such functions are replaced, all calls to them throughout the entire program must be replaced. Otherwise, having different results from calls in different places in the code could cause subtle or catastrophic program errors.

If you intend for a function to be user-replaceable, you must bear in mind two considerations:

  • The function must have global binding, not symbolic or hidden binding. Direct binding also creates some issues for interposing.
  • The function must not be generated inline anywhere in the library.

If the function does not have global binding, at least some references will bind to the library version, and interposition will not be complete.

Direct binding can be overridden by preloading a library and providing the replacement function in that library. An example of using LD_PRELOAD to interpose on direct-linked libraries is shown in Listing 11.

Listing 11: Interposing on Direct-Bound Libraries
$ ./main
In libu1
In lib1
In libu2
In lib2
$ more libpre.c
#include <stdio.h>

void display()
{
  printf("Interposing on display()\n");
}

$ cc -G -Kpic -o libpre.so libpre.c -lc
$ export LD_PRELOAD=./libpre.so
$ ./main
In libu1
Interposing on display()
In libu2
Interposing on display()

Direct binding can also be overridden by linking with the -z interpose option. This linker option causes the functions declared in the library to interpose on the existing functions of the same name. An example of this is shown in Listing 12, where initially the interposing library is compiled without the -z interpose option. Linking this library into the application causes no change in behavior. Once the library is rebuilt with the linker option -z interpose, the behavior of the application changes, and the interposing library is called instead of the directly bound libraries.

Listing 12: Building Interposing Libraries with -z interpose
$ cc -o main main.c -L. -R'$ORIGIN/.' -lu1 -lu2 -lpre
$ ./main
In libu1
In lib1
In libu2
In lib2
$ cc -G -Kpic -o libpre.so libpre.c -lc -zinterpose
$ ./main
In libu1
Interposing on display()
In libu2
Interposing on display()

The function must not be declared inline, and you should add -xinline options when building the library to prevent the optimizer from inlining the function on its own. That is, at high optimization levels, the optimizer might decide to inline a function that was not declared inline. Although it is unlikely that you would want to allow a class member function to be interposed upon, recall that functions defined inside a class are implicitly declared inline.

Example: Suppose you want to allow global function foo() to be interposed upon. You could declare it this way in the library header:

__global int foo(int);

The explicit __global linkage is not overridden by -xldscope options on the command line, and the function is not declared inline. When building the library, you could use the option -xinline= to disable all optimizer inlining, or you could use -xinline=no%zzzz (where zzzz is the mangled name of the function) to disallow optimizer inlining of just this function.
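
The source side of this example might look like the following sketch. It assumes a hypothetical macro REPLACEABLE that expands to __global only when the Sun/Oracle compiler is detected through __SUNPRO_CC, and the helper functions twice_foo() and self_check() are invented for illustration. The point it shows is that foo() is declared in the header without a body and defined out of line, so no inline copy is offered to callers by the source itself. Optimizer inlining still has to be controlled with the -xinline options described above.

```cpp
#include <cassert>

#if defined(__SUNPRO_CC)
#  define REPLACEABLE __global   /* keep global binding regardless of -xldscope */
#else
#  define REPLACEABLE            /* assumption: no specifier on other compilers */
#endif

// Library header: declaration only, with no inline body.
REPLACEABLE int foo(int);

// Library source: the default definition, which an application may replace.
int foo(int x)
{
  return x + 1;
}

// Another library function (hypothetical) that calls foo() by name, so the
// call is resolved through the global symbol rather than a private copy.
int twice_foo(int x)
{
  return foo(foo(x));
}

// Hypothetical check of the default behavior when nothing interposes on foo().
void self_check()
{
  assert(twice_foo(1) == 3);
}
```

If an application supplied its own foo(), the result seen by twice_foo() would change with it, which is exactly the property that symbolic or hidden scope would break.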
 

Summary of Recommendations

  • For situations where it is not desirable to interpose on symbols defined in the libraries and correct behavior does not rely on only a single definition of a symbol being used, using the flag -xldscope=symbolic ensures that all libraries use their local definition of a symbol in preference to a definition provided by another library. This might be the easiest way of adding scoping to existing code.
  • For new code, or code where a more rigorous scoping is desired, compile the libraries and applications with -xldscope=hidden. This ensures that all unscoped symbols are hidden and, consequently, inaccessible to other libraries. For symbols that should be exported by a library, prefix their definitions by the keyword __symbolic. For symbols that are to be imported by a library, scope them with the prefix __global.
  • When using symbolic scoping, consider whether having multiple live copies of an object or function would affect program correctness. If it would, use global scoping for such objects or functions.
Revision 1, 07/11/2011