Stability of the C++ ABI: Evolution of a Programming Language


Updated March 2011

By Stephen Clamage, Oracle Solaris Studio Tools Development Engineering

As C++ evolved over the years, the Application Binary Interface (ABI) used by a compiler often needed changes to support new or evolving language features. As a result, programmers were expected to recompile all their binaries with every new compiler release. However, an unstable ABI is incompatible with the Oracle Solaris philosophy of shared libraries, and it is a nightmare for library and middleware vendors. With the advent of the C++ Standard in 1998, there was new hope for a stable C++ ABI on Solaris platforms. This article addresses these issues in Oracle Solaris Studio C++ and describes what you can expect when you develop programs using Oracle Solaris Studio C++.

Introduction

The ABI of a programming-language implementation is a specification of all the low-level details that allow separately compiled modules to work together. Without a stable ABI, all parts of a program must be compiled with the same version of the same compiler. That situation creates a maintenance nightmare for distributed projects, particularly for suppliers of binary libraries. The early rapid evolution of the C++ programming language precluded a stable ABI. The advent of the C++ international standard in 1998 (ISO/IEC 14882:1998 Programming Languages - C++) provided a base for a stable C++ ABI, at least for a given C++ implementation. This article explores the stability question for Oracle Solaris Studio C++ compilers.

The C ABI

The Oracle Solaris ABI is also the C ABI, because C is the standard UNIX implementation language. Among other things, the C ABI specifies:

  • Size and layout of predefined types (char, int, float, and so on)
  • Layout of compound types (arrays and structs)
  • External (linker-visible) spelling of programmer-defined names
  • Machine-code function-calling sequence
  • Stack layout
  • Register usage
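
As a rough illustration of the first two items, the following sketch reports the size and member offsets of a small struct. The type Record and the function names are hypothetical and do not come from any Oracle Solaris header. The actual numbers depend on the platform ABI, so the checks assert only what should hold on any conforming implementation.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t

// Hypothetical record type, used only for illustration.
struct Record {
    char   tag;
    int    count;
    double value;
};

// Layout facts fixed by the platform ABI. The concrete numbers can differ
// between ABIs, for example between 32-bit and 64-bit targets.
inline std::size_t record_size()     { return sizeof(Record); }
inline std::size_t offset_of_tag()   { return offsetof(Record, tag); }
inline std::size_t offset_of_count() { return offsetof(Record, count); }
inline std::size_t offset_of_value() { return offsetof(Record, value); }

// Checks that hold regardless of ABI: members appear in declaration
// order, and padding may be added between or after them.
inline void check_record_layout() {
    assert(offset_of_tag() == 0);
    assert(offset_of_count() >= sizeof(char));
    assert(offset_of_value() >= offset_of_count() + sizeof(int));
    assert(record_size() >= offset_of_value() + sizeof(double));
}
```

Two object files agree on Record only if both were compiled against the same answers to these questions, which is what the C ABI guarantees.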

The C++ ABI

The C++ ABI includes the C ABI. In addition, it covers the following features:

  • Layout of hierarchical class objects, that is, base classes and virtual base classes
  • Layout of pointer-to-member
  • Passing of hidden function parameters (for example, this)
  • How to call a virtual function:
    • Vtable contents and layout
    • Location in objects of pointers to vtables
    • Finding adjustment for the this pointer
  • Finding base-class offsets
  • Calling a function via pointer-to-member
  • Managing template instances
  • External spelling of names ("name mangling")
  • Construction and destruction of static objects
  • Throwing and catching exceptions
  • Some details of the standard library:
    • Implementation-defined details
    • typeinfo and run-time type information
    • Inline function access to members

Name Mangling

C++ allows different functions to have the same name, and it allows an unbounded number of scopes where different global entities with the same name can be declared. Example:

int     f(int);
float   f(float);
class T {
        int f(int);
        int f(char*);
        class U {
                 int f(int);
        };
};
namespace N {
        class T {
                 int f(int);
        };
}

This example has two classes named T and six functions named f, some of which are in the same scope. All the functions have external linkage. To differentiate entities with the same name, the C++ implementation must make references to these functions unique. To ensure that references to the same entity from different modules can be resolved correctly, the method of making references unique must be predictable.

The usual scheme involves decorating the name of the entity with encodings of the scope names, along with the parameter types and return type if it is a function. The resulting names appear to be scrambled, or "mangled." For example, the names of the six functions above would be encoded by the Oracle Solaris Studio C++ compiler as follows.

Examples of Mangled Function Names:

Function              Mangled Name
float f(float)        __1cBf6Ff_f_
int f(int)            __1cBf6Fi_i_
int T::f(int)         __1cBTBf6Mi_i_
int T::f(char*)       __1cBTBf6Mpc_i_
int T::U::f(int)      __1cBTBUBf6Mi_i_
int N::T::f(int)      __1cBNBTBf6Mi_i_

C++ also provides a way to specify that a name is accessible from C code and, therefore, should not be mangled.
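
The standard mechanism for this is a linkage specification, extern "C". The sketch below contrasts one function with C linkage against two overloads with ordinary C++ linkage. All function names are hypothetical and chosen only for illustration.

```cpp
#include <cassert>

// C linkage: the external symbol is the plain C name, so C code can
// declare and call it as `int c_visible_add(int, int);`.
extern "C" int c_visible_add(int a, int b) { return a + b; }

// C++ linkage: the parameter types are encoded in each external symbol,
// which is what lets these two overloads coexist in one program.
int    cxx_add(int a, int b)       { return a + b; }
double cxx_add(double a, double b) { return a + b; }

// A second C-linkage function named c_visible_add with different
// parameters would be rejected: without mangling, the names collide.

inline void check_linkage_example() {
    assert(c_visible_add(2, 3) == 5);
    assert(cxx_add(2, 3) == 5);
    assert(cxx_add(1.5, 2.0) == 3.5);
}
```

The trade-off is that a name with C linkage gives up overloading, because the encoding that distinguishes overloads is exactly what is being switched off.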

Name Mangling and ABI

The name-mangling algorithm is part of the ABI, because it defines how a compiler must generate external references and definitions for program entities. If two compilers or compiler versions do not mangle equivalent declarations the same way, a program composed of parts compiled from the two compilers will not link correctly.

Hierarchical Layout

C++ allows user definition of hierarchies of class types, wherein a "derived" class implicitly includes all the data and functions of the classes from which it inherits. An ordinary base class is laid out in an object similarly to a member of class type, at a fixed offset from the start of the complete object. Example:

class Base {
      ...
};
class Derived : public Base {
        int i, j;
};
class Composed {
        Base b;
        int i, j;
};

In many C++ implementations, the layout of classes Derived and Composed will be the same.

A pointer to a complete object can be converted to a pointer to one of its base classes, but the address the pointer represents must be adjusted by the offset of the base class within the complete object.
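
The adjustment can be observed with a small sketch that uses two base classes, so that at least one base sub-object cannot start at the address of the complete object. The types First, Second, and Both are hypothetical.

```cpp
#include <cassert>

// Hypothetical types for illustration.
struct First  { int a; };
struct Second { int b; };
struct Both : First, Second { int c; };

// Implicit derived-to-base conversions. The compiler inserts whatever
// address adjustment the object layout requires.
inline First*  as_first(Both* p)  { return p; }
inline Second* as_second(Both* p) { return p; }

// Converting back to the derived type applies the opposite adjustment.
inline Both* from_second(Second* p) { return static_cast<Both*>(p); }

// True when the two base sub-objects start at different addresses, which
// means at least one of the conversions changed the pointer value.
inline bool bases_at_different_addresses(Both* p) {
    assert(p != nullptr);
    return static_cast<void*>(as_first(p)) != static_cast<void*>(as_second(p));
}
```

Which base sits at offset zero, and how large the adjustment is, are layout decisions and therefore part of the ABI.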

A C++ class can have more than one immediate base class, a feature called multiple inheritance.

If classes A and B each have a base class Z, a class C derived from both A and B could have two copies of Z. Sometimes, it is appropriate for C to have two independent copies of Z. Other times, Z represents a resource of which there must be only one copy.

To specify that there is to be only one copy of a base class in a hierarchical object, that base class can be declared "virtual."

The offset of a virtual base class relative to an intermediate class depends on the entire hierarchy. Example:

class Z {
        ...
};
class A : virtual public Z { // has one instance of Z
        ...
};
class B : virtual public Z { // has one instance of Z
        ...
};
class C : public A, public B { // has only one instance of Z
        ...
};

Suppose in an A object, the Z portion is at offset OA, and in a B object, it is at offset OB. There is only one copy of Z in a C object. It cannot simultaneously be at offset OA from the A portion and at offset OB from the B portion. At least one of these offsets must be different when the entire object is of type C.

Given a pointer to A, the location of the Z sub-object, therefore, cannot be determined at compile time, because the A object might be in turn a sub-object of some more complex type, such as C. The run-time system must allow for the dynamic determination of the type of the complete object so that the offsets of other objects can be found.
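
A minimal sketch of this situation follows, using the same class names as the example above with hypothetical data members added. The helper z_offset_from_a is also hypothetical. It measures the distance from an A sub-object to its Z sub-object, and the result is implementation-specific.

```cpp
#include <cassert>
#include <cstddef>   // std::ptrdiff_t

struct Z { int z; };
struct A : virtual Z { int a; };
struct B : virtual Z { int b; };
struct C : A, B { int c; };

// Each conversion must locate Z at run time, because the A (or B)
// passed in may be a sub-object of some larger object such as a C.
inline Z* z_via_a(A* p) { return p; }
inline Z* z_via_b(B* p) { return p; }

// In a complete C object, both paths must arrive at the single shared Z.
inline bool shares_one_z(C* p) { return z_via_a(p) == z_via_b(p); }

// Distance in bytes from the A sub-object to its Z sub-object. The value
// is not specified by the language and may differ between a standalone A
// and an A inside a C.
inline std::ptrdiff_t z_offset_from_a(A* p) {
    return reinterpret_cast<char*>(z_via_a(p)) - reinterpret_cast<char*>(p);
}

inline void check_shared_base() {
    C whole;
    whole.z = 7;
    assert(shares_one_z(&whole));
    assert(z_via_a(&whole)->z == 7);
    assert(z_via_b(&whole)->z == 7);
}
```

On many implementations z_offset_from_a returns different values for a standalone A and for the A portion of a C, which is the OA discrepancy described above. The sketch does not assert that, because the language does not guarantee it.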

C++ implementations typically store the offset information for each object type in an auxiliary table, often called a vtable. There is usually one vtable for each type that needs one, shared by all objects of that type. An object needing a vtable then contains a pointer to the vtable. The vtable also contains addresses of virtual functions to allow dynamic function dispatch based on the actual object type referred to by a pointer or reference.
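
The dispatch half of this mechanism can be modeled by hand with a table of function pointers. This is a conceptual sketch only: every name in it is hypothetical, it omits the offset information a real vtable carries, and it does not reproduce the layout used by Oracle Solaris Studio C++ or any other compiler.

```cpp
#include <cassert>

struct Shape;   // forward declaration

// One table per type, shared by every object of that type.
struct ShapeVTable {
    int (*area)(const Shape*);   // slot 0: the "virtual" area function
};

// Every object carries a pointer to the table for its actual type.
struct Shape {
    const ShapeVTable* vptr;
    int w, h;
};

inline int rect_area(const Shape* s)     { return s->w * s->h; }
inline int triangle_area(const Shape* s) { return s->w * s->h / 2; }

const ShapeVTable rect_vtable     = { rect_area };
const ShapeVTable triangle_vtable = { triangle_area };

inline Shape make_rect(int w, int h) {
    Shape s = { &rect_vtable, w, h };
    return s;
}
inline Shape make_triangle(int w, int h) {
    Shape s = { &triangle_vtable, w, h };
    return s;
}

// Dynamic dispatch: load the table pointer, load the slot, call through it.
inline int area(const Shape* s) {
    assert(s != nullptr && s->vptr != nullptr);
    return s->vptr->area(s);
}
```

Code compiled against this model hard-codes both the position of vptr inside the object and the slot index of area inside the table. That is why vtable layout and vtable-pointer placement are ABI matters.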

The C++ Standard Library

The C++ standard defines the names and properties of types and functions in the library, as well as the programming interface to the library. Source code written to the specification is, therefore, portable among conforming implementations. The binary interface is a different story, however.

The C++ standard allows considerable variation in implementation details, as long as the programming interface is not affected. Many of those implementation details therefore become part of the ABI -- particularly the size of class objects.

Many parts of the standard library are best implemented with inline functions to enhance performance. Somewhat like a macro in C, a call to an inline function is replaced by the body of the function. If the function accesses members of a class defined in the standard library, the locations of the class members become built into the code of application programs that use the inline function. Anything referred to by an inline function is, therefore, part of the C++ ABI.

Even if an enhancement or bug fix to the standard library does not affect the programming interface, the change would affect the ABI if it altered the size or layout of classes defined in the library.
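
The following sketch shows how such a change leaks into application code. BufferV1 and BufferV2 are hypothetical stand-ins for two releases of one library class. They are not taken from libCstd or any other real library.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t

// Hypothetical library class as shipped in release 1.
struct BufferV1 {
    char*       data;
    std::size_t length;
};

// The same class after a hypothetical release-2 enhancement that adds a
// member in front of length. The programming interface is unchanged.
struct BufferV2 {
    char*       data;
    std::size_t capacity;   // new member
    std::size_t length;
};

// An inline accessor is expanded into the caller, so the offset of
// length ends up in the application's own object code.
inline std::size_t length_of(const BufferV1& b) { return b.length; }
inline std::size_t length_of(const BufferV2& b) { return b.length; }

// The offsets that each version of the accessor bakes into its callers.
inline std::size_t length_offset_v1() { return offsetof(BufferV1, length); }
inline std::size_t length_offset_v2() { return offsetof(BufferV2, length); }

inline bool layout_changed() {
    return length_offset_v1() != length_offset_v2()
        || sizeof(BufferV1) != sizeof(BufferV2);
}

inline void check_layout_change() {
    assert(layout_changed());
    assert(length_offset_v2() > length_offset_v1());
}
```

An application compiled against release 1 would keep reading length at the old offset, which in a release-2 object holds capacity instead. The source code of the application never changed, yet it no longer works with the new library binary.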

The Sources of ABI Instability

A new or changed language feature can require a change, not just an extension, to an ABI. Here are two examples:

  • The C++ standard allows an overriding virtual function to have a return type different from the function it overrides. The return type must be a pointer or reference type, and the return type of the function in the derived class must refer to a type derived from the type referred to by the function it overrides. Example:

    class Base {
    public:
            virtual Base* clone();
    };
    class Derived : public Base {
    public:
            virtual Derived* clone();
    };
    void f(Base* p)
    {
            Base* copy = p->clone();
    }
    

    The compiler cannot know whether the call to clone will return a Base* or a pointer to a derived type. The ABI must provide a way to accomplish the correct pointer adjustment no matter what type is returned. The required mechanism would not have been available in a pre-standard ABI.

  • Consider a template function specialization and a non-template function with the same name and type:

       template<class T> T min(T, T) { ... }
       int min<int>(int, int); // old specialization syntax
       int min(int, int); // non-template
    

    Under old language rules, a non-template function with the same name and type as a template function was considered to be a specialization of the template. Such a function and the corresponding specialization must, therefore, have the same mangled name.

    Under the rules in the C++ standard, they are distinct functions and must have different mangled names. The external name of at least one of the functions must change compared to a pre-standard ABI.
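
Both language changes can be exercised in one short sketch. The classes Extra, Base, and Derived and the function kind are hypothetical. Extra is listed as the first base class only so that the Base portion of a Derived object is typically not at offset zero, which is the case where the returned pointer needs adjusting. The three kind functions return different markers purely to show which one a call selects.

```cpp
#include <cassert>

struct Extra {
    Extra() : pad(0) {}
    virtual ~Extra() {}
    int pad;
};

struct Base {
    virtual ~Base() {}
    virtual Base* clone() const { return new Base(*this); }
    virtual int id() const { return 1; }
};

struct Derived : Extra, Base {
    // Covariant return type: Derived* where Base::clone returns Base*.
    virtual Derived* clone() const { return new Derived(*this); }
    virtual int id() const { return 2; }
};

// The caller sees only Base*. When the object is really a Derived, the
// Derived* produced by Derived::clone must be converted to a Base*.
inline int id_of_copy(const Base* p) {
    Base* copy = p->clone();
    int result = copy->id();
    delete copy;
    return result;
}

// A template, its explicit specialization for int, and a non-template
// function with the same name and parameter type. Under the C++ standard
// the last two are distinct functions.
template<class T> int kind(T)      { return 0; }   // primary template
template<> inline int kind<int>(int) { return 1; } // explicit specialization
inline int kind(int)               { return 2; }   // non-template function

inline void check_language_changes() {
    Derived d;
    Base b;
    assert(id_of_copy(&d) == 2);
    assert(id_of_copy(&b) == 1);
    assert(kind(5) == 2);        // overload resolution prefers the non-template
    assert(kind<int>(5) == 1);   // explicit template arguments pick the specialization
    assert(kind(5.0) == 0);      // no int match, so the primary template is used
}
```

Because kind(5) and kind<int>(5) reach different function bodies in the same program, the two functions need different external names, which is the mangling change described above.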

Fixing some bugs requires an ABI change. Here are two examples:

  • Early C++ compilers typically could not support calls to functions in a virtual base class from the constructor or destructor of a derived class under some circumstances. Eventually a reasonable-cost solution to this problem was invented, but it required a different vtable organization, and a different way of calling constructors and destructors.

  • The Solaris Studio C++ compiler in -compat=5 mode generates different mangled names for some function declarations that are supposed to be equivalent. Fixing the bug would mean that some existing functions get a different mangled name, an ABI change.

The Consequences of ABI Instability

Any difference in the ABI can mean that object files from different compilers will not link, or, if they do link, they will not run correctly. (To help prevent code generated for different ABIs from accidentally linking, different compiler implementations typically use different name mangling schemes.)

In the early days of C++, when the language was evolving rapidly, ABIs changed frequently. C++ programmers were accustomed to recompiling everything whenever they updated a compiler.

Suppose an application uses an ORB library from vendor A and a database library from vendor B. The vendors do not wish to distribute source code, and so they provide binary libraries. The application code and the libraries from both vendors must all use the same ABI.

If every compiler release has a different ABI, application programmers will not want to upgrade compilers frequently. It would mean coordinating the upgrade among all developers on the project, and recompiling everything on the official upgrade installation date.

If vendor A and vendor B must support many clients, each of whom is using a different compiler release, they must release and support a library for each compiler version. This situation is very expensive in resources, and typically is not feasible. Under this scenario, vendors might release source code, and clients would build the libraries themselves. That in turn creates new support problems, since different clients will use different tool sets, and the build scripts must be configured to conform to local practices.

The Oracle Solaris vision of shared libraries is not well-supported by the scenario above. A different version of a C++ shared library must be generated for every supported ABI variation. Even when a compiler is no longer supported, programs may exist in the field that depend on using an old shared library. The obsolete library versions must continue to be shipped for a long time.

Successful distribution of libraries as products--particularly shared libraries--depends on having a stable ABI.

A History of Oracle Solaris Studio C++ ABIs

Major releases of Oracle Solaris Studio C++ compilers have always used incompatible ABIs, in accordance with the engineering taxonomy of release numbers: A new major version number signifies an incompatible release.

Beginning with C++ 3.0, Sun attempted to inject some stability into the C++ ABI. The C++ runtime support library became a shared library shipped along with Oracle Solaris: libC.so.3.

But it was also recognized that C++ was still evolving, and work was in progress on a C++ standard that would doubtless change some important details. Accordingly, Sun policy was that no Sun software product could export a C++ interface. With no exported C++ interface, it might be reasonable to "recompile everything" when the ABI changed.

C++ 4.0, released in 1993, introduced a new, incompatible ABI. This ABI was intended to be stable. The C++ development team solicited input from major Sun clients, and even competitors, on the ABI design, and the team accepted some suggestions from outside. The ABI was published as a public document. The new C++ support library, libC.so.5, was added to Oracle Solaris shipments.

Over time, some bugs were found that required small changes in name mangling. These bugs were corrected, and users were provided with ways to restore the previous behavior when it was necessary to link with older code. C++ 4.2, released in 1996, represented the final version of this ABI.

This ABI contained known bugs, such as the virtual base-class problem described in the section The Sources of ABI Instability. In addition, work on the C++ standard was nearing completion, and the standard was known to contain features that would require a different ABI. The change in template semantics described in the same section is one of several changes that affected the ABI.

To avoid the scenario of a constantly changing ABI that would result from trying to track the evolving C++ standard, the C++ development team adopted the strategy of implementing in C++ 4.2 only those features that were assumed to remain stable and that did not require an ABI change. At the cost of lagging behind in C++ features, Sun provided a stable ABI for its customers.

The Runtime Libraries

The earlier C++ runtime library consisted of an I/O library known as "iostreams" and the run-time "helper" functions for the compiler, including support for heap memory allocation, exception handling, and dynamic type information.

The library specified in the C++ standard includes an extensive set of template classes and functions, including strings, iostreams, numerics, and the "STL."

Due to time pressure, version 5.0 of the C++ compiler did not implement all the features of the C++ standard. For example, parts of the standard library definition involve templates as members of classes, a feature not supported by C++ 5.0. In the library to be delivered with the compiler, those parts were missing, or they were implemented slightly differently.

For those reasons, the library implementation was split into two parts:

  • libCrun, consisting of the compiler helper functions, including support for heap memory allocation, exception handling, and dynamic type information
  • libCstd, consisting of the remainder of the C++ standard library

Here an ABI, There an ABI...

Sun (and now, Oracle) policy deems continued compatibility more important than conformance to the C++ standard, and more important even than correctness. Even though libCstd was sub-standard and some bugs were found in name mangling, those deficiencies would remain in the following releases.

Because libCstd would remain binary compatible, it could be shipped as a shared library. C++ 5.2 shipped with libCstd.so.1 as an optional library, and C++ 5.3 provided libCstd.so.1 as part of an Oracle Solaris package. To support customers who need more of the features of the C++ standard library but do not need binary compatibility, C++ 5.4 also shipped with the open source STLport implementation of the C++ standard library.

ABI Status, March 2011

The current shipping compiler is C++ 5.11, a component of Oracle Solaris Studio 12.2.

The C++ compilers from 5.0 through 5.11 provide a mode compatible with C++ 4.2, and a default mode that supports the C++ standard. The two modes represent two different ABIs. For a given mode, all the 5.x compilers generate binary-compatible code.

C++ 5.11 is also supported on selected Linux platforms. Since developers on Linux are likely to want to link with code compiled by GNU C++ (g++), the Linux compilers also provide a mode in which they generate the same ABI as g++ (which is very different from the default Solaris Studio C++ ABI).

Despite the compatibility policy, some customers have demanded fixes for ABI bugs that prevent their programs from working. To satisfy these customers, the compiler has an undocumented option that generates a correct, but incompatible, ABI. Customers who do not depend on third-party binary libraries can use the corrected ABI, provided they are careful to compile all their code with the option. The libraries that ship with the C++ compiler do not trigger any of the ABI bugs, so no separate version of those libraries is required.

To support customers who need a more standard-conforming library but who do not need compatibility with libCstd, the compiler ships with the open source library from STLport. In addition, C++ 5.10 and 5.11 directly support the use of Apache stdcxx on Oracle Solaris when installed in a designated location. Finally, the compiler also provides a relatively easy way to substitute a third-party standard library for the ones shipped with the compiler.

The Future

There is no question that the default C++ ABI now in use must continue to be supported for some years. That means shipping compilers that generate this ABI, along with compatible versions of libraries. The mode compatible with C++ 4.2 has been deprecated for several years, and will not be supported in future releases.

At this writing, a new C++ standard is nearing publication. The existing Solaris Studio ABI cannot support all the new features of the enhanced C++ language, and the C++ Standard Library has been modified in incompatible ways. Yet another ABI will be required for the new standard, along with a new library.

Presumably g++ will also generate a different ABI and will have a new library, so continued compatibility with g++ would require supporting the current and the future g++ ABI.

About the Author

Steve Clamage has been at Sun (now Oracle) since 1994. He is currently technical lead for the C++ team and is involved with all aspects of the Oracle Solaris Studio product. He has been chair of the ANSI C++ Committee since 1995.