How to Find Out What's in an Oracle Solaris Binary File

by Miriam Blatt, January 2012

How to determine the contents of Oracle Solaris binaries and what tools you can use to read, extract, and delete sections. Plus, the effect of compiler flags on binary file size and how to reduce the size of the executable.

When the size of application object files and executable binary files becomes an issue, it can be helpful to know what tools you can use to see what's inside the files. It's also helpful to understand why the content is there and what parts can be reduced in size.

This article reviews the contents of binaries and tools that can be used to read, extract, and delete sections. It ends with a discussion of the effect of compiler flags on binary file size and how to reduce the size of the executable.

What Is Inside a Binary File?

If you've ever wondered what is lurking inside Oracle Solaris binaries, here's a way to see a quick summary:



% /usr/sfw/bin/greadelf -SW /bin/cat

The SPARC binary we are looking at is /bin/cat, which will be used for examples throughout this article. This greadelf command generates the output shown in Listing 1.

Listing 1. Output of greadelf Command



% /usr/sfw/bin/greadelf -SW /bin/cat 
There are 23 section headers, starting at offset 0x247c:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        000100f4 0000f4 000011 00   A  0   0  1
  [ 2] .hash             HASH            00010108 000108 0001ac 04   A  3   0  4
  [ 3] .dynsym           DYNSYM          000102b4 0002b4 000360 10   A  4   1  4
  [ 4] .dynstr           STRTAB          00010614 000614 0001e2 00  AS  0   0  1
  [ 5] .SUNW_version     VERNEED         000107f8 0007f8 000030 00   A  4   1  4
  [ 6] .rela.data        RELA            00010828 000828 00000c 0c  AI  3  11  4
  [ 7] .rela.bss         RELA            00010834 000834 000024 0c  AI  3  13  4
  [ 8] .rela.plt         RELA            00010858 000858 000150 0c  AI  3   f  4
  [ 9] .text             PROGBITS        000109a8 0009a8 001200 00  AX  0   0  4
  [10] .init             PROGBITS        00011ba8 001ba8 00000c 00  AX  0   0  4
  [11] .fini             PROGBITS        00011bb4 001bb4 00000c 00  AX  0   0  4
  [12] .rodata           PROGBITS        00011bc0 001bc0 000004 00   A  0   0  4
  [13] .rodata1          PROGBITS        00011bc4 001bc4 0001a9 00   A  0   0  4
  [14] .got              PROGBITS        00022000 002000 000004 04  WA  0   0 8192
  [15] .plt              PROGBITS        00022004 002004 000184 0c WAX  0   0  4
  [16] .dynamic          DYNAMIC         00022188 002188 0000b8 08  WA  4   0  4
  [17] .data             PROGBITS        00022240 002240 000048 00  WA  0   0  8
  [18] .data1            PROGBITS        00022288 002288 000017 00  WA  0   0  4
  [19] .bss              NOBITS          000222a0 0022a0 008360 00  WA  0   0  8
  [20] .comment          PROGBITS        00000000 00229f 000025 00      0   0  1
  [21] .shstrtab         STRTAB          00000000 0022c4 0000a7 00   S  0   0  1
  [22] .SUNW_signature   LOOS+ffffff6    00000000 00236b 00010e 00   p  0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

When you compile a program, you can think of the resulting binary as containing the instructions generated from the source code. These compiled instructions are in the .text section, which is one of many sections inside the binary file.

Sections fall into two categories: program data and link-editing information. Program data is the portion of the binary that is meaningful only to the application, and it consists of a number of sections, including compiled instructions and data to be initialized before the program starts. Link-editing information contains several more sections, including symbol and string tables as well as relocation information. The link-editing information sections are interpreted by the linker to modify other sections.

For the descriptions below, the section numbers shown in the first column of the output in Listing 1 (for example, [1]) are provided for reference. Note, however, that section numbers vary for different binaries; it is only the section names that are common to all binaries.

The first section in the /bin/cat binary is [1].interp. This is the pathname to the program interpreter, also known as the runtime linker, which interprets the information inside the binary, loads the program and initialized data into memory, and then starts the program.

In the example in Listing 1, /bin/cat is an Oracle Solaris 32-bit binary, for which the [1].interp field contains the string /usr/lib/ld.so.1. A 64-bit SPARC binary would have /usr/lib/sparcv9/ld.so.1, and a 64-bit Oracle Solaris x86 binary would have /usr/lib/amd64/ld.so.1.

The next sections are for dynamic linking: [3].dynsym is the symbol table with global symbols for dynamic linkage, [4].dynstr contains strings for the dynamic symbol table, and [2].hash is a hash table to access entries inside the dynamic symbol table. Section [5].SUNW_version contains version information for global symbols in the dynamic symbol table, [3].dynsym.

Sections with .rel in their names hold relocation information for items to be relocated by the runtime linker. On Oracle Solaris 10, relocations for section x are in section .rel.x on Intel platforms and in section .rela.x on SPARC platforms. On Oracle Solaris 11, most relocations are combined into .SUNW_reloc.

In the SPARC-based Oracle Solaris 10 example in Listing 1, the relocation sections are [6].rela.data, [7].rela.bss, and [8].rela.plt, all of type RELA. These are relocations for the initialized data in [17].data, the uninitialized data in [19].bss, and the procedure linkage table, [15].plt.

The next several sections contain program information, indicated by the type PROGBITS. Section [10].init contains initialization instructions that are executed before the program begins or, for objects loaded with lazy loading or dlopen(), before those objects execute. The compiled program binary instructions are in [9].text. The code in [11].fini is executed after the program finishes execution or when objects are unloaded.

There are several sections for data, depending on whether the data is initialized before starting the program and whether it will be changed by the running program. Read-only data is stored in [12].rodata and [13].rodata1. Data initialized before starting the program that will be both read and written by the program at runtime is in [17].data and [18].data1. Space reserved for uninitialized read/write data is indicated by [19].bss. The runtime linker allocates the space needed for .bss and fills it with zeros. Note that although .bss is the largest section in /bin/cat, it takes up no space inside the binary file on disk, as indicated by the type NOBITS. For this example, the .bss size = 0x8360 = 33,632 bytes.
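The hexadecimal-to-decimal conversion above, and the difference between the space a section uses in memory and in the file, can be sketched in a few lines of Python. This is only an illustration: the sizes are copied from the Size column of Listing 1, and the helper names memory_size and file_size are made up for this example.

```python
# Type and Size (hexadecimal) copied from Listing 1 for three sections.
sections = {
    ".text": ("PROGBITS", 0x001200),
    ".data": ("PROGBITS", 0x000048),
    ".bss": ("NOBITS", 0x008360),
}

def memory_size(name):
    """Bytes the section occupies once the program is loaded."""
    return sections[name][1]

def file_size(name):
    """Bytes the section occupies on disk; NOBITS sections occupy none."""
    kind, size = sections[name]
    return 0 if kind == "NOBITS" else size

for name in sections:
    print(f"{name:6} memory={memory_size(name):6,d}  file={file_size(name):6,d}")
```

The .bss row shows 33,632 bytes in memory and 0 bytes in the file, which is why the largest section of /bin/cat adds nothing to the size reported by ls -l.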

There are more program information sections labeled PROGBITS. Section [14].got contains the global offset table with indirect addresses for externally accessible data locations in the libraries being linked. Section [15].plt is the procedure linkage table, which contains indirect addresses for externally accessible functions in the libraries being linked.

Section [16].dynamic contains a cache of information required by ld.so.1, including the starting address and size of the symbol, relocation, and string tables.

Sections [17].data and [18].data1 contain initialized program data, as mentioned earlier. Section [19].bss reserves space for uninitialized program data.

Section [20].comment contains version information for system header files and compiler components.

Section [21].shstrtab is a table of name strings for all sections. For this example, the first name string is .interp and the last is .SUNW_signature.

Section [22].SUNW_signature contains a module verification signature used for security checking.

For more information about the contents of binaries, see Chapter 7, "Object File Format," of the Linker and Libraries Guide. Binary file sections are described under "Special Sections."

The example in Listing 1 uses greadelf to provide a quick, readable summary. Note, however, that it truncates long section header names, keeping the table nicely lined up and easy to read. To see the full names, use dump instead:



% dump -hv /bin/cat

The dump command generates the same information in a format more easily read by scripts than by human eyes, as shown in Listing 2.

Listing 2. Output of dump Command



% dump -hv /bin/cat
 **** SECTION HEADER TABLE ****
[No]    Type    Flags   Addr          Offset        Size                Name
        Link    Info    Adralgn       Entsize

[1]     PBIT    -A-     0x100f4      0xf4         0x11          .interp
        0       0       0x1          0

[2]     HASH    -A-     0x10108      0x108        0x1ac         .hash
        3       0       0x4          0x4

[3]     DYNS    -A-     0x102b4      0x2b4        0x360         .dynsym
        4       1       0x4          0x10

[4]     STRT    -A-     0x10614      0x614        0x1e2         .dynstr
        0       0       0x1          0

[5]     VERN    -A-     0x107f8      0x7f8        0x30          .SUNW_version
        4       1       0x4          0

[6]     RELA    -A-     0x10828      0x828        0xc           .rela.data
        3       17      0x4          0xc

[7]     RELA    -A-     0x10834      0x834        0x24          .rela.bss
        3       19      0x4          0xc

[8]     RELA    -A-     0x10858      0x858        0x150         .rela.plt
        3       15      0x4          0xc

[9]     PBIT    -AI     0x109a8      0x9a8        0x1200        .text
        0       0       0x4          0

[10]    PBIT    -AI     0x11ba8      0x1ba8       0xc           .init
        0       0       0x4          0

[11]    PBIT    -AI     0x11bb4      0x1bb4       0xc           .fini
        0       0       0x4          0

[12]    PBIT    -A-     0x11bc0      0x1bc0       0x4           .rodata
        0       0       0x4          0

[13]    PBIT    -A-     0x11bc4      0x1bc4       0x1a9         .rodata1
        0       0       0x4          0

[14]    PBIT    WA-     0x22000      0x2000       0x4           .got
        0       0       0x2000       0x4

[15]    PBIT    WAI     0x22004      0x2004       0x184         .plt
        0       0       0x4          0xc

[16]    DYNM    WA-     0x22188      0x2188       0xb8          .dynamic
        4       0       0x4          0x8

[17]    PBIT    WA-     0x22240      0x2240       0x48          .data
        0       0       0x8          0

[18]    PBIT    WA-     0x22288      0x2288       0x17          .data1
        0       0       0x4          0

[19]    NOBI    WA-     0x222a0      0x22a0       0x8360        .bss
        0       0       0x8          0

[20]    PBIT    ---     0            0x229f       0x25          .comment
        0       0       0x1          0

[21]    STRT    ---     0            0x22c4       0xa7          .shstrtab
        0       0       0x1          0

[22]    SIGN    ---E    0            0x236b       0x10e         .SUNW_signature
        0       0       0x1          0
  

On Linux, there is no dump command, and greadelf can be found under the name readelf.

Looking at the Contents of Sections

When binary files are large, it is useful to find out what is inside each section that takes up space. There are several ways to look inside the sections of a binary.

When sections contain strings only, the easiest way is to use mcs:



  % mcs -p -n .shstrtab /bin/cat
  

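The mcs command prints the strings stored in .shstrtab, which are the section names, as shown in Listing 3. The names .plt, .data, and .bss are not listed separately because they are stored as the tails of .rela.plt, .rela.data, and .rela.bss.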

Listing 3. Output of mcs Command



  
% mcs -p -n .shstrtab /bin/cat
.interp
.hash
.dynsym
.dynstr
.SUNW_version
.rela.data
.rela.bss
.rela.plt
.text
.init
.fini
.rodata
.rodata1
.got
.dynamic
.data1
.comment
.shstrtab
.SUNW_signature
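To see how a table of 19 strings can name 22 sections, here is a small Python sketch that rebuilds a string table from the names in Listing 3. It assumes the usual ELF string table layout (a leading NUL byte, then NUL-terminated strings); the offsets it computes come from this reconstruction, not from /bin/cat itself, and name_at is a helper invented for the example.

```python
# Section name strings in the order printed in Listing 3.
names = [
    ".interp", ".hash", ".dynsym", ".dynstr", ".SUNW_version",
    ".rela.data", ".rela.bss", ".rela.plt", ".text", ".init", ".fini",
    ".rodata", ".rodata1", ".got", ".dynamic", ".data1", ".comment",
    ".shstrtab", ".SUNW_signature",
]

# Assumed layout: one leading NUL byte, then each string followed by NUL.
shstrtab = b"\0" + b"".join(n.encode("ascii") + b"\0" for n in names)

def name_at(table, offset):
    """Return the NUL-terminated string that starts at the given offset."""
    end = table.index(b"\0", offset)
    return table[offset:end].decode("ascii")

# A name can start in the middle of a longer string and share its tail.
bss_offset = shstrtab.index(b".bss\0")
print(len(shstrtab), hex(len(shstrtab)))
print(name_at(shstrtab, bss_offset))
print(name_at(shstrtab, bss_offset - len(".rela")))
```

The reconstructed table is 167 bytes long, which matches the 0xa7 size shown for .shstrtab in Listing 1, and the name .bss is found inside .rela.bss.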

However, many sections have strings interspersed with binary data. For these, mcs prints unprintable characters, which is unlikely to be helpful. Instead, use gobjdump, which prints the section contents in hexadecimal together with any printable ASCII characters found in them, for example:



% gobjdump -s -j .rodata1 /bin/cat
  

Using the gobjdump command generates the output shown in Listing 4.

Listing 4. Output of gobjdump Command



% gobjdump -s -j .rodata1 /bin/cat
/bin/cat:     file format elf32-sparc

Contents of section .rodata1:
 11bc4 00000000 53554e57 5f4f5354 5f4f5343  ....SUNW_OST_OSC
 11bd4 4d440000 75737674 65626e00 75736167  MD..usvtebn.usag
 11be4 653a2063 6174205b 202d7573 76746562  e: cat [ -usvteb
 11bf4 6e205d20 5b2d7c66 696c655d 202e2e2e  n ] [-|file] ...
 11c04 0a000000 6361743a 2043616e 6e6f7420  ....cat: Cannot
 11c14 73746174 20737464 6f75740a 00000000  stat stdout.....
 11c24 72000000 6361743a 2063616e 6e6f7420  r...cat: cannot
 11c34 6f70656e 2025730a 00000000 6361743a  open %s.....cat:
 11c44 2063616e 6e6f7420 73746174 2025730a   cannot stat %s.
 11c54 00000000 6361743a 20696e70 75742f6f  ....cat: input/o
 11c64 75747075 74206669 6c657320 27257327  utput files '%s'
 11c74 20696465 6e746963 616c0a00 6361743a   identical..cat:
 11c84 20636c6f 73652065 72726f72 0a000000   close error....
 11c94 6361743a 20636c6f 73652065 72726f72  cat: close error
 11ca4 0a000000 6361743a 20636c6f 73652065  ....cat: close e
 11cb4 72726f72 00000000 6361743a 2063616e  rror....cat: can
 11cc4 6e6f7420 72656164 2025733a 20000000  not read %s: ...
 11cd4 6361743a 20777269 74652065 72726f72  cat: write error
 11ce4 3a200000 00000000 6361743a 206d6d61  : ......cat: mma
 11cf4 70206572 726f7200 6361743a 206e6f20  p error.cat: no
 11d04 6d656d6f 72790000 6361743a 206f7574  memory..cat: out
 11d14 70757420 6572726f 72202825 642f2564  put error (%d/%d
 11d24 20636861 72616374 65727320 77726974   characters writ
 11d34 74656e29 0a000000 00000000 6361743a  ten)........cat:
 11d44 20696e70 75742065 72726f72 206f6e20   input error on
 11d54 25733a20 00000000 00000000 25366409  %s: ........%6d.
 11d64 00000000 25366409 00                 ....%6d..
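The layout of each row in Listing 4 (address, four 4-byte words in hexadecimal, then the same bytes as text with a dot for anything unprintable) can be imitated with a short Python sketch. The function name dump_row is hypothetical, and the column widths are simply chosen to match the rows shown above.

```python
def dump_row(address, data):
    """Format up to 16 bytes in the style of one row of Listing 4."""
    # Group the bytes into 4-byte words, printed as hexadecimal.
    words = [data[i:i + 4].hex() for i in range(0, len(data), 4)]
    # Printable ASCII is shown as-is; every other byte becomes a dot.
    text = "".join(chr(b) if 0x20 <= b < 0x7f else "." for b in data)
    return f" {address:x} {' '.join(words):<35}  {text}"

# The 16 bytes from the first row of Listing 4.
first_row = bytes.fromhex("00000000" "53554e57" "5f4f5354" "5f4f5343")
print(dump_row(0x11bc4, first_row))
```

Reading the row this way makes it easy to see that the bytes 53 55 4e 57 are the characters SUNW.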

You can see the contents of the .text section using the disassembler, for example:



  % dis /bin/cat
  

Using the dis command generates full disassembly, including the .init and .fini sections, as shown in Listing 5.

Listing 5. Output of dis Command



% dis /bin/cat
section .text
_start()
    _start:                 bc 10 20 00  clr       %fp
    _start+0x4:             e0 03 a0 40  ld        [%sp + 0x40], %l0
    _start+0x8:             13 00 00 a9  sethi     %hi(0x2a400), %o1
    _start+0xc:             e0 22 61 f8  st        %l0, [%o1 + 0x1f8]
...
main()
    main:                   9d e3 be 60  save      %sp, -0x1a0, %sp
    main+0x4:               19 00 00 46  sethi     %hi(0x11800), %o4
    main+0x8:               ba 10 20 00  clr       %i5
    main+0xc:               f0 23 a0 5c  st        %i0, [%sp + 0x5c]
...
section .init
_init()
    _init:                  9d e3 bf a0  save      %sp, -0x60, %sp
    _init+0x4:              81 c7 e0 08  ret
    _init+0x8:              81 e8 00 00  restore

section .fini
_fini()
    _fini:                  9d e3 bf a0  save      %sp, -0x60, %sp
    _fini+0x4:              81 c7 e0 08  ret
    _fini+0x8:              81 e8 00 00  restore

Looking for Symbols Inside a Data Section

Looking inside a data section, you will find a lot of binary data that might not be particularly helpful for understanding where the data came from. To figure out what is taking space in a data section, you can extract a list of symbol names for the section using elfdump, for example:



% elfdump -sN.dynsym /bin/cat | fgrep .data

This elfdump command lists the symbols from the .dynsym section that are located in .data or .data1:



% elfdump -sN.dynsym /bin/cat | fgrep .data
       [4]  0x0002229f 0x00000000  OBJT GLOB  D    0 .data1         _edata
      [18]  0x00022260 0x00000004  OBJT GLOB  D    0 .data          __cg92_used
      [19]  0x00022240 0x00000018  OBJT GLOB  D    0 .data          __environ_lock
      [26]  0x00022264 0x00000004  OBJT GLOB  D    0 .data          ___Argv
      [36]  0x00022258 0x00000004  OBJT WEAK  D    0 .data          environ
      [40]  0x00022258 0x00000004  OBJT GLOB  D    0 .data          _environ
  

You can do the same for the symbol table using -sN.symtab for a binary that does not have the symbol table stripped.
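When the symbol list is long, it helps to rank the symbols by size. The following Python sketch does that for the elfdump lines shown above. It assumes the column layout of that output (index, value, size, type, binding, other, version, section, name); the function name symbol_sizes is made up for this example.

```python
# The elfdump lines from the example above.
elfdump_output = """\
       [4]  0x0002229f 0x00000000  OBJT GLOB  D    0 .data1         _edata
      [18]  0x00022260 0x00000004  OBJT GLOB  D    0 .data          __cg92_used
      [19]  0x00022240 0x00000018  OBJT GLOB  D    0 .data          __environ_lock
      [26]  0x00022264 0x00000004  OBJT GLOB  D    0 .data          ___Argv
      [36]  0x00022258 0x00000004  OBJT WEAK  D    0 .data          environ
      [40]  0x00022258 0x00000004  OBJT GLOB  D    0 .data          _environ
"""

def symbol_sizes(text):
    """Return (name, section, size) tuples, largest symbol first."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # not a symbol line
        # fields[2] is the size column, in hexadecimal.
        rows.append((fields[-1], fields[-2], int(fields[2], 16)))
    return sorted(rows, key=lambda row: row[2], reverse=True)

for name, section, size in symbol_sizes(elfdump_output):
    print(f"{size:4d}  {section:7} {name}")
```

Note that environ and _environ have the same address, so they are two names for the same 4 bytes and should not be counted twice when adding up sizes.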

Deleting or Extracting Sections of a Binary File

Deleting sections can be useful as part of size analysis to see file size differences with and without some sections. You can do this using mcs -d, specifying the sections to be deleted with -n, for example:



% cc -g -o hello hello.c
% cp hello hello-without-comment-and-debugline
% mcs -d -n .comment -n .debug_line hello-without-comment-and-debugline
% ls -l hello hello-without-comment-and-debugline
-rwxr-xr-x   1 mblatt   ptg      5592 Oct 27 18:27 hello
-rwxr-xr-x   1 mblatt   ptg      4872 Oct 27 18:28 hello-without-comment-and-debugline
  

The mcs command above overwrites the file with the .comment and .debug_line sections removed. It cannot be used to remove sections critical to correct execution of an executable binary. The mcs command generates an error message if you attempt to delete, for example, the .interp section.

Sometimes, you might want to extract a section to look at it separately, without removing it from the binary. Sections can be extracted to a binary file using elfdump, for example:



% elfdump -N .text -w cat.text /bin/cat

The command above generates the file cat.text containing the .text section from /bin/cat.

Determining the Size of Sections

Binary size analysis requires detailed accounting for how the sizes of the sections add up to the byte size reported by ls -l.

You might be tempted to use the Size field from the greadelf or dump output to derive the size of each section. This is often correct, but not always. It is a lot safer to instead extract the Offset fields and subtract them. This is because Size sometimes records the amount of actual data, without including the space left unused between sections for alignment. If there is a notable difference between the section size total and the file size reported by ls -l, switching from using Size to using Offset subtraction can help.
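As a rough illustration of offset subtraction, the Python sketch below uses the Off and Size values of the first five sections in Listing 1. The function name footprint is invented for this example. A full accounting would also need to skip NOBITS sections such as .bss, which occupy no space in the file.

```python
# (name, file offset, size) for the first five sections in Listing 1.
sections = [
    (".interp", 0x0000f4, 0x000011),
    (".hash", 0x000108, 0x0001ac),
    (".dynsym", 0x0002b4, 0x000360),
    (".dynstr", 0x000614, 0x0001e2),
    (".SUNW_version", 0x0007f8, 0x000030),
]

def footprint(sections):
    """Return (name, size, span, padding) for every section but the last.

    span is the distance to the next section's offset, so it includes
    any alignment padding that the Size field leaves out.
    """
    rows = []
    for (name, off, size), (_, next_off, _) in zip(sections, sections[1:]):
        span = next_off - off
        rows.append((name, size, span, span - size))
    return rows

for name, size, span, padding in footprint(sections):
    print(f"{name:14} size={size:4d} span={span:4d} padding={padding}")
```

Here .interp has a Size of 17 bytes but occupies 20 bytes of the file, and .dynstr has a Size of 482 bytes but occupies 484. Small gaps like these are what make the Size total fall short of the file size.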

On Oracle Solaris, the size of the Section Header Table is not included in the output of greadelf or dump. The table is generally around 2 KB, but it can be much larger for GNU g++ binaries because of .linkonce sections. The size of this table can be estimated as follows:



% elfdump -e /bin/cat | grep e_shoff
 e_shoff:                0x247c  e_shentsize:  40  e_shnum:     23
  

Multiply e_shentsize by e_shnum to estimate the table size. In this case, the estimated size of the Section Header Table is 40 * 23 = 920 bytes.
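The same multiplication can be done directly from the ELF header. The Python sketch below follows the standard ELF header layout (a 16-byte e_ident followed by fixed-width fields); the function name is made up for this example, and the demo header is synthetic: only e_shoff, e_shentsize, and e_shnum are taken from the elfdump output above, and the other field values are placeholders.

```python
import struct

def section_header_table_size(header):
    """Return (e_shoff, e_shentsize, e_shnum, table size in bytes)."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = header[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if header[5] == 1 else ">"   # EI_DATA: 1 = little, 2 = big
    # Fields after e_ident: e_type, e_machine, e_version, e_entry, e_phoff,
    # e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize,
    # e_shnum, e_shstrndx. The three address-sized fields widen to 64 bits.
    fmt = endian + ("HHIQQQIHHHHHH" if is_64 else "HHIIIIIHHHHHH")
    fields = struct.unpack_from(fmt, header, 16)
    e_shoff, e_shentsize, e_shnum = fields[5], fields[10], fields[11]
    return e_shoff, e_shentsize, e_shnum, e_shentsize * e_shnum

# Synthetic 32-bit big-endian header; placeholder values except for
# e_shoff (0x247c), e_shentsize (40), and e_shnum (23).
e_ident = b"\x7fELF" + bytes([1, 2, 1]) + bytes(9)
demo = e_ident + struct.pack(">HHIIIIIHHHHHH",
                             2, 2, 1, 0x109a8, 52, 0x247c, 0, 52, 32, 5, 40, 23, 21)
print(section_header_table_size(demo))
```

To try it on a real file, read the first 64 bytes of the binary in binary mode and pass them to the function.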

Usually, it is fine to ignore this table. However, if there is a large difference between the total of the section sizes and the size in bytes from ls -l, this could be the reason. The .linkonce sections are generated by the GNU g++ compiler for C++ in Oracle Solaris 10 9/10 and earlier.

Here's an example of what a .linkonce section looks like in dump -hv output from a GNU g++ build on Oracle Solaris 10:



[15]    1       6       0x13ea28     0x12ea28     0x408         .gnu.linkonce.t._ZNK22AutoDerivativeFunctionILi1EE20vector_gradient_listERKSt6vectorI5PointILi1EESaIS3_EERS1_IS1_I6TensorILi1ELi1EESaIS9_EESaISB_EE

When viewed with greadelf, everything after the underscore is truncated:



[15] .gnu.linkonce.t._ PROGBITS        0013ea28 12ea28 000408 00  AX  0   0  4

The binary this output came from had over 7,000 sections whose names begin with .gnu.linkonce.t._. They all look identical when truncated by greadelf, so you must use dump -hv to see the full names.

Generating Smaller Binary Files

It is possible to use Oracle Solaris Studio compiler flags to adjust the size of binaries. This is not recommended unless disk space is a critical concern.

Debug information can be one of the largest contributors to the size of compiled executables. When space is an issue, removing -g makes a large difference to file size for all languages, especially C++.

Note, however, that using the -g flag is highly recommended, because it helps with both debugging and performance analysis. For most programs, there is no measurable performance difference from adding -g. In the absence of space problems, it is recommended that -g be used all the time, including with optimized builds.

After removing -g, you can reduce space further by adding the -xspace flag. When compiling with -O, adding -xspace removes some optimizations, namely those that use the most space for the least performance benefit. This significantly reduces the size of the .text section, while performance decreases by only a few percent. Note, however, that .text is only one piece of the binary, so the overall file size might not go down much.

For further reductions, the following flags reduce space in sections other than .text:

  • -mc removes duplicates from the .comment section with version numbers of system header files. This makes a difference when there are many files that use #include to include the same system headers.

  • -xannotate=no prevents the generation of the .annotate section. The .annotate data helps with performance analysis, code coverage, and data race detection, but if binary size is the primary concern, you can manage without it.

Finally, you can strip the symbol table from the compiled binary.

Using these steps, size reduction for a C program's binary file is shown in Figure 1. The binary size is roughly half when -g is removed. With all the other steps, binary size is reduced by an additional third, ending up at around a third of the full size with -g.

Figure 1. Example of Binary File Size Reduction
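The proportions quoted for this example can be checked with a few lines of Python. The sketch assumes that "an additional third" means a third of the size that remains after -g is removed; the fractions are the rounded figures from the text, not measurements.

```python
from fractions import Fraction

full_size = Fraction(1)                         # binary built with -g
without_g = full_size * Fraction(1, 2)          # roughly half after removing -g
after_other_steps = without_g * (1 - Fraction(1, 3))  # a further third removed

print(without_g, after_other_steps)
```

Half of the original size, reduced by a third, leaves one third of the original size, which agrees with the final figure given above.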

Revision 1.1, 1/23/2012