by Miriam Blatt, January 2012
When the size of application object files and executable binary files becomes an issue, it can be helpful to know what tools you can use to see what's inside the files. It's also helpful to understand why the content is there and what parts can be reduced in size.
This article reviews the contents of binaries and tools that can be used to read, extract, and delete sections. It ends with a discussion of the effect of compiler flags on binary file size and how to reduce the size of the executable.
If you've ever wondered what is lurking inside Oracle Solaris binaries, here's a way to see a quick summary:
% /usr/sfw/bin/greadelf -SW /bin/cat
The SPARC binary we are looking at is /bin/cat, which will be used for examples throughout this article. This greadelf command generates the output shown in Listing 1.
% /usr/sfw/bin/greadelf -SW /bin/cat
There are 23 section headers, starting at offset 0x247c:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        000100f4 0000f4 000011 00   A  0   0  1
  [ 2] .hash             HASH            00010108 000108 0001ac 04   A  3   0  4
  [ 3] .dynsym           DYNSYM          000102b4 0002b4 000360 10   A  4   1  4
  [ 4] .dynstr           STRTAB          00010614 000614 0001e2 00  AS  0   0  1
  [ 5] .SUNW_version     VERNEED         000107f8 0007f8 000030 00   A  4   1  4
  [ 6] .rela.data        RELA            00010828 000828 00000c 0c  AI  3  11  4
  [ 7] .rela.bss         RELA            00010834 000834 000024 0c  AI  3  13  4
  [ 8] .rela.plt         RELA            00010858 000858 000150 0c  AI  3   f  4
  [ 9] .text             PROGBITS        000109a8 0009a8 001200 00  AX  0   0  4
  [10] .init             PROGBITS        00011ba8 001ba8 00000c 00  AX  0   0  4
  [11] .fini             PROGBITS        00011bb4 001bb4 00000c 00  AX  0   0  4
  [12] .rodata           PROGBITS        00011bc0 001bc0 000004 00   A  0   0  4
  [13] .rodata1          PROGBITS        00011bc4 001bc4 0001a9 00   A  0   0  4
  [14] .got              PROGBITS        00022000 002000 000004 04  WA  0   0  8192
  [15] .plt              PROGBITS        00022004 002004 000184 0c WAX  0   0  4
  [16] .dynamic          DYNAMIC         00022188 002188 0000b8 08  WA  4   0  4
  [17] .data             PROGBITS        00022240 002240 000048 00  WA  0   0  8
  [18] .data1            PROGBITS        00022288 002288 000017 00  WA  0   0  4
  [19] .bss              NOBITS          000222a0 0022a0 008360 00  WA  0   0  8
  [20] .comment          PROGBITS        00000000 00229f 000025 00      0   0  1
  [21] .shstrtab         STRTAB          00000000 0022c4 0000a7 00   S  0   0  1
  [22] .SUNW_signature   LOOS+ffffff6    00000000 00236b 00010e 00   p  0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
When you compile a program, you can think of the binary as containing assembler instructions generated from the source code. These compiled instructions can be found in section .text, which is one of the many sections inside the binary file.
Sections fall into two categories: program data and link-editing information. Program data is the portion of the binary that is meaningful only to the application, and it consists of a number of sections, including compiled instructions and data to be initialized before the program starts. Link-editing information contains several more sections, including symbol and string tables as well as relocation information. The link-editing information sections are interpreted by the linker to modify other sections.
For the descriptions below, the section numbers shown in the first column of the output in Listing 1 are provided for reference. Note, however, that section numbers vary for different binaries; it is only the section names that are common to all binaries.
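The table that greadelf summarizes can also be read directly from the file. The following is a minimal sketch, assuming a standard ELF32 or ELF64 layout and ignoring extended section numbering; the function name read_section_headers is hypothetical and is not part of any Oracle Solaris tool.

```python
import struct
import sys

def read_section_headers(data):
    """Return (name, type, file offset, size) for every section in an ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = ">" if data[5] == 2 else "<"     # EI_DATA: 1 = little-endian, 2 = big-endian
    # ELF header fields that follow the 16-byte e_ident array.
    ehdr_fmt = end + ("HHIQQQIHHHHHH" if is64 else "HHIIIIIHHHHHH")
    fields = struct.unpack_from(ehdr_fmt, data, 16)
    e_shoff = fields[5]
    e_shentsize, e_shnum, e_shstrndx = fields[10], fields[11], fields[12]
    # Section header fields: name, type, flags, addr, offset, size,
    # link, info, addralign, entsize (same order for both classes).
    shdr_fmt = end + ("IIQQQQIIQQ" if is64 else "IIIIIIIIII")
    headers = [struct.unpack_from(shdr_fmt, data, e_shoff + i * e_shentsize)
               for i in range(e_shnum)]
    # Section names are NUL-terminated strings stored in the .shstrtab section.
    str_off = headers[e_shstrndx][4]
    result = []
    for h in headers:
        start = str_off + h[0]
        name = data[start:data.index(b"\0", start)].decode("ascii")
        result.append((name, h[1], h[4], h[5]))
    return result

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for name, sh_type, offset, size in read_section_headers(f.read()):
            print(f"{name:20} type={sh_type:#x} off={offset:#x} size={size:#x}")
```

Run with a path to a binary as the only argument, the sketch prints one line per section in section-number order, which is the same order greadelf uses.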
The first section in the /bin/cat binary is .interp. This is the pathname to the program interpreter, also known as the runtime linker, which interprets the information inside the binary, loads the program and initialized data into memory, and then starts the program.
In the example in Listing 1, /bin/cat is an Oracle Solaris 32-bit binary, for which the .interp section contains the string /usr/lib/ld.so.1. A 64-bit SPARC binary would have
/usr/lib/sparcv9/ld.so.1, and a 64-bit Oracle Solaris x86 binary would have
The next sections are for dynamic linking: .dynsym is the symbol table with global symbols for dynamic linkage, .dynstr contains strings for the dynamic symbol table, and .hash is a hash table to access entries inside the dynamic symbol table. Section .SUNW_version contains version information for global symbols in the dynamic symbol table.
Section names containing .rel have relocation information for items to be relocated by the runtime linker. Relocations for section x can be found for Oracle Solaris 10 in section .rel.x on Intel platforms and in
.rela.x on SPARC platforms. For Oracle Solaris 11, most relocations are combined into
In the SPARC-based Oracle Solaris 10 example in Listing 1, the relocation sections are .rela.data, .rela.bss, and .rela.plt, all with the type RELA. These are relocations for the initialized data in .data, the uninitialized data in .bss, and the procedure linkage table, .plt.
The next several sections contain program information, indicated by the type PROGBITS. Section .init contains initialization instructions to be executed before the program begins or, for objects loaded with lazy loading or dlopen(), before they execute. The compiled program binary instructions are in .text. The code in .fini is executed after the program finishes execution or when objects are unloaded.
There are several sections for data, depending on whether the data is initialized before starting the program and whether it will be changed by the running program. Read-only data is stored in .rodata and .rodata1. Data initialized before starting the program that will be both read and written by the program at runtime is in .data and .data1. Space reserved for uninitialized read/write data is indicated by .bss. The runtime linker allocates the space needed for .bss and fills it with zeros. Note that although .bss is the largest section in /bin/cat, it takes up no space inside the binary file on disk, as indicated by the type NOBITS. For this example, the .bss size = 0x8360 = 33,632 bytes.
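The difference between the Size field and the space a section takes on disk can be checked with a few lines of Python. This is only a sketch: it parses rows written in the greadelf -SW layout, using the /bin/cat values from Listing 1, and the helper name file_and_header_sizes is made up for illustration.

```python
import re

# Rows in the format of Listing 1, using the /bin/cat values shown there.
ROWS = """\
[ 9] .text    PROGBITS 000109a8 0009a8 001200 00 AX 0 0 4
[13] .rodata1 PROGBITS 00011bc4 001bc4 0001a9 00 A  0 0 4
[17] .data    PROGBITS 00022240 002240 000048 00 WA 0 0 8
[19] .bss     NOBITS   000222a0 0022a0 008360 00 WA 0 0 8
"""

# [Nr] Name Type Addr Off Size ...; only Name, Type, and Size are captured.
ROW = re.compile(r"\[\s*\d+\]\s+(\S+)\s+(\S+)\s+[0-9a-f]+\s+[0-9a-f]+\s+([0-9a-f]+)")

def file_and_header_sizes(text):
    """Return {name: (size_field, bytes_in_file)} for each section row."""
    sizes = {}
    for line in text.splitlines():
        m = ROW.search(line)
        if m:
            name, sh_type, size = m.group(1), m.group(2), int(m.group(3), 16)
            # NOBITS sections reserve memory at run time but store nothing on disk.
            sizes[name] = (size, 0 if sh_type == "NOBITS" else size)
    return sizes

for name, (size, on_disk) in file_and_header_sizes(ROWS).items():
    print(f"{name:10} Size field {size:6d}  bytes in file {on_disk:6d}")
```

The pattern expects a section name in the second column, so the unnamed NULL row at index 0 should be left out of the input.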
There are more program information sections labeled PROGBITS. Section .got contains the global offset table with indirect addresses for externally accessible data locations in the libraries being linked. Section .plt is the procedure linkage table, which contains indirect addresses for externally accessible functions in the libraries being linked. Section .dynamic contains a cache of information required by ld.so.1, including the starting address and size of the symbol, relocation, and string tables. Sections .data and .data1 contain initialized program data, as mentioned earlier. Section .bss reserves space for uninitialized program data.
Section .comment contains version information for system header files and compiler components. Section .shstrtab is a table of name strings for all sections. For this example, the first name string is .interp and the last is .SUNW_signature. Section .SUNW_signature contains a module verification signature used for security checking.
For more information about the contents of binaries, look in the Linker and Libraries Guide, Chapter 7, "Object File Format." Binary file sections are described under "Special Sections." The Linker and Libraries Guide can be found here:
The example in Listing 1 uses greadelf to provide a quick, readable summary. Note, however, that it truncates long section header names, keeping the table nicely lined up and easy to read. To see the full names, use dump:
% dump -hv /bin/cat
The dump command generates the same information in a format more easily read by scripts than by human eyes, as shown in Listing 2.
% dump -hv /bin/cat

           **** SECTION HEADER TABLE ****
[No]  Type  Flags  Addr     Offset  Size    Name             Link  Info  Adralgn  Entsize
[1]   PBIT  -A-    0x100f4  0xf4    0x11    .interp          0     0     0x1      0
[2]   HASH  -A-    0x10108  0x108   0x1ac   .hash            3     0     0x4      0x4
[3]   DYNS  -A-    0x102b4  0x2b4   0x360   .dynsym          4     1     0x4      0x10
[4]   STRT  -A-    0x10614  0x614   0x1e2   .dynstr          0     0     0x1      0
[5]   VERN  -A-    0x107f8  0x7f8   0x30    .SUNW_version    4     1     0x4      0
[6]   RELA  -A-    0x10828  0x828   0xc     .rela.data       3     17    0x4      0xc
[7]   RELA  -A-    0x10834  0x834   0x24    .rela.bss        3     19    0x4      0xc
[8]   RELA  -A-    0x10858  0x858   0x150   .rela.plt        3     15    0x4      0xc
[9]   PBIT  -AI    0x109a8  0x9a8   0x1200  .text            0     0     0x4      0
[10]  PBIT  -AI    0x11ba8  0x1ba8  0xc     .init            0     0     0x4      0
[11]  PBIT  -AI    0x11bb4  0x1bb4  0xc     .fini            0     0     0x4      0
[12]  PBIT  -A-    0x11bc0  0x1bc0  0x4     .rodata          0     0     0x4      0
[13]  PBIT  -A-    0x11bc4  0x1bc4  0x1a9   .rodata1         0     0     0x4      0
[14]  PBIT  WA-    0x22000  0x2000  0x4     .got             0     0     0x2000   0x4
[15]  PBIT  WAI    0x22004  0x2004  0x184   .plt             0     0     0x4      0xc
[16]  DYNM  WA-    0x22188  0x2188  0xb8    .dynamic         4     0     0x4      0x8
[17]  PBIT  WA-    0x22240  0x2240  0x48    .data            0     0     0x8      0
[18]  PBIT  WA-    0x22288  0x2288  0x17    .data1           0     0     0x4      0
[19]  NOBI  WA-    0x222a0  0x22a0  0x8360  .bss             0     0     0x8      0
[20]  PBIT  ---    0        0x229f  0x25    .comment         0     0     0x1      0
[21]  STRT  ---    0        0x22c4  0xa7    .shstrtab        0     0     0x1      0
[22]  SIGN  ---E   0        0x236b  0x10e   .SUNW_signature  0     0     0x1      0
On Linux, there is no
dump command, and
greadelf can be found under the name
When binary files are large, it can be of interest to figure out what is inside each section that takes up space. There are multiple ways to look inside the sections of a binary.
When sections contain strings only, the easiest way is to use mcs, for example:
% mcs -p -n .shstrtab /bin/cat
The mcs command generates a list of section names, as shown in Listing 3.
% mcs -p -n .shstrtab /bin/cat
.interp
.hash
.dynsym
.dynstr
.SUNW_version
.rela.data
.rela.bss
.rela.plt
.text
.init
.fini
.rodata
.rodata1
.got
.dynamic
.data1
.comment
.shstrtab
.SUNW_signature
However, many sections have strings interspersed with binary data. For these, mcs will generate unprintable characters, which is not likely to be helpful. Instead, use gobjdump, which generates the binary data together with any printable ASCII characters found inside it, for example:
% gobjdump -s -j .rodata1 /bin/cat
The gobjdump command generates the output shown in Listing 4.
% gobjdump -s -j .rodata1 /bin/cat

/bin/cat:     file format elf32-sparc

Contents of section .rodata1:
 11bc4 00000000 53554e57 5f4f5354 5f4f5343  ....SUNW_OST_OSC
 11bd4 4d440000 75737674 65626e00 75736167  MD..usvtebn.usag
 11be4 653a2063 6174205b 202d7573 76746562  e: cat [ -usvteb
 11bf4 6e205d20 5b2d7c66 696c655d 202e2e2e  n ] [-|file] ...
 11c04 0a000000 6361743a 2043616e 6e6f7420  ....cat: Cannot
 11c14 73746174 20737464 6f75740a 00000000  stat stdout.....
 11c24 72000000 6361743a 2063616e 6e6f7420  r...cat: cannot
 11c34 6f70656e 2025730a 00000000 6361743a  open %s.....cat:
 11c44 2063616e 6e6f7420 73746174 2025730a   cannot stat %s.
 11c54 00000000 6361743a 20696e70 75742f6f  ....cat: input/o
 11c64 75747075 74206669 6c657320 27257327  utput files '%s'
 11c74 20696465 6e746963 616c0a00 6361743a   identical..cat:
 11c84 20636c6f 73652065 72726f72 0a000000   close error....
 11c94 6361743a 20636c6f 73652065 72726f72  cat: close error
 11ca4 0a000000 6361743a 20636c6f 73652065  ....cat: close e
 11cb4 72726f72 00000000 6361743a 2063616e  rror....cat: can
 11cc4 6e6f7420 72656164 2025733a 20000000  not read %s: ...
 11cd4 6361743a 20777269 74652065 72726f72  cat: write error
 11ce4 3a200000 00000000 6361743a 206d6d61  : ......cat: mma
 11cf4 70206572 726f7200 6361743a 206e6f20  p error.cat: no
 11d04 6d656d6f 72790000 6361743a 206f7574  memory..cat: out
 11d14 70757420 6572726f 72202825 642f2564  put error (%d/%d
 11d24 20636861 72616374 65727320 77726974   characters writ
 11d34 74656e29 0a000000 00000000 6361743a  ten)........cat:
 11d44 20696e70 75742065 72726f72 206f6e20   input error on
 11d54 25733a20 00000000 00000000 25366409  %s: ........%6d.
 11d64 00000000 25366409 00                 ....%6d..
You can see the contents of the .text section using the disassembler, for example:
% dis /bin/cat
The dis command generates full disassembly, including the .init and .fini sections, as shown in Listing 5.
% dis /bin/cat
section .text
_start()
    _start:       bc 10 20 00  clr     %fp
    _start+0x4:   e0 03 a0 40  ld      [%sp + 0x40], %l0
    _start+0x8:   13 00 00 a9  sethi   %hi(0x2a400), %o1
    _start+0xc:   e0 22 61 f8  st      %l0, [%o1 + 0x1f8]
    ...
main()
    main:         9d e3 be 60  save    %sp, -0x1a0, %sp
    main+0x4:     19 00 00 46  sethi   %hi(0x11800), %o4
    main+0x8:     ba 10 20 00  clr     %i5
    main+0xc:     f0 23 a0 5c  st      %i0, [%sp + 0x5c]
    ...
section .init
_init()
    _init:        9d e3 bf a0  save    %sp, -0x60, %sp
    _init+0x4:    81 c7 e0 08  ret
    _init+0x8:    81 e8 00 00  restore
section .fini
_fini()
    _fini:        9d e3 bf a0  save    %sp, -0x60, %sp
    _fini+0x4:    81 c7 e0 08  ret
    _fini+0x8:    81 e8 00 00  restore
Looking inside a data section, you will find a lot of binary data that might not be particularly helpful for understanding where the data came from. To figure out what is taking space in a data section, you can extract a list of symbol names for the section using elfdump, for example:
% elfdump -sN.dynsym /bin/cat | fgrep .data
This elfdump command generates the list of symbols from the .dynsym section that are located in .data and .data1:
% elfdump -sN.dynsym /bin/cat | fgrep .data
  0x0002229f 0x00000000 OBJT GLOB D 0 .data1 _edata
  0x00022260 0x00000004 OBJT GLOB D 0 .data  __cg92_used
  0x00022240 0x00000018 OBJT GLOB D 0 .data  __environ_lock
  0x00022264 0x00000004 OBJT GLOB D 0 .data  ___Argv
  0x00022258 0x00000004 OBJT WEAK D 0 .data  environ
  0x00022258 0x00000004 OBJT GLOB D 0 .data  _environ
You can do the same for the symbol table using -sN.symtab for a binary that does not have the symbol table stripped.
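To turn a symbol list like this into a size ranking, the rows can be sorted by their size column. The sketch below assumes the column order shown in the elfdump output above (value, size, type, binding, other, version, section, name); the function name largest_symbols is hypothetical.

```python
import re

# Symbol rows in the format printed by elfdump -s, using the /bin/cat values.
SYMBOLS = """\
0x0002229f 0x00000000 OBJT GLOB D 0 .data1 _edata
0x00022260 0x00000004 OBJT GLOB D 0 .data __cg92_used
0x00022240 0x00000018 OBJT GLOB D 0 .data __environ_lock
0x00022264 0x00000004 OBJT GLOB D 0 .data ___Argv
0x00022258 0x00000004 OBJT WEAK D 0 .data environ
0x00022258 0x00000004 OBJT GLOB D 0 .data _environ
"""

# value, size, four fields that are skipped, section name, symbol name
ROW = re.compile(r"(0x[0-9a-f]+)\s+(0x[0-9a-f]+)\s+\S+\s+\S+\s+\S+\s+\S+\s+(\S+)\s+(\S+)")

def largest_symbols(text, section):
    """Return (size, address, name) tuples for one section, largest first.

    Symbols that share an address (aliases such as environ and _environ)
    are counted once, under the first name seen.
    """
    seen = {}
    for line in text.splitlines():
        m = ROW.search(line)
        if not m:
            continue
        addr, size = int(m.group(1), 16), int(m.group(2), 16)
        if m.group(3) == section and addr not in seen:
            seen[addr] = (size, addr, m.group(4))
    return sorted(seen.values(), reverse=True)

for size, addr, name in largest_symbols(SYMBOLS, ".data"):
    print(f"{size:4d} bytes at {addr:#x}  {name}")
```

In this example the named symbols account for 36 of the 0x48 = 72 bytes in .data, so a symbol list from .dynsym alone does not always explain a whole section.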
Deleting sections can be useful as part of size analysis to see file size differences with and without some sections. You can do this using mcs -d, specifying the sections to be deleted with -n, for example:
% cc -g -o hello hello.c
% cp hello hello-without-comment-and-debugline
% mcs -d -n .comment -n .debug_line hello-without-comment-and-debugline
% ls -l hello hello-without-comment-and-debugline
-rwxr-xr-x 1 mblatt ptg 5592 Oct 27 18:27 hello
-rwxr-xr-x 1 mblatt ptg 4872 Oct 27 18:28 hello-without-comment-and-debugline
The mcs command above overwrites the file with the .comment and .debug_line sections removed. It cannot be used to remove sections critical to correct execution of an executable binary. The
mcs command generates an error message if you attempt to delete, for example, the
Sometimes, you might want to extract a section to look at it separately, without removing it from the binary. Sections can be extracted to a binary file using elfdump, for example:
% elfdump -N .text -w cat.text /bin/cat
The command above generates the file cat.text containing the .text section from /bin/cat.
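Conceptually, extracting a section amounts to copying a byte range out of the file, starting at the section's file offset and running for its size. The following sketch illustrates that idea only; it is not how elfdump is implemented, and the function name extract_section is made up for this example.

```python
def extract_section(path, offset, size, out_path):
    """Copy size bytes starting at file offset from path into out_path."""
    with open(path, "rb") as src:
        src.seek(offset)
        data = src.read(size)
    if len(data) != size:
        raise ValueError("section extends past the end of the file")
    with open(out_path, "wb") as dst:
        dst.write(data)
    return len(data)
```

With the .text values from Listing 1 (offset 0x9a8, size 0x1200), the call would be extract_section("/bin/cat", 0x9a8, 0x1200, "cat.text"). This applies only to sections that occupy space in the file, so not to NOBITS sections such as .bss.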
Binary size analysis requires detailed accounting for how the sizes of the sections add up to the byte size reported by ls -l.
You might be tempted to use the Size field from the dump output to derive the size of each section. This is often correct, but not always. It is a lot safer to instead extract the Offset fields and subtract them. This is because Size sometimes records the amount of actual data, without including the space left unused between sections for alignment. If there is a notable difference between the section size total and the file size reported by ls -l, switching from using Size to using Offset subtraction can help.
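As an illustration of Offset subtraction, the sketch below uses the Offset and Size values of some of the /bin/cat sections from Listing 2. The function name footprints is hypothetical, and NOBITS sections are skipped because they occupy no space in the file.

```python
# (name, type, file offset, size) for part of Listing 2, in table order.
SECTIONS = [
    (".rodata",  "PBIT", 0x1bc0, 0x4),
    (".rodata1", "PBIT", 0x1bc4, 0x1a9),
    (".got",     "PBIT", 0x2000, 0x4),
    (".plt",     "PBIT", 0x2004, 0x184),
    (".dynamic", "DYNM", 0x2188, 0xb8),
    (".data",    "PBIT", 0x2240, 0x48),
    (".data1",   "PBIT", 0x2288, 0x17),
    (".bss",     "NOBI", 0x22a0, 0x8360),
    (".comment", "PBIT", 0x229f, 0x25),
]

def footprints(sections):
    """Map each section name to (Size field, space occupied in the file).

    The occupied space is the distance to the next section's offset, so it
    includes any alignment padding. NOBITS sections are skipped, and the
    last section has no successor, so its Size field is used as is.
    """
    on_disk = sorted((s for s in sections if s[1] not in ("NOBI", "NOBITS")),
                     key=lambda s: s[2])
    result = {}
    for i, (name, _type, offset, size) in enumerate(on_disk):
        if i + 1 < len(on_disk):
            result[name] = (size, on_disk[i + 1][2] - offset)
        else:
            result[name] = (size, size)
    return result

for name, (size, occupied) in footprints(SECTIONS).items():
    print(f"{name:10} Size {size:#7x}  occupied {occupied:#7x}  padding {occupied - size}")
```

For most of these sections the two numbers agree, but .rodata1 has a Size of 0x1a9 while the next section, .got, starts at offset 0x2000, so the subtraction gives 0x43c. The 659 bytes of difference are unused space ahead of .got, which Listing 2 shows with an alignment of 0x2000.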
For Oracle Solaris, the Section Header Table is not included in the output of dump. The table is generally around 2 KB, but it can be much larger for GNU g++ binaries, due to .linkonce sections. The size of this table can be estimated as follows:
% elfdump -e /bin/cat | grep e_shoff
  e_shoff: 0x247c  e_shentsize: 40  e_shnum: 23
Multiply e_shentsize by e_shnum to approximate the table size. In this case, the estimated size of the Section Header Table is 40 * 23 = 920 bytes.
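The same estimate can be scripted. This sketch parses an output line in the format shown above and multiplies the entry size by the entry count; the function name section_header_table_size is an assumption made for illustration.

```python
import re

def section_header_table_size(elfdump_line):
    """Return (e_shoff, table_bytes) parsed from one line of elfdump -e output."""
    fields = dict(re.findall(r"(e_\w+):\s+(\S+)", elfdump_line))
    e_shoff = int(fields["e_shoff"], 0)      # base 0 accepts 0x-prefixed hex
    table_bytes = int(fields["e_shentsize"], 0) * int(fields["e_shnum"], 0)
    return e_shoff, table_bytes

line = "e_shoff: 0x247c e_shentsize: 40 e_shnum: 23"
shoff, table = section_header_table_size(line)
print(f"table size: {table} bytes, from offset {shoff:#x} to {shoff + table:#x}")
```

Adding the table size to e_shoff gives the offset at which the table ends, which can be compared with the file size from ls -l.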
Usually, it is fine to ignore this table. However, if there is a large difference between the total of the section sizes and the size in bytes from ls -l, this could be the reason. The .linkonce sections are generated by the GNU g++ compiler for C++ in Oracle Solaris 10 9/10 and earlier.
Here's an example of what a .linkonce section looks like in dump -hv output from a GNU g++ build on Oracle Solaris 10:
 1 6 0x13ea28 0x12ea28 0x408 .gnu.linkonce.t._ZNK22AutoDerivativeFunctionILi1EE20vector_gradient_listERKSt6vectorI5PointILi1EESaIS3_EERS1_IS1_I6TensorILi1ELi1EESaIS9_EESaISB_EE
When viewed with greadelf, everything after the underscore is truncated:
 .gnu.linkonce.t._ PROGBITS 0013ea28 12ea28 000408 00 AX 0 0 4
The binary this output came from had over 7,000 sections whose names begin with .gnu.linkonce.t._. They all look identical when truncated by greadelf, so you must use dump -hv to see the full name.
It is possible to use Oracle Solaris Studio compiler flags to adjust the size of binaries. This is not recommended unless disk space is a critical concern.
Debug information can be one of the largest contributors to the size of compiled executables. When space is an issue, removing -g makes a large difference to file size for all languages, especially C++.
Note, however, that using the -g flag is highly recommended, because it helps with both debugging and performance analysis. For most programs, there is no measurable performance difference from adding -g. In the absence of space problems, it is recommended that -g be used all the time, including with optimized builds.
After removing -g, additional space reduction comes from adding the
-xspace flag. When compiling with
-xspace removes some optimizations, namely those that use the most space for the least performance impact. This will significantly reduce the size of the .text segment, while performance decreases by only a few percent. Note, however, that .text is only one piece of the binary, so the overall file size might not go down much.
For further reductions, the following flags reduce space in sections other than .text:
-mc removes duplicates from the .comment section with version numbers of system header files. This makes a difference when there are many files that use #include to include the same system headers.
-xannotate=no prevents the generation of the .annotate section. The .annotate data helps with performance analysis, code coverage, and data race detection, but if the number one issue is binary size, you can manage without it.
Finally, you can strip the symbol table from the compiled binary.
Using these steps, size reduction for a C program's binary file is shown in Figure 1. The binary size is roughly half when -g is removed. With all the other steps, binary size is reduced by an additional third, ending up at around a third of the full size with -g.
Figure 1. Example of Binary File Size Reduction
|Revision 1.1, 1/23/2012|