How to Find Out What's in an Oracle Solaris Binary File
by Miriam Blatt, January 2012
How to determine the contents of Oracle Solaris binaries and what tools you can use to read, extract, and delete sections. Plus, the effect of compiler flags on binary file size and how to reduce the size of the executable.
When the size of application object files and executable binary files becomes an issue, it can be helpful to know what tools you can use to see what's inside the files. It's also helpful to understand why the content is there and what parts can be reduced in size.
This article reviews the contents of binaries and tools that can be used to read, extract, and delete sections. It ends with a discussion of the effect of compiler flags on binary file size and how to reduce the size of the executable.
What Is Inside a Binary File?
If you've ever wondered what is lurking inside Oracle Solaris binaries, here's a way to see a quick summary:
% /usr/sfw/bin/greadelf -SW /bin/cat
The SPARC binary we are looking at is /bin/cat, which will be used for examples throughout this article. This greadelf command generates the output shown in Listing 1.
Listing 1. Output of greadelf Command
% /usr/sfw/bin/greadelf -SW /bin/cat
There are 23 section headers, starting at offset 0x247c:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 000100f4 0000f4 000011 00 A 0 0 1
[ 2] .hash HASH 00010108 000108 0001ac 04 A 3 0 4
[ 3] .dynsym DYNSYM 000102b4 0002b4 000360 10 A 4 1 4
[ 4] .dynstr STRTAB 00010614 000614 0001e2 00 AS 0 0 1
[ 5] .SUNW_version VERNEED 000107f8 0007f8 000030 00 A 4 1 4
[ 6] .rela.data RELA 00010828 000828 00000c 0c AI 3 11 4
[ 7] .rela.bss RELA 00010834 000834 000024 0c AI 3 13 4
[ 8] .rela.plt RELA 00010858 000858 000150 0c AI 3 f 4
[ 9] .text PROGBITS 000109a8 0009a8 001200 00 AX 0 0 4
[10] .init PROGBITS 00011ba8 001ba8 00000c 00 AX 0 0 4
[11] .fini PROGBITS 00011bb4 001bb4 00000c 00 AX 0 0 4
[12] .rodata PROGBITS 00011bc0 001bc0 000004 00 A 0 0 4
[13] .rodata1 PROGBITS 00011bc4 001bc4 0001a9 00 A 0 0 4
[14] .got PROGBITS 00022000 002000 000004 04 WA 0 0 8192
[15] .plt PROGBITS 00022004 002004 000184 0c WAX 0 0 4
[16] .dynamic DYNAMIC 00022188 002188 0000b8 08 WA 4 0 4
[17] .data PROGBITS 00022240 002240 000048 00 WA 0 0 8
[18] .data1 PROGBITS 00022288 002288 000017 00 WA 0 0 4
[19] .bss NOBITS 000222a0 0022a0 008360 00 WA 0 0 8
[20] .comment PROGBITS 00000000 00229f 000025 00 0 0 1
[21] .shstrtab STRTAB 00000000 0022c4 0000a7 00 S 0 0 1
[22] .SUNW_signature LOOS+ffffff6 00000000 00236b 00010e 00 p 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
When you compile a program, you can think of the resulting binary as containing assembler instructions generated from the source code. These compiled instructions can be found in section .text, which is one of the many sections inside the binary file.
Sections fall into two categories: program data and link-editing information. Program data is the portion of the binary that is meaningful only to the application, and it consists of a number of sections, including compiled instructions and data to be initialized before the program starts. Link-editing information contains several more sections, including symbol and string tables as well as relocation information. The link-editing information sections are interpreted by the linker to modify other sections.
For the descriptions below, the section numbers shown in the first column of the output in Listing 1 (for example, [1]) are provided for reference. Note, however, that section numbers vary for different binaries; it is only the section names that are common to all binaries.
The first section in the /bin/cat binary is [1].interp. This is the pathname to the program interpreter, also known as the runtime linker, which interprets the information inside the binary, loads the program and initialized data into memory, and then starts the program.
In the example in Listing 1, /bin/cat is an Oracle Solaris 32-bit binary, for which the [1].interp section contains the string /usr/lib/ld.so.1. A 64-bit SPARC binary would have /usr/lib/sparcv9/ld.so.1, and a 64-bit Oracle Solaris x86 binary would have /usr/lib/amd64/ld.so.1.
The next sections are for dynamic linking: [3].dynsym is the symbol table with global symbols for dynamic linkage, [4].dynstr contains strings for the dynamic symbol table, and [2].hash is a hash table to access entries inside the dynamic symbol table. Section [5].SUNW_version contains version information for global symbols in the dynamic symbol table, [3].dynsym.
Section names containing .rel have relocation information for items to be relocated by the runtime linker. For Oracle Solaris 10, relocations for section x can be found in section .rel.x on Intel platforms and in section .rela.x on SPARC platforms. For Oracle Solaris 11, most relocations are combined into .SUNW_reloc.
In the SPARC-based Oracle Solaris 10 example in Listing 1, the relocation sections are [6].rela.data, [7].rela.bss, and [8].rela.plt, all with the type RELA. These are relocations for the initialized data in [17].data, the uninitialized data in [19].bss, and the procedure linkage table, [15].plt.
The next several sections contain program information, indicated by the type PROGBITS. Section [10].init contains initialization instructions to be executed before the program begins or, for objects loaded with lazy loading or dlopen(), before they execute. The compiled program binary instructions are in [9].text. The code in [11].fini is executed after the program finishes execution or when objects are unloaded.
There are several sections for data, depending on whether the data is initialized before starting the program and whether it will be changed by the running program. Read-only data is stored in [12].rodata and [13].rodata1. Data initialized before starting the program that will be both read and written by the program at runtime is in [17].data and [18].data1. Space reserved for uninitialized read/write data is indicated by [19].bss. The runtime linker allocates the space needed for .bss and fills it with zeros. Note that although .bss is the largest section in /bin/cat, it takes up no space inside the binary file on disk, as indicated by the type NOBITS. For this example, the .bss size = 0x8360 = 33,632 bytes.
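The hexadecimal-to-decimal conversion above can be checked from the command line. The following is a minimal sketch, assuming a POSIX-style shell such as ksh or bash (not the csh-style prompt used in the listings); hex_to_dec is a hypothetical helper name used only for illustration and is not part of any Oracle Solaris tool.

```shell
# hex_to_dec: hypothetical helper (not a Solaris tool) that converts a
# hexadecimal Size value, with or without a leading 0x, to decimal bytes.
hex_to_dec() {
    printf '%d\n' "$((0x${1#0x}))"
}

hex_to_dec 008360   # .bss Size as printed by greadelf; prints 33632
hex_to_dec 0x1200   # .text Size as printed by dump -hv; prints 4608
```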
There are more program information sections labeled PROGBITS. Section [14].got contains the global offset table with indirect addresses for externally accessible data locations in the libraries being linked. Section [15].plt is the procedure linkage table, which contains indirect addresses for externally accessible functions in the libraries being linked.
Section [16].dynamic contains a cache of information required by ld.so.1, including the starting address and size of the symbol, relocation, and string tables.
Sections [17].data and [18].data1 contain initialized program data, as mentioned earlier. Section [19].bss reserves space for uninitialized program data.
Section [20].comment contains version information for system header files and compiler components.
Section [21].shstrtab is a table of name strings for all sections. For this example, the first name string is .interp and the last is .SUNW_signature.
Section [22].SUNW_signature contains a module verification signature used for security checking.
For more information about the contents of binaries, see Chapter 7, "Object File Format," of the Linker and Libraries Guide. Binary file sections are described under "Special Sections." The guide is listed under "For More Information" at the end of this article.
The example in Listing 1 uses greadelf to provide a quick, readable summary. Note, however, that it truncates long section header names, keeping the table nicely lined up and easy to read. To see the full names, use dump instead:
% dump -hv /bin/cat
The dump command generates the same information in a format more easily read by scripts than by human eyes, as shown in Listing 2.
Listing 2. Output of dump Command
% dump -hv /bin/cat
**** SECTION HEADER TABLE ****
[No] Type Flags Addr Offset Size Name
Link Info Adralgn Entsize
[1] PBIT -A- 0x100f4 0xf4 0x11 .interp
0 0 0x1 0
[2] HASH -A- 0x10108 0x108 0x1ac .hash
3 0 0x4 0x4
[3] DYNS -A- 0x102b4 0x2b4 0x360 .dynsym
4 1 0x4 0x10
[4] STRT -A- 0x10614 0x614 0x1e2 .dynstr
0 0 0x1 0
[5] VERN -A- 0x107f8 0x7f8 0x30 .SUNW_version
4 1 0x4 0
[6] RELA -A- 0x10828 0x828 0xc .rela.data
3 17 0x4 0xc
[7] RELA -A- 0x10834 0x834 0x24 .rela.bss
3 19 0x4 0xc
[8] RELA -A- 0x10858 0x858 0x150 .rela.plt
3 15 0x4 0xc
[9] PBIT -AI 0x109a8 0x9a8 0x1200 .text
0 0 0x4 0
[10] PBIT -AI 0x11ba8 0x1ba8 0xc .init
0 0 0x4 0
[11] PBIT -AI 0x11bb4 0x1bb4 0xc .fini
0 0 0x4 0
[12] PBIT -A- 0x11bc0 0x1bc0 0x4 .rodata
0 0 0x4 0
[13] PBIT -A- 0x11bc4 0x1bc4 0x1a9 .rodata1
0 0 0x4 0
[14] PBIT WA- 0x22000 0x2000 0x4 .got
0 0 0x2000 0x4
[15] PBIT WAI 0x22004 0x2004 0x184 .plt
0 0 0x4 0xc
[16] DYNM WA- 0x22188 0x2188 0xb8 .dynamic
4 0 0x4 0x8
[17] PBIT WA- 0x22240 0x2240 0x48 .data
0 0 0x8 0
[18] PBIT WA- 0x22288 0x2288 0x17 .data1
0 0 0x4 0
[19] NOBI WA- 0x222a0 0x22a0 0x8360 .bss
0 0 0x8 0
[20] PBIT --- 0 0x229f 0x25 .comment
0 0 0x1 0
[21] STRT --- 0 0x22c4 0xa7 .shstrtab
0 0 0x1 0
[22] SIGN ---E 0 0x236b 0x10e .SUNW_signature
0 0 0x1 0
On Linux, there is no dump command, and greadelf can be found under the name readelf.
Looking at the Contents of Sections
When binary files are large, it can be of interest to figure out what is inside each section that takes up space. There are multiple ways to look inside the sections of a binary.
When sections contain strings only, the easiest way is to use mcs:
% mcs -p -n .shstrtab /bin/cat
The mcs command generates a list of section names, as shown in Listing 3.
Listing 3. Output of mcs Command
% mcs -p -n .shstrtab /bin/cat
.interp
.hash
.dynsym
.dynstr
.SUNW_version
.rela.data
.rela.bss
.rela.plt
.text
.init
.fini
.rodata
.rodata1
.got
.dynamic
.data1
.comment
.shstrtab
.SUNW_signature
However, many sections have strings interspersed with binary data. For these, mcs will generate unprintable characters, which is not likely to be helpful. Instead, use gobjdump, which prints the binary data in hexadecimal together with any printable ASCII characters found inside it, for example:
% gobjdump -s -j .rodata1 /bin/cat
Using the gobjdump command generates the output shown in Listing 4.
Listing 4. Output of gobjdump Command
% gobjdump -s -j .rodata1 /bin/cat
/bin/cat: file format elf32-sparc
Contents of section .rodata1:
11bc4 00000000 53554e57 5f4f5354 5f4f5343 ....SUNW_OST_OSC
11bd4 4d440000 75737674 65626e00 75736167 MD..usvtebn.usag
11be4 653a2063 6174205b 202d7573 76746562 e: cat [ -usvteb
11bf4 6e205d20 5b2d7c66 696c655d 202e2e2e n ] [-|file] ...
11c04 0a000000 6361743a 2043616e 6e6f7420 ....cat: Cannot
11c14 73746174 20737464 6f75740a 00000000 stat stdout.....
11c24 72000000 6361743a 2063616e 6e6f7420 r...cat: cannot
11c34 6f70656e 2025730a 00000000 6361743a open %s.....cat:
11c44 2063616e 6e6f7420 73746174 2025730a cannot stat %s.
11c54 00000000 6361743a 20696e70 75742f6f ....cat: input/o
11c64 75747075 74206669 6c657320 27257327 utput files '%s'
11c74 20696465 6e746963 616c0a00 6361743a identical..cat:
11c84 20636c6f 73652065 72726f72 0a000000 close error....
11c94 6361743a 20636c6f 73652065 72726f72 cat: close error
11ca4 0a000000 6361743a 20636c6f 73652065 ....cat: close e
11cb4 72726f72 00000000 6361743a 2063616e rror....cat: can
11cc4 6e6f7420 72656164 2025733a 20000000 not read %s: ...
11cd4 6361743a 20777269 74652065 72726f72 cat: write error
11ce4 3a200000 00000000 6361743a 206d6d61 : ......cat: mma
11cf4 70206572 726f7200 6361743a 206e6f20 p error.cat: no
11d04 6d656d6f 72790000 6361743a 206f7574 memory..cat: out
11d14 70757420 6572726f 72202825 642f2564 put error (%d/%d
11d24 20636861 72616374 65727320 77726974 characters writ
11d34 74656e29 0a000000 00000000 6361743a ten)........cat:
11d44 20696e70 75742065 72726f72 206f6e20 input error on
11d54 25733a20 00000000 00000000 25366409 %s: ........%6d.
11d64 00000000 25366409 00 ....%6d..
You can see the contents of the .text section using the disassembler, for example:
% dis /bin/cat
Using the dis command generates full disassembly, including the .init and .fini sections, as shown in Listing 5.
Listing 5. Output of dis Command
% dis /bin/cat
section .text
_start()
_start: bc 10 20 00 clr %fp
_start+0x4: e0 03 a0 40 ld [%sp + 0x40], %l0
_start+0x8: 13 00 00 a9 sethi %hi(0x2a400), %o1
_start+0xc: e0 22 61 f8 st %l0, [%o1 + 0x1f8]
...
main()
main: 9d e3 be 60 save %sp, -0x1a0, %sp
main+0x4: 19 00 00 46 sethi %hi(0x11800), %o4
main+0x8: ba 10 20 00 clr %i5
main+0xc: f0 23 a0 5c st %i0, [%sp + 0x5c]
...
section .init
_init()
_init: 9d e3 bf a0 save %sp, -0x60, %sp
_init+0x4: 81 c7 e0 08 ret
_init+0x8: 81 e8 00 00 restore
section .fini
_fini()
_fini: 9d e3 bf a0 save %sp, -0x60, %sp
_fini+0x4: 81 c7 e0 08 ret
_fini+0x8: 81 e8 00 00 restore
Looking for Symbols Inside a Data Section
Looking inside a data section, you will find a lot of binary data that might not be particularly helpful for understanding where the data came from. To figure out what is taking space in a data section, you can extract a list of symbol names for the section using elfdump, for example:
% elfdump -sN.dynsym /bin/cat | fgrep .data
This elfdump command lists the symbols from the .dynsym section that are located in .data or .data1:
% elfdump -sN.dynsym /bin/cat | fgrep .data
[4] 0x0002229f 0x00000000 OBJT GLOB D 0 .data1 _edata
[18] 0x00022260 0x00000004 OBJT GLOB D 0 .data __cg92_used
[19] 0x00022240 0x00000018 OBJT GLOB D 0 .data __environ_lock
[26] 0x00022264 0x00000004 OBJT GLOB D 0 .data ___Argv
[36] 0x00022258 0x00000004 OBJT WEAK D 0 .data environ
[40] 0x00022258 0x00000004 OBJT GLOB D 0 .data _environ
You can do the same for the symbol table using -sN.symtab for a binary that does not have the symbol table stripped.
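To turn a symbol listing like the one above into a per-section byte count, the size column (the third field) can be summed for each section name. The sketch below is illustrative only: symbol_bytes_by_section is a hypothetical helper, it assumes a POSIX-style shell and awk, and it assumes the column layout shown above. Each address is counted once, so aliases such as environ and _environ are not double counted.

```shell
# symbol_bytes_by_section: hypothetical helper, not a Solaris tool.
# Reads elfdump -s symbol lines on stdin and prints "section bytes".
symbol_bytes_by_section() {
    while read -r idx value size type bind other ver shndx name; do
        case $size in
            0x*) ;;               # only accept rows with a hex size field
            *) continue ;;        # skip headers and blank lines
        esac
        # $(( )) converts the hexadecimal size to decimal
        echo "$shndx $value $(($size))"
    done |
    awk '!seen[$2]++ { total[$1] += $3 }
         END { for (s in total) print s, total[s] }' |
    LC_ALL=C sort
}

# Sample input copied from the elfdump output above
symbol_bytes_by_section <<'EOF'
[4] 0x0002229f 0x00000000 OBJT GLOB D 0 .data1 _edata
[18] 0x00022260 0x00000004 OBJT GLOB D 0 .data __cg92_used
[19] 0x00022240 0x00000018 OBJT GLOB D 0 .data __environ_lock
[26] 0x00022264 0x00000004 OBJT GLOB D 0 .data ___Argv
[36] 0x00022258 0x00000004 OBJT WEAK D 0 .data environ
[40] 0x00022258 0x00000004 OBJT GLOB D 0 .data _environ
EOF
```

For the six sample lines, this prints ".data 36" and ".data1 0". On a live system, the same function could presumably be fed directly from the pipeline shown earlier, by appending "| symbol_bytes_by_section" to the elfdump and fgrep commands.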
Deleting or Extracting Sections of a Binary File
Deleting sections can be useful as part of size analysis to see file size differences with and without some sections. You can do this using mcs -d, specifying the sections to be deleted with -n, for example:
% cc -g -o hello hello.c
% cp hello hello-without-comment-and-debugline
% mcs -d -n .comment -n .debug_line hello-without-comment-and-debugline
% ls -l hello hello-without-comment-and-debugline
-rwxr-xr-x 1 mblatt ptg 5592 Oct 27 18:27 hello
-rwxr-xr-x 1 mblatt ptg 4872 Oct 27 18:28 hello-without-comment-and-debugline
The mcs command above overwrites the file with the .comment and .debug_line sections removed. It cannot be used to remove sections critical to correct execution of an executable binary. The mcs command generates an error message if you attempt to delete, for example, the .interp section.
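The size difference between the original and the reduced copy can be computed rather than read off the ls -l output. This is a small sketch, assuming a POSIX-style shell; bytes_saved is a hypothetical helper name, and the file names are the ones used in the example above.

```shell
# bytes_saved: hypothetical helper that prints how many bytes smaller
# the second file is than the first, using wc -c for the byte counts.
bytes_saved() {
    echo $(( $(wc -c < "$1") - $(wc -c < "$2") ))
}

# Only runs if both files from the example above exist in this directory
if [ -f hello ] && [ -f hello-without-comment-and-debugline ]; then
    bytes_saved hello hello-without-comment-and-debugline
fi
```

With the file sizes shown in the ls -l output above (5592 and 4872 bytes), the result would be 720 bytes.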
Sometimes, you might want to extract a section to look at it separately, without removing it from the binary. Sections can be extracted to a binary file using elfdump, for example:
% elfdump -N .text -w cat.text /bin/cat
The command above generates the file cat.text containing the .text section from /bin/cat.
Determining the Size of Sections
Binary size analysis requires detailed accounting for how the sizes of the sections add up to the byte size reported by ls -l.
You might be tempted to use the Size field from the greadelf or dump output to derive the size of each section. This is often correct, but not always. It is a lot safer to instead extract the Offset fields and subtract each section's offset from the offset of the section that follows it in the file. This is because Size sometimes records the amount of actual data, without including the space left unused between sections for alignment. If there is a notable difference between the section size total and the file size reported by ls -l, switching from using Size to using Offset subtraction can help.
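The offset subtraction can be sketched as a small shell function. This is an illustration under stated assumptions, not a definitive tool: section_footprints is a hypothetical name, the input format (name, type, offset, size, with hexadecimal values and no 0x prefix) is one chosen here for convenience, and a POSIX-style shell is assumed. Sections of type NOBITS are skipped because they occupy no file space; in Listing 1, the .bss offset (0x22a0) even lies past the offset of the following .comment section (0x229f). The last input row only supplies the end offset for the row before it.

```shell
# section_footprints: hypothetical helper, not a Solaris tool.
# For each section, prints the Size field next to the on-disk footprint
# obtained by subtracting its offset from the next section's offset.
section_footprints() {
    prev_name=""
    while read -r name type off size; do
        [ -n "$size" ] || continue        # skip blank or short lines
        if [ "$type" = "NOBITS" ]; then
            continue                      # NOBITS occupies no file space
        fi
        off_dec=$((0x$off))
        if [ -n "$prev_name" ]; then
            footprint=$((off_dec - prev_off))
            echo "$prev_name size=$prev_size footprint=$footprint padding=$((footprint - prev_size))"
        fi
        prev_name=$name
        prev_off=$off_dec
        prev_size=$((0x$size))
    done
}

# Name, type, offset, and size copied from Listing 1
section_footprints <<'EOF'
.rodata1 PROGBITS 001bc4 0001a9
.got     PROGBITS 002000 000004
.plt     PROGBITS 002004 000184
.dynamic DYNAMIC  002188 0000b8
.data    PROGBITS 002240 000048
.data1   PROGBITS 002288 000017
.bss     NOBITS   0022a0 008360
.comment PROGBITS 00229f 000025
EOF
```

For this sample, the .rodata1 row reports a size of 425 bytes but a footprint of 1084 bytes, that is, 659 bytes of padding before .got, which has an alignment of 8192 in Listing 1. The remaining rows show no padding.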
For Oracle Solaris, the Section Header Table itself is not listed as a section in the output of greadelf or dump, so its size is missing from the section totals. The table is generally around 2 KB, but it can be much larger for GNU g++ binaries, due to .linkonce sections. The size of this table can be estimated as follows:
% elfdump -e /bin/cat | grep e_shoff
e_shoff: 0x247c e_shentsize: 40 e_shnum: 23
Multiply e_shentsize and e_shnum to approximate the table size. In this case, the estimated size of the Section Header Table is 40 * 23 = 920 bytes.
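The multiplication can be scripted so that it works directly on the elfdump line shown above. This is a minimal sketch, assuming a POSIX-style shell and awk, and assuming the "label: value" layout of that line; shdr_table_size is a hypothetical helper name.

```shell
# shdr_table_size: hypothetical helper, not a Solaris tool.
# Reads the e_shoff line from elfdump -e on stdin and prints
# e_shentsize * e_shnum, the estimated Section Header Table size in bytes.
shdr_table_size() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "e_shentsize:") entsize = $(i + 1)
            if ($i == "e_shnum:")     num = $(i + 1)
        }
        print entsize * num
    }'
}

# Sample line copied from the elfdump output above; prints 920
echo "e_shoff: 0x247c e_shentsize: 40 e_shnum: 23" | shdr_table_size
```

On a live system, the function could presumably be appended to the pipeline above, after the grep e_shoff step.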
Usually, it is fine to ignore this table. However, if there is a large difference between the total of the section sizes and the size in bytes from ls -l, this could be the reason. The .linkonce sections are generated by the GNU g++ compiler for C++ in Oracle Solaris 10 9/10 and earlier.
Here's an example of what a .linkonce section looks like in dump -hv output from a GNU g++ build on Oracle Solaris 10:
[15] 1 6 0x13ea28 0x12ea28 0x408
.gnu.linkonce.t._ZNK22AutoDerivativeFunctionILi1EE20vector_gradient_listERKSt6ve
ctorI5PointILi1EESaIS3_EERS1_IS1_I6TensorILi1ELi1EESaIS9_EESaISB_EE
When viewed with greadelf, everything after the underscore is truncated:
[15] .gnu.linkonce.t._ PROGBITS 0013ea28 12ea28 000408 00 AX 0 0 4
The binary this output came from had over 7,000 sections whose names begin with .gnu.linkonce.t._ and are cut off at that point by greadelf. They all look identical when truncated, so you must use dump -hv to see the full names.
Generating Smaller Binary Files
It is possible to use Oracle Solaris Studio compiler flags to adjust the size of binaries. This is not recommended unless disk space is a critical concern.
Debug information can be one of the largest contributors to the size of compiled executables. When space is an issue, removing -g makes a large difference to file size for all languages, especially C++.
Note, however, that using the -g flag is highly recommended, because it helps with both debugging and performance analysis. For most programs, there is no measurable performance difference from adding -g. In the absence of space problems, it is recommended that -g be used all the time, including with optimized builds.
After removing -g, additional space reduction comes from adding the -xspace flag. When compiling with -O, adding -xspace removes some optimizations, namely those that use the most space for the least performance impact. This will significantly reduce the size of the .text section, while performance decreases only a few percent. Note, however, that .text is only one piece of the binary, so the overall file size might not go down much.
For further reductions, the following flags reduce space in sections other than .text:
-mc removes duplicates from the .comment section with version numbers of system header files. This makes a difference when there are many files that use #include to include the same system headers.
-xannotate=no prevents the generation of the .annotate section. The .annotate data helps with performance analysis, code coverage, and data race detection, but if the number one issue is binary size, you can manage without it.
Finally, you can strip the symbol table from the compiled binary.
Figure 1 shows the size reduction for a C program's binary file when these steps are applied. Removing -g roughly halves the binary size. The other steps together reduce the remaining size by about an additional third, so the final binary is around a third of the full size with -g.
Figure 1. Example of Binary File Size Reduction
For More Information
Here are some additional resources:
Download Oracle Solaris 11
Access the Oracle Solaris 11 Linker and Libraries Guide
Access the Oracle Solaris 10 Linker and Libraries Guide
Access the man pages for the following tools:
Access all Oracle Solaris 11 how-to guides
Learn more with Oracle Solaris 11 training and support
See the official Oracle Solaris blog
Check out The Observatory blog for Oracle Solaris tips and tricks
Revision 1.1, 1/23/2012