Embedding Binary Blobs With GCC

For a long time I've wanted to know how to embed binary blobs into executables. This would be most useful for files like Glade and UI Manager definitions, which are required for a given program to work at all but either cannot be embedded as a string literal (Glade) or can be but are annoying to embed that way (UI Manager). I finally asked the Interweb, and Daniel Jacobowitz replied with some pointers. It turns out that doing this is remarkably simple.

First, a caveat. This probably requires GNU ld, which may or may not be a deal breaker for many people.

To start, create a data file. Let's call it foo.txt, and put some text in it.

Hello, World!

Using ld this can be read in as a plain binary blob, and then written as a standard relocatable ELF object.

ld -r -b binary -o foo.o foo.txt

Now we have a standard ELF object with the data and some useful symbols defined. objdump will show you the contents.

$ objdump -x foo.o 
foo.o:     file format elf32-i386

Idx Name          Size      VMA       LMA       File off  Algn
  0 .data         0000000d  00000000  00000000  00000034  2**0
                  CONTENTS, ALLOC, LOAD, DATA
00000000 l    d  .data  00000000 .data
0000000d g       .data  00000000 _binary_foo_txt_end
0000000d g       *ABS*  00000000 _binary_foo_txt_size
00000000 g       .data  00000000 _binary_foo_txt_start

Here we see 13 bytes (0xd) of data in the .data section, and symbols marking its start, its end, and its size. The address of the start symbol is the address of the data, which is all we need to access it from a C program.

#include <stdio.h>
extern char _binary_foo_txt_start[];

int main (void) {
  puts (_binary_foo_txt_start);
  return 0;
}

Now if we compile this and link it against the generated object, we'll have a binary.

$ gcc -o test test.c foo.o
$ ./test
Hello, World!

Hooray! One small problem which alert people should have noticed: the string itself is in the .data section, which is read/write. For my use, I want it to be read-only data in the .rodata section so that it isn't copied for every instance of the application. As far as I know, this isn't possible with ld but objcopy will let us rename sections on the fly.

$ objcopy --rename-section .data=.rodata,alloc,load,readonly,data,contents foo.o foo.o
$ objdump  -h foo.o
  0 .rodata       0000000d  00000000  00000000  00000034  2**0

Excellent, problem solved. If you want to download this sample, I have a tarball. Many thanks to Daniel Jacobowitz for pointing out how to achieve this.

Update: note that any data embedded in the binary like this won't be terminated with a NUL byte. This is obvious in hindsight, but due to luck my example still worked. There might be a way of asking objcopy to append a 0 to the end of the data, but if not, always remember to use the start and end pointers or the size instead of just the start, or append a NUL yourself before converting to an ELF object.
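
Here's a minimal sketch of the "start and end pointers" option in C: treat the blob as a start/end pair, and only add a terminator when making a copy. The helper names blob_size and blob_to_cstring are made up for illustration and are not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Number of bytes between the start and end symbols of a blob. */
static size_t blob_size(const char *start, const char *end)
{
    assert(end >= start);
    return (size_t)(end - start);
}

/* Return a heap-allocated, NUL-terminated copy of the blob.
 * The caller frees the result; NULL is returned if malloc fails. */
static char *blob_to_cstring(const char *start, const char *end)
{
    size_t n = blob_size(start, end);
    char *s = malloc(n + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, start, n);
    s[n] = '\0';   /* the terminator the embedded data lacks */
    return s;
}
```

For the example above, that would mean declaring _binary_foo_txt_end the same way as _binary_foo_txt_start and calling blob_to_cstring (_binary_foo_txt_start, _binary_foo_txt_end) before handing the result to puts.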

NP: (), Sigur Rós

15:50 Friday, 13 Jul 2007 [#] [computers] (20 comments)

Posted by Murray Cumming at Fri Jul 13 16:29:54 2007:
Why can't .glade XML files be included as a string literal?
Posted by Steve at Fri Jul 13 16:31:47 2007:
That is very sexy, thanks for sharing.
Posted by Ross at Fri Jul 13 16:32:39 2007:
Murray: I meant that you can't get Glade to edit glade data embedded in a C file, you need to have it separate and then use some magic tool to process the glade file into a C string literal (handling the escaping) which is then included in the build.

Why do that, when it's less work to just embed the data into the executable directly, and skip the C parsing phase entirely?
Posted by bartman at Fri Jul 13 16:38:10 2007:
Awesome!  Thanks.
Posted by Jerome Haltom at Fri Jul 13 17:05:53 2007:
Neat and all, but why do this? Is not share/ designed the way it is for a reason? I like being able to easily browse the files involved in a program... it makes debugging and maintenance way easier. I don't like opaque giant binaries.
Posted by Ross at Fri Jul 13 17:27:19 2007:
Why not do this?  How many people actually edit the GtkUIManager file for an application?

If someone wants to work on an application then they can get the source like everyone else manages to.  The bonus is that it makes having to hunt around the disk for files a thing of the past, as the latest file is embedded in the binary: always useful when the application is hard-coded to locate a file in the install directory, but you are testing changes locally.

Basically there are no serious arguments either way.  I find this a useful tool and I'm sure I'll use it in an application at some point in time.
Posted by Jerome Haltom at Fri Jul 13 18:38:34 2007:
You're right. There are no serious arguments either way. I do however have a lot of experience with Windows and OS X and such where most apps are written like this, and I quite simply "like" being able to pop open a glade file that just happens to be sitting on my disk and check it out. Or see the application pop it open in strace. It is nothing more than simply making it "easier". Everything is fully visible and expanded by default. I don't have to hunt through the source, or whatever.
Posted by Murray Cumming at Fri Jul 13 18:54:16 2007:
I think I'd rather have my build system generate a header with the C string literal for me to include rather than use this technique. That would at least be more portable.

But I'm sure this is likely to be useful in some situation.
Posted by Zack Weinberg at Fri Jul 13 21:53:47 2007:
You might find GNU as's .incbin directive more useful for this purpose - you can control the symbol name, section, and add a NUL terminator.  You do have to write little wrapper .s files though.  For example

$ echo hello world > demo.txt
$ cat > demo.s
  .section ".rodata"
  .globl demo
  .type demo, @object
demo:
  .incbin "demo.txt"
  .byte 0
  .size demo, .-demo
$ as demo.s -o demo.o
$ nm demo.o
00000000 R demo
$ strings demo.o
hello world
Posted by napsy at Sat Jul 14 01:21:12 2007:
Been wondering if this was possible. Great post, thanx.
Posted by Philip Van Hoof at Sat Jul 14 11:40:08 2007:
Although glade's XML files might be an interesting application for this, for me this sounds even more useful for adding generated introspection data to GObject based libs.
Posted by Daniel at Sun Feb 1 23:41:43 2009:
Why doing this...

there are environments (BIOSes, rather minimal embedded OSes, ...) where you don't have a filesystem, for example
Posted by MikeW at Wed Feb 25 17:52:43 2009:
FYI - I found that the way to obtain the value of "_binary_foo_txt_size" (an absolute symbol) from within the program is to take its address ...
Posted by Daniel Svensson at Tue Sep 1 09:34:26 2009:
The asm example a few comments up doesn't seem to work on ARM, no idea why :/ Anyone know?
Posted by Sam Morris at Tue Jan 12 11:02:29 2010:
Late to the party, but... I found that doing this does not guarantee that the data gets the correct alignment. So sometimes accessing it will make it appear as garbage! Be careful.
Posted by Jon Mayo at Fri Jul 2 22:47:33 2010:
A hack to make use of _size:
extern const void _binary_foo_txt_size;
const size_t _binary_foo_txt_len=(size_t)&_binary_foo_txt_size;

I prefer:
.section .rodata
.globl foo
foo:
.incbin "foo.txt"
.byte 0
.equ foo_size, . - foo
.align 4 /* TODO: adjust for 64-bit platforms */
.globl foo_len
foo_len:
  .int foo_size

from C:
extern const char foo[];
extern const int foo_len;
Posted by John Kamp at Thu Oct 20 10:22:26 2011:
Is there a way to change the symbol name? I want to load many files into an executable and want them to be uniquely named. Any hints welcome, ideally it would be via the ld command (due to how our build system works).

I have read the ld manual, but nothing strikes me as obvious.
Posted by Conor at Wed Jan 11 16:00:07 2012:
Last question - yes: use --redefine-sym in objcopy.

objcopy -I binary -O elf32-i386 \
  --redefine-sym _binary_foo_txt_start=_README \
  foo.txt foo.o
Posted by Conor at Wed Jan 11 16:03:16 2012:
I'll add that this concerns me:

Idx Name  Size  VMA  LMA  File off  Algn
  0 .data  0000000d  00000000  00000000  00000034  2**0

Does that mean that the alignment for the .data section is set as "does not matter"? That could be a source of problems on an ARM if you accessed the data pointer as int * rather than char *.
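
A minimal sketch of one way to sidestep that concern, assuming the blob holds 32-bit values in the host's byte order: copy the bytes into a properly aligned variable with memcpy instead of casting the data pointer to int *. The helper name load_u32 is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value (host byte order) from an address that may not
 * be 4-byte aligned. memcpy makes no alignment assumption about p,
 * unlike dereferencing a cast pointer such as *(const uint32_t *)p. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Called as load_u32 (_binary_foo_txt_start + offset), this does not depend on the alignment the section ends up with.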



