link garbage collection

Link-time garbage collection can be enabled with:

CFLAGS+=-fdata-sections -fno-common
CFLAGS+=-ffunction-sections
LDOPTS+=-Wl,--gc-sections -Wl,--print-gc-sections -Wl,--entry=entry

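The compiler options (CFLAGS) make gcc emit each function in its own section (-ffunction-sections) and each data object in its own section (-fdata-sections; -fno-common makes this also apply to uninitialized globals, which would otherwise be common symbols). The linker options (LDOPTS) make the linker keep only the code/data reachable from the entry function (here the symbol entry), discard all other sections, and print the discarded ones (--print-gc-sections).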
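The options can be tried on a small file. A minimal sketch: the entry symbol is named entry to match --entry=entry, while used_helper, unused_helper and counter are hypothetical names chosen for illustration.

```c
/* gc_demo.c: hypothetical example for link-time garbage collection.
 * With -ffunction-sections each function below gets its own section
 * (.text.entry, .text.used_helper, .text.unused_helper).
 * With -Wl,--gc-sections -Wl,--entry=entry, only sections reachable
 * from entry() are kept; .text.unused_helper is discarded. */

/* With -fdata-sections this object lands in its own .data.counter section. */
int counter = 1;

/* Not referenced from entry(): a candidate for removal by the linker.
 * It is not static, so the compiler itself cannot drop it. */
int unused_helper(int x)
{
    return x * 42;
}

/* Referenced from entry(): kept by the linker. */
int used_helper(int x)
{
    return x + counter;
}

/* Root of the reachability graph used by --gc-sections. */
int entry(void)
{
    return used_helper(41);
}
```

Compiling this with the CFLAGS above and running readelf -S on the object should list one .text.* section per function plus .data.counter; at link time, --print-gc-sections should then report .text.unused_helper as removed.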

These options can be a win on large projects, but they can also add overhead to the generated code.

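By default, gcc puts all the code of a file in one section (and likewise for its data), and the linker cannot move things around inside such a section. gcc can therefore rely on the relative placement of the functions and variables of a file.

With one section per function or object, gcc no longer knows how the linker will lay out the sections, which can cause overhead.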

fdata-sections overhead

For example, it increases code size when accessing global data:

int bar;
int titi;
int tata=1;
int foo=2;

int toto(void) { return foo+tata+titi+bar; }

bar and titi are in bss; tata and foo are in data.

arm-none-eabi-gcc -Os -c

00000000 <toto>:
   0:	e59f3020 	ldr	r3, [pc, #32]	; 28 <toto+0x28>
   4:	e8930005 	ldm	r3, {r0, r2}
   8:	e0800002 	add	r0, r0, r2
   c:	e59f3018 	ldr	r3, [pc, #24]	; 2c <toto+0x2c>
  10:	e5933000 	ldr	r3, [r3]
  14:	e0800003 	add	r0, r0, r3
  18:	e59f3010 	ldr	r3, [pc, #16]	; 30 <toto+0x30>
  1c:	e5933000 	ldr	r3, [r3]
  20:	e0800003 	add	r0, r0, r3
  24:	e12fff1e 	bx	lr
  28:
  2c:
  30:

arm-none-eabi-gcc -Os -fno-common -c

00000000 <toto>:
   0:	e59f3018 	ldr	r3, [pc, #24]	; 20 <toto+0x20>
   4:	e8930005 	ldm	r3, {r0, r2}
   8:	e0800002 	add	r0, r0, r2
   c:	e59f3010 	ldr	r3, [pc, #16]	; 24 <toto+0x24>
  10:	e893000c 	ldm	r3, {r2, r3}
  14:	e0800002 	add	r0, r0, r2
  18:	e0800003 	add	r0, r0, r3
  1c:	e12fff1e 	bx	lr
  20:
  24:

arm-none-eabi-gcc -Os -fno-common -fdata-sections -c

00000000 <toto>:
   0:	e59f3028 	ldr	r3, [pc, #40]	; 30 <toto+0x30>
   4:	e5930000 	ldr	r0, [r3]
   8:	e59f3024 	ldr	r3, [pc, #36]	; 34 <toto+0x34>
   c:	e5933000 	ldr	r3, [r3]
  10:	e0800003 	add	r0, r0, r3
  14:	e59f301c 	ldr	r3, [pc, #28]	; 38 <toto+0x38>
  18:	e5933000 	ldr	r3, [r3]
  1c:	e0800003 	add	r0, r0, r3
  20:	e59f3014 	ldr	r3, [pc, #20]	; 3c <toto+0x3c>
  24:	e5933000 	ldr	r3, [r3]
  28:	e0800003 	add	r0, r0, r3
  2c:	e12fff1e 	bx	lr
  30:
  34:
  38:
  3c:

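Note that -fno-common also helps generate better code for bss data: with it, bar and titi are loaded with a single ldm from one base address, as tata and foo already were. With -fdata-sections, each of the four variables needs its own literal-pool entry and its own load.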
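One possible workaround, assuming the variables are always used together (an assumption, not something the listings above measure): group them into a single object. Under -fdata-sections the struct gets a single section, so gcc should be able to address every field from one base address, as in the listings without -fdata-sections. The names struct toto_globals and toto_g are hypothetical.

```c
/* Hypothetical regrouping of tata, foo, bar and titi into one object.
 * Under -fdata-sections this gives one section (.data.toto_g) instead
 * of four, so a single literal-pool entry can serve all four loads. */
struct toto_globals {
    int tata;   /* was: int tata = 1; */
    int foo;    /* was: int foo = 2;  */
    int bar;    /* was: int bar;      (bss) */
    int titi;   /* was: int titi;     (bss) */
};

/* Nonzero initializer: the whole struct, including bar and titi, is in .data. */
struct toto_globals toto_g = { .tata = 1, .foo = 2 };

int toto(void)
{
    /* All four accesses are fixed offsets from &toto_g. */
    return toto_g.foo + toto_g.tata + toto_g.titi + toto_g.bar;
}
```

The trade-off: bar and titi now occupy space in the initialized data image instead of bss, and the linker can no longer discard an unused field on its own.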

optimisation

  • 2-pass build: detect unused code/data in a first pass, then build an optimised version.
  • have the linker patch the generated code?

ffunction-sections overhead

The toolchain sometimes needs to use trampolines.

For example, armv4t has no blx instruction. CodeSourcery arm-2011.03 (ELF target) generates code like:

000c7848 <conf_load_defaults>:
   c7848:       b538            push    {r3, r4, r5, lr}
[]
   c7870:       f000 f812       bl      c7898 <memcpy_from_thumb>
[]
   c7888:       bc01            pop     {r0}
   c788a:       4700            bx      r0

000c7898 <memcpy_from_thumb>:
   c7898:       4778            bx      pc
   c789a:       46c0            nop                     ; (mov r8, r8)
   c789c:       eaff630f        b       a04e0 <memcpy>

With -ffunction-sections, there are lots of memcpy_from_thumb copies in different sections, and the linker does not merge them.

In fact, gcc generates:

[]
   6:   f7ff fffe       bl      0 <memcpy>
[]
and it is the linker that patches the code to go through the stub!

Note: there were lots of memcpy_from_thumb copies when .text* was not merged in the linker script.
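To reproduce the listings, a Thumb function that calls memcpy is enough. A minimal sketch, assuming a command such as arm-none-eabi-gcc -Os -mthumb -march=armv4t -ffunction-sections (then -march=armv5t for comparison) and a libc whose memcpy is ARM code; struct conf, conf_defaults and conf_load_defaults_demo are hypothetical names, not the code behind the listings.

```c
#include <string.h>

/* Hypothetical configuration record. The 64-byte name field makes the
 * copy large enough that gcc -Os is likely to call memcpy() instead of
 * inlining it (this is compiler-dependent). */
struct conf {
    int  speed;
    int  mode;
    int  loaded;
    char name[64];
};

static const struct conf conf_defaults = { 115200, 1, 0, "default" };

/* Stand-in for conf_load_defaults(): compiled as Thumb for armv4t, gcc
 * emits a plain bl to memcpy and the linker routes it through an
 * interworking stub such as memcpy_from_thumb when memcpy is ARM code. */
void conf_load_defaults_demo(struct conf *c)
{
    memcpy(c, &conf_defaults, sizeof *c);
    c->loaded = 1;  /* work after the call, so it is not turned into a tail call */
}
```

Disassembling the linked image with arm-none-eabi-objdump -d should then show either a bl to a stub (armv4t) or a direct blx to memcpy (armv5t).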

armv5t

Using armv5t, we get a direct blx to memcpy, with no stub:

000c538c <conf_load_defaults>:
   c538c:       b538            push    {r3, r4, r5, lr}
[]
   c53b4:       f7da eea4       blx     a0100 <memcpy>
[]
   c53c8:       bd38            pop     {r3, r4, r5, pc}

other optimisations

build one big source file

make everything static by default:

  • -fwhole-program

aggregate all source files into one:

  • -combine

This eats lots of memory during compilation.
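-fwhole-program makes gcc treat every function and variable except main as static. In a build using --entry=entry, the entry function is referenced only by the linker, so it needs GCC's externally_visible attribute to stay visible; otherwise it may be optimised away. A minimal sketch; scale and KEEP_VISIBLE are hypothetical names, and the macro guard only avoids an unknown-attribute warning on compilers other than GCC.

```c
/* Hypothetical single translation unit built with -fwhole-program. */
#if defined(__GNUC__) && !defined(__clang__)
#define KEEP_VISIBLE __attribute__((externally_visible))
#else
#define KEEP_VISIBLE
#endif

/* Becomes local under -fwhole-program: can be inlined, then removed. */
int scale(int x)
{
    return 3 * x;
}

/* Referenced only through --entry=entry, so it must stay externally
 * visible or gcc may consider it unused and drop it. */
KEEP_VISIBLE int entry(void)
{
    return scale(14);
}
```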

LTO

Extra notes

script to compare code

To compare function sizes between two binaries, we can use:

readelf -W -s prog1.elf | grep FUNC | sort -k8 | sort -n -s -k 3,3 | awk '{ print $3" "$8 }' > dump1
readelf -W -s prog2.elf | grep FUNC | sort -k8 | sort -n -s -k 3,3 | awk '{ print $3" "$8 }' > dump2
diff -u dump1 dump2

Thumb interworking

http://wiki.debian.org/ArmEabiPort#Choice_of_minimum_CPU

Instructions that are safe for interworking:

  1. mov pc, lr: since armv7
  2. bx lr: since armv4t
  3. ldm/ldr: since armv5t
  4. blx: since armv5t
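The list above can be kept as a small lookup table, for example for a build-time check. A minimal sketch; the enum and function names are hypothetical, and it encodes only the four cases listed.

```c
#include <stdbool.h>

/* Architectures in increasing order, so >= means "this version or later". */
enum arm_arch { ARCH_V4T, ARCH_V5T, ARCH_V7 };

enum branch_insn {
    INSN_MOV_PC_LR,   /* mov pc, lr */
    INSN_BX_LR,       /* bx lr */
    INSN_LDM_LDR_PC,  /* ldm/ldr loading pc */
    INSN_BLX,         /* blx */
};

/* First architecture in which each instruction interworks (from the list). */
static const enum arm_arch first_interworking_arch[] = {
    [INSN_MOV_PC_LR]  = ARCH_V7,
    [INSN_BX_LR]      = ARCH_V4T,
    [INSN_LDM_LDR_PC] = ARCH_V5T,
    [INSN_BLX]        = ARCH_V5T,
};

/* True if insn is safe for interworking when targeting arch. */
bool interworks(enum branch_insn insn, enum arm_arch arch)
{
    return arch >= first_interworking_arch[insn];
}
```

The >= comparison relies on enum arm_arch being ordered; adding another version (armv6, say) means inserting it at the right position.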

It is a shame that ARM did not add Thumb interworking support to normal branch operations from the start.
