Function calls in the C programming language make heavy use of the stack, also called the call stack. As functions are called, they build up the so-called stack of frames: each function call creates a frame, and these frames are allocated on the stack. A stack frame holds memory for local variables and intermediate values. It also contains the caller’s saved frame pointer and the return address (the saved link register) that the program counter resumes from once the frame is popped off the stack. We will disassemble C function calls to understand the stack of frames in ARM assembly.
Animation of C code being Executed on an ARM Processor
The animation below shows the execution of the C function add(1,2). Before the add function is called, the caller has stored the function arguments: the value 1 into r0 and the value 2 into r1. When the add function returns, it will have the value 3 stored in the r0 register, overwriting the first argument the function was called with.
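To make the register convention concrete, here is a minimal C model. It is a sketch, not real hardware access: the arg_regs struct, add_callee and call_add are hypothetical names standing in for the registers, the callee and the caller. The caller loads the arguments into r0 and r1, and the callee leaves its result in r0, overwriting the first argument, just as add(1,2) does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the four argument registers. */
typedef struct {
    uint32_t r0, r1, r2, r3;
} arg_regs;

/* Callee: reads its arguments from r0 and r1 and leaves the result
   in r0, overwriting the first argument. */
static void add_callee(arg_regs *regs)
{
    regs->r0 = regs->r0 + regs->r1;
}

/* Caller: loads the arguments, "calls" add, reads the result from r0. */
static uint32_t call_add(uint32_t a, uint32_t b)
{
    arg_regs regs = {0};
    regs.r0 = a;        /* first argument  */
    regs.r1 = b;        /* second argument */
    add_callee(&regs);
    return regs.r0;     /* return value    */
}
```

Calling call_add(1, 2) reads 3 back from the modeled r0.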
In three columns we have the C code, the corresponding ARM assembly, and the stack alongside one another. As the C code executes, we can see the corresponding assembly code; note that there are typically multiple instructions for one line of C code. The rightmost diagram shows the stack and what is pushed onto it. Watch the diagram for a couple of repetitions; then we’ll get into the theory and explain some details the diagram leaves out.
Details not shown in the Animation
These are some deep details. Feel free to skip to the other example below for another explanation.
As mentioned above, the arguments of the function are in registers r0 and r1 before the function is called.
The frame is 3 words long. These 3 words hold values for fp, lr and the local variable int c. Depending on the optimization level and what the function does, the frame could be bigger, and the function arguments int a and int b could also be stored in it. Notice that the r3 register, which holds the result of c = a + b, is stored into the frame.
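The 3-word frame can be pictured as a C struct, lowest address first. This is a sketch under an assumption: the slot order follows the gcc -O0 pattern in the disassembly later in this post (push {fp, lr}, with locals below it), which the animation may not match exactly. The names add_frame and slot_addr are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical picture of add()'s 3-word frame, lowest address first.
   Assumes the gcc -O0 layout: push {fp, lr}, then locals below. */
struct add_frame {
    uint32_t c;         /* local int c, stored from r3 (c = a + b)   */
    uint32_t saved_fp;  /* caller's fp (r11), saved by push {fp, lr} */
    uint32_t saved_lr;  /* return address into add's caller          */
};

/* The frame is exactly three 32-bit words (12 bytes). */
static_assert(sizeof(struct add_frame) == 3 * sizeof(uint32_t),
              "add's frame should be 3 words");

/* Address of a slot, given sp (the lowest word of the frame) and the
   slot's offset within struct add_frame. */
static uint32_t slot_addr(uint32_t sp, size_t offset)
{
    return sp + (uint32_t)offset;
}
```

Under this layout the saved lr sits at sp + 8, and c sits 8 bytes below it, the same fp-relative offset ([r11, #-8]) that the -O0 code uses for int c.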
We make another function call, to some_func. A call requires a bl, and bl overwrites lr, so we first have to push lr onto the stack so that add can still return to its own caller. When the bl some_func instruction at 0x00010428 executes, lr is set to 0x0001042c, the address of the next instruction; this is where execution continues when some_func ends with bx lr. If add made no other function call, there would be no need to push lr onto the stack, and this is in fact what gcc does when there isn’t another function call.
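A toy model of bl and bx lr shows why lr has to be saved. This is a sketch: struct cpu, the some_func address (0x00010500) and the caller return address used in the usage note (0x00010600) are hypothetical; only the step from 0x00010428 to 0x0001042c comes from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Toy CPU state: only the two registers that bl and bx lr touch. */
struct cpu {
    uint32_t pc;  /* address of the instruction being executed */
    uint32_t lr;  /* link register (r14)                        */
};

/* bl target: lr = address of the next instruction (pc + 4 in ARM
   state), then branch. Whatever lr held before is overwritten. */
static void bl(struct cpu *cpu, uint32_t target)
{
    cpu->lr = cpu->pc + 4;
    cpu->pc = target;
}

/* bx lr: continue executing at the address held in lr. */
static void bx_lr(struct cpu *cpu)
{
    cpu->pc = cpu->lr;
}

/* Hypothetical address of some_func; any value works in this model. */
#define SOME_FUNC_ADDR 0x00010500u

/* add() is entered with lr = caller_ret and reaches bl some_func at
   0x00010428. Returns the address add finally returns to. */
static uint32_t add_return_target(uint32_t caller_ret, int save_lr)
{
    struct cpu st = { .pc = 0x00010428u, .lr = caller_ret };
    uint32_t stack_slot = 0;

    if (save_lr)
        stack_slot = st.lr;    /* stands in for push {fp, lr}        */

    bl(&st, SOME_FUNC_ADDR);   /* lr becomes 0x0001042c              */
    bx_lr(&st);                /* some_func returns to 0x0001042c    */

    if (save_lr)
        st.lr = stack_slot;    /* restore the saved return address   */

    /* add returns; gcc uses pop {fp, pc}, which has the same effect
       as restoring lr and executing bx lr. */
    bx_lr(&st);
    return st.pc;
}
```

With the save, add_return_target(0x00010600, 1) returns to the caller at 0x00010600; without it, add_return_target(0x00010600, 0) lands back at 0x0001042c inside add itself, because bl some_func replaced the only copy of the return address.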
Because of the call to some_func, the value of r3 is stored in the frame in case some_func overwrites r3, which a called function is free to do. The frame gives us a way to protect what is local to our function when other functions are called, since values can be kept outside of registers. After the call, the saved value is loaded from the frame into r0, the register that holds the function’s return value.
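The same idea for r3, as a sketch: a value that must survive a call is parked in the frame. The some_func stand-in below deliberately clobbers all of r0-r3 (an assumption modeling the worst case), and add_without_frame is a hypothetical variant showing what goes wrong without the frame slot.

```c
#include <assert.h>
#include <stdint.h>

/* The scratch registers a called function may overwrite. */
struct regs { uint32_t r0, r1, r2, r3; };

/* Stand-in for some_func: worst case, every one of r0-r3 is overwritten. */
static void some_func(struct regs *r)
{
    r->r0 = r->r1 = r->r2 = r->r3 = 0xdeadbeefu;
}

/* add(a, b): c = a + b lands in r3, is stored in the frame, survives
   the call, and is then loaded into r0. */
static uint32_t add(uint32_t a, uint32_t b)
{
    struct regs r = { .r0 = a, .r1 = b };
    uint32_t frame_c;          /* frame slot for int c              */

    r.r3 = r.r0 + r.r1;        /* c = a + b, computed into r3       */
    frame_c = r.r3;            /* store r3 into the frame           */
    some_func(&r);             /* bl some_func: r0-r3 now garbage   */
    r.r0 = frame_c;            /* load c from the frame into r0     */
    return r.r0;               /* return value travels in r0        */
}

/* Hypothetical variant without the frame slot: the result is lost. */
static uint32_t add_without_frame(uint32_t a, uint32_t b)
{
    struct regs r = { .r0 = a, .r1 = b };

    r.r3 = r.r0 + r.r1;
    some_func(&r);
    r.r0 = r.r3;               /* r3 was clobbered by the call      */
    return r.r0;
}
```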
Likewise, the call to some_func creates another frame on the stack. That frame is popped off when some_func returns, leaving the 3-word frame that add put on the stack.
Before the animation starts, the caller of add executed a bl add instruction, which stored in lr the address of the instruction right after the call add(1,2).
What you need to Know First
To understand how the stack of frames works, the following knowledge is required.
- Call Stack: The call stack, generally just called the stack, stores information about the active function calls in a program. The stack starts at high memory and grows toward lower addresses.
- ARM push/pop Instructions: These instructions push registers onto, and pop registers off, a full descending stack. "Full" means sp points at the last item pushed; "descending" means the stack grows toward lower addresses.
- The sp register: The sp register is the stack pointer; it holds the address of the top of the stack. A push decrements the stack pointer by one word (4 bytes on a 32-bit ARM machine) for each register pushed and stores the value where sp now points. The pop instruction loads values from the stack into registers and increments the stack pointer.
- The fp register: The frame pointer register, r11, marks the top of the current function’s frame. In the gcc output below it is set one word below the value sp had when the function was entered. From the value of fp down to the value of sp is the “frame” that is allocated for the function call.
- The lr register: The link register stores the address of the instruction for the pc to execute after the function call. When bl is used to call a function, lr is set to the address of the instruction after the bl, so once the function returns, the pc can pick up directly after the call.
- The bl/bx instructions: The bl instruction places the return address in lr and sets the pc to the address of the subroutine. The bx lr instruction sets the pc to the value of lr and continues executing from there.
- Addressing Modes: Familiarity with the offset, pre-indexed, and post-indexed addressing modes is generally essential; I did the math in the comments below so you can connect the dots.
- How registers correspond to Function Calls: The first four arguments to a function call are passed in registers r0-r3, and the return value is placed in r0. There are calling conventions for Arm which we won’t discuss here.
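The push/pop behavior can be sketched with a toy memory array in C. It assumes a made-up 64-word memory at address 0x1000. push is modeled as a pre-indexed store with writeback (str reg, [sp, #-4]!) and pop as a post-indexed load (ldr reg, [sp], #4), the same forms gdb prints next to push {r11} and pop {r11} in the disassembly of three below. A multi-register push {r11, lr} is modeled as two single pushes, which gives the same memory layout: the lowest-numbered register ends up at the lowest address.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit memory: 64 words starting at a made-up base address. */
#define MEM_BASE  0x1000u
#define MEM_WORDS 64u

static uint32_t mem[MEM_WORDS];
static uint32_t sp = MEM_BASE + MEM_WORDS * 4; /* empty stack: one past the top */

static uint32_t *word_at(uint32_t addr)
{
    assert(addr >= MEM_BASE && addr < MEM_BASE + MEM_WORDS * 4 && addr % 4 == 0);
    return &mem[(addr - MEM_BASE) / 4];
}

/* push {reg} == str reg, [sp, #-4]!  (pre-indexed, writeback):
   decrement sp first, then store where sp now points. */
static void push(uint32_t value)
{
    sp -= 4;
    *word_at(sp) = value;
}

/* pop {reg} == ldr reg, [sp], #4  (post-indexed):
   load from sp first, then increment sp. */
static uint32_t pop(void)
{
    uint32_t value = *word_at(sp);
    sp += 4;
    return value;
}

/* push {r11, lr}: r11 (lower-numbered) goes to the lower address,
   so push lr first, then r11. */
static void push_fp_lr(uint32_t r11, uint32_t lr)
{
    push(lr);
    push(r11);
}

/* pop {r11, pc}: r11 from the lower address, pc from the word above. */
static void pop_fp_pc(uint32_t *r11, uint32_t *pc)
{
    *r11 = pop();
    *pc = pop();
}
```

After push_fp_lr, sp has dropped by 8 bytes with the saved r11 at sp and the saved lr at sp + 4; pop_fp_pc reads them back in the same order and returns sp to where it started.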
Explanation of Another Example in C
Let’s look at a full picture with the following example:
```c
int one(int, int);
int two(int, int);
int three(int, int);

int
main(int argc, char *argv[])
{
    int ia, ib, ic;

    ia = 1;
    ib = 2;
    ic = one(ia, ib);
    return ic;
}

int
one(int a, int b)
{
    int c;

    c = two(a,b);
    return c;
}

int
two(int a, int b)
{
    int c;

    c = three(a,b);
    return c;
}

int
three(int a, int b)
{
    int c;

    c = a+b;
    return c;
}
```
Description of the Stack of Frames
Let’s describe how the stack of frames will look:
- Four frames will be allocated, one each for the functions main, one, two and three.
- When main is called, it will have the values argc and argv stored in r0 and r1.
- When main completes, it will have the value of ic in register r0.
- The frame for main will allocate space for at least fp, lr, int ia, int ib and int ic. In the -O0 disassembly below it actually takes 8 words (32 bytes), because it also stores copies of argc and argv and leaves one word unused.
- The functions one, two, and three will have the values int a and int b stored in r0 and r1.
- The functions one, two, and three will have their return value in register r0.
- The function three will not need to preserve lr, since it doesn’t call any other functions.
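The frame sizes follow from simple arithmetic on each prologue. As a sketch, the C below encodes each function’s prologue as read from the -O0 disassembly in the next section (registers pushed, plus the sub sp, sp, #imm immediate) and sums the bytes; the struct and function names are hypothetical.

```c
#include <assert.h>

/* Prologue shape of a function in the -O0 disassembly:
   push {...} of n registers, then sub sp, sp, #imm. */
struct prologue {
    const char *name;
    unsigned pushed_regs;  /* registers in the push list             */
    unsigned sub_imm;      /* immediate of sub sp, sp, #imm, in bytes */
};

/* Values read off the disassembly of main, one, two and three. */
static const struct prologue prologues[] = {
    { "main",  2, 24 },  /* push {r11, lr}; sub sp, sp, #24           */
    { "one",   2, 16 },  /* push {r11, lr}; sub sp, sp, #16           */
    { "two",   2, 16 },  /* push {r11, lr}; sub sp, sp, #16           */
    { "three", 1, 20 },  /* push {r11}; sub sp, sp, #20 (leaf, no lr) */
};

/* Four frames: one each for main, one, two and three. */
static_assert(sizeof prologues / sizeof prologues[0] == 4, "four frames");

/* Bytes a function moves sp down by on entry:
   4 per pushed register plus the sub immediate. */
static unsigned frame_bytes(const struct prologue *p)
{
    return 4u * p->pushed_regs + p->sub_imm;
}

/* Total stack in use once all four frames are live (inside three). */
static unsigned stack_depth_bytes(void)
{
    unsigned total = 0;
    for (unsigned i = 0; i < sizeof prologues / sizeof prologues[0]; i++)
        total += frame_bytes(&prologues[i]);
    return total;
}
```

frame_bytes gives 32 bytes (8 words) for main and 24 bytes for each of one, two and three, and stack_depth_bytes gives 104. That matches the distance from main’s sp before its push (0xbefff4d8 + 32 = 0xbefff4f8) down to sp = 0xbefff490 inside three. The disassembly comments measure fp - sp instead, which is one word less than these totals because fp sits one word below the sp the function was entered with.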
Disassembly of the Example
We can disassemble this example with gdb to see the instructions. I find the disassemble command in gdb much better than looking at the .s file from gcc. This is compiled with CFLAGS=-O0 -g. You’ll notice that with -O0 the code can definitely be optimized. This is especially visible where arguments are stored into the frame and loaded straight back without being modified.
I’ve added numerous comments to show what lr, fp and sp are doing. You’ll probably need a calculator. The best way to follow along is to compile the example and run it in gdb. You can then inspect memory with, for example, p/x *(0xbefff4d8) and x/20w 0xbefff4d8. The registers can be viewed with info registers.
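For the calculator work, two small helpers may be useful. This is a sketch with hypothetical function names: ea_offset computes the address of an offset-mode access such as [r11, #-24], and gdb_comment_to_offset decodes gdb’s trailing comments such as ; 0xffffffe8, which are the negative offset printed as an unsigned 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Offset addressing such as [r11, #-24]: base register plus a signed
   immediate, no writeback. Wraps modulo 2^32 like the CPU. */
static uint32_t ea_offset(uint32_t base, int32_t imm)
{
    return base + (uint32_t)imm;
}

/* gdb's trailing comments such as "; 0xffffffe8" show a negative
   offset as an unsigned 32-bit number; convert it back to signed. */
static int32_t gdb_comment_to_offset(uint32_t shown)
{
    int64_t v = (int64_t)shown;
    if (shown > (uint32_t)INT32_MAX)
        v -= INT64_C(4294967296);  /* subtract 2^32 */
    return (int32_t)v;
}
```

With main’s fp of 0xbefff4f4 (the value one’s final pop comment reports), [r11, #-24] is 0xbefff4dc, where argc is stored, and [r11, #-28] is 0xbefff4d8, where argv is stored. That is also sp after main’s prologue, so x/20w 0xbefff4d8 starts at the lowest word of main’s frame.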
Here is the disassembly:
```
(gdb) disassemble main
Dump of assembler code for function main:
0x000103d0 <+0>: push {r11, lr} ; lr=0xbfe84718 r11 at lowest address
0x000103d4 <+4>: add r11, sp, #4 ; r11=fp=sp+4=0xbefff4f4
0x000103d8 <+8>: sub sp, sp, #24 ; sp=0xbefff4d8, frame (fp-sp) is 28=24+4 bytes
0x000103dc <+12>: str r0, [r11, #-24] ; 0xffffffe8
0x000103e0 <+16>: str r1, [r11, #-28] ; 0xffffffe4
0x000103e4 <+20>: mov r3, #1
0x000103e8 <+24>: str r3, [r11, #-8]
0x000103ec <+28>: mov r3, #2
0x000103f0 <+32>: str r3, [r11, #-12]
0x000103f4 <+36>: ldr r1, [r11, #-12]
0x000103f8 <+40>: ldr r0, [r11, #-8]
0x000103fc <+44>: bl 0x10414 <one> ; here the lr will be set to 0x00010400
0x00010400 <+48>: str r0, [r11, #-16] ; r0 has the return value from function one
0x00010404 <+52>: ldr r3, [r11, #-16]
0x00010408 <+56>: mov r0, r3 ; r0 will return with the value of int ic
0x0001040c <+60>: sub sp, r11, #4 ; sp=fp-4, the slot holding the saved r11
0x00010410 <+64>: pop {r11, pc} ; pc will be restored to 0xbfe84718
End of assembler dump.
(gdb) disassemble one
Dump of assembler code for function one:
0x00010414 <+0>: push {r11, lr} ; lr=0x00010400, saves main's fp 0xbefff4f4, sp=0xbefff4d0
0x00010418 <+4>: add r11, sp, #4 ; r11=fp=0xbefff4d4
0x0001041c <+8>: sub sp, sp, #16 ; sp=0xbefff4c0, frame (fp-sp) is 20=16+4 bytes
0x00010420 <+12>: str r0, [r11, #-16]
0x00010424 <+16>: str r1, [r11, #-20] ; 0xffffffec
0x00010428 <+20>: ldr r1, [r11, #-20] ; 0xffffffec
0x0001042c <+24>: ldr r0, [r11, #-16]
0x00010430 <+28>: bl 0x10448 <two> ; lr will be 0x00010434
0x00010434 <+32>: str r0, [r11, #-8]
0x00010438 <+36>: ldr r3, [r11, #-8]
0x0001043c <+40>: mov r0, r3
0x00010440 <+44>: sub sp, r11, #4 ; sp=fp-4, the slot holding the saved r11
0x00010444 <+48>: pop {r11, pc} ; fp=0xbefff4f4, pc=0x00010400
End of assembler dump.
(gdb) disassemble two
Dump of assembler code for function two:
0x00010448 <+0>: push {r11, lr} ; lr=0x00010434, saves one's fp 0xbefff4d4, sp=0xbefff4b8
0x0001044c <+4>: add r11, sp, #4 ; fp=0xbefff4bc
0x00010450 <+8>: sub sp, sp, #16 ; sp=0xbefff4a8, frame (fp-sp) is 20=16+4 bytes
0x00010454 <+12>: str r0, [r11, #-16]
0x00010458 <+16>: str r1, [r11, #-20] ; 0xffffffec
0x0001045c <+20>: ldr r1, [r11, #-20] ; 0xffffffec
0x00010460 <+24>: ldr r0, [r11, #-16]
0x00010464 <+28>: bl 0x1047c <three> ; lr will be set to 0x00010468
0x00010468 <+32>: str r0, [r11, #-8]
0x0001046c <+36>: ldr r3, [r11, #-8]
0x00010470 <+40>: mov r0, r3
0x00010474 <+44>: sub sp, r11, #4
0x00010478 <+48>: pop {r11, pc}
End of assembler dump.
(gdb) disassemble three
Dump of assembler code for function three:
0x0001047c <+0>: push {r11} ; (str r11, [sp, #-4]!) NOTICE no lr!!
0x00010480 <+4>: add r11, sp, #0 ; no #4 here since lr was not pushed; fp=0xbefff4a4
0x00010484 <+8>: sub sp, sp, #20 ; frame (fp-sp) is 20 bytes, sp=0xbefff490
0x00010488 <+12>: str r0, [r11, #-16]
0x0001048c <+16>: str r1, [r11, #-20] ; 0xffffffec
0x00010490 <+20>: ldr r2, [r11, #-16]
0x00010494 <+24>: ldr r3, [r11, #-20] ; 0xffffffec
0x00010498 <+28>: add r3, r2, r3
0x0001049c <+32>: str r3, [r11, #-8]
0x000104a0 <+36>: ldr r3, [r11, #-8]
0x000104a4 <+40>: mov r0, r3
0x000104a8 <+44>: add sp, r11, #0
0x000104ac <+48>: pop {r11} ; (ldr r11, [sp], #4)
0x000104b0 <+52>: bx lr ; lr=0x10468
End of assembler dump.
(gdb)
```
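To double-check the sp and fp values in the comments, the prologues can be replayed in C. This is a sketch: it tracks only sp and fp, the struct and function names are hypothetical, and it assumes sp was 0xbefff4f8 on entry to main, which follows from sp=0xbefff4d8 after pushing two registers and subtracting 24. The epilogue takes the saved r11 as a parameter instead of loading it from memory.

```c
#include <assert.h>
#include <stdint.h>

struct frame_regs { uint32_t sp, fp; };

/* Prologue of main, one and two:
   push {r11, lr}; add r11, sp, #4; sub sp, sp, #imm */
static struct frame_regs nonleaf_prologue(struct frame_regs r, uint32_t sub_imm)
{
    r.sp -= 8;        /* push {r11, lr}: two words, r11 at the lower one */
    r.fp = r.sp + 4;  /* add r11, sp, #4: fp points at the saved lr      */
    r.sp -= sub_imm;  /* sub sp, sp, #imm: space for locals and args     */
    return r;
}

/* Prologue of the leaf function three (lr is not saved):
   push {r11}; add r11, sp, #0; sub sp, sp, #imm */
static struct frame_regs leaf_prologue(struct frame_regs r, uint32_t sub_imm)
{
    r.sp -= 4;        /* push {r11}: one word                            */
    r.fp = r.sp;      /* add r11, sp, #0: fp points at the saved r11     */
    r.sp -= sub_imm;
    return r;
}

/* Epilogue of main, one and two: sub sp, r11, #4; pop {r11, pc}.
   Simplification: the saved r11 is passed in, not read from memory. */
static struct frame_regs nonleaf_epilogue(struct frame_regs r, uint32_t saved_fp)
{
    r.sp = r.fp - 4;  /* sub sp, r11, #4: sp at the saved r11            */
    r.fp = saved_fp;  /* pop restores r11 ...                            */
    r.sp += 8;        /* ... and pc, releasing two words                 */
    return r;
}
```

Starting from 0xbefff4f8, the replay gives fp/sp of 0xbefff4f4/0xbefff4d8 in main, 0xbefff4d4/0xbefff4c0 in one, 0xbefff4bc/0xbefff4a8 in two and 0xbefff4a4/0xbefff490 in three. Running one’s epilogue with main’s fp restores sp to 0xbefff4d8, the value it had before the call.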
The Stack Frame in the Real World
From what I’ve seen by disassembling functions, the stack frame isn’t always necessary and can be optimized out for performance. See the post Disassembly of Recursion in C, where the compiler removes the stack frame when optimization is enabled.