Let’s disassemble a recursive function in C to ARM assembly. We can use the textbook example of a recursive factorial function. We’ll play around with optimization levels and touch on Tail Recursion or Tail Calls at the end of the blog post.
C Code
Below is the C code we’ll use to disassemble. This code is a text book function that implements a factorial using recursion.
// file: recursion.c
long int
factorial(int n)
{
if (n>=1)
return n*factorial(n1);
else
return 1;
}
int
main(int argc, char *argv[])
{
factorial(3);
return 0;
}
Disassembly
Below is the corresponding ARM Assembly resulting from the C Factorial function above. This assembly is compiled with O0
, so most optimizations are completely disabled. In the first section we deal with the stack frame. See my post The Stack of Frames in C with ARM Assembly. I highlighted the section dealing with the stack frame. This highlighted section will push the frame pointer, and link register (PC value) onto the stack. It will then set the current value of the frame pointer to the top of the frame and the stack pointer to the bottom of the frame. From there it will store and load some values into the stack. At end the fp
and pc
will be popped off the stack. I mention this as the stack frame is a large part of a factorial function. Each time we recurse we need to set up a new stack frame. We’re using memory on the stack each time we push these registers onto the stack. Visualize the function call executing from line 211 each time, then branching on line 12 back to line 2. The stack will grow and grow until we either run out of memory, or 12 falls through. The body of the factorial function is quite simple with only a compare, subtraction and multiplication.


The argument passed into factorial
named n
is stored in the register r0
, the assembly also loads register r3
with the same value for a comparison. In the C code we evaluate if(n>=1)
, whereas, the ARM assembly inverts this logic and tests if(n<=0)
on line 8. Thus, if n<=0
we will jump to label .L1
load the value 1
into r0
and return.
For the math portion of the factorial in C we have:
n*factorial(n1)
This math portion will get converted to the following assembly. Note r3
contains the C variable n
:
sub r3, r3, #1 ; n = n1
mov r0, r3 ; put n into r0 for factorial call
bl factorial ; recursive call
mov r2, r0 ; move result of factorial into r2
ldr r3, [fp, #8] ; r3 will also contain n that was passed in
mul r3, r3, r2 ; multiply n by factorial(n1)
The order of operations are n1
, then factorial(n1)
, and lastly the multiplication *
.
Compiler Optimization of factorial
In our original disassembly I left out some annotations. Here are those annotations:
factorial:
> @ args = 0, pretend = 0, frame = 8
> @ frame_needed = 1, uses_anonymous_args = 0
Take note of the @ frame_needed = 1
requires many additional instructions. If we recompile with O3
we’ll see the frame is not needed. Also, the code is indeed optimized.
factorial:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
subs r3, r0, #0 ; r3 = r0
mov r0, #1 ; r0 = 1
bxle lr ; return 1 if r0 is less than 1
.L3:
mul r0, r3, r0 ; r0 <= n*1
subs r3, r3, #1 ; r3 <= n = n 1
bne .L3 ; goto .L3 when n != 0
bx lr
You can keep following along the ARM instructions and corresponding comments. Each time the function call will multiply n*(n1)
and store the result in r0
. This will be done until r3
is 0. This code doesn’t use a stack frame and is essentially a Tail Call or Tail Recursion.
Tail Recursion
In many references you’ll see Tail Recursion has the last recursive call at the very end. Let’s look at any differences in the disassembly.
long int
factorial_tail(int n, int acc) {
if (n==0)
return acc;
else
return factorial_tail(n1, acc*n);
}
Where we can call this function with long r = factorial_tail(n, 1)
.
Disassembly of the factorial_tail
function
factorial_tail:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
cmp r0, #0
beq .L12
.L9:
mul r1, r0, r1
subs r0, r0, #1
bne .L9
.L12:
mov r0, r1
bx lr
Indeed the stack frame code is removed, however, it’s not much more optimized than our factorial(int n)
function. They both have 7 instructions.
Now I can ask the question is Tail Recursion more efficient than a standard Factorial call compiled with 03
or greater? Leave your answer in the comments below!