# Disassembly of Recursion in C

### 2020-09-25 c assembly arm

Let’s disassemble a recursive function in C to ARM assembly. We can use the textbook example of a recursive factorial function. We’ll play around with optimization levels and touch on Tail Recursion or Tail Calls at the end of the blog post.

## C Code

Below is the C code we’ll use to disassemble. This code is a text book function that implements a factorial using recursion.

``````// file: recursion.c

long int
factorial(int n)
{
if (n>=1)
return n*factorial(n-1);
else
return 1;
}

int
main(int argc, char *argv[])
{
factorial(3);
return 0;
}``````

## Disassembly

Below is the corresponding ARM Assembly resulting from the C Factorial function above. This assembly is compiled with `-O0`, so most optimizations are completely disabled. In the first section we deal with the stack frame. See my post The Stack of Frames in C with ARM Assembly. I highlighted the section dealing with the stack frame. This highlighted section will push the frame pointer, and link register (PC value) onto the stack. It will then set the current value of the frame pointer to the top of the frame and the stack pointer to the bottom of the frame. From there it will store and load some values into the stack. At end the `fp` and `pc` will be popped off the stack. I mention this as the stack frame is a large part of a factorial function. Each time we recurse we need to set up a new stack frame. We’re using memory on the stack each time we push these registers onto the stack. Visualize the function call executing from line 2-11 each time, then branching on line 12 back to line 2. The stack will grow and grow until we either run out of memory, or 12 falls through. The body of the factorial function is quite simple with only a compare, subraction and multiplication.

 `````` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 `````` ``````factorial: push {fp, lr} add fp, sp, #4 sub sp, sp, #8 str r0, [fp, #-8] ldr r3, [fp, #-8] cmp r3, #0 ble .L2 ldr r3, [fp, #-8] sub r3, r3, #1 mov r0, r3 bl factorial mov r2, r0 ldr r3, [fp, #-8] mul r3, r3, r2 b .L3 .L2: mov r3, #1 .L3: mov r0, r3 sub sp, fp, #4 pop {fp, pc}``````

The argument passed into `factorial` named `n` is stored in the register `r0`, the assembly also loads register `r3` with the same value for a comparision. In the C code we evaluate `if(n>=1)`, whereas, the ARM assembly inverts this logic and tests `if(n<=0)` on line 8. Thus, if `n<=0` we will jump to label `.L1` load the value `1` into `r0` and return.

For the math portion of the factorial in C we have:

``n*factorial(n-1)``

This math portion will get converted to the following assembly. Note `r3` contains the C variable `n`:

``````	sub	r3, r3, #1        ; n = n-1
mov	r0, r3            ; put n into r0 for factorial call
bl	factorial         ; recursive call
mov	r2, r0            ; move result of factorial into r2
ldr	r3, [fp, #-8]     ; r3 will also contain n that was passed in
mul	r3, r3, r2        ; multiply n by factorial(n-1)``````

The order of operations are `n-1`, then `factorial(n-1)`, and lastly the multiplication `*`.

## Compiler Optimization of `factorial`

In our original disassembly I left out some annotations. Here are those annotations:

``````factorial:
> @ args = 0, pretend = 0, frame = 8
> @ frame_needed = 1, uses_anonymous_args = 0``````

Take note of the `@ frame_needed = 1` requires many additional instructions. If we re-compile with `-O3` we’ll see the frame is not needed. Also, the code is indeed optimized.

``````factorial:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
subs	r3, r0, #0                 ; r3 = r0
mov	r0, #1                         ; r0 = 1
bxle	lr                         ; return 1 if r0 is less than 1
.L3:
mul	r0, r3, r0                     ; r0 <= n*1
subs	r3, r3, #1                 ; r3 <= n = n -1
bne	.L3                            ; goto .L3 when n != 0
bx	lr``````

You can keep following along the ARM instructions and corresponding comments. Each time the function call will multiply `n*(n-1)` and store the result in `r0`. This will be done until `r3` is 0. This code doesn’t use a stack frame and is essentially a Tail Call or Tail Recursion.

## Tail Recursion

In many references you’ll see Tail Recursion has the last recursive call at the very end. Let’s look at any differences in the disassembly.

``````long int
factorial_tail(int n, int acc) {
if (n==0)
return acc;
else
return factorial_tail(n-1, acc*n);
}``````

Where we can call this function with `long r = factorial_tail(n, 1)`.

### Disassembly of the `factorial_tail` function

``````factorial_tail:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
cmp	r0, #0
beq	.L12
.L9:
mul	r1, r0, r1
subs	r0, r0, #1
bne	.L9
.L12:
mov	r0, r1
bx	lr``````

Indeed the stack frame code is removed, however, it’s not much more optimized than our `factorial(int n)` function. They both have 7 instructions.

Now I can ask the question is Tail Recursion more efficient than a standard Factorial call compiled with `-03` or greater? Leave your answer in the comments below!