Section 10.3 Stack Management In a Function

The protocol that specifies the interaction between functions needs to be followed very precisely, or the program will almost surely not work correctly. The usual result is a crash. In this section we look at how we can use the stack to ensure we follow the correct protocol.

The first issue to consider is how a called function can return to the calling function. A calling function uses the bl instruction to call a function, which places the return address in the lr (r14) register. The called function needs to preserve this address in order to know where to return to. It could simply avoid using this register, but that strategy fails if this function needs to call another function, because its own bl instruction would overwrite lr. The solution to this problem is to save the contents of the lr register on the stack.
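
To make the problem concrete, here is a minimal sketch in C that models only the lr register and a small save area. The Cpu type, the bl helper, and the addresses 0x1000 and 0x2000 are all hypothetical names and values chosen for illustration; they do not come from the program in this chapter.

```c
#include <stdint.h>

/* Hypothetical model: just the link register and a small save area. */
typedef struct {
    uint32_t lr;
    uint32_t stack[4];
    int      top;          /* number of words currently saved */
} Cpu;

/* Model of bl: the only effect shown here is that lr receives
   the return address of the call. */
static void bl(Cpu *c, uint32_t return_address)
{
    c->lr = return_address;
}

/* A function that makes a nested call without saving lr.
   Returns the address its final "bx lr" would jump to. */
uint32_t return_target_without_save(Cpu *c)
{
    bl(c, 0x2000);                  /* nested call overwrites lr */
    return c->lr;                   /* caller's return address is lost */
}

/* The same function, but lr is saved on the stack around the call. */
uint32_t return_target_with_save(Cpu *c)
{
    c->stack[c->top++] = c->lr;     /* save lr */
    bl(c, 0x2000);                  /* nested call overwrites lr */
    c->lr = c->stack[--c->top];     /* restore lr */
    return c->lr;
}
```

If lr holds 0x1000 on entry, the first function would "return" to 0x2000, which is inside itself, while the second returns to 0x1000 as intended.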

Looking at Table 10.1.1, we see that the called function only has free use of registers r0–r3. But even these need to be treated with care. If the calling function passed any arguments in these registers, the called function must make sure that it no longer needs an argument's value before overwriting the register that holds it. It should be clear that most functions will use the stack quite a bit.

Now we can return to the program in Listing 10.1.4 and describe how the stack is managed in this function. The first two instructions:

push    {fp, lr}
add     fp, sp, #4

set up a portion of the stack for use in this function. This acts as a prologue before performing the algorithm that is the purpose of this function.

The push instruction at the beginning of this function:

Pushes the return address, the value contained in the lr register, onto the stack.
Pushes the caller's frame pointer, the value contained in the fp register, onto the stack.
Updates the sp to show that two 32-bit values have been pushed onto the top of the stack.

The add instruction adds \(4\) to the value in the sp register and stores the sum in the fp register. Because push stores fp at the lower address and lr at the address 4 bytes above it, this sets the frame pointer for this function to point to the location on the stack where the return address is stored, with the calling function's frame pointer in the word just below it.

The frame pointer is used as a reference point within the area of the stack that the function is allowed to access, called the stack frame. This will be explained in Section 10.5.

After this function has performed its action, the stack needs to be restored to the state that the calling function was using. This is accomplished with the instruction:

pop     {fp, pc}

This instruction pops the caller's frame pointer off the top of the stack, back into the fp register. Then the return address is popped into the pc, which causes the return to the calling function, and the stack pointer, sp, is updated to the new top of the stack. This acts as an epilogue to clean up the stack after performing the algorithm that is the purpose of this function.
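
The prologue and epilogue can be sketched as a small simulation in C. This is only a model under stated assumptions: the Machine type, the 32-byte stack, and the helper functions are hypothetical, and addresses are byte offsets into a small array rather than real memory addresses.

```c
#include <stdint.h>

enum { STACK_BYTES = 32 };              /* hypothetical, tiny stack */

typedef struct {
    uint32_t fp, lr, sp, pc;            /* sp and fp are byte offsets into mem */
    uint32_t mem[STACK_BYTES / 4];
} Machine;

static void store_word(Machine *m, uint32_t addr, uint32_t value)
{
    m->mem[addr / 4] = value;
}

static uint32_t load_word(const Machine *m, uint32_t addr)
{
    return m->mem[addr / 4];
}

/* Model of:  push {fp, lr}  followed by  add fp, sp, #4 */
void prologue(Machine *m)
{
    m->sp -= 8;                         /* room for two 32-bit values */
    store_word(m, m->sp, m->fp);        /* caller's fp at the lower address */
    store_word(m, m->sp + 4, m->lr);    /* return address just above it */
    m->fp = m->sp + 4;                  /* fp points at the saved lr */
}

/* Model of:  pop {fp, pc} */
void epilogue(Machine *m)
{
    m->fp = load_word(m, m->sp);        /* restore caller's fp */
    m->pc = load_word(m, m->sp + 4);    /* return address goes into pc */
    m->sp += 8;                         /* stack is back where it started */
}
```

Starting with sp at offset 32, the prologue leaves sp at 24 and fp at 28, and the epilogue brings sp back to 32 with the caller's fp restored and the return address in pc.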

We now look at an assembly language version of this program so we can use gdb to observe how the stack actually changes. My assembly language version is shown in Listing 10.3.1.

@ helloWorld2.s
@ Hello World program, in assembly language.
@ 2017-09-29: Bob Plantz

@ Define my Raspberry Pi
        .cpu    cortex-a53
        .fpu    neon-fp-armv8
        .syntax unified         @ modern syntax

@ Useful source code constant
        .equ    STDOUT,1

@ Constant program data
        .section  .rodata
        .align  2
helloMsg:
        .asciz	 "Hello, World!\n"
        .equ    helloLngth,.-helloMsg

@ Program code
        .text
        .align  2
        .global main
        .type   main, %function
main:
        sub     sp, sp, 8       @ space for fp, lr
        str     fp, [sp, 0]     @ save fp
        str     lr, [sp, 4]     @   and lr
        add     fp, sp, 4       @ set our frame pointer

        mov     r0, STDOUT      @ file number to write to
        ldr     r1, helloMsgAddr   @ pointer to message
        mov     r2, helloLngth  @ number of bytes to write
        bl      write           @ write the message
        
        mov     r0, 0           @ return 0;
        ldr     fp, [sp, 0]     @ restore caller fp
        ldr     lr, [sp, 4]     @       lr
        add     sp, sp, 8       @   and sp
        bx      lr              @ return
        
        .align  2
helloMsgAddr:
        .word   helloMsg

Listing 10.3.1. “Hello World” program using the write system call function (prog asm).

The first thing you probably notice is that I have used

sub     sp, sp, 8       @ space for fp, lr
str     fp, [sp, 0]     @ save fp
str     lr, [sp, 4]     @   and lr

instead of the push instruction used by the compiler, and

ldr     fp, [sp, 0]     @ restore caller fp
ldr     lr, [sp, 4]     @       lr
add     sp, sp, 8       @   and sp

instead of the pop instruction near the end of the program.

The reasons I have chosen this different technique for using the stack are:

The AARCH64 architecture does not include the push and pop instructions. If you work with ARM assembly language in the future, there is a good chance it will be the AARCH64 architecture, and you will have to use this method. (AARCH64 includes ldp and stp instructions, which allow you to load and store pairs of registers.)
The purpose of this book is to help you learn the details of how a computer works. My implementation explicitly shows how the stack is used to temporarily store data.

Using several instructions to save data on the stack probably looks less efficient to you than using a single instruction. The internal architecture of the ARM employs a pipeline to execute instructions. A description of a pipeline is beyond the scope of this book, but for now, you can think of it as an assembly line for executing instructions. It allows the CPU to be working on several instructions at the same time. In some situations, this explicit way of using the stack may be faster than using push and pop. In any case, the timing differences are negligible in nearly all applications. And this is a book about how things work, not about writing the fastest code.

Before walking through this code with gdb, we see another assembler directive:

.equ    STDOUT,1

which defines the identifier, STDOUT, and gives it the value \(1\). In the C version of this program we included the file unistd.h, which defined the identifier STDOUT_FILENO to be \(1\). That is a C header file, and we are using assembly language here, so we need to define the constant ourselves.

We can even direct the assembler to do some arithmetic for us:

helloMsg:
        .asciz  "Hello, World!\n"
        .equ    helloLngth,.-helloMsg

The assembler computes the value of this new identifier, helloLngth, as the arithmetic expression “.-helloMsg”. In this context the ‘.’ character means “here” with respect to the memory address. So this arithmetic expression subtracts the memory address of the helloMsg label from “here”, thus giving the length of the text string, including the NUL terminating character.

You can probably figure out that the assembler directive .asciz does the same thing as .ascii but adds the NUL character at the end of the text string.
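
For readers more comfortable with C, the same length computation can be sketched with sizeof. This is only an analogy, not part of the program: the two function names are hypothetical, and the array simply holds the same bytes that .asciz emits.

```c
#include <stddef.h>
#include <string.h>

/* The same bytes that .asciz emits: 14 characters plus a NUL terminator. */
static const char helloMsg[] = "Hello, World!\n";

/* Analogue of  .equ helloLngth,.-helloMsg :
   the number of bytes from the label up to "here". */
size_t hello_length_with_nul(void)
{
    return sizeof helloMsg;
}

/* For comparison, strlen() stops counting at the NUL character. */
size_t hello_length_without_nul(void)
{
    return strlen(helloMsg);
}
```

The first function returns 15 and the second returns 14, so helloLngth in the assembly language program is 15.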

I loaded this program using gdb and set a breakpoint at the add fp, sp, 4 instruction so I could see how the stack has been set up. I then ran the program:

(gdb) run
Starting program: /home/pi/chap10/helloWorld2 

Breakpoint 1, main () at helloWorld2.s:29
29	        add     fp, sp, 4       @ set our frame pointer

Inspecting the registers gave me (your numbers may vary):

(gdb) i r
r0             0x1	1
r1             0x7efff2f4	2130703092
r2             0x7efff2fc	2130703100
r3             0x1043c	66620
r4             0x10474	66676
r5             0x0	0
r6             0x10314	66324
r7             0x0	0
r8             0x0	0
r9             0x0	0
r10            0x76fff000	1996484608
r11            0x0	0
r12            0x7efff220	2130702880
sp             0x7efff198	0x7efff198
lr             0x76e7b678	1994897016
pc             0x10448	0x10448 <main+12>
cpsr           0x60000010	1610612752

Looking at the stack:

(gdb) x/4xw 0x7efff198
0x7efff198:	0x00000000	0x76e7b678	0x76fa0000	0x7efff2f4

we see that the return address (in the lr register) has been pushed onto the stack. Let us rearrange the display to get a more intuitive view of the stack:

sp:	0x7efff198:	0x00000000
	0x7efff19c:	0x76e7b678

Our function has pushed only two 32-bit values onto the stack, so this function's view of the stack ends with these two values. The value at the top of the stack is 0x00000000, suggesting that the caller (the C runtime environment) has not set up a frame pointer. Regardless of what that code did, we must observe the agreed-upon protocol, so the next instruction sets up a frame pointer for our function. Executing this instruction gives us:

(gdb) si
31	        mov     r0, STDOUT     @ file number to write to
(gdb) i r fp lr sp pc
fp             0x7efff19c	0x7efff19c
lr             0x76e7b678	1994897016
sp             0x7efff198	0x7efff198
pc             0x1044c	0x1044c <main+16>
(gdb) x/4xw 0x7efff198
0x7efff198:	0x00000000	0x76e7b678	0x76fa0000	0x7efff2f4

Now we have established a frame pointer, which serves as a reference point into the stack for this function. The area of the stack this function is allowed to access, as defined by the stack protocol, is called a stack frame. Although the frame pointer is not needed in this function, you will see its usefulness soon. Here is the view of the stack from this function:

sp:	0x7efff198:	0x00000000
fp:	0x7efff19c:	0x76e7b678

I set a breakpoint at the bx instruction at the end of the program, continued, and checked that the epilogue had restored the stack to its state on entry to the function:

(gdb) cont
Continuing.
Hello, World!

Breakpoint 2, main () at helloWorld2.s:40
40	        bx      lr              @ return
(gdb) i r fp lr sp pc
fp             0x0	0x0
lr             0x76e7b678	1994897016
sp             0x7efff1a0	0x7efff1a0
pc             0x1046c	0x1046c <main+48>
(gdb) x/4xw 0x7efff198
0x7efff198:	0x00000000	0x76e7b678	0x76fa0000	0x7efff2f4

The stack pointer, sp, and link register, lr, have been restored to their respective states when this function was entered, and the bx lr instruction transfers control back to the calling function.