Section 10.3 Stack Management In a Function
The protocol that specifies the interaction between functions needs to be followed very precisely, or the program will almost surely not work correctly. The usual result is a crash. In this section we look at how we can use the stack to ensure we follow the correct protocol.
The first issue to consider is how a called function can return to the calling function. A calling function uses the bl
instruction to call a function, which places the return address in the lr
(r14
) register. The called function needs to preserve this address in order to know where to return to. It could simply avoid using this register, but that strategy fails if this function needs to call another function. The solution to this problem is to save the contents of the lr
register on the stack.
Looking at Table 10.1.1, we see that the called function only has free use of registers r0
–r3
. But even these must be treated with care. If the calling function passed any arguments in these registers, the called function must make sure that it no longer needs a value before overwriting it in the register. With so few registers freely available, it should be clear that most functions will use the stack quite a bit.
Now we can return to the program in Listing 10.1.4 and describe how the stack is managed in this function. The first two instructions:
push {fp, lr}
add fp, sp, #4
set up a portion of the stack for use in this function. This acts as a prologue before performing the algorithm that is the purpose of this function.
The push
instruction at the beginning of this function:
- Pushes the return address, the value contained in the lr register, onto the stack.
- Pushes the caller's frame pointer, the value contained in the fp register, onto the stack.
- Updates the sp to show that two 32-bit values have been pushed onto the top of the stack.
The add
instruction adds \(4\) to the value in the sp
register and stores the sum in the fp
register, thus setting the frame pointer for this function such that it points to the location on the stack where the return address is stored, one word above the saved frame pointer of the calling function.
The frame pointer is used as a reference point within the area of the stack that the function is allowed to access, called the stack frame. This will be explained in Section 10.5.
After this function has performed its action, the stack needs to be restored to the state that the calling function was using. This is accomplished with the instruction:
pop {fp, pc}
So this instruction effectively pops the caller's frame pointer off the top of the stack, back into the fp
register. Then the return address is popped into the pc
, and the stack pointer, sp
, is updated to the new top of the stack. This acts as an epilogue to clean up the stack after performing the algorithm that is the purpose of this function.
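The prologue and epilogue described above can be modeled in a few lines of C. This is only a sketch under stated assumptions: the `Cpu` structure, the toy stack size `STACK_WORDS`, and the base address `STACK_BASE` are hypothetical names invented for illustration, and the model tracks just the four registers involved.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16u      /* hypothetical size of the toy stack, in words */
#define STACK_BASE  0x1000u  /* hypothetical address of its lowest word */

typedef struct {
    uint32_t fp, sp, lr, pc;     /* the four registers involved */
    uint32_t mem[STACK_WORDS];   /* toy stack memory, one entry per word */
} Cpu;

/* Translate a byte address into a pointer to the matching word in mem[]. */
static uint32_t *word_at(Cpu *cpu, uint32_t addr)
{
    assert(addr % 4u == 0u);
    assert(addr >= STACK_BASE && addr < STACK_BASE + 4u * STACK_WORDS);
    return &cpu->mem[(addr - STACK_BASE) / 4u];
}

/* Models: push {fp, lr} followed by add fp, sp, #4 */
void prologue(Cpu *cpu)
{
    cpu->sp -= 8u;                          /* room for two 32-bit values */
    *word_at(cpu, cpu->sp)      = cpu->fp;  /* caller's fp at the lower address */
    *word_at(cpu, cpu->sp + 4u) = cpu->lr;  /* return address just above it */
    cpu->fp = cpu->sp + 4u;                 /* add fp, sp, #4 */
}

/* Models: pop {fp, pc} */
void epilogue(Cpu *cpu)
{
    cpu->fp = *word_at(cpu, cpu->sp);       /* restore the caller's fp */
    cpu->pc = *word_at(cpu, cpu->sp + 4u);  /* return address goes into pc */
    cpu->sp += 8u;                          /* back to the caller's stack top */
}
```

Calling `prologue` and then `epilogue` on the same `Cpu` leaves sp and fp with the values they had on entry, and leaves pc holding the original contents of lr, which is what returns control to the caller.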
We now look at an assembly language version of this program so we can use gdb
to observe how the stack actually changes. My assembly language is shown in Listing 10.3.1.
The first thing you probably notice is that I have used
sub sp, sp, 8 @ space for fp, lr
str fp, [sp, 0] @ save fp
str lr, [sp, 4] @ and lr
instead of the push
instruction used by the compiler, and
ldr fp, [sp, 0] @ restore caller fp
ldr lr, [sp, 4] @ lr
add sp, sp, 8 @ and sp
instead of the pop
instruction near the end of the program.
The reasons I have chosen this different technique for using the stack are:
- The AARCH64 architecture does not include the push and pop instructions. If you work with ARM assembly language in the future, there is a good chance it will be the AARCH64 architecture, and you will have to use this method. (AARCH64 includes ldp and stp instructions, which allow you to load and store pairs of registers.)
- The purpose of this book is to help you learn the details of how a computer works. My implementation explicitly shows how the stack is used to temporarily store data.
Using several instructions to save data on the stack probably looks less efficient to you than using a single instruction. The internal architecture of the ARM processor employs a pipeline to execute instructions. A description of a pipeline is beyond the scope of this book, but for now, you can think of it as an assembly line for executing instructions. It allows the CPU to be working on several instructions at the same time. In some situations, this explicit way of using the stack may be faster than using push
and pop
. In any case, the timing differences are negligible in nearly all applications. And this is a book about how things work, not about writing the fastest code.
Before walking through this code with gdb
, we see another assembler directive:
.equ STDOUT,1
which defines the identifier, STDOUT
, and gives it the value \(1\text{.}\) In the C version of this program we included the file unistd.h
, which defined the identifier STDOUT_FILENO
to be \(1\text{.}\) That's a C file, and we are using assembly language here, so we need to do this ourselves.
We can even direct the assembler to do some arithmetic for us:
helloMsg: .asciz "Hello, World!\n"
.equ helloLngth,.-helloMsg
The assembler computes the value of this new identifier, helloLngth
, as the arithmetic expression “.-helloMsg
”. In this context the ‘.
’ character means “here” with respect to the memory address. So this arithmetic expression subtracts the memory address of the helloMsg
label from “here”, thus giving the length of the text string, including the NUL
terminating character.
You can probably figure out that the assembler directive .asciz
does the same thing as .ascii
but adds the NUL
character at the end of the text string.
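If you are more comfortable in C, the `.-helloMsg` computation has a close analogue in the `sizeof` operator applied to a character array. The following sketch illustrates the idea; the function names `hello_length` and `hello_chars` are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>   /* defines STDOUT_FILENO, which .equ STDOUT,1 stands in for */

/* Like .asciz, a C string literal stores a terminating NUL after the text. */
const char helloMsg[] = "Hello, World!\n";

/* Analogue of ".equ helloLngth,.-helloMsg": bytes occupied by the array. */
size_t hello_length(void)
{
    size_t with_nul = sizeof helloMsg;
    assert(with_nul == strlen(helloMsg) + 1u);  /* the extra byte is the NUL */
    return with_nul;
}

/* Number of characters in the text alone, excluding the NUL. */
size_t hello_chars(void)
{
    return strlen(helloMsg);
}
```

Here `hello_length()` returns 15: the 14 characters of the text (counting the newline) plus the NUL, which matches what the assembler computes for helloLngth.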
I loaded this program using gdb
and set a breakpoint at the add fp, sp, 4
instruction so I could see how the stack has been set up. I then ran the program:
(gdb) run
Starting program: /home/pi/chap10/helloWorld2
Breakpoint 1, main () at helloWorld2.s:29
29 add fp, sp, 4 @ set our frame pointer
Inspecting the registers gave me (your numbers may vary):
(gdb) i r
r0 0x1 1
r1 0x7efff2f4 2130703092
r2 0x7efff2fc 2130703100
r3 0x1043c 66620
r4 0x10474 66676
r5 0x0 0
r6 0x10314 66324
r7 0x0 0
r8 0x0 0
r9 0x0 0
r10 0x76fff000 1996484608
r11 0x0 0
r12 0x7efff220 2130702880
sp 0x7efff198 0x7efff198
lr 0x76e7b678 1994897016
pc 0x10448 0x10448 <main+12>
cpsr 0x60000010 1610612752
Looking at the stack:
(gdb) x/4xw 0x7efff198
0x7efff198: 0x00000000 0x76e7b678 0x76fa0000 0x7efff2f4
we see that the return address (in the lr
register) has been pushed onto the stack. Let us rearrange the display to get a more intuitive view of the stack:
sp: |
\(\hex{0x7efff198:}\) | \(\hex{0x00000000}\) |
\(\hex{0x7efff19c:}\) | \(\hex{0x76e7b678}\) |
Our function has pushed only two 32-bit values onto the stack, so this function's view of the stack ends with these two values. The value at the top of the stack is \(\hex{0x00000000}\), suggesting that the caller (the C runtime environment) has not set up a frame pointer. Regardless of what that code did, we must observe the agreed-upon protocol, so the next instruction sets up a frame pointer for our function. Executing this instruction gives us:
(gdb) si
31 mov r0, STDOUT @ file number to write to
(gdb) i r fp lr sp pc
fp 0x7efff19c 0x7efff19c
lr 0x76e7b678 1994897016
sp 0x7efff198 0x7efff198
pc 0x1044c 0x1044c <main+16>
(gdb) x/4xw 0x7efff198
0x7efff198: 0x00000000 0x76e7b678 0x76fa0000 0x7efff2f4
Now we have established a frame pointer, which serves as a reference point into the stack for this function. The area of the stack this function is allowed to access, as defined by the stack protocol, is called a stack frame. Although the frame pointer is not needed in this function, you will see its usefulness soon. Here is the view of the stack from this function:
sp: |
\(\hex{0x7efff198:}\) | \(\hex{0x00000000}\) |
fp: |
\(\hex{0x7efff19c:}\) | \(\hex{0x76e7b678}\) |
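The addresses in this view follow from simple arithmetic on the stack pointer. As a quick check, here is a sketch in C, where `FrameRegs` and `after_prologue` are hypothetical names used only for this illustration. The entry value of sp is taken to be \(\hex{0x7efff1a0}\), the value gdb reports once the function has restored it.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t sp;   /* stack pointer after the prologue */
    uint32_t fp;   /* frame pointer after the prologue */
} FrameRegs;

/* Models: sub sp, sp, 8 followed by add fp, sp, 4 */
FrameRegs after_prologue(uint32_t sp_on_entry)
{
    FrameRegs regs;
    assert(sp_on_entry >= 8u);        /* guard against wrapping below zero */
    regs.sp = sp_on_entry - 8u;       /* two 32-bit values were saved */
    regs.fp = regs.sp + 4u;           /* points at the saved return address */
    return regs;
}
```

Calling `after_prologue(0x7efff1a0)` gives an sp of 0x7efff198 and an fp of 0x7efff19c, matching the register values gdb displayed after the add instruction.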
I set a breakpoint at the bx
instruction at the end of the program, continued, and checked that the epilogue had restored the stack:
(gdb) cont
Continuing.
Hello, World!
Breakpoint 2, main () at helloWorld2.s:40
40 bx lr @ return
(gdb) i r fp lr sp pc
fp 0xfffffffc 0xfffffffc
lr 0x76e7b678 1994897016
sp 0x7efff1a0 0x7efff1a0
pc 0x1046c 0x1046c <main+48>
(gdb) x/4xw 0x7efff5f8
0x7eff19f8: 0x00000000 0x76e7f294 0x76fa3000 0x7efff754
The stack pointer, sp
, and link register, lr
, have been restored to their respective states when this function was entered, and the bx lr
instruction transfers control back to the calling function.