Section 10.5 Local Variables on the Stack
We have seen that we can allocate memory space on the stack by subtracting the number of bytes from the stack pointer, since the stack grows toward lower memory addresses. We can then save values on the stack in this newly allocated memory area. Once we have retrieved the saved values, we deallocate the stack memory area by simply adding the number of bytes to the stack pointer.
In this section, we look at a way to use stack memory for local variables. We could use push and pop, but if the algorithm requires a value that is not at the top of the stack, it would be very tedious to keep track of where all the values are at each point in the execution of the function.
Instead, we will create local variables on the stack in the same way we stored saved values there, by simply subtracting the number of bytes required by each variable from the stack pointer. This does not store any data in the variables; it simply sets aside memory that we can use. (Perhaps you have experienced the error of forgetting to initialize a local variable in C!)
Next, we have to figure out a way to access the variables in this reserved data area on the stack. There are no labels in this area of memory, so we cannot directly use a name like we did when accessing memory in the .rodata segment.
A much simpler solution is to establish a point on the stack and use this address to directly access values on the stack relative to this address. It may seem tempting to use the stack pointer as the reference pointer, but this creates complications if we wish to use the stack within the function.
Subsection 10.5.1 The Frame Pointer
Now you know the purpose of the Frame Pointer. Each function has control over an area on the stack, its Stack Frame. The frame pointer register, fp, contains an address that serves as a reference point for the stack frame. Items that are stored in the stack frame can be readily accessed relative to the frame pointer using the offset forms of the str and ldr instructions. To see how this is done, let us return to the C program in Listing 2.15.2 and look at the compiler-generated assembly language in Listing 10.5.1.
You learned in Section 8.5 that when reading from the keyboard we need to pass the address of a memory location to the reading function. In that section we were using scanf, but the same holds true for read. So we need a local variable to hold the character. The memory for this local variable is allocated on the stack with the instruction:
sub sp, sp, #8 @@ allocate memory for local var
This simply moves the stack pointer eight bytes. It may seem wasteful to allocate eight bytes when only one byte is needed for a char variable. The protocol specification, Procedure Call Standard for the ARM Architecture[3], provides two constraints on the stack pointer:
The stack pointer must always be aligned to a word (4 bytes) boundary.
The stack pointer must be double-word (8 bytes) aligned at a “public interface.”
“Public interface” essentially means a global function that can be called from anywhere in the program. For example, this function calls write and read, which are functions in the C libraries. All the functions used in this book are public, so always align the stack pointer on 8-byte boundaries. This precaution has no effect on performance, but not doing it has the potential of creating obscure bugs in your program.
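The rounding that the 8-byte rule implies can be sketched in C. This is only an illustration of the arithmetic, assuming the stack pointer was already 8-byte aligned before the allocation; the function name round_up_to_8 is made up for this example and does not appear in the listings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round a byte count up to the next multiple of 8,
   the alignment the stack pointer must have at a public interface. */
size_t round_up_to_8(size_t bytes)
{
    size_t rounded = (bytes + 7u) & ~(size_t)7u;
    assert(rounded % 8 == 0 && rounded >= bytes);
    return rounded;
}
```

For the single char variable in this program, round_up_to_8(1) gives 8, which agrees with the eight bytes allocated by the sub instruction above.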
The location in the stack frame for storing the input character is computed relative to the frame pointer:
sub r3, fp, #5 @@ compute address
This instruction subtracts \(5\) from the address in the frame pointer register and stores the result in register r3, ready to be passed to the read function.
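The same idea can be sketched in C by treating the frame as an array of bytes. This is a model, not the compiler's code: the 16-byte array, the position of fp 12 bytes into it (the layout explained in Subsection 10.5.2), and the store_char function standing in for read are all assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { FRAME_SIZE = 16, A_LETTER = -5 };   /* A_LETTER mirrors the -5 offset */

/* Hypothetical stand-in for read: stores one byte at the given address. */
static void store_char(char *destination, char value)
{
    assert(destination != NULL);
    *destination = value;
}

/* Model the frame as a byte array. fp points 12 bytes into it, so
   fp - 5 lands inside the 8 bytes reserved for local variables. */
char read_into_local(char input)
{
    char frame[FRAME_SIZE] = {0};
    char *fp = frame + 12;             /* plays the role of the fp register */
    char *address = fp + A_LETTER;     /* like: sub r3, fp, #5 */
    store_char(address, input);        /* the callee writes through the address */
    return frame[7];                   /* 12 - 5 = 7, the byte that was written */
}
```

The point of the sketch is that the callee never needs a name for the variable; an address computed from the reference point is enough.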
Let us switch to my assembly language solution, Listing 10.5.2, for further explanation of how this code works.
The first thing to notice is that I have defined several symbols using the .equ directive to make it easier to read the code.
        .equ    STDIN,0
        .equ    STDOUT,1
        .equ    aLetter,-5
        .equ    locals,8
STDIN and STDOUT are good names for the keyboard and screen file numbers.
Like the gcc compiler, I use the byte at \(-5\) from the frame pointer as the char variable to store the input character. I know that local variables are always at negative offsets from the frame pointer, so I define the symbolic name, aLetter, as a negative value. Be careful to notice that the values of offsets for the local variables need to take into account the registers that will be saved on the stack. These offsets are relative to the frame pointer, fp, which points to a place on the stack “below” the saved registers.
And since this is the place where I am defining the local variable offset, it is a good place to also compute and define the amount of (properly aligned) stack space needed for the local variable(s). I called this constant locals. It is the amount that needs to be subtracted from sp.
I also recognize that when I call the read function, the address of aLetter needs to be in r1. So instead of using r3 for the computation and then moving it to r1, I compute it directly into r1:
add r1, fp, aLetter @ address of aLetter
You may wonder why the addresses of the two messages are not placed in the .rodata section since they should not be changed by the program. The answer is related to the machine code for the ldr instruction. All ARM instructions are 32 bits long. Part of those 32 bits needs to be used to define the instruction, so the space left for specifying the address is less than 32 bits long. The form of the ldr instruction in this program uses pc-Relative Addressing. The number that gets stored as part of the ldr instruction is the distance, in bytes, from the location of the instruction to the location of the 32-bit data (in this case, an address) that the program needs. When the operating system loads a program into memory, we have no way of knowing where the .rodata section will be located in memory relative to the .text section. So the assembler cannot compute the distance from the ldr instruction to the location where the address of the text string is stored. By placing this location in the .text segment (along with the instructions) the assembler can compute the distance. We will go into more detail in Chapter 11.
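The distance computation can be sketched in C. Two details here go beyond what the text above states and are assumptions about the 32-bit ARM instruction set: when an instruction executes, the pc reads as that instruction's address plus 8, and the offset field of this form of ldr holds at most 4095 bytes in either direction. The function names are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed properties of the 32-bit ARM ldr encoding. */
enum { PC_READ_AHEAD = 8, MAX_LDR_OFFSET = 4095 };

/* Distance, in bytes, that would be stored in a pc-relative ldr. */
int64_t ldr_offset(uint32_t instruction_address, uint32_t data_address)
{
    assert(instruction_address % 4 == 0);   /* instructions are word aligned */
    return (int64_t)data_address
         - ((int64_t)instruction_address + PC_READ_AHEAD);
}

/* Nonzero if the distance fits in the offset field of the instruction. */
int ldr_offset_fits(int64_t offset)
{
    return offset >= -MAX_LDR_OFFSET && offset <= MAX_LDR_OFFSET;
}
```

With a made-up instruction at 0x10000 and its data word at 0x10020, the stored distance is 24 bytes, which fits easily. A data word in a different section could be far outside this range, and its distance would not even be known until the program is loaded.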
Subsection 10.5.2 Designing the Prologue and Epilogue
Figure 10.5.3 shows the state of our stack frame after executing the prologue instructions:
main:
        stmfd   sp!, {fp, lr}       @ save caller's info
        add     fp, sp, 4           @ set up our frame pointer
        sub     sp, sp, locals      @ allocate memory for local var
To generalize:
        stmfd   sp!, {<registers>}  @ save caller's info
        add     fp, sp, <m>         @ set up our frame pointer
        sub     sp, sp, <n>         @ allocate memory for local var
where <…> are values you need to provide. These three instructions make up the prologue to the function:
Save the lr (return address) and fp (caller's frame pointer), plus any other registers that need to be saved (Table 10.1.1), on the stack. The registers are specified in the register list, {<registers>}.
Establish the frame pointer for this function. Add \(m = 4 \times (r - 1)\) to the address in the sp, where \(r\) is the number of registers saved on the stack. This ensures that the frame pointer will point to the return address.
Allocate memory on the stack for the local variables in this function. The number subtracted from the sp, \(n\text{,}\) must meet two criteria:
\(n \geq \) total number of bytes required by local variables.
\((m + n + 4)\) is a multiple of \(8\text{.}\)
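The two values can be checked with a short C sketch that applies the rules above literally. It is meant as a way to confirm numbers worked out by hand, not as a replacement for drawing the stack frame; the function names are made up for this example.

```c
#include <assert.h>

/* m: amount added to sp so that fp points to the saved return address. */
unsigned frame_pointer_offset(unsigned registers_saved)
{
    assert(registers_saved >= 1);
    return 4u * (registers_saved - 1u);
}

/* n: smallest value >= local_bytes for which (m + n + 4) is a multiple of 8. */
unsigned locals_allocation(unsigned registers_saved, unsigned local_bytes)
{
    unsigned m = frame_pointer_offset(registers_saved);
    unsigned n = local_bytes;
    while ((m + n + 4u) % 8u != 0u) {
        n++;
    }
    return n;
}
```

For this function, two registers are saved and one byte is needed, so the sketch gives \(m = 4\) and \(n = 8\text{,}\) the values used in the prologue above. With three saved registers and the same one-byte variable it gives \(m = 8\) and \(n = 4\text{.}\)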
I strongly urge you to draw pictures like Figure 10.5.3 when designing your stack frame. It is the only way I am able to get all the numbers to be correct, and I have been designing stack frames for over forty years.
I mention at this point that arithmetic/logic expressions can be used to automatically compute the values of symbols in .equ statements. I would do this in a production environment, but such automation tends to mask the basic concepts. Helping you learn the basic concepts is the whole point of this book, so I recommend that you avoid such automation. Another reason for avoiding it is that a mistake in the automation expressions is much more difficult to debug.
At this point in the function, the stack frame is treated as a record (or struct in C), with the frame pointer used as the reference point. When determining the offsets from the fp to locations of local data in the stack frame, remember to take into account the memory that is occupied by the saved register values. Thus the first available byte for local variables is \(-5\) from the fp. Records are discussed in more detail in Section 15.3.
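The record view can be made concrete with a C struct whose fields are listed from the lowest address to the highest. The struct and the FP_OFFSET macro are hypothetical, written only to show where the \(-5\) comes from; they assume the frame of this function, with two saved registers and eight bytes reserved for local variables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record describing this function's frame, lowest address first. */
struct stack_frame {
    uint8_t  unused[7];   /* remainder of the 8 bytes reserved for locals */
    char     a_letter;    /* the char variable */
    uint32_t saved_fp;    /* caller's frame pointer */
    uint32_t saved_lr;    /* return address; fp points here */
};

/* Offset of a field relative to the location the frame pointer refers to. */
#define FP_OFFSET(field) \
    ((long)offsetof(struct stack_frame, field) - \
     (long)offsetof(struct stack_frame, saved_lr))

long a_letter_offset(void)
{
    assert(sizeof(struct stack_frame) == 16);   /* 8 saved + 8 local bytes */
    return FP_OFFSET(a_letter);
}
```

The saved frame pointer occupies the four bytes at offsets \(-4\) through \(-1\text{,}\) so the highest byte left for a local variable is at offset \(-5\text{,}\) which is what a_letter_offset returns.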
The current location of the stack pointer is considered as the “bottom” of the stack during the rest of this function.
After this function has performed its work we deallocate the local variables area on the stack by moving the stack pointer, sp. Then we restore the register values that were saved on the stack:
        add     sp, sp, locals      @ deallocate local var
        ldmfd   sp!, {fp, lr}       @ restore caller's info
These two instructions make up the epilogue of the function:
Referring to Figure 10.5.4, adding \(n\) to the address in the stack pointer will place it so it now points at the registers we saved so they can be restored. At this point, the values “above” the stack pointer are considered invalid, and thus, deallocated.
The second instruction pops two values off the top of the stack into:
the frame pointer, and
the link register.
This restores the calling function's value in the frame pointer and return address.
The state of the stack after deallocating the local variables, but before restoring register contents, is shown in Figure 10.5.4.
It is important to understand that although you know the values above the stack pointer (the gray area in Figure 10.5.4) are probably still in memory, attempting to access them is a violation of stack protocol.
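To see that the prologue and epilogue undo each other, the following C sketch models three registers and a small stack. Everything in it is an assumption made for illustration: the machine struct, the register values, and the use of byte offsets into an array in place of real addresses.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 16, LOCALS = 8 };

/* Hypothetical machine state. sp and fp hold byte offsets into the
   stack array rather than real memory addresses. */
struct machine {
    uint32_t sp, fp, lr;
    uint32_t stack[STACK_WORDS];
};

static void prologue(struct machine *m)
{
    assert(m->sp >= 8 + LOCALS && m->sp % 8 == 0);
    m->sp -= 8;                        /* stmfd sp!, {fp, lr} */
    m->stack[m->sp / 4] = m->fp;       /*   fp goes at the lower address */
    m->stack[m->sp / 4 + 1] = m->lr;   /*   lr goes at the higher address */
    m->fp = m->sp + 4;                 /* add fp, sp, 4 */
    m->sp -= LOCALS;                   /* sub sp, sp, locals */
}

static void epilogue(struct machine *m)
{
    m->sp += LOCALS;                   /* add sp, sp, locals */
    m->fp = m->stack[m->sp / 4];       /* ldmfd sp!, {fp, lr} */
    m->lr = m->stack[m->sp / 4 + 1];
    m->sp += 8;
}

/* Returns 1 if sp, fp, and lr all come back to their starting values. */
int frame_round_trip(void)
{
    struct machine m = { .sp = 64, .fp = 0x7eff0010u, .lr = 0x000104a0u };
    prologue(&m);
    m.lr = 0;                          /* a call inside the function changes lr */
    epilogue(&m);
    return m.sp == 64 && m.fp == 0x7eff0010u && m.lr == 0x000104a0u;
}
```

Notice that the epilogue reads the saved values only after sp has been moved back to them. It never reaches into the deallocated area.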
You may wonder why my epilogue and return:
        add     sp, sp, locals      @ deallocate local var
        ldmfd   sp!, {fp, lr}       @ restore caller's info
        bx      lr                  @ return
differ from the epilogue and return that the compiler generated:
        sub     sp, fp, #4          @@ deallocate local var
        ldmfd   sp!, {fp, pc}
I believe that my style makes the prologue and epilogue more symmetrical, thus less prone to errors. In my view, that outweighs the extremely small cost of adding one more instruction.
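Both styles leave the stack pointer at the same place, which a little address arithmetic in C can confirm. The function names and the sample addresses are invented for this sketch, and it assumes the frame used in this section, where fp is four bytes above the saved registers' lowest address.

```c
#include <assert.h>
#include <stdint.h>

/* My style: add sp, sp, locals */
uint32_t sp_after_add(uint32_t sp, uint32_t locals)
{
    return sp + locals;
}

/* Compiler's style: sub sp, fp, #4 */
uint32_t sp_after_sub(uint32_t fp)
{
    assert(fp >= 4);
    return fp - 4u;
}
```

Suppose sp is 0x7eff0100 on entry. After the prologue, fp is 0x7eff00fc and sp is 0x7eff00f0, and both functions return 0x7eff00f8, the address of the saved fp. The compiler's version computes this from fp, so it does not depend on where sp happens to be at that moment, and loading the saved return address directly into pc performs the return without a separate bx instruction.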