Section 10.5 Local Variables on the Stack

We have seen that we can allocate memory space on the stack by subtracting the number of bytes from the stack pointer, since the stack grows toward lower memory addresses. We can then save values on the stack in this newly allocated memory area. Once we have retrieved the saved values, we deallocate the stack memory area by simply adding the number of bytes to the stack pointer.

In this section, we look at a way to use stack memory for local variables. We could use push and pop, but if the algorithm requires a value that is not at the top of the stack, I am sure you realize that it would be very tedious to keep track of where all the values are at each point in the execution of the function.

Instead, we will create local variables on the stack in the same way we stored saved values there, by simply subtracting the number of bytes required by each variable from the stack pointer. This does not store any data in the variables, it simply sets aside memory that we can use. (Perhaps you have experienced the error of forgetting to initialize a local variable in C!)

Next, we have to figure out a way to access the variables in this reserved data area on the stack. There are no labels in this area of memory, so we cannot directly use a name like we did when accessing memory in the .rodata section.

A simple solution is to establish a reference point on the stack and access values relative to this address. It may seem tempting to use the stack pointer as the reference point, but the stack pointer changes whenever the function pushes, pops, or allocates stack space, so the offset to each variable would keep changing too.
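To make the complication concrete, here is a small C model, a sketch rather than code from this book, that treats addresses as plain 32-bit numbers. The names Regs, move_sp and offset_from are hypothetical, and ref stands for the fixed reference address (what the next subsection calls the frame pointer). It shows that a variable's offset from sp changes every time sp moves, while its offset from ref does not.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: only the two addresses we care about. */
typedef struct {
    uint32_t sp;   /* stack pointer: moves whenever the stack is used */
    uint32_t ref;  /* fixed reference address, set once per function */
} Regs;

/* Allocate (bytes > 0) or deallocate (bytes < 0) stack space.
   The stack grows toward lower addresses, so allocating subtracts. */
void move_sp(Regs *r, int32_t bytes) {
    assert(bytes % 4 == 0);                      /* sp must stay word aligned */
    r->sp = (uint32_t)((int64_t)r->sp - bytes);
}

/* Signed distance from a base address to a variable's address. */
int64_t offset_from(uint32_t base, uint32_t addr) {
    return (int64_t)addr - (int64_t)base;
}
```

For example, starting from an arbitrary sp of 0x7EFFF000, moving sp by 8, setting ref = sp + 4, and moving sp by 8 again puts a variable at ref - 5 exactly 7 bytes above sp. One more 8-byte push changes its sp-relative offset to 15, but its offset from ref stays -5.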

Subsection 10.5.1 The Frame Pointer

Now you know the purpose of the Frame Pointer. Each function has control over an area on the stack, its Stack Frame. The frame pointer register, fp, contains an address that serves as a reference point for the stack frame. Items that are stored in the stack frame can be readily accessed relative to the frame pointer using the offset forms of the str and ldr instructions. To see how this is done, let us return to the C program in Listing 2.15.2 and look at the compiler-generated assembly language in Listing 10.5.1.

        .arch   armv6
        .file   "echoChar1.c"
        .section  .rodata
        .align  2
.LC0:
        .ascii  "Enter one character: \000"
        .align  2
.LC1:
        .ascii  "You entered: \000"
        .text
        .align  2
        .global main
        .syntax unified
        .arm
        .fpu    vfp
        .type   main, %function
main:
        @ args = 0, pretend = 0, frame = 8
        @ frame_needed = 1, uses_anonymous_args = 0
        push    {fp, lr}
        add     fp, sp, #4      @@ set up our frame pointer
        sub     sp, sp, #8      @@ allocate memory for local var
        mov     r2, #21
        ldr     r1, .L3
        mov     r0, #1
        bl      write           @@ prompt user for input
        sub     r3, fp, #5      @@ compute address
        mov     r2, #1          @@ one char
        mov     r1, r3          @@ address for storing input char
        mov     r0, #0          @@ standard in (keyboard)
        bl      read
        mov     r2, #13         @@ nice message to user
        ldr     r1, .L3+4
        mov     r0, #1
        bl      write
        sub     r3, fp, #5      @@ address where char was stored
        mov     r2, #1
        mov     r1, r3
        mov     r0, #1
        bl      write
        mov     r3, #0
        mov     r0, r3
        sub     sp, fp, #4      @@ deallocate local var
        @ sp needed
        pop     {fp, pc}
.L4:
        .align  2
.L3:
        .word   .LC0
        .word   .LC1
        .size   main, .-main
        .ident  "GCC: (Raspbian 6.3.0-18+rpi1) 6.3.0 20170516"

Listing 10.5.1. Echoing single characters entered on the keyboard. (gcc asm).

You learned in Section 8.5 that when reading from the keyboard we need to pass an address of a memory location to the reading function. In that section we were using scanf, but the same holds true for read. So we need a local variable to hold the character. The memory for this local variable is allocated on the stack with the instruction:

sub     sp, sp, #8      @@ allocate memory for local var

This simply moves the stack pointer eight bytes toward lower addresses. It may seem wasteful to allocate eight bytes when only one byte is needed for a char variable, but the protocol specification, Procedure Call Standard for the ARM Architecture[3], imposes two constraints on the stack pointer:

The stack pointer must always be aligned to a word (4 bytes) boundary.
The stack pointer must be double-word (8 bytes) aligned at a “public interface.”

“Public interface” essentially means a global function that can be called from anywhere in the program. For example, this function calls write and read, which are functions in the C libraries. All the functions used in this book are public, so always align the stack pointer on 8-byte boundaries. This precaution has no effect on performance, but not doing it has the potential of creating obscure bugs in your program.
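The two constraints can be written as simple checks. In this sketch the function names are hypothetical and addresses are plain 32-bit numbers; sp_after_prologue assumes that sp was 8-byte aligned when this function was called, which is what the second constraint requires of the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constraint 1: sp is always a multiple of 4 (word aligned). */
bool sp_word_aligned(uint32_t sp) { return (sp & 3u) == 0; }

/* Constraint 2: at a public interface, such as "bl read",
   sp must be a multiple of 8 (double-word aligned). */
bool sp_call_aligned(uint32_t sp) { return (sp & 7u) == 0; }

/* sp after a prologue that saves saved_bytes worth of registers
   and allocates local_bytes for local variables. */
uint32_t sp_after_prologue(uint32_t sp_at_entry,
                           uint32_t saved_bytes, uint32_t local_bytes) {
    assert(sp_call_aligned(sp_at_entry));   /* our caller obeyed the rule */
    return sp_at_entry - saved_bytes - local_bytes;
}
```

Saving fp and lr (8 bytes) and then allocating 8 bytes keeps sp double-word aligned for the calls to write and read. Allocating only 4 bytes would leave sp word aligned, but it would break the second constraint.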

The location in the stack frame for storing the input character is computed relative to the frame pointer:

sub     r3, fp, #5      @@ compute address

This instruction subtracts \(5\) from the address in the frame pointer register and stores the result in register r3, ready to be passed to the read function.
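For comparison, here is a C sketch of roughly what Listing 10.5.1 does. It is reconstructed from the assembly, not copied from Listing 2.15.2, and the function name echo_one and its file-descriptor parameters are hypothetical (they let the function be tested without a keyboard). The local variable aLetter corresponds to the byte the compiler placed at fp-5, and &aLetter is the address that sub r3, fp, #5 computes.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Prompt on out_fd, read one character from in_fd, and echo it.
   Returns 0 on success, -1 if any read or write falls short. */
int echo_one(int in_fd, int out_fd) {
    const char *prompt = "Enter one character: ";
    const char *response = "You entered: ";
    char aLetter;    /* local variable: lives in this function's stack frame */

    assert(in_fd >= 0 && out_fd >= 0);
    if (write(out_fd, prompt, strlen(prompt)) != (ssize_t)strlen(prompt))
        return -1;
    if (read(in_fd, &aLetter, 1) != 1)    /* read needs the ADDRESS of aLetter */
        return -1;
    if (write(out_fd, response, strlen(response)) != (ssize_t)strlen(response))
        return -1;
    if (write(out_fd, &aLetter, 1) != 1)  /* write also takes an address */
        return -1;
    return 0;
}
```

Calling echo_one(0, 1) gives the same behavior as the assembly program, with 0 and 1 being standard input and standard output.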

Let us switch to my assembly language solution, Listing 10.5.2, for further explanation of how this code works.

@ echoChar2.s
@ Prompts user to enter a character and echoes it.
@ 2017-09-29: Bob Plantz

@ Define my Raspberry Pi
        .cpu    cortex-a53
        .fpu    neon-fp-armv8
        .syntax unified         @ modern syntax

@ Useful source code constants
        .equ    STDIN,0
        .equ    STDOUT,1
        .equ    aLetter,-5
        .equ    local,8

@ Constant program data
        .section  .rodata
        .align  2
promptMsg:
        .ascii  "Enter one character: "  @ no NUL: write uses the length
        .equ    promptLngth,.-promptMsg
responseMsg:
        .ascii  "You entered: "
        .equ    responseLngth,.-responseMsg

@ Program code
        .text
        .align  2
        .global main
        .type   main, %function
main:
        sub     sp, sp, 8       @ space for fp, lr
        str     fp, [sp, 0]     @ save fp
        str     lr, [sp, 4]     @   and lr
        add     fp, sp, 4       @ set our frame pointer
        sub     sp, sp, local   @ allocate memory for local var

        mov     r0, STDOUT      @ prompt user for input
        ldr     r1, promptMsgAddr
        mov     r2, promptLngth
        bl      write

        mov     r0, STDIN       @ from keyboard
        add     r1, fp, aLetter @ address of aLetter
        mov     r2, 1           @ one char
        bl      read

        mov     r0, STDOUT      @ nice message for user
        ldr     r1, responseMsgAddr
        mov     r2, responseLngth
        bl      write

        mov     r0, STDOUT      @ echo user's character
        add     r1, fp, aLetter @ address of aLetter
        mov     r2, 1           @ one char
        bl      write

        mov     r0, 0           @ return 0;
        add     sp, sp, local   @ deallocate local var
        ldr     fp, [sp, 0]     @ restore caller fp
        ldr     lr, [sp, 4]     @       lr
        add     sp, sp, 8       @   and sp
        bx      lr              @ return

@ Addresses of messages
        .align  2
promptMsgAddr:
        .word   promptMsg
responseMsgAddr:
        .word   responseMsg

Listing 10.5.2. Echoing single characters entered on the keyboard. (prog asm).

The first thing to notice is that I have defined several symbols using the .equ directive to make it easier to read the code.

.equ    STDIN,0
.equ    STDOUT,1
.equ    aLetter,-5
.equ    local,8

STDIN and STDOUT are good names for the file descriptors of the keyboard (standard input, 0) and the screen (standard output, 1).

Like the gcc compiler, I use the byte at \(-5\) from the frame pointer as the char variable to store the input character. I know that local variables are always at negative offsets from the frame pointer, so I define the symbolic name, aLetter, as a negative value. Be careful to notice that the offsets for the local variables need to take into account the registers that are saved on the stack. The frame pointer, fp, points at the saved lr, and the saved fp occupies the four bytes from \(-4\) through \(-1\), so the first byte available for a local variable is at \(-5\).

Since this is the place where I am defining the local variable offset, it is a good place to also compute and define the amount of (properly aligned) stack space needed for the local variable(s). I called this constant local. It is the amount that needs to be subtracted from sp.

I also recognize that when I call the read function, the address of aLetter needs to be in r1. So instead of using r3 for the computation and then moving it to r1, I compute it directly into r1:

add     r1, fp, aLetter    @ address of aLetter

You may wonder why the addresses of the two messages are stored in the .text section rather than in the .rodata section, since the program should not change them. The answer is related to the machine code for the ldr instruction. All ARM instructions are 32 bits long. Some of those bits must be used to specify the operation and the registers, so the field left for locating the data is much shorter than 32 bits (a 12-bit offset in ARM state).

The form of the ldr instruction in this program uses pc-Relative Addressing. The number that gets stored as part of the ldr instruction is the distance, in bytes, from the location of the instruction to the location of the 32-bit data (in this case, an address) that the program needs. The assembler does not know where the .rodata section will end up in memory relative to the .text section; that is decided later, when the program is linked and loaded. So the assembler cannot compute the distance from the ldr instruction to a location in .rodata. By placing the address of each text string in the .text section (along with the instructions), the assembler can compute the distance. We will go into more detail in Chapter 11.
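The distance calculation can be sketched in C. This assumes ARM (not Thumb) state, where reading the pc during an instruction gives that instruction's address plus 8, and where the ldr literal form encodes a 12-bit byte offset plus an add/subtract bit, so the data must lie within 4095 bytes of that pc value. The function names and the addresses in the example are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset the assembler would encode for "ldr rN, label" in ARM state:
   the distance from (instruction address + 8) to the 32-bit literal. */
int64_t ldr_literal_offset(uint32_t instr_addr, uint32_t literal_addr) {
    assert((literal_addr & 3u) == 0);   /* .align 2 keeps literals word aligned */
    return (int64_t)literal_addr - ((int64_t)instr_addr + 8);
}

/* A 12-bit magnitude plus an add/subtract bit reaches -4095..+4095. */
bool ldr_literal_reachable(uint32_t instr_addr, uint32_t literal_addr) {
    int64_t off = ldr_literal_offset(instr_addr, literal_addr);
    return off >= -4095 && off <= 4095;
}
```

With an ldr at 0x10400 and its literal at 0x10460, the encoded offset would be 0x58 (88 bytes), well within range; a literal at 0x12000 would be 7160 bytes away and out of reach. Either way, the calculation needs both addresses, which is why the address words are kept next to the code in .text.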

Subsection 10.5.2 Designing the Prologue and Epilogue

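Figure 10.5.3 shows the state of our stack frame after executing the prologue instructions. Here they are written with stmfd, which saves fp and lr at the same locations as the sub/str/str sequence in Listing 10.5.2: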

main:
        stmfd   sp!, {fp, lr}   @ save caller's info
        add     fp, sp, 4       @ set up our frame pointer
        sub     sp, sp, local   @ allocate memory for local var

Figure 10.5.3. The local variable in the program from Listing 10.5.2 is allocated on the stack. Numbers on the left are offsets from the address in the frame pointer (fp) register.

To generalize:

stmfd   sp!, {<registers>}   @ save caller's info
add     fp, sp, <m>          @ set up our frame pointer
sub     sp, sp, <n>          @ allocate memory for local var

where <…> are values you need to provide. These three instructions make up the prologue to the function:

Save the lr (return address) and fp (caller's frame pointer), plus any other registers that need to be saved (Table 10.1.1), on the stack. The registers are specified in the register list, {<registers>}.
Establish the frame pointer for this function. Add \(m = 4 \times (r - 1)\) to the address in the sp, where \(r\) is the number of registers saved on the stack. This ensures that the frame pointer will point to the return address (lr is the highest-numbered register in the list, so stmfd stores it at the highest address).
Allocate memory on the stack for the local variables in this function. The number \(n\) subtracted from the sp must meet two criteria:
1. \(n \geq\) the total number of bytes required by the local variables.
2. \(m + n + 4\) is a multiple of 8. Since \(m + 4 = 4r\), this sum is the size of the whole stack frame, \(4r + n\).
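Once you have worked out \(m\) and \(n\) by hand (ideally with a drawing), a small C helper can double-check them. This is only a checking aid that applies the rules above, not a replacement for drawing the frame; prologue_m and prologue_n are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* m: how far above sp to set fp so that it points at the last
   (highest-addressed) saved register, normally lr. */
uint32_t prologue_m(uint32_t saved_regs) {
    assert(saved_regs >= 1);
    return 4u * (saved_regs - 1u);
}

/* n: the smallest allocation >= local_bytes such that m + n + 4
   (the whole frame, 4 * saved_regs + n) is a multiple of 8. */
uint32_t prologue_n(uint32_t saved_regs, uint32_t local_bytes) {
    uint32_t saved = 4u * saved_regs;                    /* equals m + 4 */
    uint32_t frame = (saved + local_bytes + 7u) & ~7u;   /* round up to 8 */
    return frame - saved;
}
```

For Listing 10.5.2 (fp and lr saved, one char local) this gives \(m = 4\) and \(n = 8\), matching add fp, sp, 4 and the local constant. If a third register were saved, \(n = 4\) would be enough.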

I strongly urge you to draw pictures like Figure 10.5.3 when designing your stack frame. It is the only way I am able to get all the numbers to be correct, and I have been designing stack frames for over forty years.

I should mention here that arithmetic/logic expressions can be used to compute the values of symbols in .equ statements automatically. I would do this in a production environment, but such automation tends to mask the basic concepts. Helping you learn the basic concepts is the whole point of this book, so I recommend that you avoid such automation. Another reason for avoiding it is that a mistake in the automation expressions is much more difficult to debug.

At this point in the function, the stack frame is treated as a record (or struct in C), with the frame pointer used as the reference point. When determining the offsets from the fp to locations of local data in the stack frame, remember to take into account the memory that is occupied by the saved register values. Thus the first available byte for local variables is \(-5\) from the fp. Records are discussed in more detail in Section 15.3.
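Because the frame is treated as a record, it can be described as a C struct. The sketch below is hypothetical: it lays out main's frame from Listing 10.5.2 starting at the lowest address (where sp points after the prologue), and it assumes 32-bit registers and that the compiler inserts no padding, which holds here because each member is already naturally aligned. offsetof then recovers the fp-relative offsets used in the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* main's stack frame, lowest address first. */
struct main_frame {
    unsigned char pad[7];  /* fp-12 .. fp-6: unused, keeps sp 8-byte aligned */
    char aLetter;          /* fp-5: the local char variable */
    uint32_t saved_fp;     /* fp-4: caller's frame pointer */
    uint32_t saved_lr;     /* fp+0: return address; fp points here */
};

/* 8 bytes of saved registers + 8 bytes of locals (needs C11). */
static_assert(sizeof(struct main_frame) == 16, "frame must be 16 bytes");

/* Offset of a member from the frame pointer, which points at saved_lr. */
#define FP_OFFSET(member)                               \
    ((long)offsetof(struct main_frame, member) -        \
     (long)offsetof(struct main_frame, saved_lr))
```

FP_OFFSET(aLetter) evaluates to -5 and FP_OFFSET(pad) to -12, which is exactly where sp sits after sub sp, sp, local.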

The current location of the stack pointer is considered as the “bottom” of the stack during the rest of this function.

After this function has performed its work, we deallocate the local variables area on the stack by moving the stack pointer, sp. Then we restore the register values that were saved on the stack:

add     sp, sp, local   @ deallocate local var
ldmfd   sp!, {fp, lr}   @ restore caller's info

These two instructions make up the epilogue of the function:

Referring to Figure 10.5.4, adding \(n\) to the address in the stack pointer moves it so that it points at the registers we saved, so they can be restored. At this point, the values “above” the stack pointer are considered invalid, and thus, deallocated.
The second instruction pops two values off the top of the stack into:
1. the frame pointer, and
2. the link register.
This restores the calling function's value in the frame pointer and return address.
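The prologue and epilogue can be traced with a toy machine in C. Everything here is hypothetical: the stack is a 64-byte array whose indices stand in for addresses, and only sp, fp and lr are modeled. As with stmfd, the lower-numbered register (fp, r11) is stored at the lower address, which is why fp ends up pointing at the saved lr.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64u

typedef struct {
    uint8_t  mem[STACK_BYTES];   /* toy memory; index = address */
    uint32_t sp, fp, lr;
} Machine;

static void store32(Machine *m, uint32_t addr, uint32_t value) {
    assert(addr + 4 <= STACK_BYTES);
    memcpy(&m->mem[addr], &value, sizeof value);
}

static uint32_t load32(const Machine *m, uint32_t addr) {
    uint32_t value;
    assert(addr + 4 <= STACK_BYTES);
    memcpy(&value, &m->mem[addr], sizeof value);
    return value;
}

/* stmfd sp!, {fp, lr} ; add fp, sp, 4 ; sub sp, sp, n */
void prologue(Machine *m, uint32_t n) {
    m->sp -= 8;
    store32(m, m->sp, m->fp);       /* fp (r11) at the lower address */
    store32(m, m->sp + 4, m->lr);   /* lr (r14) at the higher address */
    m->fp = m->sp + 4;              /* fp points at the saved lr */
    m->sp -= n;                     /* allocate the locals */
}

/* add sp, sp, n ; ldmfd sp!, {fp, lr} */
void epilogue(Machine *m, uint32_t n) {
    m->sp += n;                     /* deallocate the locals */
    m->fp = load32(m, m->sp);
    m->lr = load32(m, m->sp + 4);
    m->sp += 8;
}
```

Starting from sp = 64, prologue(&m, 8) leaves fp = 60 and sp = 48, so aLetter would be mem[55]. After epilogue(&m, 8), sp is back to 64 and fp and lr hold the caller's values, yet whatever was written to mem[55] is still there: deallocating only moves sp.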

The state of the stack after deallocating the local variables, but before restoring register contents, is shown in Figure 10.5.4.

Figure 10.5.4. Local variable stack area in the program from Listing 10.5.2 before restoring the caller's frame pointer and the link register. Although the values in the gray area may remain, they are invalid; using them at this point is a programming error.

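It is important to understand that although the values above the stack pointer (the gray area in Figure 10.5.4) are probably still in memory, attempting to access them is a violation of stack protocol. They can be overwritten at any time; on Linux, for example, a signal handler runs on the same stack, below sp, unless an alternate signal stack has been set up.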

You may wonder why my epilogue and return:

add     sp, sp, local   @ deallocate local var
ldmfd   sp!, {fp, lr}   @ restore caller's info
bx      lr              @ return

differs from the epilogue and return that the compiler generated:

sub     sp, fp, #4      @@ deallocate local var
pop     {fp, pc}

I believe that my style makes the prologue and epilogue more symmetrical, thus less prone to errors. In my view, that outweighs the extremely small cost of adding one more instruction.
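Both epilogues put sp back at the saved registers; they just compute that address from different starting points. A quick C check (hypothetical names, addresses as plain numbers, \(r\) saved registers with \(m = 4 \times (r - 1)\) as in the prologue rules) confirms that they agree whenever sp has not moved since the prologue.

```c
#include <assert.h>
#include <stdint.h>

/* Compiler's form: sub sp, fp, #m */
uint32_t restore_sp_from_fp(uint32_t fp, uint32_t m) { return fp - m; }

/* Book's form: add sp, sp, n */
uint32_t restore_sp_from_sp(uint32_t sp, uint32_t n) { return sp + n; }

/* Run a prologue with r saved registers and n bytes of locals, then check
   that both epilogue forms land on the saved registers. */
int epilogues_agree(uint32_t sp_at_entry, uint32_t r, uint32_t n) {
    assert(r >= 1);
    uint32_t m = 4u * (r - 1u);
    uint32_t saved_at = sp_at_entry - 4u * r;   /* after stmfd */
    uint32_t fp = saved_at + m;                 /* add fp, sp, m */
    uint32_t sp = saved_at - n;                 /* sub sp, sp, n */
    return restore_sp_from_fp(fp, m) == saved_at
        && restore_sp_from_sp(sp, n) == saved_at;
}
```

One likely reason the compiler prefers the fp-based form is that it does not depend on where sp is at the end of the function, so it still works if the function body has moved sp; the sp-based form relies on sp being exactly where the prologue left it.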