Good software engineering practice generally includes breaking problems down into functionally distinct subproblems. This leads to software solutions with many functions, each of which solves a subproblem. This “divide and conquer” approach has some distinct advantages:
The main disadvantage of breaking a problem down like this is coordinating the many subsolutions so that they work together correctly to provide a correct overall solution. In software, this translates to making sure that the interface between a calling function and a called function works correctly. In order to ensure correct operation of the interface, it must be specified in a very explicit way.
In Chapter 8 you learned how to pass arguments into a function and call it. In this chapter you will learn how to use these arguments inside the called function.
Be careful to distinguish data input/output to/from a called function from user input/output. User input typically comes from an input device (keyboard, mouse, etc.) and user output is typically sent to an output device (screen, printer, speaker, etc.).
Functions can interact with the data in other parts of the program in three ways:
All three interactions can be performed if the called function also knows the location of the data item. This can be done by the calling function passing the address to the called function or by making the address globally known to both functions. Updates require that the address be known by the called function.
Outputs can also be implemented by placing the new data item in a location that is accessible to both the called and the calling function. In C/C++ this is done by placing the return value from a function in the eax register. And inputs can be implemented by passing a copy of the data item to the called function. In both of these cases the called function does not know the location of the original data item, and thus does not have access to it.
In addition to global data, C syntax allows three ways for functions to exchange data:
The last method, pass by pointer, can also be used to pass large inputs, or to pass inputs that should be changed — also called updates. It is also the method by which C++ implements pass by reference.
When one function calls another, the information that is required to provide the interface between the two is called an activation record. Since both the registers and the call stack are common to all the functions within a program, both the calling function and the called function have access to them. So arguments can be passed either in registers or on the call stack. Of course, the called function must know exactly where each of the arguments is located when program flow transfers to it.
In principle, the locations of arguments need only be consistent within a program. As long as all the programmers working on the program observe the same rules, everything should work. However, designing a good set of rules for any real-world project is a very time-consuming process. Fortunately, the ABI [25] for the x86-64 architecture specifies a good set of rules. The rules are very tedious because they are meant to cover all possible situations. In this book we will consider only the simpler rules in order to get an overall picture of how this works.
In 64-bit mode six of the general purpose registers and a portion of the call stack are used for the activation record. The area of the stack used for the activation record is called a stack frame. Within any function, the stack frame contains the following information:
and often includes:
Some general memory usage rules (64-bit mode) are:
We can see how this works by studying the program in Listing 11.1.
The compiler-generated assembly language for the sumInts function is shown in Listing 11.2 with comments added.
After saving the caller’s frame pointer and establishing its own frame pointer, this function stores the argument values in the local variable area:
The arguments are in the following registers (see Table 8.2, page 548):
Storing them in the local variable area frees up the registers so they can be used in this function. Although this is not very efficient, the compiler does not need to work very hard to optimize register usage within the function. The only local variable, overflow, is initialized on line 11.
The observant reader will note that no memory has been allocated on the stack for local variables or saving the arguments. The ABI [25] defines the 128 bytes beyond the stack pointer — that is, the 128 bytes at addresses lower than the one in the rsp register — as a red zone. The operating system is not allowed to use this area, so the function can use it for temporary storage of values that do not need to be saved when another function is called. In particular, leaf functions can store local variables in this area without moving the stack pointer because they do not call other functions.
Notice that both the argument save area and the local variable area are aligned on 16-byte address boundaries. Figure 11.1 provides a pictorial view of where the three arguments and the local variable are in the red zone.
As you know, some functions take a variable number of arguments. In these functions, the ABI [25] specifies the relative offsets of the register save area. The offsets are shown in Table 11.1.
One of the problems with the C version of sumInts is that it requires a separate check for overflow:
Writing the function in assembly language allows us to directly check the overflow flag, as shown in Listing 11.3.
The code to perform the addition and overflow check is much simpler.
The body of the function begins by assuming there will not be overflow, so 0 is stored in eax, ready to be the return value. The value of the first argument is added to the second, because the programmer realizes that the values in the argument registers do not need to be saved. If this addition produces overflow, the cmovo instruction changes the return value to 1. Finally, in either case the sum is stored at the memory location whose address was passed to the function as the third argument.
When a calling function needs to pass more than six arguments to another function, the additional arguments beyond the first six are passed on the call stack. They are effectively pushed onto the stack in eight-byte chunks before the call. The order of pushing is from right to left in the C argument list. (As you will see shortly, the compiler actually uses a more efficient method than pushes.) Since these arguments are on the call stack, they are within the called function’s stack frame, so the called function can access them.
Consider the program in Listing 11.4.
The assembly language generated by gcc from the program in Listing 11.4 is shown in Listing 11.5, with comments added to explain parts of the code.
Before main calls sumNine the values of the second through sixth arguments, b – f, are moved to the appropriate registers, and the first argument, a, is loaded into a temporary register:
Then the values of the seventh, eighth, and ninth arguments, g – i, are moved to their appropriate locations on the call stack. Enough space was allocated at the beginning of the function to allow for these arguments. They are moved into their correct locations on lines 27 – 32:
The stack pointer, rsp, is used as the reference point for storing the arguments on the stack here because the main function is starting a new stack frame for the function it is about to call, sumNine.
Then the first argument, a, is moved to the appropriate register:
When program control is transferred to the sumNine function, the partial stack frame appears as shown in Figure 11.2. Even though each argument is only four bytes (int), each is passed in an 8-byte portion of stack memory. Compare this with passing arguments in registers; only one data item is passed per register even if the data item does not take up the entire eight bytes in the register.
The return address is at the top of the stack, immediately followed by the three arguments (beyond the six passed in registers). Notice that each argument is in the same position on the stack as it would have been if it had been pushed onto the stack just before the call instruction. Since the address in the stack pointer (rsp) was 16-byte aligned before the call to this function, and the call instruction pushed the 8-byte return address onto the stack, the address in rsp is now only 8-byte aligned; that is, it is 8 bytes below a 16-byte boundary.
The prologue of sumNine completes the stack frame. Then the function saves the register arguments in the register save area of the stack frame:
The state of the stack frame at this point is shown in Figure 11.3.
You may question why the compiler did not simply use the red zone. The sumNine function is not a leaf function. It calls another function, which may require use of the call stack. So space must be explicitly allocated on the call stack for local variables and the register argument save areas.
By the way, the compiler has replaced this function call, a call to printf, with a call to puts:
Since the only thing to be written to the screen is a text string, the puts function is equivalent.
After the register arguments are safely stored in the argument save area, they can be easily summed and the total saved in the local variable:
Notice that the seventh, eighth, and ninth arguments are accessed by positive offsets from the frame pointer, rbp. They were stored in the stack frame by the calling function. The called function “owns” the entire stack frame so it does not need to make additional copies of these arguments.
It is important to realize that once the stack frame has been completed within a function, that area of the call stack cannot be treated as a stack. That is, it cannot be accessed through pushes and pops. It must be treated as a record. (You will learn more about records in Section 13.2, page 802.)
If we were to recompile these functions with higher levels of optimization, many of these assembly language operations would be removed (see Exercise 11-2). But the point here is to examine the mechanisms that can be used to work with arguments and to write easily read code, so we study the unoptimized code.
A version of this program written in assembly language is shown in Listing 11.6.
The assembly language programmer realizes that all nine integers can be summed in the sumNine function before it calls another function. In addition, none of the values will be needed after this summation. So there is no reason to store the register arguments locally:
However, the edi register will be needed for passing an argument to puts, so the total is saved in a local variable in the stack frame:
Then it is loaded into eax for return to the calling function:
The overall pattern of a stack frame is shown in Figure 11.4. The rbp register serves as the frame pointer to the stack frame. Once the frame pointer address has been established in a function, its value must never be changed. The return address is always located +8 bytes offset from the frame pointer. Arguments to the function are positive offsets from the frame pointer, and local variables are negative offsets from the frame pointer.
It is essential that you follow the register usage and argument passing disciplines precisely. Any deviation can cause errors that are very difficult to debug.
The best way to design a stack frame for a function is to make a drawing on paper following the pattern in Figure 11.3. Show all the local variables and arguments to the function. To be safe, assume that all the register-passed arguments will be saved in the function. Compute and write down all the offset values on your drawing. When writing the source code for your function, use the .equ directive to give meaningful names to each of the numerical offsets. If you do this planning before writing the executable code, you can simply use the name(%rbp) syntax to access the value stored at name.
In 32-bit mode, all arguments are passed on the call stack. The 32-bit assembly language generated by gcc is shown in Listing 11.7.
The argument passing sequence can be seen on lines 22 – 39 in the main function. Rather than pushing each argument onto the stack, the compiler has used the technique of allocating space on the stack for the arguments, then storing each argument directly in the appropriate location. The result is the same as if they had been pushed onto the stack, but the direct storage technique is more efficient.
I find it odd that the compiler writer has chosen to set up a base pointer in ebp but not used it in this function. This is NOT a recommended technique when writing in assembly language.
The state of the call stack just before calling the nineInts function is shown in Figure 11.5. Comparing this with the 64-bit version in Figure 11.3, we see that the local variables are treated in essentially the same way. But the 32-bit version differs in the way it passes arguments:
This summary shows the assembly language instructions introduced thus far in the book. The page number where the instruction is explained in more detail, which may be in a subsequent chapter, is also given. This book provides only an introduction to the usage of each instruction. You need to consult the manuals ([2] – [6], [14] – [18]) in order to learn all the possible uses of the instructions.
data movement:
opcode | source | destination | action | page |
cbtw | | | convert byte to word, al → ax | 696 |
cwtl | | | convert word to long, ax → eax | 696 |
cltq | | | convert long to quad, eax → rax | 696 |
cmovcc | %reg/mem | %reg | conditional move | 706 |
movs | $imm/%reg | %reg/mem | move | 506 |
movs | mem | %reg | move | 506 |
movsss | $imm/%reg | %reg/mem | move, sign extend | 693 |
movzss | $imm/%reg | %reg/mem | move, zero extend | 693 |
popw | | %reg/mem | pop from stack | 566 |
pushw | $imm/%reg/mem | | push onto stack | 566 |
s = b, w, l, q; w = l, q; cc = condition codes
arithmetic/logic:
opcode | source | destination | action | page |
adds | $imm/%reg | %reg/mem | add | 607 |
adds | mem | %reg | add | 607 |
cmps | $imm/%reg | %reg/mem | compare | 676 |
cmps | mem | %reg | compare | 676 |
decs | | %reg/mem | decrement | 699 |
incs | | %reg/mem | increment | 698 |
leaw | mem | %reg | load effective address | 579 |
subs | $imm/%reg | %reg/mem | subtract | 612 |
subs | mem | %reg | subtract | 612 |
tests | $imm/%reg | %reg/mem | test bits | 676 |
tests | mem | %reg | test bits | 676 |
s = b, w, l, q; w = l, q
program flow control:
opcode | location | action | page |
call | label | call function | 546 |
ja | label | jump above (unsigned) | 683 |
jae | label | jump above/equal (unsigned) | 683 |
jb | label | jump below (unsigned) | 683 |
jbe | label | jump below/equal (unsigned) | 683 |
je | label | jump equal | 679 |
jg | label | jump greater than (signed) | 686 |
jge | label | jump greater than/equal (signed) | 686 |
jl | label | jump less than (signed) | 686 |
jle | label | jump less than/equal (signed) | 686 |
jmp | label | jump | 691 |
jne | label | jump not equal | 679 |
jno | label | jump no overflow | 679 |
jcc | label | jump on condition codes | 679 |
leave | | undo stack frame | 580 |
ret | | return from function | 583 |
syscall | | call kernel function | 587 |
cc = condition codes
register direct: | The data value is located in a CPU register. |
| syntax: name of the register with a “%” prefix. |
| example: movl %eax, %ebx |
immediate data: | The data value is located immediately after the instruction. Source operand only. |
| syntax: data value with a “$” prefix. |
| example: movl $0xabcd1234, %ebx |
base register plus offset: | The data value is located in memory. The address of the memory location is the sum of a value in a base register plus an offset value. |
| syntax: use the name of the register with parentheses around the name and the offset value immediately before the left parenthesis. |
| example: movl $0xaabbccdd, 12(%eax) |
rip-relative: | The target is a memory address determined by adding an offset to the current address in the rip register. |
| syntax: a programmer-defined label |
| example: je somePlace |
(§11.2) Enter the program in Listing 11.6. Single-step through the program with gdb and record the changes in the rsp and rip registers and the changes in the stack on paper. Use drawings similar to Figure 11.3.
Note: Each of the two functions should be in its own source file. You can single-step into the subfunction with gdb at the call instruction in main, then single-step back into main at the ret instruction in addConst.
(§11.2) Enter the C program in Listing 11.4. Using the “-S” compiler option, compile it with differing levels of optimization, i.e., “-O1, -O2, -O3,” and discuss the assembly language that is generated. Is the optimized code easier or more difficult to read?
(§11.2, §10.1) Write the function, writeStr, in assembly language. The function takes one argument, a char *, which is a pointer to a C-style text string. It displays the text string on the screen. It returns the number of characters displayed.
Demonstrate that your function works correctly by writing a main function that calls writeStr to display “Hello world” on the screen.
Note that the main function will not do anything with the character count that is returned by writeStr.
(§11.2, §10.1) Write the function, readLn, in assembly language. The function takes one argument, a char *, which is a pointer to a char array for storing a text string. It reads characters from the keyboard and stores them in the array as a C-style text string. It does not store the ’\n’ character. It returns the number of characters, excluding the NUL character, that were stored in the array.
Demonstrate that your function works correctly by writing a main function that prompts the user to enter a text string, then echoes the user’s input.
When testing your program, be careful not to enter more characters than the allocated space. Explain what would occur if you did enter too many characters.
Note that the main function will not do anything with the character count that is returned by readLn.
(§11.2, §10.1) Write a program in assembly language that
prompts the user to enter any text string,
reads the entered text string, and
echoes the user’s input.
Use the writeStr function from Exercise 11-3 and the readLn function from Exercise 11-4 to implement the user interface in this program.
(§11.2, §10.1) Modify the readLn function in Exercise 11-4 so that it takes a second argument, the maximum length of the text string, including the NUL character. Excess characters entered by the user are discarded.