Chapter 11
Writing Your Own Functions

Good software engineering practice generally includes breaking problems down into functionally distinct subproblems. This leads to software solutions with many functions, each of which solves a subproblem. This “divide and conquer” approach has some distinct advantages:

The main disadvantage of breaking a problem down like this is coordinating the many subsolutions so that they work together correctly to provide a correct overall solution. In software, this translates to making sure that the interface between a calling function and a called function works correctly. In order to ensure correct operation of the interface, it must be specified in a very explicit way.

In Chapter 8 you learned how to pass arguments into a function and call it. In this chapter you will learn how to use these arguments inside the called function.

11.1 Overview of Passing Arguments

Be careful to distinguish data input/output to/from a called function from user input/output. User input typically comes from an input device (keyboard, mouse, etc.) and user output is typically sent to an output device (screen, printer, speaker, etc.).

Functions can interact with the data in other parts of the program in three ways:

  1. Input. The data comes from another part of the program and is used by the function, but is not modified by it.
  2. Output. The function provides new data to another part of the program.
  3. Update. The function modifies a data item that is held by another part of the program. The new value is based on the value before the function was called.

All three interactions can be performed if the called function also knows the location of the data item. This can be done by the calling function passing the address to the called function or by making the address globally known to both functions. Updates require that the address be known by the called function.

Outputs can also be implemented by placing the new data item in a location that is accessible to both the called and the calling function. In C/C++ this is done by placing the return value from a function in the eax register. And inputs can be implemented by passing a copy of the data item to the called function. In both of these cases the called function does not know the location of the original data item, and thus does not have access to it.

In addition to global data, C syntax allows three ways for functions to exchange data:

The last method, pass by pointer, can also be used to pass large inputs, or to pass inputs that should be changed — also called updates. It is also the method by which C++ implements pass by reference.
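The three kinds of interaction described above can be sketched in a few lines of C. This is only an illustration; the function names square, larger, and increment are made up for this sketch and do not appear elsewhere in the book.

```c
/* Input: n is a copy of the caller's value, so the caller's variable
 * cannot be changed. The result is an output through the return value. */
int square(int n)
{
    return n * n;
}

/* Output by pointer: the caller supplies the address where the result
 * is to be stored. The value previously stored there is ignored. */
void larger(int a, int b, int *result)
{
    *result = (a > b) ? a : b;
}

/* Update by pointer: the new value is based on the value that was
 * stored at the address before the function was called. */
void increment(int *count)
{
    *count = *count + 1;
}
```

A caller would write, for example, larger(x, y, &z); to receive an output in z, and increment(&z); to update z in place. In both cases the address of z is passed, which is why the called function can reach the original data item.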

When one function calls another, the information that is required to provide the interface between the two is called an activation record. Since both the registers and the call stack are common to all the functions within a program, both the calling function and the called function have access to them. So arguments can be passed either in registers or on the call stack. Of course, the called function must know exactly where each of the arguments is located when program flow transfers to it.

In principle, the locations of arguments need only be consistent within a program. As long as all the programmers working on the program observe the same rules, everything should work. However, designing a good set of rules for any real-world project is a very time-consuming process. Fortunately, the ABI [25] for the x86-64 architecture specifies a good set of rules. The rules are very tedious because they are meant to cover all possible situations. In this book we will consider only the simpler rules in order to get an overall picture of how this works.

In 64-bit mode six of the general purpose registers and a portion of the call stack are used for the activation record. The area of the stack used for the activation record is called a stack frame. Within any function, the stack frame contains the following information:

and often includes:

Some general memory usage rules (64-bit mode) are:

We can see how this works by studying the program in Listing 11.1.

 
/*
 * addProg.c
 * Adds two integers
 * Bob Plantz - 13 June 2009
 */

#include <stdio.h>
#include "sumInts1.h"

int main(void)
{
    int x, y, z;
    int overflow;

    printf("Enter two integers: ");
    scanf("%i %i", &x, &y);
    overflow = sumInts(x, y, &z);
    printf("%i + %i = %i\n", x, y, z);
    if (overflow)
        printf("*** Overflow occurred ***\n");

    return 0;
}
 
/*
 * sumInts1.h
 * Adds two integers and outputs their sum.
 * Returns 0 if no overflow, else returns 1.
 * Bob Plantz - 4 June 2008
 */

#ifndef SUMINTS1_H
#define SUMINTS1_H
int sumInts(int, int, int *);
#endif
 
/*
 * sumInts1.c
 * Adds two integers and outputs their sum.
 * Returns 0 if no overflow, else returns 1.
 * Bob Plantz - 13 June 2009
 */

#include "sumInts1.h"

int sumInts(int a, int b, int *sum)
{
    int overflow = 0;   // assume no overflow

    *sum = a + b;

    if (((a > 0) && (b > 0) && (*sum < 0)) ||
            ((a < 0) && (b < 0) && (*sum > 0)))
    {
        overflow = 1;
    }
    return overflow;
}
Listing 11.1: Passing arguments to a function (C). (There are three files here.)

The compiler-generated assembly language for the sumInts function is shown in Listing 11.2 with comments added.

 
1        .file  "sumInts1.c" 
2        .text 
3        .globl sumInts
4        .type  sumInts, @function 
5sumInts: 
6        pushq  %rbp 
7        movq  %rsp, %rbp 
8        movl  %edi, -20(%rbp)   # save a 
9        movl  %esi, -24(%rbp)   # save b 
10        movq  %rdx, -32(%rbp)   # save pointer to sum 
11        movl  $0, -4(%rbp)      # overflow = 0; 
12        movl  -24(%rbp), %eax   # load b 
13        movl  -20(%rbp), %edx   # load a 
14        addl  %eax, %edx        # add them 
15        movq  -32(%rbp), %rax   # load address of sum 
16        movl  %edx, (%rax)      # *sum = a + b; 
17        cmpl  $0, -20(%rbp) 
18        jle    .L2 
19        cmpl  $0, -24(%rbp) 
20        jle    .L2 
21        movq  -32(%rbp), %rax 
22        movl  (%rax), %eax 
23        testl  %eax, %eax 
24        js    .L3 
25.L2: 
26        cmpl  $0, -20(%rbp) 
27        jns    .L4 
28        cmpl  $0, -24(%rbp) 
29        jns    .L4 
30        movq  -32(%rbp), %rax 
31        movl  (%rax), %eax 
32        testl  %eax, %eax 
33        jle    .L4 
34.L3: 
35        movl  $1, -4(%rbp) 
36.L4: 
37        movl  -4(%rbp), %eax   # return overflow; 
38        popq  %rbp 
39        ret 
40        .size  sumInts, .-sumInts 
41        .ident "GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0"
42        .section      .note.GNU-stack,"",@progbits
Listing 11.2: Accessing arguments in the sumInts function from Listing 11.1 (gcc assembly language).
As we go through this description, be careful not to confuse the frame pointer (rbp register) with the stack pointer (rsp register). Each is used to access a different area of the stack.

After saving the caller’s frame pointer and establishing its own frame pointer, this function stores the argument values in the local variable area:

5sumInts: 
6        pushq  %rbp 
7        movq  %rsp, %rbp 
8        movl  %edi, -20(%rbp)  # save a 
9        movl  %esi, -24(%rbp)  # save b 
10        movq  %rdx, -32(%rbp)  # save pointer to sum 
11        movl  $0, -4(%rbp)     # overflow = 0;

The arguments are in the following registers (see Table 8.2, page 548): a is in edi, b is in esi, and the pointer to sum is in rdx.

Storing them in the local variable area frees up the registers so they can be used in this function. Although this is not very efficient, the compiler does not need to work very hard to optimize register usage within the function. The only local variable, overflow, is initialized on line 11.

The observant reader will note that no memory has been allocated on the stack for local variables or saving the arguments. The ABI [25] defines the 128 bytes beyond the stack pointer — that is, the 128 bytes at addresses lower than the one in the rsp register — as a red zone. The operating system is not allowed to use this area, so the function can use it for temporary storage of values that do not need to be saved when another function is called. In particular, leaf functions can store local variables in this area without moving the stack pointer because they do not call other functions.

Notice that both the argument save area and the local variable area are aligned on 16-byte address boundaries. Figure 11.1 provides a pictorial view of where the three arguments and the local variable are in the red zone.


Figure 11.1: Arguments and local variables in the stack frame, sumInts function. The two input values and the address for the output are passed in registers, then stored in the Argument Save Area by the called function. Since this is a leaf function, the Red Zone is used for this function’s stack frame.


As you know, some functions take a variable number of arguments. In these functions, the ABI [25] specifies the relative offsets of the register save area. The offsets are shown in Table 11.1.


Register   Offset
--------   ------
rdi             0
rsi             8
rdx            16
rcx            24
r8             32
r9             40
xmm0           48
xmm1           64
...           ...
xmm15         288

Table 11.1: Argument register save area in stack frame. These relative offsets should be used in functions with a variable number of arguments.
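At the C level, a function with a variable number of arguments reads them through the macros in stdarg.h. The sketch below uses a made-up function, sumAll, whose first argument tells it how many int values follow. On x86-64 the compiler typically implements these macros by copying the argument registers into the register save area in the function prologue, but that detail is hidden from the C source.

```c
#include <stdarg.h>

/* Hypothetical example: returns the sum of the `count` int arguments
 * that follow the first argument. */
int sumAll(int count, ...)
{
    va_list args;
    int total = 0;
    int i;

    va_start(args, count);            /* begin after the last named argument */
    for (i = 0; i < count; i++)
        total += va_arg(args, int);   /* fetch the next int argument */
    va_end(args);

    return total;
}
```

For example, sumAll(3, 10, 20, 30) would return 60. The called function has no way to check the count, so the caller must pass exactly as many values as it claims.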

One of the problems with the C version of sumInts is that it requires a separate check for overflow:

    if (((a > 0) && (b > 0) && (*sum < 0)) ||
            ((a < 0) && (b < 0) && (*sum > 0)))
    {
        overflow = 1;
    }

Writing the function in assembly language allows us to directly check the overflow flag, as shown in Listing 11.3.

 
1# sumInts.s 
2# Adds two 32-bit integers. Returns 0 if no overflow 
3# else returns 1 
4# Bob Plantz - 13 June 2009 
5# Calling sequence: 
6#       rdx <- address of output
7#       esi <- 2nd int to be added
8#       edi <- 1st int to be added
9#       call    sumInts 
10#       returns 0 if no overflow, else returns 1 
11# Read only data 
12        .section  .rodata 
13overflow: 
14        .long   1            # 32 bits, because cmovo loads it into eax
15# Code 
16        .text 
17        .globl sumInts 
18        .type  sumInts, @function 
19sumInts: 
20        pushq   %rbp         # save callers frame pointer 
21        movq    %rsp, %rbp   # establish our frame pointer 
22 
23        movl    $0, %eax     # assume no overflow 
24        addl    %edi, %esi   # add values 
25        cmovo   overflow, %eax  # overflow occurred 
26        movl    %esi, (%rdx) # output sum 
27 
28        movq    %rbp, %rsp   # restore stack pointer 
29        popq    %rbp         # restore callers frame pointer 
30        ret
Listing 11.3: Accessing arguments in the sumInts function from Listing 11.1 (programmer assembly language)

The code to perform the addition and overflow check is much simpler.

23        movl    $0, %eax     # assume no overflow 
24        addl    %edi, %esi   # add values 
25        cmovo   overflow, %eax  # overflow occurred 
26        movl    %esi, (%rdx) # output sum

The body of the function begins by assuming there will not be overflow, so 0 is stored in eax, ready to be the return value. The value of the first argument is added to the second, because the programmer realizes that the values in the argument registers do not need to be saved. If this addition produces overflow, the cmovo instruction changes the return value to 1. Finally, in either case the sum is stored at the memory location whose address was passed to the function as the third argument.

11.2 More Than Six Arguments, 64-Bit Mode

When a calling function needs to pass more than six arguments to another function, the additional arguments beyond the first six are passed on the call stack. They are effectively pushed onto the stack in eight-byte chunks before the call. The order of pushing is from right to left in the C argument list. (As you will see shortly the compiler actually uses a more efficient method than pushes.) Since these arguments are on the call stack, they are within the called function’s stack frame, so the called function can access them.

Consider the program in Listing 11.4.

 
1/* 
2 * nineInts1.c 
3 * Declares and adds nine integers. 
4 * Bob Plantz - 13 June 2009 
5 */ 
6#include <stdio.h> 
7#include "sumNine1.h" 
8 
9int main(void) 
10{ 
11    int total; 
12    int a = 1; 
13    int b = 2; 
14    int c = 3; 
15    int d = 4; 
16    int e = 5; 
17    int f = 6; 
18    int g = 7; 
19    int h = 8; 
20    int i = 9; 
21 
22    total = sumNine(a, b, c, d, e, f, g, h, i); 
23    printf("The sum is %i\n", total); 
24    return 0; 
25}
 
1/* 
2 * sumNine1.h 
3 * Computes sum of nine integers. 
4 * Bob Plantz - 13 June 2009 
5 */ 
6#ifndef SUMNINE_H 
7#define SUMNINE_H 
8int sumNine(int one, int two, int three, int four, int five, 
9           int six, int seven, int eight, int nine); 
10#endif
 
1/* 
2 * sumNine1.c 
3 * Computes sum of nine integers. 
4 * Bob Plantz - 13 June 2009 
5 */ 
6#include <stdio.h> 
7#include "sumNine1.h" 
8 
9int sumNine(int one, int two, int three, int four, int five, 
10           int six, int seven, int eight, int nine) 
11{ 
12    int x; 
13 
14    x = one + two + three + four + five + six 
15            + seven + eight + nine; 
16    printf("sumNine done.\n"); 
17    return x; 
18}
Listing 11.4: Passing more than six arguments to a function (C). (There are three files here.)

The assembly language generated by gcc from the program in Listing 11.4 is shown in Listing 11.5, with comments added to explain parts of the code.

 
1        .file  "nineInts1.c" 
2        .section      .rodata 
3.LC0: 
4        .string "The sum is %i\n"
5        .text
6        .globl main
7        .type  main, @function 
8main: 
9        pushq  %rbp 
10        movq  %rsp, %rbp 
11        subq  $80, %rsp 
12        movl  $1, -40(%rbp) 
13        movl  $2, -36(%rbp) 
14        movl  $3, -32(%rbp) 
15        movl  $4, -28(%rbp) 
16        movl  $5, -24(%rbp) 
17        movl  $6, -20(%rbp) 
18        movl  $7, -16(%rbp) 
19        movl  $8, -12(%rbp) 
20        movl  $9, -8(%rbp) 
21        movl  -20(%rbp), %r9d   # f is 6th argument 
22        movl  -24(%rbp), %r8d   # e is 5th argument 
23        movl  -28(%rbp), %ecx   # d is 4th argument 
24        movl  -32(%rbp), %edx   # c is 3rd argument 
25        movl  -36(%rbp), %esi   # b is 2nd argument 
26        movl  -40(%rbp), %eax   # load a 
27        movl  -8(%rbp), %edi    # load i 
28        movl  %edi, 16(%rsp)    # insert on stack 
29        movl  -12(%rbp), %edi   # load h 
30        movl  %edi, 8(%rsp)     # insert on stack 
31        movl  -16(%rbp), %edi   # load g 
32        movl  %edi, (%rsp)      # insert on stack 
33        movl  %eax, %edi        # a is 1st argument 
34        call  sumNine 
35        movl  %eax, -4(%rbp) 
36        movl  -4(%rbp), %eax 
37        movl  %eax, %esi 
38        movl  $.LC0, %edi 
39        movl  $0, %eax 
40        call  printf 
41        movl  $0, %eax 
42        leave 
43        ret 
44        .size  main, .-main 
45        .ident "GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0"
46        .section      .note.GNU-stack,"",@progbits
 
1        .file  "sumNine1.c" 
2        .section      .rodata 
3.LC0: 
4        .string "sumNine done."
5        .text
6        .globl sumNine
7        .type  sumNine, @function 
8sumNine: 
9        pushq  %rbp 
10        movq  %rsp, %rbp 
11        subq  $48, %rsp 
12        movl  %edi, -20(%rbp)  # save one 
13        movl  %esi, -24(%rbp)  # save two 
14        movl  %edx, -28(%rbp)  # save three 
15        movl  %ecx, -32(%rbp)  # save four 
16        movl  %r8d, -36(%rbp)  # save five 
17        movl  %r9d, -40(%rbp)  # save six 
18        movl  -24(%rbp), %eax  # load two 
19        movl  -20(%rbp), %edx  # load one, subtotal 
20        addl  %eax, %edx       # add two to subtotal 
21        movl  -28(%rbp), %eax  # load three 
22        addl  %eax, %edx       # add to subtotal 
23        movl  -32(%rbp), %eax  # load four 
24        addl  %eax, %edx       # add to subtotal 
25        movl  -36(%rbp), %eax  # load five 
26        addl  %eax, %edx       # add to subtotal 
27        movl  -40(%rbp), %eax  # load six 
28        addl  %eax, %edx       # add to subtotal 
29        movl  16(%rbp), %eax   # load seven 
30        addl  %eax, %edx       # add to subtotal 
31        movl  24(%rbp), %eax   # load eight 
32        addl  %eax, %edx       # add to subtotal 
33        movl  32(%rbp), %eax   # load nine 
34        addl  %edx, %eax       # add to subtotal 
35        movl  %eax, -4(%rbp)   # x <- total 
36        movl  $.LC0, %edi 
37        call  puts 
38        movl  -4(%rbp), %eax 
39        leave 
40        ret 
41        .size  sumNine, .-sumNine 
42        .ident "GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0"
43        .section      .note.GNU-stack,"",@progbits
Listing 11.5: Passing more than six arguments to a function (gcc assembly language). (There are two files here.)

Before main calls sumNine, the values of the second through sixth arguments, b through f, are moved to the appropriate registers, and the first argument, a, is loaded into a temporary register:

21        movl  -20(%rbp), %r9d   # f is 6th argument 
22        movl  -24(%rbp), %r8d   # e is 5th argument 
23        movl  -28(%rbp), %ecx   # d is 4th argument 
24        movl  -32(%rbp), %edx   # c is 3rd argument 
25        movl  -36(%rbp), %esi   # b is 2nd argument 
26        movl  -40(%rbp), %eax   # load a

Then the values of the seventh, eighth, and ninth arguments, g through i, are moved to their appropriate locations on the call stack. Enough space was allocated at the beginning of the function to allow for these arguments. They are moved into their correct locations on lines 27 – 32:

27        movl  -8(%rbp), %edi    # load i 
28        movl  %edi, 16(%rsp)    # insert on stack 
29        movl  -12(%rbp), %edi   # load h 
30        movl  %edi, 8(%rsp)     # insert on stack 
31        movl  -16(%rbp), %edi   # load g 
32        movl  %edi, (%rsp)      # insert on stack

The stack pointer, rsp, is used as the reference point for storing the arguments on the stack here because the main function is starting a new stack frame for the function it is about to call, sumNine.

Then the first argument, a, is moved to the appropriate register:

33        movl  %eax, %edi        # a is 1st argument

When program control is transferred to the sumNine function, the partial stack frame appears as shown in Figure 11.2. Even though each argument is only four bytes (int), each is passed in an 8-byte portion of stack memory. Compare this with passing arguments in registers; only one data item is passed per register even if the data item does not take up the entire eight bytes in the register.


Figure 11.2: Arguments 7 – 9 are passed on the stack to the sumNine function. State of the stack when control is first transferred to this function.


The return address is at the top of the stack, immediately followed by the three arguments (beyond the six passed in registers). Notice that each argument is in the same position on the stack as it would have been if it had been pushed onto the stack just before the call instruction. Since the address in the stack pointer (rsp) was 16-byte aligned before the call to this function, and the call instruction pushed the 8-byte return address onto the stack, the address in rsp is now 8-byte aligned.
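The placement rules used by main can be summarized in a short C sketch. It covers only integer and pointer arguments, and the names argRegister, registerForArg, and callerStackOffset are invented for this illustration. The offsets are relative to rsp just before the call instruction executes; once the return address has been pushed, each one is 8 bytes larger.

```c
#include <stddef.h>

/* Registers used for the first six integer or pointer arguments,
 * in left-to-right order of the C argument list. */
static const char *const argRegister[6] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Returns the register that holds argument n (counting from 1),
 * or NULL if argument n is passed on the stack. */
const char *registerForArg(int n)
{
    return (n >= 1 && n <= 6) ? argRegister[n - 1] : NULL;
}

/* For n >= 7, returns the offset from rsp (just before the call) of the
 * eight-byte slot that holds argument n. Returns -1 if argument n is
 * passed in a register instead. */
int callerStackOffset(int n)
{
    return (n >= 7) ? 8 * (n - 7) : -1;
}
```

For the call in Listing 11.5 this gives offsets 0, 8, and 16 for g, h, and i, matching the stores to (%rsp), 8(%rsp), and 16(%rsp).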

The prologue of sumNine completes the stack frame. Then the function saves the register arguments in the register save area of the stack frame:

9        pushq  %rbp 
10        movq  %rsp, %rbp 
11        subq  $48, %rsp 
12        movl  %edi, -20(%rbp)  # save one 
13        movl  %esi, -24(%rbp)  # save two 
14        movl  %edx, -28(%rbp)  # save three 
15        movl  %ecx, -32(%rbp)  # save four 
16        movl  %r8d, -36(%rbp)  # save five 
17        movl  %r9d, -40(%rbp)  # save six

The state of the stack frame at this point is shown in Figure 11.3.


Figure 11.3: Arguments and local variables in the stack frame, sumNine function. The first six arguments are passed in registers but saved in the stack frame. Arguments beyond six are passed in the portion of the stack frame that is created by the calling function.


You may question why the compiler did not simply use the red zone. The sumNine function is not a leaf function. It calls another function, which may require use of the call stack. So space must be explicitly allocated on the call stack for local variables and the register argument save areas.

By the way, the compiler has replaced this function call, a call to printf, with a call to puts:

36        movl  $.LC0, %edi 
37        call  puts

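Since the only thing to be written to the screen is a text string that ends with a newline, the puts function is equivalent: puts writes the string and then appends the newline itself, which is why the string at .LC0 no longer contains \n.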

After the register arguments are safely stored in the argument save area, they can be easily summed and the total saved in the local variable:

18        movl  -24(%rbp), %eax  # load two 
19        movl  -20(%rbp), %edx  # load one, subtotal 
20        addl  %eax, %edx       # add two to subtotal 
21        movl  -28(%rbp), %eax  # load three 
22        addl  %eax, %edx       # add to subtotal 
23        movl  -32(%rbp), %eax  # load four 
24        addl  %eax, %edx       # add to subtotal 
25        movl  -36(%rbp), %eax  # load five 
26        addl  %eax, %edx       # add to subtotal 
27        movl  -40(%rbp), %eax  # load six 
28        addl  %eax, %edx       # add to subtotal 
29        movl  16(%rbp), %eax   # load seven 
30        addl  %eax, %edx       # add to subtotal 
31        movl  24(%rbp), %eax   # load eight 
32        addl  %eax, %edx       # add to subtotal 
33        movl  32(%rbp), %eax   # load nine 
34        addl  %edx, %eax       # add to subtotal 
35        movl  %eax, -4(%rbp)   # x <- total

Notice that the seventh, eighth, and ninth arguments are accessed by positive offsets from the frame pointer, rbp. They were stored in the stack frame by the calling function. The called function “owns” the entire stack frame so it does not need to make additional copies of these arguments.
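The positive offsets follow from simple arithmetic: the saved frame pointer occupies the eight bytes at the address in rbp, the return address occupies the next eight, and the stack-passed arguments follow in eight-byte slots. The C sketch below works this out; the function name calleeFrameOffset is made up for this illustration.

```c
/* Returns the offset from rbp, after the function prologue, of the
 * stack-passed argument n (counting from 1, so n >= 7).
 * Returns -1 if argument n is passed in a register instead. */
int calleeFrameOffset(int n)
{
    const int savedFramePointer = 8;   /* pushed by pushq %rbp */
    const int returnAddress = 8;       /* pushed by call */
    const int slotSize = 8;            /* one slot per argument */

    if (n < 7)
        return -1;
    return savedFramePointer + returnAddress + slotSize * (n - 7);
}
```

This yields 16, 24, and 32 for the seventh, eighth, and ninth arguments, which are the offsets used on lines 29, 31, and 33 of the sumNine function in Listing 11.5.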

It is important to realize that once the stack frame has been completed within a function, that area of the call stack cannot be treated as a stack. That is, it cannot be accessed through pushes and pops. It must be treated as a record. (You will learn more about records in Section 13.2, page 802.)

If we were to recompile these functions with higher levels of optimization, many of these assembly language operations would be removed (see Exercise 11-2). But the point here is to examine the mechanisms that can be used to work with arguments and to write easily read code, so we study the unoptimized code.

A version of this program written in assembly language is shown in Listing 11.6.

 
1# nineInts2.s 
2# Demonstrate how integral arguments are passed in 64-bit mode. 
3# Bob Plantz - 13 June 2009 
4# Bob Plantz - 5 November 2013 - deleted unneeded register usage (lines 48 - 53) 
5 
6# Stack frame 
7#   passing arguments on stack (rsp) 
8#     need 3x8 = 24 -> 32 bytes 
9        .equ    seventh,0 
10        .equ    eighth,8 
11        .equ    ninth,16 
12#   local vars (rbp) 
13#     need 10x4 = 40 -> 48 bytes 
14        .equ    i,-4 
15        .equ    h,-8 
16        .equ    g,-12 
17        .equ    f,-16 
18        .equ    e,-20 
19        .equ    d,-24 
20        .equ    c,-28 
21        .equ    b,-32 
22        .equ    a,-36 
23        .equ    total,-40 
24        .equ    localSize,-80 
25# Read only data 
26        .section  .rodata 
27format: 
28        .string "The sum is %i\n" 
29# Code 
30        .text 
31        .globl  main 
32        .type   main, @function 
33main: 
34        pushq  %rbp              # save callers base pointer 
35        movq  %rsp, %rbp        # establish ours 
36        addq  $localSize, %rsp  # space for local variables 
37                                  #  + argument passing 
38        movl  $1, a(%rbp)       # initialize local variables 
39        movl  $2, b(%rbp)       # etc... 
40        movl  $3, c(%rbp) 
41        movl  $4, d(%rbp) 
42        movl  $5, e(%rbp) 
43        movl  $6, f(%rbp) 
44        movl  $7, g(%rbp) 
45        movl  $8, h(%rbp) 
46        movl  $9, i(%rbp) 
47 
48        movl  i(%rbp), %eax       # load i 
49        movl  %eax, ninth(%rsp)   #   9th argument 
50        movl  h(%rbp), %eax       # load h 
51        movl  %eax, eighth(%rsp)  #   8th argument 
52        movl  g(%rbp), %eax       # load g 
53        movl  %eax, seventh(%rsp) #   7th argument 
54        movl  f(%rbp), %r9d       # f is 6th 
55        movl  e(%rbp), %r8d       # e is 5th 
56        movl  d(%rbp), %ecx       # d is 4th 
57        movl  c(%rbp), %edx       # c is 3rd 
58        movl  b(%rbp), %esi       # b is 2nd 
59        movl  a(%rbp), %edi       # a is 1st 
60        call  sumNine 
61        movl  %eax, total(%rbp)   # total = sumNine(...)
62 
63        movl    total(%rbp), %esi 
64        movl    $format, %edi 
65        movl    $0, %eax 
66        call    printf 
67 
68        movl  $0, %eax          # return 0; 
69        movq    %rbp, %rsp        # delete locals 
70        popq    %rbp              # restore callers base pointer 
71        ret                       # back to OS
 
1# sumNine2.s 
2# Sums nine integer arguments and returns the total. 
3# Bob Plantz - 13 June 2009 
4 
5# Stack frame 
6#    arguments already in stack frame 
7        .equ    seven,16 
8        .equ    eight,24 
9        .equ    nine,32 
10#    local variables 
11        .equ    total,-4 
12        .equ    localSize,-16 
13# Read only data 
14        .section  .rodata 
15doneMsg: 
16        .string "sumNine done"
17# Code 
18        .text 
19        .globl  sumNine 
20        .type   sumNine, @function 
21sumNine: 
22        pushq  %rbp               # save callers base pointer 
23        movq  %rsp, %rbp         # set our base pointer 
24        addq    $localSize, %rsp   # for local variables 
25 
26        addl  %esi, %edi         # add two to one 
27        addl  %edx, %edi         # plus three
28        addl  %ecx, %edi         # plus four
29        addl  %r8d, %edi         # plus five 
30        addl  %r9d, %edi         # plus six 
31        addl  seven(%rbp), %edi  # plus seven 
32        addl  eight(%rbp), %edi  # plus eight 
33        addl  nine(%rbp), %edi   # plus nine 
34        movl  %edi, total(%rbp)  # save total 
35 
36        movl  $doneMsg, %edi 
37        call  puts 
38 
39        movl    total(%rbp), %eax  # return total; 
40        movq    %rbp, %rsp         # delete local vars. 
41        popq    %rbp               # restore callers base pointer 
42        ret
Listing 11.6: Passing more than six arguments to a function (programmer assembly language). (There are two files here.)

The assembly language programmer realizes that all nine integers can be summed in the sumNine function before it calls another function. In addition, none of the values will be needed after this summation. So there is no reason to store the register arguments locally:

26        addl  %esi, %edi         # add two to one 
27        addl  %edx, %edi         # plus three
28        addl  %ecx, %edi         # plus four
29        addl  %r8d, %edi         # plus five 
30        addl  %r9d, %edi         # plus six 
31        addl  seven(%rbp), %edi  # plus seven 
32        addl  eight(%rbp), %edi  # plus eight 
33        addl  nine(%rbp), %edi   # plus nine

However, the edi register will be needed for passing an argument to puts, so the total is saved in a local variable in the stack frame:

34        movl  %edi, total(%rbp)  # save total

Then it is loaded into eax for return to the calling function:

39        movl    total(%rbp), %eax  # return total;

The overall pattern of a stack frame is shown in Figure 11.4. The rbp register serves as the frame pointer to the stack frame. Once the frame pointer address has been established in a function, its value must never be changed. The return address is always located +8 bytes offset from the frame pointer. Arguments to the function are positive offsets from the frame pointer, and local variables are negative offsets from the frame pointer.


Figure 11.4: Overall layout of the stack frame.


It is essential that you follow the register usage and argument passing disciplines precisely. Any deviation can cause errors that are very difficult to debug.

  1. In the calling function:
    1. Assume that the values in the rax, rcx, rdx, rsi, rdi and r8 – r11 registers will be changed by the called function.
    2. The first six arguments are passed in the rdi, rsi, rdx, rcx, r8, and r9 registers in left-to-right order.
    3. Arguments beyond six are stored on the stack as though they had been pushed onto the stack in right-to-left order.
    4. Use the call instruction to invoke the function you wish to call.
  2. Upon entering the called function:
    1. Save the caller’s frame pointer by pushing rbp onto the stack.
    2. Establish a new frame pointer at the current top of stack by copying rsp to rbp.
    3. Allocate space on the stack for all the local variables, plus any required register save space, by subtracting the number of bytes required from rsp; this value must be a multiple of sixteen.
    4. If a called function changes any of the values in the rbx, rbp, rsp, or r12 – r15 registers, they must be saved in the register save area, then restored before returning to the calling function.
    5. If the function calls another function, save the arguments passed in registers on the stack.
  3. Within the called function:
    1. rsp points to the current top of the stack, which is the lowest address in this function's stack frame. Observe the usual stack discipline (see §8.2). In particular, DO NOT use the stack pointer to access arguments or local variables.
    2. Arguments passed in registers to the function and saved on the stack are accessed by negative offsets from the frame pointer, rbp.
    3. Arguments passed on the stack to the function are accessed by positive offsets from the frame pointer, rbp.
    4. Local variables are accessed by negative offsets from the frame pointer, rbp.
  4. When leaving the called function:
    1. Place the return value, if any, in eax (or in rax for a 64-bit value).
    2. Restore the values in the rbx, rbp, rsp, and r12 – r15 registers from the register save area in the stack frame.
    3. Delete the local variable space and register save area by copying rbp to rsp.
    4. Restore the caller’s frame pointer by popping rbp off the stack save area.
    5. Return to calling function with ret.

The best way to design a stack frame for a function is to make a drawing on paper following the pattern in Figure 11.3. Show all the local variables and arguments to the function. To be safe, assume that all the register-passed arguments will be saved in the function. Compute and write down all the offset values on your drawing. When writing the source code for your function, use the .equ directive to give meaningful names to each of the numerical offsets. If you do this planning before writing the executable code, you can simply use the name(%rbp) syntax to access the value stored at name.
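The size calculation in this planning step is easy to get wrong, so it can help to check it with a few lines of C. The sketch below is only an aid for the arithmetic; the names roundUp16 and frameSize are made up here. It rounds each area up to a multiple of sixteen, as the comments in Listing 11.6 do.

```c
/* Rounds a non-negative byte count up to the next multiple of 16. */
int roundUp16(int bytes)
{
    return (bytes + 15) & ~15;
}

/* Number of bytes to subtract from rsp in the prologue: the local
 * variable area plus the area for arguments passed on the stack to
 * other functions, each rounded up to a multiple of 16. */
int frameSize(int localBytes, int stackArgBytes)
{
    return roundUp16(localBytes) + roundUp16(stackArgBytes);
}
```

For main in Listing 11.6, the ten 4-byte local variables need 40 bytes, which rounds up to 48, and the three 8-byte stack arguments need 24 bytes, which rounds up to 32. The total is 80, which is why localSize is defined as -80.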

11.3 Interface Between Functions, 32-Bit Mode

In 32-bit mode, all arguments are passed on the call stack. The 32-bit assembly language generated by gcc is shown in Listing 11.7.

 
1        .file  "nineInts1.c" 
2        .section      .rodata 
3.LC0: 
4        .string "The sum is %i\n"
5        .text
6        .globl main
7        .type  main, @function 
8main: 
9        pushl  %ebp 
10        movl  %esp, %ebp 
11        andl  $-16, %esp 
12        subl  $96, %esp 
13        movl  $1, 56(%esp) 
14        movl  $2, 60(%esp) 
15        movl  $3, 64(%esp) 
16        movl  $4, 68(%esp) 
17        movl  $5, 72(%esp) 
18        movl  $6, 76(%esp) 
19        movl  $7, 80(%esp) 
20        movl  $8, 84(%esp) 
21        movl  $9, 88(%esp) 
22        movl  88(%esp), %eax   # load i 
23        movl  %eax, 32(%esp)   # store in stack frame 
24        movl  84(%esp), %eax   # load h 
25        movl  %eax, 28(%esp)   # store in stack frame 
26        movl  80(%esp), %eax   # load g 
27        movl  %eax, 24(%esp)   # etc.... 
28        movl  76(%esp), %eax   # load f 
29        movl  %eax, 20(%esp) 
30        movl  72(%esp), %eax   # load e 
31        movl  %eax, 16(%esp) 
32        movl  68(%esp), %eax   # load d 
33        movl  %eax, 12(%esp) 
34        movl  64(%esp), %eax   # load c 
35        movl  %eax, 8(%esp) 
36        movl  60(%esp), %eax   # load b 
37        movl  %eax, 4(%esp) 
38        movl  56(%esp), %eax   # load a 
39        movl  %eax, (%esp)     # store in stack frame 
40        call  sumNine 
41        movl  %eax, 92(%esp)   # total <- sum 
42        movl  92(%esp), %eax 
43        movl  %eax, 4(%esp) 
44        movl  $.LC0, (%esp) 
45        call  printf 
46        movl  $0, %eax 
47        leave 
48        ret 
49        .size  main, .-main 
50        .ident "GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0" 
51        .section      .note.GNU-stack,"",@progbits
 
1        .file  "sumNine1.c" 
2        .section      .rodata 
3 .LC0: 
4        .string "sumNine done." 
5        .text 
6        .globl sumNine 
7        .type  sumNine, @function 
8 sumNine: 
9        pushl  %ebp 
10        movl  %esp, %ebp 
11        subl  $40, %esp 
12        movl  12(%ebp), %eax   # load two 
13        movl  8(%ebp), %edx    # load one, subtotal 
14        addl  %eax, %edx       # add two 
15        movl  16(%ebp), %eax   # load three 
16        addl  %eax, %edx       # add to subtotal 
17        movl  20(%ebp), %eax   # load four 
18        addl  %eax, %edx       # etc... 
19        movl  24(%ebp), %eax   # load five 
20        addl  %eax, %edx 
21        movl  28(%ebp), %eax   # load six 
22        addl  %eax, %edx 
23        movl  32(%ebp), %eax   # load seven 
24        addl  %eax, %edx 
25        movl  36(%ebp), %eax   # load eight 
26        addl  %eax, %edx 
27        movl  40(%ebp), %eax   # load nine 
28        addl  %edx, %eax       # total 
29        movl  %eax, -12(%ebp)  # x <- total 
30        movl  $.LC0, (%esp) 
31        call  puts 
32        movl  -12(%ebp), %eax  # return x; 
33        leave 
34        ret 
35        .size  sumNine, .-sumNine 
36        .ident "GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0" 
37        .section      .note.GNU-stack,"",@progbits
Listing 11.7: Passing more than six arguments to a function (gcc assembly language, 32-bit). (There are two files here.)

The argument passing sequence can be seen on lines 22 – 39 in the main function. Rather than pushing each argument onto the stack, the compiler has used the technique of allocating space on the stack for the arguments, then storing each argument directly in the appropriate location. The result is the same as if they had been pushed onto the stack, but the direct storage technique is more efficient.

I find it odd that the compiler writer has chosen to set up a base pointer in ebp but not used it in this function. This is NOT a recommended technique when writing in assembly language.

The state of the call stack just before calling the sumNine function is shown in Figure 11.5. Comparing this with the 64-bit version in Figure 11.3, we see that the local variables are treated in essentially the same way. But the 32-bit version differs in the way it passes arguments: all nine are stored on the stack, each in a 4-byte block, rather than the first six being passed in registers.



Figure 11.5: Calling function’s stack frame, 32-bit mode. Local variables are accessed relative to the frame pointer (ebp register). In this example, they are all 4-byte values. Arguments are accessed relative to the stack pointer (esp register). Arguments are passed in 4-byte blocks.


11.4 Instructions Introduced Thus Far

This summary shows the assembly language instructions introduced thus far in the book. The page number where the instruction is explained in more detail, which may be in a subsequent chapter, is also given. This book provides only an introduction to the usage of each instruction. You need to consult the manuals ([2] – [6], [14] – [18]) in order to learn all the possible uses of the instructions.

11.4.1 Instructions

data movement:

opcode   source          destination   action                            page
cbtw                                   convert byte to word, al → ax     696
cwtl                                   convert word to long, ax → eax    696
cltq                                   convert long to quad, eax → rax   696
cmovcc   %reg/mem        %reg          conditional move                  706
movs     $imm/%reg       %reg/mem      move                              506
movs     mem             %reg          move                              506
movsss   $imm/%reg       %reg/mem      move, sign extend                 693
movzss   $imm/%reg       %reg/mem      move, zero extend                 693
popw                     %reg/mem      pop from stack                    566
pushw    $imm/%reg/mem                 push onto stack                   566

s = b, w, l, q; w = l, q; cc = condition codes

arithmetic/logic:

opcode   source      destination   action                   page
adds     $imm/%reg   %reg/mem      add                      607
adds     mem         %reg          add                      607
cmps     $imm/%reg   %reg/mem      compare                  676
cmps     mem         %reg          compare                  676
decs                 %reg/mem      decrement                699
incs                 %reg/mem      increment                698
leaw     mem         %reg          load effective address   579
subs     $imm/%reg   %reg/mem      subtract                 612
subs     mem         %reg          subtract                 612
tests    $imm/%reg   %reg/mem      test bits                676
tests    mem         %reg          test bits                676

s = b, w, l, q; w = l, q

program flow control:

opcode    location   action                             page
call      label      call function                      546
ja        label      jump above (unsigned)              683
jae       label      jump above/equal (unsigned)        683
jb        label      jump below (unsigned)              683
jbe       label      jump below/equal (unsigned)        683
je        label      jump equal                         679
jg        label      jump greater than (signed)         686
jge       label      jump greater than/equal (signed)   686
jl        label      jump less than (signed)            686
jle       label      jump less than/equal (signed)      686
jmp       label      jump                               691
jne       label      jump not equal                     679
jno       label      jump no overflow                   679
jcc       label      jump on condition codes            679
leave                undo stack frame                   580
ret                  return from function               583
syscall              call kernel function               587

cc = condition codes

11.4.2 Addressing Modes

__________________________________________________________

register direct:

The data value is located in a CPU register.

syntax: name of the register with a “%” prefix.

example: movl    %eax, %ebx



immediate data:

The data value is located immediately after the instruction. Source operand only.

syntax: data value with a “$” prefix.

example: movl    $0xabcd1234, %ebx



base register plus offset:

The data value is located in memory. The address of the memory location is the sum of a value in a base register plus an offset value.

syntax: use the name of the register with parentheses around the name and the offset value immediately before the left parenthesis.

example: movl    $0xaabbccdd, 12(%eax)



rip-relative:

The target is a memory address determined by adding an offset to the current address in the rip register.

syntax: a programmer-defined label

example: je     somePlace



11.5 Exercises

11-1

(§11.2) Enter the program in Listing 11.6. Single-step through the program with gdb and record the changes in the rsp and rip registers and the changes in the stack on paper. Use drawings similar to Figure 11.3.

Note: Each of the two functions should be in its own source file. You can single-step into the subfunction with gdb at the call instruction in main, then single-step back into main at the ret instruction in addConst.

11-2

(§11.2) Enter the C program in Listing 11.4. Using the “-S” compiler option, compile it with differing levels of optimization, i.e., “-O1, -O2, -O3,” and discuss the assembly language that is generated. Is the optimized code easier or more difficult to read?

11-3

(§11.2, §10.1) Write the function, writeStr, in assembly language. The function takes one argument, a char *, which is a pointer to a C-style text string. It displays the text string on the screen. It returns the number of characters displayed.

Demonstrate that your function works correctly by writing a main function that calls writeStr to display “Hello world” on the screen.

Note that the main function will not do anything with the character count that is returned by writeStr.

11-4

(§11.2, §10.1) Write the function, readLn, in assembly language. The function takes one argument, a char *, which is a pointer to a char array for storing a text string. It reads characters from the keyboard and stores them in the array as a C-style text string. It does not store the ’\n’ character. It returns the number of characters, excluding the NUL character, that were stored in the array.

Demonstrate that your function works correctly by writing a main function that prompts the user to enter a text string, then echoes the user’s input.

When testing your program, be careful not to enter more characters than the allocated space. Explain what would occur if you did enter too many characters.

Note that the main function will not do anything with the character count that is returned by readLn.

11-5

(§11.2, §10.1) Write a program in assembly language that

a) prompts the user to enter any text string,
b) reads the entered text string, and
c) echoes the user’s input.

Use the writeStr function from Exercise 11-3 and the readLn function from Exercise 11-4 to implement the user interface in this program.

11-6

(§11.2, §10.1) Modify the readLn function in Exercise 11-4 so that it takes a second argument, the maximum length of the text string, including the NUL character. Excess characters entered by the user are discarded.