11 Writing Your Own Functions

Good software engineering practice generally includes breaking problems down into functionally distinct subproblems. This leads to software solutions with many functions, each of which solves a subproblem. This “divide and conquer” approach has some distinct advantages:

It is easier to solve a small subproblem.
Previous solutions to subproblems are often reusable.
Several people can work on different parts of the overall problem simultaneously.

The main disadvantage of breaking a problem down like this is coordinating the many subsolutions so that they work together correctly to provide a correct overall solution. In software, this translates to making sure that the interface between a calling function and a called function works correctly. In order to ensure correct operation of the interface, it must be specified in a very explicit way.

In Chapter 8 you learned how to pass arguments into a function and call it. In this chapter you will learn how to use these arguments inside the called function.

11.1 Overview of Passing Arguments

Be careful to distinguish data input/output to/from a called function from user input/output. User input typically comes from an input device (keyboard, mouse, etc.) and user output is typically sent to an output device (screen, printer, speaker, etc.).

Functions can interact with the data in other parts of the program in three ways:

Input. The data comes from another part of the program and is used by the function, but is not modified by it.
Output. The function provides new data to another part of the program.
Update. The function modifies a data item that is held by another part of the program. The new value is based on the value before the function was called.

All three interactions can be performed if the called function also knows the location of the data item. This can be done by the calling function passing the address to the called function or by making the address globally known to both functions. Updates require that the address be known by the called function.

Outputs can also be implemented by placing the new data item in a location that is accessible to both the called and the calling function. In C/C++ this is done by placing the return value from a function in the eax register (rax for a 64-bit value). Inputs can be implemented by passing a copy of the data item to the called function. In both of these cases the called function does not know the location of the original data item, and thus does not have access to it.

In addition to global data, C syntax allows three ways for functions to exchange data:

Pass by value — an input value is passed by making a copy of it available to the function.
Return value — an output value can be returned to the calling function.
Pass by pointer — an output value can be stored for the calling function by passing the address where the output value should be stored to the called function. This can also be used to update a data item.

The last method, pass by pointer, can also be used to pass large inputs, or to pass inputs that should be changed — also called updates. It is also the method by which C++ implements pass by reference.

When one function calls another, the information that is required to provide the interface between the two is called an activation record. Since both the registers and the call stack are common to all the functions within a program, both the calling function and the called function have access to them. So arguments can be passed either in registers or on the call stack. Of course, the called function must know exactly where each of the arguments is located when program flow transfers to it.

In principle, the locations of arguments need only be consistent within a program. As long as all the programmers working on the program observe the same rules, everything should work. However, designing a good set of rules for any real-world project is a very time-consuming process. Fortunately, the ABI [25] for the x86-64 architecture specifies a good set of rules. These rules are very tedious because they are meant to cover all possible situations. In this book we will consider only the simpler rules in order to get an overall picture of how this works.

In 64-bit mode six of the general purpose registers and a portion of the call stack are used for the activation record. The area of the stack used for the activation record is called a stack frame. Within any function, the stack frame contains the following information:

Arguments (in excess of six) passed from the calling function.
The return address back to the calling function.
The calling function’s frame pointer.
Local variables for the current function.

and often includes:

Copies of arguments passed in registers.
Copies of values in the registers that must be preserved by a function — rbx, r12 – r15.

Some general memory usage rules (64-bit mode) are:

Each argument is passed within an 8-byte unit. For example, passing three char values requires three registers. This 8-byte rule also applies to arguments passed on the stack.
Local variables can be allocated to take up only the amount of memory they require. For example, three char values can be accommodated in a three-byte memory area.
The address in the frame pointer (rbp register) must always be a multiple of sixteen. It should never be changed within a function, except during the prologue and epilogue.
The address in the stack pointer (rsp register) must always be a multiple of sixteen before transferring program ﬂow to another function.

We can see how this works by studying the program in Listing 11.1.

1/*
2 * addProg.c
3 * Adds two integers
4 * Bob Plantz - 13 June 2009
5 */
6
7#include <stdio.h>
8#include "sumInts1.h"
9
10int main(void)
11{
12    int x, y, z;
13    int overflow;
14
15    printf("Enter two integers: ");
16    scanf("%i %i", &x, &y);
17    overflow = sumInts(x, y, &z);
18    printf("%i + %i = %i\n", x, y, z);
19    if (overflow)
20        printf("*** Overflow occurred ***\n");
21
22    return 0;
23}

1/*
2 * sumInts1.h
3 * Adds two integers and outputs their sum.
4 * Bob Plantz - 4 June 2008
5 */
6
7#ifndef SUMINTS1_H
8#define SUMINTS1_H
9int sumInts(int, int, int *);
10#endif

1/*
2 * sumInts1.c
3 * Adds two integers and outputs their sum.
4 * Returns 0 if no overflow, else returns 1.
5 * Bob Plantz - 13 June 2009
6 */
7
8#include "sumInts1.h"
9
10int sumInts(int a, int b, int *sum)
11{
12    int overflow = 0;   // assume no overflow
13
14    *sum = a + b;
15
16    if (((a > 0) && (b > 0) && (*sum < 0)) ||
17            ((a < 0) && (b < 0) && (*sum > 0)))
18    {
19        overflow = 1;
20    }
21    return overflow;
22}

Listing 11.1: Passing arguments to a function (C). (There are three files here.)

The compiler-generated assembly language for the sumInts function is shown in Listing 11.2 with comments added.

1        .file  "sumInts1.c"
2        .text
3        .globl  sumInts
4        .type  sumInts, @function
5sumInts:
6        pushq  %rbp
7        movq  %rsp, %rbp
8        movl  %edi, -20(%rbp)   # save a
9        movl  %esi, -24(%rbp)   # save b
10        movq  %rdx, -32(%rbp)   # save pointer to sum
11        movl  $0, -4(%rbp)      # overflow = 0;
12        movl  -24(%rbp), %eax   # load b
13        movl  -20(%rbp), %edx   # load a
14        addl  %eax, %edx        # add them
15        movq  -32(%rbp), %rax   # load address of sum
16        movl  %edx, (%rax)      # *sum = a + b;
17        cmpl  $0, -20(%rbp)
18        jle    .L2
19        cmpl  $0, -24(%rbp)
20        jle    .L2
21        movq  -32(%rbp), %rax
22        movl  (%rax), %eax
23        testl  %eax, %eax
24        js    .L3
25.L2:
26        cmpl  $0, -20(%rbp)
27        jns    .L4
28        cmpl  $0, -24(%rbp)
29        jns    .L4
30        movq  -32(%rbp), %rax
31        movl  (%rax), %eax
32        testl  %eax, %eax
33        jle    .L4
34.L3:
35        movl  $1, -4(%rbp)
36.L4:
37        movl  -4(%rbp), %eax   # return overflow;
38        popq  %rbp
39        ret
40        .size  sumInts, .-sumInts
41        .ident  "GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0"
42        .section      .note.GNU-stack,"",@progbits

Listing 11.2: Accessing arguments in the sumInts function from Listing 11.1 (gcc assembly language).

As we go through this description, it is very easy to confuse the frame pointer (rbp register) and the stack pointer (rsp register). Each is used to access a different area of the stack.

The frame pointer (rbp register) remains unchanged throughout the body of a function. It is used to access the area of the stack that belongs to the current function, including local variables and arguments passed into the current function.
The stack pointer (rsp register) can be changed. It is used to create a new stack frame for a function about to be called, including storing the return address and passing arguments beyond the first six.

After saving the caller’s frame pointer and establishing its own frame pointer, this function stores the argument values in the local variable area:

5sumInts:
6        pushq  %rbp
7        movq  %rsp, %rbp
8        movl  %edi, -20(%rbp)  # save a
9        movl  %esi, -24(%rbp)  # save b
10        movq  %rdx, -32(%rbp)  # save pointer to sum
11        movl  $0, -4(%rbp)     # overflow = 0;

The arguments are in the following registers (see Table 8.2, page 548):

a is in edi.
b is in esi.
The pointer to sum is in rdx.

Storing them in the local variable area frees up the registers so they can be used in this function. Although this is not very efficient, the compiler does not need to work very hard to optimize register usage within the function. The only local variable, overflow, is initialized on line 11.

The observant reader will note that no memory has been allocated on the stack for local variables or saving the arguments. The ABI [25] defines the 128 bytes beyond the stack pointer (that is, the 128 bytes at addresses lower than the one in the rsp register) as a red zone. The operating system is not allowed to use this area, so the function can use it for temporary storage of values that do not need to be saved when another function is called. In particular, leaf functions can store local variables in this area without moving the stack pointer because they do not call other functions.

Notice that both the argument save area and the local variable area are aligned on 16-byte address boundaries. Figure 11.1 provides a pictorial view of where the three arguments and the local variable are in the red zone.

Figure 11.1: Arguments and local variables in the stack frame, sumInts function. The two input values and the address for the output are passed in registers, then stored in the Argument Save Area by the called function. Since this is a leaf function, the Red Zone is used for this function’s stack frame.

As you know, some functions take a variable number of arguments. In these functions, the ABI [25] specifies the relative offsets of the register save area. The offsets are shown in Table 11.1.

Register	Offset

rdi	0
rsi	8
rdx	16
rcx	24
r8	32
r9	40
xmm0	48
xmm1	64
…	…
xmm15	288

Table 11.1: Argument register save area in stack frame. These relative offsets should be used in functions with a variable number of arguments.

One of the problems with the C version of sumInts is that it requires a separate check for overflow:

16    if (((a > 0) && (b > 0) && (*sum < 0)) ||
17            ((a < 0) && (b < 0) && (*sum > 0)))
18    {
19        overflow = 1;
20    }

Writing the function in assembly language allows us to directly check the overflow flag, as shown in Listing 11.3.

1# sumInts.s
2# Adds two 32-bit integers. Returns 0 if no overflow
3# else returns 1
4# Bob Plantz - 13 June 2009
5# Calling sequence:
6#       rdx <- address of output
7#       edi <- 1st int to be added
8#       esi <- 2nd int to be added
9#       call    sumInts
10#       returns 0 if no overflow, else returns 1
11# Read only data
12        .section  .rodata
13overflow:
14        .long   1            # 32-bit value to match cmovo's 32-bit operand
15# Code
16        .text
17        .globl sumInts
18        .type  sumInts, @function
19sumInts:
20        pushq   %rbp         # save caller’s frame pointer
21        movq    %rsp, %rbp   # establish our frame pointer
22
23        movl    $0, %eax     # assume no overflow
24        addl    %edi, %esi   # add values
25        cmovo   overflow, %eax  # overflow occurred
26        movl    %esi, (%rdx) # output sum
27
28        movq    %rbp, %rsp   # restore stack pointer
29        popq    %rbp         # restore caller’s frame pointer
30        ret

Listing 11.3: Accessing arguments in the sumInts function from Listing 11.1 (programmer assembly language)

The code to perform the addition and overflow check is much simpler.

23        movl    $0, %eax     # assume no overflow
24        addl    %edi, %esi   # add values
25        cmovo   overflow, %eax  # overflow occurred
26        movl    %esi, (%rdx) # output sum

The body of the function begins by assuming there will not be overflow, so 0 is stored in eax, ready to be the return value. The value of the first argument is added to the second, because the programmer realizes that the values in the argument registers do not need to be saved. If this addition produces overflow, the cmovo instruction changes the return value to 1. Finally, in either case the sum is stored at the memory location whose address was passed to the function as the third argument.

11.2 More Than Six Arguments, 64-Bit Mode

When a calling function needs to pass more than six arguments to another function, the additional arguments beyond the first six are passed on the call stack. They are effectively pushed onto the stack in eight-byte chunks before the call. The order of pushing is from right to left in the C argument list. (As you will see shortly, the compiler actually uses a more efficient method than pushes.) Since these arguments are on the call stack, they are within the called function’s stack frame, so the called function can access them.

Consider the program in Listing 11.4.

/*
 * nineInts1.c
 * Declares and adds nine integers.
 * Bob Plantz - 13 June 2009
 */
#include <stdio.h>
#include "sumNine1.h"

int main(void)
{
    int total;
    int a = 1;
    int b = 2;
    int c = 3;
    int d = 4;
    int e = 5;
    int f = 6;
    int g = 7;
    int h = 8;
    int i = 9;

    total = sumNine(a, b, c, d, e, f, g, h, i);
    printf("The sum is %i\n", total);
    return 0;
}

/*
 * sumNine1.h
 * Computes sum of nine integers.
 * Bob Plantz - 13 June 2009
 */
#ifndef SUMNINE_H
#define SUMNINE_H
int sumNine(int one, int two, int three, int four, int five,
            int six, int seven, int eight, int nine);
#endif

/*
 * sumNine1.c
 * Computes sum of nine integers.
 * Bob Plantz - 13 June 2009
 */
#include <stdio.h>
#include "sumNine1.h"

int sumNine(int one, int two, int three, int four, int five,
            int six, int seven, int eight, int nine)
{
    int x;

    x = one + two + three + four + five + six
            + seven + eight + nine;
    printf("sumNine done.\n");
    return x;
}

Listing 11.4: Passing more than six arguments to a function (C). (There are three files here.)

The assembly language generated by gcc from the program in Listing 11.4 is shown in Listing 11.5, with comments added to explain parts of the code.

1        .file  "nineInts1.c"
2        .section      .rodata
3.LC0:
4        .string "The sum is %i\n"
5        .text
6        .globl  main
7        .type  main, @function
8main:
9        pushq  %rbp
10        movq  %rsp, %rbp
11        subq  $80, %rsp
12        movl  $1, -40(%rbp)
13        movl  $2, -36(%rbp)
14        movl  $3, -32(%rbp)
15        movl  $4, -28(%rbp)
16        movl  $5, -24(%rbp)
17        movl  $6, -20(%rbp)
18        movl  $7, -16(%rbp)
19        movl  $8, -12(%rbp)
20        movl  $9, -8(%rbp)
21        movl  -20(%rbp), %r9d   # f is 6th argument
22        movl  -24(%rbp), %r8d   # e is 5th argument
23        movl  -28(%rbp), %ecx   # d is 4th argument
24        movl  -32(%rbp), %edx   # c is 3rd argument
25        movl  -36(%rbp), %esi   # b is 2nd argument
26        movl  -40(%rbp), %eax   # load a
27        movl  -8(%rbp), %edi    # load i
28        movl  %edi, 16(%rsp)    # insert on stack
29        movl  -12(%rbp), %edi   # load h
30        movl  %edi, 8(%rsp)     # insert on stack
31        movl  -16(%rbp), %edi   # load g
32        movl  %edi, (%rsp)      # insert on stack
33        movl  %eax, %edi        # a is 1st argument
34        call  sumNine
35        movl  %eax, -4(%rbp)
36        movl  -4(%rbp), %eax
37        movl  %eax, %esi
38        movl  $.LC0, %edi
39        movl  $0, %eax
40        call  printf
41        movl  $0, %eax
42        leave
43        ret
44        .size  main, .-main
45        .ident  "GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0"
46        .section      .note.GNU-stack,"",@progbits

1        .file  "sumNine1.c"
2        .section      .rodata
3.LC0:
4        .string "sumNine done."
5        .text
6        .globl  sumNine
7        .type  sumNine, @function
8sumNine:
9        pushq  %rbp
10        movq  %rsp, %rbp
11        subq  $48, %rsp
12        movl  %edi, -20(%rbp)  # save one
13        movl  %esi, -24(%rbp)  # save two
14        movl  %edx, -28(%rbp)  # save three
15        movl  %ecx, -32(%rbp)  # save four
16        movl  %r8d, -36(%rbp)  # save five
17        movl  %r9d, -40(%rbp)  # save six
18        movl  -24(%rbp), %eax  # load two
19        movl  -20(%rbp), %edx  # load one, subtotal
20        addl  %eax, %edx       # add two to subtotal
21        movl  -28(%rbp), %eax  # load three
22        addl  %eax, %edx       # add to subtotal
23        movl  -32(%rbp), %eax  # load four
24        addl  %eax, %edx       # add to subtotal
25        movl  -36(%rbp), %eax  # load five
26        addl  %eax, %edx       # add to subtotal
27        movl  -40(%rbp), %eax  # load six
28        addl  %eax, %edx       # add to subtotal
29        movl  16(%rbp), %eax   # load seven
30        addl  %eax, %edx       # add to subtotal
31        movl  24(%rbp), %eax   # load eight
32        addl  %eax, %edx       # add to subtotal
33        movl  32(%rbp), %eax   # load nine
34        addl  %edx, %eax       # add to subtotal
35        movl  %eax, -4(%rbp)   # x <- total
36        movl  $.LC0, %edi
37        call  puts
38        movl  -4(%rbp), %eax
39        leave
40        ret
41        .size  sumNine, .-sumNine
42        .ident  "GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0"
43        .section      .note.GNU-stack,"",@progbits

Listing 11.5: Passing more than six arguments to a function (gcc assembly language). (There are two files here.)

Before main calls sumNine, the values of the second through sixth arguments, b – f, are moved to the appropriate registers, and the first argument, a, is loaded into a temporary register:

21        movl  -20(%rbp), %r9d   # f is 6th argument
22        movl  -24(%rbp), %r8d   # e is 5th argument
23        movl  -28(%rbp), %ecx   # d is 4th argument
24        movl  -32(%rbp), %edx   # c is 3rd argument
25        movl  -36(%rbp), %esi   # b is 2nd argument
26        movl  -40(%rbp), %eax   # load a

Then the values of the seventh, eighth, and ninth arguments, g – i, are moved to their appropriate locations on the call stack. Enough space was allocated at the beginning of the function to allow for these arguments. They are moved into their correct locations on lines 27 – 32:

27        movl  -8(%rbp), %edi    # load i
28        movl  %edi, 16(%rsp)    # insert on stack
29        movl  -12(%rbp), %edi   # load h
30        movl  %edi, 8(%rsp)     # insert on stack
31        movl  -16(%rbp), %edi   # load g
32        movl  %edi, (%rsp)      # insert on stack

The stack pointer, rsp, is used as the reference point for storing the arguments on the stack here because the main function is starting a new stack frame for the function it is about to call, sumNine.

Then the first argument, a, is moved to the appropriate register:

33        movl  %eax, %edi        # a is 1st argument

When program control is transferred to the sumNine function, the partial stack frame appears as shown in Figure 11.2. Even though each argument is only four bytes (int), each is passed in an 8-byte portion of stack memory. Compare this with passing arguments in registers; only one data item is passed per register even if the data item does not take up the entire eight bytes in the register.

Figure 11.2: Arguments 7 – 9 are passed on the stack to the sumNine function. State of the stack when control is first transferred to this function.

The return address is at the top of the stack, immediately followed by the three arguments (beyond the six passed in registers). Notice that each argument is in the same position on the stack as it would have been if it had been pushed onto the stack just before the call instruction. Since the address in the stack pointer (rsp) was 16-byte aligned before the call to this function, and the call instruction pushed the 8-byte return address onto the stack, the address in rsp is now 8 bytes past a 16-byte boundary; that is, it is 8-byte aligned but not 16-byte aligned.

The prologue of sumNine completes the stack frame. Then the function saves the register arguments in the register save area of the stack frame:

9        pushq  %rbp
10        movq  %rsp, %rbp
11        subq  $48, %rsp
12        movl  %edi, -20(%rbp)  # save one
13        movl  %esi, -24(%rbp)  # save two
14        movl  %edx, -28(%rbp)  # save three
15        movl  %ecx, -32(%rbp)  # save four
16        movl  %r8d, -36(%rbp)  # save five
17        movl  %r9d, -40(%rbp)  # save six

The state of the stack frame at this point is shown in Figure 11.3.

Figure 11.3: Arguments and local variables in the stack frame, sumNine function. The first six arguments are passed in registers but saved in the stack frame. Arguments beyond six are passed in the portion of the stack frame that is created by the calling function.

You may question why the compiler did not simply use the red zone. The sumNine function is not a leaf function. It calls another function, which may require use of the call stack. So space must be explicitly allocated on the call stack for local variables and the register argument save areas.

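By the way, the compiler has replaced the call to printf in sumNine with a call to puts:

36        movl  $.LC0, %edi
37        call  puts

Since the only thing to be written to the screen is a text string that contains no conversion specifiers and ends with a newline character, the puts function is equivalent. Note that the string stored at .LC0 omits the newline because puts appends one.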

After the register arguments are safely stored in the argument save area, they can be easily summed and the total saved in the local variable:

18        movl  -24(%rbp), %eax  # load two
19        movl  -20(%rbp), %edx  # load one, subtotal
20        addl  %eax, %edx       # add two to subtotal
21        movl  -28(%rbp), %eax  # load three
22        addl  %eax, %edx       # add to subtotal
23        movl  -32(%rbp), %eax  # load four
24        addl  %eax, %edx       # add to subtotal
25        movl  -36(%rbp), %eax  # load five
26        addl  %eax, %edx       # add to subtotal
27        movl  -40(%rbp), %eax  # load six
28        addl  %eax, %edx       # add to subtotal
29        movl  16(%rbp), %eax   # load seven
30        addl  %eax, %edx       # add to subtotal
31        movl  24(%rbp), %eax   # load eight
32        addl  %eax, %edx       # add to subtotal
33        movl  32(%rbp), %eax   # load nine
34        addl  %edx, %eax       # add to subtotal
35        movl  %eax, -4(%rbp)   # x <- total

Notice that the seventh, eighth, and ninth arguments are accessed by positive offsets from the frame pointer, rbp. They were stored in the stack frame by the calling function. The called function “owns” the entire stack frame so it does not need to make additional copies of these arguments.

It is important to realize that once the stack frame has been completed within a function, that area of the call stack cannot be treated as a stack. That is, it cannot be accessed through pushes and pops. It must be treated as a record. (You will learn more about records in Section 13.2, page 802.)

If we were to recompile these functions with higher levels of optimization, many of these assembly language operations would be removed (see Exercise 11-2). But the point here is to examine the mechanisms that can be used to work with arguments and to write easily read code, so we study the unoptimized code.

A version of this program written in assembly language is shown in Listing 11.6.

1# nineInts2.s
2# Demonstrate how integral arguments are passed in 64-bit mode.
3# Bob Plantz - 13 June 2009
4# Bob Plantz - 5 November 2013 - deleted unneeded register usage (lines 48 - 53)
5
6# Stack frame
7#   passing arguments on stack (rsp)
8#     need 3x8 = 24 -> 32 bytes
9        .equ    seventh,0
10        .equ    eighth,8
11        .equ    ninth,16
12#   local vars (rbp)
13#     need 10x4 = 40 -> 48 bytes
14        .equ    i,-4
15        .equ    h,-8
16        .equ    g,-12
17        .equ    f,-16
18        .equ    e,-20
19        .equ    d,-24
20        .equ    c,-28
21        .equ    b,-32
22        .equ    a,-36
23        .equ    total,-40
24        .equ    localSize,-80
25# Read only data
26        .section  .rodata
27format:
28        .string "The sum is %i\n"
29# Code
30        .text
31        .globl  main
32        .type   main, @function
33main:
34        pushq  %rbp              # save caller’s base pointer
35        movq  %rsp, %rbp        # establish ours
36        addq  $localSize, %rsp  # space for local variables
37                                  #  + argument passing
38        movl  $1, a(%rbp)       # initialize local variables
39        movl  $2, b(%rbp)       # etc...
40        movl  $3, c(%rbp)
41        movl  $4, d(%rbp)
42        movl  $5, e(%rbp)
43        movl  $6, f(%rbp)
44        movl  $7, g(%rbp)
45        movl  $8, h(%rbp)
46        movl  $9, i(%rbp)
47
48        movl  i(%rbp), %eax       # load i
49        movl  %eax, ninth(%rsp)   #   9th argument
50        movl  h(%rbp), %eax       # load h
51        movl  %eax, eighth(%rsp)  #   8th argument
52        movl  g(%rbp), %eax       # load g
53        movl  %eax, seventh(%rsp) #   7th argument
54        movl  f(%rbp), %r9d       # f is 6th
55        movl  e(%rbp), %r8d       # e is 5th
56        movl  d(%rbp), %ecx       # d is 4th
57        movl  c(%rbp), %edx       # c is 3rd
58        movl  b(%rbp), %esi       # b is 2nd
59        movl  a(%rbp), %edi       # a is 1st
60        call  sumNine
61        movl  %eax, total(%rbp)   # total = sumNine(...)
62
63        movl    total(%rbp), %esi
64        movl    $format, %edi
65        movl    $0, %eax
66        call    printf
67
68        movl  $0, %eax          # return 0;
69        movq    %rbp, %rsp        # delete locals
70        popq    %rbp              # restore caller’s base pointer
71        ret                       # back to OS

1# sumNine2.s
2# Sums nine integer arguments and returns the total.
3# Bob Plantz - 13 June 2009
4
5# Stack frame
6#    arguments already in stack frame
7        .equ    seven,16
8        .equ    eight,24
9        .equ    nine,32
10#    local variables
11        .equ    total,-4
12        .equ    localSize,-16
13# Read only data
14        .section  .rodata
15doneMsg:
16        .string "sumNine done"
17# Code
18        .text
19        .globl  sumNine
20        .type   sumNine, @function
21sumNine:
22        pushq  %rbp               # save caller’s base pointer
23        movq  %rsp, %rbp         # set our base pointer
24        addq    $localSize, %rsp   # for local variables
25
26        addl  %esi, %edi         # add two to one
27        addl  %ecx, %edi         # plus four
28        addl  %edx, %edi         # plus three
29        addl  %r8d, %edi         # plus five
30        addl  %r9d, %edi         # plus six
31        addl  seven(%rbp), %edi  # plus seven
32        addl  eight(%rbp), %edi  # plus eight
33        addl  nine(%rbp), %edi   # plus nine
34        movl  %edi, total(%rbp)  # save total
35
36        movl  $doneMsg, %edi
37        call  puts
38
39        movl    total(%rbp), %eax  # return total;
40        movq    %rbp, %rsp         # delete local vars.
41        popq    %rbp               # restore caller’s base pointer
42        ret

Listing 11.6: Passing more than six arguments to a function (programmer assembly language). (There are two files here.)

The assembly language programmer realizes that all nine integers can be summed in the sumNine function before it calls another function. In addition, none of the values will be needed after this summation. So there is no reason to store the register arguments locally:

26        addl  %esi, %edi         # add two to one
27        addl  %ecx, %edi         # plus four
28        addl  %edx, %edi         # plus three
29        addl  %r8d, %edi         # plus five
30        addl  %r9d, %edi         # plus six
31        addl  seven(%rbp), %edi  # plus seven
32        addl  eight(%rbp), %edi  # plus eight
33        addl  nine(%rbp), %edi   # plus nine

However, the edi register will be needed for passing an argument to puts, so the total is saved in a local variable in the stack frame:

34        movl  %edi, total(%rbp)  # save total

Then it is loaded into eax for return to the calling function:

39        movl    total(%rbp), %eax  # return total;

The overall pattern of a stack frame is shown in Figure 11.4. The rbp register serves as the frame pointer to the stack frame. Once the frame pointer address has been established in a function, its value must never be changed. The return address is always located +8 bytes offset from the frame pointer. Arguments to the function are positive offsets from the frame pointer, and local variables are negative offsets from the frame pointer.

Figure 11.4: Overall layout of the stack frame.

It is essential that you follow the register usage and argument passing disciplines precisely. Any deviation can cause errors that are very difficult to debug.

In the calling function:
1. Assume that the values in the rax, rcx, rdx, rsi, rdi and r8 – r11 registers will be changed by the called function.
2. The first six arguments are passed in the rdi, rsi, rdx, rcx, r8, and r9 registers in left-to-right order.
3. Arguments beyond six are stored on the stack as though they had been pushed onto the stack in right-to-left order.
4. Use the call instruction to invoke the function you wish to call.

Upon entering the called function:
1. Save the caller’s frame pointer by pushing rbp onto the stack.
2. Establish a new frame pointer at the current top of stack by copying rsp to rbp.
3. Allocate space on the stack for all the local variables, plus any required register save space, by subtracting the number of bytes required from rsp; this value must be a multiple of sixteen.
4. If a called function changes any of the values in the rbx, rbp, rsp, or r12 – r15 registers, they must be saved in the register save area, then restored before returning to the calling function.
5. If the function calls another function, save the arguments passed in registers on the stack.

Within the called function:
1. rsp points to the current top of the stack for this function. Observe the usual stack discipline (see §8.2). In particular, DO NOT use the stack pointer to access arguments or local variables.
2. Arguments passed in registers to the function and saved on the stack are accessed by negative offsets from the frame pointer, rbp.
3. Arguments passed on the stack to the function are accessed by positive offsets from the frame pointer, rbp.
4. Local variables are accessed by negative offsets from the frame pointer, rbp.

When leaving the called function:
1. Place the return value, if any, in eax (rax for a 64-bit value).
2. Restore the values in the rbx, rbp, rsp, and r12 – r15 registers from the register save area in the stack frame.
3. Delete the local variable space and register save area by copying rbp to rsp.
4. Restore the caller’s frame pointer by popping rbp off the stack.
5. Return to calling function with ret.

The best way to design a stack frame for a function is to make a drawing on paper following the pattern in Figure 11.3. Show all the local variables and arguments to the function. To be safe, assume that all the register-passed arguments will be saved in the function. Compute and write down all the oﬀset values on your drawing. When writing the source code for your function, use the .equ directive to give meaningful names to each of the numerical oﬀsets. If you do this planning before writing the executable code, you can simply use the name(%rbp) syntax to access the value stored at name.

11.3 Interface Between Functions, 32-Bit Mode

In 32-bit mode, all arguments are passed on the call stack. The 32-bit assembly language generated by gcc is shown in Listing 11.7.

1        .file  "nineInts1.c"
2        .section      .rodata
3 .LC0:
4        .string "The sum is %i\n"
5        .text
6        .globl main
7        .type  main, @function
8 main:
9        pushl  %ebp
10        movl  %esp, %ebp
11        andl  $-16, %esp
12        subl  $96, %esp
13        movl  $1, 56(%esp)
14        movl  $2, 60(%esp)
15        movl  $3, 64(%esp)
16        movl  $4, 68(%esp)
17        movl  $5, 72(%esp)
18        movl  $6, 76(%esp)
19        movl  $7, 80(%esp)
20        movl  $8, 84(%esp)
21        movl  $9, 88(%esp)
22        movl  88(%esp), %eax   # load i
23        movl  %eax, 32(%esp)   # store in stack frame
24        movl  84(%esp), %eax   # load h
25        movl  %eax, 28(%esp)   # store in stack frame
26        movl  80(%esp), %eax   # load g
27        movl  %eax, 24(%esp)   # etc....
28        movl  76(%esp), %eax   # load f
29        movl  %eax, 20(%esp)
30        movl  72(%esp), %eax   # load e
31        movl  %eax, 16(%esp)
32        movl  68(%esp), %eax   # load d
33        movl  %eax, 12(%esp)
34        movl  64(%esp), %eax   # load c
35        movl  %eax, 8(%esp)
36        movl  60(%esp), %eax   # load b
37        movl  %eax, 4(%esp)
38        movl  56(%esp), %eax   # load a
39        movl  %eax, (%esp)     # store in stack frame
40        call  sumNine
41        movl  %eax, 92(%esp)   # total <- sum
42        movl  92(%esp), %eax
43        movl  %eax, 4(%esp)
44        movl  $.LC0, (%esp)
45        call  printf
46        movl  $0, %eax
47        leave
48        ret
49        .size  main, .-main
50        .ident "GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0"
51        .section      .note.GNU-stack,"",@progbits

1        .file  "sumNine1.c"
2        .section      .rodata
3 .LC0:
4        .string "sumNine done."
5        .text
6        .globl sumNine
7        .type  sumNine, @function
8 sumNine:
9        pushl  %ebp
10        movl  %esp, %ebp
11        subl  $40, %esp
12        movl  12(%ebp), %eax   # load two
13        movl  8(%ebp), %edx    # load one, subtotal
14        addl  %eax, %edx       # add two
15        movl  16(%ebp), %eax   # load three
16        addl  %eax, %edx       # add to subtotal
17        movl  20(%ebp), %eax   # load four
18        addl  %eax, %edx       # etc...
19        movl  24(%ebp), %eax   # load five
20        addl  %eax, %edx
21        movl  28(%ebp), %eax   # load six
22        addl  %eax, %edx
23        movl  32(%ebp), %eax   # load seven
24        addl  %eax, %edx
25        movl  36(%ebp), %eax   # load eight
26        addl  %eax, %edx
27        movl  40(%ebp), %eax   # load nine
28        addl  %edx, %eax       # total
29        movl  %eax, -12(%ebp)  # x <- total
30        movl  $.LC0, (%esp)
31        call  puts
32        movl  -12(%ebp), %eax  # return x;
33        leave
34        ret
35        .size  sumNine, .-sumNine
36        .ident "GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0"
37        .section      .note.GNU-stack,"",@progbits

Listing 11.7: Passing more than six arguments to a function (gcc assembly language, 32-bit). (There are two ﬁles here.)

The argument passing sequence can be seen on lines 22 – 39 in the main function. Rather than pushing each argument onto the stack, the compiler has used the technique of allocating space on the stack for the arguments, then storing each argument directly in the appropriate location. The result is the same as if they had been pushed onto the stack, but the direct storage technique is more eﬃcient.

I ﬁnd it odd that the compiler writer has chosen to set up a base pointer in ebp but not used it in this function. This is NOT a recommended technique when writing in assembly language.

The state of the call stack just before calling the sumNine function is shown in Figure 11.5. Comparing this with the 64-bit version in Figure 11.3, we see that the local variables are treated in essentially the same way. But the 32-bit version diﬀers in the way it passes arguments:

All the arguments are passed on the call stack, none in registers.
Arguments are passed in 4-byte blocks.

Figure 11.5: Calling function’s stack frame, 32-bit mode. Local variables are accessed relative to the frame pointer (ebp register). In this example, they are all 4-byte values. Arguments are accessed relative to the stack pointer (esp register). Arguments are passed in 4-byte blocks.

11.4 Instructions Introduced Thus Far

This summary shows the assembly language instructions introduced thus far in the book. The page number where the instruction is explained in more detail, which may be in a subsequent chapter, is also given. This book provides only an introduction to the usage of each instruction. You need to consult the manuals ([2] – [6], [14] – [18]) in order to learn all the possible uses of the instructions.

11.4.1 Instructions

data movement:
opcode	source	destination	action	page

cbtw			convert byte to word, al → ax	696

cwtl			convert word to long, ax → eax	696

cltq			convert long to quad, eax → rax	696

cmovcc	%reg/mem	%reg	conditional move	706

movs	$imm/%reg	%reg/mem	move	506

movs	mem	%reg	move	506

movsss	$imm/%reg	%reg/mem	move, sign extend	693

movzss	$imm/%reg	%reg/mem	move, zero extend	693

popw		%reg/mem	pop from stack	566

pushw	$imm/%reg/mem		push onto stack	566


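In the opcodes above, each trailing s stands for a size suffix (b, w, l, or q), the w in popw and pushw stands for l or q, and cc stands for a condition code. The sign-extend and zero-extend moves take two size suffixes, the source size followed by the destination size (for example, movsbl).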

arithmetic/logic:
opcode	source	destination	action	page

adds	$imm/%reg	%reg/mem	add	607

adds	mem	%reg	add	607

cmps	$imm/%reg	%reg/mem	compare	676

cmps	mem	%reg	compare	676

decs	%reg/mem		decrement	699

incs	%reg/mem		increment	698

leaw	mem	%reg	load eﬀective address	579

subs	$imm/%reg	%reg/mem	subtract	612

subs	mem	%reg	subtract	612

tests	$imm/%reg	%reg/mem	test bits	676

tests	mem	%reg	test bits	676


In the opcodes above, each trailing s stands for a size suffix (b, w, l, or q), and the w in leaw stands for l or q.

program ﬂow control:
opcode	location	action	page

call	label	call function	546

ja	label	jump above (unsigned)	683

jae	label	jump above/equal (unsigned)	683

jb	label	jump below (unsigned)	683

jbe	label	jump below/equal (unsigned)	683

je	label	jump equal	679

jg	label	jump greater than (signed)	686

jge	label	jump greater than/equal (signed)	686

jl	label	jump less than (signed)	686

jle	label	jump less than/equal (signed)	686

jmp	label	jump	691

jne	label	jump not equal	679

jno	label	jump no overﬂow	679

jcc	label	jump on condition codes	679

leave		undo stack frame	580

ret		return from function	583

syscall		call kernel function	587


cc = condition codes

11.4.2 Addressing Modes


register direct:	The data value is located in a CPU register.
	syntax: name of the register with a “%” preﬁx.
	example: movl %eax, %ebx

immediate data:	The data value is located immediately after the instruction. Source operand only.
	syntax: data value with a “$” preﬁx.
	example: movl $0xabcd1234, %ebx

base register plus oﬀset:	The data value is located in memory. The address of the memory location is the sum of a value in a base register plus an oﬀset value.
	syntax: use the name of the register with parentheses around the name and the oﬀset value immediately before the left parenthesis.
	example: movl $0xaabbccdd, 12(%eax)

rip-relative:	The target is a memory address determined by adding an oﬀset to the current address in the rip register.
	syntax: a programmer-deﬁned label
	example: je somePlace

11.5 Exercises

11-1

(§11.2) Enter the program in Listing 11.6. Single-step through the program with gdb and record the changes in the rsp and rip registers and the changes in the stack on paper. Use drawings similar to Figure 11.3.

Note: Each of the two functions should be in its own source ﬁle. You can single-step into the subfunction with gdb at the call instruction in main, then single-step back into main at the ret instruction in addConst.

11-2

(§11.2) Enter the C program in Listing 11.4. Using the “-S” compiler option, compile it with diﬀering levels of optimization, i.e., “-O1, -O2, -O3,” and discuss the assembly language that is generated. Is the optimized code easier or more diﬃcult to read?

11-3

(§11.2, §10.1) Write the function, writeStr, in assembly language. The function takes one argument, a char *, which is a pointer to a C-style text string. It displays the text string on the screen. It returns the number of characters displayed.

Demonstrate that your function works correctly by writing a main function that calls writeStr to display “Hello world” on the screen.

Note that the main function will not do anything with the character count that is returned by writeStr.

11-4

(§11.2, §10.1) Write the function, readLn, in assembly language. The function takes one argument, a char *, which is a pointer to a char array for storing a text string. It reads characters from the keyboard and stores them in the array as a C-style text string. It does not store the ’\n’ character. It returns the number of characters, excluding the NUL character, that were stored in the array.

Demonstrate that your function works correctly by writing a main function that prompts the user to enter a text string, then echoes the user’s input.

When testing your program, be careful not to enter more characters than the allocated space. Explain what would occur if you did enter too many characters.

Note that the main function will not do anything with the character count that is returned by readLn.

11-5

(§11.2, §10.1) Write a program in assembly language that

a) prompts the user to enter any text string,
b) reads the entered text string, and
c) echoes the user’s input.

Use the writeStr function from Exercise 11-3 and the readLn function from Exercise 11-4 to implement the user interface in this program.

11-6

(§11.2, §10.1) Modify the readLn function in Exercise 11-4 so that it takes a second argument, the maximum length of the text string, including the NUL character. Excess characters entered by the user are discarded.
