Chapter 9
Computer Operations

We are now ready to look more closely at the instructions that control the CPU. This will only be an introduction to the topic. We will examine the most common operations — assignment, addition, and subtraction. Additional operations will be described in subsequent chapters.

Each assembly language instruction must be translated into its corresponding machine code, including the locations of any data it manipulates. It is the bit pattern of the machine code that directs the activities of the control unit.

The goal here is to show you that a computer performs its operations based on bit patterns. As you read through this material, keep in mind that even though this material is quite tedious, the operations are very simple. Fortunately, instruction execution is very fast, so lots of meaningful work can be done by the computer.

9.1 The Assignment Operator

The C/C++ assignment operator, “=”, causes the expression on the right-hand side of the operator to be evaluated and the result to be associated with the variable that is named on the left-hand side. Subsequent uses of the variable name in the program will evaluate to this same value. For example,

     int x; 
     ..... 
     x = 123;

will assign the integer 123 to the variable x. If x is later used in an expression, the value assigned to x will be used in evaluating the expression. For example, the expression

    2 * x;

would evaluate to 246.

This assumes that the expression on the right-hand side evaluates to the same data type as the variable on the left-hand side. If not, some automatic type casting may occur, or the compiler may indicate an error. We ignore the issue of data type for now and will discuss it at several points when appropriate. For now, we are working with arbitrary bit patterns that have no meaning as “data.”

We now explore what assignment means at the assembly language level. The variable declaration,

     int x;

causes memory to be allocated and the location of that memory to be given the name “x.” That is, other parts of the program can refer to the memory location where the value of x is stored by using the name “x.” The type name in the declaration, int, tells the compiler how many bytes to allocate and the code used to represent the data stored at this location. The int type uses the two’s complement code. The assignment statement,

     x = 123;

sets the bit pattern in the location named x to 0x0000007b, the two’s complement code for the integer 123. The assignment statement

     x = -123;

sets the bit pattern in the location named x to 0xffffff85, the two’s complement code for the integer -123.

Let us consider the simplest case: a program that simply sets a bit pattern in a CPU register. A C program to do this is shown in Listing 9.1.

 
 1  /*
 2   * assignment1.c
 3   * Assign a 32-bit pattern to a register
 4   *
 5   * Bob Plantz - 11 June 2009
 6   */
 7
 8  #include <stdio.h>
 9
10  int main(void)
11  {
12      register int x;
13
14      x = 0xabcd1234;
15
16      printf("x = %i\n", x);
17
18      return 0;
19  }
Listing 9.1: Assignment to a register variable (C).

The register modifier “advises” the compiler to use a CPU register for the integer variable named “x.” And the notation 0xabcd1234 means that abcd1234 is written in hexadecimal. (Recall that hexadecimal is used as a compact notation for representing bit patterns.) When the C program in Listing 9.1 is compiled into its assembly language equivalent with no optimization:

     bob$ gcc -S  -O0 -fno-asynchronous-unwind-tables assignment1.c
the gcc compiler generates the assembly language program shown in Listing 9.2, with a comment added to show where the assignment operation takes place.
 
 1          .file   "assignment1.c"
 2          .section        .rodata
 3  .LC0:
 4          .string "x = %i\n"
 5          .text
 6          .globl  main
 7          .type   main, @function
 8  main:
 9          pushq   %rbp
10          movq    %rsp, %rbp
11          pushq   %rbx
12          subq    $8, %rsp
13          movl    $-1412623820, %ebx   # x = 0xabcd1234;
14          movl    %ebx, %esi
15          movl    $.LC0, %edi
16          movl    $0, %eax
17          call    printf
18          movl    $0, %eax
19          addq    $8, %rsp
20          popq    %rbx
21          popq    %rbp
22          ret
23          .size   main, .-main
24          .ident  "GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0"
25          .section        .note.GNU-stack,"",@progbits
Listing 9.2: Assignment to a register variable (gcc assembly language). Comment added to show the assignment operation.

The C assignment operation is implemented with the mov instruction. For example, in Listing 9.1,

14    x = 0xabcd1234;

is implemented with

13     movl    $-1412623820, %ebx   # x = 0xabcd1234;

on line 13 in Listing 9.2. We can see that the compiler chose to use the ebx register as the x variable.

The mov instruction has an “l” (“ell”, not “one”) appended to it to indicate that the operand size is 32 bits. This is redundant because the register named as an operand, ebx, is 32 bits, but it is the required syntax. The Intel syntax does not include this redundancy. If we consider the Intel syntax:

Intel® Syntax

mov ebx, -1412623820

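we see the three other differences noted in Section 7.2.2 (page 509): the operands are written in the opposite order (destination first), the register name is not prefixed with “%”, and the immediate value is not prefixed with “$”.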

These differences are specific to the assembler program being used and are not relevant to the behavior of the CPU. The assembler program will translate the assembly language instruction into the correct machine language code.

The instructions on lines 14 – 17 implement the call to the printf function. One reason for the call to the printf function is to prevent the compiler from eliminating the assignment statement during its optimization of this function. Yes, even with the -O0 option the compiler does some optimization.

Compare the prologue with that of the null program in Listing 7.4 on page 512. Notice that the prologue

8main: 
9        pushq  %rbp 
10        movq  %rsp, %rbp 
11        pushq  %rbx 
12        subq  $8, %rsp

of this function includes saving the contents of a register and an adjustment to keep the stack pointer aligned on a sixteen-byte memory address boundary. However, the epilogue:

19        addq  $8, %rsp 
20        popq  %rbx 
21        popq  %rbp

differs. In the epilogue, we need to restore the stack pointer, and restore any registers we saved on the stack, before restoring the calling function’s base pointer.

You may wonder why the gcc compiler assigns the constant -1412623820 to the variable, while the C version of the program assigns 0xabcd1234. The answer is that they are the same 32-bit pattern. The first is expressed as a signed decimal integer and the second in hexadecimal. We discussed the equivalence of decimal and hexadecimal in Section 2.2 (page 23), and we discussed signed decimal integers in Section 3.3 (page 87).

In Listing 9.3 we show the essential assembly language required to implement the C program from Listing 9.1.

 
 1  # assignment2.s
 2  # Assigns a 32-bit pattern to the ebx register.
 3  # Bob Plantz - 11 June 2009
 4
 5          .text
 6          .globl  main
 7          .type   main, @function
 8  main:
 9          pushq   %rbp        # save caller's base pointer
10          movq    %rsp, %rbp  # establish our base pointer
11          pushq   %rbx        # save reg.
12
13          movl    $0xabcd1234, %ebx # store a bit pattern in ebx
14
15          popq    %rbx        # restore reg.
16          movl    $0, %eax    # return 0 to caller
17          movq    %rbp, %rsp  # restore stack pointer
18          popq    %rbp        # restore caller's base pointer
19          ret                 # back to caller
Listing 9.3: Assignment to a register variable (programmer assembly language).

Compare Listing 9.3 to the general pattern in Listing 8.13 on page 587. Note that the single instruction,

13        movl    $0xabcd1234, %ebx # store a bit pattern in ebx

is the only “data processing” performed by this function. From this comparison, you can see that this assembly language statement implements the two C statements:

    register int x; 
    x = 0xabcd1234;

Like the compiler (Listing 9.2), we are using the ebx register as our variable. We can use the registers in Table 6.4 (page 469) as variables, except the stack pointer, %rsp, which has special uses. The “%” prefix tells the assembler that these are the names of registers in the CPU, not labels on memory locations.

Let us look more closely at the program in Listing 9.3. I used an editor to enter the code then assembled and linked it. Since it does not produce a display on the screen, I used gdb to observe the changes in the registers.

When using gdb to examine programs written in assembly language, another variant of the break command may be helpful. The version of gdb I used for this book skips over the function prologue. To cause gdb to break at the first instruction of a function, the following form should be used.

My typing is boldface.
$ gdb assignment2

GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/bob/my_book_working/progs/chap09/assignment2...done.
(gdb) li

1       # assignment2.s
2       # Assigns a 32-bit pattern to the esi register.
3       # Bob Plantz - 11 Jun 2009
4
5               .text
6               .globl  main
7               .type   main, @function
8       main:
9               pushq   %rbp        # save caller’s frame pointer
10              movq    %rsp, %rbp  # establish our frame pointer
(gdb)

11         pushq   %rbx        # save reg.
12
13         movl    $0xabcd1234, %ebx # store a bit pattern in ebx
14
15         popq    %rbx        # restore reg.
16         movl    $0, %eax    # return 0 to caller
17         movq    %rbp, %rsp  # restore stack pointer
18         popq    %rbp        # restore caller’s base pointer
19         ret                 # back to caller
20

I use the li command to list enough of the program to see where I should set the first breakpoint.

(gdb) br 13

Breakpoint 1 at 0x4004d1: file assignment2.s, line 13.

I set the breakpoint on the instruction that implements the assignment operation.

(gdb) run

Starting program: /home/bob/my_book_working/progs/chap09/assignment2

Breakpoint 1, main () at assignment2.s:13
13         movl    $0xabcd1234, %ebx # store a bit pattern in ebx

I run the program, it breaks at the first breakpoint, and I can display the registers that interest me.

(gdb) i r rbx rip

rbx            0x0 0
rip            0x4004d1 0x4004d1 <main+5>

I use the i r (info registers) command to display the contents of the register that is used for the variable. The value in the rip register (the instruction pointer) is 0x4004d1.

(gdb) si

15         popq    %rbx        # restore reg.

Next I use the single instruction (si) command to execute one instruction.

(gdb) i r rbx rip

rbx            0xabcd1234 2882343476
rip            0x4004d6 0x4004d6 <main+10>

Now I can see that the value, 0xabcd1234, has been assigned to the variable. Notice that the rip register has changed from 0x4004d1 to 0x4004d6. This tells us that the instruction that was just executed (movl $0xabcd1234, %ebx) is 0x4004d6 - 0x4004d1 = 5 bytes long.

(gdb) cont

Continuing.
[Inferior 1 (process 2622) exited normally]

Finally, I use the continue command (cont) to run the program out to its end.

(gdb) q
$

And, of course, I have to tell gdb to quit.

9.2 Addition and Subtraction Operators

The assembly language instruction to perform binary addition is quite simple:

adds source, destination

where the s appended to add denotes the size of the operand:

     s    meaning     number of bits
     b    byte         8
     w    word        16
     l    longword    32
     q    quadword    64

The add instruction adds the source operand to the destination operand using the rules of binary addition, leaving the result in the destination operand. As with the mov instruction, no more than one operand can be a memory location. The source operand is not changed. In C/C++ the operation could be expressed as:

destination += source

For example, the instruction

     addq    %rax, %rdx

adds the 64-bit value in the rax register to the 64-bit value in the rdx register, leaving the rax register intact. The instruction

     addw    %dx, %r10w

adds the 16-bit value in the dx register to the 16-bit value in the r10w register, leaving the entire rdx register and the high-order 48 bits of the r10 register intact.

In the Intel syntax, the size of the data is determined by the operand, so the size character (b, w, l, or q) is not appended to the instruction. (And the order of the operands is reversed.)

Intel® Syntax

add destination, source

We saw in Chapter 3 that addition may cause carry or overflow. Carry and overflow are recorded in the 64-bit rflags register. The CF is bit number zero, and the OF is bit number eleven (numbering from right to left). Whenever an add instruction is executed both bits are set as shown in Algorithm 9.1.


Algorithm 9.1: Carry Flag and Overflow Flag after add.
1:  if there is no carry then
2:   CF ← 0
3:  else
4:   CF ← 1
5:  end if
6:  if there is no overflow then
7:   OF ← 0
8:  else
9:   OF ← 1
10:  end if

If the values being added represent unsigned ints, CF indicates whether the result fits within the operand size or not. If the values represent signed ints, OF indicates whether the result fits within the operand size or not. If the size of the operands is less than 64 bits and the operation produces a carry and/or an overflow, this is not propagated up through the next bits in the destination operand. The carry and overflow conditions are simply recorded in the corresponding bits in the rflags register.

For example, if we consider the initial conditions

    register    contents
       rax:    ffff eeee dddd cccc
        r8:    2222 4444 6666 8888
        CF:    ?
        OF:    ?
the instruction
        addl    %eax, %r8d

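would produce

    register contents
       rax:    ffff eeee dddd cccc
        r8:    0000 0000 4444 5554
        CF:    1
        OF:    0

(In 64-bit mode, an instruction that writes a 32-bit register also sets the high-order 32 bits of the full 64-bit register to zero. Instructions that write 8-bit or 16-bit registers leave the remaining bits unchanged.)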
Whereas (starting from the same initial conditions) the instruction
        addb    %al, %r8b

would produce

    register contents
       rax:    ffff eeee dddd cccc
        r8:    2222 4444 6666 8854
        CF:    1
        OF:    1

The assembly language instruction to perform binary subtraction is

subs source, destination

where the s appended to sub denotes the size of the operand:

     s    meaning     number of bits
     b    byte         8
     w    word        16
     l    longword    32
     q    quadword    64

The sub instruction subtracts the source operand from the destination operand using the rules of binary subtraction, leaving the result in the destination operand. As with the mov instruction, no more than one operand can be a memory location. The source operand is not changed. In C/C++ the operation could be expressed as:

destination -= source

For example, the instruction

     subl    %eax, %edx

subtracts the 32-bit value in the eax register from the 32-bit value in the edx register. The instruction

     subb    %dh, %ah

subtracts the 8-bit value in the dh register from the 8-bit value in the ah register.

In the Intel syntax, the size of the data is determined by the operand, so the size character (b, w, l, or q) is not appended to the instruction. (And the order of the operands is reversed.)

Intel® Syntax

sub destination, source

Subtraction also affects the CF and the OF. Whenever a sub instruction is executed both bits are set as shown in Algorithm 9.2.


Algorithm 9.2: Carry Flag and Overflow Flag after subtraction.
1:  if there is no borrow then
2:   CF ← 0
3:  else
4:   CF ← 1
5:  end if
6:  if there is no overflow then
7:   OF ← 0
8:  else
9:   OF ← 1
10:  end if

Just as with addition, if the values being subtracted represent unsigned ints, CF indicates whether there was a borrow from beyond the operand size or not. If the values represent signed ints, OF indicates whether the result fits within the operand size or not. If the size of the operands is less than 64 bits and the operation produces a borrow and/or an overflow, this is not propagated up through the next bits in the destination operand. The borrow and overflow conditions are simply recorded in the corresponding bits in the rflags register.

For example, if we consider the initial conditions

    register    contents
       rax:    ffff eeee dddd cccc
        r8:    2222 4444 6666 8888
        CF:    ?
        OF:    ?
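the instruction
        subl    %eax, %r8d

would produce

    register contents
       rax:    ffff eeee dddd cccc
        r8:    0000 0000 8888 bbbc
        CF:    1
        OF:    1

(As with any instruction that writes a 32-bit register in 64-bit mode, the high-order 32 bits of r8 are set to zero.)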
Whereas (starting from the same initial conditions) the instruction
        subb    %al, %r8b

would produce

    register contents
       rax:    ffff eeee dddd cccc
        r8:    2222 4444 6666 88bc
        CF:    1
        OF:    0

A simple program given in Listing 9.4 illustrates both addition and subtraction in C.

 
 1  /*
 2   * addAndSubtract1.c
 3   * Reads two integers from user, then
 4   * performs addition and subtraction
 5   * Bob Plantz - 11 June 2009
 6   */
 7
 8  #include <stdio.h>
 9
10  int main(void)
11  {
12      int w, x, y, z;
13
14      printf("Enter two integers: ");
15      scanf("%i %i", &w, &x);
16      y = w + x;
17      z = w - x;
18      printf("sum = %i, difference = %i\n", y, z);
19
20      return 0;
21  }
Listing 9.4: Addition and subtraction (C).

Unfortunately, this program can give incorrect results:

    $ ./addAndSubtract1
    Enter two integers: 1000000000 2000000000
    sum = -1294967296, difference = -1000000000
    $ ./addAndSubtract1
    Enter two integers: -1000000000 2000000000
    sum = 1000000000, difference = 1294967296
Worse, there is no message even warning that these are incorrect results. You know (see Section 3.4, page 108) that the results have overflowed. C does not check for overflow, so you would have to write code that explicitly checks for it.

The assembly language generated by gcc is shown in Listing 9.5 with comments added.

 
 1          .file   "addAndSubtract1.c"
 2          .section        .rodata
 3  .LC0:
 4          .string "Enter two integers: "
 5  .LC1:
 6          .string "%i %i"
 7  .LC2:
 8          .string "sum = %i, difference = %i\n"
 9          .text
10          .globl  main
11          .type   main, @function
12  main:
13          pushq   %rbp
14          movq    %rsp, %rbp
15          subq    $16, %rsp
16          movl    $.LC0, %edi
17          movl    $0, %eax
18          call    printf
19          leaq    -12(%rbp), %rdx   # rdx <- address of x
20          leaq    -16(%rbp), %rax
21          movq    %rax, %rsi        # rsi <- address of w
22          movl    $.LC1, %edi       # edi <- address of format string
23          movl    $0, %eax          # no float arguments
24          call    __isoc99_scanf
25          movl    -16(%rbp), %edx   # load w
26          movl    -12(%rbp), %eax   # load x
27          addl    %edx, %eax        # x = w + x;
28          movl    %eax, -8(%rbp)    # y = x;
29          movl    -16(%rbp), %edx   # load w
30          movl    -12(%rbp), %eax   # load x
31          movl    %edx, %ecx
32          subl    %eax, %ecx
33          movl    %ecx, %eax        # x = w-x;
34          movl    %eax, -4(%rbp)    # z = w-x;
35          movl    -4(%rbp), %edx    # edx <- z
36          movl    -8(%rbp), %eax
37          movl    %eax, %esi        # esi <- y
38          movl    $.LC2, %edi       # edi <- address of format string
39          movl    $0, %eax          # no float arguments
40          call    printf
41          movl    $0, %eax
42          leave
43          ret
44          .size   main, .-main
45          .ident  "GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0"
46          .section        .note.GNU-stack,"",@progbits
Listing 9.5: Addition and subtraction (gcc assembly language).

We see that a rather simple C statement:

16    y = w + x;

must be broken down into distinct steps at the assembly language level:

25        movl  -16(%rbp), %edx   # load w 
26        movl  -12(%rbp), %eax   # load x 
27        addl  %edx, %eax        # x = w + x; 
28        movl  %eax, -8(%rbp)    # y = x;

Similarly, the C statement:

17    z = w - x;

is broken down into the distinct steps:

29        movl  -16(%rbp), %edx   # load w 
30        movl  -12(%rbp), %eax   # load x 
31        movl  %edx, %ecx 
32        subl  %eax, %ecx 
33        movl  %ecx, %eax        # x = w-x; 
34        movl  %eax, -4(%rbp)    # z = w-x;

It is easy to see that the compiler did not generate the most efficient code. (This was compiled with no optimization.)

An important lesson here is that writing complex statements in high-level programming languages does not improve efficiency. The statements are ultimately broken down into simple steps. See Exercise 9.3 for a comparison.

We have seen that the computations performed by both these C statements can produce overflow. Table 9.1 shows how the variables (and CF and OF) change as we walk through the code in the program of Listing 9.4. There are two runs of the program using the input values above.


statement     w            x            y            z            CF   OF
scanf();      0x3b9aca00   0x77359400   ????????     ????????     ?    ?
y = w + x;    0x3b9aca00   0x77359400   0xb2d05e00   ????????     0    1
z = w - x;    0x3b9aca00   0x77359400   0xb2d05e00   0xc4653600   1    0
scanf();      0xc4653600   0x77359400   ????????     ????????     ?    ?
y = w + x;    0xc4653600   0x77359400   0x3b9aca00   ????????     1    0
z = w - x;    0xc4653600   0x77359400   0x3b9aca00   0x4d2fa200   0    1

Table 9.1: Walking through the code in Listing 9.4. There are two runs of the program here.

Listing 9.6 shows an assembly language program that performs the same operations as the C program in Listing 9.4 but uses the jno (jump if no overflow) instruction to check for overflow. These checks are easy in assembly language. They add very little to the execution time of the program, because most of the time there is no overflow, so only the conditional jump is executed and the warning code is skipped.

 
# addAndSubtract2.s
# Gets two integers from user, then
# performs addition and subtraction
# Bob Plantz - 11 June 2009
# Stack frame
        .equ    w,-16
        .equ    x,-12
        .equ    y,-8
        .equ    z,-4
        .equ    localSize,-16
# Read only data
        .section  .rodata
prompt:
        .string "Enter two integers: "
getData:
        .string "%i %i"
display:
        .string "sum = %i, difference = %i\n"
warning:
        .string "Overflow has occurred.\n"
# Code
        .text
        .globl main
        .type  main, @function
main:
        pushq   %rbp        # save caller's base pointer
        movq    %rsp, %rbp  # establish our base pointer
        addq    $localSize, %rsp  # for local vars

        movl    $prompt, %edi  # prompt user
        movl    $0, %eax       # no floats
        call    printf

        leaq    x(%rbp), %rdx  # &x
        leaq    w(%rbp), %rsi  # &w
        movl    $getData, %edi # get user data
        movl    $0, %eax       # no floats
        call    scanf

        movl    w(%rbp), %eax  # y = w
        addl    x(%rbp), %eax  # y += x
        movl    %eax, y(%rbp)  # save y in memory; mov does not change OF
        jno     nOver1         # skip warning if no OF
        movl    $warning, %edi # printf may change esi and edx,
        movl    $0, %eax       #   which is why y and z are kept in memory
        call    printf
nOver1:
        movl    w(%rbp), %eax  # z = w
        subl    x(%rbp), %eax  # z -= x
        movl    %eax, z(%rbp)  # save z in memory; mov does not change OF
        jno     nOver2         # skip warning if no OF
        movl    $warning, %edi
        movl    $0, %eax
        call    printf
nOver2:
        movl    z(%rbp), %edx  # third argument: z
        movl    y(%rbp), %esi  # second argument: y
        movl    $display, %edi # display results
        movl    $0, %eax       # no floats
        call    printf

        movl    $0, %eax       # return 0 to OS
        movq    %rbp, %rsp     # restore stack pointer
        popq    %rbp           # restore caller's base pointer
        ret
Listing 9.6: Addition and subtraction (programmer assembly language).

9.3 Introduction to Machine Code

This section provides only a very brief glimpse of the machine code for the x86 architecture. The goal here is to provide you with a taste of what machine code looks like and thus emphasize that the computer is really controlled by groups of bit settings. The vast majority of computer professionals never need to know the machine code for the computer they are working with. For a complete description you will need to consult the manufacturer’s documentation.

Let us consider for a moment how we might design a set of machine instructions for a simple four-function computer. Our proposed computer can add, subtract, multiply, and divide. And we will suppose that it has 1 MB of memory. Each instruction must encode the following information for the control unit:

  1. the operation to be performed, and
  2. the location of the operand(s), if any, to operate on.

We will ignore the problem of getting data into the computer for this example, but we will certainly want to be able to move data from location to location in our computer. So we will have five operations:

     move
     add
     subtract
     multiply
     divide
Our design will need to allow three bits for encoding each of these operations. For example, we could use the following code:

     move        000
     add         001
     subtract    010
     multiply    100
     divide      111
Recall that N bits can be used to encode 2^N different values. We want 1 MB of memory. From 2^10 = 1024 = 1K, and 1M = 1K × 1K = 2^10 × 2^10 = 2^20, we see that we need to allow 20 bits for memory addressing.

Thus, if we want our computer to be able to add a value stored in one memory location to the value at another we need 3 + 20 + 20 = 43 bits to encode the instruction. Question: how many bits would be required if we wanted a design that would allow us to add two values stored in memory and store the sum at a third location?

Our silly design falls far short of practicality. The instructions themselves take too much memory, and we have allowed for only a very limited number of operations on the data. This was a more serious problem in the early days of computer design because memory was very expensive. The result was that computer designers came up with some clever ways to encode the necessary information into very few bits.

The design of the x86 processors is a very good example of this cleverness. Intel has paid particular attention to backwards compatibility as their designs have evolved. Thus, we see the remnants of the earlier designs — when memory was very expensive — in the latest Intel processors. The more common instructions generally take fewer bytes of memory. As newer, more complex features have been added, they generally take more bytes.

Computer design took a different turn in the 1980s. Memory had become much cheaper and CPUs had become much faster. This led to designs where all the instructions are the same size — 32 bits being very common these days.

We now turn our attention to the machine code that is produced by the assembler. Recall that it is the machine code that is actually executed by the control unit in the CPU. That is, the computer is controlled by bit patterns that are loaded into the instruction register in the CPU.

Programmers seldom need to know what the machine code is for any given assembly language instruction. The actual instruction depends upon the operation to be performed, the location(s) of the data to operate on, and the size of the data. Even when writing in assembly language, the programmer uses mnemonic names to specify each of these, and the assembler program translates them into the proper machine code instruction. So you do not need to memorize machine code. However, learning how assembly language instructions translate to machine code is important for learning how a computer actually works. And knowing how to “hand assemble” an instruction using a manual can help you find obscure bugs.

9.3.1 Assembler Listings

Most assemblers can provide the programmer with a listing file, which shows the machine code for each instruction. The assembly listing option for the gnu assembler is -al. For example, the “program” in Listing 9.7 contains some instructions that we will assemble and study to illustrate how to read machine language from a listing file.

 
 1  # someMachineCode.s
 2  # Some instructions to illustrate machine code.
 3  # Bob Plantz - 11 June 2009
 4
 5          .text
 6          .globl  main
 7          .type   main, @function
 8  main:
 9          pushq   %rbp        # save caller's base pointer
10          movq    %rsp, %rbp  # establish our base pointer
11
12          movq    $0x1234567890abcdef, %r10  # 64-bit immediate
13          movl    $0x12345678, %r11d  # 32-bit immediate
14          movw    $0x1234, %r12w      # 16-bit immediate
15          movb    $0x12, %r13b        # 8-bit immediate
16
17          movq    %rax, %r10       # 64-bit operands
18          movl    %ecx, %r11d      # 32-bit operands
19          movw    %dx, %r12w       # 16-bit operands
20          movb    %bl, %r13b       # 8-bit operands
21
22          addq    %r10, %rax       # add 64-bit operands
23
24          movb    %al, (%rdi)      # register indirect
25          movq    %r12, 24(%rsi)   # register indirect with offset
26
27          movl    $0, %eax         # return 0 to caller
28          movq    %rbp, %rsp       # restore stack pointer
29          popq    %rbp             # restore caller's base pointer
30          ret                      # back to caller
Listing 9.7: Some instructions for us to assemble. (This is not a program, just some instructions.)

The command to assemble the source file in Listing 9.7 and create a listing file is

     as --gstabs -al -o someMachineCode.o someMachineCode.s
The -al option sends the listing file to the standard output file, which defaults to the screen. You can capture this output by redirecting the standard output to a disk file. A good extension for the file name is “.lst.” The complete command is

     as --gstabs -al -o someMachineCode.o someMachineCode.s \
      > someMachineCode.lst
which produces the file shown in Figure 9.1.
GAS LISTING someMachineCode.s  page 1  
 
 
   1               # someMachineCode.s  
   2               # Some instructions to illustrate machine code.  
   3               # Bob Plantz - 11 June 2009  
   4  
   5                       .text  
   6                       .globl  main  
   7                       .type   main, @function  
   8               main:  
   9 0000 55                pushq   %rbp        # save caller’s base pointer  
  10 0001 4889E5            movq    %rsp, %rbp  # establish our base pointer  
  11  
  12 0004 49BAEFCD          movq    $0x1234567890abcdef, %r10  # 64-bit immediate  
  12      AB907856  
  12      3412  
  13 000e 41BB7856          movl    $0x12345678, %r11d  # 32-bit immediate  
  13      3412  
  14 0014 6641BC34          movw    $0x1234, %r12w      # 16-bit immediate  
  14      12  
  15 0019 41B512            movb    $0x12, %r13b        # 8-bit immediate  
  16  
  17 001c 4989C2            movq    %rax, %r10       # 64-bit operands  
  18 001f 4189CB            movl    %ecx, %r11d      # 32-bit operands  
  19 0022 664189D4          movw    %dx, %r12w       # 16-bit operands  
  20 0026 4188DD            movb    %bl, %r13b       # 8-bit operands  
  21  
  22 0029 4C01D0            addq    %r10, %rax       # add 64-bit operands  
  23  
  24 002c 8807              movb    %al, (%rdi)      # register indirect  
  25 002e 4C896618          movq    %r12, 24(%rsi)   # register indirect with offset  
  26  
  27 0032 B8000000          movl    $0, %eax         # return 0 to caller  
  27      00  
  28 0037 4889EC            movq    %rbp, %rsp       # restore stack pointer  
  29 003a 5D                popq    %rbp             # restore caller’s base pointer  
  30 003b C3                ret                      # back to caller

Figure 9.1: Assembler listing file for the function shown in Listing 9.7.


The first column is the line number of the original source. You should recognize the right-hand two-thirds of the listing as the assembly language source. We will focus our attention on the second and third columns on the left-hand side.

The values in the first column are displayed in decimal, while the values in the second and third columns are in hexadecimal.

The function itself starts on line 8 with the label “main.” Since there is nothing else on this line in the source file, it does not occupy any memory in the program.

The first entry in the second column — 0000 — occurs at line 9. It shows the memory location relative to the beginning of the function. Since the source code on line 8 has only a label, the instruction on line 9 is the first one in this function. Furthermore, the label on line 8 applies to (relative) memory location 0000. The label allows other parts of the program to refer to this memory location by name. In particular, since the label, main, is declared as a .globl, functions in other files linked to this one can refer to this memory location. It effectively names this function as the main function.

The entry in the third column on line 9 is 55. It is the machine code at relative location 0000. That is, byte number 0000 in this function is set to the bit pattern 55₁₆. Following the line across, we can see that this is the machine code corresponding to the instruction

     pushq    %rbp

Since the first instruction occupies one byte of memory, the second instruction will start in byte number 0001 (the second byte from the beginning). From the assembly listing file (Figure 9.1) we see that the machine code for

     movq    %rsp, %rbp

is the bit pattern

4889e5₁₆ = 0100 1000 1000 1001 1110 0101₂

This instruction occupies three bytes. Thus, the third instruction in this function begins at the fifth byte — relative location 0004. Continuing to line 30, the last instruction in the program

     ret

is a one-byte instruction. It is the sixtieth byte in the function and is located at relative location 003b with the bit pattern,

c3₁₆ = 1100 0011₂

So you can use the -al option for the as assembler to produce an assembler listing, which will show you exactly what the bit patterns are for each instruction and which bytes, relative to the beginning of the function, are set to these patterns.

9.3.2 General Format of Instructions

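Instructions in the x86-64 architecture can be from one to fifteen bytes in length. Each byte falls into one of several categories: prefix bytes (such as the REX prefix), the opcode, a ModRM byte, an SIB byte, and data bytes (an offset or immediate data).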

The general placement of these bytes is shown in Figure 9.2.


Figure 9.2: General format of instructions. There can be more than one prefix byte. The number of data bytes depends on the size of the data.


9.3.3 REX Prefix Byte

In order for an instruction to use the 64-bit features, the x86-64 architecture uses a prefix byte, a REX prefix, placed immediately before the primary instruction. The assembler recognizes when a REX prefix is required and inserts it automatically; the programmer does not need to specify it explicitly. However, the assembler may give an error message that implies it is the responsibility of the programmer to insert a REX prefix. For example, when attempting to use

        subb    %ah, %dil         # subtract bytes

the assembler gave the error message:

addAndSubtract2.s:23: Error: can’t encode register ’%ah’ in an
                instruction requiring REX prefix.
The reason for this error is explained in Section 6.2 (page 457). Accessing the %dil register requires that the assembler insert a REX prefix, but the %ah register cannot be accessed by an instruction that has a REX prefix.

REX prefixes are a byproduct of maintaining backward compatibility. The x86-32 architecture has only 8 general purpose registers, so it is sufficient to have only three bits in an instruction to specify any register. There are 16 general purpose registers in the x86-64 architecture, so four bits are required to specify a register. Some instructions involve up to three registers, thus there must be a place for three more bits to specify all the registers. Rather than change the register-specifying patterns in the Opcode, ModRM, and SIB bytes, the CPU designers decided to use the REX.R, REX.X, and REX.B bits in the REX prefix byte as the high-order bits for specifying registers. This provides the necessary three bits for register specification. A fourth bit in the REX prefix, the REX.W bit, is set to 1 when the operand is 64 bits. For all other operand sizes — 8, 16, or 32 bits — REX.W is set to 0. The format of the REX prefix byte is shown in Figure 9.3.



Figure 9.3: REX prefix byte. The four lettered bits are named REX.W, REX.R, REX.X, and REX.B.


9.3.4 ModRM Byte

The format of a ModRM byte is shown in Figure 9.4.



Figure 9.4: ModRM byte. The mode is specified by the mm bits, register by the rrr bits, and address base register by the bbb bits.


When one operand uses the base register plus offset addressing mode, that register is specified by the 3-bit bbb register field, and the other register is specified by the rrr register field. Table 9.2 shows the meaning of the 2-bit mm field.


mm   meaning
--   -------------------------------------------------------------------------
00   memory operand; address in register specified by bbb
01   memory operand; address in register specified by bbb plus 8-bit offset
10   memory operand; address in register specified by bbb plus 32-bit offset
11   register operand; register specified by bbb

Table 9.2: The mm field in the ModRM byte. Shows how to interpret the bbb register field.

If mm = 11, both operands are register direct and are specified by the two register fields, bbb and rrr. If mm = 00, the bbb register contains the memory address of one of the operands. For the other two values of mm, the bbb register contains a base address: 01 means that an 8-bit offset, and 10 that a 32-bit offset, is added to the base address to obtain the memory address. The offset is stored as part of the instruction.

The meaning of the register fields is shown in Table 9.3. For 64-bit mode, the REX bit column is explained in Section 9.3.3.


REX bit   register field   register names
-------   --------------   ----------------------
   0           000         rax, eax, ax, al
   0           001         rcx, ecx, cx, cl
   0           010         rdx, edx, dx, dl
   0           011         rbx, ebx, bx, bl
   0           100         rsp, esp, sp, spl, ah
   0           101         rbp, ebp, bp, bpl, ch
   0           110         rsi, esi, si, sil, dh
   0           111         rdi, edi, di, dil, bh
   1           000         r8, r8d, r8w, r8b
   1           001         r9, r9d, r9w, r9b
   1           010         r10, r10d, r10w, r10b
   1           011         r11, r11d, r11w, r11b
   1           100         r12, r12d, r12w, r12b
   1           101         r13, r13d, r13w, r13b
   1           110         r14, r14d, r14w, r14b
   1           111         r15, r15d, r15w, r15b

Notes:

  1. A 3-bit register field can be in an opcode, ModRM, or SIB byte, depending upon the instruction.
  2. The REX bit is the REX.R, REX.X, or REX.B bit in the REX prefix (Section 9.3.3), depending on the location of the register field.
  3. If a REX prefix is required, the REX.W bit is set to 1 for 64-bit operands.
  4. The ah, bh, ch, and dh registers cannot be used in an instruction that requires a REX prefix; the spl, bpl, sil, and dil registers require a REX prefix.

Table 9.3: Machine code of general purpose registers. The register name specified by the programmer determines other bit patterns in the instruction in addition to those shown here.

9.3.5 SIB Byte

The format of an SIB byte is shown in Figure 9.5.



Figure 9.5: SIB byte. The ss bits specify a scale factor, the iii bits the index register, and the bbb bits the address base register.


An SIB byte is required to implement the indexed addressing mode (see Section 13.1, page 799). The memory address is given by multiplying the value in the index register by the scale factor and adding this to the address in the base register. There can also be an offset, which is added to this sum.

9.3.6 The mov Instruction

We next consider the instruction on line 10 of Figure 9.1:

  10 0001 4889E5               movq    %rsp, %rbp  # establish our base pointer

This instruction copies all eight bytes from the rsp register to the rbp register. It starts with a REX Prefix, followed by two bytes for the instruction itself. The general format of the instruction for moving data from one register to another is shown in Figure 9.6.



Figure 9.6: Machine code for the mov from a register to a register instruction. The source register is coded in the src bits and the destination in the dst bits. See Table 9.3 for the bit patterns in each of these fields.


The REX Prefix is followed by the opcode, then a ModRM byte.

The opcode includes a “w” bit. This bit is 0 for 8-bit moves and 1 for all other sizes. The instruction operates on a 64-bit value, so w = 1 in the opcode (89₁₆).

The 11₂ in the mod field of the ModRM byte shows that both the source and destination register numbers are encoded in this byte. The src field shows the source and the dst field shows the destination.

From Table 9.3 we see that the source register is either rsp, esp, or sp, and the destination register is either rbp, ebp, or bp. (w = 1 rules out the 8-bit registers.) Since the REX.W bit in the REX Prefix is 1, the operand size is 64 bits. Thus, the instruction makes a copy of all 64 bits in the rsp register into the rbp register.

The second mov format covered here is moving immediate data to a register. Examples are given on lines 12 through 15 of Figure 9.1. The first operand (the source) is a literal; that is, the value itself is stated. This value will be stored immediately after the instruction. Of course, the instruction must encode the fact that this operand is located at the address immediately following the instruction. This is the immediate data addressing mode. The destination operand is a register, which is the register direct addressing mode. The general format for the move immediate data to a register instruction is shown in binary in Figure 9.7.



Figure 9.7: Machine code for the mov immediate data to a register instruction. The number of data bytes depends on the size of the data.


Consider the instruction

  12 0004 49BAEFCD             movq    $0x1234567890abcdef, %r10 
  12      AB907856 
  12      3412

The assembler determines that this is a mov instruction and the source operand is immediate data (due to the “$” character), so the first four bits of the opcode are 1011 (see Figure 9.7). Since the operand is not 8 bits, the “w” bit is 1. Next, the assembler figures out that the destination register is the r10 register. Looking this up in Table 9.3 (which is built into the assembler) shows that the remaining three bits are 010. Thus, the assembler generates the opcode byte of the instruction:

1011 1010₂ = ba₁₆

Since the operand size is 64 bits, the data value, 0x1234567890abcdef, is stored immediately (immediate addressing mode) after the instruction. Notice that the bytes seem to be stored backwards. That is, it looks like the assembler stored the 64-bit value 0xefcdab9078563412! Recall that the x86-64 architecture uses the little endian order for storing data in memory, so when the movl instruction copies four bytes from memory into a register, the byte at the lowest memory address is loaded into the least significant byte of the register, the byte at the next memory address is loaded into the next higher order byte of the register, etc. The assembler takes this into account for us and stores the immediate data in memory in little endian format.

The endian issue is irrelevant if you are always consistent with the size of the data item. However, if your algorithm changes data size, you need to be very aware of the endianness of the processor. For example, if you use a movl to store four bytes in memory, then four movbs to read them back into registers, you need to be aware of how they are physically stored in memory.

Finally, since this instruction operates on a 64-bit value, the instruction requires a REX Prefix. Referring to Figure 9.3 we see that the REX.W bit is 1, indicating the 64-bit size of the operands. And the REX.B bit is 1, which is used with the dst field to give the 4-bit number of the r10 register, 1010₂.

BE CAREFUL! Notice that the instruction is ten bytes long (Figure 9.1), but the operand size is eight bytes. Do not confuse the size of the instruction with the size of the operand(s).

9.3.7 The add Instruction

The add instruction has three different general formats. We present only a partial description here.

The format for adding an immediate value to a value in the rax, eax, ax, or al register is shown in Figure 9.8. The w bit is 0 for al and 1 for all others. The immediate data value must be the same size as the register to which it is added, except when adding to the rax register. Then the immediate data is 32 bits and is sign-extended to 64 bits before adding it to the value in the rax register. Note that this instruction is not used for the ah portion of the a register. For adding an immediate value to a value in the ah register or any of the other registers, the assembler program must use the instruction shown in Figure 9.9.



Figure 9.8: Machine code for the add immediate data to the A register (except ah) instruction. The number of data bytes depends on the size of the data.




Figure 9.9: Machine code for the add immediate data to register (not the al, ax, eax, or rax register) instruction. The number of data bytes depends on the size of the data.


Notice that the instruction for adding to the a register (except the ah portion) is one byte shorter than when adding to the other registers (compare Figures 9.8 and 9.9). There is an historical reason for this. Early CPU designs had only one general purpose register. It was used as the “accumulator” for performing arithmetic. (Perhaps naming it the “a” register makes a little more sense.) As more general purpose registers were added to the designs, assembly language programmers tended to continue using the “accumulator” register more frequently than the others. And compiler writers continued this same pattern of register usage. Hence, the “a” register is used much more for addition in a program than the other registers, and making it a shorter instruction reduces memory usage and increases execution speed. The differences are generally irrelevant these days, but the x86 architecture has evolved in such a way to maintain backward compatibility.

The add instruction shown in Figure 9.10 is used when the data value is small enough to fit into one byte, but it is being added to a two-, four-, or eight-byte register. The value is sign-extended to a full 16-bit, 32-bit, or 64-bit value, respectively, inside the CPU before it is added to the register. Sign-extension consists of copying the high-order bit into each bit to the left until the full width is reached. For example, sign-extending 0x7f to 32 bits would give 0x0000007f; sign-extending 0x80 to 32 bits would give 0xffffff80. Notice that sign-extension preserves the signed decimal value of the bit pattern. (Review Section 3.3.)



Figure 9.10: Machine code for the add immediate data to a register instruction. Used when the data will fit into one byte, but the register is two, four, or eight bytes. Value is sign-extended.


An example of this is the instruction

     addl    $5, %ecx

Even though the value can be coded in only eight bits, the full 32 bits of the register may be affected by the addition. That is, the machine code is 83c105 (the data is coded in only one byte), but the CPU adds 0x00000005 to the ecx register. (Recall that this may produce different results than simply adding 0x05 to the cl portion of the ecx register.)

The format for adding a value in a register to a value in a register is shown in Figure 9.11. Again, the registers and size of data are specified by the w, src, and dst bits. The register codes are given in Table 9.3, where “src” means “source” and “dst” means “destination.”



Figure 9.11: Machine code for the add register to register instruction.


Let us assemble an add instruction of this form by hand (line 22 in Figure 9.1 shows the 64-bit version, addq):

     addl    %ecx, %edx

This instruction adds the 32 bits from the ecx register to the 32 bits in the edx register, leaving the result in the edx register. The operands are 32 bits, so w = 1, and from Table 9.3, src = 001 and dst = 010. Thus the instruction is

0000 0001 1100 1010₂ = 01ca₁₆

9.4 Instructions Introduced Thus Far

This summary shows the assembly language instructions introduced thus far in the book. The page number where the instruction is explained in more detail, which may be in a subsequent chapter, is also given. This book provides only an introduction to the usage of each instruction. You need to consult the manuals ([2][6], [14][18]) in order to learn all the possible uses of the instructions.

9.4.1 Instructions

data movement:
opcode   source           destination   action              page
------   --------------   -----------   -----------------   ----
movs     $imm/%reg        %reg/mem      move                506
movs     mem              %reg          move                506
movsss   $imm/%reg        %reg/mem      move, sign extend   693
movzss   $imm/%reg        %reg/mem      move, zero extend   693
popw                      %reg/mem      pop from stack      566
pushw    $imm/%reg/mem                  push onto stack     566

s = b, w, l, q; w = l, q

arithmetic/logic:
opcode   source           destination   action                   page
------   --------------   -----------   ----------------------   ----
adds     $imm/%reg        %reg/mem      add                      607
adds     mem              %reg          add                      607
cmps     $imm/%reg        %reg/mem      compare                  676
cmps     mem              %reg          compare                  676
incs                      %reg/mem      increment                698
leaw     mem              %reg          load effective address   579
subs     $imm/%reg        %reg/mem      subtract                 612
subs     mem              %reg          subtract                 612

s = b, w, l, q; w = l, q

program flow control:
opcode    location   action                 page
-------   --------   --------------------   ----
call      label      call function          546
je        label      jump equal             679
jmp       label      jump                   691
jne       label      jump not equal         679
jno       label      jump no overflow       679
leave                undo stack frame       580
ret                  return from function   583
syscall              call kernel function   587








9.4.2 Addressing Modes

___________________________________________________________

register direct:

The data value is located in a CPU register.

syntax: name of the register with a “%” prefix.

example: movl    %eax, %ebx



immediate data:

The data value is located immediately after the instruction. Source operand only.

syntax: data value with a “$” prefix.

example: movl    $0xabcd1234, %ebx



base register plus offset:

The data value is located in memory. The address of the memory location is the sum of a value in a base register plus an offset value.

syntax: use the name of the register with parentheses around the name and the offset value immediately before the left parenthesis.

example: movl    $0xaabbccdd, 12(%eax)



9.5 Exercises

9-1 (§9.1) Enter the assembly language program in Listing 9.3. Use gdb to single step through the program as shown in the book. Before executing each instruction, predict how the rax, rbp, and rsp registers will change. Also record the values in the rip and eflags registers as you single step through the program. How many bytes are there in each instruction?

9-2 (§9.2) Enter the C program in Listing 9.4. Using gdb, verify that the program works correctly, as shown in Table 9.1.

9-3 (§9.2) Modify the C program in Listing 9.4 such that the arithmetic operations are done in single steps:

    y = w; 
    y += x; 
    z = w; 
    z -= x;

Use the -S gcc option to generate the assembly language and compare it with the version in Listing 9.5.

9-4 (§9.2) Enter the assembly language program in Listing 9.6 and run it. Notice that it gives different results than the C version if there is overflow. Why is this? Modify the program so that it gives the same results as the C version but still gives an overflow warning.

9-5 (§9.3) Assemble each of the mov instructions in Listing 9.7 by hand. Check your answers with the assembly listing.

9-6 (§9.3) Assemble each of the add instructions in Listing 9.7 by hand. Check your answers with the assembly listing.

9-7 (§9.3) Assemble each of the following instructions by hand (on paper).

     a) movl  $0x89abcdef, %ecx
     b) movw  $0xabcd, %ax
     c) movb  $0x30, %al
     d) movb  $0x31, %ah
     e) movq  %r8, %r15
     f) movb  %r9b, %r10b
     g) movl  %r11d, %r12d
     h) movq  $0x7fffec9b2cf4, %rsi

Check your work by entering the code into a source file of the form
        .text 
        .globl  main 
        .type   main, @function 
main: 
        pushq   %rbp 
        movq    %rsp, %rbp 
        # Your code sequence goes here. 
        movl    $0, %eax 
        popq    %rbp 
        ret

and creating a listing file.

9-8 (§9.3) Assemble each of the following instructions by hand (on paper).

     a) addl  $0x89abcdef, %ecx
     b) addw  $0xabcd, %ax
     c) addb  $0x30, %al
     d) addb  $0x31, %ah
     e) addq  %r12, %r15
     f) addw  %r8w, %r10w
     g) addb  %r9b, %sil
     h) addl  %esi, %edi

Check your work by entering the code into a source file of the form
        .text 
        .globl  main 
        .type   main, @function 
main: 
        pushq   %rbp 
        movq    %rsp, %rbp 
        # Your code sequence goes here. 
        movl    $0, %eax 
        popq    %rbp 
        ret

and creating a listing file.

9-9 (§9.3) Design an experiment that will allow you to determine what the machine code is for the

    pushq   64-bit_register

instruction, where “64-bit_register” is any of the general purpose registers. What is the general format of the instruction? Show your answer as a drawing similar to Figure 9.7. Which ones use a REX prefix? Hint: assemble with the -al option.

9-10 (§9.3) Design an experiment that will allow you to determine what the machine code is for the

    popq    64-bit_register

instruction, where “64-bit_register” is any of the general purpose registers. What is the general format of the instruction? Show your answer as a drawing similar to Figure 9.7. Which ones use a REX prefix? Hint: assemble with the -al option.

9-11 (§9.3) Disassemble each of the machine instruction sequences by hand (on paper). (Find the corresponding assembly language instruction for each machine code instruction.) Notice that this is a much more difficult problem, because it is difficult to tell where one instruction ends and the next one begins. I have placed one machine instruction on each line to help you. Enter each of your assembly language programs into a source file and use the assembler to check your work.

a)

     b0ab
     b4cd
     41b0ef
     41b701
   
b)

     40b723
     40b634
     b256
     b678
   
c)

     b83412cdab
     bbabcd1234
     41b900000000
     41be7b000000
 
d)

     66b8cdab
     66bbbacd
     66b93412
     66ba2143
   
e)

     88c4
     8808
     88480a
     8a08
     8a480a
   
f)

     89c3
     6689d8
     4889ca
     4589c6
   
g)

     04ab
     80c4cd
     80c3ef
     80c701
   
h)

     80c123
     80c534
     80c256
     80c678
   
i)

     053412cdab
     81c3abcd1234
     81c1d4c3b2a1
     81c2a1b2c3d4
   
j)

     05ab000000
     83c301
     83c100
     81c2ff000000
   
k)

     6605cdab
     6681c3bace
     6681c13412
     6681c22143
   
l)

     6605ab00
     6683c301
     6683c100
     6681c2ff00
   
m)

     00c4
     4100c2
     00ca
     4500c1
   
n)

     01c3
     6600d8
     4801ca
     4501c6