Chapter 12
Bit Operations; Multiplication and Division

We saw in Section 3.5 (page 120) that input read from the keyboard and output written on the screen is in the ASCII code and that integers are stored in the binary number system. So if a program reads user input as, say, 12310, that input is read as the characters ’1’, ’2’, and ’3’’, but the value used in the program is represented by the bit pattern 0000007b16.1 In this chapter, we return to the conversion algorithms between these two storage codes and look at the assembly language that is involved.

12.1 Logical Operators

Two numeric operators, addition and subtraction, were introduced in Section 9.2 (page 609). Many data items are better thought of as bit patterns rather than numerical entities. For example, study Table 2.3 on page 49 and see if you can determine which bit determines the case (upper/lower) of the alphabetic characters.

In order to manipulate individual character codes in a text string, we introduce the bit-wise logical operators in this section. The logical operations are shown in the truth tables in Figure 3.4 (page 136). The instructions available to us to perform these three operations are:

ands source, destination

ors source, destination

xors source, destination

where s denotes the size of the operand:

s meaning number of bits
b byte 8
w word 16
l longword 32
q quadword 64

and destination, source
Intel®; Syntax

or destination, source

xor destination, source

For example, the instruction

        andl    %eax, %edx

performs an and operation between each of the respective 32 bits in the eax register with the 32 bits in the edx register, leaving the result in the edx register. The instruction

        andb    %dh, %ah

performs an and operation between each of the respective 8 bits in the dh register with the 8 bits in the ah register, leaving the result in the ah register.

The addressing modes available for the arithmetic operators, add and sub, are also available for the logical operators. For example, if eax contains the bit pattern 0x89abcdef, the instruction

        andb    $0xee, %al

would change eax to contain 0x89abcdee. If we follow this with the instruction

        orl    $0x11111111, %eax

the bit pattern in eax becomes 0x99bbddff. Finally, if we then use

        xorw    $0x1111, %ax

we end up with 0x99bbccee in eax.

The program in Listing 12.1 shows the use of the C bit-wise logical operators “&” and “|” to change the case of alphabetic characters.

 
1/* 
2 * upperLower.c 
3 * Converts alphabetic characters to all upper case 
4 * and all lower case. 
5 * Bob Plantz - 14 June 2009 
6 */ 
7 
8#include "writeStr.h" 
9#include "readLn.h" 
10#include "toUpper.h" 
11#include "toLower.h" 
12#define MAX 50 
13int main() 
14{ 
15    char stringOrig[MAX]; 
16    char stringWork[MAX]; 
17 
18    writeStr("Enter some alphabetic characters: "); 
19    readLn(stringOrig, MAX); 
20 
21    writeStr("All upper: "); 
22    toUpper(stringOrig, stringWork); 
23    writeStr(stringWork); 
24    writeStr("\n"); 
25 
26    writeStr("All lower: "); 
27    toLower(stringOrig, stringWork); 
28    writeStr(stringWork); 
29    writeStr("\n"); 
30 
31    writeStr("Original: "); 
32    writeStr(stringOrig); 
33    writeStr("\n"); 
34 
35    return 0; 
36}
 
1/* 
2 * toUpper.h 
3 * Converts letters in a C string to upper case. 
4 * Bob Plantz - 14 June 2009 
5 */ 
6 
7#ifndef TOUPPER_H 
8#define TOUPPER_H 
9int toUpper(char *, char *); 
10#endif
 
1/* 
2 * toUpper.c 
3 * Converts alphabetic letters in a C string to upper case. 
4 * Bob Plantz - 14 June 2009 
5 */ 
6 
7#include "toUpper.h" 
8#define UPMASK 0xdf 
9 
10int toUpper(char *srcPtr, char *destPtr) 
11{ 
12    int count = 0; 
13    while (*srcPtr != \0) 
14    { 
15        *destPtr = *srcPtr & UPMASK; 
16        srcPtr++; 
17        destPtr++; 
18        count++; 
19    } 
20    *destPtr = \0;  // terminate string 
21    return count; 
22}
 
1/* 
2 * toLower.h 
3 * Converts letters in a C string to lower case. 
4 * Bob Plantz - 14 June 2009 
5 */ 
6 
7#ifndef TOLOWER_H 
8#define TOLOWER_H 
9int toLower(char *, char *); 
10#endif
 
1/* 
2 * toLower.c 
3 * Converts letters in a C string to lower case. 
4 * Bob Plantz - 14 June 2009 
5 */ 
6 
7#include "toLower.h" 
8#define LOWMASK 0x20 
9 
10int toLower(char *srcPtr, char *destPtr) 
11{ 
12    int count = 0; 
13    while (*srcPtr != \0) 
14    { 
15        *destPtr = *srcPtr | LOWMASK; 
16        srcPtr++; 
17        destPtr++; 
18        count++; 
19    } 
20    *destPtr = \0;  // terminate string 
21    return count; 
22}
Listing 12.1: Convert letters to upper/lower case (C). The functions writeStr and readLn are not shown here; see Exercises 11-3 and 11-4 for the assembly language versions. (There are three files here.)

The program assumes that the user enters all alphabetic characters without making mistakes. Of course, the conversions could be accomplished with addition and subtraction, but in this application the bit-wise logical operators are more natural.

In Listing 12.2 we show only the gcc-generated assembly language for the main and toUpper functions.

 
1        .file  "upperLower.c" 
2        .section      .rodata 
3        .align 8 
4.LC0: 
5        .string"Enter some alphabetic characters: " 
6.LC1: 
7        .string"All upper: " 
8.LC2: 
9        .string"\n" 
10.LC3: 
11        .string"All lower: " 
12.LC4: 
13        .string"Original: " 
14        .text 
15        .globlmain 
16        .type  main, @function 
17main: 
18        pushq  %rbp 
19        movq  %rsp, %rbp 
20        addq  $-128, %rsp 
21        movq  %fs:40, %rax      # load guard value 
22        movq  %rax, -8(%rbp)    # store at end of stack 
23        xorl  %eax, %eax        # clear rax 
24        movl  $.LC0, %edi 
25        call  writeStr 
26        leaq  -128(%rbp), %rax 
27        movl  $50, %esi 
28        movq  %rax, %rdi 
29        call  readLn 
30        movl  $.LC1, %edi 
31        call  writeStr 
32        leaq  -64(%rbp), %rdx 
33        leaq  -128(%rbp), %rax 
34        movq  %rdx, %rsi 
35        movq  %rax, %rdi 
36        call  toUpper 
37        leaq  -64(%rbp), %rax 
38        movq  %rax, %rdi 
39        call  writeStr 
40        movl  $.LC2, %edi 
41        call  writeStr 
42        movl  $.LC3, %edi 
43        call  writeStr 
44        leaq  -64(%rbp), %rdx 
45        leaq  -128(%rbp), %rax 
46        movq  %rdx, %rsi 
47        movq  %rax, %rdi 
48        call  toLower 
49        leaq  -64(%rbp), %rax 
50        movq  %rax, %rdi 
51        call  writeStr 
52        movl  $.LC2, %edi 
53        call  writeStr 
54        movl  $.LC4, %edi 
55        call  writeStr 
56        leaq  -128(%rbp), %rax 
57        movq  %rax, %rdi 
58        call  writeStr 
59        movl  $.LC2, %edi 
60        call  writeStr 
61        movl  $0, %eax 
62        movq  -8(%rbp), %rdx 
63        xorq  %fs:40, %rdx 
64        je    .L3 
65        call  __stack_chk_fail  # check for stack overflow 
66.L3: 
67        leave 
68        ret 
69        .size  main, .-main 
70        .ident"GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0" 
71        .section      .note.GNU-stack,"",@progbits
 
1        .file  "toUpper.c" 
2        .text 
3        .globltoUpper 
4        .type  toUpper, @function 
5toUpper: 
6        pushq  %rbp 
7        movq  %rsp, %rbp 
8        movq  %rdi, -24(%rbp)    # save srcPtr 
9        movq  %rsi, -32(%rbp)    # save destPtr 
10        movl  $0, -4(%rbp) 
11        jmp    .L2 
12.L3: 
13        movq  -24(%rbp), %rax    # srcPtr 
14        movzbl(%rax), %eax    # load char there 
15        movl  %eax, %edx 
16        andl  $-33, %edx         # make upper case 
17        movq  -32(%rbp), %rax    # destPtr 
18        movb  %dl, (%rax)        # store char there 
19        addq  $1, -24(%rbp)      # srcPtr++; 
20        addq  $1, -32(%rbp)      # destPtr++; 
21        addl  $1, -4(%rbp)       # count++; 
22.L2: 
23        movq  -24(%rbp), %rax 
24        movzbl(%rax), %eax 
25        testb  %al, %al           # NUL character? 
26        jne    .L3                # no, keep going 
27        movq  -32(%rbp), %rax    # yes, load destPtr 
28        movb  $0, (%rax)         # and store NUL there 
29        movl  -4(%rbp), %eax     # return count; 
30        popq  %rbp 
31        ret 
32        .size  toUpper, .-toUpper 
33        .ident"GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0" 
34        .section      .note.GNU-stack,"",@progbits
Listing 12.2: Convert letters to upper/lower case (gcc assembly language). Only two of the functions in Listing 12.1 are shown. (There are two files here.)

The toLower function is similar to the toUpper, and the writeStr and readLn functions were covered in the exercises in Chapter 11.

Most of the code in Listing 12.2 should be familiar from previous chapters. Note that the C code specifies char arrays in the main function that are 50 elements long (lines 13 and 14). But the compiler generates assembly language that allocates 64 bytes for each array:

17main: 
18        pushq  %rbp 
19        movq  %rsp, %rbp 
20        addq  $-128, %rsp

and:

32        leaq  -64(%rbp), %rdx 
33        leaq  -128(%rbp), %rax 
34        movq  %rdx, %rsi 
35        movq  %rax, %rdi 
36        call  toUpper

Recall that this 16-byte address alignment is specified by the ABI [25].

The code sequence on lines 21 – 23 in main:

21        movq  %fs:40, %rax     # load guard value 
22        movq  %rax, -8(%rbp)   # store at end of stack 
23        xorl  %eax, %eax       # clear rax

is new to you. This code sequence stores a value supplied by the operating system near the end of the stack. The purpose is described in the gcc man page entry for the -fstack-protector option:

Emit extra code to check for buffer overflows, such as stack smashing attacks. This is done by adding a guard variable to functions with vulnerable objects. This includes functions that call alloca, and functions with buffers larger than 8 bytes. The guards are initialized when a function is entered and then checked when the function exits. If a guard check fails, an error message is printed and the program exits.

The value stored there is checked at the end of the function, on lines 54 – 58:

62        movq  -8(%rbp), %rdx 
63        xorq  %fs:40, %rdx 
64        je    .L3 
65        call  __stack_chk_fail  # check for stack overflow 
66.L3:

If the value has been overwritten, the __stack_chk_fail function is called, which notifies the user about the problem.

Your version of gcc may be compiled without this option as the default. It can be turned off with the -fno-stack-protector option. Since the assembly language we are writing in this book is not “industrial strength,” we will not include this stack protection code.

In the toUpper function, the compiler-generated assembly language first loads the address stored in the srcPtr variable into a register so it can dereference the pointer.

13        movq  -24(%rbp), %rax    # srcPtr

It then moves the byte at that address into the register, using the movzbl instruction to zero out the remaining 24 bits of the register. (Recall that changing the low-order 32 bits of a register also zeros out the high-order 32 bits.)

14        movzbl(%rax), %eax    # load char there

Next it moves the byte into another working register and then performs the bit-wise and operation with the bit pattern ffffffdf (= 3310), leaving the result in the edx register. This and operation leaves all the bits in the edx register as they were, except the sixth bit is set to zero. The sixth bit in the ASCII code determines whether a letter is upper or lower case.

15        movl  %eax, %edx 
16        andl  $-33, %edx         # make upper case

Regardless of whether the letter was upper or lower case, it is now upper case. The letter is stored in the low-order eight bits of the edx register, the dl register. So the program loads the address stored in the destPtr variable into a register so it can dereference it and store the character there.

17        movq  -32(%rbp), %rax    # destPtr 
18        movb  %dl, (%rax)        # store char there

We will now consider the version of this program written in assembly language (Listing 12.3).

 
1# upperLower.s 
2# Converts alphabetic characters to all upper case 
3# and all lower case. 
4# Bob Plantz - 14 June 2009 
5 
6# Constant 
7        .equ    MAX,50 
8# Local variable names 
9        .equ    stringOrig,-64            # original char array 
10        .equ    stringWork,-128  # working char array 
11        .equ    localSize,-128 
12# Read only data 
13        .section  .rodata 
14prompt: 
15        .string "Enter some alphabetic characters: " 
16upMsg: 
17        .string "All upper: " 
18lowMsg: 
19        .string "All lower: " 
20origMsg: 
21        .string "Original: " 
22endl: 
23        .string "\n" 
24# Code 
25        .text 
26        .globl  main 
27        .type   main, @function 
28main: 
29        pushq   %rbp             # save base pointer 
30        movq    %rsp, %rbp       # base pointer = current top of stack 
31        addq    $localSize, %rsp # allocate local var. space 
32 
33        movq    $prompt, %rdi # point to prompt message 
34        call    writeStr 
35 
36        leaq    stringOrig(%rbp), %rdi  # place to store string 
37        movl    $MAX, %esi              # max number of char 
38        call    readLn 
39 
40        leaq    stringOrig(%rbp), %rdi  # original string stored here 
41        leaq    stringWork(%rbp), %rsi  # modified string goes here 
42        call    toLower 
43        movq    $lowMsg, %rdi 
44        call    writeStr 
45        leaq    stringWork(%rbp), %rdi  # show modified string 
46        call    writeStr 
47        movq    $endl, %rdi 
48        call    writeStr 
49 
50        leaq    stringWork(%rbp), %rdi  # original string stored here 
51        leaq    stringWork(%rbp), %rsi  # modified string goes here 
52        call    toUpper 
53        movq    $upMsg, %rdi 
54        call    writeStr 
55        leaq    stringWork(%rbp), %rdi  # show modified string 
56        call    writeStr 
57        movq    $endl, %rdi 
58        call    writeStr 
59 
60        movq    $origMsg, %rdi 
61        call    writeStr 
62        leaq    stringOrig(%rbp), %rdi  # original string stored here 
63        call    writeStr 
64        movq    $endl, %rdi 
65        call    writeStr 
66done: 
67        movl    $0, %eax        # return 0; 
68        movq    %rbp, %rsp      # remove local variables 
69        popq    %rbp            # restore caller base pointer 
70        ret                     # back to OS
 
1# toUpper.s 
2# Converts alpha characters to upper case. 
3# Bob Plantz - 14 June 2009 
4 
5# Calling sequence: 
6#       rdi <- address of string to be converted 
7#       rsi <- address to store result string 
8#       call    toUpper 
9# returns number of characters written 
10# If rdi and rsi have the same address, original string 
11# is overwritten. 
12 
13# Useful constant 
14        .equ    UPMASK,0xdf 
15# Stack frame, showing local variables and arguments 
16        .equ    destPtr,-32 
17        .equ    srcPtr,-24 
18        .equ    count,-4 
19        .equ    localSize,-32 
20# Code 
21        .text 
22        .globl  toUpper 
23        .type   toUpper, @function 
24toUpper: 
25        pushq   %rbp             # save frame pointer 
26        movq    %rsp, %rbp       # new frame pointer 
27        addq    $localSize, %rsp # local vars. and arg. 
28 
29        movq    %rdi, srcPtr(%rbp)   # source pointer 
30        movq    %rsi, destPtr(%rbp)  # destination pointer 
31        movl    $0, count(%rbp)  # count = 0; 
32upLoop: 
33        movq    srcPtr(%rbp), %rax # source pointer 
34        movb    (%rax), %al      # get current char 
35        cmpb    $0, %al          # at end yet? 
36        je      done             # yes, all done 
37 
38        andb    $UPMASK, %al     # in range, convert 
39        movq    destPtr(%rbp), %r8  # destination pointer 
40        movb    %al, (%r8)       # store character 
41 
42        incl    count(%rbp)      # count++; 
43        incl    srcPtr(%rbp)     # srcPtr++; 
44        incl    destPtr(%rbp)    # destPtr++; 
45        jmp     upLoop           # and check for end 
46done: 
47        movq    destPtr(%rbp), %r8  # destination pointer 
48        movb    $0, (%r8)       # store NUL 
49        movl    count(%rbp), %eax    # return count 
50        movq    %rbp, %rsp      # restore stack pointer 
51        popq    %rbp            # restore frame pointer 
52        ret                     # back to caller
 
1# toLower.s 
2# Converts alpha characters to lower case. 
3# Bob Plantz - 14 June 2009 
4 
5# Calling sequence: 
6#       rdi <- address of string to be converted 
7#       rsi <- address to store result string 
8#       call    toLower 
9# returns number of characters written 
10# If rdi and rsi have the same address, original string 
11# is overwritten. 
12 
13# Useful constant 
14        .equ    LOWMASK,0x20 
15# Stack frame, showing local variables and arguments 
16        .equ    destPtr,-32 
17        .equ    srcPtr,-24 
18        .equ    count,-4 
19        .equ    localSize,-32 
20# Code 
21        .text 
22        .globl  toLower 
23        .type   toLower, @function 
24toLower: 
25        pushq   %rbp             # save frame pointer 
26        movq    %rsp, %rbp       # new frame pointer 
27        addq    $localSize, %rsp # local vars. and arg. 
28 
29        movq    %rdi, srcPtr(%rbp)   # source pointer 
30        movq    %rsi, destPtr(%rbp)  # destination pointer 
31        movl    $0, count(%rbp)  # count = 0; 
32lowLoop: 
33        movq    srcPtr(%rbp), %rax # source pointer 
34        movb    (%rax), %al      # get current char 
35        cmpb    $0, %al          # at end yet? 
36        je      done             # yes, all done 
37 
38        orb     $LOWMASK, %al    # in range, convert 
39        movq    destPtr(%rbp), %r8  # destination pointer 
40        movb    %al, (%r8)       # store character 
41 
42        incl    count(%rbp)      # count++; 
43        incl    srcPtr(%rbp)     # srcPtr++; 
44        incl    destPtr(%rbp)    # destPtr++; 
45        jmp     lowLoop          # and check for end 
46done: 
47        movq    destPtr(%rbp), %r8  # destination pointer 
48        movb    $0, (%r8)       # store NUL 
49        movl    count(%rbp), %eax    # return count 
50        movq    %rbp, %rsp      # restore stack pointer 
51        popq    %rbp            # restore frame pointer 
52        ret                     # back to caller
Listing 12.3: Convert letters to upper/lower case (programmer assembly language). See Exercises 11.3 and 11.4 for the functions writeStr and readLn. (There are three files here.)

Again, we will describe on the toUpper function. Writing directly in assembly language, we also need to get the address in srcPtr so we can dereference it. But in copying the character stored there, we simply ignore the remaining 56 bits of the rax register. Notice that the movb instruction first uses the full 64-bit address in the rax register to fetch the byte stored there, and it then can write over the low-order 8 bits of the same register. (This, of course, “destroys” the address.)

33        movq    srcPtr(%rbp), %rax # source pointer 
34        movb    (%rax), %al      # get current char

Since we are ignoring the high-order 56 bits of the rax register, we must be consistent when operating on the data in the low-order 8 bits. So we use the andb instruction to operate only on the al portion of the rax register.

38        andb    $UPMASK, %al     # in range, convert

Storing the final result is the same, except we are using different registers.

39        movq    destPtr(%rbp), %r8  # destination pointer 
40        movb    %al, (%r8)       # store character

Both ways of implementing this algorithm are correct, and there is probably no significant efficiency difference. However, comparing the two shows the importance of maintaining consistency in data sizes. You do not need to zero out unused portions of registers, but you should also never assume that they are zero.

12.2 Shifting Bits

It is sometimes useful to be able to shift all the bits to the left or right. Since the relative position of a bit in an integer has significance, shifting all the bits to the left one position effectively multiplies the value by two. And shifting them one position to the right effectively divides the value by two. As you will see in Sections 12.3 and 12.4, the multiplication and division instructions are complicated. They also take a great deal of processor time. Using left/right shifts to effect multiplication/division by powers of two is very efficient.

There are two instructions for shifting bits to the right — shift right and shift arithmetic right:

shrs source, destination

sars source, destination

where s denotes the size of the operand:

s meaning number of bits
b byte 8
w word 16
l longword 32
q quadword 64

shr destination, source
Intel®; Syntax

sar destination, source

The source operand can be either an immediate value, or the value can be located in the cl register. If it is an immediate value, it can be up to 6310. The destination operand can be either a memory location or a register. Any of the addressing modes that we have covered can be used to specify a memory location.

The action of the shr instruction is to shift all the bits in the destination operand to the right by the number of bit positions specified by the source operand. The “vacated” bit positions at the high-order end of the destination operand are filled with zeros. The last bit to be shifted out of the low-order bit position is copied into the carry flag (CF). For example, if the eax register contained the bit pattern aabb 2233, then the instruction

        shrw    $1, %ax

would produce

        eax: aabb 1119

and the CF would be one. With the same initial conditions, the instruction

        shrl    $4, %eax

would produce

        eax: 0aab b223

and the CF would be zero.

The action of the sar instruction is to shift all the bits in the destination operand to the right by the number of bit positions specified by the source operand. The “vacated” bit positions at the high-order end of the destination operand are filled with the same value that was originally in the highest-order bit. The last bit to be shifted out of the low order bit position is copied into the carry flag (CF). For example, if the eax register contained the bit pattern aabb 2233, then the instruction

        sarw    $1, %ax

would produce

        eax: aabb 1119

and the CF would be one. With the same initial conditions, the instruction

        sarl    $4, %eax

would produce

        eax: faab b223

and the CF would be zero.

Thus the difference between “shift right” and “shift arithmetic right” is that the arithmetic shift preserves the sign of the value – as though it represents an integer stored in two’s complement code.

There are two instructions for shifting bits to the left — shift left and shift arithmetic left:

shls source, destination

sals source, destination

where s denotes the size of the operand:

s meaning number of bits
b byte 8
w word 16
l longword 32
q quadword 64

shl destination, source
Intel®; Syntax

sal destination, source

The source operand can be either an immediate value, or the value can be located in the cl register. If it is an immediate value, it can be up to 3110. The destination operand can be either a memory location or a register. Any of the addressing modes that we have covered can be used to specify a memory location.

The action of both the shl and sal instructions is to shift all the bits in the destination operand to the left by the number of bit positions specified by the source operand. In fact, these are really two different assembly language mnemonics for the same machine code. The “vacated” bit positions at the low-order end of the destination operand are filled with zeros. The last bit to be shifted out of the highest-order bit position is copied into the carry flag (CF). For example, if the eax register contained the bit pattern bbaa 2233, then the instruction

        shlw    $1, %ax

would produce

        eax: bbaa 4466

and the CF would be zero. With the same initial conditions, the instruction

        shll    $4, %eax

would produce

        eax: baa2 2330

and the CF would be one.

We see how shifts can be used in the hexToInt function shown in Listing 12.4.

 
1/* 
2 * readHex.c 
3 * Gets hex number from user and stores it as int. 
4 * Bob Plantz - 14 June 2009 
5 */ 
6#include <stdio.h> 
7#include "writeStr.h" 
8#include "readLn.h" 
9#include "toLower.h" 
10#include "hexToInt.h" 
11#define MAX 20 
12int main() 
13{ 
14    char theString[MAX]; 
15    long int theInt; 
16 
17    writeStr("Enter up to 16 hex characters: "); 
18    readLn(theString, MAX); 
19 
20    toLower(theString, theString); 
21    theInt = hexToInt(theString); 
22    printf("%lx = %li\n", theInt, theInt); 
23    return 0; 
24}
 
1/* 
2 * hexToInt.h 
3 * Converts hex character string to int. 
4 * Bob Plantz - 8 April 2008 
5 */ 
6 
7#ifndef HEXTOINT_H 
8#define HEXTOINT_H 
9long int hexToInt(char *); 
10#endif
 
1/* 
2 * hexToInt.c 
3 * Converts hex character string to int. 
4 * Assumes A - F in upper case. 
5 * Bob Plantz - 14 June 2009 
6 */ 
7 
8#include "hexToInt.h" 
9#define NUMERAL 0x30 
10#define ALPHA 0x57 
11 
12long int hexToInt(char *stringPtr) 
13{ 
14    long int accumulator = 0; 
15    char current; 
16 
17    current = *stringPtr; 
18    while (current != \0) 
19    { 
20        accumulator = accumulator << 4; 
21        if (current <= 9)  // only works for 0-9,A-F 
22            current -= NUMERAL; 
23        else 
24            current -= ALPHA; 
25        accumulator += (long int)current; 
26        stringPtr++; 
27        current = *stringPtr; 
28    } 
29    return accumulator; 
30}
Listing 12.4: Shifting bits (C). (There are three files here.)

Notice that “«” (on line 20 in the hexToInt function) is the left shift operator and “»” is the right shift operator in C/C++. In C++ these operators are overloaded to provide file output and input.

The code in the main function is familiar. The compiler-generated assembly language for hexToInt is shown in Listing 12.5 with comments added.

 
1        .file  "hexToInt.c" 
2        .text 
3        .globlhexToInt 
4        .type  hexToInt, @function 
5hexToInt: 
6        pushq  %rbp 
7        movq  %rsp, %rbp 
8        movq  %rdi, -24(%rbp) 
9        movq  $0, -8(%rbp) 
10        movq  -24(%rbp), %rax 
11        movzbl(%rax), %eax 
12        movb  %al, -9(%rbp) 
13        jmp    .L2             # jump to bottom of loop 
14.L5: 
15        salq  $4, -8(%rbp)    # accumulator = accumulator << 4; 
16        cmpb  $57, -9(%rbp)   # if (current > ’9’) 
17        jg    .L3             #    jump to "else" part 
18        movzbl-9(%rbp), %eax  # load current character 
19        subl  $48, %eax       #    convert numeral char to int 
20        movb  %al, -9(%rbp)   #    and update current 
21        jmp    .L4             # jump around the "else" part 
22.L3:                            # "else" part 
23        movzbl-9(%rbp), %eax  # load current character 
24        subl  $87, %eax       #    convert (hex) letter char to int 
25        movb  %al, -9(%rbp)   #    and update current 
26.L4: 
27        movsbq-9(%rbp), %rax  # type-cast char to a long int 
28        addq  %rax, -8(%rbp)  # add it to accumulator 
29        addq  $1, -24(%rbp)   # stringPtr++; 
30        movq  -24(%rbp), %rax 
31        movzbl(%rax), %eax 
32        movb  %al, -9(%rbp)   # current = *stringPtr; 
33.L2: 
34        cmpb  $0, -9(%rbp)    # while (current != ’\0’) 
35        jne    .L5             #    go to top of loop 
36        movq  -8(%rbp), %rax  # 64-bit return value 
37        popq  %rbp 
38        ret 
39        .size  hexToInt, .-hexToInt 
40        .ident"GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0" 
41        .section      .note.GNU-stack,"",@progbits
Listing 12.5: Shifting bits (gcc assembly language).

As usual, gcc has converted the while loop in hexToInt to a do-while loop, which is entered at the bottom. Most of this code has been covered previously. The instruction

15        salq  $4, -16(%rbp)    # accumulator = accumulator << 4;

shifts the 64 bits that make up the value of the variable accumulator four bits to the left. Make sure that you understand the four high-order bits in this group of 64 are lost. That is, the shift does not carry on to other memory areas beyond these 64 bits. As stated above, the last bit to get shifted out of these 64 bits is copied to the CF.

The conversions from characters to integers

18        movzbl-9(%rbp), %eax  # load current character 
19        subl  $48, %eax       #    convert numeral char to int 
20        movb  %al, -9(%rbp)   #    and update current

and

23        movzbl-9(%rbp), %eax  # load current character 
24        subl  $87, %eax       #    convert (hex) letter char to int 
25        movb  %al, -9(%rbp)   #    and update current

start by moving a one-byte character into an int-sized register with the high-order 24 bits zeroed. The actual conversion consists of subtracting off the “character part” as an integer arithmetic operation. Then the result, which is guaranteed to fit within a byte, is stored back in the single byte allocated for the original character.

Actually, we can easily see that the result of this conversion operation is a four-bit value in the range 0000211112. The four-bit left shift of the variable accumulator has left space for inserting these four bits. The bit insertion operation consists of first type casting the four-bit integer to a 64-bit integer as we load it from the variable:

27        movsbq-9(%rbp),%rax    # type-cast char to a long int

then adding this 64-bit integer to the variable accumulator:

28        addq  %rax, -8(%rbp)   # add it to accumulator

We also note that although the standard return value is 32-bits in the eax register, declaring a long int (64-bit) return value causes the compiler to use the entire rax register:

36        movq  -8(%rbp), %rax  # 64-bit return value

Listing 12.6 shows a version of the hexToInt function written in assembly language.

 
1# hexToInt_a.s 
2# Converts hex characters to a 64-bit int. 
3# Bob Plantz - 14 June 2009 
4 
5# Calling sequence: 
6#       rdi <- address of hex string to be converted 
7#       call    string2Hex 
8# returns 64-bit int represented by the hex string 
9 
10# Useful constants 
11        .equ    NUMERAL,0x30 
12        .equ    ALPHA,0x57 
13        .equ    HEXBITS,4 
14        .equ    TYPEMASK,0xf 
15# Stack frame, showing local variables and arguments 
16        .equ    accumulator,-16 
17        .equ    current,-1 
18        .equ    localSize,-32 
19# Code 
20        .text 
21        .globl  hexToInt 
22        .type   hexToInt, @function 
23hexToInt: 
24        pushq   %rbp             # save frame pointer 
25        movq    %rsp, %rbp       # new frame pointer 
26        addq    $localSize, %rsp # local vars. and arg. 
27 
28        movq    $0, accumulator(%rbp)    # accumulator = 0; 
29loop: 
30        movb    (%rdi), %al      # load character 
31        cmpb    $0, %al          # at end yet? 
32        je      done             # yes, all done 
33 
34        salq    $HEXBITS, accumulator(%rbp) # accumulator = accumulator << 4; 
35 
36        cmpb    $9, %al        # is it numeral? 
37        ja      isAlpha          # no, so its alpha 
38        subb    $NUMERAL, %al    # convert to int 
39        jmp     addIn            # and add to accumulator 
40isAlpha: 
41        subb    $ALPHA, %al      # convert to int 
42addIn: 
43        andq    $TYPEMASK, %rax  # 4 bits -> 64 bits 
44        addq    %rax, accumulator(%rbp) # insert the 4 bits 
45        incq    %rdi             # stringPtr++; 
46        jmp     loop             # and check for end 
47done: 
48        movq    accumulator(%rbp), %rax  # return accumulator; 
49        movq    %rbp, %rsp      # restore stack pointer 
50        popq    %rbp            # restore frame pointer 
51        ret                     # back to caller
Listing 12.6: Shifting bits (programmer assembly language).

It differs from the C version in several ways. First, since this is a leaf function, we do not save the argument in the stack frame. Instead, we simply use the register as the stringPtr variable:

29loop: 
30        movb    (%rdi), %al      # load character 
31        cmpb    $0, %al          # at end yet? 
32        je      done             # yes, all done

We do, however, explicitly allocate stack space for the local variable:

23hexToInt: 
24        pushq   %rbp             # save frame pointer 
25        movq    %rsp, %rbp       # new frame pointer 
26        addq    $localSize, %rsp # local vars. and arg. 
27 
28        movq    $0, accumulator(%rbp)    # accumulator = 0;

Although this is not required because this is a leaf function, it is somewhat better software engineering. If this function is ever modified such that it does call another function, the programmer may forget to allocate the stack space, which would then be required. There is less chance that saving the contents of the rdi register would be overlooked since it is the where the first argument is passed. Both of these issues are arguable design decisions.

Next, for the conversion from 4-bit integer values to 64-bit, we define a bit mask:

14        .equ    TYPEMASK,0xf

Performing an and operation with this bit mask leaves the four low-order bits as they were and sets the 60 high-order bits all to zero. Then we simply add this to the 64-bit accumulator (which has already been shifted four bits to the left) it effectively insert the four bits into the correct location:

43        andq    $TYPEMASK, %rax  # 4 bits -> 64 bits 
44        addq    %rax, accumulator(%rbp) # insert the 4 bits

12.3 Multiplication

The hexToInt function discussed in Section 12.2 shows how to convert a string of hexadecimal characters into the integer they represent. That function uses the fact that each hexadecimal character represents four bits. So as the characters are read from left to right, the bits in the accumulator are shifted four places to the left in order to make space for the next four-bit value. The character is converted to the four bits it represents and added to the accumulator.

Although the four-bit left shift seems natural for hexadecimal, it is equivalent to multiplying the value in the accumulator by sixteen. This follows from the positional notation used to write numbers. Add another hexadecimal digit to the right of an existing number effectively multiplies that existing number by sixteen. A little thought shows that this algorithm, shown in Algorithm 12.1, works in any number base.


Algorithm 12.1: Character to integer conversion.
1:  accumulator 0
2:  Get character.
3:  while more characters do
4:   accumulator base × accumulator
5:   tempInt integer equivalent of the character
6:   accumulator accumulator + tempInt
7:   Get character.
8:  end while

Of course, you probably want to write programs that allow users to work with decimal numbers. So we need to know how to convert a string of decimal characters to the integer they represent. The characters that represent decimal numbers are in the range 30163916. Table 12.1 shows the 32-bit int that corresponds to each numeric character.


Numeral int
(ASCII code) (Binary number system)


0011 0000 0000 0000 0000 0000 0000 0000 0000 0000
0011 0001 0000 0000 0000 0000 0000 0000 0000 0001
0011 0010 0000 0000 0000 0000 0000 0000 0000 0010
0011 0011 0000 0000 0000 0000 0000 0000 0000 0011
0011 0100 0000 0000 0000 0000 0000 0000 0000 0100
0011 0101 0000 0000 0000 0000 0000 0000 0000 0101
0011 0110 0000 0000 0000 0000 0000 0000 0000 0110
0011 0111 0000 0000 0000 0000 0000 0000 0000 0111
0011 1000 0000 0000 0000 0000 0000 0000 0000 1000
0011 1001 0000 0000 0000 0000 0000 0000 0000 1001



Table 12.1: Bit patterns (in binary) of the ASCII numerals and the corresponding 32-bit ints.

For a string of characters that represents a decimal integer, Algorithm 12.1 can be specialized to give Algorithm 12.2. (Recall that “” is the bit-wise and operator.)


Algorithm 12.2: Decimal character to integer conversion.
1:  accumulator 0
2:  Get character.
3:  while more characters do
4:   accumulator 10 × accumulator
5:   tempInt 0xf & character ⊳  &’ is bitwise AND
6:   accumulator accumulator + tempInt
7:   Get character.
8:  end while

Shifting N bits to the left multiplies a number by 2N, so it can only be used to multiply by powers of two. Algorithm 12.2 multiplies the accumulator by 10, which cannot be accomplished with only shifts. Thus, we need to use the multiplication instruction for decimal conversions.

The multiplication instruction is somewhat more complicated than addition. The main problem is that the product can, in general, occupy the number of digits in the multiplier plus the number of digits in the multiplicand. This is easily seen by computing 99 × 99 = 9801 (in decimal). Thus in general,

8-bit × 8-bit 16-bit
16-bit × 16-bit 32-bit
32-bit × 32-bit 64-bit
64-bit × 64-bit 128-bit

The unsigned multiplication instruction is:

muls source

where s denotes the size of the operands:

s meaning number of bits
b byte 8
w word 16
l longword 32
q quadword 64

Intel®; Syntax

mul source

In the x86-64 architecture, the destination operand contains the multiplicand and must be in the al, ax, eax, or rax register, depending on the size of the operand, for the unsigned multiplication instruction, mul. This register is not specified as an operand. The instruction specifies the source operand, which contains the multiplier and must be the same size. It can be located in another general-purpose register or in memory. If the numbers are eight bits (hence, one number is in al), the high-order portion of the result will be in the ah register, and the low-order portion of the result will be in the al register. For sixteen and thirty-two bit numbers, the low-order portion of the product will be stored in a portion of the rax register and the high-order will be stored in a portion of the rdx register as shown in Table 12.2.


Operand Portion High-Order Low-Order
Size of A Reg. Result Result




8 bits al ah al
16 bits ax dx ax
32 bits eax edx eax
64 bits rax rdx rax





Table 12.2: Register usage for the mul instruction.

For example, let’s see how the computation 7 × 24 = 168 looks in 8-bit, 16-bit, and 32-bit values. First, note that:

710 = 0000 01112
   = 0716
(12.1)

2410 = 0001 10002
    = 18
        16
(12.2)

and

16810 = 1010 10002
     = a816
(12.3)

Now, if we declare the constants:

   byteDay: 
           .byte   24 
   wordDay: 
           .word   24 
   longDay: 
           .long   24

These declarations cause the assembler to do the following:

First, consider 8-bit multiplication. If eax contains the bit pattern 0x??????07, then

        mulb   byteDay

changes eax such that it contains 0x????00a8. Notice that only the al portion of the A register can be used for the operand, but the result will occupy the entire ax portion of the register even though the result would fit into only the al portion. That is, the instruction will produce a 16-bit result, and anything stored in the ah portion will be lost.

Next, consider 16-bit multiplication. If eax contains 0x????0007, then

        mulw   wordDay

changes eax to contain 0x????00a8 and edx to contain 0x????0000. Two points are important in this example:

Finally, 32-bit multiplication. If eax contains 0x00000007, then

        mull   longDay

changes eax to contain 0x000000a8 and edx to contain 0x00000000. This example shows the entire eax register must be used for the operand before mull is executed, and the entire edx register is used for the high-order portion of the result, even though it is not needed. That is, the instruction will produce a 64-bit result, and anything stored in the edx register will be lost.

These examples show that the rax and rdx registers are used without ever explicitly appearing in the instruction. You must be very careful not to write over a required value that is stored in one of these registers. Using the multiplication instruction requires some careful planning.

There is also a signed multiply instruction, which has three forms:

imuls source

imuls source, destination

imuls immediate, source, destination

where s denotes the size of the operand:

s meaning number of bits
b byte 8
w word 16
l longword 32
q quadword 64

imul source
Intel®; Syntax

imul destination, source

imul destination, source, immediate

In the one-operand format the signed multiply instruction uses the rdx:rax register combination in the same way as the mul instruction.

In its two-operand format the destination must be a register. The source can be a register, an immediate value, or a memory location. The source and destination are multiplied, and the result is stored in the destination register. Unfortunately, if the result is too large to fit into the destination register, it is simply truncated. In this case, both the CF and OF flags are set to 1. If the result was able to fit into the destination register, both flags are set to 0.

In its three-operand format the destination must be a register. The source can be a register or a memory location. The source is multiplied by the immediate value and the result is stored in the destination register. As in the two-operand form, if the result is too large to fit into the destination register, it is simply truncated. In this case, both the CF and OF flags are set to 1. If the result was able to fit into the destination register, both flags are set to 0.

The difference between signed and unsigned multiplication can be illustrated with the following multiplication of two 16-bit values. Given the declaration:

            .data 
    mOne:   .word   -1

and the initial conditions in the rdx and rax registers:

    rdx            0x7fffec889ec8   140737161764552
    rax            0x2ba1be79ffff   47973685395455
we will multiply the two 16-bit values in the memory location mOne and the register ax. Notice that if we consider them to be signed integers, both values represent -1, and we would expect the result to be +1 (= 0000000116). However, if we consider them to be unsigned integers, they both represent 6553510, and we would expect the result to be 429483622510 (= fffe000116).

Indeed, starting with the initial conditions above, the instruction:

        mulw   mOne

yields:

    rdx            0x7fffec88fffe   140737161789438
    rax            0x2ba1be790001   47973685329921
We see that the register combination dx:ax = fffe:0001. And with the same initial conditions, the instruction
        imulw   mOne

yields:

    rdx            0x7fffec880000   140737161723904
    rax            0x2ba1be790001   47973685329921
With signed multiplication we get dx:ax = 0000:0001. Both of these operations multiplied 16-bit values to provide a 32-bit result. They each used the sixteen low-order bits of the rax and rdx registers for the result. Notice that the upper 48 bits of these registers were not changed and that neither “ax” nor “dx” appeared in either instruction.

Multiplication is used on line 19 in the decToInt function shown in Listing 12.7.

 
1/* 
2 * decToUInt.c 
3 * Converts decimal character string to int. 
4 * Bob Plantz - 15 June 2009 
5 */ 
6 
7#include "decToUInt.h" 
8#define NUMERALMASK 0xf 
9 
10unsigned int decToUInt(char *stringPtr) 
11{ 
12    unsigned int accumulator = 0; 
13    unsigned int base = 10; 
14    unsigned char current; 
15 
16    current = *stringPtr; 
17    while (current != \0) 
18    { 
19        accumulator = accumulator * base; 
20        current = current & NUMERALMASK; 
21        accumulator += (int)current; 
22        stringPtr++; 
23        current = *stringPtr; 
24    } 
25    return accumulator; 
26}
Listing 12.7: Convert decimal text string to int (C).

As we can see on line 17 in Listing 12.8 the compiler has chosen to use the imull instruction for multiplication.

 
1        .file  "decToUInt.c" 
2        .text 
3        .globldecToUInt 
4        .type  decToUInt, @function 
5decToUInt: 
6        pushq  %rbp 
7        movq  %rsp, %rbp 
8        movq  %rdi, -24(%rbp) 
9        movl  $0, -4(%rbp) 
10        movl  $10, -12(%rbp) 
11        movq  -24(%rbp), %rax 
12        movzbl(%rax), %eax 
13        movb  %al, -5(%rbp) 
14        jmp    .L2 
15.L3: 
16        movl  -4(%rbp), %eax   # destination must 
17        imull  -12(%rbp), %eax  #    be in a 
18        movl  %eax, -4(%rbp)   #    register 
19        andb  $15, -5(%rbp) 
20        movzbl-5(%rbp), %eax 
21        addl  %eax, -4(%rbp) 
22        addq  $1, -24(%rbp) 
23        movq  -24(%rbp), %rax 
24        movzbl(%rax), %eax 
25        movb  %al, -5(%rbp) 
26.L2: 
27        cmpb  $0, -5(%rbp) 
28        jne    .L3 
29        movl  -4(%rbp), %eax 
30        popq  %rbp 
31        ret 
32        .size  decToUInt, .-decToUInt 
33        .ident"GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0" 
34        .section      .note.GNU-stack,"",@progbits
Listing 12.8: Convert decimal text string to int (gcc assembly language).

Recall that the destination must be in a register. The compiler has chosen eax in this case. The value to be multiplied must be loaded from its memory location into the register, multiplied, then stored back into memory:

16        movl  -4(%rbp), %eax   # destination must 
17        imull  -12(%rbp), %eax  #    be in eax 
18        movl  %eax, -4(%rbp)   #    register

It may appear that the compiler has made an error here. Since both the multiplier and the multiplicand are 32-bit values, the product can be 64 bits wide. However, the compiler has chosen code that assumes the product will be no wider than 32 bits. This can lead to arithmetic errors when multiplying large integers, but according to the C programming language standard [10] this is acceptable:

A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

In Listing 12.9 the programmer has chosen unsigned multiplication, mull.

 
1# decToUInt.s 
2# Converts decimal character string to unsigned int. 
3# Bob Plantz - 15 June 2009 
4 
5# Calling sequence: 
6#       rdi <- address of decimal string to be converted 
7#       call    decToUInt 
8# returns 32-bit int represented by the decimal string 
9 
10# Useful constants 
11        .equ    NUMERALMASK,0xf 
12        .equ    DECIMAL,10 
13# Stack frame, showing local variables and arguments 
14        .equ    accumulator,-8 
15        .equ    current,-1 
16        .equ    localSize,-16 
17 
18        .text 
19        .globl  decToUInt 
20        .type   decToUInt, @function 
21decToUInt: 
22        pushq   %rbp             # save base pointer 
23        movq    %rsp, %rbp       # new base pointer 
24        addq    $localSize, %rsp # local vars. and arg. 
25 
26        movl    $0, accumulator(%rbp)    # accumulator = 0; 
27loop: 
28        movb    (%rdi), %sil     # load character 
29        cmpb    $0, %sil         # at end yet? 
30        je      done             # yes, all done 
31 
32        movl    $DECIMAL, %eax   # multiplier 
33        mull    accumulator(%rbp) # accumulator * 10 
34        movl    %eax, accumulator(%rbp) # update accumulator 
35 
36        andl    $NUMERALMASK, %esi  # char -> int 
37        addl    %esi, accumulator(%rbp) # add the digit 
38        incq    %rdi             # stringPtr++; 
39        jmp     loop             # and check for end 
40done: 
41        movl    accumulator(%rbp), %eax  # return accumulator; 
42        movq    %rbp, %rsp      # restore stack pointer 
43        popq    %rbp            # restore base pointer 
44        ret                     # back to caller
Listing 12.9: Convert decimal text string to int (programmer assembly language).

And since this is a leaf function, the register used to pass the address of the text string (rdi) is simply used as the pointer variable rather than allocate a register save area in the stack frame:

27loop: 
28        movb    (%rdi), %sil     # load character 
29        cmpb    $0, %sil         # at end yet? 
30        je      done             # yes, all done 
31 
32        movl    $DECIMAL, %eax   # multiplier 
33        mull    accumulator(%rbp) # accumulator * 10 
34        movl    %eax, accumulator(%rbp) # update accumulator 
35 
36        andl    $NUMERALMASK, %esi  # char -> int 
37        addl    %esi, accumulator(%rbp) # add the digit 
38        incq    %rdi             # stringPtr++; 
39        jmp     loop             # and check for end

This is safe because no other functions are called within this loop. Of course, the programmer must be careful that the pointer variable (rdi) is not changed unintentionally.

The discussion of multiplication here highlights a problem with C — if an arithmetic operation produces a result that is too large to fit into the allocated data type, the most significant portion of the result is lost. It is important to realize that this loss of information can occur in intermediate results when executing even simple arithmetic expressions. It is therefore VERY IMPORTANT that you analyze each step of arithmetic operations with the range of values that is possible at that step in order to ensure that the result does not overflow the allocated data type. This is one more place where your knowledge of assembly language can help you to write better programs in C/C++.

12.4 Division

Division poses a different problem. In general, the quotient will not be larger than the dividend (except when attempting to divide by zero). Division is also complicated by the existence of a remainder. The divide instruction starts with a dividend that is twice as wide as the divisor. Both the quotient and remainder are the same width as the divisor. The unsigned division instruction is:

divs source

where s denotes the size of the operands:

s meaning number of bits
b byte 8
w word 16
l longword 32
q quadword 64

Intel®; Syntax

div source

The source operand specifies the divisor. It can be either a register or a memory location. Table 12.3 shows how to set up the registers with the dividend and where the quotient and remainder will be located after the unsigned division instruction, div, is executed. Notice that the quotient is the C ‘/’ operation, and the remainder is the ‘%’ operation.


Operand High-Order Low-Order
Size Dividend Dividend Quotient Remainder





8 bits ah al al ah
16 bits dx ax ax dx
32 bits edx eax eax edx
64 bits rdx rax rax rdx






Table 12.3: Register usage for the div instruction.

Notice that these instructions do not explicitly specify any operands, but they may change the rax and rdx registers. They do not affect the condition codes in the rflags register.

For example, let’s see how the computation 93 ÷ 19 = 4 with remainder 17 looks in 8-bit, 16-bit, and 32-bit values. First, note that:



1910
= 0001 00112


= 1316
(12.4)
and


9310
= 0101 11012


= 5d16
(12.5)
Now if we declare the constants:
   byteDivisor:  
           .byte   19  
   wordDivisor:  
           .word   19  
   longDivisor:  
           .long   19

These declarations cause the assembler to do the following:

First, consider 8-bit division. If eax contains the bit pattern 0x????005d, then

        divb   byteDivisor

changes eax such that it contains 0x????1104. Notice that ah had to be set to 0 before executing divb even though the dividend fits into one byte. That’s because the divb instruction starts with the ah:al pair as a 16-bit number. We also see that after executing the instruction, ax contains what appears to be a much larger number as a result of the division. Of course, we no longer consider ax, but al (the quotient) and ah (the remainder) as two separate numbers.

Next, consider 16-bit division. If eax contains 0x????005d and edx 0x????0000, then

        divw   wordDivisor

changes eax to contain 0x????0004 and edx to contain 0x????0011. You may wonder why the divw instruction does not start with the 32-bit dividend in eax. This is for backward compatibility — Intel processors prior to the 80386 had only 16-bit registers.

Finally, 32-bit division. If eax contains 0x0000005d and edx 0x00000000, then

        divl   longDivisor

changes eax to contain 0x00000004 and edx to contain 0x00000011. Again, we see that the entire edx register must be filled with zeros before executing the divl instruction, even though the dividend fits within two bytes.

One of the more common errors with division occurs when performing repeated division of a number. Since the first division places the remainder in the area occupied by the high-order portion of the dividend, you must remember to set that area to the appropriate value before dividing again.

The signed division instruction is:

idivs source

where s denotes the size of the operand:

s meaning number of bits
b byte 8
w word 16
l longword 32
q quadword 64

Intel®; Syntax

idiv source

Unlike the signed multiply instruction, signed divide only has one form, which is the same as unsigned divide. That is, the divisor is in the source operand, and the dividend is set up in the rax and rdx registers as shown in Table 12.3.

There is a nice set of instructions that set up the dividend in the rdx and rax registers, using the contents of the rax register, for signed division. These are shown in Table 12.4. The notation dx:ax, edx:eax, and rdx:rax, means that the high-order portion of the doubled, sign extended, value is placed in the respective portion of the rdx register.


AT&T syntax Intel®; syntax start result




cbtw cbw byte in al word in ax
cwtd cwd word in ax long in dx:ax
cltd cdq lonq in eax quad in edx:eax
cqto cqo quad in rax octuple in rdx:rax

Table 12.4: Instructions to set up the dividend register(s). These instructions do not specify any operands but may change the rax and rdx registers. The cbtw instruction is a repeat from Table 10.5.

We can see the difference between signed and unsigned division by dividing a 32-bit value by a 16-bit value. Given the declaration:

             .section .rodata 
    divisor: .word   100

and loading the 32-bit dividend 32768 into the dx:ax register pair (using hexadecimal):

        movw    $0, %dx 
        movw    $0x8000, %ax

produces the conditions

dx             0x0      0
ax             0x8000   -32768
just before the division. (Recall that gdb displays the signed decimal values.) When we divide 32768 by 100 we expect to get 327 with a remainder of 68. Indeed, unsigned division:
        divw    divisor

gives:

dx             0x44     68
ax             0x147    327
The quotient is in ax, and the remainder is in dx.

With signed integers we need to sign extend when setting up the dx:ax register pair:

        movw    $0x8000, %ax 
        cwtd

which sets up the registers:

dx             0xffff   -1
ax             0x8000   -32768

When we use the signed divide instruction, we are dividing -32767 (= 800016) by 100. We expect to get -327 with a remainder of -68. Indeed, signed division:

        idivw    divisor

yields:

dx             0xffbc -68
ax             0xfeb9 -327
Again, the quotient is in ax, and the remainder is in dx.

The “/” operation is used on line 34 in the intToUDec function shown in Listing 12.7, and the “%” operation is used on line 35.

 
1/* 
2 * intToUDec.c 
3 * 
4 * Converts an int to corresponding unsigned text 
5 * string representation. 
6 * 
7 * input: 
8 *    32-bit int 
9 *    pointer to at least 10-char array 
10 * output: 
11 *    null-terminated string in array 
12 * 
13 * Bob Plantz - 15 June 2009 
14 */ 
15 
16#include "intToUDec.h" 
17#define TOASCII 0x30 
18 
19void intToUDec(char *decString, int theInt) 
20{ 
21    unsigned int x = theInt; 
22    int base = 10; 
23    char reverseArray[10]; 
24    char digit; 
25    char *ptr = reverseArray; 
26 
27    *ptr = \0;     // start with NUL 
28    ptr++; 
29    if (x == 0)      // zero case 
30    { 
31        *ptr = 0; 
32        ptr++; 
33    } 
34    while (x > 0)    // create the string 
35    { 
36        digit = x % base; 
37        x = x / base; 
38        digit = TOASCII | digit; 
39        *ptr = digit; 
40        ptr++; 
41    } 
42    do               // reverse the string 
43    { 
44        ptr--; 
45        *decString = *ptr; 
46        decString++; 
47    } while (*ptr != \0); 
48}
Listing 12.10: Convert unsigned int to decimal text string (C).

The assembly language generated by gcc is shown in Listing 12.11 (with comments added).

 
1        .file  "intToUDec.c" 
2        .text 
3        .globlintToUDec 
4        .type  intToUDec, @function 
5intToUDec: 
6        pushq  %rbp 
7        movq  %rsp, %rbp 
8        movq  %rdi, -40(%rbp) 
9        movl  %esi, -44(%rbp) 
10        movl  -44(%rbp), %eax 
11        movl  %eax, -4(%rbp) 
12        movl  $10, -20(%rbp) 
13        leaq  -32(%rbp), %rax 
14        movq  %rax, -16(%rbp) 
15        movq  -16(%rbp), %rax 
16        movb  $0, (%rax) 
17        addq  $1, -16(%rbp) 
18        cmpl  $0, -4(%rbp) 
19        jne    .L3 
20        movq  -16(%rbp), %rax 
21        movb  $48, (%rax) 
22        addq  $1, -16(%rbp) 
23        jmp    .L3 
24.L4: 
25        movl  -20(%rbp), %ecx   # load base value (10) 
26        movl  -4(%rbp), %eax    # load the integer 
27        movl  $0, %edx          #   edx forms part of dividend 
28        divl  %ecx              # this is for % operation 
29        movl  %edx, %eax        # get remainder 
30        movb  %al, -21(%rbp)    # type cast to char and store 
31        movl  -20(%rbp), %edx   # load base value (10) 
32        movl  %edx, -48(%rbp)   #    and store it 
33        movl  -4(%rbp), %eax    # load the integer 
34        movl  $0, %edx          #   edx forms part of dividend 
35        divl  -48(%rbp)         # this is for / operation 
36        movl  %eax, -4(%rbp)    # x = x / base; 
37        orb    $48, -21(%rbp)    # digit = TOASCII | digit; 
38        movq  -16(%rbp), %rax   # load ptr for dereference 
39        movzbl-21(%rbp), %edx    # load digit 
40        movb  %dl, (%rax)       # *ptr = digit; 
41        addq  $1, -16(%rbp)     # ptr++; 
42.L3: 
43        cmpl  $0, -4(%rbp) 
44        jne    .L4 
45.L5: 
46        subq  $1, -16(%rbp) 
47        movq  -16(%rbp), %rax 
48        movzbl(%rax), %edx 
49        movq  -40(%rbp), %rax 
50        movb  %dl, (%rax) 
51        addq  $1, -40(%rbp) 
52        movq  -16(%rbp), %rax 
53        movzbl(%rax), %eax 
54        testb  %al, %al 
55        jne    .L5 
56        popq  %rbp 
57        ret 
58        .size  intToUDec, .-intToUDec 
59        .ident"GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0" 
60        .section      .note.GNU-stack,"",@progbits
Listing 12.11: Convert unsigned int to decimal text string (gcc assembly language).

Compare the code sequence from lines 25 – 30:

25        movl  -20(%rbp), %ecx   # load base value (10) 
26        movl  -4(%rbp), %eax    # load the integer 
27        movl  $0, %edx          #   edx forms part of dividend 
28        divl  %ecx              # this is for % operation 
29        movl  %edx, %eax        # get remainder 
30        movb  %al, -21(%rbp)    # type cast to char and store 
31        movb  %al, -9(%rbp)     # type cast to char and store

with that from lines 31 – 36:

31        movl  -20(%rbp), %edx   # load base value (10) 
32        movl  %edx, -48(%rbp)   #    and store it 
33        movl  -4(%rbp), %eax    # load the integer 
34        movl  $0, %edx          #   edx forms part of dividend 
35        divl  -48(%rbp)         # this is for / operation 
36        movl  %eax, -4(%rbp)    # x = x / base;

Even though the divl instruction produces both the quotient (“/” operation) and remainder (“%” operation), the compiler uses almost the same code sequence twice, once for each operation.

In Listing 12.12 the programmer has chosen to retrieve both the quotient and the remainder from one execution of the divl instruction.

 
1# intToUDec.s 
2# Converts unsigned int to corresponding unsigned decimal string 
3# Bob Plantz - 15 June 2009 
4# Bob Plantz - 10 November 2013 - use cltd for division 
5 
6# Calling sequence 
7#       esi <- value of the int 
8#       rdi <- address of place to store string 
9#       call intToUDec 
10# Useful constant 
11        .equ    asciiNumeral,0x30 
12# Stack frame 
13        .equ    reverseArray,-12 
14        .equ    localSize,-16 
15# Read only data 
16        .section .rodata 
17ten:    .long   10 
18# Code 
19        .text 
20        .globl  intToUDec 
21        .type   intToUDec, @function 
22intToUDec: 
23        pushq   %rbp            # save callers base ptr 
24        movq    %rsp, %rbp      # our stack frame 
25        addq    $localSize, %rsp         # local char array 
26 
27        movl    %esi, %eax      # eax used for division 
28        leaq    reverseArray(%rbp), %r8  # ptr to local array 
29        movb    $0, (%r8)       # store NUL 
30        incq    %r8             # increment pointer 
31 
32        cmpl    $0, %eax        # integer == 0? 
33        jne     stringLup       # no, start on the string 
34        movb    $0, (%r8)     # yes, this is the string 
35        incq    %r8             # for reverse copy 
36stringLup: 
37        cmpl    $0, %eax        # integer > 0? 
38        jbe     copyLup         # no, do reverse copy 
39        cvltd                   # yes, set up rdx reg. 
40        divl    ten             # divide by ten 
41        orb     $asciiNumeral, %dl # convert to ascii 
42        movb    %dl, (%r8)      # store character 
43        incq    %r8             # increment pointer 
44        jmp     stringLup       # check at top 
45 
46copyLup: 
47        decq    %r8             # decrement pointer 
48        movb    (%r8), %dl      # get char 
49        movb    %dl, (%rdi)     # store it 
50        incq    %rdi            # increment storage pointer 
51        cmpb    $0, %dl         # NUL character? 
52        jne     copyLup         # no, keep copying 
53 
54        movq    %rbp, %rsp      # delete local vars. 
55        popq    %rbp            # restore caller base ptr 
56        ret
Listing 12.12: Convert unsigned int to decimal text string (programmer assembly language).

On line 38 the high-order 32 bits of the quotient (edx register) are set to 0.

38        movl    $0, %edx        # yes, high-order = 0 
39        divl    ten             # divide by ten 
40        orb     $asciiNumeral, %dl # convert to ascii 
41        movb    %dl, (%r8)      # store character

The division on line 39 leaves “x / base” in the eax register for the next execution of the loop body. It also places “x % base” in the edx register. We know that this value is in the range 0 – 9 and thus fits entirely within the dl portion of the register. Lines 40 and 41 show how the value is converted to its ASCII equivalent and stored in the local char array.

As in the decToUInt function (Listing 12.9), since this is a leaf function, the register used to pass the address of the text string (rdi) is simply used as the pointer variable rather than allocate a register save area in the stack frame. Similarly, the eax register is used as the local “x” variable.

12.5 Negating Signed ints

For dealing with signed numbers, the x86-64 architecture provides an instruction that will perform the two’s complement operation. That is, this instruction will negate an integer that is stored in the two’s complement notation. The mnemonic for the instruction is neg.

negs source

where s denotes the size of the operand:

s meaning number of bits
b byte 8
w word 16
l longword 32
q quadword 64

Intel®; Syntax

neg source

neg performs a two’s complement operation on the value in the operand, which can be either a memory location or a register. Any of the addressing modes that we have covered can be used to specify a memory location.

12.6 Instructions Introduced Thus Far

This summary shows the assembly language instructions introduced thus far in the book. The page number where the instruction is explained in more detail, which may be in a subsequent chapter, is also given. This book provides only an introduction to the usage of each instruction. You need to consult the manuals ([2][6], [14][18]) in order to learn all the possible uses of the instructions.

12.6.1 Instructions

data movement:
opcode source destination action page





cbtw convert byte to word, al ax 699





cwtl convert word to long, ax eax 699





cltq convert long to quad, eax rax 699





cwtd convert word to long, ax dx:ax 788





cltd convert long to quad, eax edx:eax 788





cqto convert quad to octuple, rax rdx:rax 788





cmovcc %reg/mem %reg conditional move 709





movs $imm/%reg %reg/mem move 508





movs mem %reg move 508





movsss $imm/%reg %reg/mem move, sign extend 696





movzss $imm/%reg %reg/mem move, zero extend 696





popw %reg/mem pop from stack 568





pushw $imm/%reg/mem push onto stack 568










s = b, w, l, q; w = l, q; cc = condition codes

arithmetic/logic:
opcode source destination action page





adds $imm/%reg %reg/mem add 609





adds mem %reg add 609





ands $imm/%reg %reg/mem bit-wise and 750





ands mem %reg bit-wise and 750





cmps $imm/%reg %reg/mem compare 679





cmps mem %reg compare 679





decs %reg/mem decrement 702





divs %reg/mem unsigned divide 780





idivs %reg/mem signed divide 786





imuls %reg/mem signed multiply 778





incs %reg/mem increment 701





leaw mem %reg load effective address 581





muls %reg/mem unsigned multiply 772





negs %reg/mem negate 791





ors $imm/%reg %reg/mem bit-wise inclusive or 750





ors mem %reg bit-wise inclusive or 750





sals $imm/%cl %reg/mem shift arithmetic left 759





sars $imm/%cl %reg/mem shift arithmetic right 754





shls $imm/%cl %reg/mem shift left 759





shrs $imm/%cl %reg/mem shift right 754





subs $imm/%reg %reg/mem subtract 614





subs mem %reg subtract 614





tests $imm/%reg %reg/mem test bits 679





tests mem %reg test bits 679





xors $imm/%reg %reg/mem bit-wise exclusive or 750





xors mem %reg bit-wise exclusive or 750










s = b, w, l, q; w = l, q

program flow control:
opcode location action page




call label call function 548




ja label jump above (unsigned) 686




jae label jump above/equal (unsigned) 686




jb label jump below (unsigned) 686




jbe label jump below/equal (unsigned) 686




je label jump equal 682




jg label jump greater than (signed) 689




jge label jump greater than/equal (signed) 689




jl label jump less than (signed) 689




jle label jump less than/equal (signed) 689




jmp label jump 694




jne label jump not equal 682




jno label jump no overflow 682




jcc label jump on condition codes 682




leave undo stack frame 582




ret return from function 585




syscall call kernel function 589








cc = condition codes

12.6.2 Addressing Modes

__________________________________________________________

register direct:

The data value is located in a CPU register.

syntax: name of the register with a “%” prefix.

example: movl    %eax, %ebx



immediate data:

The data value is located immediately after the instruction. Source operand only.

syntax: data value with a “$” prefix.

example: movl    $0xabcd1234, %ebx



base register plus offset:

The data value is located in memory. The address of the memory location is the sum of a value in a base register plus an offset value.

syntax: use the name of the register with parentheses around the name and the offset value immediately before the left parenthesis.

example: movl    $0xaabbccdd, 12(%eax)



rip-relative:

The target is a memory address determined by adding an offset to the current address in the rip register.

syntax: a programmer-defined label

example: je     somePlace



12.7 Exercises

12-1

12.2) Write a program in assembly language that

a)

prompts the user to enter a number in binary,

b)

reads the user input into a char array,

c)

converts the string of characters in the char array into a decimal integer stored in a local int variable, and

d)

calls printf to display the int.

Your program should use the writeStr function from Exercise 11-3 to display the user prompt. And it should use the readStr function from Exercise 11-4 or 11-6 to read the user’s input.

Your program should read the user’s input into the local char array, then perform the conversion using the stored characters. Do not do the conversion as the characters are entered by the user.

Your program does not need to check for user errors. You can assume that the user will enter only ones and zeros. And you can assume that the user will not enter more than 32 bits. (Be careful when you test your program.)

12-2

12.2) Write a program in assembly language that allows the user to enter a decimal integer then displays it in binary.

Your program should convert the decimal integer into the corresponding C-style text string of ones and zeros, then use the writeStr function from Exercise 11-3 to display the text string.

This program will require some careful planning in order to get the bits to print in the correct order.

12-3

12.3) Write a function, mul16, in assembly language that takes two 16-bit integers as arguments and returns the 32-bit product of the argument. Write a main driver function to test mul16. You may use printf and scanf in the main function for the user interface.

Hint: Notice that most of the numbers in this problem are 16-bit unsigned integers. Read the man pages for printf and scanf. In particular, the ”u” flag character is used to indicate a short (16-bit) int.

12-4

12.4) Write a function, div32, in assembly language that implements the C / operation. The function takes two 32-bit integers as arguments and returns the 32-bit quotient of the first argument divided by the second. Write a main driver function to test div32. You may use printf and scanf in the main function for the user interface.

12-5

12.4) Write a function, mod32, in assembly language that implements the C % operation. The function takes two 32-bit integers as arguments and returns the 32-bit quotient of the first argument divided by the second. Write a main driver function to test mod32. You may use printf and scanf in the main function for the user interface.

12-6

12.4) Write a function in assembly language, decimal2uint, that takes two arguments: a pointer to a char, and a pointer to an unsigned int.

      int decimal2uint(char *, unsigned int *);

The function assumes that the first argument points to a C-style text string that contains only numeric characters representing an unsigned decimal integer. It computes the binary value of the integer and stores the result at the location where the second argument points. It returns zero.

Write a program that demonstrates the correctness of decimal2uint. Your program will allocate a char array, call readStr (from Exercise 11-4 or 11-5) to get a decimal integer from the user, and call decimal2uint to convert the text string to binary format. Then it adds an integer to the user’s input integer and uses printf to display the result.

Hint: Start with the program from Exercise 12-1. Rewrite it so that the conversion from the text string to the binary number is performed by a function. Then modify the function so that it performs a decimal conversion instead of binary.

12-7

12.4) Write a function in assembly language, uint2dec, that takes two arguments: a pointer to a char, and an unsigned int.

      int uint2dec(char *, unsigned int);

The function assumes that the first argument points to a char array that is large enough to hold a text string that represents the largest possible 32-bit unsigned integer in decimal. It computes the characters that represent the integer (the second argument) and stores this representation as a C-style text string where the first argument points. It returns zero.

Write an assembly language program that demonstrates the correctness of uint2dec. Your program will allocate one char array, call readStr (from Exercise 11-4) or 11-6 to get a decimal integer from the user, and call decimal2uint (from Exercise 12-6) to convert the text string to binary format. It should add a constant integer to this converted value. Then it calls uint2dec to convert the sum to its corresponding text string, storing the string in the char array.

Hint: Start with the program from Exercise 10-6. Rewrite it so that the conversion from the binary number to the text string is performed by a function. Then modify the function so that it performs a decimal conversion instead of binary.

12-8

12.3) Modify the program in Exercise 12-7 so that it deals with signed ints. Hint: Write the function decimal2sint, which will call decimal2uint, and write the function sint2dec, which will call uint2dec.