12 Bit Operations; Multiplication and Division

Chapter 12
Bit Operations; Multiplication and Division

We saw in Section 3.5 (page 118) that input read from the keyboard and output written on the screen is in the ASCII code and that integers are stored in the binary number system. So if a program reads user input as, say, 123₁₀, that input is read as the characters ’1’, ’2’, and ’3’’, but the value used in the program is represented by the bit pattern 0000007b₁₆.¹ In this chapter, we return to the conversion algorithms between these two storage codes and look at the assembly language that is involved.

12.1 Logical Operators

Two numeric operators, addition and subtraction, were introduced in Section 9.2 (page 607). Many data items are better thought of as bit patterns rather than numerical entities. For example, study Table 2.3 on page 49 and see if you can determine which bit determines the case (upper/lower) of the alphabetic characters.

In order to manipulate individual character codes in a text string, we introduce the bit-wise logical operators in this section. The logical operations are shown in the truth tables in Figure 3.4 (page 134). The instructions available to us to perform these three operations are:

		ands	source, destination
		ors	source, destination
		xors	source, destination

where s denotes the size of the operand:

s	meaning	number of bits
b	byte	8
w	word	16
l	longword	32
q	quadword	64

	and	destination, source
Intel® Syntax	or	destination, source
	xor	destination, source

For example, the instruction

andl %eax, %edx

performs an and operation between each of the respective 32 bits in the eax register with the 32 bits in the edx register, leaving the result in the edx register. The instruction

andb %dh, %ah

performs an and operation between each of the respective 8 bits in the dh register with the 8 bits in the ah register, leaving the result in the ah register.

The addressing modes available for the arithmetic operators, add and sub, are also available for the logical operators. For example, if eax contains the bit pattern 0x89abcdef, the instruction

andb $0xee, %al

would change eax to contain 0x89abcdee. If we follow this with the instruction

orl $0x11111111, %eax

the bit pattern in eax becomes 0x99bbddff. Finally, if we then use

xorw $0x1111, %ax

we end up with 0x99bbccee in eax.

The program in Listing 12.1 shows the use of the C bit-wise logical operators “&” and “|” to change the case of alphabetic characters.

1/*
2 * upperLower.c
3 * Converts alphabetic characters to all upper case
4 * and all lower case.
5 * Bob Plantz - 14 June 2009
6 */
7
8#include "writeStr.h"
9#include "readLn.h"
10#include "toUpper.h"
11#include "toLower.h"
12#define MAX 50
13int main()
14{
15    char stringOrig[MAX];
16    char stringWork[MAX];
17
18    writeStr("Enter some alphabetic characters: ");
19    readLn(stringOrig, MAX);
20
21    writeStr("All upper: ");
22    toUpper(stringOrig, stringWork);
23    writeStr(stringWork);
24    writeStr("\n");
25
26    writeStr("All lower: ");
27    toLower(stringOrig, stringWork);
28    writeStr(stringWork);
29    writeStr("\n");
30
31    writeStr("Original: ");
32    writeStr(stringOrig);
33    writeStr("\n");
34
35    return 0;
36}

1/*
2 * toUpper.h
3 * Converts letters in a C string to upper case.
4 * Bob Plantz - 14 June 2009
5 */
6
7#ifndef TOUPPER_H
8#define TOUPPER_H
9int toUpper(char *, char *);
10#endif

1/*
2 * toUpper.c
3 * Converts alphabetic letters in a C string to upper case.
4 * Bob Plantz - 14 June 2009
5 */
6
7#include "toUpper.h"
8#define UPMASK 0xdf
9
10int toUpper(char *srcPtr, char *destPtr)
11{
12    int count = 0;
13    while (*srcPtr != ’\0’)
14    {
15        *destPtr = *srcPtr & UPMASK;
16        srcPtr++;
17        destPtr++;
18        count++;
19    }
20    *destPtr = ’\0’;  // terminate string
21    return count;
22}

1/*
2 * toLower.h
3 * Converts letters in a C string to lower case.
4 * Bob Plantz - 14 June 2009
5 */
6
7#ifndef TOLOWER_H
8#define TOLOWER_H
9int toLower(char *, char *);
10#endif

1/*
2 * toLower.c
3 * Converts letters in a C string to lower case.
4 * Bob Plantz - 14 June 2009
5 */
6
7#include "toLower.h"
8#define LOWMASK 0x20
9
10int toLower(char *srcPtr, char *destPtr)
11{
12    int count = 0;
13    while (*srcPtr != ’\0’)
14    {
15        *destPtr = *srcPtr | LOWMASK;
16        srcPtr++;
17        destPtr++;
18        count++;
19    }
20    *destPtr = ’\0’;  // terminate string
21    return count;
22}

Listing 12.1: Convert letters to upper/lower case (C). The functions writeStr and readLn are not shown here; see Exercises 11-3 and 11-4 for the assembly language versions. (There are three ﬁles here.)

The program assumes that the user enters all alphabetic characters without making mistakes. Of course, the conversions could be accomplished with addition and subtraction, but in this application the bit-wise logical operators are more natural.

In Listing 12.2 we show only the gcc-generated assembly language for the main and toUpper functions.

1        .file  "upperLower.c"
2        .section      .rodata
3        .align 8
4.LC0:
5        .string"Enter some alphabetic characters: "
6.LC1:
7        .string"All upper: "
8.LC2:
9        .string"\n"
10.LC3:
11        .string"All lower: "
12.LC4:
13        .string"Original: "
14        .text
15        .globlmain
16        .type  main, @function
17main:
18        pushq  %rbp
19        movq  %rsp, %rbp
20        addq  $-128, %rsp
21        movq  %fs:40, %rax      # load guard value
22        movq  %rax, -8(%rbp)    # store at end of stack
23        xorl  %eax, %eax        # clear rax
24        movl  $.LC0, %edi
25        call  writeStr
26        leaq  -128(%rbp), %rax
27        movl  $50, %esi
28        movq  %rax, %rdi
29        call  readLn
30        movl  $.LC1, %edi
31        call  writeStr
32        leaq  -64(%rbp), %rdx
33        leaq  -128(%rbp), %rax
34        movq  %rdx, %rsi
35        movq  %rax, %rdi
36        call  toUpper
37        leaq  -64(%rbp), %rax
38        movq  %rax, %rdi
39        call  writeStr
40        movl  $.LC2, %edi
41        call  writeStr
42        movl  $.LC3, %edi
43        call  writeStr
44        leaq  -64(%rbp), %rdx
45        leaq  -128(%rbp), %rax
46        movq  %rdx, %rsi
47        movq  %rax, %rdi
48        call  toLower
49        leaq  -64(%rbp), %rax
50        movq  %rax, %rdi
51        call  writeStr
52        movl  $.LC2, %edi
53        call  writeStr
54        movl  $.LC4, %edi
55        call  writeStr
56        leaq  -128(%rbp), %rax
57        movq  %rax, %rdi
58        call  writeStr
59        movl  $.LC2, %edi
60        call  writeStr
61        movl  $0, %eax
62        movq  -8(%rbp), %rdx
63        xorq  %fs:40, %rdx
64        je    .L3
65        call  __stack_chk_fail  # check for stack overflow
66.L3:
67        leave
68        ret
69        .size  main, .-main
70        .ident"GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0"
71        .section      .note.GNU-stack,"",@progbits

1        .file  "toUpper.c"
2        .text
3        .globltoUpper
4        .type  toUpper, @function
5toUpper:
6        pushq  %rbp
7        movq  %rsp, %rbp
8        movq  %rdi, -24(%rbp)    # save srcPtr
9        movq  %rsi, -32(%rbp)    # save destPtr
10        movl  $0, -4(%rbp)
11        jmp    .L2
12.L3:
13        movq  -24(%rbp), %rax    # srcPtr
14        movzbl(%rax), %eax    # load char there
15        movl  %eax, %edx
16        andl  $-33, %edx         # make upper case
17        movq  -32(%rbp), %rax    # destPtr
18        movb  %dl, (%rax)        # store char there
19        addq  $1, -24(%rbp)      # srcPtr++;
20        addq  $1, -32(%rbp)      # destPtr++;
21        addl  $1, -4(%rbp)       # count++;
22.L2:
23        movq  -24(%rbp), %rax
24        movzbl(%rax), %eax
25        testb  %al, %al           # NUL character?
26        jne    .L3                # no, keep going
27        movq  -32(%rbp), %rax    # yes, load destPtr
28        movb  $0, (%rax)         # and store NUL there
29        movl  -4(%rbp), %eax     # return count;
30        popq  %rbp
31        ret
32        .size  toUpper, .-toUpper
33        .ident"GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0"
34        .section      .note.GNU-stack,"",@progbits

Listing 12.2: Convert letters to upper/lower case (gcc assembly language). Only two of the functions in Listing 12.1 are shown. (There are two ﬁles here.)

The toLower function is similar to the toUpper, and the writeStr and readLn functions were covered in the exercises in Chapter 11.

Most of the code in Listing 12.2 should be familiar from previous chapters. Note that the C code speciﬁes char arrays in the main function that are 50 elements long (lines 13 and 14). But the compiler generates assembly language that allocates 64 bytes for each array:

17main:
18        pushq  %rbp
19        movq  %rsp, %rbp
20        addq  $-128, %rsp

and:

32        leaq  -64(%rbp), %rdx
33        leaq  -128(%rbp), %rax
34        movq  %rdx, %rsi
35        movq  %rax, %rdi
36        call  toUpper

Recall that this 16-byte address alignment is speciﬁed by the ABI [25].

The code sequence on lines 21 – 23 in main:

21        movq  %fs:40, %rax     # load guard value
22        movq  %rax, -8(%rbp)   # store at end of stack
23        xorl  %eax, %eax       # clear rax

is new to you. This code sequence stores a value supplied by the operating system near the end of the stack. The purpose is described in the gcc man page entry for the -fstack-protector option:

Emit extra code to check for buffer overflows, such as stack smashing attacks. This is done by adding a guard variable to functions with vulnerable objects. This includes functions that call alloca, and functions with buffers larger than 8 bytes. The guards are initialized when a function is entered and then checked when the function exits. If a guard check fails, an error message is printed and the program exits.

The value stored there is checked at the end of the function, on lines 54 – 58:

62        movq  -8(%rbp), %rdx
63        xorq  %fs:40, %rdx
64        je    .L3
65        call  __stack_chk_fail  # check for stack overflow
66.L3:

If the value has been overwritten, the __stack_chk_fail function is called, which notiﬁes the user about the problem.

Your version of gcc may be compiled without this option as the default. It can be turned oﬀ with the -fno-stack-protector option. Since the assembly language we are writing in this book is not “industrial strength,” we will not include this stack protection code.

In the toUpper function, the compiler-generated assembly language ﬁrst loads the address stored in the srcPtr variable into a register so it can dereference the pointer.

13 movq -24(%rbp), %rax # srcPtr

It then moves the byte at that address into the register, using the movzbl instruction to zero out the remaining 24 bits of the register. (Recall that changing the low-order 32 bits of a register also zeros out the high-order 32 bits.)

14 movzbl(%rax), %eax # load char there

Next it moves the byte into another working register and then performs the bit-wise and operation with the bit pattern ffffffdf (= −33₁₀), leaving the result in the edx register. This and operation leaves all the bits in the edx register as they were, except the sixth bit is set to zero. The sixth bit in the ASCII code determines whether a letter is upper or lower case.

15 movl %eax, %edx
16 andl $-33, %edx # make upper case

Regardless of whether the letter was upper or lower case, it is now upper case. The letter is stored in the low-order eight bits of the edx register, the dl register. So the program loads the address stored in the destPtr variable into a register so it can dereference it and store the character there.

17 movq -32(%rbp), %rax # destPtr
18 movb %dl, (%rax) # store char there

We will now consider the version of this program written in assembly language (Listing 12.3).

1# upperLower.s
2# Converts alphabetic characters to all upper case
3# and all lower case.
4# Bob Plantz - 14 June 2009
5
6# Constant
7        .equ    MAX,50
8# Local variable names
9        .equ    stringOrig,-64            # original char array
10        .equ    stringWork,-128  # working char array
11        .equ    localSize,-128
12# Read only data
13        .section  .rodata
14prompt:
15        .string "Enter some alphabetic characters: "
16upMsg:
17        .string "All upper: "
18lowMsg:
19        .string "All lower: "
20origMsg:
21        .string "Original: "
22endl:
23        .string "\n"
24# Code
25        .text
26        .globl  main
27        .type   main, @function
28main:
29        pushq   %rbp             # save base pointer
30        movq    %rsp, %rbp       # base pointer = current top of stack
31        addq    $localSize, %rsp # allocate local var. space
32
33        movq    $prompt, %rdi # point to prompt message
34        call    writeStr
35
36        leaq    stringOrig(%rbp), %rdi  # place to store string
37        movl    $MAX, %esi              # max number of char
38        call    readLn
39
40        leaq    stringOrig(%rbp), %rdi  # original string stored here
41        leaq    stringWork(%rbp), %rsi  # modified string goes here
42        call    toLower
43        movq    $lowMsg, %rdi
44        call    writeStr
45        leaq    stringWork(%rbp), %rdi  # show modified string
46        call    writeStr
47        movq    $endl, %rdi
48        call    writeStr
49
50        leaq    stringWork(%rbp), %rdi  # original string stored here
51        leaq    stringWork(%rbp), %rsi  # modified string goes here
52        call    toUpper
53        movq    $upMsg, %rdi
54        call    writeStr
55        leaq    stringWork(%rbp), %rdi  # show modified string
56        call    writeStr
57        movq    $endl, %rdi
58        call    writeStr
59
60        movq    $origMsg, %rdi
61        call    writeStr
62        leaq    stringOrig(%rbp), %rdi  # original string stored here
63        call    writeStr
64        movq    $endl, %rdi
65        call    writeStr
66done:
67        movl    $0, %eax        # return 0;
68        movq    %rbp, %rsp      # remove local variables
69        popq    %rbp            # restore caller base pointer
70        ret                     # back to OS

1# toUpper.s
2# Converts alpha characters to upper case.
3# Bob Plantz - 14 June 2009
4
5# Calling sequence:
6#       rdi <- address of string to be converted
7#       rsi <- address to store result string
8#       call    toUpper
9# returns number of characters written
10# If rdi and rsi have the same address, original string
11# is overwritten.
12
13# Useful constant
14        .equ    UPMASK,0xdf
15# Stack frame, showing local variables and arguments
16        .equ    destPtr,-32
17        .equ    srcPtr,-24
18        .equ    count,-4
19        .equ    localSize,-32
20# Code
21        .text
22        .globl  toUpper
23        .type   toUpper, @function
24toUpper:
25        pushq   %rbp             # save frame pointer
26        movq    %rsp, %rbp       # new frame pointer
27        addq    $localSize, %rsp # local vars. and arg.
28
29        movq    %rdi, srcPtr(%rbp)   # source pointer
30        movq    %rsi, destPtr(%rbp)  # destination pointer
31        movl    $0, count(%rbp)  # count = 0;
32upLoop:
33        movq    srcPtr(%rbp), %rax # source pointer
34        movb    (%rax), %al      # get current char
35        cmpb    $0, %al          # at end yet?
36        je      done             # yes, all done
37
38        andb    $UPMASK, %al     # in range, convert
39        movq    destPtr(%rbp), %r8  # destination pointer
40        movb    %al, (%r8)       # store character
41
42        incl    count(%rbp)      # count++;
43        incl    srcPtr(%rbp)     # srcPtr++;
44        incl    destPtr(%rbp)    # destPtr++;
45        jmp     upLoop           # and check for end
46done:
47        movq    destPtr(%rbp), %r8  # destination pointer
48        movb    $0, (%r8)       # store NUL
49        movl    count(%rbp), %eax    # return count
50        movq    %rbp, %rsp      # restore stack pointer
51        popq    %rbp            # restore frame pointer
52        ret                     # back to caller

1# toLower.s
2# Converts alpha characters to lower case.
3# Bob Plantz - 14 June 2009
4
5# Calling sequence:
6#       rdi <- address of string to be converted
7#       rsi <- address to store result string
8#       call    toLower
9# returns number of characters written
10# If rdi and rsi have the same address, original string
11# is overwritten.
12
13# Useful constant
14        .equ    LOWMASK,0x20
15# Stack frame, showing local variables and arguments
16        .equ    destPtr,-32
17        .equ    srcPtr,-24
18        .equ    count,-4
19        .equ    localSize,-32
20# Code
21        .text
22        .globl  toLower
23        .type   toLower, @function
24toLower:
25        pushq   %rbp             # save frame pointer
26        movq    %rsp, %rbp       # new frame pointer
27        addq    $localSize, %rsp # local vars. and arg.
28
29        movq    %rdi, srcPtr(%rbp)   # source pointer
30        movq    %rsi, destPtr(%rbp)  # destination pointer
31        movl    $0, count(%rbp)  # count = 0;
32lowLoop:
33        movq    srcPtr(%rbp), %rax # source pointer
34        movb    (%rax), %al      # get current char
35        cmpb    $0, %al          # at end yet?
36        je      done             # yes, all done
37
38        orb     $LOWMASK, %al    # in range, convert
39        movq    destPtr(%rbp), %r8  # destination pointer
40        movb    %al, (%r8)       # store character
41
42        incl    count(%rbp)      # count++;
43        incl    srcPtr(%rbp)     # srcPtr++;
44        incl    destPtr(%rbp)    # destPtr++;
45        jmp     lowLoop          # and check for end
46done:
47        movq    destPtr(%rbp), %r8  # destination pointer
48        movb    $0, (%r8)       # store NUL
49        movl    count(%rbp), %eax    # return count
50        movq    %rbp, %rsp      # restore stack pointer
51        popq    %rbp            # restore frame pointer
52        ret                     # back to caller

Listing 12.3: Convert letters to upper/lower case (programmer assembly language). See Exercises 11.3 and 11.4 for the functions writeStr and readLn. (There are three ﬁles here.)

Again, we will describe on the toUpper function. Writing directly in assembly language, we also need to get the address in srcPtr so we can dereference it. But in copying the character stored there, we simply ignore the remaining 56 bits of the rax register. Notice that the movb instruction ﬁrst uses the full 64-bit address in the rax register to fetch the byte stored there, and it then can write over the low-order 8 bits of the same register. (This, of course, “destroys” the address.)

33 movq srcPtr(%rbp), %rax # source pointer
34 movb (%rax), %al # get current char

Since we are ignoring the high-order 56 bits of the rax register, we must be consistent when operating on the data in the low-order 8 bits. So we use the andb instruction to operate only on the al portion of the rax register.

38 andb $UPMASK, %al # in range, convert

Storing the ﬁnal result is the same, except we are using diﬀerent registers.

39 movq destPtr(%rbp), %r8 # destination pointer
40 movb %al, (%r8) # store character

Both ways of implementing this algorithm are correct, and there is probably no signiﬁcant eﬃciency diﬀerence. However, comparing the two shows the importance of maintaining consistency in data sizes. You do not need to zero out unused portions of registers, but you should also never assume that they are zero.

12.2 Shifting Bits

It is sometimes useful to be able to shift all the bits to the left or right. Since the relative position of a bit in an integer has signiﬁcance, shifting all the bits to the left one position eﬀectively multiplies the value by two. And shifting them one position to the right eﬀectively divides the value by two. As you will see in Sections 12.3 and 12.4, the multiplication and division instructions are complicated. They also take a great deal of processor time. Using left/right shifts to eﬀect multiplication/division by powers of two is very eﬃcient.

There are two instructions for shifting bits to the right — shift right and shift arithmetic right:

		shrs	source, destination
		sars	source, destination

where s denotes the size of the operand:

s	meaning	number of bits
b	byte	8
w	word	16
l	longword	32
q	quadword	64

	shr	destination, source
Intel® Syntax	sar	destination, source

The source operand can be either an immediate value, or the value can be located in the cl register. If it is an immediate value, it can be up to 63₁₀. The destination operand can be either a memory location or a register. Any of the addressing modes that we have covered can be used to specify a memory location.

The action of the shr instruction is to shift all the bits in the destination operand to the right by the number of bit positions speciﬁed by the source operand. The “vacated” bit positions at the high-order end of the destination operand are ﬁlled with zeros. The last bit to be shifted out of the low-order bit position is copied into the carry ﬂag (CF). For example, if the eax register contained the bit pattern aabb 2233, then the instruction

shrw $1, %ax

would produce

eax: aabb 1119

and the CF would be one. With the same initial conditions, the instruction

shrl $4, %eax

would produce

eax: 0aab b223

and the CF would be zero.

The action of the sar instruction is to shift all the bits in the destination operand to the right by the number of bit positions speciﬁed by the source operand. The “vacated” bit positions at the high-order end of the destination operand are ﬁlled with the same value that was originally in the highest-order bit. The last bit to be shifted out of the low order bit position is copied into the carry ﬂag (CF). For example, if the eax register contained the bit pattern aabb 2233, then the instruction

sarw $1, %ax

would produce

eax: aabb 1119

and the CF would be one. With the same initial conditions, the instruction

sarl $4, %eax

would produce

eax: faab b223

and the CF would be zero.

Thus the diﬀerence between “shift right” and “shift arithmetic right” is that the arithmetic shift preserves the sign of the value – as though it represents an integer stored in two’s complement code.

There are two instructions for shifting bits to the left — shift left and shift arithmetic left:

		shls	source, destination
		sals	source, destination

where s denotes the size of the operand:

s	meaning	number of bits
b	byte	8
w	word	16
l	longword	32
q	quadword	64

	shl	destination, source
Intel® Syntax	sal	destination, source

The source operand can be either an immediate value, or the value can be located in the cl register. If it is an immediate value, it can be up to 31₁₀. The destination operand can be either a memory location or a register. Any of the addressing modes that we have covered can be used to specify a memory location.

The action of both the shl and sal instructions is to shift all the bits in the destination operand to the left by the number of bit positions speciﬁed by the source operand. In fact, these are really two diﬀerent assembly language mnemonics for the same machine code. The “vacated” bit positions at the low-order end of the destination operand are ﬁlled with zeros. The last bit to be shifted out of the highest-order bit position is copied into the carry ﬂag (CF). For example, if the eax register contained the bit pattern bbaa 2233, then the instruction

shlw $1, %ax

would produce

eax: bbaa 4466

and the CF would be zero. With the same initial conditions, the instruction

shll $4, %eax

would produce

eax: baa2 2330

and the CF would be one.

We see how shifts can be used in the hexToInt function shown in Listing 12.4.

1/*
2 * readHex.c
3 * Gets hex number from user and stores it as int.
4 * Bob Plantz - 14 June 2009
5 */
6#include <stdio.h>
7#include "writeStr.h"
8#include "readLn.h"
9#include "toLower.h"
10#include "hexToInt.h"
11#define MAX 20
12int main()
13{
14    char theString[MAX];
15    long int theInt;
16
17    writeStr("Enter up to 16 hex characters: ");
18    readLn(theString, MAX);
19
20    toLower(theString, theString);
21    theInt = hexToInt(theString);
22    printf("%lx = %li\n", theInt, theInt);
23    return 0;
24}

1/*
2 * hexToInt.h
3 * Converts hex character string to int.
4 * Bob Plantz - 8 April 2008
5 */
6
7#ifndef HEXTOINT_H
8#define HEXTOINT_H
9long int hexToInt(char *);
10#endif

1/*
2 * hexToInt.c
3 * Converts hex character string to int.
4 * Assumes A - F in upper case.
5 * Bob Plantz - 14 June 2009
6 */
7
8#include "hexToInt.h"
9#define NUMERAL 0x30
10#define ALPHA 0x57
11
12long int hexToInt(char *stringPtr)
13{
14    long int accumulator = 0;
15    char current;
16
17    current = *stringPtr;
18    while (current != ’\0’)
19    {
20        accumulator = accumulator << 4;
21        if (current <= ’9’)  // only works for 0-9,A-F
22            current -= NUMERAL;
23        else
24            current -= ALPHA;
25        accumulator += (long int)current;
26        stringPtr++;
27        current = *stringPtr;
28    }
29    return accumulator;
30}

Listing 12.4: Shifting bits (C). (There are three ﬁles here.)

Notice that “«” (on line 20 in the hexToInt function) is the left shift operator and “»” is the right shift operator in C/C++. In C++ these operators are overloaded to provide ﬁle output and input.

The code in the main function is familiar. The compiler-generated assembly language for hexToInt is shown in Listing 12.5 with comments added.

1        .file  "hexToInt.c"
2        .text
3        .globlhexToInt
4        .type  hexToInt, @function
5hexToInt:
6        pushq  %rbp
7        movq  %rsp, %rbp
8        movq  %rdi, -24(%rbp)
9        movq  $0, -8(%rbp)
10        movq  -24(%rbp), %rax
11        movzbl(%rax), %eax
12        movb  %al, -9(%rbp)
13        jmp    .L2             # jump to bottom of loop
14.L5:
15        salq  $4, -8(%rbp)    # accumulator = accumulator << 4;
16        cmpb  $57, -9(%rbp)   # if (current > ’9’)
17        jg    .L3             #    jump to "else" part
18        movzbl-9(%rbp), %eax  # load current character
19        subl  $48, %eax       #    convert numeral char to int
20        movb  %al, -9(%rbp)   #    and update current
21        jmp    .L4             # jump around the "else" part
22.L3:                            # "else" part
23        movzbl-9(%rbp), %eax  # load current character
24        subl  $87, %eax       #    convert (hex) letter char to int
25        movb  %al, -9(%rbp)   #    and update current
26.L4:
27        movsbq-9(%rbp), %rax  # type-cast char to a long int
28        addq  %rax, -8(%rbp)  # add it to accumulator
29        addq  $1, -24(%rbp)   # stringPtr++;
30        movq  -24(%rbp), %rax
31        movzbl(%rax), %eax
32        movb  %al, -9(%rbp)   # current = *stringPtr;
33.L2:
34        cmpb  $0, -9(%rbp)    # while (current != ’\0’)
35        jne    .L5             #    go to top of loop
36        movq  -8(%rbp), %rax  # 64-bit return value
37        popq  %rbp
38        ret
39        .size  hexToInt, .-hexToInt
40        .ident"GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0"
41        .section      .note.GNU-stack,"",@progbits

Listing 12.5: Shifting bits (gcc assembly language).

As usual, gcc has converted the while loop in hexToInt to a do-while loop, which is entered at the bottom. Most of this code has been covered previously. The instruction

15 salq $4, -16(%rbp) # accumulator = accumulator << 4;

shifts the 64 bits that make up the value of the variable accumulator four bits to the left. Make sure that you understand the four high-order bits in this group of 64 are lost. That is, the shift does not carry on to other memory areas beyond these 64 bits. As stated above, the last bit to get shifted out of these 64 bits is copied to the CF.

The conversions from characters to integers

18        movzbl-9(%rbp), %eax  # load current character
19        subl  $48, %eax       #    convert numeral char to int
20        movb  %al, -9(%rbp)   #    and update current

and

23        movzbl-9(%rbp), %eax  # load current character
24        subl  $87, %eax       #    convert (hex) letter char to int
25        movb  %al, -9(%rbp)   #    and update current

start by moving a one-byte character into an int-sized register with the high-order 24 bits zeroed. The actual conversion consists of subtracting oﬀ the “character part” as an integer arithmetic operation. Then the result, which is guaranteed to ﬁt within a byte, is stored back in the single byte allocated for the original character.

Actually, we can easily see that the result of this conversion operation is a four-bit value in the range 0000₂ – 1111₂. The four-bit left shift of the variable accumulator has left space for inserting these four bits. The bit insertion operation consists of ﬁrst type casting the four-bit integer to a 64-bit integer as we load it from the variable:

27 movsbq-9(%rbp),%rax # type-cast char to a long int

then adding this 64-bit integer to the variable accumulator:

28 addq %rax, -8(%rbp) # add it to accumulator

We also note that although the standard return value is 32-bits in the eax register, declaring a long int (64-bit) return value causes the compiler to use the entire rax register:

36 movq -8(%rbp), %rax # 64-bit return value

Listing 12.6 shows a version of the hexToInt function written in assembly language.

1# hexToInt_a.s
2# Converts hex characters to a 64-bit int.
3# Bob Plantz - 14 June 2009
4
5# Calling sequence:
6#       rdi <- address of hex string to be converted
7#       call    string2Hex
8# returns 64-bit int represented by the hex string
9
10# Useful constants
11        .equ    NUMERAL,0x30
12        .equ    ALPHA,0x57
13        .equ    HEXBITS,4
14        .equ    TYPEMASK,0xf
15# Stack frame, showing local variables and arguments
16        .equ    accumulator,-16
17        .equ    current,-1
18        .equ    localSize,-32
19# Code
20        .text
21        .globl  hexToInt
22        .type   hexToInt, @function
23hexToInt:
24        pushq   %rbp             # save frame pointer
25        movq    %rsp, %rbp       # new frame pointer
26        addq    $localSize, %rsp # local vars. and arg.
27
28        movq    $0, accumulator(%rbp)    # accumulator = 0;
29loop:
30        movb    (%rdi), %al      # load character
31        cmpb    $0, %al          # at end yet?
32        je      done             # yes, all done
33
34        salq    $HEXBITS, accumulator(%rbp) # accumulator = accumulator << 4;
35
36        cmpb    $’9’, %al        # is it numeral?
37        ja      isAlpha          # no, so it’s alpha
38        subb    $NUMERAL, %al    # convert to int
39        jmp     addIn            # and add to accumulator
40isAlpha:
41        subb    $ALPHA, %al      # convert to int
42addIn:
43        andq    $TYPEMASK, %rax  # 4 bits -> 64 bits
44        addq    %rax, accumulator(%rbp) # insert the 4 bits
45        incq    %rdi             # stringPtr++;
46        jmp     loop             # and check for end
47done:
48        movq    accumulator(%rbp), %rax  # return accumulator;
49        movq    %rbp, %rsp      # restore stack pointer
50        popq    %rbp            # restore frame pointer
51        ret                     # back to caller

Listing 12.6: Shifting bits (programmer assembly language).

It diﬀers from the C version in several ways. First, since this is a leaf function, we do not save the argument in the stack frame. Instead, we simply use the register as the stringPtr variable:

29loop:
30        movb    (%rdi), %al      # load character
31        cmpb    $0, %al          # at end yet?
32        je      done             # yes, all done

We do, however, explicitly allocate stack space for the local variable:

23hexToInt:
24        pushq   %rbp             # save frame pointer
25        movq    %rsp, %rbp       # new frame pointer
26        addq    $localSize, %rsp # local vars. and arg.
27
28        movq    $0, accumulator(%rbp)    # accumulator = 0;

Although this is not required because this is a leaf function, it is somewhat better software engineering. If this function is ever modiﬁed such that it does call another function, the programmer may forget to allocate the stack space, which would then be required. There is less chance that saving the contents of the rdi register would be overlooked since it is the where the ﬁrst argument is passed. Both of these issues are arguable design decisions.

Next, for the conversion from 4-bit integer values to 64-bit, we deﬁne a bit mask:

14 .equ TYPEMASK,0xf

Performing an and operation with this bit mask leaves the four low-order bits as they were and sets the 60 high-order bits all to zero. Then we simply add this to the 64-bit accumulator (which has already been shifted four bits to the left) it eﬀectively insert the four bits into the correct location:

43 andq $TYPEMASK, %rax # 4 bits -> 64 bits
44 addq %rax, accumulator(%rbp) # insert the 4 bits

12.3 Multiplication

The hexToInt function discussed in Section 12.2 shows how to convert a string of hexadecimal characters into the integer they represent. That function uses the fact that each hexadecimal character represents four bits. So as the characters are read from left to right, the bits in the accumulator are shifted four places to the left in order to make space for the next four-bit value. The character is converted to the four bits it represents and added to the accumulator.

Although the four-bit left shift seems natural for hexadecimal, it is equivalent to multiplying the value in the accumulator by sixteen. This follows from the positional notation used to write numbers. Add another hexadecimal digit to the right of an existing number eﬀectively multiplies that existing number by sixteen. A little thought shows that this algorithm, shown in Algorithm 12.1, works in any number base.

Algorithm 12.1: Character to integer conversion.

1: accumulator ← 0
2: Get character.
3: while more characters do
4: accumulator ← base × accumulator
5: tempInt ← integer equivalent of the character
6: accumulator ← accumulator + tempInt
7: Get character.
8: end while

Of course, you probably want to write programs that allow users to work with decimal numbers. So we need to know how to convert a string of decimal characters to the integer they represent. The characters that represent decimal numbers are in the range 30₁₆ – 39₁₆. Table 12.1 shows the 32-bit int that corresponds to each numeric character.

Numeral	int
(ASCII code)	(Binary number system)

0011 0000	0000 0000 0000 0000 0000 0000 0000 0000
0011 0001	0000 0000 0000 0000 0000 0000 0000 0001
0011 0010	0000 0000 0000 0000 0000 0000 0000 0010
0011 0011	0000 0000 0000 0000 0000 0000 0000 0011
0011 0100	0000 0000 0000 0000 0000 0000 0000 0100
0011 0101	0000 0000 0000 0000 0000 0000 0000 0101
0011 0110	0000 0000 0000 0000 0000 0000 0000 0110
0011 0111	0000 0000 0000 0000 0000 0000 0000 0111
0011 1000	0000 0000 0000 0000 0000 0000 0000 1000
0011 1001	0000 0000 0000 0000 0000 0000 0000 1001

Table 12.1: Bit patterns (in binary) of the ASCII numerals and the corresponding 32-bit ints.

For a string of characters that represents a decimal integer, Algorithm 12.1 can be specialized to give Algorithm 12.2. (Recall that “⋅” is the bit-wise and operator.)

Algorithm 12.2: Decimal character to integer conversion.

1: accumulator ← 0
2: Get character.
3: while more characters do
4: accumulator ← 10 × accumulator
5: tempInt ← 0xf & character $⊳$ ’&’ is bitwise AND
6: accumulator ← accumulator + tempInt
7: Get character.
8: end while

Shifting N bits to the left multiplies a number by 2^N, so it can only be used to multiply by powers of two. Algorithm 12.2 multiplies the accumulator by 10, which cannot be accomplished with only shifts. Thus, we need to use the multiplication instruction for decimal conversions.

The multiplication instruction is somewhat more complicated than addition. The main problem is that the product can, in general, occupy the number of digits in the multiplier plus the number of digits in the multiplicand. This is easily seen by computing 99 × 99 = 9801 (in decimal). Thus in general,

8-bit × 8-bit	⇒ 16-bit
16-bit × 16-bit	⇒ 32-bit
32-bit × 32-bit	⇒ 64-bit
64-bit × 64-bit	⇒ 128-bit

The unsigned multiplication instruction is:

		muls	source

where s denotes the size of the operands:

s	meaning	number of bits
b	byte	8
w	word	16
l	longword	32
q	quadword	64

Intel® Syntax		mul	source

In the x86-64 architecture, the destination operand contains the multiplicand and must be in the al, ax, eax, or rax register, depending on the size of the operand, for the unsigned multiplication instruction, mul. This register is not speciﬁed as an operand. The instruction speciﬁes the source operand, which contains the multiplier and must be the same size. It can be located in another general-purpose register or in memory. If the numbers are eight bits (hence, one number is in al), the high-order portion of the result will be in the ah register, and the low-order portion of the result will be in the al register. For sixteen and thirty-two bit numbers, the low-order portion of the product will be stored in a portion of the rax register and the high-order will be stored in a portion of the rdx register as shown in Table 12.2.

Operand	Portion	High-Order	Low-Order
Size	of A Reg.	Result	Result

8 bits	al	ah	al
16 bits	ax	dx	ax
32 bits	eax	edx	eax
64 bits	rax	rdx	rax

Table 12.2: Register usage for the mul instruction.

For example, let’s see how the computation 7 × 24 = 168 looks in 8-bit, 16-bit, and 32-bit values. First, note that:

710 = 0000 01112 = 0716

(12.1)

2410 = 0001 10002 = 18 16

(12.2)

and

16810 = 1010 10002 = a816

(12.3)

Now, if we declare the constants:

   byteDay:
           .byte   24
   wordDay:
           .word   24
   longDay:
           .long   24

These declarations cause the assembler to do the following:

byteDay → allocate one byte of memory and set the bit pattern of the byte to 0x18.
wordDay → allocate two bytes of memory and set the bit pattern of those two bytes to 0x0018.
longDay → allocate four bytes of memory and set the bit pattern of those four bytes to 0x00000018.

First, consider 8-bit multiplication. If eax contains the bit pattern 0x??????07, then

mulb byteDay

changes eax such that it contains 0x????00a8. Notice that only the al portion of the A register can be used for the operand, but the result will occupy the entire ax portion of the register even though the result would ﬁt into only the al portion. That is, the instruction will produce a 16-bit result, and anything stored in the ah portion will be lost.

Next, consider 16-bit multiplication. If eax contains 0x????0007, then

mulw wordDay

changes eax to contain 0x????00a8 and edx to contain 0x????0000. Two points are important in this example:

the ah portion of the A register must be set to zero before executing the mulw instruction so that the ax portion contains the proper value, and
the dx portion of the D register is used, even though the result is ﬁts within the 16 bits of the ax register.

Finally, 32-bit multiplication. If eax contains 0x00000007, then

mull longDay

changes eax to contain 0x000000a8 and edx to contain 0x00000000. This example shows the entire eax register must be used for the operand before mull is executed, and the entire edx register is used for the high-order portion of the result, even though it is not needed. That is, the instruction will produce a 64-bit result, and anything stored in the edx register will be lost.

These examples show that the rax and rdx registers are used without ever explicitly appearing in the instruction. You must be very careful not to write over a required value that is stored in one of these registers. Using the multiplication instruction requires some careful planning.

There is also a signed multiply instruction, which has three forms:

		imuls	source
		imuls	source, destination
		imuls	immediate, source, destination

where s denotes the size of the operand:

s	meaning	number of bits
b	byte	8
w	word	16
l	longword	32
q	quadword	64

	imul	source
Intel® Syntax	imul	destination, source
	imul	destination, source, immediate

In the one-operand format the signed multiply instruction uses the rdx:rax register combination in the same way as the mul instruction.

In its two-operand format the destination must be a register. The source can be a register, an immediate value, or a memory location. The source and destination are multiplied, and the result is stored in the destination register. Unfortunately, if the result is too large to ﬁt into the destination register, it is simply truncated. In this case, both the CF and OF ﬂags are set to 1. If the result was able to ﬁt into the destination register, both ﬂags are set to 0.

In its three-operand format the destination must be a register. The source can be a register or a memory location. The source is multiplied by the immediate value and the result is stored in the destination register. As in the two-operand form, if the result is too large to ﬁt into the destination register, it is simply truncated. In this case, both the CF and OF ﬂags are set to 1. If the result was able to ﬁt into the destination register, both ﬂags are set to 0.

The diﬀerence between signed and unsigned multiplication can be illustrated with the following multiplication of two 16-bit values. Given the declaration:

.data
mOne: .word -1

and the initial conditions in the rdx and rax registers:

rdx 0x7fffec889ec8 140737161764552
rax 0x2ba1be79ffff 47973685395455

we will multiply the two 16-bit values in the memory location mOne and the register ax. Notice that if we consider them to be signed integers, both values represent -1, and we would expect the result to be +1 (= 00000001₁₆). However, if we consider them to be unsigned integers, they both represent 65535₁₀, and we would expect the result to be 4294836225₁₀ (= fffe0001₁₆).

Indeed, starting with the initial conditions above, the instruction:

mulw mOne

yields:

rdx 0x7fffec88fffe 140737161789438
rax 0x2ba1be790001 47973685329921

We see that the register combination dx:ax = fffe:0001. And with the same initial conditions, the instruction

imulw mOne

yields:

rdx 0x7fffec880000 140737161723904
rax 0x2ba1be790001 47973685329921

With signed multiplication we get dx:ax = 0000:0001. Both of these operations multiplied 16-bit values to provide a 32-bit result. They each used the sixteen low-order bits of the rax and rdx registers for the result. Notice that the upper 48 bits of these registers were not changed and that neither “ax” nor “dx” appeared in either instruction.

Multiplication is used on line 19 in the decToInt function shown in Listing 12.7.

1/*
2 * decToUInt.c
3 * Converts decimal character string to int.
4 * Bob Plantz - 15 June 2009
5 */
6
7#include "decToUInt.h"
8#define NUMERALMASK 0xf
9
10unsigned int decToUInt(char *stringPtr)
11{
12    unsigned int accumulator = 0;
13    unsigned int base = 10;
14    unsigned char current;
15
16    current = *stringPtr;
17    while (current != ’\0’)
18    {
19        accumulator = accumulator * base;
20        current = current & NUMERALMASK;
21        accumulator += (int)current;
22        stringPtr++;
23        current = *stringPtr;
24    }
25    return accumulator;
26}

Listing 12.7: Convert decimal text string to int (C).

As we can see on line 17 in Listing 12.8 the compiler has chosen to use the imull instruction for multiplication.

1        .file  "decToUInt.c"
2        .text
3        .globldecToUInt
4        .type  decToUInt, @function
5decToUInt:
6        pushq  %rbp
7        movq  %rsp, %rbp
8        movq  %rdi, -24(%rbp)
9        movl  $0, -4(%rbp)
10        movl  $10, -12(%rbp)
11        movq  -24(%rbp), %rax
12        movzbl(%rax), %eax
13        movb  %al, -5(%rbp)
14        jmp    .L2
15.L3:
16        movl  -4(%rbp), %eax   # destination must
17        imull  -12(%rbp), %eax  #    be in a
18        movl  %eax, -4(%rbp)   #    register
19        andb  $15, -5(%rbp)
20        movzbl-5(%rbp), %eax
21        addl  %eax, -4(%rbp)
22        addq  $1, -24(%rbp)
23        movq  -24(%rbp), %rax
24        movzbl(%rax), %eax
25        movb  %al, -5(%rbp)
26.L2:
27        cmpb  $0, -5(%rbp)
28        jne    .L3
29        movl  -4(%rbp), %eax
30        popq  %rbp
31        ret
32        .size  decToUInt, .-decToUInt
33        .ident"GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0"
34        .section      .note.GNU-stack,"",@progbits

Listing 12.8: Convert decimal text string to int (gcc assembly language).

Recall that the destination must be in a register. The compiler has chosen eax in this case. The value to be multiplied must be loaded from its memory location into the register, multiplied, then stored back into memory:

16        movl  -4(%rbp), %eax   # destination must
17        imull  -12(%rbp), %eax  #    be in eax
18        movl  %eax, -4(%rbp)   #    register

It may appear that the compiler has made an error here. Since both the multiplier and the multiplicand are 32-bit values, the product can be 64 bits wide. However, the compiler has chosen code that assumes the product will be no wider than 32 bits. This can lead to arithmetic errors when multiplying large integers, but according to the C programming language standard [10] this is acceptable:

A computation involving unsigned operands can never overﬂow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

In Listing 12.9 the programmer has chosen unsigned multiplication, mull.

1# decToUInt.s
2# Converts decimal character string to unsigned int.
3# Bob Plantz - 15 June 2009
4
5# Calling sequence:
6#       rdi <- address of decimal string to be converted
7#       call    decToUInt
8# returns 32-bit int represented by the decimal string
9
10# Useful constants
11        .equ    NUMERALMASK,0xf
12        .equ    DECIMAL,10
13# Stack frame, showing local variables and arguments
14        .equ    accumulator,-8
15        .equ    current,-1
16        .equ    localSize,-16
17
18        .text
19        .globl  decToUInt
20        .type   decToUInt, @function
21decToUInt:
22        pushq   %rbp             # save base pointer
23        movq    %rsp, %rbp       # new base pointer
24        addq    $localSize, %rsp # local vars. and arg.
25
26        movl    $0, accumulator(%rbp)    # accumulator = 0;
27loop:
28        movb    (%rdi), %sil     # load character
29        cmpb    $0, %sil         # at end yet?
30        je      done             # yes, all done
31
32        movl    $DECIMAL, %eax   # multiplier
33        mull    accumulator(%rbp) # accumulator * 10
34        movl    %eax, accumulator(%rbp) # update accumulator
35
36        andl    $NUMERALMASK, %esi  # char -> int
37        addl    %esi, accumulator(%rbp) # add the digit
38        incq    %rdi             # stringPtr++;
39        jmp     loop             # and check for end
40done:
41        movl    accumulator(%rbp), %eax  # return accumulator;
42        movq    %rbp, %rsp      # restore stack pointer
43        popq    %rbp            # restore base pointer
44        ret                     # back to caller

Listing 12.9: Convert decimal text string to int (programmer assembly language).

And since this is a leaf function, the register used to pass the address of the text string (rdi) is simply used as the pointer variable rather than allocate a register save area in the stack frame:

27loop:
28        movb    (%rdi), %sil     # load character
29        cmpb    $0, %sil         # at end yet?
30        je      done             # yes, all done
31
32        movl    $DECIMAL, %eax   # multiplier
33        mull    accumulator(%rbp) # accumulator * 10
34        movl    %eax, accumulator(%rbp) # update accumulator
35
36        andl    $NUMERALMASK, %esi  # char -> int
37        addl    %esi, accumulator(%rbp) # add the digit
38        incq    %rdi             # stringPtr++;
39        jmp     loop             # and check for end

This is safe because no other functions are called within this loop. Of course, the programmer must be careful that the pointer variable (rdi) is not changed unintentionally.

The discussion of multiplication here highlights a problem with C — if an arithmetic operation produces a result that is too large to ﬁt into the allocated data type, the most signiﬁcant portion of the result is lost. It is important to realize that this loss of information can occur in intermediate results when executing even simple arithmetic expressions. It is therefore VERY IMPORTANT that you analyze each step of arithmetic operations with the range of values that is possible at that step in order to ensure that the result does not overﬂow the allocated data type. This is one more place where your knowledge of assembly language can help you to write better programs in C/C++.

12.4 Division

Division poses a diﬀerent problem. In general, the quotient will not be larger than the dividend (except when attempting to divide by zero). Division is also complicated by the existence of a remainder. The divide instruction starts with a dividend that is twice as wide as the divisor. Both the quotient and remainder are the same width as the divisor. The unsigned division instruction is:

		divs	source

where s denotes the size of the operands:

s	meaning	number of bits
b	byte	8
w	word	16
l	longword	32
q	quadword	64

Intel® Syntax		div	source

The source operand speciﬁes the divisor. It can be either a register or a memory location. Table 12.3 shows how to set up the registers with the dividend and where the quotient and remainder will be located after the unsigned division instruction, div, is executed. Notice that the quotient is the C ‘/’ operation, and the remainder is the ‘%’ operation.

Operand	High-Order	Low-Order
Size	Dividend	Dividend	Quotient	Remainder

8 bits	ah	al	al	ah
16 bits	dx	ax	ax	dx
32 bits	edx	eax	eax	edx
64 bits	rdx	rax	rax	rdx

Table 12.3: Register usage for the div instruction.

Notice that these instructions do not explicitly specify any operands, but they may change the rax and rdx registers. They do not aﬀect the condition codes in the rflags register.

For example, let’s see how the computation 93 ÷ 19 = 4 with remainder 17 looks in 8-bit, 16-bit, and 32-bit values. First, note that:

19₁₀
= 0001 0011₂

= 13₁₆

(12.4)

and

93₁₀
= 0101 1101₂

= 5d₁₆

(12.5)

Now if we declare the constants:

   byteDivisor:
           .byte   19
   wordDivisor:
           .word   19
   longDivisor:
           .long   19

These declarations cause the assembler to do the following:

byteDivisor → allocate one byte of memory and set the bit pattern of the byte to 0x13.
wordDivisor → allocate two bytes of memory and set the bit pattern of those two bytes to 0x0013.
longDivisor → allocate four bytes of memory and set the bit pattern of those four bytes to 0x00000013.

First, consider 8-bit division. If eax contains the bit pattern 0x????005d, then

divb byteDivisor

changes eax such that it contains 0x????1104. Notice that ah had to be set to 0 before executing divb even though the dividend ﬁts into one byte. That’s because the divb instruction starts with the ah:al pair as a 16-bit number. We also see that after executing the instruction, ax contains what appears to be a much larger number as a result of the division. Of course, we no longer consider ax, but al (the quotient) and ah (the remainder) as two separate numbers.

Next, consider 16-bit division. If eax contains 0x????005d and edx 0x????0000, then

divw wordDivisor

changes eax to contain 0x????0004 and edx to contain 0x????0011. You may wonder why the divw instruction does not start with the 32-bit dividend in eax. This is for backward compatibility — Intel processors prior to the 80386 had only 16-bit registers.

Finally, 32-bit division. If eax contains 0x0000005d and edx 0x00000000, then

divl longDivisor

changes eax to contain 0x00000004 and edx to contain 0x00000011. Again, we see that the entire edx register must be ﬁlled with zeros before executing the divl instruction, even though the dividend ﬁts within two bytes.

One of the more common errors with division occurs when performing repeated division of a number. Since the ﬁrst division places the remainder in the area occupied by the high-order portion of the dividend, you must remember to set that area to the appropriate value before dividing again.

The signed division instruction is:

		idivs	source

where s denotes the size of the operand:

s	meaning	number of bits
b	byte	8
w	word	16
l	longword	32
q	quadword	64

Intel® Syntax		idiv	source

Unlike the signed multiply instruction, signed divide only has one form, which is the same as unsigned divide. That is, the divisor is in the source operand, and the dividend is set up in the rax and rdx registers as shown in Table 12.3.

There is a nice set of instructions that set up the dividend in the rdx and rax registers, using the contents of the rax register, for signed division. These are shown in Table 12.4. The notation dx:ax, edx:eax, and rdx:rax, means that the high-order portion of the doubled, sign extended, value is placed in the respective portion of the rdx register.

AT&T syntax	Intel® syntax	start	result

cbtw	cbw	byte in al	word in ax
cwtd	cwd	word in ax	long in dx:ax
cltd	cdq	lonq in eax	quad in edx:eax
cqto	cqo	quad in rax	octuple in rdx:rax

Table 12.4: Instructions to set up the dividend register(s). These instructions do not specify any operands but may change the rax and rdx registers. The cbtw instruction is a repeat from Table 10.5.

We can see the diﬀerence between signed and unsigned division by dividing a 32-bit value by a 16-bit value. Given the declaration:

.section .rodata
divisor: .word 100

and loading the 32-bit dividend 32768 into the dx:ax register pair (using hexadecimal):

movw $0, %dx
movw $0x8000, %ax

produces the conditions

dx 0x0 0
ax 0x8000 -32768

just before the division. (Recall that gdb displays the signed decimal values.) When we divide 32768 by 100 we expect to get 327 with a remainder of 68. Indeed, unsigned division:

divw divisor

gives:

dx 0x44 68
ax 0x147 327

The quotient is in ax, and the remainder is in dx.

With signed integers we need to sign extend when setting up the dx:ax register pair:

movw $0x8000, %ax
cwtd

which sets up the registers:

dx 0xffff -1
ax 0x8000 -32768

When we use the signed divide instruction, we are dividing -32767 (= 8000₁₆) by 100. We expect to get -327 with a remainder of -68. Indeed, signed division:

idivw divisor

yields:

dx 0xffbc -68
ax 0xfeb9 -327

Again, the quotient is in ax, and the remainder is in dx.

The “/” operation is used on line 34 in the intToUDec function shown in Listing 12.7, and the “%” operation is used on line 35.

1/*
2 * intToUDec.c
3 *
4 * Converts an int to corresponding unsigned text
5 * string representation.
6 *
7 * input:
8 *    32-bit int
9 *    pointer to at least 10-char array
10 * output:
11 *    null-terminated string in array
12 *
13 * Bob Plantz - 15 June 2009
14 */
15
16#include "intToUDec.h"
17#define TOASCII 0x30
18
19void intToUDec(char *decString, int theInt)
20{
21    unsigned int x = theInt;
22    int base = 10;
23    char reverseArray[10];
24    char digit;
25    char *ptr = reverseArray;
26
27    *ptr = ’\0’;     // start with NUL
28    ptr++;
29    if (x == 0)      // zero case
30    {
31        *ptr = ’0’;
32        ptr++;
33    }
34    while (x > 0)    // create the string
35    {
36        digit = x % base;
37        x = x / base;
38        digit = TOASCII | digit;
39        *ptr = digit;
40        ptr++;
41    }
42    do               // reverse the string
43    {
44        ptr--;
45        *decString = *ptr;
46        decString++;
47    } while (*ptr != ’\0’);
48}

Listing 12.10: Convert unsigned int to decimal text string (C).

The assembly language generated by gcc is shown in Listing 12.11 (with comments added).

1        .file  "intToUDec.c"
2        .text
3        .globlintToUDec
4        .type  intToUDec, @function
5intToUDec:
6        pushq  %rbp
7        movq  %rsp, %rbp
8        movq  %rdi, -40(%rbp)
9        movl  %esi, -44(%rbp)
10        movl  -44(%rbp), %eax
11        movl  %eax, -4(%rbp)
12        movl  $10, -20(%rbp)
13        leaq  -32(%rbp), %rax
14        movq  %rax, -16(%rbp)
15        movq  -16(%rbp), %rax
16        movb  $0, (%rax)
17        addq  $1, -16(%rbp)
18        cmpl  $0, -4(%rbp)
19        jne    .L3
20        movq  -16(%rbp), %rax
21        movb  $48, (%rax)
22        addq  $1, -16(%rbp)
23        jmp    .L3
24.L4:
25        movl  -20(%rbp), %ecx   # load base value (10)
26        movl  -4(%rbp), %eax    # load the integer
27        movl  $0, %edx          #   edx forms part of dividend
28        divl  %ecx              # this is for % operation
29        movl  %edx, %eax        # get remainder
30        movb  %al, -21(%rbp)    # type cast to char and store
31        movl  -20(%rbp), %edx   # load base value (10)
32        movl  %edx, -48(%rbp)   #    and store it
33        movl  -4(%rbp), %eax    # load the integer
34        movl  $0, %edx          #   edx forms part of dividend
35        divl  -48(%rbp)         # this is for / operation
36        movl  %eax, -4(%rbp)    # x = x / base;
37        orb    $48, -21(%rbp)    # digit = TOASCII | digit;
38        movq  -16(%rbp), %rax   # load ptr for dereference
39        movzbl-21(%rbp), %edx    # load digit
40        movb  %dl, (%rax)       # *ptr = digit;
41        addq  $1, -16(%rbp)     # ptr++;
42.L3:
43        cmpl  $0, -4(%rbp)
44        jne    .L4
45.L5:
46        subq  $1, -16(%rbp)
47        movq  -16(%rbp), %rax
48        movzbl(%rax), %edx
49        movq  -40(%rbp), %rax
50        movb  %dl, (%rax)
51        addq  $1, -40(%rbp)
52        movq  -16(%rbp), %rax
53        movzbl(%rax), %eax
54        testb  %al, %al
55        jne    .L5
56        popq  %rbp
57        ret
58        .size  intToUDec, .-intToUDec
59        .ident"GCC: (Ubuntu/Linaro 4.7.0-7ubuntu3) 4.7.0"
60        .section      .note.GNU-stack,"",@progbits

Listing 12.11: Convert unsigned int to decimal text string (gcc assembly language).

Compare the code sequence from lines 25 – 30:

25        movl  -20(%rbp), %ecx   # load base value (10)
26        movl  -4(%rbp), %eax    # load the integer
27        movl  $0, %edx          #   edx forms part of dividend
28        divl  %ecx              # this is for % operation
29        movl  %edx, %eax        # get remainder
30        movb  %al, -21(%rbp)    # type cast to char and store
31        movb  %al, -9(%rbp)     # type cast to char and store

with that from lines 31 – 36:

31        movl  -20(%rbp), %edx   # load base value (10)
32        movl  %edx, -48(%rbp)   #    and store it
33        movl  -4(%rbp), %eax    # load the integer
34        movl  $0, %edx          #   edx forms part of dividend
35        divl  -48(%rbp)         # this is for / operation
36        movl  %eax, -4(%rbp)    # x = x / base;

Even though the divl instruction produces both the quotient (“/” operation) and remainder (“%” operation), the compiler uses almost the same code sequence twice, once for each operation.

In Listing 12.12 the programmer has chosen to retrieve both the quotient and the remainder from one execution of the divl instruction.

1# intToUDec.s
2# Converts unsigned int to corresponding unsigned decimal string
3# Bob Plantz - 15 June 2009
4# Bob Plantz - 10 November 2013 - use cltd for division
5
6# Calling sequence
7#       esi <- value of the int
8#       rdi <- address of place to store string
9#       call intToUDec
10# Useful constant
11        .equ    asciiNumeral,0x30
12# Stack frame
13        .equ    reverseArray,-12
14        .equ    localSize,-16
15# Read only data
16        .section .rodata
17ten:    .long   10
18# Code
19        .text
20        .globl  intToUDec
21        .type   intToUDec, @function
22intToUDec:
23        pushq   %rbp            # save caller’s base ptr
24        movq    %rsp, %rbp      # our stack frame
25        addq    $localSize, %rsp         # local char array
26
27        movl    %esi, %eax      # eax used for division
28        leaq    reverseArray(%rbp), %r8  # ptr to local array
29        movb    $0, (%r8)       # store NUL
30        incq    %r8             # increment pointer
31
32        cmpl    $0, %eax        # integer == 0?
33        jne     stringLup       # no, start on the string
34        movb    $’0’, (%r8)     # yes, this is the string
35        incq    %r8             # for reverse copy
36stringLup:
37        cmpl    $0, %eax        # integer > 0?
38        jbe     copyLup         # no, do reverse copy
39        cltd                    # yes, set up rdx reg.
40        divl    ten             # divide by ten
41        orb     $asciiNumeral, %dl # convert to ascii
42        movb    %dl, (%r8)      # store character
43        incq    %r8             # increment pointer
44        jmp     stringLup       # check at top
45
46copyLup:
47        decq    %r8             # decrement pointer
48        movb    (%r8), %dl      # get char
49        movb    %dl, (%rdi)     # store it
50        incq    %rdi            # increment storage pointer
51        cmpb    $0, %dl         # NUL character?
52        jne     copyLup         # no, keep copying
53
54        movq    %rbp, %rsp      # delete local vars.
55        popq    %rbp            # restore caller base ptr
56        ret

Listing 12.12: Convert unsigned int to decimal text string (programmer assembly language).

On line 38 the high-order 32 bits of the quotient (edx register) are set to 0.

38        movl    $0, %edx        # yes, high-order = 0
39        divl    ten             # divide by ten
40        orb     $asciiNumeral, %dl # convert to ascii
41        movb    %dl, (%r8)      # store character

The division on line 39 leaves “x / base” in the eax register for the next execution of the loop body. It also places “x % base” in the edx register. We know that this value is in the range 0 – 9 and thus ﬁts entirely within the dl portion of the register. Lines 40 and 41 show how the value is converted to its ASCII equivalent and stored in the local char array.

As in the decToUInt function (Listing 12.9), since this is a leaf function, the register used to pass the address of the text string (rdi) is simply used as the pointer variable rather than allocate a register save area in the stack frame. Similarly, the eax register is used as the local “x” variable.

12.5 Negating Signed ints

For dealing with signed numbers, the x86-64 architecture provides an instruction that will perform the two’s complement operation. That is, this instruction will negate an integer that is stored in the two’s complement notation. The mnemonic for the instruction is neg.

		negs	source

where s denotes the size of the operand:

s	meaning	number of bits
b	byte	8
w	word	16
l	longword	32
q	quadword	64

Intel® Syntax		neg	source

neg performs a two’s complement operation on the value in the operand, which can be either a memory location or a register. Any of the addressing modes that we have covered can be used to specify a memory location.

12.6 Instructions Introduced Thus Far

This summary shows the assembly language instructions introduced thus far in the book. The page number where the instruction is explained in more detail, which may be in a subsequent chapter, is also given. This book provides only an introduction to the usage of each instruction. You need to consult the manuals ([2] – [6], [14] – [18]) in order to learn all the possible uses of the instructions.

12.6.1 Instructions

data movement:
opcode	source	destination	action	page

cbtw			convert byte to word, al → ax	696

cwtl			convert word to long, ax → eax	696

cltq			convert long to quad, eax → rax	696

cwtd			convert word to long, ax → dx:ax	786

cltd			convert long to quad, eax → edx:eax	786

cqto			convert quad to octuple, rax → rdx:rax	786

cmovcc	%reg/mem	%reg	conditional move	706

movs	$imm/%reg	%reg/mem	move	506

movs	mem	%reg	move	506

movsss	$imm/%reg	%reg/mem	move, sign extend	693

movzss	$imm/%reg	%reg/mem	move, zero extend	693

popw		%reg/mem	pop from stack	566

pushw	$imm/%reg/mem		push onto stack	566


s = b, w, l, q; w = l, q; cc = condition codes

arithmetic/logic:
opcode	source	destination	action	page

adds	$imm/%reg	%reg/mem	add	607

adds	mem	%reg	add	607

ands	$imm/%reg	%reg/mem	bit-wise and	747

ands	mem	%reg	bit-wise and	747

cmps	$imm/%reg	%reg/mem	compare	676

cmps	mem	%reg	compare	676

decs	%reg/mem		decrement	699

divs	%reg/mem		unsigned divide	777

idivs	%reg/mem		signed divide	784

imuls	%reg/mem		signed multiply	775

incs	%reg/mem		increment	698

leaw	mem	%reg	load eﬀective address	579

muls	%reg/mem		unsigned multiply	769

negs	%reg/mem		negate	789

ors	$imm/%reg	%reg/mem	bit-wise inclusive or	747

ors	mem	%reg	bit-wise inclusive or	747

sals	$imm/%cl	%reg/mem	shift arithmetic left	756

sars	$imm/%cl	%reg/mem	shift arithmetic right	751

shls	$imm/%cl	%reg/mem	shift left	756

shrs	$imm/%cl	%reg/mem	shift right	751

subs	$imm/%reg	%reg/mem	subtract	612

subs	mem	%reg	subtract	612

tests	$imm/%reg	%reg/mem	test bits	676

tests	mem	%reg	test bits	676

xors	$imm/%reg	%reg/mem	bit-wise exclusive or	747

xors	mem	%reg	bit-wise exclusive or	747


s = b, w, l, q; w = l, q

program ﬂow control:
opcode	location	action	page

call	label	call function	546

ja	label	jump above (unsigned)	683

jae	label	jump above/equal (unsigned)	683

jb	label	jump below (unsigned)	683

jbe	label	jump below/equal (unsigned)	683

je	label	jump equal	679

jg	label	jump greater than (signed)	686

jge	label	jump greater than/equal (signed)	686

jl	label	jump less than (signed)	686

jle	label	jump less than/equal (signed)	686

jmp	label	jump	691

jne	label	jump not equal	679

jno	label	jump no overﬂow	679

jcc	label	jump on condition codes	679

leave		undo stack frame	580

ret		return from function	583

syscall		call kernel function	587


cc = condition codes

12.6.2 Addressing Modes

__________________________________________________________

register direct:	The data value is located in a CPU register.
	syntax: name of the register with a “%” preﬁx.
	example: movl %eax, %ebx

immediate data:	The data value is located immediately after the instruction. Source operand only.
	syntax: data value with a “$” preﬁx.
	example: movl $0xabcd1234, %ebx

base register plus oﬀset:	The data value is located in memory. The address of the memory location is the sum of a value in a base register plus an oﬀset value.
	syntax: use the name of the register with parentheses around the name and the oﬀset value immediately before the left parenthesis.
	example: movl $0xaabbccdd, 12(%eax)

rip-relative:	The target is a memory address determined by adding an oﬀset to the current address in the rip register.
	syntax: a programmer-deﬁned label
	example: je somePlace

12.7 Exercises

12-1

(§12.2) Write a program in assembly language that

a): prompts the user to enter a number in binary,
b): reads the user input into a char array,
c): converts the string of characters in the char array into a decimal integer stored in a local int variable, and
d): calls printf to display the int.

Your program should use the writeStr function from Exercise 11-3 to display the user prompt. And it should use the readStr function from Exercise 11-4 or 11-6 to read the user’s input.

Your program should read the user’s input into the local char array, then perform the conversion using the stored characters. Do not do the conversion as the characters are entered by the user.

Your program does not need to check for user errors. You can assume that the user will enter only ones and zeros. And you can assume that the user will not enter more than 32 bits. (Be careful when you test your program.)

12-2

(§12.2) Write a program in assembly language that allows the user to enter a decimal integer then displays it in binary.

Your program should convert the decimal integer into the corresponding C-style text string of ones and zeros, then use the writeStr function from Exercise 11-3 to display the text string.

This program will require some careful planning in order to get the bits to print in the correct order.

12-3

(§12.3) Write a function, mul16, in assembly language that takes two 16-bit integers as arguments and returns the 32-bit product of the argument. Write a main driver function to test mul16. You may use printf and scanf in the main function for the user interface.

Hint: Notice that most of the numbers in this problem are 16-bit unsigned integers. Read the man pages for printf and scanf. In particular, the ”u” ﬂag character is used to indicate a short (16-bit) int.

12-4

(§12.4) Write a function, div32, in assembly language that implements the C / operation. The function takes two 32-bit integers as arguments and returns the 32-bit quotient of the ﬁrst argument divided by the second. Write a main driver function to test div32. You may use printf and scanf in the main function for the user interface.

12-5

(§12.4) Write a function, mod32, in assembly language that implements the C % operation. The function takes two 32-bit integers as arguments and returns the 32-bit quotient of the ﬁrst argument divided by the second. Write a main driver function to test mod32. You may use printf and scanf in the main function for the user interface.

12-6

(§12.4) Write a function in assembly language, decimal2uint, that takes two arguments: a pointer to a char, and a pointer to an unsigned int.

int decimal2uint(char *, unsigned int *);

The function assumes that the ﬁrst argument points to a C-style text string that contains only numeric characters representing an unsigned decimal integer. It computes the binary value of the integer and stores the result at the location where the second argument points. It returns zero.

Write a program that demonstrates the correctness of decimal2uint. Your program will allocate a char array, call readStr (from Exercise 11-4 or 11-5) to get a decimal integer from the user, and call decimal2uint to convert the text string to binary format. Then it adds an integer to the user’s input integer and uses printf to display the result.

Hint: Start with the program from Exercise 12-1. Rewrite it so that the conversion from the text string to the binary number is performed by a function. Then modify the function so that it performs a decimal conversion instead of binary.

12-7

(§12.4) Write a function in assembly language, uint2dec, that takes two arguments: a pointer to a char, and an unsigned int.

int uint2dec(char *, unsigned int);

The function assumes that the ﬁrst argument points to a char array that is large enough to hold a text string that represents the largest possible 32-bit unsigned integer in decimal. It computes the characters that represent the integer (the second argument) and stores this representation as a C-style text string where the ﬁrst argument points. It returns zero.

Write an assembly language program that demonstrates the correctness of uint2dec. Your program will allocate one char array, call readStr (from Exercise 11-4) or 11-6 to get a decimal integer from the user, and call decimal2uint (from Exercise 12-6) to convert the text string to binary format. It should add a constant integer to this converted value. Then it calls uint2dec to convert the sum to its corresponding text string, storing the string in the char array.

Hint: Start with the program from Exercise 10-6. Rewrite it so that the conversion from the binary number to the text string is performed by a function. Then modify the function so that it performs a decimal conversion instead of binary.

12-8

(§12.3) Modify the program in Exercise 12-7 so that it deals with signed ints. Hint: Write the function decimal2sint, which will call decimal2uint, and write the function sint2dec, which will call uint2dec.

[next] [prev] [prev-tail] [front] [up]

Chapter 12Bit Operations; Multiplication and Division

12.1 Logical Operators

12.2 Shifting Bits

12.3 Multiplication

12.4 Division

12.5 Negating Signed ints

12.6 Instructions Introduced Thus Far

12.6.1 Instructions

12.6.2 Addressing Modes

12.7 Exercises

Chapter 12
Bit Operations; Multiplication and Division