Chapter 6
Central Processing Unit

In this chapter we move on to consider a programmer’s view of the Central Processing Unit (CPU) and how it interacts with memory. X86-64 CPUs can be used with either a 32-bit or a 64-bit operating system. The CPU features available to the programmer depend on the operating mode of the CPU. The modes of interest to the applications programmer are summarized in Table 6.1. With a 32-bit operating system, the CPU behaves essentially the same as an x86-32 CPU.

Mode             Submode         Operating   Default          Default
                                 System      Address (bits)   int (bits)

IA-32e or Long   64-bit          64-bit      64               32
                 Compatibility               32               32
                                             16               16

Legacy           Protected       32-bit      32               32
                                             16               16
                 Virtual-8086                16               16
                 Real            16-bit      16               16

Table 6.1: X86-64 operating modes. Intel manuals use the term “IA-32e” and AMD manuals use “Long” when running a 64-bit operating system. Both manuals use the same terminology for the two sub-modes. Adapted from Table 1-1 in [2].

In this book we describe the view of the CPU when running a 64-bit operating system. Intel manuals call this the IA-32e mode and the AMD manuals call it the long mode. The CPU can run in one of two sub-modes under a 64-bit operating system: 64-bit mode and compatibility mode. Both manuals use the same terminology for the two sub-modes.

The two sub-modes cannot be mixed in the same program.

The discussion in this chapter focuses on the 64-bit mode. We will also point out the differences of the compatibility mode, which we will refer to as the 32-bit mode.

6.1 CPU Overview

An overall block diagram of a typical CPU is shown in Figure 6.1.


Figure 6.1: CPU block diagram. The CPU communicates with the Memory and I/O subsystems via the Address, Data, and Control buses. See Figure 1.1 (page 9).

The subsystems are connected together through internal buses. Keep in mind that this is a highly simplified diagram. Actual CPUs are much more complicated, but the general concepts discussed in this chapter apply to all of them.

We will now describe briefly each of the subsystems in Figure 6.1. The descriptions provided here are generic and apply to most CPUs. Components that are of particular interest to a programmer are described within the context of the x86 ISA later in this chapter.

Bus Interface:
This is the means for the CPU to communicate with the rest of the computer system — Memory and I/O Devices. It contains circuitry to place addresses on the address bus, read and write data on the data bus, and read and write signals on the control bus. The bus interface on many CPUs interfaces with external bus control units that in turn interface with memory and with different types of I/O buses, e.g., SATA, PCI-E, etc. The external control units are transparent to the programmer.
L1 Cache Memory:
Although it could be argued that this is not a part of the CPU, most modern CPUs include very fast cache memory on the CPU chip. As you will see in Section 6.4, each instruction must be fetched from memory. The CPU can execute instructions much faster than they can be fetched. The interface with memory makes it more efficient to fetch several instructions at one time, storing them in L1 cache where the CPU has very fast access to them. Many modern CPUs use two L1 cache memories organized in a Harvard architecture — one for instructions, the other for data. (See Section 1.2, page 10.) Its use is generally transparent to an applications programmer.
Registers:
A register is a group of bits that is intended to be used as a variable in a program. Compilers and assemblers have names for each register. Almost all arithmetic and logic operations and data movement operations involve at least one register. See Section 6.2 for more details.
Instruction Pointer:
This is a 64-bit register that always contains the address of the next instruction to be executed. See Section 6.2 for more details.
Instruction Register:
This register contains the instruction that is currently being executed. Its bit pattern determines what the Control Unit is causing the CPU to do. Once that action has been completed, the bit pattern in the instruction register can be changed, and the CPU will perform the operation specified by this next bit pattern.
Most modern CPUs use an instruction queue that is built into the chip. Several instructions are waiting in the queue, ready to be executed. Separate electronic circuitry keeps the instruction queue full while the regular control unit is executing the instructions. But this is simply an implementation detail that allows the control unit to run faster. The essence of how the control unit executes a program is represented by the single instruction register model.
Control Unit:
The bits in the Instruction Register are decoded in the Control Unit. It generates the signals that control the other subsystems in the CPU to carry out the action(s) specified by the instruction. It is typically implemented as a finite-state machine and contains Decoders (Section 5.1.3), Multiplexers (Section 5.1.4), and other logic components.
Arithmetic Logic Unit (ALU):
A device that performs arithmetic and logic operations on groups of bits. The logic circuitry to perform addition is discussed in Section 5.1.1.
Flags Register:
Each operation performed by the ALU results in various conditions that must be recorded. For example, addition can produce a carry. One bit in the Flags Register will be set to either zero (no carry) or one (carry) after the ALU has completed any operation that may produce a carry.

We will now look at how the logic circuits discussed in Chapter 4 can be used to implement some of these subsystems.

6.2 CPU Registers

A portion of the memory in the CPU is organized into registers. Machine instructions access CPU registers by their addresses, just as memory contents are accessed. Of course, the register addresses are not placed on the address bus since the registers are in the CPU. The difference from a programmer’s point of view is that the assembler has predefined names for the registers, whereas the programmer creates symbolic names for memory addresses. Thus in each program that you write in assembly language, you access CPU registers by the names predefined in the assembler, and you access memory by names that you create for the memory locations.

The x86-64 architecture registers are shown in Table 6.2.

Basic Programming Registers
    16   64-bit    General purpose (GPRs)
     1   64-bit    Flags
     1   64-bit    Instruction pointer
     6   16-bit    Segment
Floating Point Registers
     8   80-bit    Floating point data
     1   16-bit    Control
     1   16-bit    Status
     1   16-bit    Tag
     1   11-bit    Opcode
     1   64-bit    FPU Instruction Pointer
     1   64-bit    FPU Data Pointer
MMX Registers
     8   64-bit    MMX
XMM Registers
    16   128-bit   XMM
     1   32-bit    MXCSR
Model-Specific Registers (MSRs)
    These vary depending on the specific hardware implementation.
    They are only accessible to the operating system.

Table 6.2: The x86-64 registers. Not all the registers shown here are discussed in this chapter. Some are discussed in subsequent chapters that deal with the related topic.

Each bit in each register is numbered from right to left, beginning with zero. So the right-most bit is number 0, the next one to the left number 1, etc. Since there are 64 bits in each general purpose register, the left-most bit of these registers is number 63.

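The general purpose registers can be accessed as a 64-bit quadword (bits 63-0), a 32-bit doubleword (bits 31-0), a 16-bit word (bits 15-0), or an 8-bit byte (bits 7-0 and, in four of the registers, bits 15-8).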

The assembler uses a different name for each group of bits in a register. The assembler names for the groups of the bits are given in Table 6.3. In 64-bit mode, writing to an 8-bit or 16-bit portion of a register does not affect the other 56 or 48 bits in the register. However, when writing to the low-order 32 bits, the high-order 32 bits are set to zero.

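bits 63-0   bits 31-0   bits 15-0   bits 15-8   bits 7-0

rax         eax         ax          ah          al
rbx         ebx         bx          bh          bl
rcx         ecx         cx          ch          cl
rdx         edx         dx          dh          dl
rsi         esi         si                      sil
rdi         edi         di                      dil
rbp         ebp         bp                      bpl
rsp         esp         sp                      spl
r8          r8d         r8w                     r8b
r9          r9d         r9w                     r9b
r10         r10d        r10w                    r10b
r11         r11d        r11w                    r11b
r12         r12d        r12w                    r12b
r13         r13d        r13w                    r13b
r14         r14d        r14w                    r14b
r15         r15d        r15w                    r15b

Table 6.3: Assembly language names for portions of the general-purpose CPU registers. Programs running in 32-bit mode can only use the 32-bit and 16-bit names of the first eight registers and the 8-bit names ah, bh, ch, dh, al, bl, cl, and dl. 64-bit mode allows the use of all the registers. The ah, bh, ch, and dh registers cannot be used in the same instruction with sil, dil, bpl, spl, or r8b through r15b.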

A pictorial representation of the naming of each portion of the general-purpose registers is shown in Figure 6.2.


Figure 6.2: Graphical representation of general purpose registers. The three shown here are representative of the pattern of all the general purpose registers.

The 8-bit register portions ah, bh, ch, and dh are a holdover from the Intel® 8086/8088 architecture. Four of its 16-bit registers, ax, bx, cx, and dx, could also be accessed one byte at a time. The low-order bytes were named al, bl, cl, and dl and the high-order bytes were named ah, bh, ch, and dh. Access to these registers has been maintained in 32-bit mode for backward compatibility but is limited in 64-bit mode. Access to the 8-bit low-order portions of the rsi, rdi, rsp, and rbp registers was added along with the move to 64 bits in the x86-64 architecture, but these portions cannot be used in the same instruction with ah, bh, ch, or dh.

When using less than the entire 64 bits in a register, it is generally bad practice to write code that assumes the remaining portion is in any particular state. Such code is difficult to read and leads to errors during its maintenance phase.

Although these are called “general purpose,” the descriptions in Table 6.4 show that some of them have some special significance, depending upon how they are used. (Some of the descriptions may not make sense to you at this point.)


Register   Special usage                                               Called function
                                                                       preserves contents

rax        1st function return value.                                  No
rbx        Optional base pointer.                                      Yes
rcx        Pass 4th argument to function.                              No
rdx        Pass 3rd argument to function; 2nd function return value.   No
rsp        Stack pointer.                                              Yes
rbp        Optional frame pointer.                                     Yes
rdi        Pass 1st argument to function.                              No
rsi        Pass 2nd argument to function.                              No
r8         Pass 5th argument to function.                              No
r9         Pass 6th argument to function.                              No
r10        Pass function’s static chain pointer.                       No
r11                                                                    No
r12                                                                    Yes
r13                                                                    Yes
r14                                                                    Yes
r15                                                                    Yes
Table 6.4: General purpose registers.

In this book, we will use the rax, rdx, rdi, rsi, and r8 through r15 registers for general-purpose storage. They will be used just like variables in a high-level language. Usage of the rsp and rbp registers follows a very strict discipline. You should not use either of them for your assembly language programs until you understand how to use them.

The instruction pointer register, rip, always points to the next instruction to be executed. As explained in Section 6.4 (page 481), every time an instruction is fetched, the rip register is automatically incremented by the control unit to contain the address of the next instruction. Thus, the rip register is never directly accessed by the programmer. On the other hand, every instruction that is executed affects the contents of the rip register. Thus, the rip register is not a general-purpose register, but it guides the flow of the entire program.

Most arithmetic and logical operations affect the condition codes in the rflags register. The bits that are affected are shown in Figure 6.3.


Figure 6.3: Condition codes portion of the rflags register. The high-order 32 bits (32 – 63) are reserved for other use and are not shown here. Neither are bits 12 – 31, which are for system flags (see [3]).

The names of the condition codes are:

OF Overflow Flag
SF Sign Flag
ZF Zero Flag
AF Auxiliary carry or Adjust Flag
PF Parity Flag
CF Carry Flag

The OF, SF, ZF, and CF are described at appropriate places in this book. See [3] and [14] for descriptions of the other flags.

Two other registers are very important in a program. The rsp register is used as a stack pointer, as will be discussed in Section 8.2 (page 553). The rbp register is typically used as a base pointer; it will be discussed in Section 8.4 (page 576).

The “e” prefix on the 32-bit portion of each register name comes from the history of the x86 architecture. The introduction of the 80386 in 1985 brought an increase of register size from 16 bits to 32 bits. There were no new registers. The old ones were simply “extended.”

6.3 CPU Interaction with Memory and I/O

The connections between the CPU and Memory are shown in Figure 6.4. This figure also includes the I/O (input and output) subsystem. The I/O system will be discussed in subsequent chapters. The control unit is connected to memory by three buses: the address bus, the data bus, and the control bus.

A bus is a communication path between two or more devices. Several devices can be connected to one bus, but only two devices can be communicating over the bus at one time.


Figure 6.4: Subsystems of a computer. The CPU, Memory, and I/O subsystems communicate with one another via the three buses. (Repeat of Figure 1.1.)

As an example of how data can be stored in memory, let us imagine that we have some data in one of the CPU registers. Storing this data in memory is effected by setting the states of a group of bits in memory to match those in the CPU register. The control unit can be programmed to do this by

  1. sending the memory address on the address bus,
  2. sending a copy of the register bit states on the data bus, then
  3. sending a “write” signal on the control bus.

For example, if the eight bits in memory at address 0x7fffd9a43cef are in the state:

     0x7fffd9a43cef: b7

the al register in the CPU is in the state:

     %al: e2

and the control unit is programmed to store this value at location 0x7fffd9a43cef, the control unit then

  1. places 0x7fffd9a43cef on the address bus,
  2. places the bit pattern e2 on the data bus, and
  3. places a “write” signal on the control bus.

Then the bits at memory location 0x7fffd9a43cef will be changed to the state:

     0x7fffd9a43cef: e2

Important. When the state of any bit in memory or in a register is changed, any previous states are lost forever. There is no way to “undo” this state change or to determine how the bit got in its current state.

6.4 Program Execution in the CPU

You may be wondering how the CPU is programmed. It contains a special register — the instruction register — whose bit pattern determines what the CPU will do. Once that action has been completed, the bit pattern in the instruction register can be changed, and the CPU will perform the operation specified by this next bit pattern.

Most modern CPUs use an instruction queue. Several instructions are waiting in the queue, ready to be executed. Separate electronic circuitry keeps the instruction queue full while the regular control unit is executing the instructions. But this is simply an implementation detail that allows the control unit to run faster. The essence of how the control unit executes a program is represented by the single instruction register model.

Since instructions are simply bit patterns, they can be stored in memory. The instruction pointer register always has the memory address of (points to) the next instruction to be executed. In order for the control unit to execute this instruction, it is copied into the instruction register.

The situation is as follows:

  1. A sequence of instructions is stored in memory.
  2. The memory address where the first instruction is located is copied to the instruction pointer.
  3. The CPU sends the address in the instruction pointer to memory on the address bus.
  4. The CPU sends a “read” signal on the control bus.
  5. Memory responds by sending a copy of the state of the bits at that memory location on the data bus, which the CPU then copies into its instruction register.
  6. The instruction pointer is automatically incremented to contain the address of the next instruction in memory.
  7. The CPU executes the instruction in the instruction register.
  8. Go to step 3.

Steps 3, 4, and 5 are called an instruction fetch. Notice that steps 3 through 8 constitute a cycle, the instruction execution cycle. It is shown graphically in Figure 6.5.


Figure 6.5: The instruction execution cycle.

This raises a couple of questions:

How do we get the instructions into memory?
The instructions for a program are stored in a file on a storage device, usually a disk. The computer system is controlled by an operating system. When you indicate to the operating system that you wish to execute a program, e.g., by double-clicking on its icon, the operating system locates a region of memory large enough to hold the instructions in the program, then copies them from the file to memory. The contents in the file remain unchanged.
How do we create a file on the disk that contains the instructions?
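This is a multi-step process using several programs that are provided for you. The programs and the files that each creates are: an editor, which creates the source file; an assembler (or compiler), which translates the source file into an object file; and a linker, which combines object files with any needed libraries into the executable file.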

You may have used an integrated development environment (IDE), e.g., Microsoft® Visual Studio® or Eclipse, which combines all three of these programs into one package where each of the intermediate steps is performed automatically. You use the editor program to create the source file and then give the run command to the IDE. The IDE will compile the program in your source files, link the resulting object files with the necessary libraries, load the resulting executable file into memory, then start your program. In general, the intermediate object files resulting from the compilation of each source file are automatically deleted from the disk.

In this book we will explicitly perform each of these steps separately so we can learn the role of each program — editor, assembler, linker — used in preparing the application program.

6.5 Using gdb to View the CPU Registers

We will use the program in Listing 6.1 to illustrate the use of gdb to view the contents of the CPU registers. I have used the register storage class modifier to request that the compiler use a CPU register for the int wye variable. The register modifier is “advisory” only. See Exercise 6-3 for an example when the compiler may not be able to honor our request.

 1 /*
 2  * gdbExample1.c
 3  * Subtracts one from user integer.
 4  * Demonstrate use of gdb to examine registers, etc.
 5  * Bob Plantz - 5 June 2009
 6  */
 7
 8 #include <stdio.h>
 9
10 int main(void)
11 {
12     register int wye;
13     int *ptr;
14     int ex;
15
16     ptr = &ex;
17     ex = 305441741;
18     wye = -1;
19     printf("Enter an integer: ");
20     scanf("%i", ptr);
21     wye += *ptr;
22     printf("The result is %i\n", wye);
23
24     return 0;
25 }
Listing 6.1: Simple program to illustrate the use of gdb to view CPU registers.

We introduced some gdb commands in Chapter 2. The additional ones used in this section are explained in the annotations as they appear in the session.

Here is a screen shot of how I compiled the program then used gdb to control the execution of the program and observe the register contents. My typing is boldface and the session is annotated in italics. Note that you will probably see different addresses if you replicate this example on your own (Exercise 6-1).

bob$ gcc -g -O0 -Wall -fno-asynchronous-unwind-tables \
> -fno-stack-protector -o gdbExample1 gdbExample1.c

The “-g” option is required. It tells the compiler to include debugger information in the executable program.

bob$ gdb ./gdbExample1
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
Reading symbols from /home/bob/my_book_working/progs/chap06/gdbExample1...done.
(gdb) li
8 #include <stdio.h>
10 int main(void)
11 {
12     register int wye;
13     int *ptr;
14     int ex;
16     ptr = &ex;
17     ex = 305441741;
18     wye = -1;
19     printf("Enter an integer: ");
20     scanf("%i", ptr);
21     wye += *ptr;
22     printf("The result is %i\n", wye);
24     return 0;
25 }

The li command lists ten lines of source code. The display is centered around the current line. Since I have not started execution of this program, the display is centered around the beginning of main. The display ends with the (gdb) prompt. Pushing the return key repeats the previous command, and li is smart enough to display the next ten lines.

(gdb) br 19
Breakpoint 1 at 0x4005a9: file gdbExample1.c, line 19.
(gdb) run
Starting program: /home/bob/my_book_64/progs/chap06/gdbExample1

Breakpoint 1, main () at gdbExample1.c:19
19        printf("Enter an integer: ");

I set a breakpoint at line 19 then run the program. When line 19 is reached, the program is paused before the statement is executed, and control returns to gdb.

(gdb) print ex
$1 = 305441741
(gdb) print &ex
$2 = (int *) 0x7fffffffe044

I use the print command to view the value assigned to the ex variable and learn its memory address.

(gdb) help x
Examine memory: x/FMT ADDRESS.
ADDRESS is an expression for the memory address to examine.
FMT is a repeat count followed by a format letter and a size letter.
Format letters are o(octal), x(hex), d(decimal), u(unsigned decimal),
  t(binary), f(float), a(address), i(instruction), c(char) and s(string).
Size letters are b(byte), h(halfword), w(word), g(giant, 8 bytes).
The specified number of objects of the specified size are printed
according to the format.

Defaults for format and size letters are those previously used.
Default count is 1.  Default address is following last thing printed
with this command or "print".

The help command will provide very brief instructions on using a command. We want to display values stored in specific memory locations in various formats, and the help command provides a reminder of how to use the command.

(gdb) x/1dw 0x7fffffffe044
0x7fffffffe044:     305441741

I verify that the value assigned to the ex variable is stored at location 0x7fffffffe044.

(gdb) x/1xw 0x7fffffffe044
0x7fffffffe044:     0x1234abcd

I examine the same integer in hexadecimal format.

(gdb) x/4xb 0x7fffffffe044
0x7fffffffe044:     0xcd    0xab    0x34    0x12

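Next, I examine all four bytes of the word, one byte at a time. In this display, the byte at 0x7fffffffe044 is 0xcd, the byte at 0x7fffffffe045 is 0xab, the byte at 0x7fffffffe046 is 0x34, and the byte at 0x7fffffffe047 is 0x12.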

In other words, the byte-wise display appears to be backwards. This is due to the values being stored in the little endian storage scheme as explained on page 46 in Chapter 2.

(gdb) x/2xh 0x7fffffffe044
0x7fffffffe044:     0xabcd  0x1234

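I also examine all four bytes of the word, two bytes at a time. In this display, the two bytes at 0x7fffffffe044 are shown as 0xabcd and the two bytes at 0x7fffffffe046 are shown as 0x1234.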

This shows how gdb displays these four bytes as though they represent two 16-bit ints stored in little endian format. (You can now see why I entered such a strange integer in this demonstration run.)

(gdb) print ptr
$3 = (int *)  0x7fffffffe044
(gdb) print &ptr
$4 = (int **) 0x7fffffffe048

Look carefully at the ptr variable. It is located at address 0x7fffffffe048 and it contains another address, 0x7fffffffe044, that is, the address of the variable ex. It is important that you learn to distinguish between a memory address and the value that is stored there, which can be another memory address. Perhaps a good way to think about this is a group of numbered mailboxes, each containing a single piece of paper that you can write a single number on. You could write a number that represents a “data” value on the paper. Or you can write the address of a mailbox on the paper. One of the jobs of a programmer is to write the program such that it interprets the number appropriately — either a data value or an address.

(gdb) print wye
$5 = -1
(gdb) print &wye
Address requested for identifier "wye" which is in register $rbx

The compiler has honored our request and allocated a register for the wye variable. Registers are located in the CPU and do not have memory addresses, so gdb cannot print the address. We will need to use the i r command to view the register contents.

(gdb) i r
rax            0x7fffffffe044 140737488347204
rbx            0xffffffff 4294967295
rcx            0x4005e0 4195808
rdx            0x7fffffffe158 140737488347480
rsi            0x7fffffffe148 140737488347464
rdi            0x1 1
rbp            0x7fffffffe060 0x7fffffffe060
rsp            0x7fffffffe040 0x7fffffffe040
r8             0x400670 4195952
r9             0x7ffff7de9740 140737351948096
r10            0x7fffffffdec0 140737488346816
r11            0x7ffff7a3e680 140737348101760
r12            0x400480 4195456
r13            0x7fffffffe140 140737488347456
r14            0x0 0
r15            0x0 0
rip            0x4005a9 0x4005a9 <main+29>
eflags         0x202 [ IF ]
cs             0x33 51
ss             0x2b 43
ds             0x0 0
es             0x0 0
fs             0x0 0
gs             0x0 0

The i r command displays the current contents of the CPU registers. The first column is the name of the register. The second shows the current bit pattern in the register, in hexadecimal. Notice that leading zeros are not displayed. The third column shows the register contents in 64-bit signed decimal. The registers that always hold addresses are also shown in hexadecimal in the third column. The columns are often not aligned due to the tabbing of the display.

We see that the value in the ebx general purpose register is the same as that stored in the wye variable, 0xffffffff. (Recall that ints are 32 bits, even in 64-bit mode.) We conclude that the compiler chose to allocate ebx as the wye variable.

Notice the value in the rip register, 0x4005a9. Refer back to where I set the breakpoint on source line 19. This shows that the program stopped at the correct memory location.

It is only coincidental that the address of the ex variable is currently stored in the rax register. If a general purpose register is not allocated as a variable within a function, it is often used to store results of intermediate computations. You will learn how to use registers this way in subsequent chapters of this book.

(gdb) br 21
Breakpoint 2 at  0x4005ce: file gdbExample1.c, line 21.
(gdb) br 22
Breakpoint 3 at  0x4005d6: file gdbExample1.c, line 22.

These two breakpoints will allow us to examine the value stored in the wye variable just before and after it is modified.

(gdb) cont
Enter an integer: 123

Breakpoint 2, main () at gdbExample1.c:21
21        wye += *ptr;
(gdb) print ex
$6 = 123
(gdb) print wye
$7 = -1

This verifies that the user’s input value is stored correctly and that the wye variable has not yet been changed.

(gdb) cont

Breakpoint 3, main () at gdbExample1.c:22
22        printf("The result is %i\n", wye);
(gdb) print ex
$8 = 123
(gdb) print wye
$9 = 122

And this verifies that our (rather simple) algorithm works correctly.

(gdb) i r rbx rip
rbx            0x7a     122
rip            0x4005d6 0x4005d6 <main+74>

We can specify which registers to display with the i r command. This verifies that the rbx register is being used as the wye variable.

And we see that the rip has incremented from 0x4005a9 to 0x4005d6. Don’t forget that the rip register always points to the next instruction to be executed.

(gdb) cont
The result is 122
[Inferior 1 (process 4463) exited normally]
(gdb) q

Finally, I continue to the end of the program. Notice that gdb is still running and I have to quit the gdb program.

6.6 Exercises


6-1 (§6.2, §6.5) Enter the program in Listing 6.1 and trace through the program one line at a time using gdb. Use the n command, not s or si. Keep a written record of the rip register at the beginning of each line. Hint: use the i r command. How many bytes of machine code are in each of the C statements in this program? Note that the addresses you see in the rip register may differ from the example given in this chapter.


6-2 (§6.2, §6.4) As you trace through the program in Exercise 6-1 stop on line 21:

     wye += *ptr;

We determined in the example above that the %rbx register is used for the variable wye. Inspect the registers.

a) What is the address of the first instruction that will be executed when you enter the n command?

b) How will %rbx change when this statement is executed?


6-3 (§6.5) Modify the program in Listing 6.1 so that a register is also requested for the ex variable. Were you able to convince the compiler to do this for you? Did the compiler produce any error or warning messages? Why do you think the compiler would not use a register for this variable?


6-4 (§6.2, §6.5) Use the gdb debugger to observe the contents of memory in the program from Exercise 2-31. Verify that your algorithm creates a null-terminated string without the newline character.


6-5 (§6.2, §6.5) Write a program in C that allows you to determine the endianness of your computer. Hint: use unsigned char* ptr.


6-6 (§6.2, §6.5) Modify the program in Exercise 6-5 so that you can demonstrate, using gdb, that endianness is a property of the CPU. That is, even though a 32-bit int is stored little endian in memory, it will be read into a register in the “proper” order. Hint: declare a second int that is a register variable; examine memory one byte at a time.