Chapter 15
Interrupts and Exceptions

Thus far in this book all programs have been executed under the Linux operating system. An operating system (OS) can be viewed as a set of programs that provide services to application programs. These services allow the application programs to use the hardware, but only under the auspices of the OS.

Linux allows multiple programs to be executing concurrently, and each of the programs is accessing the hardware resources of the computer. One of the jobs of the OS is to manage the hardware resources in such a way that the programs do not interfere with one another. In this chapter we introduce the CPU features that enable Linux to carry out this management task.

The read system call is a good example of a program using the services of the OS. It requests input from the keyboard. The OS handles all input from the keyboard, so the read function must first request keyboard input from the OS. One of the reasons this request must be funneled through the OS is that other programs may also be requesting input from the keyboard, and the OS needs to ensure that each program gets the keyboard input intended for it.

Once the request for input has been made, it would be very inefficient for the OS to wait until a user strikes a key. So the OS allows another program to use the CPU, and the keyboard notifies the OS when a key has been struck. To avoid losing a character, this notification interrupts the CPU so that the OS can read the character from the keyboard.

Another example comes from something you probably did not intend to do. Unless you are a perfect programmer, you have probably seen a “segmentation fault.” This can occur when your program attempts to access memory that has not been allocated for your program. I have gotten this error (yes, I still make programming mistakes!) when I have made a mistake using the stack, or when I dereference a register that contains a bad address.

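We can summarize these three types of events:

  1. Software interrupt: a program requests a service from the OS, as the read function does.
  2. Hardware interrupt: a device external to the CPU, such as the keyboard, signals that it needs attention.
  3. Exception: the CPU detects a condition it cannot handle while executing an instruction, as with a segmentation fault.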

In response to any of these events, the CPU performs an operation that is very similar to the call instruction. The value in the rip register is pushed onto the stack, and another address is placed in the rip register. The net effect is that a function is called, just as in the call instruction, but the address of the called function is specified in a different way, and additional information is pushed onto the stack. Before describing the differences, we discuss what ought to occur in order for the OS to deal with each of these events.

15.1 Hardware Interrupts

Keyboard input is a good place to start the discussion. It is impossible to know exactly when someone will strike a key on the keyboard, or how soon the next key will be struck. For example, if a key is struck in the middle of executing the first of the following two instructions

        cmpb    $0, (%rbx)
        je      allDone

in order to avoid losing the keystroke, we would like to read the character immediately after the cmpb instruction is executed but before the CPU starts working on the je instruction.

The function that reads the character from the keyboard is called an interrupt handler or simply handler. Handlers are part of the OS. In Linux they can either be built into the kernel or loaded as separate modules as needed.

The timing, between the two instructions, reflects a general rule: the CPU acknowledges an interrupt only between instruction execution cycles. Just before the je instruction is executed, the rip register contains the address of the je instruction, and it is that address that gets pushed onto the stack. Since calling a handler occurs automatically and does not involve fetching an instruction, the value of rip pushed onto the stack is the correct return address from the handler.

There is another important issue. It is almost certain that the rflags register will be changed by the handler that gets called. When program control returns to the je instruction (which is supposed to depend on the state of the rflags register as a result of executing the cmpb instruction), there is little chance that the program will do what the programmer intended. Thus we conclude that in addition to saving the rip register, the CPU must also save the rflags register before it calls the handler.

The next issue is the question of how the CPU knows the address of the appropriate handler to call. In the call instruction, the address of the function to call is specified as an operand to the instruction. For example,

        call    toUpperCase

Since the keyboard has no knowledge of the software, there must be some other mechanism for specifying the address of the handler to call. The answer to this problem is that addresses of interrupt handlers are stored in an Interrupt Descriptor Table (IDT). Each possible interrupt in the system is associated with a unique entry in the IDT.

The IDT entries are data structures (128 bits in 64-bit mode, 64 bits in 32-bit mode) called gate descriptors. In addition to the handler address, they contain information that the CPU uses to help protect the integrity of the OS.
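As a rough illustration, a 128-bit gate descriptor can be pictured as a C structure. The field and function names below are my own, and the layout follows the 64-bit gate format in the processor manuals as I understand it, so treat this as a sketch rather than code from any kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of a 64-bit mode gate descriptor (16 bytes).
 * Field names are illustrative, not taken from any kernel header. */
struct gate_descriptor {
    uint16_t offset_low;   /* handler address, bits 0-15 */
    uint16_t selector;     /* code segment selector for the handler */
    uint8_t  ist;          /* low 3 bits: interrupt stack table index */
    uint8_t  type_attr;    /* gate type, privilege level (bits 5-6), present (bit 7) */
    uint16_t offset_mid;   /* handler address, bits 16-31 */
    uint32_t offset_high;  /* handler address, bits 32-63 */
    uint32_t reserved;     /* must be zero */
};

static_assert(sizeof(struct gate_descriptor) == 16,
              "a 64-bit mode gate descriptor occupies 128 bits");

/* Reassemble the 64-bit handler address from its three pieces. */
static uint64_t gate_handler_address(const struct gate_descriptor *g)
{
    return (uint64_t)g->offset_low
         | ((uint64_t)g->offset_mid << 16)
         | ((uint64_t)g->offset_high << 32);
}

/* Extract the privilege level (0-3) recorded in the descriptor. */
static unsigned gate_privilege_level(const struct gate_descriptor *g)
{
    return (g->type_attr >> 5) & 0x3u;
}
```

The handler address is split into three fields for historical reasons, which is why a helper such as gate_handler_address is needed to put it back together.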

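When a device external to the CPU raises an interrupt, the CPU first completes execution of the current instruction. It must then save the rflags and rip registers on the stack and load the address of the handler from the corresponding IDT entry into the rip register. Section 15.4 describes these steps in more detail.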

15.2 Exceptions

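We next consider exceptions. These are typically the result of an operand value or an address that the CPU cannot deal with. Examples are division by zero and an attempt to access memory that has not been allocated to the program.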

In a perfect world, the application software would include all the checks that would prevent the occurrence of many of these errors. The reality is that no program is perfect, so some of these errors will occur.

When they do occur, it is the responsibility of the OS to take an appropriate action. The currently executing instruction may have caused the exception, so the CPU often reacts to an exception in the midst of a normal instruction execution cycle. The actions that the CPU must take in response to an exception are essentially the same as those for an interrupt: save the rflags and rip registers on the stack, then transfer control to the handler whose address is stored in the corresponding IDT entry.

Not all exceptions are due to actual program errors. For example, when a program references an address in another part of the program that has not yet been loaded into memory, it causes a page fault exception. The OS must provide a handler that loads the appropriate part of the program from the disk into memory, then continues with normal program execution.

15.3 Software Interrupts

The usefulness of the interrupt/exception handling mechanism for requesting OS services is not apparent until we discuss privilege levels. As mentioned above, one of the jobs of the OS is to keep concurrently executing programs from interfering with one another. It uses the privilege level mechanism in the CPU to do this.

At any given time, the CPU is running in one of four possible privilege levels. The levels, from most privileged to least, are:

Level 0: Provides direct access to all hardware resources. Restricted to the lowest-level operating system functions, e.g., BIOS, memory management.

Level 1: Somewhat restricted access to hardware resources. Might be used by library routines and software that controls I/O devices.

Level 2: More restricted access to hardware resources. Might be used by library routines and software that controls I/O devices.

Level 3: No direct access to hardware resources. Application programs run at this level.

The OS needs to have direct access to all the hardware, so it executes at privilege level 0. Application programs should be limited, so they execute at privilege level 3. The CPU includes a mechanism for recognizing the privilege level of the memory associated with each program. A program can access memory at its own privilege level or at a less privileged (numerically higher) level, but not at a more privileged (numerically lower) level. Thus, an application program (running at level 3) cannot access memory that belongs to the OS.

Gate descriptors include privilege level information in addition to the handler address. The CPU’s interrupt/exception mechanism automatically switches to this privilege level when it calls the handler function. Thus, for example, the keyboard might interrupt during the execution of an application program running at privilege level 3, but its handler function would execute at privilege level 0.
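Because the most privileged level has the smallest number, the access rule is easy to get backwards. The following C fragment is a minimal model of the rule as stated above. The name may_access is hypothetical, and the model ignores the segmentation and paging machinery that a real CPU uses to enforce it.

```c
#include <assert.h>
#include <stdbool.h>

enum { KERNEL_LEVEL = 0, USER_LEVEL = 3 };

/* A program running at current_level may access memory whose level is
 * numerically greater than or equal to its own, that is, memory that is
 * equally or less privileged. */
static bool may_access(unsigned current_level, unsigned memory_level)
{
    assert(current_level <= 3 && memory_level <= 3);
    return current_level <= memory_level;
}
```

With this model, may_access(KERNEL_LEVEL, USER_LEVEL) is true, since the OS can reach application memory, while may_access(USER_LEVEL, KERNEL_LEVEL) is false.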

The software interrupt allows an application program to use OS services while still allowing the OS to control this access. The instruction is

        int   $n

where n is the number of the IDT entry to use.

Older versions of the Linux kernel used

        int   $0x80    # Should be avoided in 64-bit mode.

to make system calls. The code corresponding to the desired action is loaded into eax and the arguments are loaded into the proper registers before the system call is executed. The recommended technique for making system calls is discussed in Section 15.6 on page 878.

15.4 CPU Response to an Interrupt or Exception

Each entry in the IDT is called a vector. The CPU is hardwired to associate vectors 0 – 31 with specific exceptions. For example, vector number 0 represents a divide-by-zero exception. Vector number 14 is a page fault exception.

Vectors 32 – 255 can be assigned to interrupts, both external interrupts and those raised by the int $n instruction. These assignments are determined by the OS programmers.

During OS initialization, the address of a handler function is stored in the gate descriptor corresponding to the vector number it is designed to handle. Other information in the gate descriptor causes the CPU to switch to a higher (numerically lower) privilege level, so the handler function has appropriate access to the hardware.

Whenever an interrupt or exception occurs the CPU executes an exception processing cycle, which consists of the following actions:

  1. Push the rflags register onto the stack.
  2. Push the rip register onto the stack.
  3. Determine the address of the corresponding gate descriptor in the IDT.
  4. Load the handler address from the gate descriptor into the rip register.

The CPU continues with a normal instruction processing cycle — fetch the instruction at the address in rip, etc. Thus, control will transfer to the handler function.

Depending upon the nature of an exception and what actually caused it, CPU execution may or may not be returned to the program that was executing when the exception occurred.

15.5 Return from Interrupt/Exception

There is one more part of this puzzle. Since the ret instruction simply pops the value at the top of the stack into the rip register, it will not work for the OS’s handler function. The CPU has another instruction

        iret

that correctly pops everything off the stack into the rip and rflags registers and restores the privilege level to where it was before the handler function was invoked. (The privilege level information was also stored on the stack.)
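The exception processing cycle of Section 15.4 and the iret instruction can be sketched as a toy model in C. Everything here (toy_cpu, toy_gate, the function names, the tiny stack) is made up for illustration. A real CPU saves more state than this and keeps the privilege level inside a segment register, so read this as a picture of the idea and not as an emulator.

```c
#include <assert.h>
#include <stdint.h>

#define IDT_ENTRIES 256
#define STACK_WORDS 64

/* Toy CPU state: just enough to follow the text. */
struct toy_cpu {
    uint64_t rip;
    uint64_t rflags;
    unsigned level;               /* current privilege level, 0-3 */
    uint64_t stack[STACK_WORDS];
    int      sp;                  /* index of the next free stack slot */
};

/* Toy gate descriptor: handler address plus its privilege level. */
struct toy_gate {
    uint64_t handler;
    unsigned level;
};

static void toy_push(struct toy_cpu *cpu, uint64_t value)
{
    assert(cpu->sp < STACK_WORDS);
    cpu->stack[cpu->sp++] = value;
}

static uint64_t toy_pop(struct toy_cpu *cpu)
{
    assert(cpu->sp > 0);
    return cpu->stack[--cpu->sp];
}

/* Respond to interrupt or exception number n. */
static void toy_take_interrupt(struct toy_cpu *cpu,
                               const struct toy_gate *idt, unsigned n)
{
    assert(n < IDT_ENTRIES);
    toy_push(cpu, cpu->level);    /* privilege level is saved as well */
    toy_push(cpu, cpu->rflags);   /* 1. push rflags */
    toy_push(cpu, cpu->rip);      /* 2. push rip */
    cpu->level = idt[n].level;    /* 3. find the gate descriptor */
    cpu->rip = idt[n].handler;    /* 4. load the handler address */
}

/* Undo toy_take_interrupt, as iret does. */
static void toy_iret(struct toy_cpu *cpu)
{
    cpu->rip = toy_pop(cpu);
    cpu->rflags = toy_pop(cpu);
    cpu->level = (unsigned)toy_pop(cpu);
}
```

Note that toy_iret restores rflags as well as rip, which is exactly what ret cannot do. A handler is therefore free to change the flags without disturbing the interrupted program.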

15.6 The syscall and sysret Instructions

Using a software interrupt to invoke one of the services provided by the OS involves more work than is necessary. The x86-64 architecture includes another instruction that causes the CPU to change privilege levels but neither uses the stack nor goes through the IDT, thus saving execution time. The instruction is

        syscall

We first introduced it in Section 8.6 (page 589) to perform I/O.

The syscall instruction causes the CPU to

  1. Move the low-order 32 bits of the rflags register to the r11 register.
  2. Move the address in the rip register to the rcx register.
  3. Load the address from the LSTAR register into the rip register. The LSTAR register is a Model-Specific Register; see Table 6.2 on page 461.
  4. Change the privilege level to 0.

Now the CPU has been switched to privilege level 0, and the OS has control and can enforce orderly use of the hardware.
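One practical consequence of steps 1 and 2 is that the values a program had in rcx and r11 are gone after a system call. The following sketch, which assumes gcc or clang on x86-64 Linux, wraps the syscall instruction in a C function using inline assembly. The names raw_syscall3 and raw_write are hypothetical. On other platforms the sketch simply falls back to the C library's syscall function.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_write */
#include <unistd.h>        /* syscall(), used only as a fallback */

/* Invoke a Linux system call that takes three arguments.
 * rcx and r11 are listed as clobbered because the syscall
 * instruction overwrites them with rip and rflags. */
static long raw_syscall3(long number, long arg1, long arg2, long arg3)
{
#if defined(__x86_64__) && defined(__linux__)
    long result;
    __asm__ volatile ("syscall"
                      : "=a"(result)
                      : "a"(number), "D"(arg1), "S"(arg2), "d"(arg3)
                      : "rcx", "r11", "memory");
    return result;
#else
    return syscall(number, arg1, arg2, arg3);   /* not x86-64 Linux */
#endif
}

/* Write len bytes from buf to file descriptor fd. */
static long raw_write(int fd, const void *buf, unsigned long len)
{
    return raw_syscall3(SYS_write, fd, (long)buf, (long)len);
}
```

For example, raw_write(1, "hello\n", 6) should print hello on standard out. When the syscall instruction is used directly, an error is reported as a negative return value and errno is not set.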

The program in Listing 15.1 illustrates the use of syscall to do system calls without using the C libraries. See Exercise 15-1 for using syscall within the C runtime environment.

 
# myCat.s
# Writes a file to standard out
# Does not use C libraries
# Bob Plantz -- 18 June 2009

# Useful constants
        .equ    STDIN,0
        .equ    STDOUT,1
   # from asm/unistd_64.h
        .equ    READ,0
        .equ    WRITE,1
        .equ    OPEN,2
        .equ    CLOSE,3
        .equ    EXIT,60
   # from bits/fcntl.h
        .equ    O_RDONLY,0
        .equ    O_WRONLY,1
        .equ    O_RDWR,2
# Stack frame
        .equ    aLetter,-16
        .equ    fd,-8
        .equ    localSize,-16
        .equ    fileName,24        # first command line argument
# Code
        .text                  # switch to text segment
        .globl  __start
        .type   __start, @function
__start:
        pushq   %rbp           # save callers frame pointer
        movq    %rsp, %rbp     # establish our frame pointer
        addq    $localSize, %rsp   # for local variables

        movl    $OPEN, %eax        # open the file
        movq    fileName(%rbp), %rdi # the filename
        movl    $O_RDONLY, %esi    # read only
        syscall                    # request kernel service
        movl    %eax, fd(%rbp)     # save file descriptor

        movl    $READ, %eax
        movl    $1, %edx           # 1 character
        leaq    aLetter(%rbp), %rsi # place to store character
        movl    fd(%rbp), %edi     # file descriptor
        syscall                    # request kernel service

writeLoop:
        cmpl    $0, %eax           # any chars?
        jle     allDone            # no, end of file or error
        movl    $1, %edx           # yes, 1 character
        leaq    aLetter(%rbp), %rsi # place where character is stored
        movl    $STDOUT, %edi      # standard out
        movl    $WRITE, %eax
        syscall                    # request kernel service

        movl    $READ, %eax        # read next char
        movl    $1, %edx           # 1 character
        leaq    aLetter(%rbp), %rsi # place to store character
        movl    fd(%rbp), %edi     # file descriptor
        syscall                    # request kernel service
        jmp     writeLoop          # check the char
allDone:
        movl    $CLOSE, %eax       # close the file
        movl    fd(%rbp), %edi     # file descriptor
        syscall                    # request kernel service
        movq    %rbp, %rsp         # delete local variables

        popq    %rbp               # restore callers frame pointer
        movl    $EXIT, %eax        # end this process
        movl    $0, %edi           # exit status
        syscall
Listing 15.1: Using syscall to cat a file. Use “ld -e __start -o myCat myCat.o” after assembling this file.

In Section 8.1 (page 544) we saw how to call the write system call function to write characters to standard out (the screen). write and the other system call functions are simply C wrappers that load the proper code into eax and the arguments into the appropriate registers, then execute the syscall instruction.

Several system call codes are shown in Table 15.1.


function  eax  rdi                  rsi                      rdx                       returns
read      0    file descriptor      pointer to storage area  number of bytes to read   number of bytes read
write     1    file descriptor      pointer to first byte    number of bytes to write  number of bytes written
open      2    pointer to filename  flags                    mode                      file descriptor
close     3    file descriptor
exit      60

Table 15.1: Some system call codes for the syscall instruction.

For additional system call codes see the unistd_64.h file on your system. The arguments for each system call are given in the man page for the corresponding C version. For example,

   bob@bob-desktop:~$ man 2 write

describes the write system call.

There is a complementary instruction, sysret, which the OS executes in order to return from a system call:

        sysret

The sysret instruction causes the CPU to

  1. Move the low-order 32 bits of the r11 register to the rflags register.
  2. Move the value in the rcx register to the rip register.
  3. Change the privilege level to 3. (We omit the details of how this is done.)
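The register traffic of syscall and sysret can be summarized in a small C model. The structure toy_regs and both function names are invented for this illustration, and the model leaves out everything a real CPU does beyond the steps listed in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Only the registers that syscall and sysret touch. */
struct toy_regs {
    uint64_t rip;
    uint64_t rflags;
    uint64_t rcx;
    uint64_t r11;
    uint64_t lstar;     /* Model-Specific Register: kernel entry address */
    unsigned level;     /* current privilege level */
};

/* The steps listed for the syscall instruction. */
static void toy_syscall(struct toy_regs *r)
{
    r->r11 = r->rflags & 0xffffffffu;   /* 1. low 32 bits of rflags to r11 */
    r->rcx = r->rip;                    /* 2. return address to rcx */
    r->rip = r->lstar;                  /* 3. entry address from LSTAR */
    r->level = 0;                       /* 4. switch to privilege level 0 */
}

/* The steps listed for the sysret instruction. */
static void toy_sysret(struct toy_regs *r)
{
    assert(r->level == 0);              /* only the OS may execute sysret */
    r->rflags = r->r11 & 0xffffffffu;   /* 1. low 32 bits of r11 to rflags */
    r->rip = r->rcx;                    /* 2. return address from rcx */
    r->level = 3;                       /* 3. back to privilege level 3 */
}
```

Since nothing is pushed onto the stack, the OS must take care not to lose the values in rcx and r11 if it needs those registers while servicing the call.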

15.7 Summary

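We summarize the similarities and differences between a call instruction and an interrupt/exception. The similarities are

  1. The return address, which is the current value in the rip register, is pushed onto the stack.
  2. The address of the function to execute is loaded into the rip register.

The additional features of the interrupt/exception are

  1. The rflags register is also pushed onto the stack.
  2. The address of the handler is obtained from a gate descriptor in the IDT rather than from an operand of the instruction.
  3. The CPU switches to the privilege level specified in the gate descriptor, and the previous privilege level information is saved on the stack.
  4. The handler must return with iret instead of ret.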

15.8 Instructions Introduced Thus Far

This summary shows the assembly language instructions introduced thus far in the book. The page number where each instruction is explained in more detail, which may be in a subsequent chapter, is also given. This book provides only an introduction to the usage of each instruction. You need to consult the manuals ([2] – [6], [14] – [18]) in order to learn all the possible uses of the instructions.

15.8.1 Instructions

data movement:
opcode   source         destination  action                                   page
cbtw                                 convert byte to word, al to ax           699
cwtl                                 convert word to long, ax to eax          699
cltq                                 convert long to quad, eax to rax         699
cwtd                                 convert word to long, ax to dx:ax        788
cltd                                 convert long to quad, eax to edx:eax     788
cqto                                 convert quad to octuple, rax to rdx:rax  788
cmovcc   %reg/mem       %reg         conditional move                         709
movs     $imm/%reg      %reg/mem     move                                     508
movs     mem            %reg         move                                     508
movsss   $imm/%reg      %reg/mem     move, sign extend                        696
movzss   $imm/%reg      %reg/mem     move, zero extend                        696
popw                    %reg/mem     pop from stack                           568
pushw    $imm/%reg/mem               push onto stack                          568

s = b, w, l, q; w = l, q; cc = condition codes

arithmetic/logic:
opcode   source      destination  action                 page
adds     $imm/%reg   %reg/mem     add                    609
adds     mem         %reg         add                    609
ands     $imm/%reg   %reg/mem     bit-wise and           750
ands     mem         %reg         bit-wise and           750
cmps     $imm/%reg   %reg/mem     compare                679
cmps     mem         %reg         compare                679
decs                 %reg/mem     decrement              702
divs     %reg/mem                 unsigned divide        780
idivs    %reg/mem                 signed divide          786
imuls    %reg/mem                 signed multiply        778
incs                 %reg/mem     increment              701
leaw     mem         %reg         load effective address 581
muls     %reg/mem                 unsigned multiply      772
negs                 %reg/mem     negate                 791
ors      $imm/%reg   %reg/mem     bit-wise inclusive or  750
ors      mem         %reg         bit-wise inclusive or  750
sals     $imm/%cl    %reg/mem     shift arithmetic left  759
sars     $imm/%cl    %reg/mem     shift arithmetic right 754
shls     $imm/%cl    %reg/mem     shift left             759
shrs     $imm/%cl    %reg/mem     shift right            754
subs     $imm/%reg   %reg/mem     subtract               614
subs     mem         %reg         subtract               614
tests    $imm/%reg   %reg/mem     test bits              679
tests    mem         %reg         test bits              679
xors     $imm/%reg   %reg/mem     bit-wise exclusive or  750
xors     mem         %reg         bit-wise exclusive or  750

s = b, w, l, q; w = l, q

program flow control:
opcode   location  action                            page
call     label     call function                     548
iret               return from kernel function       878
ja       label     jump above (unsigned)             686
jae      label     jump above/equal (unsigned)       686
jb       label     jump below (unsigned)             686
jbe      label     jump below/equal (unsigned)       686
je       label     jump equal                        682
jg       label     jump greater than (signed)        689
jge      label     jump greater than/equal (signed)  689
jl       label     jump less than (signed)           689
jle      label     jump less than/equal (signed)     689
jmp      label     jump                              694
jne      label     jump not equal                    682
jno      label     jump no overflow                  682
jcc      label     jump on condition codes           682
leave              undo stack frame                  582
ret                return from function              585
syscall            call kernel function              589
sysret             return from kernel function       883

cc = condition codes

x87 floating point:
opcode   source    destination  action                page
fadds    memfloat               add                   861
faddp                           add/pop               861
fchs                            change sign           861
fcoms    memfloat               compare               861
fcomp                           compare/pop           861
fcos                            cosine                861
fdivs    memfloat               divide                861
fdivp                           divide/pop            861
filds    memint                 load integer          861
fists              memint       store integer         861
flds     memfloat               load floating point   861
fmuls    memfloat               multiply              861
fmulp                           multiply/pop          861
fsin                            sine                  861
fsqrt                           square root           861
fsts               memfloat     floating point store  861
fsubs    memfloat               subtract              861
fsubp                           subtract/pop          861

s = b, w, l, q; w = l, q

SSE floating point conversion:
opcode     source        destination   action                           page
cvtsd2si   %xmmreg/mem   %reg          scalar double to signed integer  847
cvtsd2ss   %xmmreg       %xmmreg/%reg  scalar double to single float    847
cvtsi2sd   %reg          %xmmreg/mem   signed integer to scalar double  847
cvtsi2sdq  %reg          %xmmreg/mem   signed integer to scalar double  847
cvtsi2ss   %reg          %xmmreg/mem   signed integer to scalar single  847
cvtsi2ssq  %reg          %xmmreg/mem   signed integer to scalar single  847
cvtss2sd   %xmmreg       %xmmreg/mem   scalar single to scalar double   847
cvtss2si   %xmmreg/mem   %reg          scalar single to signed integer  847
cvtss2siq  %xmmreg/mem   %reg          scalar single to signed integer  847

15.8.2 Addressing Modes

__________________________________________________________

register direct:

The data value is located in a CPU register.

syntax: name of the register with a “%” prefix.

example: movl    %eax, %ebx



immediate data:

The data value is located immediately after the instruction. Source operand only.

syntax: data value with a “$” prefix.

example: movl    $0xabcd1234, %ebx



base register plus offset:

The data value is located in memory. The address of the memory location is the sum of a value in a base register plus an offset value.

syntax: use the name of the register with parentheses around the name and the offset value immediately before the left parenthesis.

example: movl    $0xaabbccdd, 12(%rax)



rip-relative:

The target is a memory address determined by adding an offset to the current address in the rip register.

syntax: a programmer-defined label

example: je     somePlace



indexed:

The data value is located in memory. The address of the memory location is the sum of the value in the base_register plus scale times the value in the index_register, plus the offset.

syntax: place parentheses around the comma separated list (base_register, index_register, scale) and preface it with the offset.

example: movl    $0x6789cdef, -16(%rdx, %rax, 4)



15.9 Exercises

15-1

(§15.6) Modify the program in Listing 15.1 so that it uses the C environment. That is, turn it into a main function using the prototype int main(int argc, char **argv);. argc is the number of space-delimited strings on the command line, including the command to execute the program. argv is a pointer to an array of pointers to each of the command line strings.