Section 9.5 Assemblers and Linkers
In this section we look at what assemblers and linkers do to process source code and create an executable program. The goal of this presentation is to introduce the concepts. Most assemblers and linkers have capabilities that go far beyond the concepts described here (e.g., macro expansion, dynamic load/link). We leave a more thorough discussion of assemblers and linkers to a book on systems programming.
Subsection 9.5.1 Assemblers
An assembler must perform the following tasks:
Translate assembly language mnemonics into machine language.
Translate symbolic names for addresses into numeric addresses.
Since the numeric value of an address may be required before an instruction can be translated into machine language, there is a problem with forward references to memory locations. For example, let us say we have an algorithm that gives a correct result if the r3
register contains \(100\text{.}\) In that case, we want our function to return \(0\text{.}\) Otherwise, it should return \(-1\text{.}\) So we might end the function with code like (you can probably guess that cmp
means “compare” and bne
means “branch if not equal”):
mov r0, 0 @ assume a good result cmp r3, 100 @ is it good? bne doReturn @ yes, return 0 mov r0, -1 @ no, return a -1 doReturn: sub sp, fp, 0 @ delete allocated memory ldr fp, [sp], 4 @ restore caller's frame pointer bx lr @ back to caller
This creates a problem for the assembler when it needs to determine the address of the label doReturn
on the line
bne doReturn @ yes, return 0
because it has not yet encountered the label itself. So it does not know how far away that location is.
The simplest solution to this problem is to use a two-pass assembler
The first pass builds a symbol table, which provides an address for each memory label.
The second pass performs the actual translation into machine language, consulting the symbol table for numeric values of the symbols.
Algorithm 9.5.1 is a highly simplified description of how the first pass of an assembler works.
Algorithm 9.5.1. First pass of a two-pass assembler.
\(\displaystyle locationCounter = 0\)
Get first line of source code
-
While (more lines of code)
-
If (line has a label)
\(\displaystyle symbolTable.symbol = label\)
\(\displaystyle symbolTable.location = locationCounter\)
Determine number of bytes required by line when assembled
\(\displaystyle locationCounter = locationCounter + number of bytes\)
Get next line of source code
-
The symbol table is carried from the first pass to the second. The second pass also consults a table of operation codes, which provides the machine code corresponding to each instruction mnemonic. A highly simplified description of the second pass is given in Algorithm 9.5.2.
Algorithm 9.5.2. Second pass of a two-pass assembler.
\(\displaystyle locationCounter = 0\)
Get first line of source code
-
While (more lines of code)
-
If (line is an instruction)
Find machine code from opCodeTable
Find symbol value form symbolTable
Assemble instruction into machine code
-
Else
Carry out assembler directive
Write machine code to object file
Determine number of bytes used
\(\displaystyle locationCounter = locationCounter + number of bytes\)
Get next line of source code
-
Subsection 9.5.2 Linkers
You have already seen functions that call other functions by name, which are located in another file or library. The assembler has no way to determine the address of these other labels for the symbol table during the first pass. The only thing the assembler can do during the second pass is to leave enough memory space for the address when it assembles this instruction. The actual address will have to be filled in later in order to create the entire program. Filling in these references to external memory locations is the job of the linker program.
Actually, many of the functions used by a program are not even included in the executable program file. They are loaded as required when the program is executing. The linker program must provide dynamic links for the executable program file.
The mechanisms for implementing dynamic function loading is beyond the scope of this book. However, you can get the general idea of linking separately assembled (or compiled) functions together by studying Algorithm 9.5.3 and Algorithm 9.5.4. First, notice that the assembler (or compiler) must include other information in addition to machine code in the object file. The additional information includes:
The name of the function.
The name of each external memory reference.
The location relative to the beginning of the function where the external memory reference is made.
Algorithm 9.5.3. First pass of a two-pass assembler.
\(\displaystyle locationCounter = 0\)
Open first object file
-
While (more object files)
globalSymbolTable.symbol = functionName
globalSymbolTable.location = locationCounter
Determine number of bytes required by the function.
\(\displaystyle locationCounter = locationCounter + number of bytes\)
Open next object file
Algorithm 9.5.4. Second pass of a two-pass assembler.
\(\displaystyle memoryPointer = address from Operating System\)
Open first object file
-
While (more object files)
-
While (more machine code)
*memoryPointer = byte of code read from object file
memoryPointer = memoryPointer + 1
-
While (more external memory references)
Get value from globalSymbolTable
Determine where value should be stored
Store value in code that was just loaded
Open next object file
-
In Algorithm 9.5.4 “*memoryPointer
” is the C/C++ syntax to specify that the variable, memoryPointer
, holds the address of the value being referenced. The ‘*
’ character is the dereference operator. So in this algorithm, the byte of code that is read from the object file is stored at whatever address is currently in the memoryPointer
variable. The next step in the algorithm adds \(1\) to this variable, so it then points to the next byte in memory. Variables used in this way are called pointer variables.