Section 9.2 First Assembly Language Instructions
I start introducing instructions in this section. They will be introduced as we need them, and I will not provide all the details of each instruction. To see the details, you need to read the ARM manuals: ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition[1] for 32-bit, and Architecture Reference Manual ARMv8, for ARMv8-A architecture profile[2] for 64-bit. When describing an instruction, I will use essentially the same notation as in the manuals to make it easier for you to learn how to read the manuals. The notation does tend to vary a little between ARM's manuals. One difference is that I will use ‘%’ to add my comments. I also omit some of the detailed options for many of the instructions because they do not apply to the programming used in this book.
The ARM actually provides a second instruction set called “Thumb.” It allows for either 16-bit or 32-bit instructions. It can be used to improve efficiency. Using Thumb is beyond the scope of this book, but you will see it mentioned in the description of some instructions in the ARM manuals.
Subsection 9.2.1 Some Notation
The syntax that ARM uses for their assembly language is called Unified Assembler Language (UAL). Our assembler, as, recognizes the UAL syntax if you use the assembler directives I used in Listing 9.1.3 to identify the ARM model. Unfortunately, the version of gcc currently (August 2016) running on Raspbian uses pre-UAL syntax. The differences are minor. For example, the compiler-generated assembly language (Listing 9.1.2) uses a ‘#’ character to prefix each literal value:
str fp, [sp, #-4]!
But the UAL syntax specifies that the ‘#’ character is optional, so in Listing 9.1.3 I wrote:
str fp, [sp, -4]! @ save caller frame pointer
I will not use the ‘#’ character for immediate values in my examples in this book. I strongly urge you to use the UAL syntax when writing your own assembly language programs. This will become very important when we get to the floating-point instructions in Section 16.7. That is where you will see the biggest differences between the UAL and pre-UAL syntaxes.
The general notation used to explain instructions is:
Upper case represents the characters that you need to use, although you can type them in lower case.
Lower case represents places where you need to supply the appropriate text.
Angle brackets, <...>, are places where you need to supply the appropriate value (without the ‘<’ and ‘>’).
Curly braces, {...}, represent optional items (without the ‘{’ and ‘}’).
The ‘#’ prefix of constant values will be shown, but it is optional when using the .syntax unified directive.
Subsection 9.2.2 Condition Codes
As mentioned in Section 8.2, most AARCH32 ARM instructions have an option that allows you to specify that the instruction will be executed only if a specific setting of the condition flags exists. These settings are expressed by adding a mnemonic Condition Code to the instruction mnemonic. These codes are shown in Table 9.2.1. The “cond” column shows the machine code portion of the instruction, which will be described in Section 11.3.3.
Table 9.2.1: The cond column shows the machine code.
cond | Mnemonic | Integer | Float | Condition Flags |
\(\binary{0000}\) | EQ | Equal | Equal | Z == 1 |
\(\binary{0001}\) | NE | Not equal | Not equal | Z == 0 |
\(\binary{0010}\) | CS | Carry set | Greater than, equal, or unordered | C == 1 |
\(\binary{0011}\) | CC | Carry clear | Less than | C == 0 |
\(\binary{0100}\) | MI | Negative | Less than | N == 1 |
\(\binary{0101}\) | PL | Positive, or zero | Greater than, equal, or unordered | N == 0 |
\(\binary{0110}\) | VS | Overflow | Unordered | V == 1 |
\(\binary{0111}\) | VC | No overflow | Not unordered | V == 0 |
\(\binary{1000}\) | HI | Unsigned higher | Greater than, or unordered | C == 1 AND Z == 0 |
\(\binary{1001}\) | LS | Unsigned lower or same | Less than or equal | C == 0 OR Z == 1 |
\(\binary{1010}\) | GE | Signed greater than or equal | Greater than or equal | N == V |
\(\binary{1011}\) | LT | Signed less than | Less than, or unordered | N != V |
\(\binary{1100}\) | GT | Signed greater than | Greater than | Z == 0 AND N == V |
\(\binary{1101}\) | LE | Signed less than or equal | Less than, equal, or unordered | Z == 1 OR N != V |
\(\binary{1110}\) | none (AL) | Always | Always | Any |
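Each entry in the Condition Flags column of Table 9.2.1 can be read as a Boolean expression over the N, Z, C, and V flags. The following is a minimal sketch in C of that reading. The names struct flags and cond_passes are hypothetical, chosen only for this illustration, and are not part of any ARM tool or manual.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the four condition flags (illustration only). */
struct flags {
    bool n;  /* negative */
    bool z;  /* zero     */
    bool c;  /* carry    */
    bool v;  /* overflow */
};

/* Returns true if an instruction with the given 4-bit cond value would
   be executed with these flags, following Table 9.2.1. Only the values
   listed in the table (0000 through 1110) are accepted. */
bool cond_passes(unsigned cond, struct flags f)
{
    assert(cond <= 0xE);
    switch (cond) {
    case 0x0: return f.z;                  /* EQ */
    case 0x1: return !f.z;                 /* NE */
    case 0x2: return f.c;                  /* CS */
    case 0x3: return !f.c;                 /* CC */
    case 0x4: return f.n;                  /* MI */
    case 0x5: return !f.n;                 /* PL */
    case 0x6: return f.v;                  /* VS */
    case 0x7: return !f.v;                 /* VC */
    case 0x8: return f.c && !f.z;          /* HI */
    case 0x9: return !f.c || f.z;          /* LS */
    case 0xA: return f.n == f.v;           /* GE */
    case 0xB: return f.n != f.v;           /* LT */
    case 0xC: return !f.z && f.n == f.v;   /* GT */
    case 0xD: return f.z || f.n != f.v;    /* LE */
    default:  return true;                 /* AL: always */
    }
}
```

For example, with Z set and N equal to V, the EQ, GE, and LE conditions pass, while NE and GT do not.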
Subsection 9.2.3 Shift Options
Many ARM instructions include an option to shift one of the data values during the operation that the instruction performs. Table 9.2.2 shows the mnemonic codes that are used and their respective effect.
Table 9.2.2: Shift options. A variable shift amount is in register Rs.
Constant | Variable | Effect |
LSL #<n> | LSL <Rs> | Logical shift left \(n\) bits. \(1 \le n \le 31\text{.}\) |
LSR #<n> | LSR <Rs> | Logical shift right \(n\) bits. \(1 \le n \le 32\text{.}\) |
ASR #<n> | ASR <Rs> | Arithmetic shift right \(n\) bits. \(1 \le n \le 32\text{.}\) |
ROR #<n> | ROR <Rs> | Rotate right \(n\) bits. \(1 \le n \le 31\text{.}\) |
RRX | | Rotate right one bit, with extend. Bits [\(31:1\)] are shifted right one bit, and the carry flag is shifted into bit [\(31\)]. |
where:
- Logical shift
Zeroes are written into the vacated bit positions.
- Arithmetic shift right
The value in the high-order bit, \(\binary{0}\) or \(\binary{1}\text{,}\) is copied into each of the vacated bit positions, thus preserving the sign of the value being shifted.
- Rotate right
As the low-order bits spill off the right side (low-order positions), they are rotated around to flow into the high-order positions as the value is rotated.
Most instructions that allow the shift option allow you to specify the shift amount, \(n\text{,}\) either as a constant or as a variable in a register. The RRX operation is always one bit.
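The definitions above can be modeled in C on 32-bit unsigned values. This is only a sketch of the arithmetic, not how the hardware or the assembler implements it. The function names are hypothetical, and the accepted ranges of \(n\) are taken from Table 9.2.2.

```c
#include <assert.h>
#include <stdint.h>

/* Logical shift left: zeroes enter at the low-order end. */
uint32_t lsl(uint32_t x, unsigned n)
{
    assert(n >= 1 && n <= 31);
    return x << n;
}

/* Logical shift right: zeroes enter at the high-order end. */
uint32_t lsr(uint32_t x, unsigned n)
{
    assert(n >= 1 && n <= 32);
    return n == 32 ? 0 : x >> n;   /* shifting a 32-bit value by 32 is undefined in C */
}

/* Arithmetic shift right: copies of the sign bit enter at the high-order end. */
uint32_t asr(uint32_t x, unsigned n)
{
    assert(n >= 1 && n <= 32);
    uint32_t fill = (x & 0x80000000u) ? 0xFFFFFFFFu : 0;
    if (n == 32)
        return fill;
    return (x >> n) | (fill << (32 - n));
}

/* Rotate right: bits leaving the low-order end re-enter at the high-order end. */
uint32_t ror(uint32_t x, unsigned n)
{
    assert(n >= 1 && n <= 31);
    return (x >> n) | (x << (32 - n));
}

/* Rotate right with extend: one bit only; the carry flag enters bit 31. */
uint32_t rrx(uint32_t x, unsigned carry)
{
    return (x >> 1) | ((uint32_t)(carry & 1u) << 31);
}
```

The special cases for a shift amount of 32 exist only because C leaves such a shift undefined; they are a property of this model, not of the ARM instructions.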
As an example of how the shifting syntax is used, the three instructions:
mov r0, 12
mov r1, 60
add r2, r0, r1, lsl 2
would store \(252\) in r2. The first two instructions store \(12\) in r0 and \(60\) in r1. The lsl 2 in the third instruction shifts the value from r1 two bit positions to the left, multiplying it by \(4\text{.}\) That result, \(240\text{,}\) is added to the value in r0, and the result is stored in r2. As you will see in the description of add below, the values in r0 and r1 remain unchanged.
If your algorithm requires that the amount of the shift be under program control, you would first store the shift amount in a register. For example:
mov r0, 12
mov r1, 60
mov r3, 2
add r2, r0, r1, lsl r3
would produce the same result in r2 as above. But your program could change the value in r3 to achieve a different shift amount.
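To check the arithmetic in these two examples, here is a small C sketch of what an add with a left-shifted second operand computes. The function name add_lsl is hypothetical, and the model assumes a shift amount below 32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of:  add rd, rn, rm, lsl <amount>
   Returns the value that would be stored in rd. The arguments are
   passed by value, so the caller's copies of rn and rm are unchanged,
   just as the source registers are unchanged by the instruction. */
uint32_t add_lsl(uint32_t rn, uint32_t rm, unsigned amount)
{
    assert(amount < 32);          /* assumption of this model */
    return rn + (rm << amount);
}
```

Calling add_lsl(12, 60, 2) gives 252, matching the example. Passing 3 as the shift amount instead would multiply 60 by 8 and give 492.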
Subsection 9.2.4 First Instructions
Even though the program in Listing 9.1.1 does nothing, it requires six different instructions.
MOV
Copies (moves) a value into a register.
MOV{S}{<c>} <Rd>, #<const> % immediate
MOV{S}{<c>} <Rd>, <Rm> % register
If ‘S’ is present, the condition flags are updated according to the value being moved. If absent, the condition flags are not changed.
<c> is the condition code, Table 9.2.1.
<Rd> specifies the destination register, and <Rm> is the source register.
\(-257 \le const \le +256\text{,}\) or \(const = +256, +260, +264, \ldots, +65280\text{,}\) or \(const = -261, -265, \ldots, -65281\text{.}\) This odd sequence of values will be explained in Section 11.3.3.
In the “immediate” form, the value of <const> is stored in <Rd>. In the “register” form, the value in <Rm> is copied to <Rd>. There is also a shift version, which will be explained in Section 14.3.
Actually, the assembler does not use the mov instruction for negative values in the immediate form. If you use a negative constant, the assembler subtracts \(1\) from the absolute value, uses the result as the constant, and substitutes the mvn instruction. If you use a negative constant for the immediate form of the mvn instruction, the assembler performs the same computation and substitutes the mov instruction.
So the following two instructions:
mov r3, 1
mov r4, -2
do the same thing as:
mov r3, 1
mvn r4, 1
And the following two instructions:
mvn r5, 1
mvn r6, -2
are the same as:
mov r5, -2
mov r6, 1
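The substitution works because, in two's complement, the bitwise NOT of a value \(k\) is \(-k - 1\text{.}\) A minimal C sketch of the computation follows. Both function names are hypothetical, and the code models only the arithmetic, not the assembler itself.

```c
#include <assert.h>
#include <stdint.h>

/* The value that mvn stores: the bitwise NOT of its operand. */
int32_t mvn_result(int32_t operand)
{
    return ~operand;
}

/* The constant to pair with mvn when the source says
   "mov rd, <negative constant>": the absolute value minus 1. */
int32_t mvn_operand_for(int32_t negative_const)
{
    assert(negative_const < 0 && negative_const > INT32_MIN);
    return -negative_const - 1;
}
```

For the constant -2, mvn_operand_for returns 1, and mvn_result(1) is -2, which is why mov r4, -2 and mvn r4, 1 leave the same value in r4.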
The result is that you can simply use both positive and negative constants with the mov instruction, which I strongly urge you to do, and the assembler will do the correct thing. But if you are reading compiler-generated assembly language, or you use the disassemble command in gdb, you may see the mvn instruction being used, so I will describe it here.
MVN
Copies (moves) the complement (bitwise NOT) of a value into a register.
MVN{S}{<c>} <Rd>, #<const> % immediate
MVN{S}{<c>} <Rd>, <Rm>{, <shift>} % register
MVN{S}{<c>} <Rd>, <Rm>, <type> <Rs> % register-shifted register
If ‘S’ is present, the condition flags are updated according to the value being moved. If absent, the condition flags are not changed.
<c> is the condition code, Table 9.2.1.
<Rd> specifies the destination register, and <Rm> is the source register.
<Rs> contains the shift amount in the “register-shifted register” form.
\(-257 \le const \le +256\text{,}\) or \(const = +256, +260, +264, \ldots, +65280\text{,}\) or \(const = -261, -265, \ldots, -65281\text{.}\) This odd sequence of values will be explained in Section 11.3.3.
<shift> and <type> are explained in Section 9.2.3.
In the “immediate” form, a copy of the bitwise NOT of <const> is stored in <Rd>. In the “register” and “register-shifted register” forms, the bitwise NOT of the value in <Rm> is copied to <Rd>. If a shift is specified, the value in <Rm> is shifted by the specified amount before the bitwise NOT is computed, and the result is stored in <Rd>. The values in <Rm> and <Rs> are unchanged.
ADD
Adds two integers.
ADD{S}{<c>} {<Rd>,} <Rn>, #<const> % immediate
ADD{S}{<c>} {<Rd>,} <Rn>, <Rm>{, <shift>} % register
ADD{S}{<c>} {<Rd>,} <Rn>, <Rm>, <type> <Rs> % register-shifted register
If ‘S’ is present, the condition flags are updated according to the result of the addition. If absent, the condition flags are not changed.
<c> is the condition code, Table 9.2.1.
<Rd> specifies the destination register, and <Rm> and <Rn> are the source registers.
<Rs> contains the shift amount in the “register-shifted register” form.
\(-257 \le const \le +256\text{,}\) or \(const = +256, +260, +264, \ldots, +65280\text{,}\) or \(const = -261, -265, \ldots, -65281\text{.}\) This odd sequence of values will be explained in Section 11.3.3.
<shift> and <type> are explained in Section 9.2.3.
In the “immediate” form, <const> is added to the value in <Rn>. In the “register” and “register-shifted register” forms, the value in <Rm> is added to the value in <Rn>. If a shift is specified, the value in <Rm> is shifted by the specified amount before the addition is performed. If <Rd> is present, the result is stored there and <Rn> is unchanged. If not, the result is stored in <Rn>. The values in <Rm> and <Rs> are unchanged.
SUB
Subtracts two integers.
SUB{S}{<c>} {<Rd>,} <Rn>, #<const> % immediate
SUB{S}{<c>} {<Rd>,} <Rn>, <Rm>{, <shift>} % register
SUB{S}{<c>} {<Rd>,} <Rn>, <Rm>, <type> <Rs> % register-shifted register
If ‘S’ is present, the condition flags are updated according to the results of the subtraction. If absent, the condition flags are not changed.
<c> is the condition code, Table 9.2.1.
<Rd> specifies the destination register, and <Rm> and <Rn> are the source registers.
<Rs> contains the shift amount in the “register-shifted register” form.
\(-257 \le const \le +256\text{,}\) or \(const = +256, +260, +264, \ldots, +65280\text{,}\) or \(const = -261, -265, \ldots, -65281\text{.}\) This odd sequence of values will be explained in Section 11.3.3.
<shift> and <type> are explained in Section 9.2.3.
In the “immediate” form, <const> is subtracted from the value in <Rn>. If <Rd> is absent, the difference is stored in <Rn>. If <Rd> is present, the difference is stored there and <Rn> remains unchanged.
In the “register” and “register-shifted register” forms, the value in <Rm> is subtracted from the value in <Rn>. If a shift is specified, the value in <Rm> is shifted by the specified amount before being subtracted. If <Rd> is absent, the difference is stored in <Rn>. If <Rd> is present, the difference is stored there and <Rn> remains unchanged. The values in <Rm> and <Rs> are unchanged.
BX
Branches to another location in the program. The address of that location is in a register. Also allows switching between ARM and Thumb instruction sets.
BX{<c>} <Rm>
<c> is the condition code, Table 9.2.1.
<Rm> is the target address register.
The value in the <Rm> register is moved to the pc, thus causing program execution to branch to that location. Bit \(0\) of the value specifies the instruction set: \(\binary{0}\) for ARM, \(\binary{1}\) for Thumb. The value in <Rm> does not change.
LDR
Loads a word from memory into a register.
LDR{<c>} <Rt>, <label> % Label
LDR{<c>} <Rt>, [<Rn>{, #+/-<imm>}] % Offset
LDR{<c>} <Rt>, [<Rn>, #+/-<imm>]! % Pre-indexed
LDR{<c>} <Rt>, [<Rn>], #+/-<imm> % Post-indexed
<c> is the condition code, Table 9.2.1.
<Rt> is the destination register, and <Rn> is the base register.
<label> is a labeled memory address.
<imm> is a signed integer in the range \(-4095 \ldots +4095\text{.}\)
The memory address to load the word from is determined in the following way (further explained in Section 11.1):
The Label form uses the address corresponding to the <label>.
In the Offset form, the signed integer, <imm>, is added to the value in the base register, <Rn>, and the value at this address is loaded into <Rt>, but the base register is not changed.
In the Pre-indexed form, the signed integer is added to the value in the base register, <Rn>, the base register is updated to the new address, and then the value at this new address is loaded into <Rt>.
In the Post-indexed form, the value in the base register, <Rn>, is used as an address, and the value at that address is loaded into <Rt>. Then the signed integer is added to the value in the base register.
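The difference between the Offset, Pre-indexed, and Post-indexed forms is only in when, and whether, the base register is updated. The following C sketch models the three forms. The memory array, its size, and all of the function names are hypothetical, and addresses are assumed to be word-aligned byte offsets into that array.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 16

/* Hypothetical memory for illustration: 16 words, addressed by byte offset. */
uint32_t memory[MEM_WORDS];

uint32_t load_word(uint32_t address)
{
    assert(address % 4 == 0 && address / 4 < MEM_WORDS);
    return memory[address / 4];
}

/* ldr rt, [rn, imm] : Offset form; the base register is not changed. */
uint32_t ldr_offset(const uint32_t *rn, int32_t imm)
{
    return load_word(*rn + (uint32_t)imm);
}

/* ldr rt, [rn, imm]! : Pre-indexed; update the base register, then load. */
uint32_t ldr_pre(uint32_t *rn, int32_t imm)
{
    *rn += (uint32_t)imm;
    return load_word(*rn);
}

/* ldr rt, [rn], imm : Post-indexed; load, then update the base register. */
uint32_t ldr_post(uint32_t *rn, int32_t imm)
{
    uint32_t value = load_word(*rn);
    *rn += (uint32_t)imm;
    return value;
}
```

Each function returns the value that would be placed in <Rt>. The base register is passed by pointer so that the two indexed forms can update it, while the Offset form takes a pointer to const because it never does.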
STR
Stores a word from a register into memory.
STR{<c>} <Rt>, <label> % Label
STR{<c>} <Rt>, [<Rn>{, #+/-<imm>}] % Offset
STR{<c>} <Rt>, [<Rn>, #+/-<imm>]! % Pre-indexed
STR{<c>} <Rt>, [<Rn>], #+/-<imm> % Post-indexed
<c> is the condition code, Table 9.2.1.
<Rt> is the source register, and <Rn> is the base register.
<label> is a labeled memory address.
<imm> is a signed integer in the range \(-4095 \ldots +4095\text{.}\)
The memory address to store the word at is determined in the following way (further explained in Section 11.1):
The Label form uses the address corresponding to the <label>.
In the Offset form, the signed integer, <imm>, is added to the value in the base register, <Rn>, and the value in <Rt> is stored at this address, but the base register is not changed.
In the Pre-indexed form, the signed integer is added to the value in the base register, <Rn>, the base register is updated to the new address, and then the value in <Rt> is stored at this address.
In the Post-indexed form, the value in the base register, <Rn>, is used as an address, and the value in <Rt> is stored at that address. Then the signed integer is added to the value in the base register.
Subsection 9.2.5 Code Walkthrough
We are now able to walk through the code in Listing 9.1.3 and explain the purpose of each instruction. The two lines:
main:
        str fp, [sp, -4]! @ save caller frame pointer
are only one instruction. Placing the label on its own (otherwise blank) line associates the label with the next instruction. This makes it easier to use longer labels, which often improves the readability of the code.
From the description of str, this instruction first determines a memory address by subtracting \(4\) from the address in the sp register and updating the sp register to this new address. It then stores the address in the fp register in memory at this new address.
The ‘!’ character following the [sp, -4] construct causes the value in the sp register to be modified by the numerical value (\(-4\)). So the value in sp is \(4\) less after this instruction is executed. The reasons for this will be explained in Sections 10.2–10.3.
Programs executing in the C runtime environment make extensive use of the stack. Each function in the program has its own area of the stack, known as a Stack Frame. The function keeps track of where its frame is by maintaining its memory address in the fp register. You will learn all about stacks and stack frames in Section 10.2, but for now, the important thing to know is that this instruction saves the calling function's frame pointer so that this function can then establish its own frame pointer.
You will learn in Section 10.3 that each function will have an area on the stack for its own use. The function needs a reference point to that area, which is an address stored in the Frame Pointer register. The previous instruction saved the calling function's frame pointer on the stack, and this instruction,
add fp, sp, 0 @ establish our frame pointer
establishes a frame pointer for this function. Yes, mov fp, sp would produce exactly the same results, but most meaningful functions will save other items on the stack, and the frame pointer needs to be set accordingly. Again, you will see how this works in Section 10.3.
This function returns \(0\) to the calling function (which is in the operating system), and it uses a register, r3, as a local variable to hold this value:
mov r3, 0 @ return 0;
The operating system requires that a return value be in register r0, so the program moves it there:
mov r0, r3 @ return values go in r0
As you will learn in Section 10.2, it is essential to clean up any use of the stack before returning to the calling function. Recall that the program created a frame pointer relative to the stack pointer when the function first started. It needs to move the stack pointer back to where it was when the function first began:
sub sp, fp, 0 @ restore stack pointer
The next clean up operation is to restore the frame pointer to the one being used by the calling function:
ldr fp, [sp], 4 @ restore caller's frame pointer
Look carefully at the instruction at the beginning of the function that saved the calling function's frame pointer:
str fp, [sp, -4]! @ save caller frame pointer
When you compare it with the one that restores it, you can see that saving the frame pointer pre-decrements the stack pointer and then stores the address, while restoring the frame pointer retrieves the address and then post-increments the stack pointer. I think of these two operations as being a little like parentheses that surround a term in an algebraic equation.
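The way the save and restore operations pair up can be sketched in C. The struct machine type, the tiny 8-word stack, and the function names are all hypothetical and exist only for this illustration. The stack pointer is modeled as a byte offset into the stack array.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 8

/* Hypothetical machine state (illustration only). */
struct machine {
    uint32_t sp;                  /* stack pointer: byte offset into stack[] */
    uint32_t fp;                  /* frame pointer */
    uint32_t stack[STACK_WORDS];  /* stack memory */
};

/* str fp, [sp, -4]! : pre-decrement sp, then store fp at the new address. */
void save_fp(struct machine *m)
{
    assert(m->sp >= 4 && m->sp <= 4 * STACK_WORDS);
    m->sp -= 4;
    m->stack[m->sp / 4] = m->fp;
}

/* ldr fp, [sp], 4 : load fp from the address in sp, then post-increment sp. */
void restore_fp(struct machine *m)
{
    assert(m->sp + 4 <= 4 * STACK_WORDS);
    m->fp = m->stack[m->sp / 4];
    m->sp += 4;
}
```

If save_fp is called when sp is 32 and fp holds the caller's value, then after a matching restore_fp both sp and fp are back to what they were, regardless of what the function stored in fp in between.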
Now that the stack has been cleaned up, the function returns to the calling function. The instruction that called this one placed the return address in the link register, lr. So all this function needs to do is to branch to the address in the lr register:
bx lr @ back to caller
You will learn how this works in Chapter 13.