Section 9.2 First Assembly Language Instructions

I start introducing instructions in this section. They will be introduced as we need them, and I will not provide all the details of each instruction. To see the details, you need to read the ARM manuals: ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition[1] for 32-bit, and Architecture Reference Manual ARMv8, for ARMv8-A architecture profile[2] for 64-bit. When describing an instruction, I will use essentially the same notation as the manuals to make it easier for you to learn how to read them. The notation does vary a little from one ARM manual to another. One difference in my notation is that I will use ‘%’ to introduce my comments. I also omit some of the detailed options for many of the instructions because they do not apply to the programming used in this book.

The ARM actually provides a second instruction set called “Thumb.” It allows for either 16-bit or 32-bit instructions. It can be used to improve efficiency. Using Thumb is beyond the scope of this book, but you will see it mentioned in the description of some instructions in the ARM manuals.

Subsection 9.2.1 Some Notation

The syntax that ARM uses for their assembly language is called Unified Assembler Language (UAL). Our assembler, as, recognizes the UAL syntax if you use the assembler directives I used in Listing 9.1.3 to identify the ARM model. Unfortunately, the version of gcc currently (August 2016) running on Raspbian uses pre-UAL syntax. The differences are minor. For example, the compiler-generated assembly language (Listing 9.1.2) uses a ‘#’ character to prefix each literal value:

str     fp, [sp, #-4]!

But the UAL syntax specifies that the ‘#’ character is optional, so in Listing 9.1.3 I wrote:

str     fp, [sp, -4]!   @ save caller frame pointer

I will not use the ‘#’ character for immediate values in my examples in this book. I strongly urge you to use the UAL syntax when writing your own assembly language programs. This will become very important when we get to the floating-point instructions in Section 16.7. That is where you will see the biggest differences between the UAL and pre-UAL syntaxes.

The general notation used to explain instructions is:

Upper case represents the characters that you need to use, although you can type them in lower case.
Lower case represents places where you need to supply the appropriate text.
Angle brackets, <...>, are places where you need to supply the appropriate value (without the ‘<’ and ‘>’).
Curly braces, {...}, represent optional items (without the ‘{’ and ‘}’).
The ‘#’ prefix of constant values will be shown, but it is optional when using the .syntax unified directive.

Subsection 9.2.2 Condition Codes

As mentioned in Section 8.2, most AArch32 ARM instructions have an option that lets you specify that the instruction is executed only if the condition flags have a specific setting. You express this by appending a mnemonic Condition Code to the instruction mnemonic. These codes are shown in Table 9.2.1. The “cond” column shows the corresponding machine code portion of the instruction, which will be described in Section 11.3.3.

Table 9.2.1. Mnemonic suffixes for conditional execution of instructions. Meaning depends on whether the values are integers or floats. The cond column shows the machine code.

`cond`	Mnemonic	Integer	Float	Condition Flags
\(\binary{0000}\)	`EQ`	Equal	Equal	`Z == 1`
\(\binary{0001}\)	`NE`	Not equal	Not equal	`Z == 0`
\(\binary{0010}\)	`CS`	Carry set	Greater than, equal, or unordered	`C == 1`
\(\binary{0011}\)	`CC`	Carry clear	Less than	`C == 0`
\(\binary{0100}\)	`MI`	Negative	Less than	`N == 1`
\(\binary{0101}\)	`PL`	Positive, or zero	Greater than, equal, or unordered	`N == 0`
\(\binary{0110}\)	`VS`	Overflow	Unordered	`V == 1`
\(\binary{0111}\)	`VC`	No overflow	Not unordered	`V == 0`
\(\binary{1000}\)	`HI`	Unsigned higher	Greater than, or unordered	`C == 1 AND Z == 0`
\(\binary{1001}\)	`LS`	Unsigned lower or same	Less than or equal	`C == 0 OR Z == 1`
\(\binary{1010}\)	`GE`	Signed greater than or equal	Greater than or equal	`N == V`
\(\binary{1011}\)	`LT`	Signed less than	Less than, or unordered	`N != V`
\(\binary{1100}\)	`GT`	Signed greater than	Greater than	`Z == 0 AND N == V`
\(\binary{1101}\)	`LE`	Signed less than or equal	Less than, equal, or unordered	`Z == 1 OR N != V`
\(\binary{1110}\)	none (`AL`)	Always	Always	Any
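The Condition Flags column of Table 9.2.1 can be read as a set of boolean tests. The following C sketch is one way to express them. The type cond_flags and the function cond_passes are names made up for this illustration; they are not part of any ARM or compiler interface. The sketch also assumes that the 1111 encoding, which the table does not list, can be treated like AL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the four condition flags. */
typedef struct {
    bool n;  /* negative */
    bool z;  /* zero */
    bool c;  /* carry */
    bool v;  /* overflow */
} cond_flags;

/* Returns true if an instruction whose 4-bit cond field is `cond`
   would execute with the given flags, following the Condition
   Flags column of Table 9.2.1. */
bool cond_passes(uint8_t cond, cond_flags f)
{
    switch (cond & 0xF) {
    case 0x0: return f.z;                  /* EQ */
    case 0x1: return !f.z;                 /* NE */
    case 0x2: return f.c;                  /* CS */
    case 0x3: return !f.c;                 /* CC */
    case 0x4: return f.n;                  /* MI */
    case 0x5: return !f.n;                 /* PL */
    case 0x6: return f.v;                  /* VS */
    case 0x7: return !f.v;                 /* VC */
    case 0x8: return f.c && !f.z;          /* HI */
    case 0x9: return !f.c || f.z;          /* LS */
    case 0xA: return f.n == f.v;           /* GE */
    case 0xB: return f.n != f.v;           /* LT */
    case 0xC: return !f.z && f.n == f.v;   /* GT */
    case 0xD: return f.z || f.n != f.v;    /* LE */
    default:  return true;                 /* AL; 1111 is not in the table
                                              and is assumed to behave like AL */
    }
}
```

For example, with Z = 1 and C = 1 and the other flags clear, cond_passes(0x0, flags) is true (EQ) and cond_passes(0x8, flags) is false (HI).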

Subsection 9.2.3 Shift Options

Many ARM instructions include an option to shift one of the data values during the operation that the instruction performs. Table 9.2.2 shows the mnemonic codes that are used and their respective effect.

Table 9.2.2. Mnemonic codes for adding shifts to instructions. The ‘#’ is optional. In the variable form \(n\) is stored in the low byte of Rs.

Constant	Variable	Effect
`LSL #<n>`	`LSL <Rs>`	Logical shift left \(n\) bits. \(1 \le n \le 31\text{.}\)
`LSR #<n>`	`LSR <Rs>`	Logical shift right \(n\) bits. \(1 \le n \le 32\text{.}\)
`ASR #<n>`	`ASR <Rs>`	Arithmetic shift right \(n\) bits. \(1 \le n \le 32\text{.}\)
`ROR #<n>`	`ROR <Rs>`	Rotate right \(n\) bits. \(1 \le n \le 31\text{.}\)
`RRX`		Rotate right one bit, with extend. Bits [\(31:1\)] are shifted right one bit, and the carry flag is shifted into bit [\(31\)].

where:

Logical shift: Zeroes are written into the vacated bit positions.
Arithmetic shift right: The value in the high-order bit, \(\binary{0}\) or \(\binary{1}\text{,}\) is copied into each of the vacated bit positions, thus preserving the sign of the value being shifted.
Rotate right: As the low-order bits spill off the right side (low-order positions), they are rotated around to flow into the high-order positions as the value is rotated.

Most instructions that allow the shift option allow you to specify the shift amount, \(n\text{,}\) either as a constant or as a variable in a register. The RRX operation is always one bit.
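To make the definitions above concrete, here is a minimal C sketch of the five shift operations on a 32-bit value. The function names are hypothetical, chosen only for this illustration, and each function assumes that \(n\) is within the range given in Table 9.2.2.

```c
#include <stdint.h>

/* Logical shift left, 1 <= n <= 31: zeroes enter from the right. */
uint32_t lsl32(uint32_t x, unsigned n)
{
    return x << n;
}

/* Logical shift right, 1 <= n <= 32: zeroes enter from the left. */
uint32_t lsr32(uint32_t x, unsigned n)
{
    /* C leaves a 32-bit shift by 32 undefined, so handle it separately. */
    return n >= 32 ? 0 : x >> n;
}

/* Arithmetic shift right, 1 <= n <= 32: copies of the sign bit enter
   from the left. */
uint32_t asr32(uint32_t x, unsigned n)
{
    uint32_t sign_fill = (x & 0x80000000u) ? 0xFFFFFFFFu : 0;
    if (n >= 32)
        return sign_fill;
    return (x >> n) | (sign_fill << (32 - n));
}

/* Rotate right, 1 <= n <= 31: bits leaving on the right re-enter
   on the left. */
uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

/* Rotate right one bit with extend: the carry flag (0 or 1) enters
   bit 31. */
uint32_t rrx32(uint32_t x, unsigned carry_in)
{
    return (x >> 1) | ((uint32_t)(carry_in & 1u) << 31);
}
```

With these definitions, asr32(0xFFFFFFF0, 2) gives 0xFFFFFFFC (that is, \(-16\) becomes \(-4\)), while lsr32 of the same value would bring zeroes into the two high-order bits.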

As an example of how the shifting syntax is used, the three instructions:

mov     r0, 12
mov     r1, 60
add     r2, r0, r1, lsl 2

would store \(252\) in r2. The first two instructions store \(12\) in r0 and \(60\) in r1. The lsl 2 in the third instruction shifts the value from r1 two bit positions to the left, multiplying it by \(4\text{.}\) That result, \(240\text{,}\) is added to the value in r0, and the sum is stored in r2. As you will see in the description of add below, the values in r0 and r1 remain unchanged.

If your algorithm requires that the amount of the shift be under program control, you would first store the shift amount in a register. For example:

mov     r0, 12
mov     r1, 60
mov     r3, 2
add     r2, r0, r1, lsl r3

would produce the same result in r2 as above. But your program could change the value in r3 to achieve a different shift amount.
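The arithmetic in these two examples can be checked with a small C model of add with a register-specified left shift. This is only a sketch: add_lsl_reg is a made-up name, and the model covers just the lsl case. It uses only the low byte of the shift register, as noted in the caption of Table 9.2.2, and treats a shift of 32 or more bits as shifting every bit out.

```c
#include <stdint.h>

/* Hypothetical model of:  add rd, rn, rm, lsl rs
   Returns the value that would be stored in rd. The arguments
   (standing for rn, rm, and rs) are passed by value, so the
   caller's copies are unchanged. */
uint32_t add_lsl_reg(uint32_t rn, uint32_t rm, uint32_t rs)
{
    uint32_t n = rs & 0xFFu;                      /* low byte of rs */
    uint32_t shifted = (n < 32) ? (rm << n) : 0;  /* all bits shifted out */
    return rn + shifted;
}
```

With rn = 12, rm = 60, and rs = 2, the model returns \(12 + 60 \times 4 = 252\text{.}\) Changing rs to 3 gives \(12 + 60 \times 8 = 492\text{.}\)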

Subsection 9.2.4 First Instructions

Even though the program in Listing 9.1.1 does nothing, it requires six different instructions.

MOV

Copies (moves) a value into a register.

MOV{S}{<c>}   <Rd>, #<const>           % immediate
MOV{S}{<c>}   <Rd>, <Rm>               % register

If ‘S’ is present the condition flags are updated according to the value being moved. If absent, the condition flags are not changed.
<c> is the condition code, Table 9.2.1.
<Rd> specifies the destination register, and <Rm> is the source register.
\(-257 \le const \le +256\text{,}\) or \(const = +256, +260, +264, \ldots, +65280\text{,}\) or \(const = -261, -265, \ldots, -65281\text{.}\) This odd sequence of values will be explained in Section 11.3.3.

In the “immediate” form, the value of the <const> is stored in <Rd>. In the “register” form, the value in <Rm> is copied to <Rd>. There is also a shift version, which will be explained in Section 14.3.

Actually, the assembler does not use the mov instruction for negative values in the immediate form. If you use a negative constant, the assembler subtracts \(1\) from the absolute value, uses the result as the constant, and substitutes the mvn instruction. If you use a negative constant for the immediate form of the mvn instruction, the assembler performs the same computation and substitutes the mov instruction.

So the following two instructions:

mov     r3, 1
mov     r4, -2

do the same thing as:

mov     r3, 1
mvn     r4, 1

And the following two instructions:

mvn     r5, 1
mvn     r6, -2

are the same as:

mov     r5, -2
mov     r6, 1

The result is that you can simply use both positive and negative constants with the mov instruction, which I strongly urge you to do, and the assembler will do the correct thing. But if you are reading compiler-generated assembly language, or you use the disassemble command in gdb, you may see the mvn instruction being used, so I will describe it here.
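The substitution works because, in two's complement, the bitwise NOT of a value \(x\) equals \(-x - 1\text{.}\) The following C sketch checks the arithmetic. Both function names are hypothetical, and the code models only the values involved, not what the assembler does internally.

```c
#include <stdint.h>

/* Value left in the destination register by  mvn rd, operand :
   the bitwise NOT of the operand. */
int32_t mvn_result(int32_t operand)
{
    return ~operand;
}

/* For a negative constant c, the operand given to the substituted
   instruction as described above: the absolute value of c, minus 1.
   Assumes c is negative and small enough that -c does not overflow. */
int32_t substituted_operand(int32_t c)
{
    return -c - 1;
}
```

For mov r4, -2 the substituted operand is \(2 - 1 = 1\text{,}\) and mvn_result(1) is \(-2\text{,}\) so mvn r4, 1 leaves the intended value in r4.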

MVN

Copies (moves) the complement (bitwise NOT) of a value into a register.

MVN{S}{<c>}   <Rd>, #<const>           % immediate
MVN{S}{<c>}   <Rd>, <Rm>{, <shift>}    % register
MVN{S}{<c>}   <Rd>, <Rm>, <type> <Rs>  % register-shifted register

If ‘S’ is present the condition flags are updated according to the value being moved. If absent, the condition flags are not changed.
<c> is the condition code, Table 9.2.1.
<Rd> specifies the destination register, and <Rm> is the source register. <Rs> contains the shift amount in the “register-shifted register” form.
\(-257 \le const \le +256\text{,}\) or \(const = +256, +260, +264, \ldots, +65280\text{,}\) or \(const = -261, -265, \ldots, -65281\text{.}\) This odd sequence of values will be explained in Section 11.3.3.
<shift> and <type> are explained in Section 9.2.3.

In the “immediate” form, a copy of the bitwise NOT of <const> is stored in <Rd>. In the “register” and “register-shifted register” forms, the bitwise NOT of the value in <Rm> is copied to <Rd>. If a shift is specified, the value in <Rm> is shifted by the specified amount before the bitwise NOT is computed, and the result is stored in <Rd>. The values in <Rm> and <Rs> are unchanged.

ADD

Adds two integers.

ADD{S}{<c>}  {<Rd>,} <Rn>, #<const>           % immediate
ADD{S}{<c>}  {<Rd>,} <Rn>, <Rm>{, <shift>}    % register
ADD{S}{<c>}  {<Rd>,} <Rn>, <Rm>, <type> <Rs>  % register-shifted register

If ‘S’ is present the condition flags are updated according to the result of the addition. If absent, the condition flags are not changed.
<c> is the condition code, Table 9.2.1.
<Rd> specifies the destination register, and <Rm> and <Rn> are the source registers. <Rs> contains the shift amount in the “register-shifted register” form.
\(-257 \le const \le +256\text{,}\) or \(const = +256, +260, +264, \ldots, +65280\text{,}\) or \(const = -261, -265, \ldots, -65281\text{.}\) This odd sequence of values will be explained in Section 11.3.3.
<shift> and <type> are explained in Section 9.2.3.

In the “immediate” form, <const> is added to the value in <Rn>. In the “register” and “register-shifted register” forms, the value in <Rm> is added to the value in <Rn>. If a shift is specified, the value in <Rm> is shifted by the specified amount before the addition is performed. If <Rd> is present the result is stored there and <Rn> is unchanged. If not, the result is stored in <Rn>. The values in <Rm> and <Rs> are unchanged.
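When the ‘S’ option is used, the four condition flags summarize the result of the addition. The following C sketch shows one way to model this. The type adds_outcome and the function adds_model are names invented for this illustration, and op2 stands for the second operand after any shift has been applied.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of what an adds instruction produces. */
typedef struct {
    uint32_t result;
    bool n, z, c, v;
} adds_outcome;

/* Models  adds rd, rn, op2  on 32-bit values. */
adds_outcome adds_model(uint32_t rn, uint32_t op2)
{
    adds_outcome out;
    uint64_t wide = (uint64_t)rn + op2;   /* keep the carry out of bit 31 */

    out.result = (uint32_t)wide;
    out.n = (out.result >> 31) & 1u;      /* bit 31 of the result */
    out.z = (out.result == 0);
    out.c = (wide >> 32) & 1u;            /* unsigned overflow */
    /* Signed overflow: the operands have the same sign, but the
       result has the opposite sign. */
    out.v = ((~(rn ^ op2) & (rn ^ out.result)) >> 31) & 1u;
    return out;
}
```

For instance, adding 1 to 0x7FFFFFFF gives 0x80000000 with N and V set, which signals signed overflow. Adding 1 to 0xFFFFFFFF gives 0 with Z and C set.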

SUB

Subtracts two integers.

SUB{S}{<c>}   {<Rd>,} <Rn>, #<const>           % immediate
SUB{S}{<c>}   {<Rd>,} <Rn>, <Rm>{, <shift>}    % register
SUB{S}{<c>}   {<Rd>,} <Rn>, <Rm>, <type> <Rs>  % register-shifted register

If ‘S’ is present the condition flags are updated according to the results of the subtraction. If absent, the condition flags are not changed.
<c> is the condition code, Table 9.2.1.
<Rd> specifies the destination register, and <Rm> and <Rn> are the source registers. <Rs> contains the shift amount in the “register-shifted register” form.
\(-257 \le const \le +256\text{,}\) or \(const = +256, +260, +264, \ldots, +65280\text{,}\) or \(const = -261, -265, \ldots, -65281\text{.}\) This odd sequence of values will be explained in Section 11.3.3.
<shift> and <type> are explained in Section 9.2.3.

In the “immediate” form, the <const> is subtracted from the value in <Rn>. If <Rd> is absent, the difference is stored in <Rn>. If <Rd> is present, the difference is stored there and <Rn> remains unchanged.

In the “register” and “register-shifted register” forms, the value in <Rm> is subtracted from the value in <Rn>. If a shift is specified, the value in <Rm> is shifted by the specified amount before the subtraction is performed. If <Rd> is absent, the difference is stored in <Rn>. If <Rd> is present, the difference is stored there and <Rn> remains unchanged.
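With the ‘S’ option, subtraction sets the carry flag in a way that often surprises newcomers: on ARM, C is set to 1 when no borrow occurs. The C sketch below models this under the same assumptions as any simple simulator. The names subs_outcome and subs_model are hypothetical, and op2 stands for the second operand after any shift.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of what a subs instruction produces. */
typedef struct {
    uint32_t result;
    bool n, z, c, v;
} subs_outcome;

/* Models  subs rd, rn, op2  on 32-bit values. */
subs_outcome subs_model(uint32_t rn, uint32_t op2)
{
    subs_outcome out;

    out.result = rn - op2;                /* wraps modulo 2^32 */
    out.n = (out.result >> 31) & 1u;
    out.z = (out.result == 0);
    out.c = (rn >= op2);                  /* 1 means no borrow */
    /* Signed overflow: the operands have different signs, and the
       result's sign differs from that of rn. */
    out.v = (((rn ^ op2) & (rn ^ out.result)) >> 31) & 1u;
    return out;
}
```

This ties back to Table 9.2.1: after subtracting 5 from 3 the model gives C = 0, so the CC condition holds, and after subtracting 5 from 5 it gives Z = 1 and C = 1, so both EQ and LS hold.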

BX

Branches to another location in the program. The address of that location is in a register. Also allows switching between ARM and Thumb instruction sets.

BX{<c>}    <Rm>

<c> is the condition code, Table 9.2.1.
<Rm> is the target address register.

The value in the <Rm> register is moved to the pc, thus causing program execution to branch to that location. Bit \(0\) of this value specifies the instruction set: \(\binary{0}\) for ARM, \(\binary{1}\) for Thumb. The value in <Rm> does not change.
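The role of bit \(0\) can be sketched in C as follows. This is a simplified model with made-up names (bx_outcome, bx_model), and it assumes that the selector bit itself is not part of the address that execution continues from.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of the state after a bx instruction. */
typedef struct {
    uint32_t pc;     /* address where execution continues */
    bool     thumb;  /* true: Thumb instruction set, false: ARM */
} bx_outcome;

/* Models  bx rm . The argument is passed by value, so the caller's
   copy of rm is unchanged. */
bx_outcome bx_model(uint32_t rm)
{
    bx_outcome out;
    out.thumb = (rm & 1u) != 0;   /* bit 0 selects the instruction set */
    out.pc = rm & ~1u;            /* assumed: bit 0 is cleared in the pc */
    return out;
}
```

In the code in this book, the address in lr has bit \(0\) equal to \(\binary{0}\text{,}\) so the model stays in the ARM instruction set.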

LDR

Loads a word from memory into a register.

LDR{<c>}  <Rt>, <label>                  % Label
LDR{<c>}  <Rt>, [<Rn>{, #+/-<imm>}]      % Offset
LDR{<c>}  <Rt>, [<Rn>, #+/-<imm>]!       % Pre-indexed
LDR{<c>}  <Rt>, [<Rn>], #+/-<imm>        % Post-indexed

<c> is the condition code, Table 9.2.1.
<Rt> is the destination register, and <Rn> is the base register.
<label> is a labeled memory address.
<imm> is a signed integer in the range \(-4095 \ldots +4095\text{.}\)

The memory address to load the word from is determined the following way (further explained in Section 11.1):

The Label form uses the address corresponding to the <label>.
In the Offset form, the signed integer, <imm>, is added to the value in the base register, <Rn>, to form the address. The value at this address is loaded into <Rt>, and the base register is not changed.
In the Pre-indexed form, the signed integer is added to the value in the base register, <Rn>, the base register is updated to the new address, and then the value at this new address is loaded into <Rt>.
In the Post-indexed form, the value in the base register, <Rn>, is used as an address, and the value at that address is loaded into <Rt>. Then the signed integer is added to the value in the base register.
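The difference between the Offset, Pre-indexed, and Post-indexed forms is easiest to see by tracking both the loaded value and the base register. Here is a minimal C sketch using a toy memory. The type toy_machine and all function names are hypothetical, and the model assumes word-aligned addresses that stay inside the toy memory.

```c
#include <stdint.h>

/* Hypothetical toy machine: 16 words of memory and one base register. */
typedef struct {
    uint32_t words[16];  /* word i lives at byte address 4*i */
    uint32_t rn;         /* base register, holds a byte address */
} toy_machine;

static uint32_t load_word(const toy_machine *m, uint32_t addr)
{
    return m->words[addr / 4];   /* assumes addr is word aligned */
}

/* ldr rt, [rn, imm]     Offset: rn is not changed. */
uint32_t ldr_offset(toy_machine *m, int32_t imm)
{
    return load_word(m, m->rn + (uint32_t)imm);
}

/* ldr rt, [rn, imm]!    Pre-indexed: update rn, then load. */
uint32_t ldr_pre_indexed(toy_machine *m, int32_t imm)
{
    m->rn += (uint32_t)imm;
    return load_word(m, m->rn);
}

/* ldr rt, [rn], imm     Post-indexed: load, then update rn. */
uint32_t ldr_post_indexed(toy_machine *m, int32_t imm)
{
    uint32_t value = load_word(m, m->rn);
    m->rn += (uint32_t)imm;
    return value;
}
```

Suppose rn holds 8 and the words at addresses 8 and 12 hold 111 and 222. Then ldr_offset(&m, 4) returns 222 and leaves rn at 8, ldr_pre_indexed(&m, 4) returns 222 and leaves rn at 12, and ldr_post_indexed(&m, 4) returns 111 and leaves rn at 12.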

STR

Stores a word from a register into memory.

STR{<c>}  <Rt>, <label>                  % Label
STR{<c>}  <Rt>, [<Rn>{, #+/-<imm>}]      % Offset
STR{<c>}  <Rt>, [<Rn>, #+/-<imm>]!       % Pre-indexed
STR{<c>}  <Rt>, [<Rn>], #+/-<imm>        % Post-indexed

<c> is the condition code, Table 9.2.1.
<Rt> is the source register, and <Rn> is the base register.
<label> is a labeled memory address.
<imm> is a signed integer in the range \(-4095 \ldots +4095\text{.}\)

The memory address to store the word at is determined the following way (further explained in Section 11.1):

The Label form uses the address corresponding to the <label>.
In the Offset form, the signed integer, <imm>, is added to the value in the base register, <Rn>, to form the address. The value in <Rt> is stored at this address, and the base register is not changed.
In the Pre-indexed form, the signed integer is added to the value in the base register, <Rn>, the base register is updated to the new address, and then the value in <Rt> is stored at this address.
In the Post-indexed form, the value in the base register, <Rn>, is used as an address, and the value in <Rt> is stored at that address. Then the signed integer is added to the value in the base register.

Subsection 9.2.5 Code Walkthrough

We are now able to walk through the code in Listing 9.1.3 and explain the purpose of each instruction. The two lines:

main:
       str     fp, [sp, -4]!   @ save caller frame pointer

are only one instruction. Placing the label on its own (otherwise blank) line associates the label with the next instruction. This makes it easier to use longer labels, which often improves the readability of the code.

From the description of str, this instruction first determines a memory address by subtracting \(4\) from the address in the sp register and updating the sp register to this new address. It then stores the address in the fp register in memory at this new address.

The ‘!’ character following the [sp, -4] construct causes the value in the sp register to be modified by the numerical value (\(-4\)). So the value in sp is \(4\) less after this instruction is executed. The reasons for this will be explained in Sections 10.2–10.3.

Programs executing in the C runtime environment make extensive use of the stack. Each function in the program has its own area of the stack, known as a Stack Frame. The function keeps track of where its frame is by maintaining its memory address in the fp register. You will learn all about stacks and stack frames in Section 10.2, but for now, the important thing to know is that this instruction saves the calling function's frame pointer so that this function can establish its own frame pointer.

You will learn in Section 10.3 that each function will have an area on the stack for its own use. The function needs a reference point to that area, which is an address stored in the Frame Pointer register. The previous instruction saved the calling function's frame pointer on the stack, and this instruction,

add     fp, sp, 0       @ establish our frame pointer

establishes a frame pointer for this function. Yes, mov fp, sp would produce exactly the same results, but most meaningful functions will save other items on the stack, and the frame pointer needs to be set accordingly. Again, you will see how this works in Section 10.3.

This function returns \(0\) to the calling function (which is in the operating system), and it uses a register, r3, as a local variable to hold this value:

mov     r3, 0           @ return 0;

The operating system requires that a return value be in register r0, so the program moves it there:

mov     r0, r3          @ return values go in r0

As you will learn in Section 10.2, it is essential to clean up any use of the stack before returning to the calling function. Recall that the program created a frame pointer relative to the stack pointer when the function first started. It needs to move the stack pointer back to where it was when the function first began:

sub     sp, fp, 0       @ restore stack pointer

The next clean up operation is to restore the frame pointer to the one being used by the calling function:

ldr     fp, [sp], 4     @ restore caller's frame pointer

Look carefully at the instruction at the beginning of the function that saved the calling function's frame pointer:

str     fp, [sp, -4]!   @ save caller frame pointer

When you compare it with the one that restores it, you can see that saving the frame pointer is done by pre-decrementing the stack pointer and then storing the address, and restoring the frame pointer is done by retrieving the address and then post-incrementing the stack pointer. I think of these two operations as being a little like parentheses that surround a term in an algebraic equation.
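The pairing of these two instructions can be traced with a small C model of the stack. This is only a sketch under simplifying assumptions: frame_model, save_fp, and restore_fp are made-up names, the stack is eight words long, and addresses are word aligned.

```c
#include <stdint.h>

/* Hypothetical model of a tiny stack plus the sp and fp registers. */
typedef struct {
    uint32_t stack[8];  /* word i lives at byte address 4*i */
    uint32_t sp;        /* stack pointer (byte address) */
    uint32_t fp;        /* frame pointer */
} frame_model;

/* str fp, [sp, -4]!    Pre-decrement sp, then store fp there. */
void save_fp(frame_model *m)
{
    m->sp -= 4;
    m->stack[m->sp / 4] = m->fp;
}

/* ldr fp, [sp], 4      Load fp from where sp points, then
                        post-increment sp. */
void restore_fp(frame_model *m)
{
    m->fp = m->stack[m->sp / 4];
    m->sp += 4;
}
```

Starting with sp = 32, save_fp leaves sp at 28 with the caller's fp stored there. As long as sp is back at 28 when restore_fp runs, the caller's fp is recovered and sp returns to 32, exactly where it was on entry to the function.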

Now that the stack has been cleaned up, the function returns to the calling function. The instruction that called this one placed the return address in the link register, lr. So all this function needs to do is to branch to the address in the lr register:

bx      lr              @ back to caller

You will learn how this works in Chapter 13.