Section 14.6 Division

Algorithm 2.5.1 shows how we can compute the decimal equivalent of an int stored in binary format. It repeatedly divides the int by 10. The remainder after each integer division is the next decimal digit, starting with the low-order digit.
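To make the idea concrete, here is a small C sketch of the repeated division; the function name lowOrderDigits is hypothetical and not part of the book's code. Dividing 123 by 10 gives quotient 12 and remainder 3, then 12 gives 1 with remainder 2, then 1 gives 0 with remainder 1, so the digits appear in the order 3, 2, 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper illustrating Algorithm 2.5.1: repeatedly divide
 * by 10; each remainder is one decimal digit, low-order digit first.
 * Stores the digits in the order they are produced and returns how
 * many there are. `digits` needs room for 10 entries, enough for any
 * 32-bit unsigned value. */
int lowOrderDigits(uint32_t n, int digits[])
{
    int count = 0;

    assert(digits != NULL);
    do {
        digits[count] = (int)(n % 10);  /* remainder: next digit */
        n = n / 10;                     /* quotient: keep dividing */
        count++;
    } while (n != 0);                   /* 0 still yields one digit */
    return count;
}
```

The do-while form makes an input of 0 produce the single digit 0, which the assembly listing later in this section handles as a special case.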

The terms “modulo” (the ‘%’ operator in C) and “remainder” are often used interchangeably in programming. The definitions of “modulo” vary in the literature, and the differences arise only when dealing with negative numbers. Our algorithm here uses only non-negative numbers, so the distinction does not matter.
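A short C sketch shows where the conventions part ways. C's % truncates toward zero, so its result takes the sign of the dividend; the "floored" modulo below (a hypothetical helper, one common definition of modulo) takes the sign of the divisor instead. For non-negative operands the two agree.

```c
#include <assert.h>

/* C's % operator: since C99, integer division truncates toward zero,
 * so the remainder takes the sign of the dividend (-7 % 3 == -1). */
int truncRemainder(int a, int b)
{
    assert(b != 0);
    return a % b;
}

/* Hypothetical "floored" modulo, the convention some languages and
 * texts call modulo: the result takes the sign of the divisor. */
int flooredModulo(int a, int b)
{
    assert(b != 0);
    int r = a % b;
    if (r != 0 && ((r < 0) != (b < 0)))
        r += b;             /* move into the divisor's sign range */
    return r;
}
```

For example, truncRemainder(-7, 3) is -1 while flooredModulo(-7, 3) is 2, but both give 1 for (7, 3).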

The ARM provides two integer divide instructions: sdiv for signed values and udiv for unsigned values.

SDIV

Divides a signed 32-bit value into another signed 32-bit value, producing a 32-bit signed result.

SDIV{<c>}  <Rd>, <Rn>, <Rm>

The condition flags are not changed.
<c> is the condition code, Table 9.2.1.
<Rd> specifies the destination register. <Rm> contains the divisor and <Rn> the dividend.

The value in <Rn> is divided by the value in <Rm>, and the quotient is stored in <Rd>. All values are treated as signed values, and the quotient is rounded toward zero. The remainder is lost.
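As a rough C model of what sdiv computes (a sketch, not the book's code; the name sdivModel is hypothetical), C's own signed division also rounds toward zero, so -7 divided by 2 gives -3 rather than -4. The sketch deliberately rejects division by zero and INT32_MIN / -1, which C leaves undefined; it makes no claim about what the hardware does in those cases.

```c
#include <assert.h>
#include <stdint.h>

/* Rough C model of SDIV Rd, Rn, Rm: a signed 32-bit quotient,
 * rounded toward zero, with the remainder discarded. C's '/' on
 * signed integers also rounds toward zero (C99 and later). */
int32_t sdivModel(int32_t n, int32_t m)
{
    /* Assumption: this sketch rejects the two cases C leaves
     * undefined, m == 0 and INT32_MIN / -1; it does not model
     * what the hardware does for them. */
    assert(m != 0);
    assert(!(n == INT32_MIN && m == -1));
    return n / m;           /* quotient only; remainder is lost */
}
```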

UDIV

Divides an unsigned 32-bit value into another unsigned 32-bit value, producing a 32-bit unsigned result.

UDIV{<c>}  <Rd>, <Rn>, <Rm>

The condition flags are not changed.
<c> is the condition code, Table 9.2.1.
<Rd> specifies the destination register. <Rm> contains the divisor and <Rn> the dividend.

The value in <Rn> is divided by the value in <Rm>, and the quotient is stored in <Rd>. All values are treated as unsigned values. The remainder is lost.
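The choice between sdiv and udiv matters because the same 32-bit pattern means different numbers under the two interpretations. In this C sketch (divideBothWays is a hypothetical name), -10 is stored as 0xFFFFFFF6, which an unsigned divide reads as 4294967286.

```c
#include <assert.h>
#include <stdint.h>

/* Divide one 32-bit pattern both ways. n is passed as a signed value;
 * converting it to uint32_t is well defined in C (modulo 2^32) and
 * keeps the same two's-complement bit pattern. */
void divideBothWays(int32_t n, int32_t m,
                    int32_t *asSigned, uint32_t *asUnsigned)
{
    assert(m != 0);
    *asSigned = n / m;                        /* like sdiv */
    *asUnsigned = (uint32_t)n / (uint32_t)m;  /* like udiv */
}
```

Dividing -10 by 2 this way gives -5 as a signed quotient but 2147483643 as an unsigned one; for non-negative values such as 10 both give 5.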

Since the ARM divide instructions discard the remainder, we need to compute it ourselves in order to use Algorithm 2.5.1. The sequence of instructions

udiv    r0, r6, r7      @ quotient = r6 / r7
mul     r1, r0, r7      @ quotient * divisor
sub     r2, r6, r1      @ remainder = r6 - r1

uses the udiv instruction to compute the quotient. The quotient is then multiplied by the divisor. Subtracting this result from the original dividend yields the remainder. Listing 14.6.1 shows how this can be done.
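The same three steps can be written in C, with each line annotated by the instruction it mirrors; remainderFromQuotient is a hypothetical name used only for this sketch. For 123 and 10, the quotient is 12, the product is 120, and 123 - 120 leaves the remainder 3.

```c
#include <assert.h>
#include <stdint.h>

/* C counterpart of the udiv / mul / sub sequence: the divide yields
 * only the quotient, so the remainder is rebuilt from it. The
 * comments name the registers used in that three-line sequence. */
uint32_t remainderFromQuotient(uint32_t dividend, uint32_t divisor)
{
    assert(divisor != 0);
    uint32_t quotient = dividend / divisor;  /* udiv r0, r6, r7 */
    uint32_t product = quotient * divisor;   /* mul  r1, r0, r7 */
    return dividend - product;               /* sub  r2, r6, r1 */
}
```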

@ uIntToDec.s
@ Converts an int to the corresponding unsigned
@ decimal text string.
@ Calling sequence:
@       r0 <- address of place to store string
@       r1 <- int to convert
@       bl uIntToDec
@ 2017-09-29: Bob Plantz

@ Define my Raspberry Pi
        .cpu    cortex-a53
        .fpu    neon-fp-armv8
        .syntax unified         @ modern syntax

@ Constant for assembler
        .equ    tempString,-32  @ for temp string
        .equ    locals,16       @ space for local vars
        .equ    zero,0x30       @ ascii 0
        .equ    NUL,0

@ The program
        .text
        .align  2
        .global uIntToDec
        .type   uIntToDec, %function
uIntToDec:
        sub     sp, sp, 24      @ space for saving regs
        str     r4, [sp, 0]     @ save r4
        str     r5, [sp, 4]     @      r5
        str     r6, [sp, 8]     @      r6
        str     r7, [sp, 12]    @      r7
        str     fp, [sp, 16]    @      fp
        str     lr, [sp, 20]    @      lr
        add     fp, sp, 20      @ set our frame pointer
        sub     sp, sp, locals  @ for local vars
        
        mov     r4, r0          @ caller's string pointer
        add     r5, fp, tempString @ temp string
        mov     r7, 10          @ decimal constant
        
        mov     r0, NUL         @ end of C string
        strb    r0, [r5]
        add     r5, r5, 1       @ move to char storage

        mov     r0, zero        @ assume the int is 0
        strb    r0, [r5]
        movs    r6, r1          @ int to convert
        beq     copyLoop        @ zero is special case
convertLoop:
        cmp     r6, 0           @ end of int?
        beq     copy            @ yes, copy for caller
        udiv    r0, r6, r7      @ no, div to get quotient
        mul     r1, r0, r7      @ need for computing remainder
        sub     r2, r6, r1      @ the mod (remainder)
        mov     r6, r0          @ the quotient
        orr     r2, r2, zero    @ convert to numeral
        strb    r2, [r5]
        add     r5, r5, 1       @ next char position
        b       convertLoop
copy:
        sub     r5, r5, 1       @ last char stored locally
copyLoop:
        ldrb    r0, [r5]        @ get local char
        strb    r0, [r4]        @ store the char for caller
        cmp     r0, NUL         @ end of local string?
        beq     allDone         @ yes, we're done
        add     r4, r4, 1       @ no, next caller location
        sub     r5, r5, 1       @ next local char
        b       copyLoop
        
allDone:        
        strb    r0, [r4]        @ end C string
        add     sp, sp, locals  @ deallocate local var
        ldr     r4, [sp, 0]     @ restore r4
        ldr     r5, [sp, 4]     @      r5
        ldr     r6, [sp, 8]     @      r6
        ldr     r7, [sp, 12]    @      r7
        ldr     fp, [sp, 16]    @      fp
        ldr     lr, [sp, 20]    @      lr
        add     sp, sp, 24      @      sp
        bx      lr              @ return

Listing 14.6.1. Function to convert an unsigned integer to a decimal text string.

Because this algorithm produces the decimal numeral characters starting with the low-order digit, the function stores them in reverse order in a local char array, preceded by a NUL character. It then copies the characters, last one first, to the address passed by the calling function, which puts them in the correct order.
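For comparison, here is a C sketch that follows the same steps as Listing 14.6.1: NUL first, then the numerals low-order first, then a backwards copy. The name uIntToDecC is hypothetical, and the sketch assumes the caller's buffer holds at least 11 chars (up to 10 digits for a 32-bit value plus the NUL).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C sketch of the logic in uIntToDec (Listing 14.6.1). The name
 * uIntToDecC is hypothetical. str must have room for 11 chars:
 * up to 10 digits for a 32-bit value plus the terminating NUL. */
void uIntToDecC(char *str, uint32_t n)
{
    char temp[11];          /* local array: NUL + up to 10 digits */
    char *p = temp;

    assert(str != NULL);
    *p = '\0';              /* end of C string goes in first */
    p++;                    /* move to char storage */
    *p = '0';               /* assume the int is 0 */
    if (n != 0) {
        while (n != 0) {
            uint32_t quotient = n / 10;
            uint32_t remainder = n - quotient * 10;
            *p = (char)(remainder | '0');  /* convert to numeral */
            p++;
            n = quotient;
        }
        p--;                /* last char stored locally */
    }
    for (;;) {              /* copy backwards into caller's buffer */
        *str = *p;
        if (*p == '\0')     /* NUL copied: string is complete */
            break;
        str++;
        p--;
    }
}
```

As in the assembly version, OR-ing a digit 0 through 9 with '0' (0x30) gives the same result as adding, because the low four bits of 0x30 are zero.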

As discussed in the Preface, we consider only a small subset of the ARM instruction set architecture in this book. But the full instruction set includes many other instructions that can improve the efficiency of some computations. One such instruction is mls, which performs the multiply and the subtract in a single instruction, thus simplifying the computation of the remainder.

MLS

Multiplies two 32-bit values in registers, subtracts the product from the value in a third register, and stores the difference in a fourth register.

MLS{<c>}  <Rd>, <Rn>, <Rm>, <Ra>

The condition flags are not changed.
<c> is the condition code, Table 9.2.1.
<Rd> specifies the destination register. <Rm> and <Rn> contain the multiplier and multiplicand. <Ra> contains the minuend.

The values in <Rm> and <Rn> are multiplied, the result is subtracted from the value in <Ra>, and the result is stored in <Rd>. Only the low-order 32 bits are retained.
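A C model of mls makes the operand order explicit: Rd = Ra - (Rn * Rm), keeping only 32 bits. The function names below are hypothetical, and the two-instruction assembly sequence in the comment is a sketch of how the udiv / mul / sub lines might be shortened, based on the operand order given above.

```c
#include <assert.h>
#include <stdint.h>

/* C model of MLS Rd, Rn, Rm, Ra: Rd = Ra - (Rn * Rm). uint32_t
 * arithmetic wraps modulo 2^32, so only the low-order 32 bits are
 * kept, as the instruction description says. */
uint32_t mlsModel(uint32_t rn, uint32_t rm, uint32_t ra)
{
    return ra - rn * rm;
}

/* Remainder from one divide plus one multiply-and-subtract. A sketch
 * of a corresponding assembly sequence, using the listing's registers:
 *     udiv r0, r6, r7      @ quotient
 *     mls  r2, r0, r7, r6  @ r2 = r6 - r0 * r7, the remainder
 */
uint32_t remainderWithMls(uint32_t dividend, uint32_t divisor)
{
    assert(divisor != 0);
    uint32_t quotient = dividend / divisor;
    return mlsModel(quotient, divisor, dividend);
}
```

The wraparound is visible when the product exceeds the minuend: mlsModel(2, 3, 5) yields 0xFFFFFFFF, the low-order 32 bits of -1.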