Floating-Point Hardware

Section 16.7 Floating-Point Hardware

Floating-point operations can be implemented in software, but the ARM processor in the Raspberry Pi includes floating-point hardware. Early models (Table 8.0.1) include a coprocessor that provides additional registers and the ability to perform floating-point operations on values in those registers. The ARM Cortex-A53 used in the Raspberry Pi 3 B includes floating-point hardware in the main CPU. Just as integer operations differ between AARCH32 and AARCH64, there are some differences between the coprocessor and built-in floating-point instructions. Both use the IEEE 754 32-bit and 64-bit storage formats.

The AARCH32 architecture defines a Vector Floating-Point (VFP) subarchitecture. The versions used in the Raspberry Pi are shown in Table 16.7.1. VFP provides thirty-two 32-bit registers, s0–s31, each of which can hold one float. These registers can be used in pairs for double (64-bit) operations, as shown in Table 16.7.2. The pairing is not arbitrary: each double register consists of an even-numbered single register together with the one that follows it. For example, registers s0 and s1 can be paired and called d0, but s1 cannot be paired with s2.

Table 16.7.1. Floating-point versions available in different Raspberry Pi models.

Raspberry Pi	ARM CPU	VFP version
Pi Zero	ARM1176JZF-S	VFPv2
Pi 1 A+	ARM1176JZF-S	VFPv2
Pi 1 B+	ARM1176JZF-S	VFPv2
Pi 2 B	Cortex-A7	VFPv4
Pi 3 B	Cortex-A53	VFPv4

Table 16.7.2. Floating-point registers in ARM CPUs running a 32-bit operating system (AARCH32 state). See Table 8.0.1 for the CPUs used in different Raspberry Pi models.

Bank	Float name	Double name	Usage
0	`s0`–`s7`	`d0`–`d3`	Scalar
1	`s8`–`s15`	`d4`–`d7`	Vectorial
2	`s16`–`s23`	`d8`–`d11`	Vectorial
3	`s24`–`s31`	`d12`–`d15`	Vectorial

The VFP registers are arranged in four banks. Bank 0 is scalar, and banks 1–3 are vectorial. The differences between the banks come into play when doing vector computations, which are not covered in this book.

Listing 16.7.3 shows the addition of two floats.

/*
 * addFloat1.c
 * Add two floats.
 * 2017-09-29: Bob Plantz
 */

#include <stdio.h>

int main()
{
  float x = 1.23;
  float y = 4.56;
  float z;

  z = x + y;

  printf("%f + %f = %f\n", x, y, z);

  return 0;
}

Listing 16.7.3. Addition of two floats. (C)

Listing 16.7.4 shows how gcc compiled the C code in Listing 16.7.3.

        .arch   armv6
        .file   "addFloat1.c"
        .section  .rodata
        .align  2
.LC0:
        .ascii  "%f + %f = %f\012\000"
        .text
        .align  2
        .global main
        .syntax unified
        .arm
        .fpu    vfp
        .type   main, %function
main:
        @ args = 0, pretend = 0, frame = 16
        @ frame_needed = 1, uses_anonymous_args = 0
        push    {fp, lr}
        add     fp, sp, #4
        sub     sp, sp, #32
        ldr     r3, .L3
        str     r3, [fp, #-8]       @ float
        ldr     r3, .L3+4
        str     r3, [fp, #-12]      @ float
        vldr.32 s14, [fp, #-8]    @@ load x into fp reg
        vldr.32 s15, [fp, #-12]   @@ load y into fp reg
        vadd.f32  s15, s14, s15   @@ fp add
        vstr.32 s15, [fp, #-16]
        vldr.32 s15, [fp, #-8]
        vcvt.f64.f32  d5, s15     @@ convert x to double
        vldr.32 s15, [fp, #-12]
        vcvt.f64.f32  d7, s15     @@ convert y to double
        vldr.32 s13, [fp, #-16]
        vcvt.f64.f32  d6, s13     @@ convert z to double
        vstr.64 d6, [sp, #8]      @@ pass z on stack
        vstr.64 d7, [sp]          @@ pass y on stack
        vmov    r2, r3, d5        @@ pass x in r2/r3
        ldr     r0, .L3+8         @@ pointer to format string
        bl      printf
        mov     r3, #0
        mov     r0, r3
        sub     sp, fp, #4
        @ sp needed
        pop     {fp, pc}
.L4:
        .align  2
.L3:
        .word   1067282596
        .word   1083304837
        .word   .LC0
        .ident  "GCC: (Raspbian 6.3.0-18+rpi1) 6.3.0 20170516"

Listing 16.7.4. Addition of two floats. (gcc asm)

As pointed out earlier in this book, the gcc compiler generates pre-UAL assembly language. The differences between this and the UAL syntax are greatest with the floating-point instructions, so we will go directly to my solution to this problem in Listing 16.7.5, which uses the UAL syntax.

@ addFloat2.s
@ Adds two floats and prints results
@ 2017-09-29: Bob Plantz

@ Define my Raspberry Pi
        .cpu    cortex-a53
        .fpu    neon-fp-armv8
        .syntax unified         @ modern syntax

@ Constants for assembler
        .equ    arg3,0  @ args to printf
        .equ    arg4,8
        .equ    argSpace,16

@ Constant program data
        .section  .rodata
        .align  2
format:
        .asciz  "%f + %f = %f\n"

@ The program
        .text
        .align  2
        .global main
        .type   main, %function
main:
        sub     sp, sp, 24      @ space for saving regs
                                @ (keeping 8-byte sp align)
        str     r4, [sp, 4]     @ save r4
        str     r5, [sp, 8]     @      r5
        str     r6, [sp,12]     @      r6
        str     fp, [sp, 16]    @      fp
        str     lr, [sp, 20]    @      lr
        add     fp, sp, 20      @ set our frame pointer
        sub     sp, sp, argSpace  @ room to pass args

        vldr    s0, x           @ load x into fp reg
        vldr    s1, y           @ load y into fp reg
        vadd.f32 s2, s1, s0     @ fp add
        
        ldr     r0, formatAddr  @ pointer to format string
        vcvt.f64.f32  d5, s0    @ convert x to double
        vmov   r2, r3, d5       @ pass x in r2/r3
        vcvt.f64.f32  d6, s1    @ convert y to double
        vstr    d6, [sp, arg3]  @ pass y on stack
        vcvt.f64.f32  d7, s2    @ convert z to double
        vstr    d7, [sp, arg4]  @ pass z on stack
        bl      printf

        mov     r0, 0
        add     sp, sp, argSpace   @ deallocate arguments
        ldr     r4, [sp, 4]     @ restore r4
        ldr     r5, [sp, 8]     @      r5
        ldr     r6, [sp,12]     @      r6
        ldr     fp, [sp, 16]    @      fp
        ldr     lr, [sp, 20]    @      lr
        add     sp, sp, 24      @      sp
        bx      lr              @ return

        .align  2
x:
        .float  1.23
y:
        .float  4.56
formatAddr:
        .word   format

Listing 16.7.5. Addition of two floats. (prog asm)

The program in Listing 16.7.5 introduces five floating-point instructions.

VADD

Adds two floats or two doubles.

VADD{<c>}.F32  {<Sd>,} <Sn>, <Sm>    % float
VADD{<c>}.F64  {<Dd>,} <Dn>, <Dm>    % double

<c> is the condition code, Table 9.2.1.
<Sd> and <Dd> are the destination registers; <Sn> and <Sm>, or <Dn> and <Dm>, are the source registers.

All numbers are stored in IEEE 754 format. In the “float” form, the value in <Sm> is added to the 32-bit value in <Sn> and the result is stored in <Sd>. In the “double” form, the 64-bit value in <Dm> is added to the value in <Dn> and the result is stored in <Dd>.

VCVT

Converts between a float and a double.

VCVT{<c>}.F32.F64  <Sd>, <Dm>    % double to float
VCVT{<c>}.F64.F32  <Dd>, <Sm>    % float to double

<c> is the condition code, Table 9.2.1.
<Sd> and <Dd> are the destination registers, and <Sm> and <Dm> are the source registers.

All numbers are stored in IEEE 754 format. In the “double to float” form, the 64-bit value in <Dm> is converted to a 32-bit value and stored in <Sd>. In the “float to double” form, the 32-bit value in <Sm> is converted to a 64-bit value and stored in <Dd>.

VLDR

Loads a value from memory into a floating-point register.

VLDR{<c>}     <Sd>, <label>    % float
VLDR{<c>}     <Dd>, <label>    % double

<c> is the condition code, Table 9.2.1.
<Sd> and <Dd> are the destination registers.
<label> is a programmer-labeled memory address. The labeled address must be within \(\pm 1,020\) bytes of the location of this instruction.

In the “float” form, the 32-bit value stored at the memory location is loaded into <Sd>. In the “double” form, the 64-bit value stored at the memory location is loaded into <Dd>. No format conversions are made.

VSTR

Stores a value in a floating-point register in memory.

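VSTR{<c>}     <Sd>, <label>                % float
VSTR{<c>}     <Dd>, <label>                % double
VSTR{<c>}     <Sd>, [<Rn>{, #<imm>}]       % float, register plus offset
VSTR{<c>}     <Dd>, [<Rn>{, #<imm>}]       % double, register plus offset

<c> is the condition code, Table 9.2.1.
<Sd> and <Dd> are the source registers.
<label> is a programmer-labeled memory address. The labeled address must be within \(\pm 1,020\) bytes of the location of this instruction.
<Rn> is a base register and <imm> is an optional byte offset, a multiple of 4 within \(\pm 1,020\). This is the form used in Listing 16.7.5, for example vstr d6, [sp, arg3].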

In the “float” form, the 32-bit value in <Sd> is stored at the memory location. In the “double” form, the 64-bit value in <Dd> is stored at the memory location. No format conversions are made.

VMOV

Transfers a 64-bit value between a floating-point register and two integer registers.

VMOV{<c>}      <Dm>, <Rt>, <Rt2>    % to double
VMOV{<c>}      <Rt>, <Rt2>, <Dm>    % to int

<c> is the condition code, Table 9.2.1.
<Dm> is the floating-point register. <Rt> is the low-order 32-bit portion and <Rt2> is the high-order 32-bit portion in the integer registers.

In the “to double” form, the two 32-bit values in <Rt2> and <Rt> are copied into <Dm>. In the “to int” form, the 64-bit value in <Dm> is copied into the <Rt2> and <Rt> registers. No format conversions are made.