FPU assembler programming
By Erik H. Bakke
Written 13/10-93
--- I ------------------------- INTRODUCTION ---------------------- I ---
1.1 Introduction
Many people have asked me to explain how to program the 68881/68882/040
floating point coprocessors, and here it is, a guide in the "magic art"
I have tried to keep this text as system neutral as possible, but it
may, as the other articles in this series be influenced by the fact
that I usually program at Amiga computers.
If you need more information about the topics discussed herein, please
contact the author.
1.2 Index
Chapter I--------Introduction--------------------
1.1 Introduction
1.2 Index
Chapter II-----The Coprocessor interface---------
2.1 The Interface mechanics
2.2 The Floating Point Coprocessor
Chapter III----Floating Point Programming--------
3.1 Floating Point Data Formats
3.2 Floating Point Constant ROM
3.3 Floating Point Instructions
1...Data transfer instructions
2...Dyadic operations
3...Monadic operations
4...Program control instructions
5...System control instructions
Chapter IV-------The 68040 FPU-------------------
4.1 Differences
4.2 Instruction set
Chapter V------------Sources---------------------
5.1 Sourcecodes
--- II -------------------THE COPROCESSOR INTERFACE -------------- II ---
2.1 The Interface Mechanics
A coprocessor may be thought of as an extension to the main CPU,
extending its register set and instructions.
Different coprocessors that can be interfaced to the 68020+ CPU's
are the 68881/2 FPU and 68851 MMU.
Coprocessor instructions are placed inline with ordinary CPU codes,
all recognized by being LINE-F instructions. (Having the op-code
format of $Fxxx) In assembler, they are generally noted as cpXXXX
instructions.
The coprocessors require a communication protocol with the CPU for
various reasons:
-----------------------------------------------------------
1. The CPU must recognize that a coprocessor is to receive
the LINE-F op-code, and establish contact with that
coprocessor.
2. The coprocessor may need to signal it's internal status
to the CPU.
3. The coprocessor may need to read/write data to/from
system memory or CPU registers.
4. The coprocessor may have to inform the CPU of error
conditions, such as an illegal instruction or divide by
zero. The CPU will have to process the corresponding
exceptions.
-----------------------------------------------------------
This protocol is called the MC68000 coprocessor interface. Knowledge
of this interface is not required of a programmer who wishes to
utilize a coprocessor, therefore I will not go into specific detail
about the interface, but briefly sum up the main mechanisms.
The coprocessor instructions are F-line instructions that have
all bits in the upper nibble set to generate $Fxxx op-codes.
Up to 8 coprocessors may reside on the bus (The Amiga coprocessors
are NOT part of this system, and should not be counted in).
Each of these co-processors have their own 3-bit address.
Two such addresses are reserved by Motorola:
%000 MC68851 PMMU
%001 MC68881/2 FPU
It is perfectly possible to install 6 FPU's in the same
system.
The general format of a coprocessor op-code is shown below:
15 1211 9 8 6 5 0
================================
1 1 1 1 Cp-ID Type Instruction dependent
Followed by a number of coprocessor defined extension words and
effective address extension words.
If the instruction is not accepted by the coprocessor it is
addressed to (if the CP is not present) the CPU will take
an F-line exception.
2.2 The Floating Point Coprocessor
The Motorola floating point coprocessor has the number 68881 or
68882. The 68882 is considerably faster than the 881, due
to optimized internal design. In addition, the 68040 CPU has an
internal FPU has is even faster than the 882. There are other
differences between the 881/2 and the 040, but I'll return to
those later. The 68881 and 68882 are pin compatible, and available
in speeds up to 20Mhz and 50MHz respectively
The FPU implements IEEE compatible floating point formats, and
implements instructions to perform arithmetics on these formats,
as well as several trancendental functions, such as SIN(x),E^x
and so on. In addition the FPU has an on-chip constant ROM where
different mathematical constants are stored.
2.2.1 The floating point registers
The FPU has 8 floating point registers, each 80 bits wide.
These are named FP0-FP7, just as D0-D7 in the CPU. In addition
the FPU have 3 32-bit registers:
Control Register FPCR
31.................15..........7..........0
Exeception Mode
Enable Control
Status Register FPSR
31.......23........15..........7..........0
Condition Quotient Accrued Exception
Codes Exception Status
Instruction Address Register FPIAR
31.......23........15..........7..........0
2.2.1.1 Floating point data registers
The data registers always contain an 80 bit wide extended precision
floating point number. Before any floating point data is used in
calculation, it is converted to extended-precision.
For example, the instruction FMOVE.L #10,FP3 converts the
longword #10 to extended precision before transferring it to register
FP3. All calculations with the FPU uses the internal registers
as either source or destination, or both.
2.2.1.2 Floating Point Status Register
This register is split in two bytes, the exception enable byte, and
the mode control byte.
2.2.1.2.1 Exception Enable byte
This register contains a bit for each of the possible eight
exceptions that may be generated by the FPU. Setting or clearing
one of these bits will enable/disable the corresponding exception.
The exception bytes are organized this way:
Bit Name Meaning
========================
7 BSUN Branch/Set on UNordered
6 SNAN Signalling Not A Number
5 OPERR OPerand ERRor
4 OVFL OVerFLow
3 UNFL UNderFLow
2 DZ Divide by Zero
1 INEX2 INEXact operation
0 INEX1 INEXact decimal input
2.2.1.2.2 Mode control byte
This register controls the rounding modes and rounding precisions.
A result may be rounded or chopped to either double, single or
extended precision. For most usage of the FPU, however, this
register could be set to all zeroes, which will round the result
to the nearest extended precision value.
Mode control byte:
Bit Name Meaning
========================
7 PREC1 Precision bit 1
6 PREC0 Precision bit 0
5 RND1 Rounding bit 1
4 RND0 Rounding bit 0
3 ---- -----
2 ---- -----
1 ---- -----
0 ---- -----
PREC=00 Round to extended precision
PREC=01 Round to single precision
PREC=10 Round to double precision
RND=00 Round toward nearest possible number
RND=01 Round toward zero
RND=10 Round toward the smallest number
RND=11 Round toward the highest number
2.2.1.3 Floating point Status Register
This register is just what you may think it is, the parallell to
the CPU CCR register, and reflects the status of the last
floating point computation. The quotient byte is used with
floating point remaindering operations. The exception status
byte tells what exceptions that occured during the last operation.
The accrued exception byte contains a bitmask of the exceptions
that have occurred since the last time this field was cleared.
Status bits:
Bit Name Meaning
=============================
7 ---- -----
6 ---- -----
5 ---- -----
4 ---- -----
3 N Negative
2 Z Zero
1 I Infinity
0 NAN Not A Number
2.2.1.4 Floating point Instruction Address Register
This register contains the address of the instruction currently
executing. The FPU can execute instructions in parallell with
the CPU, so that time-consuming instrcutions, such as division
and multiplication don't tie up the CPU unnecessary. This means
that if an exception occurs in the floating point operation,
the address that are pushed on to the stack is not necessarily
the address of the instruction that caused the exception.
The exception handler would have to read this register to find
the address of the offending instruction.
--- III ----------------FLOATING POINT PROGRAMMING ------------- III ---
3.1 Floating Point Data Formats
The FPU can handle 3 integer formats, and 2 IEEE compatible formats.
In addition, it has an extended-precision format and can handle a
Packed-Decimal Real format.
3.1.1 Integer Formats
The 3 integer formats that are supported by the FPU are compatible
with the formats used by the 68000 CPU's. They are Byte (8 bits),
Word (16 bits), and Longword (32 bits).
3.1.2 Real Formats
The FPU supports 4 real formats, the Single precision (32 bits),
Double precision (64 bits), Extended precision (80 bits), and
Packed-decimal string (80 bits)
3.1.2.1 Single Precision
The single precision format is indicated with the extension .S
It consists of a 23-bit fraction, an 8-bit exponent, and 1 bit
indicating the sign of the fraction. The Single Precision format
is defined by IEEE and uses excess-127 notation for the exponent.
A hidden 1 is assumed as the most significant digit of the fraction.
The format is defined as follows:
30 22 0
S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF
S=Sign of fraction
E=Exponent
F=Fraction
The single precision format takes 4 bytes when written to memory.
3.1.2.2 Double Precision
The double precision format is indicated with the extension .D
It consists of a 52-bit fraction, an 11-bit exponent, and 1 bit
indicating the sign of the fraction. As the single precision,
this format is also defined by the IEEE, and uses excess-1023
notation for the exponent. A hidden 1 is assumed as the most
significant digit of the fraction. The format is defined as
follows:
62 51 0
S EEEEEEEEEEE FFFFFFF........FFFFFFFFFF
S=Sign of fraction
E=Exponent
F=Fraction
The double precision format takes 8 bytes when written to memory.
3.1.2.3 Extended precision
The extended precision is indicated with the extension .X
This is the format that is used in all computations, and
consists of a 64-bit mantissa, a 15-bit exponent and 1 bit
indicating the sign of the mantissa. A hidden 1 is not assumed,
so all digits of the mantissa are present. Excess-16383
is used for the exponent. When data of this format is written
to memory, it is "exploded" by 16 zero-bits between the mantissa
and the exponent in order to make it longword aligned.
The extended-precision format is defined as follows:
79 63 0
S EEEEEEEEEEEEEEE MMMMMMMMMMMMMMM............MMMMM
When written to memory, it looks like this:
94 80 63 0
S EEEEEEEEEEEEEEE 000...000 MMMMMMMM...........MMM
When written to memory, this format takes 12 bytes when written
to memory
3.1.2.4 Packed-decimal string
To simplify input/output of floating point numbers, a special
96-bit packed-decimal format.
This format consists of 17 BCD digits mantissa, some padding
bits, 4 BCD digits exponent, 2 control bits, 1 bit indicating
the sign of the exponent, and 1 indicating the sign of the
mantissa.
Bits 68-79 are stored as zero bits unless an overflow occurs
during the conversion.
Positive and negative infinity is represented by numbers that are
outside the range of the floating point representation used.
If the result of an operation has no mathematical meaning, a NAN
is produced. In the case of a NAN or infinity, bits 92 and 93
are both 1.
3.2 Floating point Constant ROM
The FPU have an on-chip ROM where frequently used mathematical
instructions are stored. How to retrieve these constants will be
discussed below. Each constant has its own address in the ROM:
Offset Constant
=============================
$00 Pi
$0b Log10(2)
$0c e
$0d Log2(e)
$0e Log10(e)
$0f 0.0
$30 ln(2)
$31 ln(10)
$32 10^0
$33 10^1
$34 10^2
$35 10^4
.
.
.
$3e 10^2048
$3f 10^4096
3.3 Floating Point Instructions
The FPU provides an extension to the normal 68000 instruction set.
The floating point instructions can be divided into 5 groups:
1...Data transfer instructions
2...Dyadic instructions (two operands)
3...Monadic instructions (one operand)
4...Program control instructions
5...System control instructions
The syntax and addressing modes for these instructions are the same
as those of the ordinary 68000 instructions.
3.3.1 Data Transfer Instructions
Instruction Syntax Op. Sizes Operation
==================================================================
FMOVE FPm,FPn | X | The FMOVE instruction
FMOVE ,FPn | B/W/L/S/D/X/P | copies data from the
FMOVE FPm, | B/W/L/S/D/X | source operand to the
| | destination operand
==================================================================
FMOVE FPm,(#k) | P | When writing a .P real, an
FMOVE FPm,(Dn) | P | optional rounding precision
| | may be specified as a constant
| | or in a data register
| | See below for details
==================================================================
FMOVE ,FPcr | L | These two FMOVE's copies
FMOVE FPcr, | L | data to/from control registers
==================================================================
FMOVECR #ccc,FPn | X | This instruction retrieves a
| | constant from the ROM, where
| | ccc is the offset to be read
==================================================================
FMOVEM , | L/X | Moves multiple FP registers
FMOVEM ,Dn | X | The register list may be
FMOVEM , | L/X | specified as in 68000 assembler,
FMOVEM Dn, | X | or be contained as a bitmask
| | in a data register
==================================================================
When writing floating point numbers to the memory using the .P
format, an optional precision may be specified as a constant or
in a data register.
Meaning of the precision:
-64<=k<=0 Rounded to |k| decimal places
0,FPn
Instruction Function
====================================================
FABS | Absolute value
FACOS | FP Arc Cosine
FASIN | FP Arc Sine
FATAN | FP Arc Tangent
FATANH | FP Hyperbolic Arc Tangent
FCOS | FP Cosine
FCOSH | FP Hyperbolic Cosine
FETOX | FP e^x
FETOXM1 | FP e^(x-1)
FGETEXP | Get exponent
FGETMAN | Get mantissa
FINT | FP Integer
FINTRZ | Get integer and round down
FLOGN | FP Ln(n)
FLOGNP1 | FP Ln(n+1)
FLOG10 | FP Log10(n)
FLOG2 | FP Log2(n)
FNEG | Negate a floating point number
FSIN | FP Sine
FSINH | FP Hyperbolic Sine
FSQRT | FP Square Root
FTAN | FP Tangent
FTANH | FP Hyperbolic Tangent
FTENTOX | FP 10^x
FTWOTOX | FP 2^x
====================================================
There is one more monadic operation that uses a double
destination operand, the FSINCOS instruction:
FSINCOS ,FPc:FPs Calculates sine and cosine
FSINCOS FPm,FPc:FPs of the same argument
All trigonometric operations operate on values in
radians.
3.3.4 Program Control Instructions
3.3.4.1 Instructions
This group of instructions allows control of program flow based
on condition codes generated by the FPU. These instructions are
parallells to the 68000 instructions with the same names.
Instruction Formats Operation
=====================================================================
FBcc