Technical information on 26/32-bit RISC OS binary interfaces      version 0.2
============================================================      05 Sep 2002


This document summarises information on the changes to the RISC OS binary
interfaces between 26 and 32-bit versions of RISC OS. Although details may be
subject to change, it is believed accurate, and changes will not be taken
lightly. Hence, it should provide a basis for developers to ensure that their
software is compatible with future systems.

Future systems may not contain IOMD/VIDC compatible peripheral sets.
This will introduce extra issues for device drivers - normal applications
should not be accessing hardware directly, so they should not be affected.
Information on support for different peripheral worlds will be made
available at a later date.

A lot of the information in this document is at a much lower level than the
majority of software developers require. Developers producing standard
Desktop applications written in C or BASIC will probably find the basic
"32-bit forward compatibility release" ReadMe file sufficient.



                                  GENERAL
                                  =======

A version of RISC OS will either be 26-bit, or 32-bit, with no in-between
state.

When running 32-bit, RISC OS will make no use of 26-bit modes, and will call
all routines in a 32-bit mode. Although programs could use 26-bit modes (if
available on the processor), there will be no run-time selection of whether
to call program entry points in 26 or 32 bit mode, and the memory map may
preclude the use of 26-bit modes (eg the RMA being above 64M). However, hooks
will be available to permit a 26-bit emulator.

When running 26-bit, RISC OS will make no more use of 32-bit modes than it
does currently, and the same restrictions on 32-bit code will apply as now.

Any system running on an ARM 9/10/XScale etc will have to be 32-bit.

Software should be modified to work whether it is called in a 26-bit or a
32-bit mode. For pure C applications and modules, this can just be a
recompile; the compiled code will then run on both existing 26-bit systems,
back to RISC OS 3.1, and 32-bit systems. A new Shared C Library will be
required to support 32-bit programs on old systems.

Assembler modules should also run on either 26 or 32-bit systems. To achieve
this, MSR and MRS instructions will often be needed to manipulate the PSR;
where they cannot be avoided, the module becomes incompatible with RISC OS 3.1
(which ran on 26-bit only ARMv2 processors that do not support MRS/MSR).

Most of the differences can be hidden inside macros - use macros instead of
using TEQP directly, and 26/32-bit forms can be selected at compile time by
build switches. You will either have a module that uses TEQP that will work
on 26-bit versions of RISC OS, or a module that uses MSR which will work on
RISC OS 3.5 or later. With a bit more effort, run-time selection is possible
(some examples are shown below).

The 32-bit RISC OS API is largely unmodified - almost all binary APIs will
act the same in a 32-bit system as they do now, except that they will be
called in a 32-bit version of the documented processor mode.

One side-effect of this is that R14 on entry to routines will contain just
the return address, with no flags. Hence to preserve flags, the CPSR must be
stacked on entry, and restored on exit. This is cumbersome, but can be hidden
inside macros. Note that this behaviour is then slightly different - you are
preserving flags across the call, not restoring the flags passed in in R14.
Most of the time this doesn't matter, as the API is documented in terms of
preserving flags. There are exceptions to this rule, notably SWI handling.
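
As a minimal sketch of flag preservation across a call (assuming RISC OS 3.1
support is not needed, so MRS and MSR can be used), the routine below keeps
the entry flags in a stacked register rather than stacking the CPSR itself.
The routine name and its body are purely illustrative:

ClampToTen
        STMFD   R13!, {R4, R14}
        MRS     R4, CPSR                ; note the flags as they were on entry
        CMP     R0, #10                 ; example work that corrupts NZCV
        MOVHI   R0, #10
        MSR     CPSR_f, R4              ; put back NZCV only
        LDMFD   R13!, {R4, PC}          ; no ^, so valid in 26 and 32-bit modes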

On a 32-bit system, SWIs are no longer expected to preserve the N Z and C
flags. They may still set/clear them to return results. 32-bit code should
not assume that SWIs preserve flags. Requiring flag preservation would impose
an unacceptable burden on SWI dispatch. This effectively respecifies *all*
SWIs by changing the default rule in PRM 1-29. Also, it becomes impossible
for SWIs outside the Kernel to depend on the NZCV flags on entry. SWIs inside
the Kernel, such as OS_CallAVector, can still manipulate flags freely. This
should not be an onerous restriction; it is impossible to specify entry flags
for SWIs in C or BASIC, for example.

Many existing APIs do not actually require flag preservation, such as service
call entries. In this case, simply changing MOVS PC... to MOV PC... and LDM
{}^ to LDM {} is sufficient to achieve 32-bit compatibility.
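
For illustration, a service call handler converted in this way might look like
the following sketch. Service_Reset (&27) is an existing service call, but the
handler body and the ReinitialiseState routine are hypothetical:

Service_Reset   EQU     &27

Service
        TEQ     R1, #Service_Reset
        MOVNE   PC, R14                 ; was MOVNES PC, R14
        STMFD   R13!, {R0-R3, R14}
        BL      ReinitialiseState       ; hypothetical, defined elsewhere
        LDMFD   R13!, {R0-R3, PC}       ; was LDMFD R13!, {R0-R3, PC}^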

A flags word, pointed to by a new module header entry, contains a bit which
is used to indicate that the module is 32-bit ready. It is therefore
essential that all modules are updated to add this flag, even if they are
otherwise 32-bit clean.


                               PSR MANIPULATIONS
                               =================

To just set and clear NZCV flags you can use macros which do the right
thing for the different processor types. To actually preserve flags, you will
probably be forced to use MRS and MSR instructions. These are NOPs on pre-
ARM 6 ARMs, so you may be able to do clever stuff to keep ARM2 compatibility.

Some example macros are supplied here in Libraries.Hdr - set the logical
switches No26bitCode and No32bitCode as required, then GET Hdr:CPU.Generic26
and Hdr:CPU.Generic32. "No26bitCode" means don't rely on 26-bit instructions
(eg TEQP and LDM ^) - the code will work on 32-bit systems. "No32bitCode" means
don't rely on 32-bit instructions (eg MSR and MRS) - the code will work on
RISC OS 3.1. Setting both to {TRUE} is too much for the macros to cope with -
you will have to use run-time code as shown below.

The recommended general-purpose code to check whether you're in a 26-bit mode
is:

        TEQ     R0, R0          ; sets Z (can be omitted if not in User mode)
        TEQ     PC, PC          ; EQ if in a 32-bit mode, NE if 26-bit

As another example, here is the case of calling a SWI from an IRQ routine:

        TEQ     PC, PC
        MRSEQ   R8, CPSR
        MOVNE   R8, PC
        ORR     R9, R8, #3      ; IRQ26->SVC26, IRQ32->SVC32
        MSREQ   CPSR_c, R9
        TEQNEP  R9, #0
        NOP                     ; NOP to avoid problems on ARM2
        STR     R14, [R13, #-4]! ; faster than STMFD on some new processors
        SWI     XOS_AddCallBack
        LDR     R14, [R13], #4   ; ditto
        TEQ     PC, PC
        MSREQ   CPSR_c, R8
        TEQNEP  R8, #0
        NOP

(Theoretically one could engineer for the TEQP to occur before the MSR and
hence have the MSR be the required NOP for the ARM 2, but this would in
turn hit the StrongARM bug detailed below.)

The complexity of the above example occurs because of the need to support
pre-ARM 6 processors that don't have the MRS and MSR instructions (ie
RISC OS 3.1 machines). If RISC OS 3.1 support is not required, it reduces to:

        MRS     R8, CPSR
        ORR     R9, R8, #3      ; IRQ26->SVC26, IRQ32->SVC32
        MSR     CPSR_c, R9
        STR     R14, [R13, #-4]! ; faster than STMFD on some new processors
        SWI     XOS_AddCallBack
        LDR     R14, [R13], #4   ; ditto
        MSR     CPSR_c, R8

This is possible because the MRS and MSR instructions are available on ARM6
and ARM7 processors even when running in 26-bit mode.

Sometimes you may be forced to play with the SPSR registers. Beware:
interrupt code will corrupt SPSR_svc if it calls a SWI. Existing interrupt
handlers know to preserve R14_svc before calling a SWI, but not SPSR_svc.
Hence you MUST disable interrupts around SPSR manipulations; the SPSR is not
suitable as a general mechanism for PSR restoration on function return.

Disabling Interrupts
--------------------
The I-bit in the CPSR is at a different position to the I-bit in the PC in
26-bit mode.

To disable interrupts on a 26-bit system:

        MOV     R14,PC          ; turn off interrupts
        TST     R14,#&08000000
        TEQEQP  R14,#&08000000
        ...
        TEQP    R14,#0          ; restore interrupt state

If RISC OS 3.1 support is not required, this can be replaced with:

        MRS     R14,CPSR
        ORR     R0, R14, #&80
        MSR     CPSR_c, R0      ; not conditional for StrongARM (see below)
        ...
        MSR     CPSR_c, R14     ; restore interrupt state
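
The two sequences above could be hidden behind a pair of macros selected by
the No26bitCode build switch. This is only a sketch: the macro names are
hypothetical and are not the ones supplied in Hdr:CPU.Generic26/32, and it
assumes No26bitCode has already been declared and set as a logical variable:

        MACRO
$label  ExampleIRQsOff $old, $tmp
$label
      [ No26bitCode
        MRS     $old, CPSR              ; 32-bit form (RISC OS 3.5 or later)
        ORR     $tmp, $old, #&80
        MSR     CPSR_c, $tmp
      |
        MOV     $old, PC                ; 26-bit form: PSR is held in R15
        ORR     $tmp, $old, #&08000000
        TEQP    $tmp, #0
      ]
        MEND

        MACRO
$label  ExampleIRQsRestore $old
$label
      [ No26bitCode
        MSR     CPSR_c, $old            ; restores I, F and mode bits only
      |
        TEQP    $old, #0                ; restores the NZCV flags as well
      ]
        MEND

A caller in a privileged mode would then write "ExampleIRQsOff R14, R0" before
the critical region and "ExampleIRQsRestore R14" after it. Note that the two
forms differ in whether the condition flags are restored.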

StrongARM MSR bug
-----------------
The StrongARM has a bug in its implementation of MSR which should be borne
in mind, particularly when attempting to write 26/32-bit neutral code (cf
the SWI call from IRQ mode example above).

The bug is triggered when:

     a) the processor is in a privileged mode,
     b) the last instruction was an MSR to the CPSR with the 'c' bit set,
     c) the MSR was not executed due to the condition test failing,
     d) and the MSR was not the last instruction in a cache line.

In this circumstance, the instruction following the MSR will be executed
twice. To avoid this problem, there are a number of approaches, depending
on how cautious you wish to be:

     1) follow all MSR CPSR_c[xsf] instructions with a NOP, much like TEQP;
     2) don't use conditional MSR CPSR_c[xsf] instructions;
     3) follow all conditional MSR CPSR_c[xsf] instructions with a NOP;
     4) ensure all instructions following conditional MSR CPSR_c[xsf]
        instructions are idempotent, ie can be executed multiple times
        without ill effects. This then excludes instructions like BL
        or SWI, instructions with the same source and destination
        register, or loads and stores with base register writeback.

This bug only affects the StrongARM processor, but is present in all
current revisions.


                                 MODULE ENTRIES
                                 ==============

Most module entries are treated the same in the 32-bit world, except they
will be entered in a 32-bit mode, and hence R14 will be a return address
with no flags. This section also clarifies flag significance on 26-bit
systems.

Any given system will only be 26 or 32-bit, so it is possible to note the
system type in your initialisation routine by checking the processor mode,
rather than checking on every entry point.
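
A minimal sketch of this approach follows. It assumes it is called in a
privileged mode (so no instruction is needed to set Z first) and that R12
already points at the module's workspace; the routine name and the Is32bit
workspace offset are hypothetical:

Is32bit EQU     0                       ; hypothetical workspace offset

NoteSystemType
        TEQ     PC, PC                  ; EQ if 32-bit mode, NE if 26-bit
        MOVEQ   R0, #1
        MOVNE   R0, #0
        STR     R0, [R12, #Is32bit]     ; remember for later entry points
        MOV     PC, R14                 ; flags are not preserved

Other entry points can then load this word rather than repeating the test.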

Init,Final
----------
Unchanged. Flag preservation not required - only V on exit is looked at.

Service
-------
Unchanged. Flags on exit ignored.

Help/command
------------
Unchanged. Flag preservation not required - only V on exit is looked at.

SWI decoding code
-----------------
Unchanged. Flag preservation not required - V on exit is looked at in number
to text case.

SWI handler
-----------
On 26 bit systems, R14 is a return address (inside the Kernel) with the
user's NZCIF flags in it, V clear, mode set to SVC. The CPSR NZCV
flags on exit are then passed back to the SWI caller. Hence MOVS PC,R14
preserves the SWI caller's NZC flags and clears V. The NZ and C flags in the
current PSR on entry are undefined, and are NOT the caller's (but V is
clear). Thus you can simply read, modify and preserve the caller's flags.

On 32 bit systems, R14 is a return address only. There is no way of
determining the caller's flags, so you are not expected to preserve them. The
NZC and V flags you exit with will be returned to the caller.

If writing a new module, simply specify that all your SWIs corrupt flags,
then your SWI dispatchers can return with MOV PC,R14, regardless of whether
running on a 26 or 32 bit system.
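
As a sketch of such a handler, the fragment below implements a single SWI at
offset 0 of the chunk and returns an error for anything else. All names, the
result value and the error block are hypothetical; V is cleared and set using
compare instructions only, so no PSR manipulation is needed:

MyErrorNumber   EQU     0               ; placeholder: use an allocated number

MySWIHandler
        CMP     R11, #0                 ; R11 = SWI offset within the chunk
        BNE     MySWIUnknown
        MOV     R0, #42                 ; example result
        CMP     R0, #0                  ; subtracting 0 cannot overflow: V clear
        MOV     PC, R14                 ; exit, flags corrupted, V clear
MySWIUnknown
        ADR     R0, MyErrorBlock
        CMP     R0, #&80000000          ; this pair of compares leaves V set
        CMNVC   R0, #&80000000          ; whatever the value of R0
        MOV     PC, R14                 ; exit, flags corrupted, V set
MyErrorBlock
        DCD     MyErrorNumber
        DCB     "Unknown Example SWI", 0
        ALIGN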

If converting an existing module to run on 32-bit, it is highly recommended
that the same binary continue to work on 26-bit systems. You should therefore
take steps to preserve flags when running in a 26-bit mode, if the module did
before. When running on a 32-bit system, you needn't preserve flags. The
following wrapper around the original SWI entry (converted to be 32-bit safe)
achieves this, assuming you always want NZ preserved on a 26-bit system.

        Push    R14
        BL      Original_SWI_Code ; NZ(C) corrupted, (C)V set
        Pull    R14
        TEQ     PC,PC             ; are we in a 32-bit mode?
        MOVEQ   PC,R14            ; 32-bit exit: NZ corrupted, CV passed back
      [ PassBackC
        BICCC   R14,R14,#C_bit    ; Extra guff to pass back C as well
        ORRCS   R14,R14,#C_bit
      ]
        MOVVCS  PC,R14            ; 26-bit exit: NZ(C) preserved, V clear
        ORRVSS  PC,R14,#V_bit     ; 26-bit exit: NZ(C) preserved, V set

This is cumbersome, but it can be removed when backwards compatibility is no
longer desired. The alternative, which would be to pass in caller flags in
R14, would impose a permanent carbuncle on the 32-bit API.

Module flags
------------
This is a new module header entry at &30. It is an offset to the module
flags word(s). The first module flag word looks like:

          Bit 0       Module is 32-bit compatible
          Bits 1-31   Reserved (0)

Non 32-bit compatible modules will not be loaded by a 32-bit RISC OS.
If no flags word is present, the module is assumed to be 26-bit only.
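
As an illustration, a minimal module header carrying the flags word might be
laid out as follows. The entries up to &2C follow the existing module header
format; the module name, help string and labels are hypothetical, and unused
entries are left as zero:

Module_BaseAddr
        DCD     0                               ; &00 start code
        DCD     0                               ; &04 initialisation code
        DCD     0                               ; &08 finalisation code
        DCD     0                               ; &0C service call handler
        DCD     Title - Module_BaseAddr         ; &10 title string
        DCD     Help  - Module_BaseAddr         ; &14 help string
        DCD     0                               ; &18 help/command table
        DCD     0                               ; &1C SWI chunk base
        DCD     0                               ; &20 SWI handler
        DCD     0                               ; &24 SWI decoding table
        DCD     0                               ; &28 SWI decoding code
        DCD     0                               ; &2C messages file name
        DCD     Flags - Module_BaseAddr         ; &30 offset to module flags
Title   DCB     "Example", 0
Help    DCB     "Example", 9, "0.01 (05 Sep 2002)", 0
        ALIGN
Flags   DCD     1                               ; bit 0: 32-bit compatible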



                              ENVIRONMENT HANDLERS
                              ====================

Undefined instruction handler
-----------------------------
32 bit system: Now called in UND32 mode. No preveneer.
26 bit: as before

Prefetch abort, data abort
--------------------------
32 bit system: Now called in ABT32 mode. No preveneer.
26 bit: as before

On a 32-bit system, there will be an Abort mode stack.

Error
-----
32 bit system: USR32 mode. PC contains no PSR flags.
26 bit: as before - PC contains PSR flags, but may not be reliable.

BreakPoint
----------
32 bit system: register block must be 17 words long.
               contains R0-R15,CPSR.
               entered in SVC32 mode
26 bit: as before

Handlers can check format by looking at mode on entry to handler - the
correct 26 or 32-bit version of the handler should be called as appropriate.

The following code is suitable to restore the user registers and return in
the 32-bit case:

        ADR     R14, saveblock          ; get address of saved registers
        LDR     R0, [R14, #16*4]        ; get user PSR
        MRS     R1, CPSR                ; get current PSR
        ORR     R1, R1, #&80            ; disable interrupts to prevent
        MSR     CPSR_c, R1              ; SPSR_SVC corruption by IRQ code
        MSR     SPSR_cxsf, R0           ; put it into SPSR_SVC
        LDMIA   R14, {R0-R14}^          ; load user registers
        MOV     R0, R0                  ; no-op after forcing user mode
        LDR     R14, [R14, #15*4]       ; load user PC into R14_SVC
        MOVS    PC, R14                 ; return to correct address and mode

Escape
------
32 bit system: as before, but called in SVC32

Event
-----
32 bit system: as before, but in IRQ32 or SVC32

Exit
----
32 bit system: as before, but in USR32

Unused SWI
----------
26 bit system: called in SVC26 mode.
               R14 = a return address in the Kernel, with NZCVIF flags the
                     same as the caller's (except V clear).
               PSR flags undefined (except I+F as caller)
32 bit system: called in SVC32 mode.
               R14 = return address in the Kernel
               No way to determine caller condition flags
               PSR flags undefined (except I+F as caller)

UpCall
------
32 bit system: as before, but SVC32 mode

CallBack
--------
32 bit system: register block must be 17 words long.
               contains R0-R15,CPSR.
               entered in SVC32 mode, IRQs disabled
26 bit: as before

Handlers can check format by looking at mode on entry to handler.

The following code is suitable to restore the user registers and return
in the 32-bit case:

        ADR     R14, saveblock          ; get address of saved registers
        LDR     R0, [R14, #16*4]        ; get user PSR
        MSR     SPSR_cxsf, R0           ; put it into SPSR_SVC/IRQ
        LDMIA   R14, {R0-R14}^          ; load user registers
        MOV     R0, R0                  ; no-op after forcing user mode
        LDR     R14, [R14, #15*4]       ; load user PC into R14_SVC/IRQ
        MOVS    PC, R14                 ; return to correct address and mode

Exception registers
-------------------
32 bit system: block must be 17 words long.
               will contain R0-R15,PSR
Exception handlers can determine block format by looking at mode on entry
to handler.


                         SOFTWARE VECTORS
                         ================

Software vectors have a number of different properties. They can be called
under a variety of conditions, and the flags they exit with may or may not
be significant.

When called using OS_CallAVector, the caller's NZCV flags always used to be
passed in in R14, and the claimant's flags on exit would be passed back.

In a 32-bit system, the caller's flags are not passed in R14. The caller's C
and V flags are visible in the PSR though, just as in a 26-bit system. N and Z
are not visible. Again, exit flags are passed back.

Most vectors are not intended to be called with OS_CallAVector, and their
exit flags have never had significance, for example KeyV, EventV and TickerV.

Others are vectored SWIs, such as ByteV and ReadLineV. These pass back
C and V flags only.

A few vectors, like RemV, attach significance to entry flags. If not
claiming, you mustn't change those flags for the next callee. In 26-bit mode
this might have been achieved by:

        CMP     R1,#mybuffer
        MOVNES  PC,LR

In the 32-bit world, you could change the CMP to a TEQ to preserve C and V,
or you could use something like:

        Push    R14
        MRS     R14, CPSR
        CMP     R1, #maxbuffers
        BLS     handleit
        MSR     CPSR_f, R14
        Pull    PC
handleit
        ...

                            EXPANSION CARDS
                            ===============

Expansion card headers may contain loaders. These must be 32-bit compatible
to work on a 32-bit system. 32-bit compatibility is indicated by the fifth
word of the loader header containing "32OK". Expansion card loader entry
points are always called with V in the PSR being clear, even on existing
systems, so MOV PC, R14 is an adequate non-error return in the simplest case,
rather than the BICS PC, R14, #V_bit shown in the PRM. Loaders that are not
32-bit compatible will be faulted with an error.


                               MEMORY MAP
                               ==========

The memory map of a 32-bit system will be considerably different. In
particular, the RMA, screen memory, ROM and usually I/O space will all be
differently located. Application space will remain based at &8000. Use system
calls to find the address of locations.

Privileged mode stacks (SVC, IRQ, UND and ABT) will move, but will remain
based on a one megabyte boundary.

IOMD-compatible systems may optionally map I/O space in at the traditional
locations (&03000000 and &88000000). However, this disrupts the memory map and
limits the maximum size of applications.


                           MISCELLANEOUS SWIS
                           ==================

The behaviour of some SWIs has been changed for 32-bit systems. For example,
the entry parameters to OS_HeapSort, OS_SubstituteArgs and OS_ReadLine use a
register to hold a 26 bit address with flags in the spare bits. In a 32-bit
environment the register can only be used to hold the address and the flags
will move to another register. Details of the calls that have changed will be
supplied in a separate document.

For backwards compatibility a new module will be developed for existing
versions of RISC OS. It will intercept calls to these SWIs and if it sees the
new format in use it will adjust the registers before calling the original
SWI. This module can be distributed with any software that uses the new form
of the SWIs. Alternatively, software may be written to adapt depending on the
version of the OS that is detected.

OS_EnterOS
----------
On a 32-bit system, if called in a 26-bit mode, takes you into SVC26, else
into SVC32.

History
-------
v0.2  5 Sep 2002 (PS)  Added IRQ example code.
                       Added NOP after TEQP to example calling a SWI
                       from an IRQ routine.
                       Added details of OS_ReadLine and SWI changes.


END.
