ObjAsm
======

Arm Assembler
See the Acorn Assembler manual for further documentation.

This file details the changes to ObjAsm since version 4.00 (April 2011).
For details of the much more extensive changes between the previous major
release in 2002 and this version, see Appendix A of the manual.

Changes from ObjAsm 4.00 to 4.01
================================

Update to support VFP and Advanced SIMD instruction sets.

* 248 new instructions, including both pre-UAL and UAL forms
* 1 new pseudo-instruction: VMOV2
* 1 new built-in variable: {UAL}

Changes from ObjAsm 4.01 to 4.02
================================

ARMv7VE and ARMv8 support, plus numerous bugfixes

New settings accepted for the --cpu switch:

  8-A
  Cortex-M0plus
  SC000
  Cortex-R5
  Cortex-R5F
  Cortex-R5-rev1
  Cortex-R5F-rev1
  Cortex-R5F-rev1.sp
  Cortex-R7
  Cortex-R7.no_vfp
  Cortex-A7
  Cortex-A7.no_neon
  Cortex-A7.no_neon.no_vfp
  Cortex-A15
  Cortex-A15.no_neon
  Cortex-A15.no_neon.no_vfp
  PJ4
  PJ4.no_vfp
  Cortex-A57


New instructions:

  ARMv6 (instructions previously omitted in error):
    MCRR2, MRRC2
  ARMv7 optional extensions (mandatory in ARMv7VE; implemented in
                             Cortex-R5,R7,A15,A7):
    SDIV, UDIV
  ARMv7VE:
    ERET, HVC, MRS (banked), MSR (banked)
  ARMv8:
    CRC32{C}{B|H|W}, DMB/DSB (new options), HLT, LDA{B|H}, LDAEX{B|D|H}, SEVL,
    STL{B|H}, STLEX{B|D|H}, VCVT(A|M|N|P), VCVT(B|T) (double-precision),
    VMAXNM, VMINNM, VMULL.P64, VRINT(A|M|N|P|R|X|Z), VSEL
  ARMv8 cryptographic extension:
    AES(D|E|IMC|MC), SHA1(C|H|M|P|SU0|SU1), SHA256(H|H2|SU0|SU1)


New built-in variable:

{TARGET_ARCH_8_A} evaluates as {TRUE} if the selected CPU is of Arm
architecture 8-A.
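
As a minimal sketch of how this variable might be used, the fragment below
selects a load-acquire instruction when assembling for 8-A and falls back to
a load followed by a barrier otherwise. The area name and register choices
are illustrative only, and the fallback assumes an ARMv7 target (DMB is not
available on earlier architectures).

```armasm
        AREA    Example, CODE, READONLY

        [ {TARGET_ARCH_8_A}
        LDA     r0, [r1]                ; ARMv8 load-acquire
        |
        LDR     r0, [r1]                ; plain load...
        DMB     ISH                     ; ...then a barrier (assumes ARMv7)
        ]

        END
```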


Changes to errors and warnings:

* CPS, SRS and MRS CPSR_c can all reference hypervisor mode on ARMv7VE and
  later without generating a warning.
* Error "Instruction cannot be conditional in A32 instruction set" replaced
  in some cases with new error "Instruction cannot be conditional", because
  for some instructions (especially true of many of the ARMv8 extensions)
  this applies to their T32 encoding as well as their A32 encodings.
* New warning "Unable to substitute symbol" when an unescaped $ character is
  encountered on a source line, and ObjAsm is unable to substitute the
  following symbol name with either a macro parameter or a variable.
* New warning "May trigger StrongARM MSR bug if instruction is not
  idempotent" when the instruction following a conditional MSR CPSR_c, if
  the MSR is not executed, is likely to be harmful to execute twice, as it
  would be on the StrongARM. The warning is suppressed for a number of
  common classes of instructions which are not a problem, but some false
  positive warnings remain in more obscure cases.
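
To illustrate the "Unable to substitute symbol" warning described above, here
is a hedged sketch. The variable name Who and the undefined name Nobody are
made up for the example; the use of $$ to produce a literal $ follows the
usual armasm-style convention, which ObjAsm is assumed to share.

```armasm
        AREA    Example, DATA, READONLY

        GBLS    Who
Who     SETS    "world"

        DCB     "hello $Who", 0         ; $Who is replaced by the variable's value
        DCB     "price in $$", 0        ; $$ is the escaped form of a literal $
        DCB     "hello $Nobody", 0      ; Nobody is undefined: expected to warn

        END
```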


Bugfixes:

* Can now accept RISC OS path specifiers (ending with a colon) via the -I
  switch.
* Dynamic dependencies are now emitted when the BIN directive or the
  :FLOAD:, :FEXEC:, :FSIZE: or :FATTR: operators are encountered.
* :FLOAD:, :FEXEC:, :FSIZE: and :FATTR: now work on image files.
* Failure to parse LE, LO, LS or LT conditions on VABA, VABD, VADD, VMLA,
  VMLS, VMOV, VMUL, VPADD, VSHL or VSUB, or NE or NV conditions on VMOV,
  VRSHR or VSHR (but note that only a subset of those could be conditionally
  executed in the A32 instruction set, and even then only when handling
  floating point data).
* Incorrect conditions for generation of "Macro ignores suffix parameter"
  warning: this happened when the macro was invoked with a suffix parameter,
  but the definition of the macro didn't specify a label parameter. The
  test should of course have been if the definition lacked a suffix
  parameter instead, to match.
* The following warnings are no longer generated for LDM^ and STM^
  instructions:
    Deprecated instruction (LDM with SP in register list)
    Deprecated instruction (LDM with LR and PC in register list)
    Deprecated instruction (STM with SP or PC in register list)
    LDM or STM of single register is probably slower than LDR or STR
* PLI of a literal (PC-relative symbol) was misassembled as PLD.
  Note, PLI [pc,#label-(.+8)]  was unaffected.
* The Rt field of VMOV <Dm>, <Rt>, <Rt2> was encoded incorrectly (it was
  ORed with the bottom 4 bits of Dm). Note, VMOV <Rt>, <Rt2>, <Dm> was not
  affected.
* If a variable was both predefined from the command line and declared
  using GBLA/GBLL/GBLS, then an attempt to use the variable name before the
  GBLA/GBLL/GBLS would be faulted as undefined, rather than using the
  command-line definition.
* Warnings on the use of banked registers after a mode change aren't
  triggered when the instructions are in the NV condition code space.
* MSR CPSR_c, #&1F (system mode) now generates a warning on ARMv3.

Changes from ObjAsm 4.02 to 4.03
================================

Changes to pathname resolution in -desktop mode

Now delegates the handling of non-rooted filenames in -desktop mode to the
DDEUtils module; this mirrors the way -desktop mode works in the C compiler.

Benefits of this fix include:

  * Fixes the bug in previous versions whereby the use of RISC OS path
    variable filespecs in GET or LNK directives failed
  * Other directives (INCBIN, :FSIZE: etc) also now use the -desktop path
  * The final directory separator in the input filename can now be ':'
    as well as '.'

Also fixed the pathname buffer lengths to permit the desktop path to be
up to 1024 characters.

The behaviour in non-desktop mode should be unaffected.

Change from ObjAsm 4.03 to 4.04
===============================

The mnemonic for CRC32 has been corrected to match the Arm ARM and the
above tables. Previously the assembler was looking for a syntax from a draft
specification.

Changes from ObjAsm 4.04 to 4.05
================================

Relaxed constraints on NEON VLDn and VSTn instructions

  * Accept ':' as an alternative to '@' when introducing alignment specifiers,
    per a recent change in Arm's guidance (possibly to avoid a clash with GCC's
    use of '@' for comments).
  * Permit Q registers in the register lists for VLD2, VLD3, VLD4, VST2, 
    VST3 or VST4. Whilst strictly speaking, the Arm ARM doesn't even permit
    them in VLD1 and VST1, the common use of such in ported code means that
    it is desirable to support it. Technically, the two halves of any 
    such Q register have different meanings, being loaded from or stored to a 
    different lane of the memory structure - take care if using this feature.
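
A short sketch of the two alignment spellings follows. The registers and
alignments are arbitrary, and a NEON-capable --cpu setting is assumed. Q
registers in VLDn/VSTn lists are deliberately not shown, since how their
halves map onto lanes is the caveat noted above.

```armasm
        AREA    Example, CODE, READONLY

        VLD1.32 {d0,d1}, [r0@128]       ; original '@' alignment syntax
        VLD1.32 {d0,d1}, [r0:128]       ; ':' alternative, same alignment
        VST2.16 {d2,d3}, [r1:128]!      ; ':' form combined with writeback

        END
```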

Changes from ObjAsm 4.05 to 4.06
================================

Bugfixes to NEON support

  * VABA and VABD did not assemble correctly when used with Q registers
  * VABD did not permit the dyadic syntax variant

Changes from ObjAsm 4.06 to 4.07
================================

The --cpu switch now accepts 8-A.32.crypto or 8-A.32 to follow the convention
armasm has moved to, and obsoletes the former 8-A name. Also adds:

  Cortex-A17
  Cortex-A53
  Cortex-A53.no_vfp
  Cortex-A53.no_neon.no_vfp

as possible CPU target names.

Changes from ObjAsm 4.07 to 4.08
================================

ObjAsm will now warn when it encounters a macro using a suffix parameter
which is a substring of a longer macro definition, for example DoOP$type
and DoOPERATION. Other assemblers treat this condition as an error, so relying
on ObjAsm's behaviour would make your source code non-portable.
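
A minimal sketch of the kind of source that is affected, using the two names
from the example above (the macro bodies are left empty for brevity):

```armasm
        ; Suffix macro: an invocation such as DoOPB sets $type to "B"
        MACRO
        DoOP$type
        MEND

        ; Longer name that begins with "DoOP", so the two definitions overlap
        MACRO
        DoOPERATION
        MEND
```

Renaming one of the two macros so that neither name begins with the other is
one way to keep such source portable.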

GET directives can now appear within a macro body.
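
For instance, a macro can now pull in a header itself. In this sketch the
macro name PullInDefs and the file hdr.Defs are hypothetical:

```armasm
        MACRO
        PullInDefs
        GET     hdr.Defs                ; previously not allowed inside a macro
        MEND

        PullInDefs                      ; expands to the GET above
```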

Changes from ObjAsm 4.08 to 4.10
================================

Adds ARMv8.0 to ARMv8.5 support for those extensions that affect the AArch32,
A32 instruction set, plus numerous bugfixes.

New settings accepted for the --cpu switch:

  8-A.32.no_neon
  8-M.Base
  8-M.Main
  8-M.Main.dsp
  8-R
  8-R.crypto
  8-R.no_neon
  8.1-A.32
  8.1-A.32.crypto
  8.1-M.Main
  8.1-M.Main.dsp
  8.1-M.Main.mve
  8.1-M.Main.mve.fp
  8.2-A.32
  8.2-A.32.crypto
  8.2-A.32.crypto.dotprod
  8.2-A.32.dotprod
  8.3-A.32
  8.3-A.32.crypto
  8.3-A.32.crypto.dotprod
  8.3-A.32.dotprod
  8.4-A.32
  8.4-A.32.crypto
  8.5-A.32
  8.5-A.32.crypto
  8.6-A.32
  8.6-A.32.crypto
  Cortex-A12
  Cortex-A12.no_neon.no_vfp
  Cortex-A17.no_neon.no_vfp
  Cortex-A32
  Cortex-A32.crypto
  Cortex-A35
  Cortex-A35.crypto
  Cortex-A5.no_neon
  Cortex-A5.no_neon.no_vfp
  Cortex-A53.crypto
  Cortex-A55
  Cortex-A55.crypto
  Cortex-A57.crypto
  Cortex-A72
  Cortex-A72.crypto
  Cortex-A73
  Cortex-A73.crypto
  Cortex-A75
  Cortex-A75.crypto
  Cortex-A76
  Cortex-A76.crypto
  Cortex-A77
  Cortex-A77.crypto
  Cortex-A78
  Cortex-A78.crypto
  Cortex-M23
  Cortex-M33
  Cortex-M33.no_dsp
  Cortex-M33.no_dsp.no_fp
  Cortex-M33.no_fp
  Cortex-M35P
  Cortex-M35P.no_dsp
  Cortex-M35P.no_dsp.no_fp
  Cortex-M35P.no_fp
  Cortex-M4.no_fp
  Cortex-M55
  Cortex-M55.no_fp
  Cortex-M55.no_mve
  Cortex-M55.no_mve.no_fp
  Cortex-M55.no_mvefp
  Cortex-M7
  Cortex-M7.fp.sp
  Cortex-M7.no_fp
  Cortex-R5.no_vfp
  Cortex-R5.sp
  Cortex-R52
  Cortex-R52.crypto
  Cortex-R52.crypto.no_ras
  Cortex-R52.no_neon
  Cortex-R52.no_neon.no_ras
  Cortex-R52.no_ras
  Cortex-R8
  Cortex-R8.no_vfp
  Cortex-X1
  Cortex-X1.crypto

New instructions:

  ARMv8.0:
    ESB, SB
  ARMv8.1:
    SETPAN, VQRDMLAH, VQRDMLSH
  ARMv8.2:
    VSMMLA, VSUDOT, VUMMLA, VUSDOT, VUSMMLA, VSDOT, VUDOT, VFMAL, VFMSL
    Pseudo-instructions VSUDOT (vector variant) and VSUMMLA assemble as VUSDOT
    and VUSMMLA respectively, with source registers exchanged
    BF16 half precision support
      .BF16 data type qualifiers for VCVT, VCVTB, VCVTT
      New instructions: VDOT, VFMAB, VFMAT, VMMLA
      New directives DCFB and DCFBU store BFloat16 constant data (aligned and
      unaligned respectively)
      Additional pseudo-instructions for VLDR-literal, VMOV-immediate and 
      VMOV2 that take BF16 data types
    FP16 half precision support
      .F16 data type qualifiers for VABD, VABS, VACGE, VACGT, VACLE, VACLT,
      VADD, VCEQ, VCGE, VCGT, VCLE, VCLT, VCMP, VCMPE, VCVT, VCVTA, VCVTM,
      VCVTN, VCVTP, VCVTR, VDIV, VFMA, VFMS, VFNMA, VFNMS, VLDR, VMAX, VMAXNM,
      VMIN, VMINNM, VMLA, VMLS, VMOV, VMUL, VNEG, VNMLA, VNMLS, VNMUL, VPADD,
      VPMAX, VPMIN, VRECPE, VRECPS, VRINTA, VRINTM, VRINTN, VRINTP, VRINTR,
      VRINTX, VRINTZ, VRSQRTE, VRSQRTS, VSELEQ, VSELGE, VSELGT, VSELVS, VSQRT,
      VSTR, VSUB
      New instructions: VINS, VMOVX
  ARMv8.3:
    VCADD, VCMLA, VJCVT
    Pseudo-instruction VCADD with rotation of 0 or 180 degrees assembles to a
    VADD or VSUB instruction respectively
  ARMv8.4:
    TSB
  ARMv8.5:
    CSDB, PSSBB, SSBB

Improvements to existing functionality:

* VMRS now knows about the MVFR2 register (present in ARMv8.0).
* Add `UDF` as an alternative name for the `UND` pseudo-instruction.
* When LDRH= or LDRSH= are used for expressions that contain no
  forward-reference or relocated symbols, the literals are now packed to
  16-bit alignment within the literal pool.
* SN, DN and QN directives now permit definition of register symbols with
  BF16 element type
* SN and QN directives are extended to permit scalars to be specified in
  relation to S or Q registers. 64-bit element types are not permitted in QN
  and only 8-bit and 16-bit element types are permitted in SN. Note that when
  the value of a symbol defined using SN or QN is evaluated in an expression
  (including using the :RCONST: operator) the value corresponds to the number
  of the D register containing the scalar.
* Additional VCVT pseudo-instructions for converting between half-precision
  floating-point numbers and fixed-point numbers with zero fractional bits:
  - for a single operation involving a 32-bit fixed-point number; or
  - for 4 or 8 parallel operations involving 16-bit fixed-point numbers
* Additional VMOV.U16 Sd,#constant pseudo-instruction. Analogous to the
  pre-existing VMOV.I32 Sd,#constant, this assembles to the
  VMOV.F16 Sd,#constant that sets Sd to the bit pattern that would be
  interpreted as the specified constant by integer instructions.
* Additional VMOV.F16 Sd,#constant pseudo-instructions where the constant is
  not an exact match for any of the available quarter-precision constants:
  these are rounded (to nearest with halves to even) to an available constant
  and a warning is emitted. This is analogous to how VMOV.F32 and VMOV.F64
  variants already behave. Note that VMOV.F16 Dd,#constant and
  VMOV.F16 Qd,#constant were already implemented as pseudo-instructions in
  earlier versions of ObjAsm, and these do not perform rounding because the
  bit patterns available do not correspond to promoted quarter-precision
  numbers.
* When VLDR or VSTR specify an element size of 16 bits, the pre- or
  post-indexed writeback pseudo-instruction forms (which would otherwise be
  assembled as VLDM or VSTM) are no longer permitted.
* Change of behaviour of VLDR Sd,=expression pseudo-instructions where the
  element size is less than 32 bits: now all but the lowest-numbered element
  are zeroed. To further reflect this, the permitted 8- and 16-bit element
  types are now restricted to .U8, .U16 and .F16. (When the target is a D
  register, the expression continues to be replicated across all elements,
  and all previously accepted element types are still supported.) This
  permits the pseudo-instruction to be substituted, on suitable CPUs, with
  VMOV.F16 Sd,#constant or VLDR.16 Sd,[pc,#offset], and is in line with the
  behaviour of other .F16 instructions which target S registers. Where
  VLDR.16 is used to load a constant from the literal pool, and the value of
  the constant is known in pass 1, the constants are packed into 16 bits
  within the literal pool.

New built-in variables:

  {TARGET_ARCH_8_R}
  {TARGET_ARCH_8_M_BASE}
  {TARGET_ARCH_8_M_MAIN}
  {TARGET_ARCH_8_A_32}
  {TARGET_ARCH_8_A_64}
  {TARGET_ARCH_8_1_A_32}
  {TARGET_ARCH_8_1_A_64}
  {TARGET_ARCH_8_2_A_32}
  {TARGET_ARCH_8_2_A_64}
  {TARGET_ARCH_8_3_A_32}
  {TARGET_ARCH_8_3_A_64}
  {TARGET_ARCH_8_4_A_32}
  {TARGET_ARCH_8_4_A_64}
  {TARGET_ARCH_8_5_A_32}
  {TARGET_ARCH_8_5_A_64}
  {TARGET_ARCH_8_6_A_32}
  {TARGET_ARCH_8_6_A_64}
  {TARGET_ARCH_A64}
  {TARGET_ARCH_AARCH32}
  {TARGET_ARCH_AARCH64}
  {TARGET_FEATURE_CRYPTOGRAPHY}
  {TARGET_FEATURE_DMB}
  {TARGET_FEATURE_HALFWORD}
  {TARGET_FEATURE_THUMB}
  {TARGET_FEATURE_WMMX}
  {TARGET_FEATURE_WMMX2}
  {TARGET_FPU_FZ_POSZERO}
  {TARGET_FPU_NONE}
  {TARGET_FPU_VFP_DOUBLE}
  {TARGET_FPU_VFP_SINGLE}
  {TARGET_FPU_VFPV5}
  {TARGET_FPU_VFPV6}

Changes to errors and warnings:

* Warnings about invalid ISB options added to match those for invalid
  DMB or DSB options.
* Use of instructions forbidden within an IT block is now flagged.
* Enforce the restriction that certain instructions are only permitted
  in the final position of an IT block, such as BLX.
* Generate new, more meaningful error "Previous IT block is still active" if
  you use IT instructions too close together
* Floating-point constant underflow to zero is now a warning
* Floating-point constant overflow to infinity is now a warning (previously
  this was an error for F16 constants and not warned for any other precisions)
* Narrowing of floating-point constants that are signalling NaNs is now a
  warning
* Warnings about LDM and STM where SP is in the list of registers now follow
  the updated guidance in section D.3 in later editions of the ARMv7-AR ARM
* The warning for STM with writeback, where the base register is also the
  1st in the list, has been removed as per the description in the ARMv8-A ARM

Bugfixes:

* Symbols declared using `ALIAS` directive are now included in the table
  output by the `-x` command-line option.
* Instructions following IT AL now assemble correctly; previously they were
  reported as an error.
* FCMPEZD previously required wrong register type
* When LDRSB=, LDRSH= or LDRH= were used and the literal pool was too distant,
  an error was reported but using unhelpful error message text.
* When LDRSB=, LDRSH= or LDRH= were used with one literal pool between
  256-4096 bytes before the instruction and another < 256 bytes after it,
  where the earlier literal pool contained the required literal, an error was
  emitted rather than placing the literal in the subsequent literal pool and
  referencing that.
* De-duplication of double-precision floating point or 64-bit integer literals
  could cause false positive or false negative matches, resulting in either
  incorrect functionality or wasted space in the output binary, respectively.
* When VLDR.i Dn,= used an expression that utilised a forward-reference
  symbol, ObjAsm crashed.
* VMULL with three data type qualifiers (or where the element types were
  inferred from the register symbols) didn't expect the destination elements
  to be wider than the source ones.
* Don't accept floating-point element types for VMVN, VAND, VBIC, VORN, VORR.
* Registers S16-S31 and D16-D31 were displayed using invalid hexadecimal in
  -list mode.
* Fixed {TARGET_FEATURE_DIVIDE} to correctly report the ability of various
  ARMv7-A CPUs to perform SDIV and UDIV.

Changes from ObjAsm 4.10 to 4.11
================================

Fixes a bug whereby occasionally, A2 encodings of VMOV (register) of the form
  VMOV<cond>.F64 <Dd>,<Dm>
when
* <cond> is omitted or set to "AL", would emit the A1 encoding instead
* any other <cond>, would emit a conditional EOR or RSB (immediate) instead

Changes from ObjAsm 4.11 to 4.12
================================

Changed the ARMv7-A Cortex CPU properties, selected by --cpu at the command
line, to assume the security extension (first available in ARM architecture
6Z) is present.

Added 'hdr' to the accepted name suffixes which will be translated
automatically when used in a file name spec, like 's' for source files.
This means that
  GET Leafname.hdr   maps to hdr.Leafname on RISC OS
  GET Leafname.s     maps to s.Leafname on RISC OS
When expressed in this style, ObjAsm sources may be assembled on POSIX
systems and RISC OS alike.

Changes from ObjAsm 4.12 to 4.13
================================

A new unary operator :MDEF: is available which can be used to test whether a
macro has been defined, similar in use to :DEF: which tests symbols.
This operator is expected to be most useful for conditionals around
definitions of macros of the same name, to avoid the error that would
otherwise be generated.

The argument is a macro name (excluding any suffix argument), so for example

  MACRO
  Add$s
  MEND

  ! 0, :STR::MDEF:Add :CC: :STR::MDEF:Adds

will print TF when assembled.
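
The guarding use mentioned above might look like the following sketch, which
defines the macro only if no macro named Add exists yet. It assumes that a
macro definition may be placed inside a conditional block and that :LNOT: can
be applied to the result of :MDEF:.

```armasm
        [ :LNOT: :MDEF:Add
        MACRO
        Add$s
        MEND
        ]
```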

A bug, where a macro invocation has a better match (meaning a shorter suffix
argument) for a macro definition later in the file than one earlier in the
file, has been fixed. ObjAsm would use the earlier definition in pass 1 and
the later one in pass 2, leading them to get out of step and fail in any one
of a multitude of ways.
