Assembler Language Reference Manual for the Sun Workstation

Sun Microsystems, Inc. • 2550 Garcia Avenue • Mountain View, CA 94043 • 415-960-1300

Part No: 800-1179-01
Revision E of 15 May 1985

Credits and Acknowledgements

This Assembly Language Reference Manual for the Sun Workstation started life as an edited version of the MICAL Manual for the Intel 8080, written by Mike Patrick; transformed by James L. Gula and Thomas J. Teixeira, March 1980; revised by Henry McGilton at Unisoft Systems of Berkeley Corporation during March 1982; rewritten by Henry McGilton and Richard Tuck, of Sun Microsystems, during October and November 1982.

Trademarks

Multibus is a trademark of Intel Corporation.
Sun Workstation is a trademark of Sun Microsystems Incorporated.
UNIX is a trademark of Bell Laboratories.

Copyright © 1983 by Sun Microsystems.

This publication is protected by Federal Copyright Law, with all rights reserved. No part of this publication may be reproduced, stored in a retrieval system, translated, transcribed, or transmitted, in any form, or by any means manual, electric, electronic, electro-magnetic, mechanical, chemical, optical, or otherwise, without prior explicit written permission from Sun Microsystems.

Preface

This manual is the Programmer’s Reference Manual for as, the assembler for the UNIX system running on the Sun Workstation. As converts source programs written in Assembler Language into a form that the linker utility, ld(1), will turn into a program that is runnable on the UNIX operating system.

As provides the assembly language programmer with a minimal set of facilities to write programs in assembly language. Since the majority of programming is done in high level languages, as doesn’t provide any elaborate macro facilities or conditional assembly features. It is assumed that the volume of assembly code produced is so small that these facilities aren’t required.
This manual describes the syntax and usage of the as assembler for the Motorola MC68010 microprocessor. The basic format of as is loosely based on the Digital Equipment Corp Macro-11 assembler described in DEC’s publication DEC-11-0MACA-A-D, but also contains elements of the UNIX PDP-11 as(1) assembler. The instruction mnemonics and effective address format are derived from a Motorola publication on the MC68000: the MACSS MC68000 Design Specification Instruction Set Processor dated June 30, 1979.

This is a reference manual as opposed to a treatise on writing in assembly language. It is assumed that the reader is familiar with the concepts of machine architecture, the reasons for an assembler, the ideas of instruction mnemonics, operands, and effective address modes, and assembler directives. It is also assumed that the reader is familiar with the MC68010 processor, its instruction set, its addressing modes, and especially the irregularities in them.

Contents

Preface

Chapter 1 Introduction
1.1. How to Use the Assembler
1.2. Notation
1.3. Further Reading

Chapter 2 Elements of Assembly Language
2.1. Character Set Which the Assembler Recognizes
2.2. Identifiers
2.3. Numeric Labels
2.4. Local Labels
2.5. Scope of Labels
2.6. Constants
2.7. Numeric Constants
2.8. String Constants
2.9. Assembly Location Counter

Chapter 3 Expressions
3.1. Operators
3.2. Terms
3.3. Expressions
3.4. Absolute, Relocatable, and External Expressions

Chapter 4 Layout of an Assembly Language Source Program
4.1. Label Field
4.2. Operation Code Field
4.3. Operand Field
4.3.1. Register Operands
4.4. Comment Field
4.5. Direct Assignment Statements

Chapter 5 Assembler Directives
5.1. .ascii - Generate Sequence of Character Data
5.2. .asciz - Generate Zero Terminated Sequence of Character Data
5.3. .byte, .word, .long - Generate Data
5.4. .text, .data, .bss - Switch Location Counter
5.5. .skip - Advance the Location Counter
5.6. .lcomm - Reserve Space in .bss Area
5.7. .globl - Designate an External Identifier
5.8. .comm - Define Name and Size of a Common Area
5.9. .even - Force Location Counter to Even Byte Boundary

Chapter 6 Instructions and Addressing Modes
6.1. Instruction Mnemonics
6.2. Extended Branch Instruction Mnemonics
6.3. Addressing Modes
6.4. Addressing Categories

Appendix A Error Codes
A.1. Usage Errors
A.2. Assembler Error Messages

Appendix B List of As Opcodes

Tables

Table 5-1 Assembler Directives
Table 6-1 Addressing Modes
Table 6-2 Addressing Categories

Chapter 1 Introduction

1.1. How to Use the Assembler

By convention, the assembly language source code of the program should be in a file with a .s suffix. Suppose that your program is in a file called parts.s. To run the assembler, type the command:

    tutorial% as parts.s

As runs silently (if there are no errors), and generates a file called a.out.

As also accepts several command line options. These are:

-o file  Place the output of the assembler in file.

-R  Make initialized data segments read only (actually the assembler places them at the end of the .text area).

-L  Keep local (compiler generated) symbols that start with the letter L. This is a debugging feature. If the -L option is omitted, the assembler discards those symbols and does not include them in the symbol table.

-j  Make all jumps to external symbols (jsr and jmp) PC relative rather than long absolute.
This is intended for use when the programmer knows that the program is short. If there are any externals which are too far away, the loader will complain when the program is linked.

-J  Suppress span-dependent instruction calculations and force all branches and calls to take the most general form. This is used when assembly time must be minimized, but program size and run time are not important.

-h  Suppress span-dependent instruction calculations and force all branches to be of medium length, but all calls to take the most general form. This is used when assembly time must be minimized, but program size and run time are not important. This option results in a smaller and faster program than that produced by the -J option, but some very large programs may not be able to use it because of the limits of the medium-length branches.

-d2  This is intended for small stand-alone programs. The assembler makes all program references PC relative and all data references short absolute. Note that the -j option does half this job anyway.

Readers should also consult the as(1) entry in the UNIX Programmer’s Manual.

1.2. Notation

The notation used in this chapter is a somewhat modified Backus-Naur Form (BNF). A string of characters on its own stands for itself, for example:

    WIDGET

is an occurrence of the literal string ‘WIDGET’, and:

    1983

is an occurrence of the literal constant 1983. An element enclosed in < and > signs is a nonterminal symbol, and must eventually be defined in terms of some other entities. For example,

    <identifier>

stands for the syntactic construct called ‘identifier’, which is eventually defined in terms of basic objects. A syntactic object followed by an ellipsis:

    <thing> ...

denotes one or more occurrences of <thing>. Syntactic objects occurring one after the other, as in:

    <first thing> <second thing>

simply mean an occurrence of <first thing> followed by <second thing>.
Syntactic elements separated by a vertical bar sign ( | ), as in:

    <this thing> | <that thing>

mean an occurrence of <this thing> or <that thing> but not both. Brackets and braces define the order of interpretation. Brackets also indicate that the syntax described by the subexpression they enclose is optional. That is:

    [ <thing> ]

denotes zero or one occurrences of <thing>, while:

    { <thing1> | <thing2> } <thing3>

denotes a <thing1> or a <thing2>, followed by a <thing3>.

1.3. Further Reading

Motorola MC68010 16-bit Microprocessor Programmer’s Reference Manual.

Chapter 2 Elements of Assembly Language

This chapter covers the lexical elements which comprise an assembly language program. The next chapter discusses the rules for expressions and operand formation. Topics covered in this chapter are:

• Character set which the assembler recognizes,
• Rules for identifiers,
• Syntax for numeric constants,
• Syntax for string constants,
• Rules for comments,
• Layout of an assembly language source statement.

An assembly language program is ultimately constructed from characters. Characters are combined to make up lexical elements or tokens of the language. Combinations of tokens then form assembly language statements, and sequences of statements then form an assembly language program. This section describes the basic lexical elements of as.

2.1. Character Set Which the Assembler Recognizes

As recognizes the following character set:

• The letters A through Z and a through z.
• The digits 0 through 9.
• The ASCII graphic characters: the printing characters other than letters and digits.
• The ASCII non-graphics: space, tab, carriage return, and newline (also known as line feed).

2.2. Identifiers

Identifiers are used to tag assembler statements (where they are called labels), as the location tag for data, and as the symbolic names of constants. An identifier in an as program is a sequence of from 1 to 255 characters from the set:

• Upper case letters A through Z.
• Lower case letters a through z.
• Digits 0 through 9.
• The characters underline ( _ ), period ( . ), and dollar sign ( $ ).

The first character of an identifier must not be numeric. Other than that restriction, there are a few other points to note:

• All 255 characters of an identifier are significant and are checked in comparisons with other identifiers.
• Upper case letters and lower case letters are considered distinct, so that kit_of_parts and KIT_OF_PARTS are two different identifiers.
• Although the period ( . ) and dollar sign ( $ ) characters can be used to construct identifiers, they are reserved for special purposes (pseudo-ops for instance) and should not appear in user-defined identifiers.

Examples of Identifiers

    Grab_Hold
    Widget
    Pot_of_Message
    MAXNAME

2.3. Numeric Labels

A numeric label consists of a digit 0 to 9 followed by a colon. As in the case of name labels, a numeric label assigns the current value of the location counter to the symbol. However, several numeric labels with the same digit may be used within the same assembly. References of the form nb refer to the first numeric label n backwards from the reference; references of the form nf refer to the first numeric label n forwards from the reference.

2.4. Local Labels

Local labels are a special form of identifier which are strictly local to a control section. Local labels provide a convenient means of generating labels for branch instructions and such. Use of local labels reduces the possibility of multiply defined labels in a program, and separates entry point labels from local references, such as the top of a loop. Local labels cannot be referenced from outside of the current assembly unit. Local labels are of the form n$ where n is any integer. Valid local labels include:

    1$
    27$
    394$

2.5. Scope of Labels

The scope of a label is the ‘distance’ over which it is visible to other parts of the program which want to reference it. An ordinary label which tags a location in the program or data is visible only within the current assembly. An identifier which is designated as an external identifier via a .globl directive is visible to other assembly units at link time.

Local labels have a scope, or span of reference, which extends between one ordinary label and the next. Every time an ordinary label is encountered, all previous local labels associated with the current location counter are discarded, and a new local label scope is created. The following example illustrates the scopes of the different kinds of labels:

    first:  addl    d0,d1           | creates a new local label scope
    100$:   addqw   #7,d3           | first appearance of 100$
            bccs    100$            | branches to the label above
    second: andl    #0x7ff,d4       | above 100$ has gone away
    100$:   cmpw    d1,d3           | this is a different 100$
            beqs    100$            | branches to the previous instruction
    third:  movw    d0,d7           | now 100$ has gone away again
            beqs    100$            | generates an error message if no 100$ below

The labels first, second, and third all have a scope which is the entire source file containing them. The first appearance of the local label 100$ has a scope which extends between first and second. The second appearance of the local label 100$ has a scope which extends between second and third. After the appearance of the label third, the branch to 100$ will generate an error message because that label is no longer defined in this scope.

2.6. Constants

There are two forms of constants available to as users, namely numeric constants and string constants. All constants are considered absolute quantities when they appear in an expression (see section 3.4 for a discussion on absolute and relocatable expressions).

2.7. Numeric Constants

As assumes that any token which starts with a digit is a numeric constant. As accepts numeric quantities in decimal (base 10), hexadecimal (base 16), or octal (base 8) radices. Numeric constants can represent quantities up to 32 bits in length.

Decimal numbers consist of between one and ten decimal digits (0 through 9). The range of decimal numbers is between -2,147,483,648 and 2,147,483,647. Note that you can’t have commas in decimal numbers even though they are shown here for readability. Note also that decimal numbers can’t be written with leading zeros, because a numeric constant starting with a zero is taken as either an octal constant or a hexadecimal constant, as described below.

Hexadecimal constants must start with the notation 0x (zero-ex) and can then have between one and eight hexadecimal digits. The hexadecimal digits consist of the decimal digits 0 through 9 and the hexadecimal digits a through f or A through F.

Octal constants must start with the digit 0. There can then be from one to 11 octal digits (0 through 7) in the number. But note that 11 octal digits is 33 bits, so the largest octal number is 037777777777. The assembler generates an error message if the decimal digits 8 and 9 appear in an octal constant.

2.8. String Constants

A string is a sequence of ASCII characters, enclosed in quote signs ( " ). Within string constants, the quote sign is represented by a backslash character followed by a quote sign. The backslash character itself is represented by two backslash characters. Any other character can be represented by a backslash character followed by one, two, or three octal digits. The table below shows the octal representation of some of the more common non-printing characters.

    Character               Octal Representation
    Backspace               010
    Horizontal Tab          011
    Newline (Line-Feed)     012
    Form-Feed               014
    Carriage-Return         015

2.9. Assembly Location Counter

The assembly location counter is the period character ( . ). It is colloquially known as dot. When used in the operand field of any statement, dot represents the address of the first byte of the statement. Even in assembler directives, dot represents the address of the start of that assembler directive. For example, if dot appears as the third argument in a .long directive, the value placed at that location is the address of the first location of the directive; dot is not updated until the next machine instruction or assembler directive. For example:

    Ralph:  movl    .,a0            | load value of Ralph into a0

At the beginning of each assembly pass, the assembler clears the location counter. Normally, consecutive memory locations are assigned to each byte of generated code. However, the location where the code is stored may be changed by a direct assignment altering the location counter:

    . = <expression>

This <expression> must not contain any forward references, and must not change value from one pass to another. Storage may also be reserved by advancing dot. For example, if the current value of dot is 0x1000, the direct assignment statement:

    Table:  .=.+0x100

reserves 256 bytes (100 hexadecimal) of storage, with the address of the first byte as the value of Table. The next instruction is stored at address 0x1100. Also see the .skip assembler directive for another means of achieving the same effect.

The value of dot is always relative to the start of the current control section. For instance:

    . = 0x1000

does not set dot to absolute location 0x1000, but to location 0x1000 relative to the start of the current control section. This practice is not recommended.

Chapter 3 Expressions

Expressions are combinations of operands (numeric constants and identifiers) and operators, forming new values.
The sections below define the operators which as provides, then give the rules for combining terms into expressions.

3.1. Operators

Identifiers and numeric constants can be combined, via arithmetic operators, to form expressions. As provides unary operators and binary operators, described below.

Unary Operators

    Operator  Function          Description
    -         unary minus       Returns the two’s complement of its following
                                argument.
    ~         logical negation  Returns the one’s complement (logical negation)
                                of its following argument.

Binary Operators

    Operator  Function          Description
    +         addition          Arithmetic addition of its arguments.
    -         subtraction       Arithmetic subtraction of its arguments.
    *         multiplication    Arithmetic multiplication of its arguments.
    /         division          Arithmetic division of its arguments. Note that
                                division in as is integer division, which
                                truncates towards zero.

Each operator is assumed to work on a 32-bit number. If the value of a particular term occupies only 8 bits or 16 bits, the short quantity is sign extended to a full 32-bit value.

3.2. Terms

A term is a component of an expression. A term may be one of the following:

• A numeric constant, whose 32-bit value is used. The assembly location counter, known as dot, is considered a number in this context.
• An identifier.
• An expression or term enclosed in parentheses (). Any quantity enclosed in parentheses is evaluated before the rest of the expression. This can be used to alter the normal left-to-right evaluation of expressions (for example, differentiating between a*b+c and a*(b+c)) or to apply a unary operator to an entire expression (for example, -(a*b+c)).
• A term preceded by a unary operator. For example, both double_plus_ungood and ~double_plus_ungood are terms. Multiple unary operators can be used in a term. For example, --positive has the same value as positive.
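The operators and terms described above can be illustrated with a short sketch. It uses only the .long directive and the | comment character as they appear in this manual; it has not been run through as, and the values in the comments are worked out by hand from the rules stated above. Each expression is chosen so that its value does not depend on the order in which the binary operators are applied.

```asm
        .long   -5              | unary minus: two's complement of 5
        .long   ~0              | logical negation: one's complement of 0, all 32 bits set
        .long   7/2             | integer division truncates towards zero: 3
        .long   2*(3+4)         | the parenthesized term is evaluated first: 14
        .long   -(2*3+4)        | unary minus applied to a whole expression: -10
        .long   --5             | two unary minus operators cancel: 5
```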
3.3. Expressions

Expressions are combinations of terms joined together by binary operators. An expression is always evaluated to a 32-bit value. If the operand only requires a single byte value (a .byte directive or an addq instruction, for example) the low order eight bits of the expression are used. If the operand only requires a single 16-bit word value (a .word directive or a movem instruction, for example) the low order 16 bits of the expression are used.

Expressions are evaluated left to right with no operator precedence. Thus 1 + 2*3 evaluates to 9, not 7. Unary operators have precedence over binary operators since they are considered part of a term, and both terms of a binary operator must be evaluated before the binary operator can be applied.

A missing expression or term is interpreted as having a value of zero. In this case, an Invalid expression error is generated.

An Invalid Operator error means that a valid end-of-line character or binary operator was not detected after the assembler processed a term. In particular, this error is generated if an expression contains an identifier with an illegal character, or if an incorrect comment character was used.

3.4. Absolute, Relocatable, and External Expressions

When an expression is evaluated, its value is either absolute, relocatable, or external.

An expression is absolute if its value is fixed.

• An expression whose terms are constants is absolute.
• An identifier whose value is a constant via a direct assignment statement is absolute.
• A relocatable expression minus a relocatable term is absolute, where both items belong to the same program section.

An expression is relocatable if its value is fixed relative to a base address, but will have an offset value when it is linked or loaded into memory. All labels of a program defined in relocatable sections are relocatable terms.
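The evaluation rule of section 3.3 and the absolute and relocatable definitions above can be pulled together in a hand-worked sketch. It has not been assembled; the names table, tabend, count, and nbytes are made up for illustration, and the identifier = expression form of direct assignment is assumed here (direct assignment statements are the subject of section 4.5).

```asm
        .data
        .long   1+2*3           | evaluated left to right: (1+2)*3 = 9
        .word   0x12345678      | only the low order 16 bits are used: 0x5678
table:  .long   10,20,30        | table is a label, so it is a relocatable term
tabend:                         | label-only statement; tabend is relocatable too
count = 3                       | absolute: a constant via direct assignment
nbytes = tabend-table           | absolute: both labels lie in the same section
```

Under these assumptions nbytes would be 12, since the three .long values between the two labels occupy four bytes each.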
Expressions which contain relocatable terms must only add or subtract constants to their value. For example, assuming the identifiers widget and blivet were defined in a relocatable section of the program, then the following demonstrates the use of relocatable expressions:

    Expression      Description
    widget          A simple relocatable term. Its value is an offset from
                    the base address of the current control section.
    widget+5        A simple relocatable expression. Since the value of
                    widget is an offset from the base address of the current
                    control section, adding a constant to it does not change
                    its relocatable status.
    widget*2        Not relocatable. Multiplying a relocatable term by a
                    constant invalidates the relocatable status.
    2-widget        Not relocatable, since the expression cannot be linked
                    by adding widget’s offset to it.
    widget-blivet   Absolute, since the offsets added to widget and blivet
                    cancel each other out.

An expression is external (or global) if it contains an external identifier not defined in the current program. With one exception, the same restrictions on expressions containing relocatable identifiers apply to expressions containing external identifiers. The exception is that the expression

    widget-blivet

is incorrect when both widget and blivet are external identifiers: you cannot subtract an external relocatable expression. In addition, you cannot multiply or divide any relocatable expression.

Chapter 4 Layout of an Assembly Language Source Program

An as program consists of a series of statements. Several statements can be written on one line, but statements cannot cross line boundaries. The format of a statement is:

    [<label field>] [<op-code> [<operand field>]]

It is possible to have a statement which consists of only a label field. The fields of a statement can be separated by spaces or tabs.
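The statement format above admits several shapes, sketched below. The label names are hypothetical, the fragment has not been assembled, and the colon after each label follows the usage in the example of section 2.5.

```asm
start:                          | label field only
        movl    d0,d1           | op-code and operand field, no label
loop:   addqw   #1,d1           | label, op-code, and operand field
        rts                     | op-code with no operand field
```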
There must be at least one space or tab separating the op-code field from the operand field, but spaces are unnecessary elsewhere. Spaces may appear in the operand field. Spaces and tabs are significant when they appear in a character string (for instance, as the operand of an .ascii pseudo-op) or in a character constant. In these cases, a space or tab stands for itself.

A line is a sequence of zero or more statements, optionally followed by a comment, ending with a <newline> character. A line can be up to 4096 characters long. Multiple statements on a line are separated by semicolons. Blank lines are allowed. The form of a line is:

    [<statement> [; <statement> ...]] [| <comment>]

4.1. Label Field

Labels are identifiers which the programmer may use to tag the locations of program and data objects. The format of a