		     Alpha Word/Byte Instruction Emulation
		               Life of a Project
			    Design & Functional Spec

				Burns Fisher
		        July 14, 1995 (Viva la France!)


1.  Statement of Problem

1.1 Background

ECO 81 to the Alpha SRM proposes to add unsigned byte and word fetch and store
instructions to the Alpha architecture.  Specifically, these instructions are:

Opcode		Mnemonic	Description

0E		STB		Store Byte from Register to Memory
0D		STW		Store Word from Register to Memory
0A		LDBU		Load Zero-Extended Byte from Memory to Register
0C		LDWU		Load Zero-Extended Word from Memory to Register
1C.0000		SEXTB		Sign Extend Byte
1C.0001		SEXTW		Sign Extend Word

Although this ECO has not been officially approved yet, it seems likely to be
approved; I'm told it is already implemented in the EV56 chip.

In addition, there are many other ECOs at various stages of approval which will
add further instructions in EV6.

The problem with adding new instructions to the architecture is that code
which uses them will not run on older machines without some sort of emulation.
While there is some discussion going on about having compilers produce "fat
binaries" which would contain both new and old instruction sequences, it seems
certain that there will be at least some images which will contain only new
sequences.  It is important that these images not fail on older machines.

For this reason, the Alpha Architecture Committee/Dick Sites/Supnik/Avery has
asked all three OS platforms to provide emulation for the byte/word instructions
for use with older Alpha chips.  The emulation is defined as being for backstop
use only.  Performance is not an objective; only correctness.
We have been O asked to be able to deploy an emulator both on VMS V6.2 and on Theta and later.    1.2 What this project will do   F For this project, we will implement the emulation of the new byte/wordN instructions as part of the OpenVMS kernel, and to deal with any repurcussionsP involved with emulation.  Further, the emulator will be designed such that other% new instructions can be added easily.   O For the sake of easier discussion, I am dividing the project into three parts:  L A basic "subset" emulator, a complete emulator, and "frills".   (If you haveN read the investigation report, notice the change in the definiton of frills).   G The subset emulator will emulate all the new instructions accurately if J there are no failures and if the instruction is encountered in user mode. M However, it will not reflect exceptions (most likely ACCVIO) correctly to the P user program.  The user program will see the failing PC pointing to the emulatorL rather than to its own instruction.  If the instruction is encountered in anJ inner mode, an OPCDEC exception will be signalled just as if there were noN emulator. The subset will be designed such that it can be delivered on OpenVMS+ V6.2 by replacing a single, low-risk image.   L The complete emulator will emulate all the new instructions in all modes andP IPLs, will reflect ACCVIOs correctly as though the instruction had been executedO in hardware (for low IPLs), will keep counts of emulated instructions, and (for J low IPLs) can be put into a mode such that it will signal an informationalO condition, SS$_EMULATE.  Either the debugger or an application-supplied handler M can catch the signal or a message will be printed to sys$output. 
The complete L emulator requires modification to EXCEPTION.EXE as well as added global dataL cells, and thus requires at least some latent support in any OpenVMS version that it is to run on.   M With both the subset and the complete emulator, debuggers (Debug, Delta, SCD, J Xdelta) will see no sign that emulated instructions are not being executed in hardware.  I Neither emulator will correctly emulate byte/word reads and writes to I/O M space.  If the access is in an inner mode, the subset emulator will never see O the OPCDEC generated.  In all other cases, the emulator will attempt the access L as it would for any location; this means that the I/O space location will beN accessed as a longword or quadword, may be accessed multiple times, and in the: case of a STB or STW it will be read before it is written.  M "Frills" could include additional features like a system service interface to L both process and global counts and to turn on and off the SS$_EMULATE signalL mentioned above.  These frills seem unnecessary for the moment, and will not" be mentioned further in this plan.  $ 1.3 What is NOT part of this project  H This project makes no attempt to address strategies other than emulationE in VMS proper (for example, "fat binaries" or emulation in PAL code).   K This project will not deal with other aspects of the new instructions.  For K example, this project will not add support for the new instructions in SDA, H Xdelta, SCD, or Debug.  (This project will, however, deal with any hooksL requried by these components specifically because of emulation).  Changes toJ these components will be documented in plans from their respective groups.  P Although the emulator design produced by this project will accomodate additionalI new opcodes, this project will not actually implement support for any new > opcodes except the byte/word operations listed in section 1.1.   2. 
Functional Specification    2.1  Operation   2.1.1 Default Behavior  F On an Alpha machine which implements the new instructions in hardware,! the emulator will have no effect.   K On other Alpha processors, the emulator will get control when the Alpha CPU O tries to execute a new instruction. The emulator will perform the same function F that the Alpha SRM defines the new instruction to perform, assuming no exceptions are generated.   O For the subset emulator, or for the complete emulator if the Alpha processor is H running above IPL 2 (ASTDEL), when the new instruction would generate anI exception (as defined in the SRM), the exception frame, signal array, and K mechanism array will show that the exception happened in the emulator code.   L For the complete emulator at IPL 2 or below, it will appear to all exceptionL handlers established in the new instruction's call frame or above (includingL vectored handlers from the primary and secondary vectors) that the exceptionM happened at the location of the new instruction, just as it would have if the * new instruction were executed in hardware.  J In addition, the complete emulator will save several pieces of information  when an instruction is emulated:  7 	*  The number of instructions emulated in each process 6 	   context and the number emulated in system context.  C 	*  The virtual address of the last 5 emulated instructions (except  	    in system context).  0 2.1.2 Additional Behavior Which Can Be Commanded  M If commanded to do so (see section 2.2.3) the emulator will change the OPCDEC P exception it received to a new informational message, SS$_EMULATE, and resignal.L This signal can be used by the debugger to drive a "SET BREAK EMULATED_INST"O command, and by the last chance handler to let the user know that the processor 3 is encountering instructions that must be emulated.   N The emulator can also be commanded to signal SS$_EMULATE only a limited numberO of times (see 2.2.3).  
The number of times to signal is reset at image rundown.

2.2  Interfaces

2.2.1 Running the emulator

No user or system manager action is required to start the emulator.

The subset emulator will reside in VMS$IEEE_HANDLER.EXE, and thus will be
automatically merged into a process the first time the process takes an OPCDEC
exception in user mode.

The complete emulator will reside in EXCEPTION.EXE (mostly in pageable code),
and thus will always be available.

In either case, the emulator code runs when the processor takes an OPCDEC
exception (in user mode for the subset emulator), determines whether the OPCDEC
was caused by an instruction that it can emulate, and returns to the normal
OPCDEC path if not.

2.2.2 Output from the emulator

All output from the emulator (aside from the direct results of executing the
emulated instructions) appears in data cells in the CPU database and in P1 space
of a process that executes an emulated instruction.  The names and meanings of
these cells are:

    CPU$L_EMULATE_COUNT:	Count of instructions emulated on this CPU since
				boot time in system context.  Wraps to 0 after
				2**32 emulations.

    CTL$GL_EMULATE_COUNT:	Count of instructions emulated for this process.
				Wraps to 0 after 2**32 emulations.  Is cleared
				by image rundown.

    CTL$GQ_EMULATE_PC_RING:	The first entry in a ring buffer of 5 quadwords
				containing the virtual addresses of the last 5
				instructions that were emulated in this process.
				All 5 entries are cleared by image rundown.

    CTL$GQ_EMULATE_PC_RING_END:	The last entry in the above ring buffer.

    CTL$GQ_EMULATE_RING_PTR:	A pointer to the next entry in the ring buffer
				to be filled.  Initialized to point to
				CTL$GQ_EMULATE_PC_RING at image rundown.

2.2.3 Input to the emulator

All input to the emulator (aside from the data used directly by executing the
emulated instructions) appears in P1 space cells of a process that executes an
emulated instruction.  The names and meanings of these cells are:

    CTL$GQ_EMULATE_SIGNAL

       This cell is a quadword whose bits represent groups of Alpha
       instructions.  The mapping between bits and instruction groups is the
       same as for the AMASK instruction, defined in the SRM. In particular, for
       this implementation, bit 0 represents byte/word instructions.  If a bit
       is set, then if an instruction in the group represented by that bit is
       emulated (at IPL 2 or below) with no error, the emulator turns the OPCDEC
       exception into an SS$_EMULATE informational message and resignals.  This
       signal can be used by the debugger to implement a break-on-emulate
       command or can notify the user of an emulated instruction via the
       last-chance handler's call to $PUTMSG.

       Bits which are not yet assigned may be set to either 0 or 1.  Thus, it
       is safe to set this cell to FFFFFFFFFFFFFFFF to signify "signal for any
       emulation" or to 0 to signify "do not signal for any emulation".

       This cell is initialized to FFFFFFFFFFFFFFFF on image rundown and
       process creation.  (Perhaps there should be a dynamic sysgen parameter
       to set the initial value.)

    CTL$GL_EMULATE_SIGNAL_MAX

	This unsigned longword contains the maximum number of times per image
	that an SS$_EMULATE message will be signalled.
	If CTL$GL_EMULATE_COUNT
	is greater than or equal to this value (after being incremented to
	count an emulation), the SS$_EMULATE will be suppressed regardless
	of the value of CTL$GQ_EMULATE_SIGNAL.

	This cell is initialized to 4 on image rundown and process creation.
	(Perhaps there should be a dynamic sysgen parameter to set the initial
	value.)

3.0 Emulator Design

3.1 Basic Design Decisions

This section contains the basic decisions about the emulator design that I
had to make near the beginning of the process.  These decisions were made during
the initial investigation stage, and this section is a near-duplicate of a
similar section in the investigation report.

3.1.1 OS vs PALcode

There is little choice in this aspect of the emulator.  PALcode emulation will
not be written, and Bob Supnik has asked all three OS platforms
to emulate the ECO 81 instructions with code in the operating system itself.

Thus, the emulation will be done as part of the VMS Exec.

3.1.2  Sharing code with the other OS platforms

Since DEC Unix and NT also have to implement an emulator, I considered whether
we should try to keep a common code base with them to reduce the total
development time.  After talking with both groups and looking at the prototype
code for NT and for DEC Unix, I have concluded that full sharing is not
practical.  The largest part of the additional code required is OS-specific
(hooking the OPCDEC exception, dealing with exceptions, etc); we could not share
this anyway.  The actual decode and emulation of the instructions, which we
might conceivably share, is so simple that the overhead of maintaining common
sources is not worthwhile.

3.1.3  Language choice

Hooks are required in pieces of the exec (such as EXCEPTION.MAR).
These hooksG will be coded in the language of the routine being modified, of course.i  N The choice of implementation language for new code seems to be Alpha Assembler or C.n   	3.1.3.1 Assembler  : 	Pros:	o We can control the exact behavior of the emulator> 	Cons:   o Assembler is more difficult and error prone than C.B 		o The use of Assembler is not in accordance with VMS policy when 		  it is not necessary.  

	3.1.3.2 C

	Pros:	o C is easier to use and less error prone than Assembler.
		o C's "volatile" attribute appears to be defined to do exactly
		  what we need for atomicity.
		o C handles both aligned and unaligned accesses without
		  requiring an alignment fault.
		o C now has an "ASM" command which allows one to specify
		  specific Alpha machine instructions when necessary.
	Cons:	o The exact behavior of the emulation might change with new
		  compilers.

Since "volatile" seems to specify exactly the behavior we want, I don't believe
that we care about other details of how C implements a byte and word store.
In addition, if it is necessary, we can use ASM to "hard code" a specific
Alpha instruction sequence.  Therefore, I have chosen C as the implementation
language for any new modules that we create for this project.

3.2  The Subset Emulator

3.2.1  High-level description of algorithm

The subset emulator will be shipped as an add-on to OpenVMS Alpha V6.2.  In
order to minimize the risk and the potential changes in behavior of V6.2, I have
chosen to incorporate the emulator in the image VMS$IEEE_HANDLER.EXE.  This
image is merged with the current image by EXCEPTION_ROUTINES when there
is an OPCDEC exception in user mode.  Its original job was to emulate
IEEE floating point instructions for cases where the instruction is not
implemented in the hardware or when emulation is required to find the location
of an inexact HPARITH fault.

3.2.2  Specific Changes and Algorithms for Subset Emulator

This section specifies the locations that will be modified for the subset
emulator, as well as the specific algorithms I plan to use.

The following modules, all in SYS, will require changes for the subset
emulator:

o IEEE_INST.H

  This module is the header file for IEEE_INST.C, and contains opcode
  definitions for all the Alpha opcodes. It appears to be unrelated to
  ALPHA_OPCODES.SDL, and thus needs to be updated separately with the
  new opcodes.  In the prototype, I made the following changes (which
  I expect to be identical to changes in the production code):

	************
	File BUILDS:[SYS.SRC]IEEE_INST.H;2
	  189   #define op_ldbu         0x0a
	  190   #define op_ldq_u        0x0b
	  191   #define op_ldwu         0x0c
	  192   #define op_stw          0x0d
	  193   #define op_stb          0x0e
	  194   #define op_stq_u        0x0f
	******
	File BUILDS:[SYS.SRC]IEEE_INST.H;1
	  189   #define op_opc0a        0x0a
	  190   #define op_ldq_u        0x0b
	  191   #define op_opc0c        0x0c
	  192   #define op_opc0d        0x0d
	  193   #define op_opc0e        0x0e
	  194   #define op_stq_u        0x0f
	************
	************
	File BUILDS:[SYS.SRC]IEEE_INST.H;2
	  207   #define op_sext         0x1c            /* sext b/w group */
	  208   #define op_pal1d        0x1d
	******
	File BUILDS:[SYS.SRC]IEEE_INST.H;1
	  207   #define op_opc1c        0x1c
	  208   #define op_pal1d        0x1d
	************
	************
	File BUILDS:[SYS.SRC]IEEE_INST.H;2
	  410   /*
	  411    * op_sext: Sign extension group.
	  412    */
	  413   #define sext_sextb      0x0
	  414   #define sext_sextw      0x1
	  415
	  416   #endif /* _ALPHA_INST_H_ */
	******
	File BUILDS:[SYS.SRC]IEEE_INST.H;1
	  410   #endif /* _ALPHA_INST_H_ */
	************

	Number of difference sections found: 3
	Number of difference records found: 12
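
  As an illustration only, the definitions above might be used along the
  following lines to decide whether a 32-bit instruction word is one of the
  byte/word instructions.  This is a sketch, not the prototype code: the type
  bw_kind, the routine classify_byte_word, and the inst_* field helpers are
  hypothetical names that do not appear in IEEE_INST.H.  The field positions
  assume the SRM memory format (opcode in bits 31:26, ra in 25:21, rb in
  20:16, displacement in 15:0) and operate format (function code in bits
  11:5, rc in 4:0).

```c
#include <stdint.h>

/* Opcode and function-code values as shown in the IEEE_INST.H differences. */
#define op_ldbu    0x0a
#define op_ldwu    0x0c
#define op_stw     0x0d
#define op_stb     0x0e
#define op_sext    0x1c
#define sext_sextb 0x0
#define sext_sextw 0x1

/* Hypothetical result type; not part of IEEE_INST.H. */
typedef enum {
    BW_NONE, BW_LDBU, BW_LDWU, BW_STW, BW_STB, BW_SEXTB, BW_SEXTW
} bw_kind;

/* Hypothetical field helpers for the memory and operate formats. */
static inline unsigned inst_opcode(uint32_t inst) { return (inst >> 26) & 0x3f; }
static inline unsigned inst_ra(uint32_t inst)     { return (inst >> 21) & 0x1f; }
static inline unsigned inst_rb(uint32_t inst)     { return (inst >> 16) & 0x1f; }
static inline unsigned inst_func(uint32_t inst)   { return (inst >> 5) & 0x7f; }
static inline unsigned inst_rc(uint32_t inst)     { return inst & 0x1f; }

/* Sign-extend the 16-bit displacement field of a memory-format instruction. */
static inline int64_t inst_displ(uint32_t inst)
{
    int64_t d = (int64_t)(inst & 0xffff);
    if (d & 0x8000)
        d -= 0x10000;
    return d;
}

/* Return which byte/word instruction this is, or BW_NONE if it is not one. */
static inline bw_kind classify_byte_word(uint32_t inst)
{
    switch (inst_opcode(inst)) {
    case op_ldbu: return BW_LDBU;
    case op_ldwu: return BW_LDWU;
    case op_stw:  return BW_STW;
    case op_stb:  return BW_STB;
    case op_sext:
        /* The sign-extend group is distinguished by its function code. */
        switch (inst_func(inst)) {
        case sext_sextb: return BW_SEXTB;
        case sext_sextw: return BW_SEXTW;
        default:         return BW_NONE;
        }
    default:
        return BW_NONE;
    }
}
```

  A BW_NONE result corresponds to the case where the handler would fall back
  to the normal OPCDEC path.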

o IEEE_INST.C

  This module currently contains the routine ieee_handler(chf$mech_array,
  chf$signal_array) which is called as a special exception handler before the
  primary handler whenever EXCEPTION encounters an OPCDEC (or an HPARITH) in
  user mode.  The current purpose of this module is to emulate any IEEE
  instructions that are not supported in hardware, and to emulate a range of
  floating point instructions which may have given an inexact exception to
  determine which instruction gave the exception.

  The subset emulator will reside completely in this module.

  MAINLINE CHANGES: Rather than try to integrate the byte/word instruction
  emulation into the IEEE_HANDLER code, the prototype checks as we first enter
  IEEE_HANDLER to see if we have a non-fp instruction which we know how to
  emulate.  If so, we call a new routine
  emulate_byte_word(pc,chf$mech_array,chf$signal_array) and return, passing the
  return value of emulate_byte_word back to our caller.

  Routine emulate_byte_word calls the new routine find_context to establish a
  VMS INVO_CONTEXT_BLOCK for the routine in which the original OPCDEC occurred
  using the following algorithm (which is similar to one already used in
  IEEE_HANDLER):

  - Get the current context using LIB$GET_CURR_INVO_CONTEXT.
  - Get the previous context in the call chain using LIB$GET_PREV_INVO_CONTEXT.
    If LIB$GET_PREV_INVO_CONTEXT fails, return SS$_RESIGNAL.
  - If the new context's R29 matches the frame pointer in the exception's
    mechanism array, we have found the right context; return it.  Otherwise
    go back to the previous step.

  Now that we have the context block, we can access the register contents that
  were current at the time of the exception.  We have the PC of the OPCDEC
  exception, so we can access the instruction that was being executed.
  Based on the opcode (and the function code in the case of a sign-extend
  opcode) we can emulate the instruction using the following algorithms.  (Note
  that I will use the symbols ra, rb, rc just as the SRM does to indicate the
  first, second, and third registers in an Alpha instruction, and displ to
  indicate the contents of the displacement field of a memory instruction.)

LDWU

- Calculate the effective address by adding the contents of rb and displ.
- Treating the effective address as a pointer to a volatile unsigned word, fetch
  the word that it points to, convert the value to a quadword, and replace the
  contents of ra with that value.
- Call LIB$PUT_INVO_REGISTERS to ensure the updated register values are
  returned to the correct frame.

LDBU

- Calculate the effective address by adding the contents of rb and displ.
- Treating the effective address as a pointer to a volatile unsigned byte,
  fetch the byte that it points to, convert the value to a quadword, and replace
  the contents of ra with that value.
- Call LIB$PUT_INVO_REGISTERS to ensure the updated register values are
  returned to the correct frame.

STW

- Calculate the effective address by adding the contents of rb and displ.
- Treating the effective address as a pointer to a volatile unsigned word,
  truncate the contents of ra to a word and store this word into the word
  pointed to by the effective address.

STB

- Calculate the effective address by adding the contents of rb and displ.
- Treating the effective address as a pointer to a volatile unsigned byte,
  truncate the contents of ra to a byte and store this byte into the byte
  pointed to by the effective address.

SEXTW

- Treat the contents of rb as a signed word, convert it to a signed quadword
  and store the result in rc.
- Call LIB$PUT_INVO_REGISTERS to ensure the updated register values are
  returned to the correct frame.

SEXTB

- Treat the contents of rb as a signed byte, convert it to a signed quadword
  and store the result in rc.
- Call LIB$PUT_INVO_REGISTERS to ensure the updated register values are
  returned to the correct frame.

Note that there is no special handling for an exception in the emulator and that
an unexpected condition (a failure return from LIB$GET_PREV_INVO_CONTEXT) results
in resignalling the OPCDEC, so it will appear that the emulator is not there.

3.3  The Complete Emulator

3.3.1 Design Description

The complete emulator is required not only to operate at all modes and IPLs, but
also to reflect any exceptions (currently ACCVIOs) that it encounters
during emulation back to the original caller as though the exception had
happened at the PC of the emulated instruction.  In addition, it must increment
one of several counters, and possibly signal SS$_EMULATE to warn the user or
the debugger of the emulated instruction.

3.3.1.1 Dealing With All-Mode, All-IPL Emulation

The algorithms for emulating the new instructions for success (non-exception-
generating) cases are the same as those for the subset emulator.  However,
to deal with the all-mode and all-IPL requirement three major differences have
to be introduced:

1)  The emulator comes in two variants, one simple non-paging variant for high
    IPL and one complete paging variant for low IPL.  The idea here is that the
    full, low-IPL emulator will be large enough that paging is desirable.
    Experience may show that the size of the two variants is not different
    enough to warrant having a special paging version, and that one non-paging
    version with conditional code can be used for both high and low IPL
    exceptions.

2)  Both variants are part of EXCEPTION.EXE rather than being separately
    merged into a process.  Notice that no piece of the complete emulator
    resides in VMS$IEEE_HANDLER; that image was used only for the subset.
  F 3)  Special calls to the emulation routines have to be introduced into     into EXCEPTION.M64.C  M In addition, some significant differences are required to support the abilityhI of the emulator to reflect ACCVIOs (and possibly other exceptions) to thea original "caller".    & 3.3.1.2 Reflecting Emulator Exceptions  O Reflecting emulator exceptions means (in the context of this document) catchingcN any exceptions that may occur while the emulator is working and arranging thatP application condition handlers see the exception as though hardware execution ofP the emulated instruction had caused it.  In practice, an algorithm to reflectingO an emulator exception must perform three tasks,  First, it must ensure that the N up-level condition handlers get a signal and mechanism array which reflect theD state of the process when it first attempted to execute the emulatedN instruction, and second, it must ensure that the state of the process actuallyM matches the state in the signal and mechanism array such that the handler caniM continue, resignal, or unwind properly, and finally, it must ensure that datanK about the exception (e.g. the reason mask of an ACCVIO) match what it woulde3 be on hardware supporting the emulated instruction.   N To clarify the first two points, consider the following diagram of the stack. P Assume that the processor has attempted to execute a LDBU instruction, resultingF in an OPCDEC.  
The emulator was called, attempted to read the longword
containing the requested byte, and got an ACCVIO.

+--------------------------------------------------------------------+ - A
|                                                                    |
~                  CHF Context Area for ACCVIO                       ~
|                                                                    |
+--------------------------------------------------------------------+
|                                                                    |
~                  Mechanism Array for ACCVIO                        ~
|                                                                    |
+----------------------------------+---------------------------------+
|                                  |
|  32-Bit Signal Array for ACCVIO  |
|                                  |
+----------------------------------+---------------------------------+
|                                                                    |
~                   64-bit Signal Array for ACCVIO                   ~
|                                                                    |
+--------------------------------------------------------------------+
|                                                                    |
~                  Exception Stack Frame for ACCVIO                  ~
|                                                                    |
+--------------------------------------------------------------------+ - B
|                                                                    |
~              Stack frames for OPCDEC Condition Handler             ~
|                                                                    |
+--------------------------------------------------------------------+
|                                                                    |
~                  CHF Context Area for OPCDEC                       ~
|                                                                    |
+--------------------------------------------------------------------+
|                                                                    |
~                  Mechanism Array for OPCDEC                        ~
|                                                                    |
+----------------------------------+---------------------------------+
|                                  |
|  32-Bit Signal Array for OPCDEC  |
|                                  |
+----------------------------------+---------------------------------+
|                                                                    |
~                   64-bit Signal Array for OPCDEC                   ~
|                                                                    |
+--------------------------------------------------------------------+
|                                                                    |
~                  Exception Stack Frame for OPCDEC                  ~
|                                                                    |
+--------------------------------------------------------------------+ - C
|                                                                    |
|                    Stack frames for User Program                   |
:                                                                    :


The portion of the stack between A and B contains the CHF context for the
ACCVIO.  The portion between B and C contains the CHF context for the OPCDEC,
as well as the stack frames for the OPCDEC handler itself.

If we simply change the ACCVIO signal array's PC to point to the LDBU
instruction and resignal, we end up reaching the user's handler with too many
frames on the stack:  the entire set of arrays and frames from the OPCDEC
is still there.  The mechanism array's depth is greater than it should be,
and worse, the signal array's PC is pointing to a routine which does not match
the depth.
This means that if the user handler returned SS$_CONTINUE, execution
would resume at the LDBU instruction, but with the register set belonging to
the emulator.

If we attempt to unwind from the ACCVIO handler to the user frame rather than
resignaling, there is no way to initiate a signal that the original frame's
signal handler will catch.

Another method is to add a special entry in EXCEPTION which would, as Andy
Goldstein put it, lift itself and the ACCVIO context by the bootstraps above
the stack, clean out the OPCDEC exception, and lower itself back down.  This
method, while very general, seems overly risky for this project, given that
there is another alternative.

The method I have chosen is as follows:

	1)  In EXCEPTION, when building the signal arrays for an OPCDEC, always
	    allocate enough space in both signal arrays (32 and 64-bit) to hold
	    the signal array of the longest possible exception (in this case,
	    8 longwords and quadwords for SS$_ASTFLT).

	2)  In the emulator, establish a condition handler for any exceptions
	    that might be generated there (I'll call it the ACCVIO handler,
	    although it would be used to handle other exceptions that emulated
	    instructions might encounter).

	3)  In the emulator, load the address of the two signal arrays into
	    registers R2 and R3 (see Q2 below for how to do this).

	4)  In the ACCVIO handler, get the OPCDEC signal array addresses from
	    the OPCDEC handler's R2 and R3 and copy the ACCVIO signal arrays
	    over them.

	5)  Use SYS$GO_TO_UNWIND to return to the OPCDEC handler with a failure
	    status.

	6)  In the OPCDEC handler, notice the failure status and return
	    SS$_RESIGNAL, which effectively passes the ACCVIO back to the
	    application with the context of the frame where the OPCDEC happened.

To answer some obvious objections immediately:

	Q1) You have to have the right count of signal arguments in the OPCDEC
	    signal array.
	    If you allocate more than the count shows, won't you
	    have the wrong length when you try to peel it off the stack?
	A1) No, the length of the entire exception context (everything that
	    has to be peeled off) is kept separately in the CHF Context Area.
	    That value will include the extra allocation.

	Q2) How do you load stuff into registers with C?
	A2) Newer versions of C have a "builtin" called ASM which allows you
	    to include Alpha assembly code in line with C code.  An alternative
	    method is to define a linkage which passes some arguments in
	    registers which the called routine is required to preserve (i.e.
	    R2-R15).  Finally, we could use a MACRO jacket.

	Q3) What do you do if something goes wrong?  For example, if you don't
	    actually have the OPCDEC array in the ACCVIO handler?
	A3) The fallback will be to resignal the exception.  This means that
	    a user-supplied handler would not have the correct PC, but would
	    still get the correct exception.

	Q4) Which signal array will you change and will you return SS$_RESIGNAL
	    or SS$_RESIGNAL64?
	A4) I will copy both signal arrays.  Since I am being called from a
	    special place inside EXCEPTION, I can choose whether to have
	    EXCEPTION propagate one signal array to the other.  I choose to
	    do the work myself and have EXCEPTION do no propagation.

There are a few remaining concerns about reflecting exceptions:  First, an actual
STB/STW instruction will not do a read from memory, and thus will not fault if
the page in question is marked fault-on-read or noread, but is writable.  An
emulated STB/STW must first read the longword(s) or quadword(s) that contain(s)
the byte or word to be written, and thus will fault under this condition.
O Second, if the page that the instruction is accessing is protected against bothwL read and write, a real STB/STW instruction would ACCVIO with the reason maskN showing a write, while the emulator's ACCVIO will show a reason mask of read. F Finally, if the page in which the actual instruction resides is markedM Fault-On-Read, a hardware instruction would execute without a fault, while an G emulated instruction would take a fault trying to read the instruction.   M The only way to avoid the first problem is to do the emulation in kernel mode K and probe the memory only for write access.  This method seems fraught withnM potential problems.  My plan is to allow this small difference in behavior toeL stand for several reasons.  First, it is a convention in VMS that any memoryL that is writable must also be readable.  That convention removes half of theP problem.  Secondly, the only expected use of Fault-On-Read in the near future isN for Memory Channel, and the emulator does not support I/O space access anyway.O In total, it seems unlikely that this problem would ever be noticed, and from aC; security point of view, it is erring on the side of safety.   P The second problem is similarly minor, but it is easier to fix, so I will do so.M I will simply keep track of what instruction I am emulating and if it gets anx= ACCVIO, replace the reason mask with "write" for STW and STB.t  J The last problem will constitute a restriction:  An instruction can not beM emulated if it is protected with F-O-R.  There is some thought that F-O-R maynN be used in the future to implement "execute-only" pages.  We will have to dealO with this problem if we do execute-only pages.  Perhaps the F-O-R handler coulds! be given additional intelligence.    3.3.1.3 Signalling SS$_EMULATE  L If the bit which represents the instruction that is being emulated is set inP CTL$GQ_EMULATE_SIGNAL, then the emulator is required to signal SS$_EMULATE afterK completing the emulation.  
It will do this by simply overwriting the OPCDEC
signal array with an SS$_EMULATE signal array and returning SS$_RESIGNAL.  If the
bit is not set, the emulator will return SS$_CONTINUE.

3.3.2  Specific Changes and Algorithms for Complete Emulator

The following changed and new modules all reside in the SYS facility.

o EXCEPTION.M64

- Call a special non-paged emulator module for OPCDEC above IPL 2

  At label 10$: (where we have determined that we have an exception in system
  state or above IPL 2 and are about to bugcheck), insert a call to
  EMULATE_NEW_INSTRUCTION_HI_IPL.  Specifically, load R16 with
  CHFCTX$Q_SIGARGLST(SP), R17 with CHFCTX$Q_MCHARGLST(SP), R25 with 2, and JSB
  EMULATE_NEW_INSTRUCTION_HI_IPL.  After it returns, if R0's lower bit is set
  (i.e. EMULATE... returned SS$_CONTINUE), branch to EXE_CONTSIGNAL (which
  dismisses the interrupt and continues executing).  Otherwise, control drops
  through into the INVEXCEPT bugcheck code.

- Call the emulator in case of an OPCDEC in any mode (at IPL 2 or lower).

  Immediately before the first label 30$: (just before calling CHF_MAP_IEEE),
  call EMULATE_NEW_INSTRUCTION in essentially the same way that IEEE_HANDLER is
  called just below (except that IEEE_HANDLER is called indirectly through
  CTL$GL_IEEE_HANDLER).  Specifically, load R16 with CHFCTX$Q_SIGARGLST(SP), R17
  with CHFCTX$Q_MCHARGLST(SP), R25 with 2, and call EMULATE_NEW_INSTRUCTION.
  After it returns, if R0's lower bit is set (i.e. EMULATE... returned
  SS$_CONTINUE), branch to CHECK_STOP.  If R0's lower bit is clear, copy the
  (possibly) modified signal array argument count to
  CHFCTX$L_SIG_ARGS(SP), and check again to see if the signal name is still
  SS$_OPCDEC.  If so, continue into 30$.
If not, branch to SEARCH_LOOP.

- Reserve extra space for the signal array in the case of OPCDEC

  After EXE$OPCDEC_EXCEPTION_ENTRY:, the 64-bit and 32-bit signal
  arrays are allocated on the stack.  Change the allocation size from 4
  quadwords and 4 longwords to 8 quadwords and 8 longwords.  Do NOT change
  the length field in either signal array.

o EMULATE.C, EMULATE.H

This is a new module (and its header file) which contains the code to emulate
instructions and to reflect exceptions.  It will be conditionally compiled into
two object modules:  EMULATE and EMULATE_HI_IPL.  The HI_IPL version will be
built so that it is non-pageable.  Additional differences are noted below.
The modules will contain the following routines:

-chf_emulate_new_instruction(chf$mech_array, chf$signal_array)
-chf_emulate_new_instruction_hi_ipl(chf$mech_array, chf$signal_array)

These routines are the main condition handlers which are entered for each OPCDEC.
They perform the following steps:

1) (Low IPL version) Establish a condition handler named emulator_cond_handler.

2) Call find_context to get the invo_context_blk for the invocation
context where the OPCDEC occurred using the same algorithm as specified in
section 3.2.2.

3) Decode the instruction pointed at by the PC-4 in the signal array.  If the
instruction cannot be emulated, return SS$_RESIGNAL.

4) (Low IPL version) Modify the signal name in the signal array to be
SS$_EMULATE.

5) (Low IPL version) Store pointers to the 32- and 64-bit signal arrays in
registers R2 and R3.

6) Call the appropriate emulation routine.

7) If the emulation routine returns false, return SS$_RESIGNAL.  Otherwise,
if we are in system context, increment CPU$EMULATE_COUNT.  Otherwise, increment
CTL$GL_EMULATE_COUNT.
  L 8) If we are not in system context, check to see that CTL$GQ_EMULATE_PC_RING< is pointing to an address between CTL$GQ_EMULATE_PC_RING andP CTL$GQ_EMULATE_PC_RING_END.  If not, store the address of CTL$GQ_EMULATE_PC_RINGO into it.  In any case, store the PC from the signal array into the cell pointeddN at by CTL$GQ_EMULATE_RING_PTR.  Increment the pointer.  (Treat the pointer andO the cell it points to as volatile.  This will make the compiler ensure that thenN read/modify/write is atomic in the face of multiple modes and IPLs writing the cells).m  O 9) If we are not in system context, check the bit in CTL$GQ_EMULATE_SIGNAL thatrM corresponds to this instruction. If it is set, replace the signal name in the K signal array with SS$_EMULATE and return SS$_RESIGNAL.  In all other cases,h return SS$_CONTINUE.    . -emulate_instruction_LDBU(invo_context_blk,pc) -emulate_instruction_LDWU	"U -emulate_instruction_STB	" -emulate_instruction_STW	" -emulate_instruction_SEXTB	" -emulate_instruction_SEXTW	"  I These routines emulate their respective instructions using the algorithmsa> specified in section 3.2.2. and return either TRUE if they are" successful, or FALSE if they fail.  8 -emulator-cond-handler(chf$mech_array, chf$signal_array)  P This is the condition handler which is established in emulate_new_instruction().. When entered, it performs the following steps:  K 1) Check the "depth" in the mechanism array to see if it is 0.  If so, thisaL means that the exception did not happen in the actual emulation code, and isM therefore is caused either by an emulator failure or by Fault-On-Read set for : the page containing the instruction.  Return SS$_RESIGNAL.  K 2) Check that the exception is an ACCVIO or an ASTFLT.  If not, this is notsM an exception that we know how to deal with.  Return SS$_RESIGNAL.  
(This steps2 may be modified for future emulated instructions.)  H 3) Find the context of the establisher of this handler using the "depth" in the mechanism array.a  N 4) Get pointers to the OPCDEC 32- and 64-bit signal arrays out of the saved R2I and R3 from the establisher's context.  Get the opcode of the instructions being emulated.e  M 5) In the OPCDEC 64 and 32 bit signal arrays, move the PC and PS forward by NgO quadwords/longwords, where N is the difference in length between the new signal J array and the OPCDEC signal array.  Copy the ACCVIO signal arrays over the/ OPCDEC signal arrays (excluding the PC and PS).   L 6) If the instructions being emulated are STB or STW, modify the reason mask to be "write".  M 7) Call LIB$GO_TO_UNWIND to unwind to chf_emulate_new_instruction's frame anda to return SS$_RESIGNAL to it.    4.0 Testinge   4.1 Unit testing  O Unit testing will consist of simple functionality tests to ensure that the bulkCI of the code works.  It will at least check that each instruction operatesyK correctly with both negative and positive data.  For the complete emulator,NN it will check user mode, kernel mode low IPL and kernel mode high ipl, as well7 as checking to see that ACCVIOs are reflected properly.   B Some of these tests may rely on manual examination of the results.   4.2 System Testing  ? The following items should be tested for the complete emulator:i  L 0 Correctness of execution under "normal conditions".  
To do this, we should
  test all combinations of the following:

    Instruction         Data            Registers       Mode/IPL
    -----------         ----            ---------       --------

        LDBU            Signed          0-28            KESU/0,1,2,8,20,22,31
        LDWU            Unsigned
        STB
        STW
        SEXTB
        SEXTW

"Test" means execute the instruction under the specified conditions and make
sure that the immediate results are correct.

o Correctness of counters

        - Ensure that the correct process-based counters increment when doing
          the above testing.

        - Insert a byte/word instruction into the clock interrupt routine and
          ensure that the CPU-database counter increments when there is no
          process activity.

o Correctness of exception/signal processing

        - Supply a bogus address in each mode at IPL 0, Kernel mode at IPL 2
          and IPL 3.  Ensure that the ACCVIO is signalled correctly at IPL 2
          and below and that we get an INVEXCEPT bugcheck at IPL 3.

        - Signal an OPCDEC with LIB$SIGNAL with a PC pointing to a bogus
          location.  This will cause the emulator to get an ACCVIO while
          reading the instruction.  Ensure that this ACCVIO is reported as
          being in the emulator, and is not reflected to the bogus PC.

        - Turn on signalling by setting CTL$GQ_EMULATE_SIGNAL to 1.  Make sure
          that we get an SS$_EMULATE signal for all modes and IPLs below 2, and
          nothing above 2.

        - Set CTL$GL_EMULATE_SIGNAL_MAX to a small number "n" and ensure that
          the SS$_EMULATE signal is no longer sent when the counter exceeds
          "n".


o Atomicity

        - Run the above tests in many threads of the same process simultaneously
          on a multiprocessor.
If Kernel Threads does not allow different
          threads to run on different CPUs when we are doing this test, we will
          also need to try this with multiple processes using a global section.

        - Validate the test by repeating the same test with a special version
          of the emulator which does not use the "volatile" attribute, and which
          will thus not use atomic sequences.  If there are no errors, we need
          to revisit the test and find out why we are not hitting atomicity
          problems.

For the subset emulator, just test the correctness of all instructions with
all registers in user mode, and also test atomicity using multiple processes
writing to a global section.

All tests must run automatically with a minimum of manual setup and no
manual intervention.

5.0 Estimated Implementation Time

These estimates are "real time".  That is, they include general overhead and
normal interruptions.  They do not include vacations, multi-day interruptions,
etc.

5.1  Clean up prototype to create subset emulator, unit test, and debug.

        About 1 week.  The result will be a set of modified source files
        in my local directory.  In other words, this time does not include
        anything to do with checkin.

5.2  Write, unit test, and debug the complete emulator.

        About 4 weeks.  The result will be a set of modified source files
        in my local directory.  In other words, this time does not include
        anything to do with checkin.

5.3  Write, debug and run the complete test program.

        About 4 weeks minimum.  This estimate is less certain, since I will
        need to learn how to use threads, and also consult with the testing
        folks to understand if there are any special requirements that they
        need for incorporating the test program into their regression suite.

6.0 External Requirements

6.1 A C compiler which supports 64-bit addressing.  This compiler will be
required to build the emulator, since it must dereference 64-bit pointers.
ThisN compiler must be available in any build which the complete emulator is checked into.1  K 6.1 A C compiler which supports generating byte/word instructions.  This isaN required for generating test programs.  This compiler need not be the compiler@ which support 64-bit addressing, although it would be desirable.