M65C02A Instruction Set Enhancements

A project log for Chameleon

FPGA shield board with XC3S50A/XC3S200A FPGA, 2 232/485 ports, 32Mb Flash, 4kb FRAM, 128kB SRAM, 12 Arduino UNO connections.

Michael A. Morris, 08/16/2014 at 16:04

This log describes the instructions that the M65C02A core adds to the 65C02 instruction set. As indicated in the previous project log, Checkout, the M65C02A core is a significant enhancement to the microarchitecture of the M65C02 core. The changes in the core were primarily focused on two objectives: (1) reducing the size of the core, and (2) improving the speed of the core. The M65C02A core meets both of these objectives.

The reduction in the size of the M65C02A core makes the core smaller than some cores which only support the base 6502 instruction set. That is, the resource requirements (LUTs/Slices/FFs) of the M65C02A (481/358/125) core are less than those of cores such as M65C02 (755/464/225), verilog-6502 (510/337/195), and ag_6502 (978/512/93). Note that each core must be evaluated separately in any particular application. When a core is synthesized in isolation, simplistic resource comparisons will not provide an accurate view of a core's resource requirements in a specific application. In other words, many internal and external factors can affect the final resource utilization values reported for any particular core. Thus, when considering the M65C02A/M65C02 cores, their use of 2 BRAMs to hold their microprograms must be seriously considered; BRAMs are a very limited resource in most FPGAs.

Significant enhancements to the M65C02A address generator, together with several recent additions to the data path logic, allow the M65C02A to support several addressing modes otherwise found only in the W65C816 microprocessor, the 16-bit enhanced version of the WDC 65C02. Specifically, the M65C02A supports the two stack relative addressing modes of the '816: sp,S and (sp,S),Y. The M65C02A core, therefore, currently supports the 8-bit mode versions of the ORA/AND/EOR/ADC/LDA/STA/CMP/SBC sp,S instructions and the ORA/AND/EOR/ADC/LDA/STA/CMP/SBC (sp,S),Y instructions.
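The two stack relative modes can be read as effective-address calculations. The following is a minimal Python sketch, not the core's logic: the function names are hypothetical, S is treated as a full 16-bit address into page 1, and the addition is assumed not to wrap within the stack page.

```python
def read_word(mem, addr):
    """Read a little-endian 16-bit value from mem at addr."""
    return mem[addr & 0xFFFF] | (mem[(addr + 1) & 0xFFFF] << 8)


def ea_stack_relative(s, sp):
    """sp,S: the operand lives at S plus an unsigned 8-bit offset."""
    return (s + (sp & 0xFF)) & 0xFFFF


def ea_stack_relative_indirect_y(mem, s, sp, y):
    """(sp,S),Y: a pointer is read from the stack, then Y is added to it."""
    pointer = read_word(mem, ea_stack_relative(s, sp))
    return (pointer + (y & 0xFF)) & 0xFFFF


mem = bytearray(0x10000)
s = 0x01FD            # S points at the next free stack byte
mem[0x01FE] = 0x00    # low byte of a pointer previously pushed on the stack
mem[0x01FF] = 0x20    # high byte: the pointer is $2000
```

Under these assumptions, LDA 1,S would read the byte at $01FE, and LDA (1,S),Y with Y = 5 would read the byte at $2005.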

In addition, the M65C02A supports the three '816 instructions which push 16-bit addresses/constants onto the processor stack: (1) PEA abs, (2) PEI zp, and (3) PER rel16. In actual practice, the operand of the '816 PEA instruction is a 16-bit immediate value. That is, a 16-bit constant is pushed onto the processor stack, and a standard assembler will accept either a 16-bit absolute address or a 16-bit immediate value. The PEI instruction pushes the 16-bit value stored in locations {zp+1, zp}, which, like the PEA instruction operand, may be considered to be an address or a constant. The PER instruction pushes the absolute address formed by adding the rel16 parameter to the program counter (PC) value of the next instruction. In the '816, the PER instruction is used to support position independent, PC-relative subroutines and data. The '816 uses the sequence of a PER rel16 instruction followed by a BRA rel16 instruction to implement a position independent branch to subroutine (BSR).
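The three push instructions described above differ only in where the 16-bit value comes from. A minimal Python sketch follows; the Machine class and function names are hypothetical, the high byte is assumed to be pushed first (as JSR does), and the PEI pointer fetch is assumed to wrap within page zero.

```python
class Machine:
    """Just enough state to model the stack: 64 kB of memory and S in page 1."""

    def __init__(self):
        self.mem = bytearray(0x10000)
        self.s = 0xFF  # 8-bit stack pointer; the stack lives at $0100-$01FF

    def push_byte(self, value):
        self.mem[0x0100 | self.s] = value & 0xFF
        self.s = (self.s - 1) & 0xFF

    def push_word(self, value):
        self.push_byte(value >> 8)  # high byte first (assumption)
        self.push_byte(value)       # low byte ends up at the lower address


def pea(m, imm16):
    """PEA: push the 16-bit operand itself."""
    m.push_word(imm16)


def pei(m, zp):
    """PEI: push the 16-bit value stored at {zp+1, zp}."""
    lo = m.mem[zp & 0xFF]
    hi = m.mem[(zp + 1) & 0xFF]  # assumed to wrap within page zero
    m.push_word((hi << 8) | lo)


def per(m, next_pc, rel16):
    """PER: push next_pc plus the signed 16-bit displacement."""
    disp = rel16 - 0x10000 if rel16 & 0x8000 else rel16
    m.push_word((next_pc + disp) & 0xFFFF)
```

For example, per(m, 0x8003, 0xFFFD) pushes $8000, since the displacement $FFFD is -3.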

The M65C02A implements these three '816-specific instructions, but renames them to PHW #imm16, PHW zp, and PHR rel16, respectively. This is done in order to better convey the actual implementation of the instructions, and to allow the incorporation of an additional M65C02A-only instruction: PHW abs. This new M65C02A-only instruction behaves like the PHW zp (PEI zp) instruction except that it uses the absolute addressing mode rather than the zero page direct addressing mode. To complement these 16-bit push operations, the 16-bit M65C02A-only pull instructions PLW zp and PLW abs are included in the instruction set. These instructions pull a 16-bit value from the processor stack and store it in the location specified by the addressing mode.
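A PHW abs followed by a PLW abs amounts to a 16-bit memory-to-memory copy through the stack. The sketch below models that pairing in Python. The class and function names are hypothetical, and the byte order on the stack (high byte pushed first, low byte pulled first) is an assumption carried over from JSR/RTS behaviour.

```python
class Machine:
    """64 kB of memory plus an 8-bit stack pointer into page 1."""

    def __init__(self):
        self.mem = bytearray(0x10000)
        self.s = 0xFF

    def push_byte(self, value):
        self.mem[0x0100 | self.s] = value & 0xFF
        self.s = (self.s - 1) & 0xFF

    def pull_byte(self):
        self.s = (self.s + 1) & 0xFF
        return self.mem[0x0100 | self.s]


def phw_abs(m, addr):
    """PHW abs: push the 16-bit value stored at {addr+1, addr}."""
    m.push_byte(m.mem[(addr + 1) & 0xFFFF])  # high byte first (assumption)
    m.push_byte(m.mem[addr & 0xFFFF])


def plw_abs(m, addr):
    """PLW abs: pull a 16-bit value and store it at {addr+1, addr}."""
    m.mem[addr & 0xFFFF] = m.pull_byte()        # low byte comes off first
    m.mem[(addr + 1) & 0xFFFF] = m.pull_byte()


m = Machine()
m.mem[0x2000] = 0xCD
m.mem[0x2001] = 0xAB
phw_abs(m, 0x2000)  # push the word $ABCD
plw_abs(m, 0x3000)  # pull it into $3000/$3001
```

After the two calls the stack pointer is back where it started and $3000/$3001 hold the same word as $2000/$2001.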

The M65C02A also implements the '816 BRA rel16 instruction to support PC-relative, position independent software/firmware. In addition, the M65C02A-only BSR rel16 instruction provides the capability to access PC-relative subroutines directly instead of having to use the PER rel16; BRA rel16 instruction sequence. The M65C02A core also provides the means to use 16-bit values pushed onto the processor stack as software/firmware addresses by providing the JMP (sp,S),Y instruction.
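To see why a single BSR rel16 is convenient, it helps to work out the operands that the PER rel16; BRA rel16 sequence needs. The Python sketch below does so under stated assumptions: both instructions are taken to be three bytes long (opcode plus a 16-bit displacement), displacements are relative to the address of the next instruction, and the pushed value is the return address minus one so that a standard RTS resumes after the branch. The function names are hypothetical.

```python
def signed16(value):
    """Interpret a 16-bit value as a two's complement displacement."""
    return value - 0x10000 if value & 0x8000 else value


def per_pushed_value(per_addr, rel16):
    """Value pushed by a PER rel16 located at per_addr."""
    return (per_addr + 3 + signed16(rel16)) & 0xFFFF


def branch_target(bra_addr, rel16):
    """Destination of a BRA rel16 located at bra_addr."""
    return (bra_addr + 3 + signed16(rel16)) & 0xFFFF


def bsr_operands(per_addr, sub_addr):
    """Operands for a PER/BRA pair at per_addr that calls sub_addr."""
    bra_addr = per_addr + 3
    last_byte_of_bra = bra_addr + 2  # RTS adds one to the pulled address
    per_rel = (last_byte_of_bra - (per_addr + 3)) & 0xFFFF
    bra_rel = (sub_addr - (bra_addr + 3)) & 0xFFFF
    return per_rel, bra_rel


per_rel, bra_rel = bsr_operands(0x8000, 0x9000)
```

With the pair placed at $8000 and the subroutine at $9000, the PER displacement is always 2 and the branch displacement is $0FFA. Both displacements stay the same wherever the code is relocated, which is what makes the sequence position independent.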

The M65C02A also implements the '816 COProcessor instruction. Like the '816 COP instruction, the M65C02A COP instruction is implemented as a processor trap like the standard BRK instruction. Unlike the 6502/65C02 BRK and the '816 COP, the M65C02A COP loads the X register with the "signature" byte that follows the instruction. The M65C02A implementation of the COP instruction can be easily gleaned from the definition: COP #imm. The 8-bit immediate operand is loaded into the X register, and then a trap is taken through a vector defined in high memory.
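The trap sequence described above can be sketched in Python as follows. Several details are assumptions, because the log does not state them: the pushed return address is taken to be the address after the signature byte, the push order (PCH, PCL, P) is copied from BRK, and the vector address is passed in as a parameter. The function name is hypothetical.

```python
def cop(mem, pc, s, p, vector):
    """Model COP #imm located at pc; returns (new_pc, new_s, x)."""
    x = mem[(pc + 1) & 0xFFFF]  # the signature byte is loaded into X
    ret = (pc + 2) & 0xFFFF     # assumed return address: after the signature
    for byte in (ret >> 8, ret & 0xFF, p & 0xFF):  # PCH, PCL, P as for BRK
        mem[0x0100 | s] = byte
        s = (s - 1) & 0xFF
    new_pc = mem[vector & 0xFFFF] | (mem[(vector + 1) & 0xFFFF] << 8)
    return new_pc, s, x


mem = bytearray(0x10000)
mem[0x8001] = 0x2A  # signature byte following the COP opcode at $8000
mem[0xFFF4] = 0x00  # example vector location only; not taken from the log
mem[0xFFF5] = 0xC0
new_pc, new_s, x = cop(mem, 0x8000, 0xFF, 0x30, 0xFFF4)
```

The handler at the vector target can then dispatch on X directly, without having to fetch the signature byte through the stacked return address.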

Beyond these instructions, work is continuing on the M65C02A core to support several other enhancements: (1) escape codes, (2) using Y as a stack pointer in zero page, (3) supporting Kernel (default) and User operating modes, and (4) virtual memory support. The first of these enhancements, instruction escape codes, is expected to enable significant enhancements to the instruction set.

The first escape code, IND, converts direct addressing modes such as zero page direct or absolute into zero page indirect or absolute indirect when it precedes an instruction using either of these two direct addressing modes. A potential application of this escape code is to allow the Rockwell bit-oriented instructions, RMBx/SMBx zp and BBRx/BBSx zp,rel, to use zero page indirect: RMBx/SMBx (zp) and BBRx/BBSx (zp),rel. This enhancement would potentially extend the applicability of these 32 instructions to the entire 64kB address space of the M65C02A core. A similar use would apply the IND escape code to the TRB/TSB/BIT zp and TRB/TSB/BIT abs instructions: TRB/TSB/BIT (zp) and TRB/TSB/BIT (abs).

IND has been implemented for all addressing modes. It may be applied to all zero page direct and direct absolute addressing modes. IND cannot be applied to instructions which provide both zero page direct and zero page indirect addressing modes. IND also converts indexed zero page direct addressing modes to indirect addressing modes which are consistent with the 6502/65C02 instruction set architecture. That is, pre-indexed zero page direct, zp,X, is converted to pre-indexed zero page indirect, (zp,X); post-indexed zero page direct, zp,Y, is converted to post-indexed zero page indirect, (zp),Y. A similar conversion applies to the indexed absolute direct addressing modes.
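The conversions listed above can be summarised as a lookup table. The Python sketch below encodes them together with the standard 6502 effective-address rules for the two indexed zero page indirect forms. The table name, function names, and the choice to leave unlisted modes unchanged are illustrative assumptions.

```python
# Addressing-mode conversion applied by the IND prefix, as described above.
IND_CONVERSION = {
    "zp": "(zp)",
    "abs": "(abs)",
    "zp,X": "(zp,X)",    # pre-indexed: index first, then indirect
    "zp,Y": "(zp),Y",    # post-indexed: indirect first, then index
    "abs,X": "(abs,X)",
    "abs,Y": "(abs),Y",
}


def apply_ind(mode):
    """Return the mode IND selects; unlisted modes are left unchanged."""
    return IND_CONVERSION.get(mode, mode)


def ea_pre_indexed(mem, zp, x):
    """(zp,X): the pointer is fetched from zero page location zp + X."""
    ptr = (zp + x) & 0xFF
    return mem[ptr] | (mem[(ptr + 1) & 0xFF] << 8)


def ea_post_indexed(mem, zp, y):
    """(zp),Y: the pointer is fetched from zp, then Y is added to it."""
    base = mem[zp & 0xFF] | (mem[(zp + 1) & 0xFF] << 8)
    return (base + y) & 0xFFFF
```

The asymmetry between X and Y follows the existing 6502/65C02 convention, so code written with IND reads the same way as code that uses the native indirect modes.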

Three other escape/prefix codes have been implemented: OAX, OAY, and OSY. The OAX and OAY escape/prefix codes are mutually exclusive, but may be combined with IND (and SIZ, discussed below). OAX and OAY exchange A with X, or A with Y, respectively. As such, the X and Y index registers can be used as accumulators, and A becomes an index register. The OSY prefix/escape code makes Y function as a stack pointer in zero page. Unlike OAX and OAY, OSY does not perform a complete exchange of the registers and their functions. Thus, S does not replace Y as an index register. OSY is mutually exclusive with OAY, but may be combined with OAX.

These three prefix/escape codes have been implemented and significantly enhance the basic instruction set of the M65C02A. When combined with IND and the stack relative addressing modes, these prefix/escape codes can be used to implement Forth VMs and other HLLs much more efficiently. Being able to treat X and Y as accumulators makes it much easier to implement HLL addressing of data structures. Furthermore, although S does not become an index register when OSY is applied to an instruction, it does allow the use of all Y-specific instructions with S. Thus, although OSY does not give S the functionality of an accumulator, it does convert the STY/LDY/CPY/INY/DEY/PHY/PLY/TYA/TAY instructions to use S instead of Y: STS/LDS/CPS/INS/DES/PHS/PLS/TSA/TAS.
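The mnemonic substitution that OSY performs can be written as a small table. This Python sketch covers only the nine Y-specific instructions named above. The table and function names are hypothetical, and leaving other mnemonics untouched is a simplification rather than a statement about the core.

```python
Y_OPS = "STY LDY CPY INY DEY PHY PLY TYA TAY".split()
S_OPS = "STS LDS CPS INS DES PHS PLS TSA TAS".split()

# Pair each Y-specific mnemonic with its S counterpart, in the order listed.
OSY_MNEMONICS = dict(zip(Y_OPS, S_OPS))


def with_osy(mnemonic):
    """Return the effective operation when OSY precedes the instruction."""
    return OSY_MNEMONICS.get(mnemonic, mnemonic)
```

For instance, with_osy("LDY") gives "LDS" and with_osy("TAY") gives "TAS".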

Not yet implemented are the SIZ and ISZ prefix/escape codes. The SIZ prefix code will extend an operation from 8 bits to 16 bits. The ISZ prefix provides a single opcode for combining IND and SIZ.

To maintain compatibility with standard 6502/65C02 assemblers and compilers, the M65C02A prefix/escape codes have a finite lifetime: 1 instruction cycle. This is in contrast to the '816 whose m and x bits in the native mode P register maintain the accumulator and X/Y index register sizes as programmed until explicitly changed by the programmer. The M65C02A prefix codes are taken from the unused opcodes of the W65C02S, and therefore operate as single cycle NOPs when applied to instructions that don't support the requested operations. One important characteristic is that the prefix/escape codes are not interruptible.
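The one-instruction lifetime of the prefixes can be illustrated with a toy decoder that works on mnemonics rather than opcodes. This Python sketch is illustrative only: the names are hypothetical, the real opcode assignments are not modelled, and how the core reacts to a mutually exclusive pair is not stated in this log, so the sketch merely reports such a pair.

```python
PREFIXES = {"IND", "OAX", "OAY", "OSY"}

# Pairs described in this log as mutually exclusive.
EXCLUSIVE = [{"OAX", "OAY"}, {"OSY", "OAY"}]


def is_valid(active):
    """True if no mutually exclusive pair is present in the active set."""
    return not any(pair <= active for pair in EXCLUSIVE)


def decode(stream):
    """Attach pending prefixes to the next instruction, then clear them."""
    active = set()
    decoded = []
    for mnemonic in stream:
        if mnemonic in PREFIXES:
            active.add(mnemonic)  # prefixes accumulate until an instruction
        else:
            decoded.append((frozenset(active), mnemonic))
            active.clear()        # lifetime ends with this instruction
    return decoded


program = decode(["IND", "OAX", "LDA", "LDA"])
```

Only the first LDA sees IND and OAX; the second LDA decodes as a plain 65C02 instruction.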

A number of additional instructions are planned for the remaining 12 free opcodes. There are five instructions defined to support implementation of Indirect and Direct Threaded Code (ITC/DTC) Forth VMs: (1) ENT - Enter, (2) NXT - Next, (3) PHI - Push IP, (4) PLI - Pull IP, and (5) INI - Increment IP. NXT provides direct support for a DTC Forth VM, and IND NXT provides support for an ITC Forth VM.
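The relationship between NXT and IND NXT mirrors the usual difference between direct and indirect threading. The Python sketch below shows the conventional Forth inner interpreter step for both cases. It is based on the general DTC/ITC definitions, not on the M65C02A microprogram, and the function names and return values are illustrative.

```python
def read_word(mem, addr):
    """Read a little-endian 16-bit cell."""
    return mem[addr & 0xFFFF] | (mem[(addr + 1) & 0xFFFF] << 8)


def nxt(mem, ip, indirect=False):
    """One NEXT step: returns (new_ip, w, new_pc).

    DTC: the cell at IP holds the address of machine code to execute.
    ITC: the cell at IP holds the address of a code field, which in turn
         holds the address of the machine code (one extra indirection).
    """
    w = read_word(mem, ip)
    new_ip = (ip + 2) & 0xFFFF
    new_pc = read_word(mem, w) if indirect else w
    return new_ip, w, new_pc


mem = bytearray(0x10000)
mem[0x4000] = 0x00  # thread cell at $4000 holds $5000
mem[0x4001] = 0x50
mem[0x5000] = 0x00  # code field at $5000 holds $6000 (used by ITC only)
mem[0x5001] = 0x60
```

With IP = $4000, the direct threaded step continues execution at $5000, while the indirect threaded step continues at $6000.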

Three other opcodes are reserved for implementing and manipulating three level internal register stacks for A, X and/or Y: (1) DUP, (2) SWP, and (3) ROT. Of special note is that LDA/STA (and LDX/STX/LDY/STY) do not automatically push and pop these internal register stacks. This approach keeps the behavior of these registers consistent with what a programmer may expect from a standard 6502/65C02. Therefore, a DUP instruction must precede an LDA/LDX/LDY instruction in order to push a value onto the register stack. Similarly, a ROT instruction must follow an STA/STX/STY instruction in order to pop a value from the register stack. Otherwise, LDx and STx only affect the top location of the register stack.
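One way to picture the planned register stack is as a three-element list whose first entry is the visible register. The Python sketch below is a guess at the semantics, since these opcodes are only reserved at this point. In particular, it assumes that DUP discards the bottom entry and that ROT moves the old top to the bottom. The class and method names are hypothetical.

```python
class RegStack:
    """Three-level register stack; r[0] is the programmer-visible register."""

    def __init__(self):
        self.r = [0, 0, 0]

    def load(self, value):
        """LDA/LDX/LDY: overwrite the top entry only."""
        self.r[0] = value & 0xFF

    def store(self):
        """STA/STX/STY: read the top entry without popping it."""
        return self.r[0]

    def dup(self):
        """DUP: copy the top entry down; the bottom entry is lost (assumed)."""
        self.r = [self.r[0], self.r[0], self.r[1]]

    def swp(self):
        """SWP: exchange the top two entries."""
        self.r[0], self.r[1] = self.r[1], self.r[0]

    def rot(self):
        """ROT: bring the second entry to the top; old top goes to the bottom."""
        self.r = [self.r[1], self.r[2], self.r[0]]


a = RegStack()
a.load(1)  # LDA #1
a.dup()    # DUP, then ...
a.load(2)  # ... LDA #2 acts as a push: the stack now holds 2 above 1
```

A later STA followed by ROT then behaves like a pop: the value 2 is stored and 1 becomes the visible accumulator again.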

These register stacks are intended to provide some additional internal registers in a manner compatible with the 6502/65C02 architecture. In accumulator based architectures such as those of the 6502/65C02, 6800/68HC11, and 8080/8085/Z80 microprocessors, the accumulator becomes a choke point and limits performance. With only X and Y on board, the 6502/65C02 requires more loading and storing of intermediate values in external memory when compared to the 8080/8085/Z80 processors. Thus, equipping the M65C02A A, X, and/or Y registers with a simple three level stack will allow stack-based processing techniques to be used for arithmetic and pointer calculations without moving intermediate values to/from external memory. Finally, if not explicitly used, the register stacks will be invisible to the programmer.

The remaining four free opcodes are reserved for future use.