Greatest Hits of the MOS 6502

Over the years, I've developed a strong appreciation for the MOS 6502, mostly through my work writing an emulator (something I feel everyone should do). In researching this chip, I've stumbled across countless creative tricks and workarounds both in the design of the chip and the vast library of software written for it.

Link to this section Zero Page

This one's more of an interesting design choice: the 6502 has an absolutely tiny register file. The only general-purpose register is "A", the accumulator. There are two additional registers "X" and "Y", but they are only used for indexing memory.

However, the designers of the 6502 realized that this was extremely limiting to programs and would require lots of reads and writes into memory to handle temporary values computed in a program. Using the "Absolute" addressing mode, this requires two bytes for the memory address and around 4 cycles depending on the instruction. The "Zero Page" addressing mode is a hack: when a Zero Page instruction is used, only the lower byte of the memory address is stored, and the processor can skip a cycle! This both reduces the program size and speeds up program execution. As a result, the first 256 bytes of memory are effectively "registers" of sorts!

Link to this section The "CPS pin"

Like many architectures, the 6502 has a status register with flags like "zero", "overflow", and "negative" representing the last operation completed with the ALU. Unlike these architectures, though, the 6502 has a fun quirk: the chip has a physical "SO" ("set overflow") pin which, when the signal on it goes from high to low, sets the overflow flag in the status register. This flag can be used by subsequent branch instructions.

The usefulness of this pin is questionable, and it rarely saw usage in practice. It was apparently used in tight loops for routines interfacing with hardware, since a transition on the pin could cause code to jump out of the loop. The pin was originally called the CPS pin, short for "Chuck Peddle Special" (named after the main designer of the 6502, Chuck Peddle).

Link to this section The BRK Instruction

On the 6502, opcode 00 maps to the "BRK" instruction. This is essentially used to trigger an interrupt from within the program.

In the 6502, the location in memory that code is executed from after a reset or interrupt is controlled by a table of "vectors" located at the very end of the memory map.

SignalLocation
NMIFFFA-FFFB
RESETFFFC-FFFD
IRQ/BRKFFFE-FFFF

NMI and IRQ are for the two pins on the microprocessor of the same name ("non-maskable interrupt" and "interrupt request" respectively). RESET is for when the microcontroller boots up or is reset. BRK shares its vector with IRQ, meaning the code routine which services maskable interrupts is the same routine which services BRK instructions. So, when you run BRK, your code immediately jumps to the interrupt routine. Once the interrupt routine returns with the RTI instruction ("Return From Interrupt"), your code keeps executing.

BRK was intended for use in debugging (as a "breakpoint" of sorts). In some systems, this was what happened: the Apple II would launch the program monitor which would allow you to debug your program. However, this instruction has a weird quirk: despite BRK being a 1-byte instruction, when the interrupt routine returned, the program counter would jump forward by 2 bytes instead of 1 -- leaving a single byte after the BRK instruction unused. Clever programmers could exploit this by looking at the top of the stack and reading this value, using it to pass state between the code calling BRK and the interrupt service routine.

On the BBC Micro, BRK was used for error handling. To throw an error, the BRK instruction would be called with the error code stored in the byte immediately following it. Since returning from an error handler was not supported, an error string could be stored immediately after the error code byte additionally.

On the Apple III, Apple's Sophisticated Operating System used BRK for system calls. The system call opcode would be placed immediately after the BRK instruction, then a 2-byte pointer to additional parameters. This did mean the SOS routine responsible for system calls needed to increment the return address by 2.

Link to this section Wozniak's SWEET16

The 6502 is an 8-bit processor, but addresses and other values are often 16 bits. While it's of course possible to manipulate these larger values using a combination of 8-bit operations, the resulting code is often quite verbose. Steve Wozniak, known for his obsession with going great lengths to save a small amount of resources, did not want to deal with this, so he created an interpreted bytecode language called SWEET16. A program could call into a subroutine, and the code after the subroutine call would be interpreted as SWEET16 instructions until the SWEET16 "Return" instruction was executed, at which point the following code is executed normally.

SWEET16 is an interesting tradeoff for code size versus performance. While it massively reduces the size of code, interpreted SWEET16 code is about one tenth the speed of native 6502 code. While optimizing for code size was usually preferred in the past, modern processors have improved slower than modern memory has, so this tradeoff is less relevant now than it once was.

SWEET16 supported several "nonregister operations". These included branch operations, which used a 1-byte signed offset as the branch target.

OpcodeMnemonicFunction
00RTNExit SWEET16, return to native 6502 code
01BR eaBranch always (i.e., jump)
02BNC eaBranch if No Carry
03BC eaBranch if Carry
04BP eaBranch if Plus
05BM eaBranch if Minus
06BZ eaBranch if Zero
07BNZ eaBranch if Non-Zero
08BM1 eaBranch if Minus 1
09BNM1 eaBranch if Not Minus 1
0ABKBreak
0BRSReturn from Subroutine
0CBS eaBranch to Subroutine

It also featured 16 "registers", stored in the zero page. These were specified using the second nibble of the opcode, allowing these instructions to take up only a single byte. Notably, some instructions supported an indirect addressing mode. Many operate on register R0, also called the accumulator.

OpcodeMnemonicFunction
1nSET Rn $xxxxSet a register to an immediate value
2nLD RnTransfer the value in the specified register to R0
3nST RnTransfer the value in R0 to the specified register
4nLD @RnLoad the low-order byte of R0 from the memory address stored in Rn, then increment the value in Rn
5nST @RnStore the low-order byte of R0 into the memory address stored in Rn, then increment the value in Rn
6nLDD @RnLoad a 16-bit little-endian value from the memory location stored in Rn into R0, then increment the value in Rn by 2
7nSTD @RnStore a 16-bit little-endian value from R0 into the memory location stored in Rn, then increment the value in Rn by 2
8nPOP @RnDecrement the value in Rn, then load the low-order byte of R0 from the memory address stored in Rn
9nSTP @RnDecrement the value in Rn, then store the low-order byte of R0 to the memory address stored in Rn
AnADD RnAdd the contents of Rn to R0, and store the result in R0
BnSUB RnSubtract the contents of Rn from R0, and store the result in R0
CnPOPD @RnDecrement the value in Rn by 2, then load the 16-bit little-endian value from R0 into the memory address stored in Rn
DnCPR RnSubtract the value in Rn from R0, and set the status flags for branching
EnINR RnIncrement the contents of Rn
FnDCR RnDecrement the contents of Rn

SWEET16 stands out to me as a shockingly complete interpreted language implemented in a shockingly short amount of code. It's something I otherwise wouldn't have been expected to be feasible in this context.

Link to this section Illegal instructions and their uses

The 6502 has 151 valid opcodes. However, there are 256 possible values that the first byte of an instruction can take. What happens if you try one of the invalid ones?

  • 12 of them just lock up the CPU and prevent it from executing instructions further
  • 27 of them are no-ops, with varying instruction lengths
  • 1 has non-deterministic behavior ("XAA", which I won't even try to describe here)
  • The rest are perfectly usable, but do some rather strange operations...
MnemonicDescription
ANCAND memory with accumulator then move negative flag to carry flag
ARRAND memory with accumulator then rotate right
ASRAND memory with accumulator then logical shift right
DCPDecrement memory by 1 then compare with accumulator
ISCIncrement memory by 1 then subtract from accumulator, store result in accumulator
LASAND memory with stack pointer, store result in accumulator, X register, and stack pointer
LAXLoad both accumulator and index register from memory
RLARotate memory left then AND with accumulator
RRARotate memory right then add to accumulator
SAXStore accumulator AND X register into memory
SBXSubtract memory from accumulator AND X register, store in X
SHAStore accumulator AND index register AND a value dependent on the addressing mode into memory
SHSAND accumulator with X register, store in stack pointer, then store stack pointer AND high byte of memory address into memory
SHXStore index register X AND upper byte of address plus 1 into memory
SHYStore index register Y AND upper byte of address plus 1 into memory
SLOShift memory left, then store memory OR accumulator into accumulator
SREShift memory right, then store memory XOR accumulator into accumulator

The technical reasons for why this behavior happens are described well in this blog post and this document.

A few of these instructions were used in a few NES games (mostly 2-byte NOPs). These also saw use on various BBC Micro games. Games used these for both copy protection (as an attempt to validate the hardware?) and for performance. My guess is that more serious software doesn't bother with these since performance is less of a concern than correctness.

Later revisions of the 6502, such as the W65C02S, replaced the opcodes in the "F" column with additional bit manipulation opcodes.

Link to this section The Ricoh 2A03/2A07 and the NES

The 6502 was incredibly popular, and inspired a few clones and derivatives. One such clone was the Ricoh 2A03/2A07 chip used in the Nintendo Entertainment System. Apparently, Nintendo was not a big believer in copyright law back in this era. Oh, how the times have changed...

To see how blatant this was, first look at this image of the 6502 chip (courtesy of Visual 6502)...

Now, look at this image of the Ricoh 2A03 (also from Visual 6502)

Does that little bit in the lower right corner look familiar?

Interestingly, the Binary Coded Decimal (BCD) functionality of the 6502 were fused off before the chip was incorporated into the Ricoh clone. This feature allowed the 6502 to perform arithmetic operations on numbers stored as decimal, with 4 bits representing a decimal digit. It was covered under a patent held by MOS, which is probably why Ricoh disabled it.

The other parts of this chip are for things like the NES's sound generator, known as the "APU" and definitely worthy of its own post someday. It's also why there are two variants of this chip: one for NTSC and one for PAL.

Link to this section Conclusion

The 6502 is one of the final bastions of an age where microprocessors could be contained in a pile of parchment in a dude named Chuck's desk. It was one of the first mainframes in a chip, and powered countless number of people's first experiences in computing. The number of programmers today who got their start on the 6502 alone makes the chip worldchanging in it of itself.