Over the years, I've developed a strong appreciation for the MOS 6502, mostly through my work writing an emulator (something I feel everyone should do). In researching this chip, I've stumbled across countless creative tricks and workarounds, both in the design of the chip and in the vast library of software written for it.
Zero Page
This one's more of an interesting design choice: the 6502 has an absolutely tiny register file. The only general-purpose register is "A", the accumulator. There are two additional registers, "X" and "Y", but they are primarily used for indexing memory.
However, the designers of the 6502 realized that this was extremely limiting and would force programs to do lots of reads and writes to memory just to handle temporary values. With the "Absolute" addressing mode, each such access needs two bytes for the memory address and takes around 4 cycles, depending on the instruction. The "Zero Page" addressing mode is a hack: when a Zero Page instruction is used, only the lower byte of the memory address is stored in the instruction (the upper byte is implied to be zero), and the processor can skip a cycle! This both reduces the program size and speeds up program execution. As a result, the first 256 bytes of memory are effectively "registers" of sorts!
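To make the saving concrete, here is a minimal Python sketch comparing the two encodings of `LDA`. The opcode values (`0xAD` for Absolute, `0xA5` for Zero Page) and the cycle counts (4 and 3) are the documented ones for `LDA`. The helper name `encode_lda` is hypothetical and only exists for illustration.

```python
# Documented LDA encodings: (opcode, cycle count)
LDA_ABSOLUTE = (0xAD, 4)   # LDA $nnnn: opcode + 2 address bytes
LDA_ZERO_PAGE = (0xA5, 3)  # LDA $nn:   opcode + 1 address byte


def encode_lda(address):
    """Return (machine code bytes, cycles) for LDA from `address`.

    Hypothetical helper: picks Zero Page whenever the address fits in
    one byte, as an assembler typically would.
    """
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    if address <= 0xFF:
        opcode, cycles = LDA_ZERO_PAGE
        return bytes([opcode, address]), cycles
    opcode, cycles = LDA_ABSOLUTE
    # Absolute addresses are stored little-endian: low byte first.
    return bytes([opcode, address & 0xFF, address >> 8]), cycles
```

Loading from `$0034` takes two bytes and three cycles, while loading from `$1234` takes three bytes and four cycles.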
The "CPS pin"
Like many architectures, the 6502 has a status register with flags like "zero", "overflow", and "negative" that reflect the last operation completed by the ALU. Unlike most of those architectures, though, the 6502 has a fun quirk: the chip has a physical "SO" ("set overflow") pin which, when the signal on it goes from high to low, sets the overflow flag in the status register. This flag can then be tested by subsequent branch instructions.
The usefulness of this pin is questionable, and it rarely saw usage in practice. It was apparently used in tight loops for routines interfacing with hardware, since a transition on the pin could cause code to jump out of the loop. The pin was originally called the CPS pin, short for "Chuck Peddle Special" (named after the main designer of the 6502, Chuck Peddle).
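The tight-loop trick can be sketched in Python from an emulator's point of view. This is a minimal model under stated assumptions: the class `SoPin` and its methods are hypothetical, and the pin is assumed to idle high. The overflow flag living in bit 6 of the status register and `BVC` branching while it is clear are standard 6502 behavior.

```python
OVERFLOW_FLAG = 0x40  # bit 6 of the status register


class SoPin:
    """Hypothetical emulator model of the SO pin (illustration only)."""

    def __init__(self):
        self.level = True   # assumption: the pin idles high
        self.status = 0x00  # processor status register

    def drive(self, new_level):
        # Only a high-to-low transition sets the overflow flag.
        if self.level and not new_level:
            self.status |= OVERFLOW_FLAG
        self.level = new_level

    def bvc_taken(self):
        # BVC branches while overflow is clear, so a loop that branches
        # to itself with BVC spins until the SO pin falls.
        return not (self.status & OVERFLOW_FLAG)
```

Hardware that pulls SO low therefore releases a loop waiting on `BVC` without the program reading any I/O register.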
The BRK Instruction
On the 6502, opcode 00 maps to the "BRK" instruction. This is essentially used to trigger an interrupt from within the program.
In the 6502, the location in memory that code is executed from after a reset or interrupt is controlled by a table of "vectors" located at the very end of the memory map.
| Signal | Location |
|---|---|
| NMI | FFFA-FFFB |
| RESET | FFFC-FFFD |
| IRQ/BRK | FFFE-FFFF |
NMI and IRQ are for the two pins on the microprocessor of the same name ("non-maskable interrupt" and "interrupt request" respectively). RESET is for when the microprocessor boots up or is reset. BRK shares its vector with IRQ, meaning the routine which services maskable interrupts is the same routine which services BRK instructions. So, when you run BRK, execution immediately jumps to the interrupt routine. Once the interrupt routine returns with the RTI instruction ("Return From Interrupt"), your code keeps executing.
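As a rough illustration, an emulator might resolve these vectors as follows. The function name `read_vector` and the flat 64 KiB `memory` buffer are assumptions made for this sketch. The vector locations come from the table above, and each vector is stored low byte first, like every other 16-bit address on the 6502.

```python
VECTORS = {"NMI": 0xFFFA, "RESET": 0xFFFC, "IRQ/BRK": 0xFFFE}


def read_vector(memory, signal):
    """Return the 16-bit handler address stored for `signal`.

    `memory` is assumed to be a 64 KiB bytes-like object
    (hypothetical emulator state).
    """
    location = VECTORS[signal]
    # Vectors are little-endian: low byte at the lower address.
    return memory[location] | (memory[location + 1] << 8)
```

On reset, the program counter would be loaded with `read_vector(memory, "RESET")`.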
BRK was intended for use in debugging (as a "breakpoint" of sorts). In some systems, this was what happened: the Apple II would launch the program monitor, which would allow you to debug your program. However, this instruction has a weird quirk: despite BRK being a 1-byte instruction, the return address it pushes points 2 bytes past the BRK opcode instead of 1, so when the interrupt routine returned, execution would skip a single byte after the BRK instruction, leaving it unused. Clever programmers could exploit this by reading the return address from the top of the stack and fetching the byte just before it, using that byte to pass state between the code calling BRK and the interrupt service routine.
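Here is a minimal Python sketch of that trick from an emulator's perspective. The `Cpu` class and its method names are hypothetical. The stack living in page `$01`, the push order (return address high byte, low byte, then status with the B flag set), and the `$FFFE` vector are standard 6502 behavior.

```python
BREAK_FLAG = 0x10      # B flag, set only in the pushed copy of the status
UNUSED_FLAG = 0x20     # bit 5 is always 1 when the status is pushed
INTERRUPT_FLAG = 0x04  # I flag, set so further IRQs are masked


class Cpu:
    """Hypothetical minimal CPU state; only what BRK touches."""

    def __init__(self, memory):
        self.memory = memory
        self.pc = 0
        self.sp = 0xFF
        self.status = 0

    def push(self, value):
        self.memory[0x0100 + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def brk(self):
        # self.pc is assumed to point at the BRK opcode itself.
        return_address = (self.pc + 2) & 0xFFFF  # skips the byte after BRK
        self.push(return_address >> 8)
        self.push(return_address & 0xFF)
        self.push(self.status | BREAK_FLAG | UNUSED_FLAG)
        self.status |= INTERRUPT_FLAG
        self.pc = self.memory[0xFFFE] | (self.memory[0xFFFF] << 8)

    def brk_signature(self):
        # Called from the handler: the return address sits just above
        # the pushed status byte on the stack.
        low = self.memory[0x0100 + ((self.sp + 2) & 0xFF)]
        high = self.memory[0x0100 + ((self.sp + 3) & 0xFF)]
        return self.memory[(((high << 8) | low) - 1) & 0xFFFF]
```

A handler that calls `brk_signature()` recovers whatever byte the programmer placed directly after the BRK opcode.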
On the BBC Micro, BRK was used for error handling. To throw an error, a program would execute the BRK instruction with the error code stored in the byte immediately following it. Since returning from an error handler was not supported, an error string could additionally be stored immediately after the error code byte.
On the Apple III, Apple's Sophisticated Operating System used BRK for system calls. The system call opcode would be placed immediately after the BRK instruction, then a 2-byte pointer to additional parameters. This did mean the SOS routine responsible for system calls needed to increment the return address by 2.
Wozniak's SWEET16
The 6502 is an 8-bit processor, but addresses and other values are often 16 bits. While it's of course possible to manipulate these larger values using a combination of 8-bit operations, the resulting code is often quite verbose. Steve Wozniak, known for his obsession with going to great lengths to save a small amount of resources, did not want to deal with this, so he created an interpreted bytecode language called SWEET16. A program could call into a subroutine, and the bytes after the subroutine call would be interpreted as SWEET16 instructions until the SWEET16 "Return" instruction was executed, at which point the code that follows is executed natively again.
SWEET16 is an interesting tradeoff between code size and performance. While it massively reduces the size of code, interpreted SWEET16 code runs at about one tenth the speed of native 6502 code. Optimizing for code size was usually the priority back then, but memory capacity has since grown faster than processor speed, so this tradeoff is less relevant now than it once was.
SWEET16 supported several "nonregister operations". These included branch operations, which used a 1-byte signed offset as the branch target.
| Opcode | Mnemonic | Function |
|---|---|---|
| 00 | RTN | Exit SWEET16, return to native 6502 code |
| 01 | BR ea | Branch always (i.e., jump) |
| 02 | BNC ea | Branch if No Carry |
| 03 | BC ea | Branch if Carry |
| 04 | BP ea | Branch if Plus |
| 05 | BM ea | Branch if Minus |
| 06 | BZ ea | Branch if Zero |
| 07 | BNZ ea | Branch if Non-Zero |
| 08 | BM1 ea | Branch if Minus 1 |
| 09 | BNM1 ea | Branch if Not Minus 1 |
| 0A | BK | Break |
| 0B | RS | Return from Subroutine |
| 0C | BS ea | Branch to Subroutine |
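The branch target ("ea") calculation can be sketched in a few lines of Python. The function name is hypothetical, and the choice to measure the offset from the instruction following the 2-byte branch follows Wozniak's published description of SWEET16 rather than anything stated above, so treat it as an assumption of this sketch.

```python
def sweet16_branch_target(branch_address, offset_byte):
    """Effective address of a SWEET16 branch located at `branch_address`.

    `offset_byte` is the raw second byte of the instruction (0-255).
    """
    # Reinterpret the byte as a signed two's-complement value (-128..127).
    displacement = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    # The displacement is relative to the instruction after the branch.
    return (branch_address + 2 + displacement) & 0xFFFF
```

An offset byte of `$FE` (that is, -2) makes a branch jump back to itself, just like a native 6502 relative branch.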
It also featured 16 "registers", each 16 bits wide and stored in the zero page. The register was specified using the low nibble (the second hex digit) of the opcode, allowing these instructions to take up only a single byte. Notably, some instructions supported an indirect addressing mode. Many operate on register R0, also called the accumulator.
| Opcode | Mnemonic | Function |
|---|---|---|
| 1n | SET Rn $xxxx | Set a register to an immediate value |
| 2n | LD Rn | Transfer the value in the specified register to R0 |
| 3n | ST Rn | Transfer the value in R0 to the specified register |
| 4n | LD @Rn | Load the low-order byte of R0 from the memory address stored in Rn, then increment the value in Rn |
| 5n | ST @Rn | Store the low-order byte of R0 into the memory address stored in Rn, then increment the value in Rn |
| 6n | LDD @Rn | Load a 16-bit little-endian value from the memory location stored in Rn into R0, then increment the value in Rn by 2 |
| 7n | STD @Rn | Store a 16-bit little-endian value from R0 into the memory location stored in Rn, then increment the value in Rn by 2 |
| 8n | POP @Rn | Decrement the value in Rn, then load the low-order byte of R0 from the memory address stored in Rn |
| 9n | STP @Rn | Decrement the value in Rn, then store the low-order byte of R0 to the memory address stored in Rn |
| An | ADD Rn | Add the contents of Rn to R0, and store the result in R0 |
| Bn | SUB Rn | Subtract the contents of Rn from R0, and store the result in R0 |
| Cn | POPD @Rn | Decrement the value in Rn by 2, then load the 16-bit little-endian value at the memory address stored in Rn into R0 |
| Dn | CPR Rn | Subtract the value in Rn from R0, and set the status flags for branching |
| En | INR Rn | Increment the contents of Rn |
| Fn | DCR Rn | Decrement the contents of Rn |
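To show how little machinery the nibble-based decoding needs, here is a toy interpreter in Python for a subset of the register operations above. It is a sketch rather than a port of Wozniak's code: the name `run_sweet16` is hypothetical, registers live in a Python list instead of the zero page, and status flags, branches, and the memory-indirect operations are left out. The one detail not stated above is that the `SET` immediate is stored low byte first, which is taken from Wozniak's published description.

```python
REGISTER_OPS = {
    0x1: "SET", 0x2: "LD", 0x3: "ST", 0x4: "LD @", 0x5: "ST @",
    0x6: "LDD @", 0x7: "STD @", 0x8: "POP @", 0x9: "STP @", 0xA: "ADD",
    0xB: "SUB", 0xC: "POPD @", 0xD: "CPR", 0xE: "INR", 0xF: "DCR",
}


def run_sweet16(code):
    """Interpret a subset of SWEET16 (SET, LD, ST, ADD, SUB, INR, DCR, RTN).

    Returns the 16 register values once RTN is reached.
    """
    registers = [0] * 16  # the real interpreter keeps these in the zero page
    pc = 0
    while True:
        opcode = code[pc]
        pc += 1
        if opcode == 0x00:  # RTN
            return registers
        # High nibble selects the operation, low nibble the register.
        op, n = REGISTER_OPS.get(opcode >> 4), opcode & 0x0F
        if op == "SET":
            # 16-bit immediate follows, low byte first.
            registers[n] = code[pc] | (code[pc + 1] << 8)
            pc += 2
        elif op == "LD":
            registers[0] = registers[n]
        elif op == "ST":
            registers[n] = registers[0]
        elif op == "ADD":
            registers[0] = (registers[0] + registers[n]) & 0xFFFF
        elif op == "SUB":
            registers[0] = (registers[0] - registers[n]) & 0xFFFF
        elif op == "INR":
            registers[n] = (registers[n] + 1) & 0xFFFF
        elif op == "DCR":
            registers[n] = (registers[n] - 1) & 0xFFFF
        else:
            raise NotImplementedError(f"opcode {opcode:02X} not in this sketch")
```

For example, the eleven bytes `11 34 12 12 01 00 21 A2 33 E3 00` set R1 to `$1234` and R2 to `$0001`, add them in R0, copy the sum to R3, and increment R3.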
SWEET16 stands out to me as a shockingly complete interpreted language implemented in a shockingly small amount of code. It's something I otherwise wouldn't have expected to be feasible in this context.
Illegal instructions and their uses
The 6502 has 151 valid opcodes. However, there are 256 possible values that the first byte of an instruction can take. What happens if you try one of the invalid ones?
- 12 of them just lock up the CPU and prevent it from executing any further instructions
- 27 of them are no-ops, with varying instruction lengths
- 1 has non-deterministic behavior ("XAA", which I won't even try to describe here)
- The rest are perfectly usable, but do some rather strange operations...
| Mnemonic | Description |
|---|---|
| ANC | AND memory with accumulator then move negative flag to carry flag |
| ARR | AND memory with accumulator then rotate right |
| ASR | AND memory with accumulator then logical shift right |
| DCP | Decrement memory by 1 then compare with accumulator |
| ISC | Increment memory by 1 then subtract from accumulator, store result in accumulator |
| LAS | AND memory with stack pointer, store result in accumulator, X register, and stack pointer |
| LAX | Load both accumulator and X register from memory |
| RLA | Rotate memory left then AND with accumulator |
| RRA | Rotate memory right then add to accumulator |
| SAX | Store accumulator AND X register into memory |
| SBX | Subtract memory from accumulator AND X register, store in X |
| SHA | Store accumulator AND X register AND a value dependent on the addressing mode into memory |
| SHS | AND accumulator with X register, store in stack pointer, then store stack pointer AND high byte of memory address into memory |
| SHX | Store index register X AND upper byte of address plus 1 into memory |
| SHY | Store index register Y AND upper byte of address plus 1 into memory |
| SLO | Shift memory left, then store memory OR accumulator into accumulator |
| SRE | Shift memory right, then store memory XOR accumulator into accumulator |
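Most of the usable ones behave like two legal instructions fused together. Here is a hedged Python sketch of three of them (LAX, SAX, and DCP) as an emulator might implement them. The `state` dictionary layout and the function names are assumptions for illustration. The flag behavior shown follows the commonly documented behavior of these opcodes and of the legal instructions they combine.

```python
CARRY, ZERO, NEGATIVE = 0x01, 0x02, 0x80


def set_nz(status, value):
    """Update the zero and negative flags from an 8-bit result."""
    status &= ~(ZERO | NEGATIVE) & 0xFF
    if value == 0:
        status |= ZERO
    return status | (value & NEGATIVE)


def lax(state, memory, address):
    # LAX = LDA + LDX: one read feeds both registers.
    state["a"] = state["x"] = memory[address]
    state["status"] = set_nz(state["status"], state["a"])


def sax(state, memory, address):
    # SAX stores A AND X without touching any flags.
    memory[address] = state["a"] & state["x"]


def dcp(state, memory, address):
    # DCP = DEC + CMP: decrement memory, then compare it with A.
    memory[address] = (memory[address] - 1) & 0xFF
    difference = (state["a"] - memory[address]) & 0xFF
    status = set_nz(state["status"] & ~CARRY & 0xFF, difference)
    if state["a"] >= memory[address]:
        status |= CARRY  # carry set means "no borrow"
    state["status"] = status
```

This is why they were attractive for performance: DCP, for instance, does the work of a decrement and a compare with a single instruction fetch.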
The technical reasons for why this behavior happens are described well in this blog post and this document.
Some of these instructions were used in a handful of NES games (mostly 2-byte NOPs). They also saw use in various BBC Micro games. Games used them both for copy protection (as an attempt to validate the hardware?) and for performance. My guess is that more serious software didn't bother with them, since correctness mattered more there than performance.
Later revisions of the 6502, such as the W65C02S, replaced the opcodes in the "F" column with additional bit manipulation opcodes.
The Ricoh 2A03/2A07 and the NES
The 6502 was incredibly popular, and inspired a few clones and derivatives. One such clone was the Ricoh 2A03/2A07 chip used in the Nintendo Entertainment System. Apparently, Nintendo was not a big believer in copyright law back in this era. Oh, how the times have changed...
To see how blatant this was, first look at this image of the 6502 chip (courtesy of Visual 6502)...

Now, look at this image of the Ricoh 2A03 (also from Visual 6502)

Does that little bit in the lower right corner look familiar?
Interestingly, the Binary Coded Decimal (BCD) functionality of the 6502 was fused off in the copy of the core used in the Ricoh chip. This feature allowed the 6502 to perform arithmetic directly on numbers stored as decimal, with each group of 4 bits representing one decimal digit. It was covered under a patent held by MOS, which is probably why Ricoh disabled it.
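For a feel of what decimal mode buys you, here is a small Python sketch of packed-BCD addition. The function `bcd_add` is hypothetical and only reproduces the arithmetic result for valid BCD operands. It does not attempt to model the 6502's exact flag behavior in decimal mode.

```python
def bcd_add(a, b, carry_in=0):
    """Add two packed-BCD bytes (two decimal digits each).

    Assumes both inputs are valid BCD; returns (result byte, carry out).
    """
    low = (a & 0x0F) + (b & 0x0F) + carry_in
    if low > 9:
        low += 6  # skip the six unused codes A-F so the digit wraps at ten
    high = (a >> 4) + (b >> 4) + (1 if low > 0x0F else 0)
    if high > 9:
        high += 6
    result = ((high << 4) | (low & 0x0F)) & 0xFF
    return result, 1 if high > 0x0F else 0
```

With this, `0x19` plus `0x28` gives `0x47`, which reads as 19 + 28 = 47 when viewed in hex, whereas plain binary addition would give `0x41`.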
The other parts of this chip are for things like the NES's sound generator, known as the "APU" and definitely worthy of its own post someday. It's also why there are two variants of this chip: one for NTSC and one for PAL.
Conclusion
The 6502 is one of the final bastions of an age when a microprocessor could be contained in a pile of parchment in the desk of a dude named Chuck. It was one of the first mainframes in a chip, and powered countless people's first experiences in computing. The number of programmers today who got their start on the 6502 alone makes the chip world-changing in and of itself.
