Encoding
This section describes the binary encoding of each of the simulator's instructions. Although the instruction set is based on that of the Intel 8088, the encoding has been simplified for practical and educational purposes.
Acronyms and abbreviations
Throughout the encoding, the following abbreviations are used:
w: indicates the size of the operands.wSize 08 bits 116 bits rrrorRRR: reference registers, and depend onw.rrrw=0w=1000ALAX001CLCX010DLDX011BLBX100AHSP101CH— 110DH— 111BH— data refers to the byte/word of immediate data. For instructions where
w=0, data-high is ignored.disp refers to the word of a offset/displacement (always in 2's complement).
addr refers to the word of an address.
xxx-low refers to the least significant part (LSB) of a word or a byte.
xxx-high refers to the most significant part (MSB) of a word.
ALU Binary Instructions
| Instruction | Opcode |
|---|---|
MOV | 100 0000 w |
AND | 100 0001 w |
OR | 100 0010 w |
XOR | 100 0011 w |
ADD | 100 0100 w |
ADC | 100 0101 w |
SUB | 100 0110 w |
SBB | 100 0111 w |
TEST | 101 0001 w |
CMP | 101 0110 w |
These instructions receive two operands and support various addressing modes. This information is encoded in the d bit and the second byte of the instruction according to the following table:
| Destination | Source | Second byte | Following bytes |
|---|---|---|---|
| Register | Register | 00RRRrrr | — |
| Register | Memory (direct) | 01000rrr | addr-low, addr-high |
| Register | Memory (indirect) | 01010rrr | — |
| Register | Memory (indirect with offset) | 01100rrr | disp-low, disp-high |
| Register | Immediate | 01001rrr | data-low, data-high |
| Memory (direct) | Register | 11000rrr | addr-low, addr-high |
| Memory (indirect) | Register | 11010rrr | — |
| Memory (indirect with offset) | Register | 11100rrr | disp-low, disp-high |
| Memory (direct) | Immediate | 11001000 | addr-low, addr-high, data-low, data-high |
| Memory (indirect) | Immediate | 11011000 | data-low, data-high |
| Memory (indirect with offset) | Immediate | 11101000 | disp-low, disp-high, data-low, data-high |
For instructions with a register as an operand, rrr encodes this register. In the case of register to register, RRR encodes the source register and rrr the destination register.
ALU Unary Instructions
| Instruction | Opcode |
|---|---|
NOT | 0100 000 w |
NEG | 0100 001 w |
INC | 0100 010 w |
DEC | 0100 011 w |
These instructions receive one operand and support various addressing modes. This information is encoded in the second byte of the instruction according to the following table:
| Destination | Second byte | Following bytes |
|---|---|---|
| Register | 00000rrr | — |
| Memory (direct) | 11000000 | addr-low, addr-high |
| Memory (indirect) | 11010000 | — |
| Memory (indirect with offset) | 11100000 | disp-low, disp-high |
I/O Instructions
| Instruction | Opcode |
|---|---|
IN | 0101 00 pw |
OUT | 0101 01 pw |
The p bit encodes
- if the port is fixed (
p=0), which will have to be provided in the next byte (maximum port: 255), - or if the port is variable (
p=1), in which case the value stored in theDXregister will be used as the port.
Stack Instructions
| Instruction | Opcode |
|---|---|
PUSH | 0110 0rrr |
POP | 0110 1rrr |
PUSHF | 0111 0000 |
POPF | 0111 1000 |
rrr always represents a 16-bit register.
Jump Instructions
| Instruction | Opcode |
|---|---|
JC | 0010 0000 |
JNC | 0010 0001 |
JZ | 0010 0010 |
JNZ | 0010 0011 |
JS | 0010 0100 |
JNS | 0010 0101 |
JO | 0010 0110 |
JNO | 0010 0111 |
JMP | 0011 0000 |
CALL | 0011 0001 |
RET | 0011 0011 |
After the opcode, these instructions (except RET) receive an absolute memory address (which occupies two bytes).
Interrupt Related Instructions
| Instruction | Opcode |
|---|---|
CLI | 0001 1000 |
STI | 0001 1001 |
INT | 0001 1010 |
IRET | 0001 1011 |
After the opcode, INT receives the instruction number (occupies one byte).
Misc Instructions
| Instruction | Opcode |
|---|---|
NOP | 0001 0000 |
HLT | 0001 0001 |