Microprocessor and Interfacing
8086 Architecture · Assembly Programming · Peripheral Interfacing
Number Systems and Data Representation
The 8086 sees the world as bits. Four representations every assembly programmer must master:
| Base | Digits | Notation |
|---|---|---|
| Binary | 0, 1 | 1011B or 0b1011 |
| Octal | 0–7 | 13O or 013 |
| Decimal | 0–9 | 11D (default) |
| Hexadecimal | 0–9, A–F | 0BH or 0x0B |
One hex digit = exactly 4 bits (2⁴ = 16). A byte = 2 hex digits; a word = 4. Hex is therefore easy to read and to convert to and from binary mentally.
Signed Numbers: Two’s Complement
To negate a value, invert all bits (one’s complement) and add 1. The MSB carries the sign: 0 = positive, 1 = negative.
8-bit range: −128 to +127 | 16-bit range: −32768 to +32767
+5 = 0000 0101₂
Invert: 1111 1010₂
Add 1: 1111 1011₂ = FBH
Check: 5+(−5)=00H, carry discarded.
The same adder hardware works for signed and unsigned addition. Subtraction becomes A−B = A+B̅+1.
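The negate-and-add rule can be checked with a short Python sketch. The helper names `negate`, `to_signed`, and `subtract` are illustrative only; Python is used here purely as a calculator, not as a model of any particular assembler.

```python
def negate(value: int, bits: int = 8) -> int:
    """Two's-complement negation: invert all bits, add 1, drop the carry."""
    mask = (1 << bits) - 1
    return ((~value & mask) + 1) & mask


def to_signed(pattern: int, bits: int = 8) -> int:
    """Read a bit pattern as a signed two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (pattern ^ sign_bit) - sign_bit


def subtract(a: int, b: int, bits: int = 8) -> int:
    """A - B computed the way the adder does it: A + NOT(B) + 1."""
    mask = (1 << bits) - 1
    return (a + (~b & mask) + 1) & mask


minus_five = negate(5)
print(f"{minus_five:02X}H")                 # FBH
print(to_signed(minus_five))                # -5
print(f"{(5 + minus_five) & 0xFF:02X}H")    # 00H, carry discarded
print(to_signed(subtract(3, 5)))            # -2
```

The same three functions work for 16-bit values by passing `bits=16`.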
BCD and ASCII
BCD: each decimal digit (0–9) is stored in 4 bits.
Packed BCD: two digits/byte (59 → 59H).
Unpacked BCD: one digit/byte (5 → 05H).
8086 adjusts via DAA/DAS, AAA/AAS/AAM/AAD.
ASCII: 7-bit code; digits ‘0’–‘9’ at 30H–39H.
ASCII digit → BCD: AND AL,0FH
BCD digit → ASCII: OR AL,30H
Character ‘7’ in AL = 37H. Subtract 30H (or AND 0FH) → numeric value 7.
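The masking rules above (AND 0FH to strip the ASCII zone, OR 30H to add it) can be sketched in Python as follows. The function names are hypothetical and the sketch assumes valid two-digit input.

```python
def packed_bcd_to_ascii(bcd: int) -> str:
    """Split a packed-BCD byte into two ASCII digits (OR each nibble with 30H)."""
    high = (bcd >> 4) & 0x0F
    low = bcd & 0x0F
    return chr(high | 0x30) + chr(low | 0x30)


def ascii_to_packed_bcd(text: str) -> int:
    """Pack two ASCII digits into one byte (AND each character with 0FH)."""
    high = ord(text[0]) & 0x0F
    low = ord(text[1]) & 0x0F
    return (high << 4) | low


print(packed_bcd_to_ascii(0x59))             # 59
print(f"{ascii_to_packed_bcd('59'):02X}H")   # 59H
print(ord("7") & 0x0F)                       # 7
```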
Introduction to Microprocessors
A microprocessor (μP) is a programmable, clock-driven, register-based electronic device that reads binary instructions from memory, accepts binary data as input, processes the data, and provides results as output.
Four Pillars of Any Microprocessor Architecture
- Word length – size of the natural operand (4, 8, 16, 32, 64-bit)
- Instruction set – the vocabulary of operations it understands
- Addressing modes – ways an operand can be specified
- Register set – the on-chip scratchpad
Everything connects to the outside world via three buses: Address, Data, and Control.
Microprocessor vs. Microcontroller vs. Microcomputer
| Aspect | Microprocessor | Microcontroller | Microcomputer |
|---|---|---|---|
| Chip contents | CPU only | CPU+RAM+ROM+I/O+Timers+ADC | CPU+memory+peripherals on board |
| Typical use | General-purpose computing | Dedicated/embedded control | Personal computing |
| Memory | External | On-chip | External (modular) |
| Power | Higher | Very low | Highest |
| Cost | Moderate | Very low | High |
| Examples | 8085, 8086, Pentium | 8051, AVR, PIC, ARM Cortex-M | IBM-PC, Raspberry Pi |
If the silicon alone can blink an LED with no external memory or I/O, it’s a microcontroller. If it needs a motherboard full of chips, it’s a microprocessor.
Von Neumann vs. Harvard Architecture
Von Neumann: single memory for code and data; one set of buses; simpler, cheaper. Cannot fetch instruction and data simultaneously. Used in 8085, 8086, x86 PCs.
Harvard: separate memories and buses for code and data; enables parallel access; faster but more complex. Used in 8051, AVR, ARM Cortex-M (modified Harvard).
Most modern CPUs are modified Harvard at the cache level (separate I-cache and D-cache) while remaining Von Neumann at main memory.
Evolution of Intel Microprocessors
| Processor | Year | Data Bus | Addr. Bus | Clock | Transistors | Highlights |
|---|---|---|---|---|---|---|
| 4004 | 1971 | 4-bit | 12 | 740 kHz | 2,300 | First μP, calculators |
| 8008 | 1972 | 8-bit | 14 | 800 kHz | 3,500 | 8-bit successor |
| 8080 | 1974 | 8-bit | 16 | 2 MHz | 6,000 | First general-purpose 8-bit |
| 8085 | 1976 | 8-bit | 16 | 3 MHz | 6,500 | Single +5 V supply |
| 8086 | 1978 | 16-bit | 20 | 5–10 MHz | 29,000 | First 16-bit, 1 MB memory |
| 8088 | 1979 | 8-bit ext. | 20 | 5 MHz | 29,000 | Original IBM-PC |
| 80286 | 1982 | 16-bit | 24 | 6–25 MHz | 134,000 | Protected mode, 16 MB |
| 80386 | 1985 | 32-bit | 32 | 16–33 MHz | 275,000 | First 32-bit, paging |
| 80486 | 1989 | 32-bit | 32 | 25–100 MHz | 1.2 M | Integrated FPU + cache |
| Pentium | 1993 | 64-bit | 32 | 60–300 MHz | 3.1 M | Superscalar |
| Core i-series | 2008+ | 64-bit | 36/40 | GHz | >10⁹ | Multi-core, SIMD, AVX |
The 8085 Microprocessor (Foundations)
The Intel 8085 (1976) is the conceptual ancestor of every later x86 chip and the starting point for understanding the 8086.
- 8-bit data bus, 16-bit address bus ⇒ 64 KB memory space
- Multiplexed AD0–AD7 (lower address + data), demultiplexed by ALE
- Clock: 3–5 MHz; single +5 V supply
- 246 opcodes (74 distinct instructions), 8-bit accumulator-based
- 5 hardware interrupts: TRAP, RST 7.5, 6.5, 5.5, INTR
It is small enough to fit on one whiteboard but contains every fundamental idea — registers, ALU, instruction fetch–decode–execute, interrupts, multiplexed buses — that will be seen in the 8086.
8085 Register Set
| Register | Function |
|---|---|
| A (Accumulator) | 8-bit result register for the ALU |
| B, C, D, E, H, L | General-purpose 8-bit; paired as BC, DE, HL (16-bit) |
| SP (Stack Pointer) | 16-bit, points to top of stack |
| PC (Program Counter) | 16-bit, address of the next instruction |
| Flag Register | S, Z, AC, P, CY (5 flags) |
The M operand in instructions like MOV A,M refers to memory pointed to by HL — the 8085’s primary indirect-addressing mechanism.
8086 Microprocessor: Architecture
Salient Features
- 16-bit data bus, 20-bit address bus; addressable memory: 2²⁰ bytes = 1 MB
- Clock: 5, 8, 10 MHz versions; 40-pin DIP, HMOS, +5 V, ~29,000 transistors
- Pipelined architecture (BIU + EU), 6-byte instruction queue
- Two operating modes: Minimum and Maximum
- Memory segmentation (4 segments × 64 KB)
- Supports multiprocessor configurations
The 8086 is internally pipelined: while the Execution Unit (EU) executes one instruction, the Bus Interface Unit (BIU) fetches the next. This overlap hides much of the instruction-fetch time that a non-pipelined CPU such as the 8085 must spend before every instruction.
The 8086 is not binary-compatible with the 8080/8085, but its registers and instructions were chosen so that 8080 assembly source can be translated almost mechanically into 8086 source. Old code could therefore be ported with minor effort.
Internal Architecture: BIU and EU
General-Purpose Registers
All four are 16-bit and split into two 8-bit halves:
| Register | Halves | Primary Purpose |
|---|---|---|
| AX (Accumulator) | AH, AL | Arithmetic, I/O, MUL/DIV implicit operand |
| BX (Base) | BH, BL | Base address for memory addressing (with DS) |
| CX (Counter) | CH, CL | Loop counter (LOOP), shift count, REP |
| DX (Data) | DH, DL | I/O port address; high word in MUL/DIV |
Many instructions require a specific register: MUL src multiplies AL or AX with src; LOOP decrements CX; IN/OUT uses DX for ports > 255.
Pointer, Index and Segment Registers
- SP – Stack Pointer (offset in SS)
- BP – Base Pointer (default SS; stack frames)
- IP – Instruction Pointer (offset in CS)
- SI – Source Index (DS by default)
- DI – Destination Index (DS by default; ES in string instructions)
- CS – Code Segment
- DS – Data Segment
- SS – Stack Segment
- ES – Extra Segment
Default segment:offset pairings: CS:IP | SS:SP | SS:BP | DS:BX/SI/DI | ES:DI (strings)
Flag Register (16-bit, 9 Flags Used)
| 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| – | – | – | – | OF | DF | IF | TF | SF | ZF | – | AF | – | PF | – | CF |
Status Flags (set by ALU result):
- CF – Carry from MSB (unsigned overflow)
- PF – Parity (even = 1)
- AF – Auxiliary carry (bit 3→4, for BCD)
- ZF – Zero result
- SF – Sign (MSB of result)
- OF – Signed overflow
Control Flags (set by programmer):
- TF – Trap (single-step debug)
- IF – Interrupt enable (STI/CLI)
- DF – Direction (string ops, STD/CLD)
Memory Segmentation
The 8086 has 16-bit registers but a 20-bit address bus. Memory is divided into segments of 64 KB each, addressed by a 16-bit segment register + 16-bit offset: Physical Address = Segment × 16 + Offset.
Equivalently: shift the segment left by 4 bits (one hex digit), then add the offset.
- Programs are relocatable (just change the segment register)
- Logical separation of code, data, stack, extra segments
- Allows multiple programs to coexist without conflict
- Memory > 64 KB usable without widening internal registers
Given any SEG:OFFSET, compute the physical address in under 10 seconds. Practise with at least 20 random pairs before any examination.
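For checking practice pairs, the shift-and-add rule can be written as a one-line Python function. This is a minimal sketch; the name `physical_address` is illustrative, and the 20-bit mask reflects the fact that the 8086 has only 20 address lines.

```python
def physical_address(segment: int, offset: int) -> int:
    """20-bit physical address: segment shifted left 4 bits, plus offset.

    The result is masked to 20 bits, so FFFFH:0010H wraps around to 00000H.
    """
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF


print(f"{physical_address(0x1234, 0x5678):05X}H")   # 179B8H
print(f"{physical_address(0x1000, 0x0256):05X}H")   # 10256H
print(f"{physical_address(0x1025, 0x0006):05X}H")   # 10256H (same byte, different pair)
print(f"{physical_address(0xFFFF, 0x0010):05X}H")   # 00000H (wraps at 1 MB)
```

The second and third calls show that many SEG:OFFSET pairs alias the same physical byte, because segments start every 16 bytes and overlap.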
Stack in the 8086
- Reserved area in the stack segment (SS)
- SP holds the offset of the top; grows downward (toward lower addresses)
- Operates word-by-word: PUSH writes 2 bytes; POP reads 2 bytes
PUSH src: SP ← SP−2; [SP] ← src
POP dst: dst ← [SP]; SP ← SP+2
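The PUSH and POP register transfers can be modelled with a small Python class. This is a sketch under simplifying assumptions: memory is a dictionary keyed by physical address, and the class name `Stack8086` and its fields are hypothetical.

```python
class Stack8086:
    """Toy model of the 8086 stack: SS:SP, word-wide, growing downward."""

    def __init__(self, ss: int, sp: int):
        self.ss = ss
        self.sp = sp
        self.memory = {}                      # physical address -> byte

    def _physical(self, offset: int) -> int:
        return ((self.ss << 4) + (offset & 0xFFFF)) & 0xFFFFF

    def push(self, word: int) -> None:
        self.sp = (self.sp - 2) & 0xFFFF      # SP <- SP - 2
        self.memory[self._physical(self.sp)] = word & 0xFF             # low byte
        self.memory[self._physical(self.sp + 1)] = (word >> 8) & 0xFF  # high byte

    def pop(self) -> int:
        low = self.memory[self._physical(self.sp)]
        high = self.memory[self._physical(self.sp + 1)]
        self.sp = (self.sp + 2) & 0xFFFF      # SP <- SP + 2
        return (high << 8) | low


stack = Stack8086(ss=0x2000, sp=0x0100)
stack.push(0x1234)
print(f"SP = {stack.sp:04X}H")                # SP = 00FEH
print(f"{stack.pop():04X}H")                  # 1234H
```

After the push, the low byte 34H sits at physical address 200FEH and the high byte 12H at 200FFH, which matches the little-endian word layout used everywhere else on the 8086.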
Pin Diagram and Bus De-multiplexing
During T1 of every bus cycle the 8086 outputs the address with ALE high; an octal latch (74LS373) captures it on the falling edge of ALE, freeing AD0–AD15 for data in T2–T4.
Minimum vs. Maximum Mode
Minimum mode: MN/M̅X̅ pin tied to +5 V. The 8086 generates all control signals directly: M/I̅O̅, RD̅, WR̅, ALE, DT/R̅, DE̅N̅, HOLD/HLDA. Used for single-CPU boards and small embedded systems.
Maximum mode: MN/M̅X̅ pin tied to GND. Three encoded status lines S̅2, S̅1, S̅0 are decoded by the external 8288 Bus Controller. Used for multiprocessor designs and when an 8087 NDP or 8089 I/O processor is present.
| S̅2 | S̅1 | S̅0 | Bus Cycle Generated |
|---|---|---|---|
| 0 | 0 | 0 | Interrupt acknowledge |
| 0 | 0 | 1 | Read I/O port |
| 0 | 1 | 0 | Write I/O port |
| 0 | 1 | 1 | Halt |
| 1 | 0 | 0 | Code (instruction) fetch |
| 1 | 0 | 1 | Read memory |
| 1 | 1 | 0 | Write memory |
| 1 | 1 | 1 | Passive (no bus cycle) |
Bus Cycle T-States
- T1: address output + ALE high
- T2: address tri-stated, RD̅ or WR̅ asserted
- T3: data on bus
- T4: data latched, control signals deasserted
If READY = 0 during T3, wait states Tw are inserted until the memory or peripheral is ready.
Addressing Modes of 8086
The 8086 supports seven basic addressing modes for memory/register references, plus I/O and control-transfer modes.
| # | Mode | Operand Form | Example |
|---|---|---|---|
| 1 | Immediate | Constant in instruction | MOV AL, 25H |
| 2 | Register | Operand in register | MOV AX, BX |
| 3 | Direct | 16-bit address in instruction | MOV AX, [1234H] |
| 4 | Register Indirect | Address in BX/SI/DI | MOV AX, [BX] |
| 5 | Based | BX or BP + displacement | MOV AX, [BX+4] |
| 6 | Indexed | SI or DI + displacement | MOV AX, [SI+2] |
| 7 | Based-Indexed | Base + Index (+ displacement) | MOV AX, [BX+SI+6] |
I/O modes: Direct (IN AL, 60H) and Indirect (IN AL, DX).
Control transfer: Intra-segment (NEAR) and Inter-segment (FAR), each direct or indirect.
Effective Address (EA) Computation
\[\text{EA} = \underbrace{\{\text{BX or BP}\}}_{\text{base}} + \underbrace{\{\text{SI or DI}\}}_{\text{index}} + \underbrace{\text{displacement}}_{\text{8 or 16-bit}}\]
\[\text{Physical Address} = \text{Segment Register}\times 16 + \text{EA}\]
If BP is used ⇒ SS is the default segment; otherwise DS is the default. Override with a segment prefix, e.g., MOV AX, ES:[BX].
Example: DS=1000H, BX=0200H, SI=0050H; instruction MOV AX, [BX+SI+6]:
EA = 0200H+0050H+6 = 0256H; PA = 1000H×10H + 0256H = 10256H.
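The EA formula and the default-segment rule (SS when BP is the base, DS otherwise) can be combined in a short Python sketch. The function `resolve`, its parameters, and the register dictionary are illustrative assumptions, not part of any assembler.

```python
def resolve(regs, base=None, index=None, disp=0, override=None):
    """Return (EA, physical address) for a memory operand.

    base is "BX" or "BP", index is "SI" or "DI", disp is the displacement,
    override optionally names a segment register (segment-prefix override).
    """
    ea = disp
    if base:
        ea += regs[base]
    if index:
        ea += regs[index]
    ea &= 0xFFFF                                   # EA wraps at 64 KB
    segment = override or ("SS" if base == "BP" else "DS")
    physical = ((regs[segment] << 4) + ea) & 0xFFFFF
    return ea, physical


regs = {"DS": 0x1000, "SS": 0x3000, "ES": 0x4000,
        "BX": 0x0200, "BP": 0x0200, "SI": 0x0050, "DI": 0x0000}

ea, pa = resolve(regs, base="BX", index="SI", disp=6)
print(f"EA={ea:04X}H PA={pa:05X}H")                # EA=0256H PA=10256H
ea, pa = resolve(regs, base="BP", index="SI", disp=6)
print(f"EA={ea:04X}H PA={pa:05X}H")                # EA=0256H PA=30256H
```

The two calls differ only in the base register, yet the second lands in the stack segment, which is why BP is the natural choice for accessing stack frames.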
8086 Instruction Set
Data Transfer Instructions
None of these affect flags (except SAHF/POPF, which restore them).
| Instruction | Operation | Example |
|---|---|---|
| MOV dst, src | dst ← src | MOV AX, BX |
| XCHG dst, src | swap dst ↔ src | XCHG AX, BX |
| PUSH src | SP←SP−2; [SP]←src | PUSH AX |
| POP dst | dst←[SP]; SP←SP+2 | POP BX |
| LEA dst, src | dst ← offset of src | LEA SI, MSG |
| LDS reg, src | reg ← [src], DS ← [src+2] | LDS SI, PTR |
| LES reg, src | similar but loads ES | LES DI, PTR |
| IN AL, port | AL ← port | IN AL, 60H |
| OUT port, AL | port ← AL | OUT 80H, AL |
| XLAT | AL ← [BX+AL] | XLAT |
| LAHF / SAHF | AH ↔ low byte of FLAGS | LAHF |
| PUSHF / POPF | flags ↔ stack | PUSHF |
Arithmetic Instructions
| Instruction | Operation | Flags |
|---|---|---|
| ADD dst, src | dst ← dst+src | all status |
| ADC dst, src | dst ← dst+src+CF | all status |
| SUB dst, src | dst ← dst−src | all status |
| SBB dst, src | dst ← dst−src−CF | all status |
| INC / DEC dst | ±1 | all except CF |
| NEG dst | dst ← −dst (two’s comp.) | all status |
| CMP dst, src | dst−src (discarded) | all status |
| MUL src | AX ← AL×src (8-bit); DX:AX ← AX×src (16-bit), unsigned | CF, OF |
| IMUL src | signed multiply | CF, OF |
| DIV src | AL=AX/src, AH=AX mod src (or DX:AX/src) | undefined |
| IDIV src | signed divide | undefined |
| DAA / DAS | BCD adjust after add/subtract | status |
| CBW / CWD | sign-extend AL→AX, AX→DX:AX | none |
There is no MUL AX,BX. Always MUL src: AL/AX is the implicit other operand; result goes to AX or DX:AX.
Logical, Shift and Rotate
| Instruction | Operation | Notes |
|---|---|---|
| AND, OR, XOR | bit-wise | CF=OF=0; SF, ZF, PF set |
| NOT | one’s complement | no flags |
| TEST dst, src | dst AND src (discarded) | sets flags only |
| SHL / SAL dst, n | shift left (n=CL if count>1) | CF = last bit shifted out |
| SHR dst, n | logical right shift | MSB filled with 0 |
| SAR dst, n | arithmetic right shift | MSB retained (sign) |
| ROL / ROR | rotate without carry | wraps around |
| RCL / RCR | rotate through carry | CF is part of the chain |
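SHL AX,1 ≡ AX×2; SAR AX,1 ≡ signed ÷2 rounded toward −∞ (IDIV truncates toward zero, so −7÷2 gives −4 with SAR but −3 with IDIV). Both shifts are much faster than MUL/DIV.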
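The rounding behaviour of SAR versus a truncating signed divide is easy to get wrong, so here is a small Python comparison. The helper names are hypothetical; `sar16` models a 16-bit arithmetic right shift and `idiv_quotient` models the truncate-toward-zero quotient that a signed divide produces.

```python
def sar16(pattern: int, count: int = 1) -> int:
    """Arithmetic right shift of a 16-bit pattern; returns a 16-bit pattern."""
    signed = pattern - 0x10000 if pattern & 0x8000 else pattern
    return (signed >> count) & 0xFFFF     # Python's >> keeps the sign, like SAR


def idiv_quotient(dividend: int, divisor: int) -> int:
    """Signed quotient truncated toward zero."""
    magnitude = abs(dividend) // abs(divisor)
    return -magnitude if (dividend < 0) != (divisor < 0) else magnitude


def to_signed16(pattern: int) -> int:
    return pattern - 0x10000 if pattern & 0x8000 else pattern


minus_seven = (-7) & 0xFFFF               # FFF9H
print(to_signed16(sar16(minus_seven)))    # -4 (rounds toward minus infinity)
print(idiv_quotient(-7, 2))               # -3 (truncates toward zero)
print(to_signed16(sar16(8)))              # 4  (identical for non-negative values)
```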
Control Transfer Instructions
| Instruction | Condition / Action |
|---|---|
| Unconditional | |
| JMP target | unconditional jump (short/near/far) |
| CALL target | push return address, jump |
| RET / RETF | return (near / far) |
| Conditional (after CMP or arithmetic) | |
| JE/JZ, JNE/JNZ | ZF=1, ZF=0 |
| JC/JB/JNAE, JNC/JAE/JNB | CF=1, CF=0 |
| JA/JNBE, JBE/JNA | unsigned greater / ≤ |
| JG/JNLE, JL/JNGE | signed greater / less |
| JGE/JNL, JLE/JNG | signed ≥ / ≤ |
| JS, JNS | SF=1, SF=0 |
| JO, JNO | OF=1, OF=0 |
| JCXZ | CX = 0 |
| Loop | |
| LOOP target | CX−−; if CX≠0, jump |
| LOOPE / LOOPZ | CX−−; if CX≠0 and ZF=1, jump |
| LOOPNE / LOOPNZ | CX−−; if CX≠0 and ZF=0, jump |
Signed vs. Unsigned Conditional Jumps
After CMP A,B, choose the jump that matches the intended interpretation.
| Relation | Unsigned (Above/Below) | Signed (Greater/Less) |
|---|---|---|
| A > B | JA / JNBE | JG / JNLE |
| A ≥ B | JAE / JNB | JGE / JNL |
| A < B | JB / JNAE | JL / JNGE |
| A ≤ B | JBE / JNA | JLE / JNG |
| A = B | JE / JZ | JE / JZ |
Use “A/B” (above/below) for unsigned, “G/L” (greater/less) for signed.
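To see why the same CMP needs different jumps for signed and unsigned data, the following Python sketch computes the flags an 8-bit CMP would set and then evaluates the standard jump conditions (JA: CF=0 and ZF=0; JG: ZF=0 and SF=OF). The function names are illustrative.

```python
def cmp_flags(a: int, b: int, bits: int = 8) -> dict:
    """Flags produced by CMP a, b (computes a - b and discards the result)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "CF": a < b,                                   # borrow needed
        "ZF": result == 0,
        "SF": bool(result & sign),
        "OF": bool((a ^ b) & (a ^ result) & sign),     # signed overflow
    }


def ja(flags: dict) -> bool:
    """Unsigned 'above': CF=0 and ZF=0."""
    return not flags["CF"] and not flags["ZF"]


def jg(flags: dict) -> bool:
    """Signed 'greater': ZF=0 and SF=OF."""
    return not flags["ZF"] and flags["SF"] == flags["OF"]


flags = cmp_flags(0xFF, 0x01)      # 255 vs 1 unsigned, -1 vs 1 signed
print(ja(flags), jg(flags))        # True False
flags = cmp_flags(0x01, 0xFF)      # 1 vs 255 unsigned, 1 vs -1 signed
print(ja(flags), jg(flags))        # False True
```

The byte FFH is "above" 01H but not "greater than" it, which is exactly the distinction the A/B versus G/L mnemonics encode.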
String Instructions
SI (source, DS) and DI (destination, ES); DF=0 auto-increments, DF=1 auto-decrements.
| Instruction | Operation | Prefix |
|---|---|---|
| MOVSB / MOVSW | ES:[DI] ← DS:[SI], update SI, DI | REP |
| CMPSB / CMPSW | compare DS:[SI] with ES:[DI] | REPE / REPNE |
| SCASB / SCASW | compare AL/AX with ES:[DI] | REPE / REPNE |
| LODSB / LODSW | AL/AX ← DS:[SI] | – |
| STOSB / STOSW | ES:[DI] ← AL/AX | REP |
CLD ; direction = forward
MOV CX, 100
REP MOVSB ; copy 100 bytes from DS:SI to ES:DI
Processor Control Instructions
| Instruction | Function |
|---|---|
| CLC / STC / CMC | clear / set / complement CF |
| CLD / STD | clear / set DF |
| CLI / STI | clear / set IF |
| HLT | halt until interrupt or reset |
| NOP | no operation (XCHG AX,AX) |
| WAIT | wait for TEST pin active |
| ESC opcode, src | coprocessor escape (8087) |
| LOCK | assert LOC̅K̅ for next instruction (bus arbitration) |
Instruction Execution Time — T-States
One T-state = one clock period. T-states = decoding + ALU work + bus cycles (each bus cycle = 4 T-states minimum).
| Instruction | Typical T-States | Notes |
|---|---|---|
| MOV reg, reg | 2 | no memory access |
| MOV reg, mem | 8 + EA | EA varies with addressing mode |
| ADD reg, reg | 3 | ALU work only |
| MUL r/m8 | 70–77 | multi-cycle iterative |
| DIV r/m16 | 144–162 | among the slowest instructions (IDIV is slower still) |
| LOOP target | 17 / 5 | taken / not taken |
| INT n | 51 | stacks FLAGS, CS, IP; fetches vector |
At 5 MHz, Tclk=200 ns, so DIV r/m16 can take ~32 μs.
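The conversion from T-states to wall-clock time is a single division, sketched here in Python using only figures from the table above. The function name is illustrative.

```python
def execution_time_us(t_states: int, clock_mhz: float) -> float:
    """Execution time in microseconds: T-states divided by clock in MHz."""
    return t_states / clock_mhz


print(execution_time_us(162, 5))    # 32.4  (worst-case DIV r/m16 at 5 MHz)
print(execution_time_us(162, 10))   # 16.2  (same instruction at 10 MHz)
print(execution_time_us(2, 5))      # 0.4   (MOV reg, reg at 5 MHz)
```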
Assembly Language Programming
Assembler Directives (MASM/TASM)
| Directive | Purpose |
|---|---|
| DB / DW / DD / DQ / DT | Define Byte / Word / DWord / QWord / 10-byte |
| EQU | Symbolic constant: N EQU 10 |
| ORG | Set location counter: ORG 100H |
| SEGMENT … ENDS | Begin / end a segment |
| ASSUME | Tell the assembler which segment each segment register points to |
| PROC … ENDP | Define a procedure (NEAR/FAR) |
| END | End of source file (with entry-point label) |
| PUBLIC / EXTRN | Multi-file linkage |
| OFFSET / SEG | Return offset / segment of a symbol |
| PTR | Override default size (BYTE PTR [SI]) |
| DUP | Allocate repeated data: ARR DB 100 DUP(0) |
A Complete 8086 Program: Sum of First N Natural Numbers
;----- Sum of first N natural numbers -----
.MODEL TINY
.CODE
ORG 100H
START: MOV CX, 10 ; N = 10
XOR AX, AX ; AX = sum = 0
MOV BX, 1 ; BX = i = 1
NEXT: ADD AX, BX
INC BX
LOOP NEXT ; CX-- ; jump if CX != 0
; AX now holds the sum (37H = 55 decimal)
MOV AH, 4CH ; DOS terminate
INT 21H
END START
Procedures
;----- Procedure: square AL, return in AX -----
SQUARE PROC NEAR
PUSH BX
MOV BL, AL
MUL BL ; AX = AL * BL
POP BX
RET
SQUARE ENDP
MOV AL, 7
CALL SQUARE ; AX = 49
Macros
;----- Macro to display a $-terminated string -----
PRINT MACRO MSG
MOV AH, 09H
LEA DX, MSG
INT 21H
ENDM
PRINT HELLO_MSG
PRINT GOODBYE_MSG
Procedure vs. Macro
| Aspect | Procedure | Macro |
|---|---|---|
| Code size | Once in memory | Once per invocation |
| Execution speed | Slower (CALL/RET overhead) | Faster (inline) |
| Parameter passing | Registers / stack | Textual substitution |
| Recursion | Supported | Not supported |
| Use when | Logic is long, called often | Logic is short, called rarely |
Short 2–3-line patterns → macro. Anything longer or called many times → procedure.
Sample Programs and Worked Examples
Example 1: Largest Number in an Array
;----- Find the largest of N bytes at DS:SI -----
; Inputs: SI -> array, CX = N (assume N >= 1)
; Output: AL = largest byte
MOV AL, [SI] ; first element = current max
DEC CX
JZ DONE
NEXT: INC SI
CMP AL, [SI]
JAE SKIP ; AL >= mem -> no change
MOV AL, [SI] ; new maximum
SKIP: LOOP NEXT
DONE: ; AL holds the result
Why JAE? Unsigned bytes. For signed data use JGE.
Example 2: Factorial (Iterative)
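;----- Factorial of N (in CL, N <= 8), result in AX -----
MOV AX, 1
XOR DX, DX
XOR CH, CH ; CX = N
OR CL, CL
JZ DONE ; 0! = 1
LOOP1: MOV BX, CX
MUL BX ; DX:AX = AX * BX (previous DX is overwritten)
LOOP LOOP1
DONE: ; result in AX; DX = 0 for N <= 8
MUL BX produces a 32-bit unsigned product in DX:AX even when both operands are 16-bit. Because each MUL takes only AX as its input, this loop is correct only while the running product fits in 16 bits, i.e. N ≤ 8 (8! = 40320).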
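The range limit of this loop can be demonstrated by simulating it in Python. The sketch below mirrors the listing step by step; `mul16` models a 16-bit unsigned MUL, and both function names are hypothetical.

```python
def mul16(ax: int, src: int):
    """Model of MUL src (16-bit): returns (DX, AX) = AX * src."""
    product = (ax & 0xFFFF) * (src & 0xFFFF)
    return (product >> 16) & 0xFFFF, product & 0xFFFF


def factorial_loop(n: int):
    """Replay the assembly loop: multiply AX by CX for CX = N down to 1."""
    ax, dx = 1, 0
    cx = n & 0xFF
    while cx != 0:                  # JZ DONE on entry, LOOP afterwards
        dx, ax = mul16(ax, cx)      # DX is overwritten on every pass
        cx -= 1
    return dx, ax


print(factorial_loop(8))            # (0, 40320)  correct: 8! = 40320
print(factorial_loop(9))            # (0, 35200)  wrong:   9! = 362880
```

For N = 9 the running product exceeds FFFFH before the last multiplications, the high word is dropped by the next MUL, and the final answer is silently wrong. A version for larger N would need to carry DX into a multi-word multiply.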
Example 3: Block Move (REP MOVSB)
;----- Copy 256 bytes from DS:SRC to ES:DST -----
MOV AX, SEG SRC
MOV DS, AX
MOV SI, OFFSET SRC
MOV AX, SEG DST
MOV ES, AX
MOV DI, OFFSET DST
MOV CX, 256
CLD
REP MOVSB
Example 4: Counting Even and Odd Numbers
;----- Count even/odd bytes; BL=#even, BH=#odd -----
XOR BX, BX
SCAN: MOV AL, [SI]
TEST AL, 01H
JZ EVEN1
INC BH
JMP CONT
EVEN1: INC BL
CONT: INC SI
LOOP SCAN
Idiom: TEST AL,01H inspects the LSB without modifying AL.
Example 5: BCD-to-ASCII Conversion
;----- Convert packed BCD in AL (e.g. 59H) to AH:AL ('5','9') -----
MOV AH, AL
AND AL, 0FH ; lower nibble -> BCD digit
OR AL, 30H ; -> ASCII
MOV CL, 4
SHR AH, CL
OR AH, 30H ; upper nibble -> ASCII
Example 6: 32-bit Addition with ADC
;----- (DX:AX) = (DX:AX) + (CX:BX) -----
ADD AX, BX ; low halves; CF = carry out of bit 15
ADC DX, CX ; high halves + carry
CF is the bridge between successive words of a multi-precision operation. Same pattern with SBB for subtraction.
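The role of CF as the link between the two halves can be traced in Python. This is a sketch of the ADD/ADC pair only; the function name and return layout are illustrative.

```python
def add32_with_adc(dx: int, ax: int, cx: int, bx: int):
    """(DX:AX) + (CX:BX) using a 16-bit ADD followed by a 16-bit ADC.

    Returns (DX, AX, CF) after both instructions.
    """
    low = ax + bx                   # ADD AX, BX
    carry = low >> 16               # CF after the low-half add
    ax = low & 0xFFFF
    high = dx + cx + carry          # ADC DX, CX
    return high & 0xFFFF, ax, high >> 16


dx, ax, cf = add32_with_adc(0x0001, 0xFFFF, 0x0000, 0x0001)
print(f"{dx:04X}:{ax:04X} CF={cf}")     # 0002:0000 CF=0
dx, ax, cf = add32_with_adc(0xFFFF, 0xFFFF, 0x0000, 0x0001)
print(f"{dx:04X}:{ax:04X} CF={cf}")     # 0000:0000 CF=1
```

The final CF can feed a third ADC if the operands are extended to 48 bits or more.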
Example 7: String Length ($-terminated)
;----- Length of $-terminated string at ES:DI; output AX -----
MOV AL, '$'
MOV CX, 0FFFFH
CLD
REPNE SCASB
MOV AX, 0FFFFH
SUB AX, CX
DEC AX
REPNE = repeat while not equal. Stops when AL = ES:[DI] (sentinel found).
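The CX arithmetic after REPNE SCASB is easy to get off by one, so the following Python sketch replays the instruction sequence on a byte string. It assumes the sentinel is present within the scanned range; the function name is hypothetical.

```python
def scasb_length(memory: bytes, di: int = 0, sentinel: int = ord("$")) -> int:
    """Replay REPNE SCASB followed by the AX = FFFFH - CX - 1 computation."""
    cx = 0xFFFF
    while cx != 0:
        byte = memory[di]           # compare AL with ES:[DI]
        di += 1                     # DF = 0, so DI increments
        cx -= 1                     # CX is decremented even on the matching byte
        if byte == sentinel:        # ZF = 1 stops REPNE
            break
    return (0xFFFF - cx - 1) & 0xFFFF


print(scasb_length(b"HELLO$"))      # 5
print(scasb_length(b"$"))           # 0
```

Because CX is also decremented for the byte that matches, FFFFH − CX counts the sentinel too, which is why the listing ends with DEC AX.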
Interrupts of 8086
An interrupt suspends normal program execution and transfers control to an Interrupt Service Routine (ISR).
- CPU completes the current instruction.
- Pushes FLAGS, CS, IP onto the stack.
- Clears IF and TF.
- Reads the 4-byte vector from the IVT.
- Loads CS:IP from the vector ⇒ ISR begins.
- IRET pops IP, CS, FLAGS and resumes.
A near CALL pushes only IP, and a far CALL pushes CS:IP. An interrupt also pushes FLAGS and clears IF/TF. Return is IRET, not RET.
Interrupt Vector Table (IVT)
- Located at physical addresses 00000H–003FFH (first 1 KB).
- 256 vectors, each 4 bytes: IP_low, IP_high, CS_low, CS_high.
- For type n: IVT address = n × 4.
| Type | Source | Description |
|---|---|---|
| 0 | Internal | Divide-by-zero error |
| 1 | Internal | Single-step (TF=1) |
| 2 | External | NMI (non-maskable pin) |
| 3 | Internal | Breakpoint (INT 3, 1-byte) |
| 4 | Internal | Overflow (INTO, if OF=1) |
| 5–31 | Reserved | Reserved by Intel |
| 32–255 | User | Software / external (INT n) |
INT 21H ⇒ vector at 21H×4 = 84H. The CPU reads 4 bytes starting at 0000:0084H (physical 00084H): IP first, then CS.
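The vector lookup can be sketched in Python with a 1 KB byte array standing in for the IVT. The handler address 1234H:5678H below is a made-up value for illustration, and the function names are hypothetical.

```python
def vector_address(int_type: int) -> int:
    """Offset of the 4-byte vector for interrupt type n (n * 4)."""
    if not 0 <= int_type <= 255:
        raise ValueError("interrupt type must be 0..255")
    return int_type * 4


def read_vector(memory, int_type: int):
    """Return (CS, IP) stored little-endian as IP_low, IP_high, CS_low, CS_high."""
    base = vector_address(int_type)
    ip = memory[base] | (memory[base + 1] << 8)
    cs = memory[base + 2] | (memory[base + 3] << 8)
    return cs, ip


ivt = bytearray(1024)                                  # 256 vectors x 4 bytes
ivt[0x84:0x88] = bytes([0x78, 0x56, 0x34, 0x12])       # hypothetical handler address
cs, ip = read_vector(ivt, 0x21)
print(f"{vector_address(0x21):02X}H -> {cs:04X}H:{ip:04X}H")   # 84H -> 1234H:5678H
```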
Hardware vs. Software Interrupts
Hardware interrupts:
- Asynchronous, triggered by external pins
- NMI: non-maskable, type 2
- INTR: maskable (IF=1 to enable); type number supplied during INTA cycles
- Usually routed through 8259A PIC
Software interrupts:
- Synchronous, executed by program
- INT n – call ISR at vector n
- INTO – type 4 if OF=1
- Used for DOS/BIOS services (INT 21H, INT 10H …)
Priority (highest to lowest): Internal exceptions > NMI > INTR > Single-step. Maskable interrupts can be temporarily disabled with CLI.
ISR Skeleton
;----- ISR template -----
ISR_X PROC FAR
PUSH AX
PUSH BX
PUSH DS
; ... actual work ...
MOV AL, 20H ; EOI to 8259A
OUT 20H, AL
POP DS
POP BX
POP AX
IRET
ISR_X ENDP
Three sacred rules: (1) preserve every register you change, (2) issue EOI to 8259A if hardware-driven, (3) end with IRET, not RET.
8259A Programmable Interrupt Controller
Expands the single INTR pin to 8 prioritized lines (IR0–IR7); cascadable to 64 inputs.
- 8 prioritized inputs; cascadable up to 64
- Programmable priority modes: fixed, rotating, specific
- Programmable interrupt vectors (no fixed mapping)
- Edge- or level-triggered inputs; masking via IMR
Internal registers: IRR latches requests; IMR blocks selected ones; Priority Resolver picks the highest non-masked; ISR tracks which is being serviced.
| Word | Purpose |
|---|---|
| ICW1 | Edge/level, single/cascade, need-ICW4 flag |
| ICW2 | Interrupt vector base address (T7–T3) |
| ICW3 | Master: which IR has slave; Slave: slave ID |
| ICW4 | 8086/8085 mode, AEOI, buffered, SFNM |
| OCW1 | Set/clear IMR bits (masking) |
| OCW2 | EOI commands, rotate priority |
| OCW3 | Poll, read IRR/ISR, special mask |
Until the ISR writes an EOI, lower-priority interrupts remain blocked. Simplest EOI: MOV AL,20H; OUT 20H,AL.
Up to 9 chips cascaded (1 master + 8 slaves) give 64 interrupt inputs. The original IBM-PC used one 8259A; the PC-AT added a second for 15 usable IRQs.
Memory Interfacing
| Type | Characteristic | Example |
|---|---|---|
| ROM | Mask-programmed, non-volatile | monitor / firmware |
| PROM | One-time user programmable | 27xx |
| EPROM | UV erasable | 2716, 2732, 2764 |
| EEPROM | Electrically erasable | 28Cxx |
| Flash | Block-erasable EEPROM | 29Fxx |
| SRAM | Bistable latch, fast, volatile | 6116 (2K×8) |
| DRAM | Capacitor, needs refresh, volatile | 4116, 4164 |
Memory Banking in the 8086
- Even bank (low byte D0–D7): selected by A0=0
- Odd bank (high byte D8–D15): selected by BH̅E̅=0
| BH̅E̅ | A0 | Operation |
|---|---|---|
| 0 | 0 | Whole word (aligned) |
| 0 | 1 | Upper byte only |
| 1 | 0 | Lower byte only |
| 1 | 1 | None |
Reading a word at an odd address takes two bus cycles instead of one — always align 16-bit data on even addresses!
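The BH̅E̅/A0 table and the odd-address penalty can be expressed as a small Python function that lists the bus cycles an access needs. This is a sketch; the function name and the tuple layout `(BHE#, A0)` are illustrative.

```python
def bus_cycles(address: int, size: int):
    """Return one (BHE#, A0) pair per bus cycle for a byte (1) or word (2) access."""
    odd = address & 1
    if size == 1:
        return [(0, 1)] if odd else [(1, 0)]    # odd bank or even bank only
    if size == 2:
        if not odd:
            return [(0, 0)]                     # aligned word: both banks at once
        # Misaligned word: low byte from the odd bank, then the high byte
        # from the even bank at address + 1.
        return [(0, 1), (1, 0)]
    raise ValueError("size must be 1 or 2")


print(bus_cycles(0x1000, 2))    # [(0, 0)]          one cycle
print(bus_cycles(0x1001, 2))    # [(0, 1), (1, 0)]  two cycles
print(bus_cycles(0x1001, 1))    # [(0, 1)]
```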
Address Decoding
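Full (absolute) decoding: all unused address lines participate in CS̅ generation. No overlap; typically uses a 3-to-8 decoder (74LS138).
Linear-select decoding: a single high-order address line is used as CS̅. Cheaper, but causes address foldback (the device responds at many addresses).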
Memory-Mapped vs. I/O-Mapped I/O
| Aspect | Memory-Mapped I/O | I/O-Mapped I/O |
|---|---|---|
| Address space | Part of 1 MB | Separate 64 K I/O ports |
| Instructions | Any memory instruction | Only IN / OUT |
| Control signal | M/I̅O̅ = 1 | M/I̅O̅ = 0 |
| Memory loss | Yes (ports steal addresses) | No |
| Flexibility | Rich (any instruction on ports) | Limited (IN/OUT only) |
8255A Programmable Peripheral Interface
- 40-pin DIP, +5 V
- Three 8-bit ports: Port A, Port B, Port C
- Port C split into upper (PC7–PC4) and lower (PC3–PC0)
- A0, A1 select port; CS̅ from address decoding
- One 8-bit Control Word Register (CWR)
| A1 | A0 | Selected Register |
|---|---|---|
| 0 | 0 | Port A |
| 0 | 1 | Port B |
| 1 | 0 | Port C |
| 1 | 1 | CWR |
Operating Modes
Mode 0 (basic I/O): each port independently input or output. No handshaking.
Mode 1 (strobed I/O): Port A and B with handshaking (STB, IBF, INTR for input; OB̅F̅, AC̅K̅, INTR for output).
Mode 2 (bidirectional bus): Port A only; bidirectional transfers with 5 handshake lines from Port C.
Bit Set/Reset mode lets you set/clear individual Port-C bits without affecting A or B. Selected by D7 = 0 in the control word.
Control Word (I/O Mode, D7=1)
| Bit | Meaning |
|---|---|
| D7 | 1 = I/O mode select |
| D6, D5 | Group A mode: 00=Mode 0, 01=Mode 1, 1x=Mode 2 |
| D4 | Port A: 1=input, 0=output |
| D3 | PC upper: 1=input, 0=output |
| D2 | Group B mode: 0=Mode 0, 1=Mode 1 |
| D1 | Port B: 1=input, 0=output |
| D0 | PC lower: 1=input, 0=output |
PA=output, PB=input, PC upper=output, PC lower=input, all Mode 0:
CW = 1 00 0 0 0 1 1₂ = 83H
MOV AL,83H
OUT CWR,AL
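Building Mode 0 control words by hand is error-prone, so the bit table above can be turned into a small Python helper. The function name and boolean parameters are illustrative; only Mode 0 for both groups is covered.

```python
def mode0_control_word(pa_in: bool, pb_in: bool,
                       pc_upper_in: bool, pc_lower_in: bool) -> int:
    """8255A I/O-mode control word with Group A and Group B both in Mode 0."""
    word = 0x80                     # D7 = 1: I/O mode select
    if pa_in:
        word |= 0x10                # D4: Port A input
    if pc_upper_in:
        word |= 0x08                # D3: Port C upper input
    if pb_in:
        word |= 0x02                # D1: Port B input
    if pc_lower_in:
        word |= 0x01                # D0: Port C lower input
    return word


# PA = output, PB = input, PC upper = output, PC lower = input
print(f"{mode0_control_word(False, True, False, True):02X}H")    # 83H
# All ports output
print(f"{mode0_control_word(False, False, False, False):02X}H")  # 80H
```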
8255A Applications
App 1: LED Bar Display
; Port addresses: PA=80H, PB=82H, PC=84H, CWR=86H
START: MOV AL, 80H ; all ports output, Mode 0
OUT 86H, AL
MOV AL, 01H ; first LED on
NEXT: OUT 80H, AL
CALL DELAY
ROL AL, 1
JMP NEXT
App 2: Stepper Motor (Full-Step Sequence)
| Step | A | B | C | D | Hex |
|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 | 01H |
| 2 | 0 | 1 | 0 | 0 | 02H |
| 3 | 0 | 0 | 1 | 0 | 04H |
| 4 | 0 | 0 | 0 | 1 | 08H |
MOV AL, 80H
OUT 86H, AL
MOV BL, 11H ; initial phase, repeated in both nibbles so ROL wraps every 4 steps
MOV CX, N
ROT: MOV AL, BL
OUT 80H, AL
CALL DELAY
ROL BL, 1 ; CW; use ROR for CCW
LOOP ROT
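An 8-bit ROL moves a single set bit through all eight positions, so the choice of starting pattern matters when only the low four port bits drive the motor coils. The Python sketch below compares two seeds; it assumes the coils are wired to bits 0–3 only, and the function names are hypothetical.

```python
def rol8(value: int) -> int:
    """8-bit rotate left by one position."""
    return ((value << 1) | (value >> 7)) & 0xFF


def coil_patterns(seed: int, steps: int):
    """Low-nibble pattern seen by the four coils on each step."""
    patterns = []
    value = seed & 0xFF
    for _ in range(steps):
        patterns.append(value & 0x0F)
        value = rol8(value)
    return patterns


print(coil_patterns(0x01, 8))   # [1, 2, 4, 8, 0, 0, 0, 0]  coils off for 4 steps
print(coil_patterns(0x11, 8))   # [1, 2, 4, 8, 1, 2, 4, 8]  continuous rotation
```

Seeding with 11H duplicates the phase bit in both nibbles, so the low nibble always contains exactly one energised phase.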
App 3: Reading 8 Switches
MOV AL, 82H ; PA=out, PB=in, Mode 0
OUT 86H, AL
POLL: IN AL, 82H ; label renamed: WAIT is a reserved 8086 mnemonic
TEST AL, 01H
JNZ POLL ; still open (pull-up=1)
MOV AL, 0FFH
OUT 80H, AL ; all LEDs on
8254 Programmable Interval Timer
Three independent 16-bit counters (Counter 0, 1, 2). Each counts in binary or BCD. Inputs: CLK, GATE; Output: OUT.
- Mode 0 – Interrupt on terminal count
- Mode 1 – Hardware retriggerable one-shot
- Mode 2 – Rate generator (divide-by-N)
- Mode 3 – Square-wave generator
- Mode 4 – Software-triggered strobe
- Mode 5 – Hardware-triggered strobe
Control Word
| Bit | Meaning |
|---|---|
| SC1, SC0 | Counter select: 00=C0, 01=C1, 10=C2, 11=read-back |
| RW1, RW0 | 00=latch, 01=LSB, 10=MSB, 11=LSB then MSB |
| M2, M1, M0 | Mode (000 to 101) |
| BCD | 0=binary 16-bit, 1=4-decade BCD |
To generate a 1 kHz square wave from a 1 MHz clock: divisor = 1 MHz / 1 kHz = 1000. Mode 3, binary, LSB+MSB, Counter 0:
CW = 0011 0110₂ = 36H
MOV AL, 36H ; control word
OUT CWR, AL
MOV AX, 1000 ; divisor
OUT C0, AL ; LSB
MOV AL, AH
OUT C0, AL ; MSB
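The control word and the two divisor bytes can be derived programmatically from the bit table above. The Python sketch below is illustrative; the function names are hypothetical, and a count of 0 (which the 8254 treats as 65536) is deliberately not handled.

```python
def control_word(counter: int, rw: int, mode: int, bcd: bool = False) -> int:
    """8254 control word: SC1 SC0 | RW1 RW0 | M2 M1 M0 | BCD."""
    if not (0 <= counter <= 2 and 0 <= rw <= 3 and 0 <= mode <= 5):
        raise ValueError("field out of range")
    return (counter << 6) | (rw << 4) | (mode << 1) | int(bcd)


def divisor_bytes(clk_hz: float, out_hz: float):
    """Return (LSB, MSB) of the 16-bit divisor, in the order they are written."""
    divisor = round(clk_hz / out_hz)
    if not 1 <= divisor <= 0xFFFF:
        raise ValueError("divisor does not fit in 16 bits")
    return divisor & 0xFF, divisor >> 8


print(f"{control_word(counter=0, rw=3, mode=3):02X}H")   # 36H
lsb, msb = divisor_bytes(1_000_000, 1_000)
print(f"{lsb:02X}H {msb:02X}H")                          # E8H 03H
```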
8251A USART (Serial Communication)
A programmable serial communication chip designed to interface with the 8085/8086 family. Supports asynchronous (5–8 data bits, 1/1.5/2 stop bits, baud factor ×1/×16/×64) and synchronous modes; full-duplex, double-buffered; built-in parity, framing, and overrun error detection.
Important pins:
- TxD, RxD – serial data lines
- TxC, RxC – transmit/receive clocks
- TxRDY, RxRDY – status signals to CPU
- DTR, DSR, RTS, CTS – modem control
- C/D̅ – control word / data select
After reset: Mode word → (optional sync chars) → Command word, then data transfer begins.
Mode word = 0100 1110₂ = 4EH (1 stop bit, no parity, 8 data bits, ×16 baud factor)
Command = 0011 0111₂ = 37H (TxEN, RxEN, ER, DTR, RTS)
RS-232C Standard
EIA serial interface between DTE and DCE. Negative logic: Logic 1 (mark) = −3 to −15 V; Logic 0 (space) = +3 to +15 V. TTL↔RS-232 conversion via MAX232.
Microcontroller pins (5 V / 3.3 V TTL) cannot drive RS-232 levels directly. MAX232 generates ±10 V from a single 5 V supply using charge pumps.
8237A DMA Controller
A dedicated hardware engine transfers data directly between memory and an I/O device, bypassing the CPU after initial setup.
Peripheral → DMAC: DREQ. DMAC → CPU: HOLD.
CPU → DMAC: HLDA (bus granted). DMAC → peripheral: DACK.
Transfer Modes
- Single transfer – one byte per DREQ; bus returned after each byte.
- Block transfer – DMAC keeps bus until entire block done. Fastest; CPU stalled.
- Demand transfer – continues while DREQ active; pauses otherwise.
- Cascade – one 8237A acts as slave to another (PC/AT: two 8237As, 7 usable channels).
- Four independent DMA channels
- Each channel has 16-bit base address and 16-bit word count
- Supports read, write, verify, and memory-to-memory transfers
- Programmable priority (fixed or rotating); auto-initialise on terminal count
The 8237A natively addresses 64 KB. For 1 MB systems, an external page register supplies the upper bits, with the DMAC walking inside a 64 KB page.
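The split between the external page register and the 8237A's own 16-bit address can be sketched in Python. The helper names are hypothetical, and the sketch assumes a 20-bit address space with a 4-bit page register.

```python
def split_for_dma(physical: int):
    """Split a 20-bit physical address into (page register value, 16-bit DMAC address)."""
    return (physical >> 16) & 0x0F, physical & 0xFFFF


def crosses_page(physical: int, length: int) -> bool:
    """True if a transfer of `length` bytes would run past the end of its 64 KB page."""
    return (physical & 0xFFFF) + length > 0x10000


page, offset = split_for_dma(0x2F000)
print(f"page={page:X}H offset={offset:04X}H")     # page=2H offset=F000H
print(crosses_page(0x2F000, 0x1000))              # False: ends exactly at the page boundary
print(crosses_page(0x2F000, 0x2000))              # True:  the DMAC address would wrap to 0000H
```

Since the page register does not increment when the 16-bit address wraps, a buffer that crosses a 64 KB boundary has to be split into two transfers or relocated.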
8279 Keyboard/Display Controller
A programmable controller handling a matrix keyboard (up to 8×8) and a multiplexed 7-segment display (up to 16 digits).
Keyboard scanning and display refreshing are repetitive housekeeping tasks. Off-loading them to the 8279 frees the CPU and removes flicker/missed-keystroke issues.
Three Sub-blocks
- Keyboard section – scan, debounce, FIFO of detected keys
- Display section – 16×8 display RAM + scan logic
- Scan section – shared row/column scan lines
Keyboard modes:
- Scanned keyboard: 2-key lockout or N-key rollover
- Scanned sensor matrix
- Strobed input (external strobe latches data)
Display modes:
- 8 or 16 character display
- Left-entry (typewriter style)
- Right-entry (calculator style)
When a key is detected and debounced, 8279 asserts IRQ; the CPU reads the FIFO to get the (row, column) code.
ADC and DAC Interfacing
Real-world signals are analog. ADC converts analog to digital for the CPU; DAC converts digital back to analog for actuators/outputs.
Key ADC parameters:
- Resolution (bits)
- Conversion time
- Reference voltage VREF
- Input range (unipolar/bipolar)
- Linearity, INL/DNL errors
ADC architectures:
- Flash – fastest, costly
- Successive Approximation (SAR) – balanced
- Dual-slope integrating – accurate, slow
- Sigma-Delta – audio/precision
ADC 0808/0809 – Classic 8-bit SAR
Read Channel 3: Worked Example
; control port=80H, data port=81H, status port=82H
MOV AL, 03H ; channel 3
OUT 80H, AL
OR AL, 18H ; ALE=1, SOC=1
OUT 80H, AL
AND AL, 0E7H
OUT 80H, AL
POLL: IN AL, 82H ; label renamed: WAIT is a reserved 8086 mnemonic
TEST AL, 01H
JZ POLL ; wait for EOC
IN AL, 81H ; 8-bit digital sample
DAC 0808 – 8-bit Current-Output DAC
Output equation: \(V_{out} = V_{REF} \cdot D / 2^n\) where D = input code (0–255).
Many DACs use an R-2R ladder network: it needs only two resistor values, which are easy to match and fabricate, making it a workhorse of integrated DAC design.
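Triangular-wave generation: increment a counter 0→255 and OUT each value to the DAC port; then decrement 255→0; repeat. The op-amp converts the DAC’s output current into a voltage, so the output is a staircase that approximates a triangle.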
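The output equation and the up/down counting sequence can be tried out in Python. The 5 V reference is an assumed value chosen for illustration, and the function names are hypothetical.

```python
def dac_vout(code: int, vref: float = 5.0, bits: int = 8) -> float:
    """Ideal DAC output: Vout = Vref * D / 2^n."""
    if not 0 <= code < (1 << bits):
        raise ValueError("code out of range")
    return vref * code / (1 << bits)


def triangle_codes():
    """One period of the up/down count: 0..255, then 254..1."""
    return list(range(0, 256)) + list(range(254, 0, -1))


print(dac_vout(128))              # 2.5
print(dac_vout(255))              # 4.98046875 (full scale is one LSB below Vref)
codes = triangle_codes()
print(len(codes), max(codes))     # 510 255
```

One period is 510 steps, so the triangle frequency is the step rate divided by 510.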
Advanced Intel Processors
Each generation widened registers, added new modes, deepened pipelines, and integrated more on-chip (cache, FPU, MMU) — all while preserving backward compatibility with 8086 code.
80286 (1982) – Protected Mode is Born
- 16-bit data bus, 24-bit address bus ⇒ 16 MB physical memory
- Real mode: 8086-compatible (1 MB). Protected mode: descriptor tables, 4 privilege rings.
- Hardware support for multitasking and memory protection; no paging yet.
80386 (1985) – The 32-bit Leap
- 32-bit registers (EAX, EBX, …, EIP, EFLAGS); 32-bit address bus ⇒ 4 GB
- Adds Virtual-8086 mode; built-in paging with 4 KB pages and two-level page tables
A 32-bit linear address is split [Dir | Table | Offset] = [10 | 10 | 12] bits, walking a two-level page-table tree — the foundation of every modern OS.
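The 10/10/12 split is a pair of shifts and masks, sketched here in Python. The function name and the sample address are illustrative.

```python
def split_linear(address: int):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    directory = (address >> 22) & 0x3FF     # top 10 bits
    table = (address >> 12) & 0x3FF         # middle 10 bits
    offset = address & 0xFFF                # low 12 bits (4 KB page)
    return directory, table, offset


print(split_linear(0x00403004))     # (1, 3, 4)
print(split_linear(0xFFFFFFFF))     # (1023, 1023, 4095)
```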
80486 (1989) – Integration and Pipelining
- On-chip 8 KB L1 cache (write-through); integrated FPU; five-stage pipeline
- DX2/DX4 variants: internal clock doubled/tripled vs. external bus
Pentium (1993) and Beyond
- Superscalar: two pipelines (U and V) — up to 2 IPC
- Split L1: 8 KB code + 8 KB data cache; 64-bit data bus; branch prediction (BTB)
- AMD’s x86-64 (2003) extended to 64-bit; Intel adopted as Intel 64
More parallelism per clock + more cores per chip + more memory hierarchy on-die.
CISC vs. RISC — The Quiet Convergence
Complex Instruction Set Computer: many instructions, variable length, complex addressing modes. Intel x86 family.
Reduced Instruction Set Computer: few simple instructions, fixed length, load/store architecture. ARM, RISC-V, MIPS.
In modern x86 processors the decoder splits each CISC instruction into smaller micro-ops, which are executed by a RISC-style out-of-order engine. Today’s x86 is effectively RISC on the inside, CISC on the outside.
Bonus Module: 8051 Microcontroller
A self-contained chip with CPU + RAM + ROM + I/O + Timers on one die. Designed for embedded control.
8051 Architecture at a Glance
- 8-bit CPU; Harvard architecture
- 4 KB internal ROM; 128 bytes internal RAM + 128 bytes SFR area
- Four 8-bit I/O ports: P0, P1, P2, P3
- Two 16-bit Timers/Counters: T0, T1
- One full-duplex UART; 5 interrupt sources (2 external, 2 timer, 1 serial)
- 64 KB external code memory + 64 KB external data memory
8051 Memory Organisation
Internal RAM (00H–7FH):
- 00H–1FH: 4 banks of R0–R7
- 20H–2FH: bit-addressable area (128 bits)
- 30H–7FH: general-purpose RAM and stack
SFR area (80H–FFH):
- Accumulator (A), B register
- Port latches P0–P3; Timer regs; SCON, SBUF; IE, IP
- PSW, SP, DPTR (DPH:DPL), PCON
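The lower internal RAM map reduces to two small address calculations. The sketch below is illustrative only; the helper names `register_address` and `bit_location` are hypothetical.

```python
def register_address(bank: int, n: int) -> int:
    """Internal RAM address of register Rn in register bank 0-3."""
    if not (0 <= bank <= 3 and 0 <= n <= 7):
        raise ValueError("bank must be 0-3 and n must be 0-7")
    return bank * 8 + n               # each bank occupies 8 consecutive bytes


def bit_location(bit_addr: int) -> tuple:
    """Map a bit address 00H-7FH to (byte address, bit index) within 20H-2FH."""
    if not 0 <= bit_addr <= 0x7F:
        raise ValueError("bit address must be 00H-7FH")
    return 0x20 + bit_addr // 8, bit_addr % 8


print(f"R0 of bank 1 -> {register_address(1, 0):02X}H")
byte, bit = bit_location(0x7F)
print(f"bit 7FH -> byte {byte:02X}H, bit {bit}")
```

Bank 1 starts at 08H, which is why a stack left at its reset position (SP = 07H) grows into bank 1.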
Selected Special Function Registers
| SFR | Address | Purpose |
|---|---|---|
| ACC (A) | E0H | Accumulator; primary working register |
| B | F0H | Auxiliary; used in MUL/DIV |
| PSW | D0H | Program Status Word (CY, AC, F0, RS1, RS0, OV, –, P) |
| SP | 81H | Stack pointer (defaults to 07H after reset) |
| DPTR | 82H/83H | Data pointer for external memory |
| P0–P3 | 80H/90H/A0H/B0H | Port latches |
| TMOD | 89H | Timer mode control |
| TCON | 88H | Timer/interrupt control (bit-addressable) |
| SCON | 98H | Serial port control (bit-addressable) |
| SBUF | 99H | Serial data buffer |
| IE | A8H | Interrupt enable |
| IP | B8H | Interrupt priority |
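As an illustration of how the PSW bit layout in the table is read, the sketch below unpacks a PSW byte and derives the active register bank from RS1:RS0. The name `decode_psw` is a hypothetical helper, not part of any toolchain.

```python
# PSW bit names from bit 0 up to bit 7; "-" marks the unused bit 1.
PSW_BITS = ("P", "-", "OV", "RS0", "RS1", "F0", "AC", "CY")


def decode_psw(psw: int) -> dict:
    """Return each PSW flag plus the register bank selected by RS1:RS0."""
    flags = {name: (psw >> bit) & 1
             for bit, name in enumerate(PSW_BITS) if name != "-"}
    flags["bank"] = (flags["RS1"] << 1) | flags["RS0"]
    return flags


print(decode_psw(0x18))   # RS1=1, RS0=1 selects bank 3
```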
8051 “Hello, Blink!” in Assembly
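; Timing comments assume a 12 MHz crystal (1 machine cycle = 1 us)
        ORG   0000H        ; reset vector
        SJMP  MAIN         ; jump past the interrupt vector area
        ORG   0030H
MAIN:   CLR   P1.0         ; start with P1.0 low
LOOP:   CPL   P1.0         ; toggle LED
        ACALL DELAY
        SJMP  LOOP
DELAY:  MOV   R7, #200     ; outer loop count
D2:     MOV   R6, #250     ; inner loop count
D1:     DJNZ  R6, D1       ; 250 x 2 cycles = ~500 us inner
        DJNZ  R7, D2       ; 200 x ~500 us = ~100 ms total
        RET
        END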
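The delay figures can be checked by counting machine cycles. The sketch below assumes the classic 8051 timing (12 oscillator clocks per machine cycle; MOV Rn,#data = 1 cycle; DJNZ = 2; RET = 2) and a 12 MHz crystal, neither of which is stated in the listing. The function names are hypothetical.

```python
def delay_machine_cycles(outer: int = 200, inner: int = 250) -> int:
    """Machine cycles spent inside DELAY, from MOV R7 through RET."""
    inner_loop = inner * 2                 # DJNZ R6, D1 executed `inner` times
    per_outer = 1 + inner_loop + 2         # MOV R6 + inner loop + DJNZ R7
    return 1 + outer * per_outer + 2       # MOV R7 + outer loop + RET


def delay_seconds(f_osc_hz: float = 12_000_000) -> float:
    """Delay duration, assuming 12 oscillator clocks per machine cycle."""
    return delay_machine_cycles() * 12 / f_osc_hz


print(delay_machine_cycles(), "machine cycles")
print(f"{delay_seconds() * 1e3:.1f} ms at 12 MHz")
```

With a different crystal (11.0592 MHz is common for UART work) the same routine takes proportionally longer, so the loop counts would need retuning.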
Forty years on, the 8051 core lives inside SIM cards, USB controllers, RF transceivers, and countless industrial micros. Its tiny size, deterministic timing, and well-known toolchain keep it relevant.
Practice Problems
Given CS=1234H, IP=5678H, find the physical address of the instruction being fetched.
Solution: \(\text{PA}=\text{CS}\times\texttt{10H}+\text{IP}=\texttt{12340H}+\texttt{5678H}=\mathbf{\texttt{179B8H}}\) (96,696 bytes, ≈94.4 KB into the 1 MB space)
A 20-bit physical address fits in exactly 5 hex digits (max FFFFFH = 1 MB−1).
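The segment:offset calculation is a one-liner. This minimal sketch uses a hypothetical helper `physical_address` and masks the result to 20 bits, since the 8086 has only 20 address lines.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment shifted left 4 bits plus offset."""
    return ((segment << 4) + offset) & 0xFFFFF   # wrap to 20 bits


pa = physical_address(0x1234, 0x5678)
print(f"{pa:05X}H")
```

The mask matters only near the top of memory: FFFFH:FFFFH would overflow 20 bits and wraps around to 0FFEFH.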
DS=2000H, BX=0100H, SI=0050H. Find EA and PA for MOV AX,[BX+SI+20H].
Mode: based-indexed with displacement.
EA = 0100H+0050H+0020H = 0170H
PA = 20000H+0170H = 20170H. Word: bytes 20170H (low) and 20171H (high).
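The same steps in code, as a hedged sketch: `effective_address` is a hypothetical helper, and the EA is wrapped to 16 bits because offsets never exceed FFFFH within a segment.

```python
def effective_address(base: int = 0, index: int = 0, disp: int = 0) -> int:
    """EA for based-indexed addressing with displacement, wrapped to 16 bits."""
    return (base + index + disp) & 0xFFFF


DS, BX, SI = 0x2000, 0x0100, 0x0050
ea = effective_address(base=BX, index=SI, disp=0x20)
pa = ((DS << 4) + ea) & 0xFFFFF          # segment * 10H + EA
print(f"EA={ea:04X}H PA={pa:05X}H word bytes at {pa:05X}H and {pa + 1:05X}H")
```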
After MOV AL,7FH then ADD AL,01H, find CF, ZF, SF, OF, AF, PF.
Result: 7FH+01H = 80H (1000 0000).
- CF=0 — no carry out of bit 7
- ZF=0 — result non-zero
- SF=1 — MSB of result is 1
- OF=1 — signed overflow (positive + positive → negative)
- AF=1 — carry from bit 3 to bit 4
- PF=0 — odd number of 1-bits (just bit 7)
CF signals unsigned overflow; OF signals signed overflow. They are set independently by the same arithmetic.
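The flag rules above can be modelled for any 8-bit addition. This is a sketch of the arithmetic rules only, not of the 8086 hardware; `add8_flags` is a hypothetical name.

```python
def add8_flags(a: int, b: int) -> tuple:
    """Return (result, flags) for an 8-bit ADD, following the 8086 flag rules."""
    total = (a & 0xFF) + (b & 0xFF)
    result = total & 0xFF
    flags = {
        "CF": int(total > 0xFF),                               # carry out of bit 7
        "ZF": int(result == 0),                                # result is zero
        "SF": result >> 7,                                     # copy of the MSB
        # Overflow: both operands differ in sign from the result.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),             # carry out of bit 3
        "PF": int(bin(result).count("1") % 2 == 0),            # even parity
    }
    return result, flags


print(add8_flags(0x7F, 0x01))
```

Trying `add8_flags(0xFF, 0x01)` shows the opposite case: CF=1 with OF=0, an unsigned overflow that is a perfectly valid signed result (−1 + 1 = 0).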
The 8086 reads a word from physical address 2050H. Which bank(s) are accessed?
2050H is even ⇒ aligned word access. \(\overline{\text{BHE}}=0\), A0=0 ⇒ both banks selected in one bus cycle. Even bank supplies D0–D7; odd bank D8–D15.
If the address were 2051H (odd), two bus cycles would be required (misalignment penalty).
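A small model of the bank-select logic may help. It is a sketch under the standard 8086 convention (A0 = 0 selects the even bank, BHE active low selects the odd bank); `bus_cycles` is a hypothetical helper.

```python
def bus_cycles(address: int, word: bool = True) -> list:
    """Return one (A0, BHE_n, bank) tuple per bus cycle; BHE_n is active low."""
    a0 = address & 1
    if not word:                          # byte access: exactly one bank
        return [(1, 0, "odd")] if a0 else [(0, 1, "even")]
    if a0 == 0:                           # aligned word: both banks, one cycle
        return [(0, 0, "both")]
    # Misaligned word: low byte from the odd bank, then high byte from
    # the even bank at address + 1.
    return [(1, 0, "odd"), (0, 1, "even")]


for addr in (0x2050, 0x2051):
    print(f"{addr:04X}H:", bus_cycles(addr))
```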
SS=3000H, SP=0100H, AX=1234H, BX=ABCDH. After PUSH AX; PUSH BX; POP AX — state?
| Step | SP | AX | BX | 300FEH/FF | 300FCH/FD |
|---|---|---|---|---|---|
| Start | 0100 | 1234 | ABCD | – | – |
| PUSH AX | 00FE | 1234 | ABCD | 34/12 | – |
| PUSH BX | 00FC | 1234 | ABCD | 34/12 | CD/AB |
| POP AX | 00FE | ABCD | ABCD | 34/12 | CD/AB |
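The table can be reproduced with a tiny stack model. This is a simplified sketch: the class name `Stack8086` is hypothetical, memory is a dictionary keyed by physical address, and offset wrap between the two bytes of a word is ignored.

```python
class Stack8086:
    """Minimal model of the 8086 stack: SP drops by 2 on PUSH, rises by 2 on POP."""

    def __init__(self, ss: int, sp: int):
        self.ss, self.sp, self.mem = ss, sp, {}

    def _top(self) -> int:
        return ((self.ss << 4) + self.sp) & 0xFFFFF

    def push(self, value: int) -> None:
        self.sp = (self.sp - 2) & 0xFFFF
        self.mem[self._top()] = value & 0xFF             # low byte, lower address
        self.mem[self._top() + 1] = (value >> 8) & 0xFF  # high byte

    def pop(self) -> int:
        value = self.mem[self._top()] | (self.mem[self._top() + 1] << 8)
        self.sp = (self.sp + 2) & 0xFFFF                 # memory is left unchanged
        return value


stack = Stack8086(ss=0x3000, sp=0x0100)
ax, bx = 0x1234, 0xABCD
stack.push(ax)
stack.push(bx)
ax = stack.pop()
print(f"SP={stack.sp:04X} AX={ax:04X} BX={bx:04X}")
```

POP does not erase memory, which is why 300FCH/FDH still hold CD/AB in the last table row.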
An 8086 runs at 5 MHz. An instruction takes 17 T-states. Execution time?
\[T_{CLK}=\frac{1}{5\,\text{MHz}}=200\,\text{ns}\] \[T_{exec}=17\times 200\,\text{ns}=3.4\,\mu\text{s}\]
A loop body of 50 T-states executed 1000 times takes 50×1000×200 ns = 10 ms.
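The same arithmetic as a reusable sketch; `execution_time_s` is a hypothetical helper that simply divides T-states by the clock frequency.

```python
def execution_time_s(t_states: int, f_clk_hz: float) -> float:
    """Execution time in seconds: T-states multiplied by the clock period."""
    return t_states / f_clk_hz


F_CLK = 5_000_000                                   # 5 MHz 8086 clock
print(f"{execution_time_s(17, F_CLK) * 1e6:.1f} us per instruction")
print(f"{execution_time_s(50 * 1000, F_CLK) * 1e3:.1f} ms for the loop")
```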
Five Ideas That Will Stay With You
- A microprocessor is just a very fast sequencer that fetches, decodes, and executes instructions on data living in memory.
- Segmentation, paging, privilege levels — successive layers of abstraction over the same physical memory.
- Interrupts and DMA are the two great mechanisms by which I/O escapes the CPU’s serial bottleneck.
- Almost every peripheral chip — 8255, 8254, 8259, 8251, 8237 — shares the same recipe: control register + data registers + status bits.
- The boundary between hardware and software is a useful fiction. Reading a datasheet is half the engineering.
Recommended Textbooks
- Douglas V. Hall, Microprocessors and Interfacing — Programming and Hardware, 2nd ed., TMH. The classic 8086 reference.
- A.K. Ray & K.M. Bhurchandi, Advanced Microprocessors and Peripherals, 3rd ed., McGraw Hill.
- Barry B. Brey, The Intel Microprocessors: 8086 to Pentium 4, 8th ed., Pearson.
- Y. Liu & G.A. Gibson, Microcomputer Systems: The 8086/8088 Family, 2nd ed., Prentice Hall.
- M.A. Mazidi et al., The 8051 Microcontroller and Embedded Systems, 2nd ed., Pearson.
- Ramesh S. Gaonkar, Microprocessor Architecture, Programming, and Applications with the 8085, 6th ed., Penram.
Where Next?
Embedded systems: ARM Cortex-M (STM32, NXP), AVR (Arduino), RISC-V (SiFive, ESP32-C).
Operating systems: processes, threads, scheduling, virtual memory and TLB.
Computer architecture: pipelining, hazards, branch prediction, cache coherence, out-of-order execution.
Hardware design: Verilog/VHDL, FPGAs, soft-core CPUs.
The fastest way to internalise this material is to build something: a stepper-motor driver, a UART terminal, a tiny RTOS scheduler. Theory sticks when soldered to practice.