IA-32 and AMD64/Intel 64
The colloquially named "x86" family of instruction set architectures includes Intel's 32-bit IA-32 architecture and AMD's 64-bit AMD64 architecture, subsequently adopted by Intel as Intel 64.
The name "x86" is short for "80x86", and derives from the part numbers of early Intel chips, which all ended in "86", e.g., 8086, 80186, 80286, 80386, and 80486. The trend might have continued with the 80586, had courts not ruled that numbers cannot be trademarked. Thus was born the Pentium.
IA-32 processors operate in one of four modes at any time:
- Real Mode, with a 16-bit segmented memory model
- Protected Mode, which introduces memory protection and privilege levels
- Virtual 86 Mode, which allows processors to run legacy real-mode software
- System Management Mode, which is the bane of real-time software developers
AMD64 processors provide an additional operating mode, with two sub-modes:
- Long Mode, which Intel calls IA-32e ("e" for "extensions")
  - 64-bit Mode, with a flat 64-bit memory model
  - Compatibility Mode, which allows 64-bit processors to run legacy 32-bit software
Over the years, Intel has learned the hard way that backwards compatibility sells chips. For that reason, all IA-32 and AMD64 processors boot in 16-bit real mode. Out of the box, the most advanced Intel Xeon Platinum is no more capable than an 8086 from the 1970s. System software is responsible for switching to long mode to unlock the advanced features of the chip.
How could the 8086 microprocessor, with its 16-bit registers and data bus, address 20 bits (1 MiB) of memory? By adding two 16-bit registers, a segment register and an offset register, with the segment register shifted four bits to the left (equivalent to multiplying by 16). For example, the logical address 1234:1234h corresponds to the physical address 13574h:
Segment register
  ┌───────┬───────┬───────┬───────┐
  │   1   │   2   │   3   │   4   │ << 4
  └───────┴───────┴───────┴───────┘
   15                            0
Offset register
          ┌───────┬───────┬───────┬───────┐
+         │   1   │   2   │   3   │   4   │
          └───────┴───────┴───────┴───────┘
           15                            0
  =========================================
Physical Address
  ┌───────┬───────┬───────┬───────┬───────┐
  │   1   │   3   │   5   │   7   │   4   │
  └───────┴───────┴───────┴───────┴───────┘
   19                                    0
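The shift-and-add above can be sketched in a few lines of Python. The function name `real_mode_physical` is made up for illustration, and the final mask assumes an 8086-style 20-bit address bus.

```python
def real_mode_physical(segment: int, offset: int) -> int:
    """Translate a real-mode segment:offset pair to a 20-bit physical address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shift the segment left four bits (multiply by 16), add the offset,
    # and keep the 20 bits that the 8086 drives onto its address bus.
    return ((segment << 4) + offset) & 0xFFFFF


print(f"{real_mode_physical(0x1234, 0x1234):05X}")  # 13574
```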
It's a clever trick, but has two quirks:
While any given segment and offset uniquely define exactly one physical address, the converse is not true; any given physical address can be constructed from several combinations of segments and offsets.
For example, the following segment:offset pairs produce equivalent physical addresses:
  F000      FF00      FFF0      FFFF 
+  FFFF   +  0FFF   +  00FF   +  000F
  =====     =====     =====     =====
  FFFFF ≡   FFFFF ≡   FFFFF ≡   FFFFF
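To get a feel for how much aliasing there is, the following sketch enumerates every segment:offset pair that resolves to a given physical address. The helper name `aliases` is hypothetical, and the brute-force loop is chosen for clarity, not speed.

```python
def aliases(physical: int) -> list:
    """Return every (segment, offset) pair that maps to `physical` without wrapping."""
    pairs = []
    for segment in range(0x10000):
        # The offset must make up the difference between the shifted
        # segment and the target address, and must fit in 16 bits.
        offset = physical - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            pairs.append((segment, offset))
    return pairs


pairs = aliases(0xFFFFF)
print(len(pairs))  # 4096
```

The four pairs shown above are among the 4096 results for FFFFF; addresses near the bottom of memory have fewer aliases because fewer segments lie at or below them.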
It is possible to overflow the 20-bit physical address, in which case the address wraps:
  FFFF
+  FFFF
  =====
  0FFEF  (NOT 10FFEF)
Some software written for the 8086/8088 depended on this wrap-around. When the 80286 shipped in 1982 with a 24-bit address bus, system designers opted to disable its 21st address line ("A20", counting from A0) for backwards compatibility with the previous processors. System software had to enable the A20 line in order to address more than 1 MiB of memory.
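A minimal model of the A20 gate, assuming it simply forces address bit 20 to zero when disabled. The function name and the `a20_enabled` flag are illustrative only.

```python
def physical_address(segment: int, offset: int, a20_enabled: bool) -> int:
    """Real-mode address translation with an optional A20 gate."""
    address = (segment << 4) + offset  # at most 0x10FFEF, which needs 21 bits
    if not a20_enabled:
        address &= ~(1 << 20)  # A20 held low: the address wraps as on an 8086
    return address


print(hex(physical_address(0xFFFF, 0xFFFF, a20_enabled=False)))  # 0xffef
print(hex(physical_address(0xFFFF, 0xFFFF, a20_enabled=True)))   # 0x10ffef
```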
Real-address mode defines four segment registers, each with a specific purpose:
- cs: code segment
- ss: stack segment
- ds: data segment
- es: extra segment
In protected mode, all processor memory accesses go through a layer of indirection. The segment registers no longer contain 16-bit segment addresses, but are repurposed to hold segment selectors, which point to segment descriptors in a segment descriptor table:
Segment Selector in Segmentation Register
┌────────────┬────┬─────┐       Global/Local
│   index    │ TI │ RPL │       Descriptor Table
└────────────┴────┴─────┘     ┌────────────────────┐
 15    │    3  2   1   0      │        ...         │
       │                      ├────────────────────┤
       └─────────────────────>│ Segment Descriptor │
                              ├────────────────────┤
                              │        ...         │
                              └────────────────────┘
                              ▲
                              │
                              └─────── gdtr/ldtr
The programmer may select between two such tables: the global descriptor table (GDT) and the local descriptor table (LDT). The segment selector's Table Indicator (TI) bit determines whether the selector points to the GDT (TI = 0) or LDT (TI = 1). Interrupts involve another type of table, the interrupt descriptor table (IDT).
The segment selector's Requestor Privilege Level (RPL) records the CPU's Current Privilege Level (CPL) when the selector is loaded into a segment register.
The "null selector" (index = 0, TI = 0) is used to represent unused segments. Memory accesses to a null segment raise #GP. For this reason, the first entry in the GDT cannot be used. Additionally, cs and ss can't be null; loading either with the null selector also raises #GP.
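The selector layout described above (index in bits 15 to 3, TI in bit 2, RPL in bits 1 to 0) can be unpacked with a few shifts and masks. The `Selector` type and helper names below are invented for this sketch.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int  # entry number within the descriptor table
    ti: int     # 0 = GDT, 1 = LDT
    rpl: int    # requestor privilege level, 0 to 3


def decode_selector(selector: int) -> Selector:
    """Split a 16-bit segment selector into its three fields."""
    return Selector(index=selector >> 3, ti=(selector >> 2) & 1, rpl=selector & 3)


def is_null(selector: int) -> bool:
    """A selector is null when index and TI are both zero; RPL is ignored."""
    return (selector & 0xFFFC) == 0


print(decode_selector(0x23))  # Selector(index=4, ti=0, rpl=3)
print(is_null(0x0003))        # True
```

Because each descriptor is 8 bytes, clearing the low three bits of a selector (`selector & ~7`) also gives the descriptor's byte offset within its table.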
In addition to the four real mode segment registers, protected mode defines two additional registers:
- fs: extra segment
- gs: extra segment
Segment descriptors are 8-byte structures describing the base, size, type, and other attributes of a memory segment:
 63          56 55      51           48     43  40 39          32
┌──────────────┬───┬───┬───────────────┬───┬──────┬──────────────┐
│ base (24-31) │ G │ … │ limit (16-19) │ … │ type │ base (16-23) │
├──────────────┴───┴───┴───────────┬───┴───┴──────┴──────────────┤
│           base (0-15)            │        limit (0-15)         │
└──────────────────────────────────┴─────────────────────────────┘
 31                                                             0
The 32-bit base can address 4 GiB. The 20-bit limit (size) supports up to 1 MiB segments with byte-sized granularity (G = 0) or up to 4 GiB segments with 4 KiB page-sized granularity (G = 1).
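As a rough illustration, the scattered base and limit fields can be reassembled from a descriptor held as a 64-bit integer. The function name and the returned dictionary keys are assumptions made for this sketch, and only the fields named in the diagram are decoded.

```python
def decode_descriptor(descriptor: int) -> dict:
    """Reassemble base, limit, type, and size from an 8-byte segment descriptor."""
    limit = (descriptor & 0xFFFF) | (((descriptor >> 48) & 0xF) << 16)
    base = ((descriptor >> 16) & 0xFFFFFF) | (((descriptor >> 56) & 0xFF) << 24)
    granularity = (descriptor >> 55) & 1
    # With G = 1 the limit counts 4 KiB pages instead of bytes.
    size = (limit + 1) << 12 if granularity else limit + 1
    return {
        "base": base,
        "limit": limit,
        "type": (descriptor >> 40) & 0xF,
        "g": granularity,
        "size": size,
    }


# A commonly seen flat 4 GiB code-segment descriptor value.
flat = decode_descriptor(0x00CF9A000000FFFF)
print(hex(flat["base"]), hex(flat["limit"]), flat["size"] == 4 * 2**30)
```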
Protected mode supports a form of hardware-assisted multitasking by saving task state in a task state segment (TSS). However, most operating systems don't use hardware multitasking, opting instead to manage task switches in software.
Virtual 86 mode
Allows protected mode operating systems to run legacy 16-bit real mode software.
System Management mode
Allows handling of high-priority non-maskable hardware interrupts for things like thermal control and battery charging. Invisible to operating systems, except as missing cycles.
AMD64 calls it long mode; Intel 64 calls it IA-32e.
The full capabilities of 64-bit processors are only available when running in 64-bit mode, which includes:
- 64-bit virtual addresses in a flat memory model
- 64-bit general purpose registers, and 8 more of them (r8 through r15)
- RIP-relative addressing mode
As a sub-mode of long mode, 64-bit mode can be enabled on a per-code-segment basis. That is, in fact, one of the few remaining uses of segmentation in 64-bit mode. Another legacy IA-32 feature not supported in long mode is hardware task switching via the task state segment (TSS).
By default, addresses are 64 bits long and operands are 32 bits long (though both can be overridden with instruction prefixes), yielding an LP64 data model: longs and pointers are 64 bits, while ints remain 32 bits. That results in smaller binaries and less wasted RAM than if ints were 64 bits.
Compatibility being key to the success of AMD64, this sub-mode of long mode, which is enabled on a per-code-segment basis, allows legacy 16- and 32-bit protected mode software to run on 64-bit systems, concurrently with 64-bit applications.
When paging is enabled, the processor views main memory as an array of 4 KiB page frames and carves the virtual address space into 4 KiB pages that can be loaded into any page frame, or none at all (paged out to a swap area, for example).
The 32-bit linear address produced by the segmentation unit is mapped to a 32-bit physical address by the memory management unit (MMU) using a two-level page table tree:
              32-bit Linear Virtual Address
┌─────────────────┬────────────────┬─────────────────────┐
│    Directory    │     Table      │       Offset        │
└─────────────────┴────────────────┴─────────────────────┘
 31      │      22 21      │     12 11        │         0
         │                 │                  │
         │                 │                  │
         │                 │    Page          │
         │                 │    Table         │
         │                 │  ┌───────┐       │
         │                 │  │  ...  │       │
         │    Page         │  │  ...  │       │
         │  Directory      │  ├───────┤       │
         │  ┌───────┐      └─>│  pte  │─────┐ │
         │  │  ...  │         ├───────┤     ▼ ▼
         │  │  ...  │         │  ...  │  0xAAAAABBB
         │  ├───────┤         │  ...  │  32-bit Physical
         └─>│  pte  │────────>└───────┘  Address
            ├───────┤           4 KiB
            │  ...  │
            │  ...  │
   cr3 ────>└───────┘
              4 KiB
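The walk in the diagram can be simulated with nested dictionaries standing in for the page directory and page tables. This is a simplified sketch: real entries also carry present and permission bits, which are omitted here, and the names `split_linear` and `translate` are made up for illustration.

```python
def split_linear(linear: int) -> tuple:
    """Split a 32-bit linear address into (directory, table, offset) fields."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF


def translate(page_directory: dict, linear: int) -> int:
    """Walk a two-level table: directory index, then table index, then offset."""
    directory, table, offset = split_linear(linear)
    try:
        frame = page_directory[directory][table]  # 4 KiB-aligned frame address
    except KeyError:
        raise LookupError(f"page fault at {linear:#010x}") from None
    return frame | offset


# Directory entry 1 points at a page table whose entry 1 maps frame 0xAAAAA000.
page_directory = {1: {1: 0xAAAAA000}}
print(hex(translate(page_directory, 0x00401BBB)))  # 0xaaaaabbb
```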
32-bit with physical address extensions (PAE)
The 32-bit linear address produced by the segmentation unit is mapped to a 36-bit physical address by the memory management unit (MMU) using a three-level page table tree.
The 64-bit linear address is mapped to a physical address of up to 52 bits by the MMU using a four-level page table tree; only the low 48 bits of the linear address take part in the translation.
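For comparison with the two-level case, here is a sketch of how a linear address is carved up for a four-level walk. It assumes 4 KiB pages and the 9/9/9/9/12 bit split with 48 significant address bits, details the text above does not spell out; the level names follow Intel's terminology (PML4, PDPT, PD, PT), and the function names are hypothetical.

```python
def split_linear_64(linear: int) -> dict:
    """Split a linear address into four 9-bit table indices and a 12-bit offset."""
    return {
        "pml4": (linear >> 39) & 0x1FF,
        "pdpt": (linear >> 30) & 0x1FF,
        "pd": (linear >> 21) & 0x1FF,
        "pt": (linear >> 12) & 0x1FF,
        "offset": linear & 0xFFF,
    }


def is_canonical(linear: int) -> bool:
    """Bits 63 to 47 must all be equal for a 48-bit address to be canonical."""
    top = (linear >> 47) & 0x1FFFF
    return top == 0 or top == 0x1FFFF


print(split_linear_64(0x00007FFFFFFFF000))
print(is_canonical(0x0000800000000000))  # False
```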
IA-32 addresses objects in memory by constructing effective addresses, relative to some segment, using a combination of base, index, scale, and displacement:
- a base register (e.g., bx), used by high-level languages to store the base address of data structures
- an index register, representing a (possibly negative) array index
- a scale factor (1, 2, 4, or 8), representing the size of array elements
- a displacement, encoded in the instruction, representing the offset of a field within an aggregate type (like a C struct)
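The four components combine as base + index × scale + displacement. A small sketch, with a hypothetical function name and the result truncated to the address size:

```python
def effective_address(base: int = 0, index: int = 0, scale: int = 1,
                      displacement: int = 0, bits: int = 32) -> int:
    """Compute base + index * scale + displacement, wrapped to `bits` bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + displacement) & ((1 << bits) - 1)


# Field at byte offset 4 of element 3 in an array of 8-byte structs at 0x1000,
# comparable to an operand such as [ebx + esi*8 + 4].
print(hex(effective_address(base=0x1000, index=3, scale=8, displacement=4)))  # 0x101c
# A negative index reaches backwards from the base.
print(hex(effective_address(base=0x1000, index=-1, scale=4)))  # 0xffc
```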
Logical addresses are composed of a segment selector and an effective address, or offset within the segment. Logical addresses are resolved to "linear", or virtual, addresses by the segmentation unit, and finally to physical addresses that can be driven onto the address bus by the paging unit, as depicted below:
           Base + [Scale x Index] + Displacement
                   ┌───────────────────┐
                   │ Effective Address │
                   └───────────────────┘
                             │
                             │
                             │
  Logical Address            ▼
┌──────────────────┬───────────────────┐
│ Segment Selector │      Offset       │
└──────────────────┴───────────────────┘
                   │
                   │ Segmentation unit
                   │
                   ▼
┌──────────────────────────────────────┐
│        Linear virtual address        │
└──────────────────────────────────────┘
                   │
                   │ Paging unit
                   │
                   ▼
┌──────────────────────────────────────┐
│           Physical Address           │
└──────────────────────────────────────┘
                   │
                   │ Bus unit
                   │
                   ▼
              Address Bus
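The segmentation stage of this pipeline can be approximated as a limit check followed by an addition. The sketch below assumes a 32-bit, byte-granular, expand-up segment whose base and limit have already been read from its descriptor; the exception class and function name are invented for illustration.

```python
class GeneralProtectionFault(Exception):
    """Stand-in for the #GP exception raised on a segment limit violation."""


def logical_to_linear(base: int, limit: int, offset: int) -> int:
    """Resolve an offset within a segment to a 32-bit linear address."""
    if offset > limit:
        raise GeneralProtectionFault(f"offset {offset:#x} exceeds limit {limit:#x}")
    return (base + offset) & 0xFFFFFFFF


print(hex(logical_to_linear(base=0x00400000, limit=0xFFFF, offset=0x1234)))  # 0x401234
```

With paging enabled, the resulting linear address would then go through the page table walk before reaching the bus.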
8086
- 16-bit data registers (AX, BX, CX, DX)
- 16-bit address registers (SP, BP, SI, DI)
- 16-bit segment registers (CS, DS, ES, SS)
- 16-bit external data bus
- 20-bit external address bus (1 MiB addressability)
8088
An 8086 with an 8-bit external data bus. The processor in IBM's PC-XT.
80188
Reduced chip count versions of the 8088.
80186
Reduced chip count versions of the 8086.
80286
- 24-bit external address bus (16 MiB)
- Protected mode with four privilege levels
80386
- 32-bit registers
- 32-bit external data bus
- 32-bit external address bus (4 GiB)
- Segmented and flat memory models
- 4-KiB paged virtual memory
80486
- 5-stage pipeline
- 8-KiB cache
- Optional integrated x87 math co-processor (FPU)
Pentium
- 64-bit external data bus
- Some models with 64-bit MMX registers
- Superscalar (two pipelines = two instructions per clock)
- 8-KiB instruction + data caches
- Branch prediction unit
AMD64/Intel 64
- 64-bit general purpose registers
After the Pentium, the Intel processor family tree gets a bit confusing, with myriad marketing names (Pentium Pro, Pentium II, Pentium III, Pentium 4, Pentium M, Pentium D, and Pentium Extreme Edition, to name a few) that sound like minor variations on a theme, but in fact represent radically new designs based on a bevy of microarchitectures:
- P6 Family
- Pentium M
- Enhanced Core
- Sandy Bridge
- Ivy Bridge
- Ivy Bridge-E
- Kaby Lake
- Knights Landing
- Goldmont Plus
- Coffee Lake
- Knights Mill
- Cascade Lake
- Ice Lake
See WikiChip's Intel Microarchitectures page for the gory details.