IA-32 and AMD64/Intel 64

The colloquially named "x86" family of instruction set architectures includes Intel's 32-bit IA-32 architecture and AMD's 64-bit AMD64 architecture, subsequently adopted by Intel as Intel 64.

The name "x86" is short for "80x86", and derives from the part numbers of early Intel chips, which all ended in "86", e.g., 8086, 80186, 80286, 80386, and 80486. The trend might have continued with the 80586, had courts not ruled that numbers cannot be trademarked. Thus was born the Pentium.

Operating modes

IA-32 processors operate in one of four modes at any time:

  • Real Mode, with a 16-bit segmented memory model
  • Protected Mode, which introduces memory protection and privilege levels
  • Virtual 86 Mode, which allows processors to run legacy real-mode software
  • System Management Mode, which is the bane of real-time software developers

AMD64 processors provide an additional operating mode, with two sub-modes:

  • Long Mode, which Intel calls IA-32e ("e" for "extensions")
    • 64-bit Mode, with a flat 64-bit memory model
    • Compatibility Mode, which allows 64-bit processors to run legacy 32-bit software

Over the years, Intel has learned the hard way that backwards compatibility sells chips. For that reason, all IA-32 and AMD64 processors boot in 16-bit real mode. Out of the box, the most advanced Intel Xeon Platinum is no more capable than an 8086 from the 1970s. System software is responsible for switching to long mode to unlock the advanced features of the chip.

Real mode

How could the 8086 microprocessor, with its 16-bit registers and data bus, address 20 bits (1 MiB) of memory? By adding two 16-bit registers — a segment register and an offset register — with the segment register shifted four bits to the left (equivalent to multiplying by 16). For example, the logical address 1234:1234h corresponds to the physical address 13574h:

           Segment register
  ┌───────┬───────┬───────┬───────┐
  │   1   │   2   │   3   │   4   │  << 4
  └───────┴───────┴───────┴───────┘
   15                            0

                   Offset register
          ┌───────┬───────┬───────┬───────┐
+         │   1   │   2   │   3   │   4   │
          └───────┴───────┴───────┴───────┘
           15                            0

  =========================================

               Physical Address
  ┌───────┬───────┬───────┬───────┬───────┐
  │   1   │   3   │   5   │   7   │   4   │
  └───────┴───────┴───────┴───────┴───────┘
   19                                    0

It's a clever trick, but has two quirks:

  1. While any given segment and offset uniquely define exactly one physical address, the converse is not true; any given physical address can be constructed from several combinations of segments and offsets.

    For example, the following segment:offset pairs produce equivalent physical addresses:

      F000          FF00          FFF0          FFFF 
    +  FFFF       +  0FFF       +  00FF       +  000F
      =====         =====         =====         =====
      FFFFF    ≡    FFFFF    ≡    FFFFF    ≡    FFFFF
    
  2. It is possible to overflow the 20-bit physical address, in which case the address wraps:

      FFFF
    +  FFFF
      =====
      0FFEF (NOT 10FFEF)
    

    Some software written for the 8086/8088 depended on this wrap-around. When the 80286 shipped in 1982 with a 24-bit address bus, system designers opted to disable its 20th address pin ("A20") for backwards compatibility with the previous processors. System software had to enable the A20 line in order to address more than 1 MiB of memory.

Real-address mode defines four segment registers, each with a specific purpose:

  • cs: code segment
  • ss: stack segment
  • ds: data segment
  • es: extra segment

Protected mode

In protected mode, all processor memory accesses go through a layer of indirection. The segment registers no longer contain 16-bit segment addresses, but are repurposed to hold segment selectors, which point to segment descriptors in a segment descriptor table:

    Segment Selector
           in
  Segmentation Register
┌────────────┬────┬─────┐          Global/Local
│ index      │ TI │ RPL │        Descriptor Table
└────────────┴────┴─────┘     ┌────────────────────┐
 15    │    3   2  1   0      │         ...        │
       │                      ├────────────────────┤
       └─────────────────────>│ Segment Descriptor │
                              ├────────────────────┤
                              │         ...        │
                              └────────────────────┘
                                         ▲
                                         │
                                         └─────── gdtr/ldtr

The programmer may select between two such tables: the global descriptor table (GDT) and the local descriptor table (LDT). The segment selector's Table Indicator (TI) bit determines whether the selector points to the GDT (TI = 0) or LDT (TI = 1). Interrupts involve another type of table, the interrupt descriptor table (IDT).

The segment selector's Requestor Privilege Level (RPL) records the CPU's Current Privilege Level (CPL) when the selector is loaded into a segment register.

The "null selector" — index = 0, TI = 0 — is used to represent unused segments. Memory accesses to a null segment raise #GP. For this reason, the first entry in the GDT cannot be used. Additionally, cs and ss can't be null — loading either with the null selector also raises #GP.

In addition to the four real mode segment registers, protected mode defines two additional registers:

  • fs: extra segment
  • gs: extra segment

Segment descriptors are 8-byte structures describing the base, size, type, and other attributes of a memory segment:

 63          56  55     51           48     43  40 39          32
┌──────────────┬───┬───┬───────────────┬───┬──────┬──────────────┐
│ base (24-31) │ G │ … │ limit (16-19) │ … │ type │ base (16-23) │
├──────────────┴───┴───┴───────────┬───┴───┴──────┴──────────────┤
│            base (0-15)           │         limit (0-15)        │
└──────────────────────────────────┴─────────────────────────────┘
 31                                                             0

The 32-bit base can address 4 GiB. The 20-bit limit (size) supports up to 1 MiB segments with byte-sized granularity (G = 0) or up to 4 GiB segments with 4 KiB page-sized granularity (G = 1).

Protected mode supports a form of hardware-assisted multitasking by saving task state in a task state segment (TSS). However, most operating systems don't use hardware multitasking, opting instead to manage task switches in software.

Virtual 86 mode

Allows protected mode operating systems to run legacy 16-bit real mode software.

System Management mode

Allows handling of high-priority non-maskable hardware interrupts for things like thermal control and battery charging. Invisible to operating systems, except as missing cycles.

Long mode

AMD64 calls it long mode; Intel 64 calls it IA-32e.

64-bit mode

The full capabilities of 64-bit processors are only available when running in 64-bit mode, which includes:

  • 64-bit virtual addresses in a flat memory model
  • 64-bit general purpose registers, and 8 more of them (r8 - r15)
  • RIP-relative addressing mode

As a sub-mode of long mode, 64-bit mode can be enabled on a per-code-segment basis. That is, in fact, one of the only uses of segmentation in 64-bit mode. Another legacy IA-32 feature not supported in long mode is hardware task switching via the task state segment (TSS).

By default, addresses are 64-bits long and operands are 32-bits long (though both can be overridden with instruction prefixes), yielding an LP64 data model: longs and pointers are 64-bits, while ints remain 32-bit. That results in smaller binaries and less wasted RAM than if ints were 64-bit.

Compatibility mode

Compatibility being key to the success of AMD64, this 64-bit sub-mode, which is enabled on a per-code-segment basis, allows legacy 16- and 32-bit protected mode software to run on 64-bit systems, concurrently with 64-bit applications.

Paging

When paging is enabled, the processor views main memory as an array of 4 KiB page frames and carves the virtual address space into 4 KiB pages that can be loaded into any page frame, or none at all (paged out to a swap area, for example).

32-bit

The 32-bit linear address produced by the segmentation unit is mapped to a 32-bit physical address by the memory management unit (MMU) using a two-level page table tree:

             32-bit Linear Virtual Address
┌─────────────────┬────────────────┬─────────────────────┐
│    Directory    │      Table     │        Offset       │
└─────────────────┴────────────────┴─────────────────────┘
 31      │      22 21      │     12 11              │    0
         │                 │                        │
         │                 │                        │
         │                 │    Page                │
         │                 │    Table               │
         │                 │  ┌───────┐             │
         │                 │  │  ...  │             │
         │    Page         │  │  ...  │             │
         │  Directory      │  ├───────┤             │
         │  ┌───────┐      └─>│  pte  │─────────┐   │
         │  │  ...  │         ├───────┤         ▼   ▼
         │  │  ...  │         │  ...  │     0xAAAAABBB
         │  ├───────┤         │  ...  │   32-bit Physical
         └─>│  pte  │────────>└───────┘      Address
            ├───────┤          4 KiB
            │  ...  │
            │  ...  │
   cr3 ────>└───────┘
              4 KiB

32-bit with physical address extensions (PAE)

The 32-bit linear address produced by the segmentation unit is mapped to a 36-bit physical address by the memory management unit (MMU) using a three-level page table tree.

64-bit

The 64-bit linear address is mapped to a 64-bit physical address by the MMU using a four-level page table tree.

Memory addressing

IA-32 addresses objects in memory by constructing effective addresses, relative to some segment, using a combination of base, index, scale, and displacement:

  • a base register (e.g., bp, bx), used by high-level languages to store the base address of data structures
  • an index register, representing a (possibly negative) array index
  • a scale factor (1, 2, 4, or 8), representing the size of array elements
  • a displacement, encoded in the instruction, representing the offset of a field within an aggregate type (like a C struct).

Logical addresses are composed of a segment selector and an effective address, or offset within the segment. Logical addresses are resolved to "linear", or virtual, addresses by the segmentation unit, and finally to physical addresses that can be driven onto the address bus by the paging unit, as depicted below:

          Base + [Scale x Index] + Displacement
                   ┌───────────────────┐
                   │ Effective Address │
                   └───────────────────┘
                             │
                             │
                             │
            Logical Address  ▼
┌──────────────────┬───────────────────┐
│ Segment Selector │      Offset       │
└──────────────────┴───────────────────┘
                   │
                   │  Segmentation unit
                   │
                   ▼
┌──────────────────────────────────────┐
│        Linear virtual address        │
└──────────────────────────────────────┘
                   │
                   │  Paging unit
                   │
                   ▼
┌──────────────────────────────────────┐
│           Physical Address           │
└──────────────────────────────────────┘
                   │
                   │  Bus unit
                   │
                   ▼
              Address Bus

Notable Processors

8086 (1978)

  • 16-bit data registers (AX, BX, CX, DX)
  • 16-bit address registers (SP, BP, SI, DI)
  • 16-bit segment registers (CS, DS, ES, SS)
  • 16-bit external data bus
  • 20-bit external address bus (1 MiB addressability)

8088 (1979)

An 8086 with an 8-bit external data bus. The processor in IBM's PC-XT.

80188 (1980)

Reduced chip count versions of the 8088.

80186 (1982)

Reduced chip count versions of the 8086.

80286 (1982)

  • 24-bit external address bus (16 MiB)
  • Protected mode with four privilege levels

80386 (1985)

  • 32-bit registers
  • 32-bit external data bus
  • 32-bit external address bus (4 GiB)
  • Segmented and flat memory models
  • 4-KiB paged virtual memory

80486 (1989)

  • 5-stage pipeline
  • 8-KiB cache
  • Optional integrated x87 math co-processor (FPU)

Pentium (1993)

  • 64-bit external data bus
  • Some models with 64-bit MMX registers
  • Superscalar (two pipelines = two instructions per clock)
  • 8-KiB instruction + data caches
  • Branch prediction unit
  • APIC

AMD64 (2003)

  • 64-bit general purpose registers

Later models

After the Pentium, the Intel processor family tree gets a bit confusing, with myriad marketing names (Pentium Pro, Pentium II, Pentium III, Pentium 4, Pentium M, Pentium D, and Pentium Extreme Edition, to name a few) that sound like minor variations on a theme, but in fact represent radically new designs based on a bevy of microarchitectures:

  • P6 Family
  • NetBurst
  • Pentium M
  • Core
  • Enhanced Core
  • Atom
  • Nehalem
  • Westmere
  • Sandy Bridge
  • Ivy Bridge
  • Ivy Bridge-E
  • Haswell
  • Haswell-E
  • Airmont
  • Silvermont
  • Broadwell
  • Skylake
  • Kaby Lake
  • Goldmont
  • Knights Landing
  • Goldmont Plus
  • Coffee Lake
  • Kinghts Mill
  • Cascade Lake
  • Ice Lake

See WikiChip's Intel Microarchitectures page for the gory details.

Reading