Memory Hierarchies, Cache Memories
CACHE-1
Who Cares About the Memory Hierarchy?
Processor-DRAM Memory Gap (latency)
[Figure: processor vs. DRAM performance, 1980-2000, log scale (1 to 1000). CPU performance ("Moore's Law") grows at 60%/yr (2X/1.5yr); DRAM latency improves at only 9%/yr (2X/10yrs). The resulting processor-memory performance gap grows about 50% per year ("Less' Law?").]
Time
CACHE-2
Random Access Memory Technology
❍ Random is good: access time is "the same" for all locations
  ! As opposed to sequential access, as in tapes and disks
  ! SRAM: Static Random Access Memory
    ◊ Low density, high power, $expensive$, fast
    ◊ "Two inverters arguing"
    ◊ Static: content lasts forever (until you lose power)
    ◊ 6 transistors per memory cell (2 per inverter, 2 for selecting the cell / driving the output)
    ◊ SRAM cell about 10X larger than a DRAM cell
    ◊ Burns A LOT more power than DRAM
  ! DRAM: Dynamic Random Access Memory
    ◊ High density, low power, cheap, slow
    ◊ FORGETS: "dynamic" is marketing for "needs to be refreshed regularly"; memory that forgets!
    ◊ 1 transistor (plus a storage capacitor) per memory cell
CACHE-3
Structure of a random access memory
[Figure: structure of a random access memory. The address feeds a decoder that drives the word (select) lines; each memory cell stores 1 bit. Bit (input/output) lines carry the data in and out. Additional signals: a "Read/~Write" line that connects to all cells, and a memory "strobe" line that instructs the memory to perform the read/write.]
CACHE-4
Predictable Program Behavior: Locality of Reference
❍ Temporal locality:
  ! Recently accessed items are likely to be re-accessed in the near future
  ! If we access X at time t, it is highly likely we also access X at time t ± D, for small D
❍ Spatial locality:
  ! Items with addresses near one another in memory are likely to be referenced close together in time
  ! If we access X at time t, it is highly likely we also access X ± A at time t ± D, for small A and D
❍ Sequential locality:
  ! An important special case of spatial locality
  ! The next item accessed is likely to be at the next sequential address in memory
  ! If we access X at time t, we also access X + B at time t + 1, for a constant B
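The three kinds of locality show up directly in loop order. The sketch below (illustrative, not from the slides) sums the same matrix two ways: the row-major loop visits consecutive addresses and exploits spatial/sequential locality, while the column-major loop strides N*sizeof(double) bytes between accesses.

```c
#define N 512

static double a[N][N];

void fill(double v) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = v;
}

double sum_row_major(void) {            /* good spatial locality */
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];               /* consecutive addresses */
    return s;
}

double sum_col_major(void) {            /* poor spatial locality */
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];               /* large stride between accesses */
    return s;
}
```

The arithmetic is identical, yet on a real machine the row-major version is typically several times faster, purely because of the cache.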
CACHE-5
How are caches organized and accessed?
❍ For simplicity, data is stored in fixed-length parcels called blocks or lines
❍ Data is tagged with its address in memory (so that it can be found and identified)
❍ As a black box:
  ! CPU asks for data @ location X
  ! Cache searches for data @ X
  ! If it is not there, the cache asks main memory for the data (called a cache miss)
  ! CPU gets the data (it never talks to main memory directly)
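The black-box flow above can be sketched as a toy simulation. All names and sizes here are hypothetical, chosen to match the 64-byte, 8-byte-block cache of the following slides:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NSETS 8                 /* 8 sets x 8-byte blocks = 64B cache */
#define BLOCK 8

struct line { bool valid; uint32_t tag; uint8_t data[BLOCK]; };
static struct line cache[NSETS];
static uint8_t memory[1 << 16]; /* stand-in for main memory */
static int misses;

uint8_t cache_read(uint32_t x) {
    uint32_t off = x % BLOCK;               /* byte within the block */
    uint32_t set = (x / BLOCK) % NSETS;     /* which set to search */
    uint32_t tag = x / (BLOCK * NSETS);     /* identifies the block */
    struct line *l = &cache[set];
    if (!l->valid || l->tag != tag) {       /* cache miss: */
        misses++;                           /* ask main memory for the block */
        memcpy(l->data, &memory[x - off], BLOCK);
        l->valid = true;
        l->tag = tag;
    }
    return l->data[off];                    /* CPU gets the data */
}
```

A second read to any address in the same block then hits, without touching `memory` again.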
CACHE-6
A 64B direct mapped cache
[Figure: a 64-byte direct-mapped cache with 8-byte blocks, so 8 sets each holding 1 block. The 32-bit address in the MAR splits into tag (bits 31-6, 26 bits), index (bits 5-3, 3 bits), and block offset (bits 2-0, 3 bits). The index drives a row decoder that selects one set; a comparator (=?) matches the stored tag against the address tag, producing the hit/miss signal; a word select mux steers the addressed word into the MDR.]
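The tag/index/offset split for this cache can be computed with shifts and masks (a small sketch; the function names are made up):

```c
#include <stdint.h>

/* Field extraction for the 64B direct-mapped cache:
 * offset = bits 2-0, index = bits 5-3, tag = bits 31-6. */
uint32_t dm_offset(uint32_t addr) { return addr & 0x7; }        /* 3 bits */
uint32_t dm_index (uint32_t addr) { return (addr >> 3) & 0x7; } /* 3 bits */
uint32_t dm_tag   (uint32_t addr) { return addr >> 6; }         /* 26 bits */
```

For example, address 0x1234 has offset 4, index 6, and tag 0x48.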
CACHE-7
A 64B 2-way set associative cache
[Figure: a 64-byte 2-way set associative cache with 8-byte blocks, so 4 sets each holding 2 blocks. The address in the MAR splits into tag (bits 31-5, 27 bits), index (bits 4-3, 2 bits), and block offset (bits 2-0, 3 bits). The row decoder selects one set; two latched comparators (=?), one per way, check both stored tags in parallel; a block select mux picks the matching block and a word select mux steers the word into the MDR.]
Notes: this cache is the same size as the (previous) direct-mapped cache, but the index field is 1 bit shorter (index: 2 bits total). A direct-mapped cache is really a 1-way set associative cache.
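The parallel tag check of a 2-way set can be sketched as below (hypothetical helper, following the figure's geometry: 4 sets of 2 ways, index in bits 4-3, tag in bits 31-5):

```c
#include <stdbool.h>
#include <stdint.h>

struct way { bool valid; uint32_t tag; };
struct way cache2[4][2];        /* 4 sets x 2 ways x 8-byte blocks = 64B */

bool hit_2way(uint32_t addr) {
    uint32_t index = (addr >> 3) & 0x3;   /* bits 4-3: selects the set */
    uint32_t tag   = addr >> 5;           /* bits 31-5: 27-bit tag */
    struct way *s = cache2[index];
    /* two comparators, one per way (the "=?" boxes in the figure) */
    return (s[0].valid && s[0].tag == tag) ||
           (s[1].valid && s[1].tag == tag);
}
```

A block can now live in either way of its set, which is why the index is one bit shorter than in the direct-mapped case.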
CACHE-8
A 64B fully associative cache (also called a Content Addressable Memory or CAM)
[Figure: a 64-byte fully associative cache with 8-byte blocks. The address in the MAR splits into tag (bits 31-3, 29 bits) and block offset (bits 2-0, 3 bits); there is no index field. Each of the 8 blocks stores its own tag and has its own comparator (=?) matching it against the address tag; a word select mux delivers the selected word to the MDR.]
Notes: Same size as previous caches, but no row decoder and no index at all! Comparators (one per block) take the place of the row decoder
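The per-block comparators can be modeled as a loop; in software the comparisons run sequentially, but in the CAM hardware they all fire in parallel (names here are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

#define NBLOCKS 8               /* 8 blocks x 8 bytes = 64B */

struct cam { bool valid; uint32_t tag; };
struct cam fa[NBLOCKS];

/* Returns the matching block number, or -1 on a miss. */
int fa_find(uint32_t addr) {
    uint32_t tag = addr >> 3;   /* 29-bit tag, bits 31-3 */
    for (int i = 0; i < NBLOCKS; i++)   /* one comparator per block */
        if (fa[i].valid && fa[i].tag == tag)
            return i;
    return -1;
}
```

The cost of full associativity is one comparator per block, which is why real CAMs are kept small.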
CACHE-9
Cache performance: terms
❍ Hit: the data is in the cache
  ! Hit Time: time to get the data from the cache to the processor
❍ Miss: the data is not in the cache
  ! Miss Ratio = number of misses / total number of accesses
  ! Miss Penalty: time to get the missing block and deliver the data to the processor
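The next slide combines these terms into the average access time (AAT); the standard formula is AAT = Hit Time + Miss Ratio × Miss Penalty. A one-line sketch:

```c
/* Average access time from the terms above; all arguments in the
 * same time unit (e.g. cycles). */
double aat(double hit_time, double miss_ratio, double miss_penalty) {
    return hit_time + miss_ratio * miss_penalty;
}
```

For example, a 1-cycle hit time, 5% miss ratio, and 100-cycle miss penalty give an average of about 6 cycles per access.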
CACHE-10
Cache performance and AAT
To measure: (1) Run a program and