improving performance of caches

avg. memory access time = hit time + miss rate * miss penalty


  1. reduce hit time
  2. reduce miss rate
  3. reduce miss penalty

the first term (hit time) is small;
the second term (miss rate * miss penalty) is large.

therefore concentrate on (2) and (3) above: reducing the product miss rate * miss penalty pays off more than shaving the already-small hit time.

reducing cache misses

the three "C"s of cache design

  1. compulsory: the first access to a block can never hit, since the block is not yet cached (also called cold-start misses).
  2. capacity: if the cache cannot contain all the blocks needed during execution of a program, misses occur because blocks are discarded and later re-fetched.
  3. conflict: if the block placement strategy is set associative or direct mapped, misses occur when too many blocks map to the same set (also called collision misses).

the 7 ways of reducing cache misses

  1. increase associativity. tradeoff: more hardware complexity, increased cost and may decrease clock rate, slowing overall system.
  2. increase cache capacity or make block sizes larger. both reduce miss rate. tradeoff: with larger blocks, more data has to be moved between the cache and the next lower level on each miss, which increases the miss penalty.
  3. use a victim cache (a small cache holding blocks discarded on a miss). the victim cache is checked on the next miss to see if it contains the desired data before going down to the next lower level of the memory hierarchy. typically a small fully associative victim cache is used with a direct mapped main cache. e.g. a 4 entry victim cache typically removes 50% of misses in a 4k direct mapped cache.
  4. pseudo associative caches: works like a direct mapped cache on a hit; but on a miss, another cache entry is checked before reverting to the next lower memory level. thus it has a fast and a slow hit time.
  5. hardware instruction prefetch: instructions are prefetched in hardware, outside the cache. the same can be done for data; there can be multiple data streams, each fetching at a different address.
  6. compiler controlled prefetching: the compiler inserts prefetch instructions to request data before it is needed; loops are prime targets for compiler controlled prefetching.
  7. compiler optimizations: reorganize code and data to improve locality, e.g. loop interchange and blocking (tiling).

cache hierarchy; typical cache sizes
