Strategies for Improving Main Memory (DRAM) Performance
- Wider main memory:
- more memory bandwidth (e.g. a 64-bit bus has twice the throughput of a
32-bit bus [Pentium])
- cache controller (generally one word wide) needs a multiplexer between it
and the wider main memory bus
- board designers need to deal with many more traces and associated routing
problems
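The bandwidth claim above can be checked with a little arithmetic. This is a minimal sketch: the 66 MHz bus clock and the function name are assumptions chosen for illustration, not figures from these notes.

```python
def peak_bandwidth_bytes_per_s(bus_width_bits: int, transfers_per_s: float) -> float:
    """Peak bandwidth = bytes moved per transfer * transfers per second."""
    return (bus_width_bits // 8) * transfers_per_s


# Assumed bus clock, one transfer per clock (illustrative only).
BUS_CLOCK_HZ = 66_000_000

narrow = peak_bandwidth_bytes_per_s(32, BUS_CLOCK_HZ)
wide = peak_bandwidth_bytes_per_s(64, BUS_CLOCK_HZ)

print(f"32-bit bus: {narrow / 1e6:.0f} MB/s")
print(f"64-bit bus: {wide / 1e6:.0f} MB/s")
print(f"ratio: {wide / narrow:.1f}x")
```

Doubling the width doubles the peak figure only; the multiplexing and routing costs listed above are not modelled.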
- Simple Interleaved Memory:
- accesses for sequential words are spread across several memory banks
- speedup for sequential accesses: *theoretically* the effective access time
becomes access time / number of banks
- complicates memory controller designs and board layouts
- little benefit for random word accesses
- best improvement for write-through cache designs
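A rough timing model shows why the speedup is only theoretical. This is a sketch under assumed cycle counts (1 cycle to send the address, 6 cycles of DRAM access, 1 cycle per word transferred); the function names and numbers are illustrative, not from these notes.

```python
def bank_of(word_addr: int, n_banks: int) -> int:
    """Simple interleaving: consecutive words go to consecutive banks."""
    return word_addr % n_banks


def block_fetch_cycles(n_words: int, n_banks: int,
                       addr_cycles: int = 1,
                       access_cycles: int = 6,
                       xfer_cycles: int = 1) -> int:
    """Cycles to fetch n_words sequential words.

    Up to n_banks words are accessed in parallel, so each group pays the
    address and access cost once, then transfers its words one at a time
    over the shared bus.
    """
    cycles = 0
    remaining = n_words
    while remaining > 0:
        group = min(remaining, n_banks)
        cycles += addr_cycles + access_cycles + group * xfer_cycles
        remaining -= group
    return cycles


print([bank_of(a, 4) for a in range(8)])   # which bank each word lands in
print(block_fetch_cycles(4, n_banks=1))    # one bank
print(block_fetch_cycles(4, n_banks=4))    # four interleaved banks
```

Under these assumptions a four-word block takes 32 cycles with one bank and 11 cycles with four banks, about 2.9 times faster rather than the theoretical 4 times, because the words still share one bus for the transfer.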
- Independent Memory Banks:
- separate data and address busses to each bank
- very complicated memory controller designs
- very big improvement for multi-CPU designs as concurrent access to memory
is permitted
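The multi-CPU benefit can be illustrated with a very crude cycle count. This is a sketch only: it assumes every access takes one cycle, that requests arrive in lock-step time steps, and that a shared design serializes every request; all names are hypothetical.

```python
from collections import Counter


def cycles_shared_bus(requests_per_step, access_cycles=1):
    """One bus for all banks: every request is serialized."""
    return sum(len(step) for step in requests_per_step) * access_cycles


def cycles_independent_banks(requests_per_step, access_cycles=1):
    """Separate buses per bank: a step lasts as long as its busiest bank."""
    total = 0
    for step in requests_per_step:
        if step:
            total += max(Counter(step).values()) * access_cycles
    return total


# Each inner list holds the bank requested by each of two CPUs in one step.
steps = [[0, 1], [2, 2], [3, 0]]

print(cycles_shared_bus(steps))
print(cycles_independent_banks(steps))
```

In this example the shared bus needs 6 cycles and the independent banks need 4; the only step that does not overlap is the one where both CPUs ask for bank 2.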
- Avoiding Memory Bank Conflicts:
- allow fast non-contiguous access for read "gather"
- allow fast non-contiguous access for write "scatter"
- problem is that many accesses are sequential and by nature go to the
same memory bank:
- sequentially interleaved memory banks are easiest to implement
- modulo interleaving is harder to implement but allows better performance
for sequential and random access
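A small experiment shows how a strided access pattern can pile onto one bank. This is a sketch, assuming the bank is chosen as word address modulo the number of banks and that the harder-to-implement scheme uses a bank count that is not a power of two (7 here); both assumptions and all names are illustrative rather than taken from these notes.

```python
from collections import Counter


def banks_touched(stride: int, n_accesses: int, n_banks: int) -> list:
    """Bank hit by each access of a strided sequence starting at word 0."""
    return [(i * stride) % n_banks for i in range(n_accesses)]


def worst_bank_load(stride: int, n_accesses: int, n_banks: int) -> int:
    """Number of accesses queued on the most heavily used bank."""
    return max(Counter(banks_touched(stride, n_accesses, n_banks)).values())


# Stride of 8 words, 8 accesses.
print(banks_touched(8, 8, n_banks=8), worst_bank_load(8, 8, n_banks=8))
print(banks_touched(8, 8, n_banks=7), worst_bank_load(8, 8, n_banks=7))
```

With 8 banks every access lands on bank 0, so all 8 accesses are serialized. With 7 banks the same stride walks through all the banks and the busiest one sees only 2 accesses.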
- DRAM Specific Interleaving:
- Nibble "mode"
- DRAM will have four square areas that can supply data simultaneously
- Page "mode"
- column address can be changed for fast access within a single row
- Static Column "mode"
- same as above, but no need to toggle the CAS line, merely changing
the address reads new data
- circuit generally already on the DRAM die, so memory controllers can be
designed without new hardware
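The benefit of page mode can be sketched with a toy timing model: a full access that opens a new row is slow, while a further access to the already open row only needs a new column address. The timings (60 ns and 25 ns), the 1024-column row size, and the function name are assumptions for illustration, not data for any particular part.

```python
def access_time_ns(addresses, n_cols=1024, t_row_ns=60, t_col_ns=25):
    """Total time for a sequence of accesses under a simple page-mode model.

    An access to a row that is not currently open costs t_row_ns; an access
    that stays in the open row costs only t_col_ns.
    """
    total = 0
    open_row = None
    for addr in addresses:
        row = addr // n_cols
        if row == open_row:
            total += t_col_ns
        else:
            total += t_row_ns
            open_row = row
    return total


sequential = list(range(8))            # all in row 0
scattered = [0, 1024, 2048, 3072]      # a new row every time

print(access_time_ns(sequential))
print(access_time_ns(scattered))
```

Eight sequential words cost 60 + 7 * 25 = 235 ns in this model instead of 8 * 60 = 480 ns, while the scattered pattern gets no benefit because each access opens a different row.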
- RAMBUS
- new design requiring "intelligent" DRAM, as the memory bus
is "packet switched"
- 20 percent cost increase over traditional DRAM
- caches on DRAM die improve transfer rates by decreasing access times
Going Further
IEEE Trans. Multimedia, June 1999, page 121: analog RAM chip
- for video conferencing cameras (MPEG-4) and for optical flow
- 16 by 256 analog memory chip
- 7 bits of dynamic range, runs at 5 MHz
- best for high-bandwidth repetitive tasks
- small images (e.g. 50 by 50 pixels)
- multiplexing over the areas of pixels on which the transforms are
calculated: invariant codings, Fourier transform, edge detection