Wednesday, January 19, 2011

Cache Memory demystified.

What is Cache ?
The Webster dictionary defines cache as "a safe place for hiding and storing things."
For software developers like me, cache memory is a temporary storage place for the information requested most frequently. It consists of a relatively small area of fast local memory, so the information stored there can be delivered much more quickly than it could be from slower external devices, such as operating memory (RAM) or the disk subsystem.

Why is it Needed ?
In simple terms, it's needed to speed up memory operations.
e.g. Time to run code = clock cycles spent running the code + clock cycles spent waiting for memory fetches. For many years now, CPU speeds have increased by more than 50% per year over memory chip speeds. Memory access has therefore become the bottleneck to fast computing.

Cache, in short, is the part of the memory hierarchy used to improve the average access time to slower memory. The hierarchy is defined as follows:

<----- FASTER -----

Registers -- L1 Cache -- L2 Cache -- RAM -- Disk/Tape

----- BIGGER ---->

Cache Basics

Before I go any further, let me put out a glossary of a few cache-related terms we will be using a lot:
I-Cache - Instruction Cache
D-Cache - Data Cache
U-Cache - Unified Cache
S-Cache - Secondary Cache
T-Cache - Tertiary Cache
B-Cache - Back up Cache
E-Box - Integer execution Unit / Arithmetic Logic Unit.
I-Box - Decoder and Scheduler Unit
F-Box - Floating Point Unit
A-Box - Load and Store Unit
C-Box - Cache and System Bus Controller Unit.

Now with that out of the way, let me bring out the big guns.

In any computer system, cache memory is separated, logically and physically, into so-called levels. The internal memory built into the processor core is called the L1-Cache or Level 1 Cache, and the external cache memory located either on the processor module or on the main board is called the L2-Cache or Level 2 Cache.

Cache memory on the first level is further subdivided into two regions of roughly the same size, called the I-Cache and the D-Cache. Cache memory on the second level is almost always unified. The reason is that the second-level cache doesn't need to be as fast as the first level: in real life the I-Cache and D-Cache usually satisfy about 80%-90% of all requests, which leaves room for a trade-off.

A general Cache based system

Cache functioning
At run time, while executing an instruction, if the required data is not available in a register, a request is generated and dispatched to the nearest cache level, i.e. the D-Cache. If a miss is returned, the request is redirected to the next cache level, and so on. While retrieving the data, the system first writes it from memory into the cache, and then from the cache into the register.
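The read path described above can be sketched as a few lines of Python. The dictionaries standing in for the D-Cache, L2 Cache and memory are hypothetical stand-ins, not a real hardware model:

```python
# Sketch of the read path: check each cache level in turn and fall
# back to main memory on a miss, filling the caches on the way back.
def read(address, d_cache, l2_cache, memory):
    """Return the value at `address`, filling caches on the way back."""
    if address in d_cache:          # L1 hit: fastest case
        return d_cache[address]
    if address in l2_cache:         # L1 miss, L2 hit
        d_cache[address] = l2_cache[address]   # fill L1
        return d_cache[address]
    # Miss at every cache level: fetch from memory, then fill
    # L2 and L1 so the next access to this address is fast.
    value = memory[address]
    l2_cache[address] = value
    d_cache[address] = value
    return value

d_cache, l2_cache = {}, {}
memory = {0x10: 42}
assert read(0x10, d_cache, l2_cache, memory) == 42   # miss -> fetched
assert 0x10 in d_cache and 0x10 in l2_cache          # now cached
```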

If you wish to save/write some data, the operation flows in the opposite direction: the data is first written to a register, then to the D-Cache, then to the L2 Cache and, if required, to memory (depending on the write policy being followed: Write-through or Write-back).

To reduce the cost of memory transfers, more than one element is loaded into the cache for each request. The unit of transfer is called a cache block or cache line. The size of this line is architecture dependent, is usually expressed in bytes, and is typically between 32 and 128 bytes.

Every address field consists of two primary parts: a dynamic part (the tag), which contains the higher address bits, and a static part (the index), which contains the lower address bits. The first may be modified during run time, but the second is fixed. Imagine some physical memory space organized as M equally sized memory segments, and suppose those memory segments and the cache segments (lines) are of exactly the same size. To obtain the address of a memory line, we first determine the number of its memory segment, then the number of the memory line inside that segment, and then merge both numbers.
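This decomposition is easy to show concretely. Here is a small sketch, assuming a hypothetical cache with 64-byte lines and 256 lines (the exact sizes vary by architecture):

```python
# Splitting an address into tag, index and byte offset, assuming
# (hypothetically) a 64-byte line and 256 cache lines.
LINE_SIZE = 64          # bytes per cache line -> 6 offset bits
NUM_LINES = 256         # lines in the cache   -> 8 index bits

def split_address(addr):
    offset = addr % LINE_SIZE                 # byte within the line
    index = (addr // LINE_SIZE) % NUM_LINES   # which cache line
    tag = addr // (LINE_SIZE * NUM_LINES)     # identifies the segment
    return tag, index, offset

# 0x12345 = 74565: memory line 74565 // 64 = 1165, byte offset 5,
# index 1165 % 256 = 141, tag 1165 // 256 = 4
assert split_address(0x12345) == (4, 141, 5)
```

Merging the tag and index back together (tag * NUM_LINES + index) recovers the memory line number, exactly as the paragraph above describes.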

Cache mapping
There are three general methods for mapping main memory to cache.
1) Direct mapping : In direct mapping, a block from main memory can go in exactly one place in the cache, found by the following simple calculation for each memory address.
Cache line index = memory block address MODULO number of cache lines.
e.g. in a 4-line cache, line 0 can be occupied by blocks 0, 4, 8... from main memory.
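The modulo calculation above, for the same hypothetical 4-line cache, looks like this:

```python
# Direct mapping sketch: each memory block maps to exactly one cache
# line, computed with a modulo. A 4-line cache, as in the example above.
NUM_LINES = 4

def cache_line_for(block_number):
    return block_number % NUM_LINES

# Blocks 0, 4 and 8 all compete for line 0:
assert [cache_line_for(b) for b in (0, 4, 8)] == [0, 0, 0]
assert cache_line_for(6) == 2
```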

2) Set associative : An N-way set-associative cache means that information stored at a given address in operating memory can be placed (cached) in any of N locations (lines) of the cache memory. The cache is logically segmented so that each of the N segments has exactly one line capable of caching information located at a given memory address. When the data at a certain address is required, the appropriate lines in all the segments are queried and a hit or miss is delivered.

3) Fully associative : In this arrangement, a block from main memory can be placed in any location in the cache.

There are several algorithms to choose from for replacing the cache lines in associative memory cases.

  • LRU (Least Recently Used) - A line containing information with oldest last hit is replaced.
  • LRR (Least Recently Replaced) / FIFO (First In First Out) - The line containing information with oldest record is replaced.
  • LFU (Least Frequently Used) - The line containing information which has been requested least often is replaced.
  • Random - A random line is chosen for replacement.
A direct mapped cache is faster than a fully associative one, but is the least efficient, because the controller has no choice while writing: there is only one line available to choose from. So significant performance degradation occurs if two or more memory addresses compete for a single cache line. A fully associative cache is the slowest yet the most efficient, for the same reasons stated above.
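To tie the mapping and replacement ideas together, here is a toy N-way set-associative cache with LRU replacement. The class name, sizes and the use of `OrderedDict` as the LRU bookkeeping are my own illustrative choices, not a real controller design:

```python
from collections import OrderedDict

# Sketch of an N-way set-associative cache with LRU replacement.
# Each set holds up to `ways` tags; when a set is full, the least
# recently used tag in that set is evicted.
class SetAssociativeCache:
    def __init__(self, num_sets=4, ways=2):
        self.num_sets = num_sets
        self.ways = ways
        # one LRU-ordered dict of tags per set
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, block):
        """Return True on a hit, False on a miss (and fill the line)."""
        index, tag = block % self.num_sets, block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # refresh LRU position
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # evict least recently used
        lines[tag] = True
        return False

cache = SetAssociativeCache(num_sets=4, ways=2)
assert cache.access(0) is False   # cold miss
assert cache.access(0) is True    # hit
assert cache.access(4) is False   # same set (index 0), second way
assert cache.access(8) is False   # set full: evicts block 0 (LRU)
assert cache.access(0) is False   # block 0 was evicted -> miss again
```

Setting `ways=1` turns this into a direct mapped cache, and a single set with `ways` equal to the total number of lines makes it fully associative, which shows how the three mapping schemes sit on one spectrum.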

Cache Read - uses the principle of locality to bring lines into the cache. There are two different types of locality :

  • Temporal Locality (Locality in Time) - states that if an item is referenced, it will tend to be referenced again soon. e.g. loops, data reuse, etc.
  • Spatial Locality (Locality in Space) - states that if an item is referenced, items whose addresses are close by tend to be referenced soon. e.g. array accesses.
Both of these principles are satisfied by moving blocks/lines, instead of single words, from memory for each request. The optimal line size depends on the properties of the data bus and the memory subsystem.
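A toy miss counter makes the benefit of spatial locality visible. The 64-byte line size below is just an assumed example value:

```python
# Illustrating spatial locality: sequential byte accesses touch each
# 64-byte line once, so only every 64th access misses; a stride equal
# to the line size misses on every single access.
LINE_SIZE = 64

def miss_count(addresses):
    cached_lines, misses = set(), 0
    for addr in addresses:
        line = addr // LINE_SIZE
        if line not in cached_lines:      # first touch of this line
            cached_lines.add(line)
            misses += 1
    return misses

sequential = range(0, 4096)               # stride 1: good locality
strided = range(0, 4096 * 64, 64)         # stride 64: no reuse
assert miss_count(sequential) == 64       # 4096 bytes / 64-byte lines
assert miss_count(strided) == 4096        # every access is a miss
```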

Cache Write - There are two main approaches for writing data to memory.

  • Write Through - the information is written to both the block in the cache and the block in lower-level memory at the same time. This approach is simpler but costlier.
  • Write Back - the information is written only to the block in the cache. The modified block is written to main memory only when it is replaced in the cache. This method is less costly, as it reduces memory writes, but is more complicated, as checks need to be in place to prevent data corruption.
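The contrast between the two policies can be sketched in a few lines. The dirty set standing in for per-line dirty bits is my own simplification of a real controller:

```python
# Toy contrast of the two write policies. A "dirty" set marks
# write-back lines that must be flushed to memory on eviction.
def write_through(cache, memory, addr, value):
    cache[addr] = value
    memory[addr] = value            # memory updated on every write

def write_back(cache, dirty, addr, value):
    cache[addr] = value
    dirty.add(addr)                 # memory updated only on eviction

def evict(cache, dirty, memory, addr):
    if addr in dirty:               # flush modified data before dropping
        memory[addr] = cache[addr]
        dirty.discard(addr)
    del cache[addr]

cache, memory, dirty = {}, {0x20: 0}, set()
write_back(cache, dirty, 0x20, 99)
assert memory[0x20] == 0            # memory is still stale
evict(cache, dirty, memory, 0x20)
assert memory[0x20] == 99           # flushed at eviction time
```

The stale-memory window in the write-back case is exactly why the extra consistency checks mentioned above are needed.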
That's all from me at the moment. There will be more to come. Please let me know of any comments.

