Thrash The Cache

I did some experiments some time back on Icache. Putting down something one can try easily and yeah, its fun!

First of all, get the Icache size of your processor. You can get it by running dmidecode in your linux box (Yeah, I am assuming you are using linux). Example: `dmidecode | grep -B 20 way`.

Next see the associativity of your cache. On my system, its 4 way and the output looks something like this:

Operational Mode: Write Back
Location: Internal
Installed Size: 32 KB
Maximum Size: 32 KB
Associativity: 4-way Set-associative

So each address of my cache can be mapped simultaneously to four addresses in RAM at 32K fixed offsets.
As the topic says, lets thrash the cache. For that, we need to make sure that our instructions are aligned such that every 32Kth location in the memory is accessed. The following code does it for 1-way cache. Writing for 4-way would be lengthy and redudant code. So chose 1-way.

func5(){
func6();
}
temp3(){
asm(“.space 32*1024 \n”);
}
func4(){
func5();
}

Here asm(“.space 32*1024 \n”) makes sure that function func4 is spaced at 32K from func5. Run`nm | grep func` to get the addresses. Call func4 in a large loop and time the whole code. Now remove temp3 and look at the execution time. It takes a lot less time than that with temp3 (Increase the number of times the function is called to see the effect ). Mine is 4-way, So I need eight such functions spaced apart to start the thrash! The fun here is to see it happening after getting to read about it in books. Seeing it work is exhilarating. Try it and yeah…..Happy thrashing!

Yeah, this also gives the ideal ‘what not to do’ for writing code.