Project P4
In this part, you will improve a matrix multiplication program so that it makes better use of the cache. You will work in the pa5/matMul/ and pa5/cacheBlocking/ directories. The pa5/matMul/ directory contains a complete matrix multiplication program (pa5/matMul/matMul.c), its testing script (pa5/matMul/autograder.py), test cases in pa5/matMul/tests/, and expected answers in pa5/matMul/answers/. You will write your optimized version of matrix multiplication in pa5/cacheBlocking/cacheBlocking.c.
Correctness
First, your program in cacheBlocking.c must compute the matrix product correctly. You can test it with the testing harness in pa5/matMul/. The pa5/cacheBlocking/autograder.py script also checks that the matrix product is correct.
Generating memory traces
Second, you can use valgrind to generate memory access traces using this command from the pa5/cacheBlocking/ directory:
valgrind --tool=lackey --trace-mem=yes ./cacheBlocking ../matMul/tests/matrix_a_2x2.txt ../matMul/tests/matrix_b_2x2.txt
In practice, though, you can and should simply use the pa5/cacheBlocking/autograder.py script, which calls valgrind as above to generate the memory traces.
The pa5/cacheBlocking/tests/ directory contains the memory access traces for the baseline pa5/matMul/matMul program that you are competing against.
Simulating memory accesses on a cache simulator
Third, you can use the reference simulator pa5/csim-ref to simulate the memory traces. For this part of the assignment, assume a 256-byte, 4-way set-associative cache with 16-byte blocks and LRU replacement (this design should sound familiar).
The pa5/cacheBlocking/answers/ directory contains the summary statistics for the baseline pa5/matMul/matMul program that you are competing against. Optimize your cacheBlocking.c program so that it performs better than the baseline on the cache design above. For full credit, your program should have both fewer misses and fewer evictions than the baseline.