CMSC216 Lab11: Matrices and Cache Optimization
- Due: 11:59pm Mon 01-Dec-2025 on Gradescope
- Approximately 1.00% of total grade
CODE DISTRIBUTION: lab11-code.zip
CHANGELOG: Empty
Table of Contents
1 Rationale
Basic understanding of how the memory system works to fulfill load/store requests is essential when one wishes to optimize program performance. Having an intuition on which styles of coding exploit cache memories will offer great performance benefits on most modern computing architectures. This Lab analyzes a simple set of routines that have different performance due to their memory access pattern. The final problem is to create a more efficient version of a routine which accesses memory in a fashion that exploits cache more effectively increasing performance.
Associated Reading / Preparation
Bryant and O'Hallaron: Ch 6 on The Memory Hierarchy. Ch 6.5 and 6.6 may be particularly helpful
Grading Policy
Credit for this exercise is earned by completing the code/asnwers here
and submitting a Zip of the work to Gradescope. Students are
responsible to check that the results produced locally via make test
are reflected on Gradescope after submitting their completed
Zip. Successful completion earns 1 Engagement Point.
Lab Exercises are open resource/open collaboration and students are encouraged to cooperate on labs. Students may submit work as groups of up to 5 to Gradescope: one person submits then adds the names of their group members to the submission.
See the full policies in the course syllabus.
2 Codepack
The codepack for the HW contains the following files:
| File | State | Description |
|---|---|---|
Makefile |
Provided | Makefile to buil the matsums_main program |
matvec.h |
Provided | Header file defining some types and functions for |
matvec_util.c |
Provided | Utility functions to manipulate matrices/vectors |
matsums_funcs.c |
EDIT | Small "algorithms" to time arithmetic operations |
matsums_main.c |
EDIT | Main function that times different summing techniques |
QUESTIONS.txt |
EDIT | Questions to answer |
test_lab11.org |
Testing | Tests for this lab |
test_matsums.org |
Testing | Code tests for the lab |
testy |
Testing | Test running scripts |
gradescope-submit |
Provided | Allows submission from the command line |
3 Timing Notes
This lab involves optimizing the performance of a function and the behavior of doing so will vary from one machine to the next. If run on an arbitrary machine, the optimization techniques intended may or may not provide the expected speedup.
For evaluation purposes, log into the machine grace-aws.umd.edu to
test the timing. This machine has known characteristics (processor
speed and cache behavior) and should provide reasonably consistent
results. Other machines like regular GRACE might work but are not
guaranteed.
4 QUESTIONS.txt File Contents
Below are the contents of the QUESTIONS.txt file for the lab.
Follow the instructions in it to complete the QUIZ and CODE questions
for the lab.
_________________
LAB11 QUESTIONS
_________________
Exercise Instructions
=====================
Follow the instructions below to experiment with topics related to
this exercise.
- For sections marked QUIZ, fill in an (X) for the appropriate
response in this file. Use the command `make test-quiz' to see if
all of your answers are correct.
- For sections marked CODE, complete the code indicated. Use the
command `make test-code' to check if your code is complete.
- DO NOT CHANGE any parts of this file except the QUIZ sections as it
may interfere with the tests otherwise.
- If your `QUESTIONS.txt' file seems corrupted, restore it by copying
over the `QUESTIONS.txt.bk' backup file.
- When you complete the exercises, check your answers with `make test'
and if all is well, create a zip file with `make zip' and upload it
to Gradescope. Ensure that the Autograder there reflects your local
results.
- IF YOU WORK IN A GROUP only one member needs to submit and then add
the names of their group.
Timing Note
===========
This code has been timed on grace-aws.umd.edu to verify it behaves as
expected in most cases but as with all timing code, system load and
other circumstances can affect the actual timing results. If you
strange times that seem to conflict with the expected answers, report
it to staff to get help.
QUIZ: Questions matsums
=======================
(A) Vector and Matrix
~~~~~~~~~~~~~~~~~~~~~
Examine the header file `matvec.h' which gives type definitions and
declares functions associated with a simple matrix and vector type.
Which of the following best describes how the `matrix_t` type in the
lab stores numerical matrices?
- ( ) The 2D matrix is stored as a 2D array in the field data; this
allows direct access via 2D indexing via `mat.data[i][j]`
- ( ) The type `matrix_t` is defined as the same as a `int[][]` so the
2D data can be directly accessed via `mat[i][j]`
- ( ) The 2D matrix is stored as a 1D array in the mat.data field. 2D
accesses like `(i,j)` must be translated to 1D locations via the
formula `mat.data[i*mat.cols) + j]`
- ( ) The 2D matrix is stored in a hash table with the `mat.data`
field storing data and a hashing function used to compute the 1D
location of a 2D element.
(B) Getting and Setting Elements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For the `matrix_t' and `vector_t' types, convenient means to get and
set elements is provided. This mechanism is used in the utility
functions in `matvec_util.c' and defined in `matvec.h'.
Which of these best describes the means to access matrix elements
provided and used in the lab?
- ( ) The functions `MGET(mat,i,j)` and `MSET(mat,i,j,x)` are
implemented in the header to get and set elements of a `matrix_t`
- ( ) The macros `MGET(mat,i,j)` and `MSET(mat,i,j,x)` are `#define`'d
in the header to get and set elements of a `matrix_t`
- ( ) Instructions are given about what formula and fields to use when
accessing and changing elements of the matrix.
- ( ) No easy way is given for to work with the `matrix_t` type or its
elements.
(C) row_sums vs col_sums Timing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compile and run the `matsums_main' executable using the provided
`Makefile' (typing `make' should suffice).
(A) row_sums vs col_sums Timing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
LOG INTO THE MACHINE grace-aws.umd.edu
- This is a special TIMING MACHINE for which the times of this code
have been tuned
- Using a different machine for your timing may lead to problems
Compile and run the `matsums_main' executable using the provided
`Makefile' (typing `make' should suffice). Note how to run this
program: pass the row and column sizes as command line parameters.
,----
| >> ./matsums_main
| usage: ./matsums_main <rows> <cols>
`----
Run this program with differing parameters which control the number of
rows and columns in the matrix which is summed. Study the results,
including on a large matrix with 18000 rows and 8000 columns. Note
any speed differences between the Row and Column sum routines on large
matrices.
The result of this timing indicated the following relative speeds of
these two provided algorithms.
- ( ) Summing ROWS was SLOWER than summing COLUMNS
- ( ) Summing ROWS was the SAME speed as summing COLUMNS
- ( ) Summing ROWS was FASTER than summing COLUMNS
- ( ) The speed varies wildly from run to run so it is harder to say
which of ROWS or COLUMNS is faster
(B) row_sums vs col_sums speed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Examine the source code for the functions `row_sums()' and
`col_sums()' in the file `matsums_funcs.c'. Describe why the
timing/speed differences you observed previously are occurring based
on features of the source code you see and the layout of the
`matrix_t' that is being summed. Consider especially the memory layout
and how the that of elements visited in each approach maps to the
actual data layout of the matrix data.
Which of the following best describes a reason for the relative speeds
of the algorithms?
- ( ) The two algorithms had equal speed as they perform the same
number of operations and have the same `O(R*C)` complexity.
- ( ) Row Summing was Faster because it accessed memory sequentially
while Column summing has a large stride in memory leading to poor
cache performance.
- ( ) Column Summing was Faster because it accessed memory
sequentially while Row has a stride in memory leading to poor cache
performance.
- ( ) Column summing was slower because the Row summing algorithm
exhibited better processor pipelining features allowing more
instructions per cycle to be executed.
CODE: opt_col_sums()
====================
Complete the function `opt_col_sums()' in file `matsums_funcs.c'. This
function should have identical behavior to `col_sums()' (it sums the
columns of the matrix into a provided array). However, it should be
*optimized* so that it achieves performance near to that of the
`row_sums()' function. To achieve this, re-arrange the loops to
iterate as efficiently as possible with respect to the memory
system. Feel free to ask course staff for hints on how to do this or
do some online research.
To time `opt_col_sums()', uncomment relevant blocks in the
`matsums_main.c' program that are provided for it.
Run your code on some of the large matrix examples previously used
such as an 18,000 by 8,000 matrix. If the speed for the optimized
column version is close to that of the row sums, try running the test
cases associated with the code via `make test-code'.
Submission
Follow the instructions at the end of Lab01 if you need a refresher on how to upload your completed exercise zip to Gradescope.