Last Updated: 2025-11-23 Sun 14:37

CMSC216 Lab11: Matrices and Cache Optimization

CODE DISTRIBUTION: lab11-code.zip

CHANGELOG: Empty

1 Rationale

Basic understanding of how the memory system works to fulfill load/store requests is essential when one wishes to optimize program performance. Having an intuition on which styles of coding exploit cache memories will offer great performance benefits on most modern computing architectures. This Lab analyzes a simple set of routines that have different performance due to their memory access pattern. The final problem is to create a more efficient version of a routine which accesses memory in a fashion that exploits cache more effectively increasing performance.

Associated Reading / Preparation

Bryant and O'Hallaron: Ch 6 on The Memory Hierarchy. Ch 6.5 and 6.6 may be particularly helpful

Grading Policy

Credit for this exercise is earned by completing the code/asnwers here and submitting a Zip of the work to Gradescope. Students are responsible to check that the results produced locally via make test are reflected on Gradescope after submitting their completed Zip. Successful completion earns 1 Engagement Point.

Lab Exercises are open resource/open collaboration and students are encouraged to cooperate on labs. Students may submit work as groups of up to 5 to Gradescope: one person submits then adds the names of their group members to the submission.

See the full policies in the course syllabus.

2 Codepack

The codepack for the HW contains the following files:

File State Description
Makefile Provided Makefile to buil the matsums_main program
matvec.h Provided Header file defining some types and functions for
matvec_util.c Provided Utility functions to manipulate matrices/vectors
matsums_funcs.c EDIT Small "algorithms" to time arithmetic operations
matsums_main.c EDIT Main function that times different summing techniques
QUESTIONS.txt EDIT Questions to answer
test_lab11.org Testing Tests for this lab
test_matsums.org Testing Code tests for the lab
testy Testing Test running scripts
gradescope-submit Provided Allows submission from the command line

3 Timing Notes

This lab involves optimizing the performance of a function and the behavior of doing so will vary from one machine to the next. If run on an arbitrary machine, the optimization techniques intended may or may not provide the expected speedup.

For evaluation purposes, log into the machine grace-aws.umd.edu to test the timing. This machine has known characteristics (processor speed and cache behavior) and should provide reasonably consistent results. Other machines like regular GRACE might work but are not guaranteed.

4 QUESTIONS.txt File Contents

Below are the contents of the QUESTIONS.txt file for the lab. Follow the instructions in it to complete the QUIZ and CODE questions for the lab.

                           _________________

                            LAB11 QUESTIONS
                           _________________


Exercise Instructions
=====================

  Follow the instructions below to experiment with topics related to
  this exercise.
  - For sections marked QUIZ, fill in an (X) for the appropriate
    response in this file. Use the command `make test-quiz' to see if
    all of your answers are correct.
  - For sections marked CODE, complete the code indicated. Use the
    command `make test-code' to check if your code is complete.
  - DO NOT CHANGE any parts of this file except the QUIZ sections as it
    may interfere with the tests otherwise.
  - If your `QUESTIONS.txt' file seems corrupted, restore it by copying
    over the `QUESTIONS.txt.bk' backup file.
  - When you complete the exercises, check your answers with `make test'
    and if all is well, create a zip file with `make zip' and upload it
    to Gradescope. Ensure that the Autograder there reflects your local
    results.
  - IF YOU WORK IN A GROUP only one member needs to submit and then add
    the names of their group.


Timing Note
===========

  This code has been timed on grace-aws.umd.edu to verify it behaves as
  expected in most cases but as with all timing code, system load and
  other circumstances can affect the actual timing results. If you
  strange times that seem to conflict with the expected answers, report
  it to staff to get help.


QUIZ: Questions matsums
=======================

(A) Vector and Matrix
~~~~~~~~~~~~~~~~~~~~~

  Examine the header file `matvec.h' which gives type definitions and
  declares functions associated with a simple matrix and vector type.

  Which of the following best describes how the `matrix_t` type in the
  lab stores numerical matrices?

  - ( ) The 2D matrix is stored as a 2D array in the field data; this
    allows direct access via 2D indexing via `mat.data[i][j]`
  - ( ) The type `matrix_t` is defined as the same as a `int[][]` so the
    2D data can be directly accessed via `mat[i][j]`
  - ( ) The 2D matrix is stored as a 1D array in the mat.data field. 2D
    accesses like `(i,j)` must be translated to 1D locations via the
    formula `mat.data[i*mat.cols) + j]`
  - ( ) The 2D matrix is stored in a hash table with the `mat.data`
    field storing data and a hashing function used to compute the 1D
    location of a 2D element.


(B) Getting and Setting Elements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  For the `matrix_t' and `vector_t' types, convenient means to get and
  set elements is provided. This mechanism is used in the utility
  functions in `matvec_util.c' and defined in `matvec.h'.

  Which of these best describes the means to access matrix elements
  provided and used in the lab?
  - ( ) The functions `MGET(mat,i,j)` and `MSET(mat,i,j,x)` are
    implemented in the header to get and set elements of a `matrix_t`
  - ( ) The macros `MGET(mat,i,j)` and `MSET(mat,i,j,x)` are `#define`'d
    in the header to get and set elements of a `matrix_t`
  - ( ) Instructions are given about what formula and fields to use when
    accessing and changing elements of the matrix.
  - ( ) No easy way is given for to work with the `matrix_t` type or its
    elements.


(C) row_sums vs col_sums Timing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  Compile and run the `matsums_main' executable using the provided
  `Makefile' (typing `make' should suffice).


(A) row_sums vs col_sums Timing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  LOG INTO THE MACHINE grace-aws.umd.edu
  - This is a special TIMING MACHINE for which the times of this code
    have been tuned
  - Using a different machine for your timing may lead to problems

  Compile and run the `matsums_main' executable using the provided
  `Makefile' (typing `make' should suffice). Note how to run this
  program: pass the row and column sizes as command line parameters.
  ,----
  | >> ./matsums_main
  | usage: ./matsums_main <rows> <cols>
  `----

  Run this program with differing parameters which control the number of
  rows and columns in the matrix which is summed. Study the results,
  including on a large matrix with 18000 rows and 8000 columns.  Note
  any speed differences between the Row and Column sum routines on large
  matrices.

  The result of this timing indicated the following relative speeds of
  these two provided algorithms.

  - ( ) Summing ROWS was SLOWER than summing COLUMNS
  - ( ) Summing ROWS was the SAME speed as summing COLUMNS
  - ( ) Summing ROWS was FASTER than summing COLUMNS
  - ( ) The speed varies wildly from run to run so it is harder to say
    which of ROWS or COLUMNS is faster


(B) row_sums vs col_sums speed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  Examine the source code for the functions `row_sums()' and
  `col_sums()' in the file `matsums_funcs.c'.  Describe why the
  timing/speed differences you observed previously are occurring based
  on features of the source code you see and the layout of the
  `matrix_t' that is being summed. Consider especially the memory layout
  and how the that of elements visited in each approach maps to the
  actual data layout of the matrix data.

  Which of the following best describes a reason for the relative speeds
  of the algorithms?

  - ( ) The two algorithms had equal speed as they perform the same
    number of operations and have the same `O(R*C)` complexity.
  - ( ) Row Summing was Faster because it accessed memory sequentially
    while Column summing has a large stride in memory leading to poor
    cache performance.
  - ( ) Column Summing was Faster because it accessed memory
    sequentially while Row has a stride in memory leading to poor cache
    performance.
  - ( ) Column summing was slower because the Row summing algorithm
    exhibited better processor pipelining features allowing more
    instructions per cycle to be executed.


CODE: opt_col_sums()
====================

  Complete the function `opt_col_sums()' in file `matsums_funcs.c'. This
  function should have identical behavior to `col_sums()' (it sums the
  columns of the matrix into a provided array). However, it should be
  *optimized* so that it achieves performance near to that of the
  `row_sums()' function. To achieve this, re-arrange the loops to
  iterate as efficiently as possible with respect to the memory
  system. Feel free to ask course staff for hints on how to do this or
  do some online research.

  To time `opt_col_sums()', uncomment relevant blocks in the
  `matsums_main.c' program that are provided for it.

  Run your code on some of the large matrix examples previously used
  such as an 18,000 by 8,000 matrix.  If the speed for the optimized
  column version is close to that of the row sums, try running the test
  cases associated with the code via `make test-code'.

Submission

Follow the instructions at the end of Lab01 if you need a refresher on how to upload your completed exercise zip to Gradescope.


Web Accessibility
Author: Chris Kauffman (profk@umd.edu)
Date: 2025-11-23 Sun 14:37