Last Updated: 2024-11-16 Sat 10:14

CMSC216 HW11: Matrices and Memory Optimization

CODE DISTRIBUTION: hw11-code.zip

  • Download the code distribution
  • See further setup instructions below

CHANGELOG:

1 Rationale

Basic understanding of how the memory system works to fulfill load/store requests is essential when one wishes to optimize program performance. Having an intuition on which styles of coding exploit cache memories will offer great performance benefits on most modern computing architectures.

This HW analyzes a simple set of routines that have different performance due to their memory access pattern. The final problem is to create a more efficient version of a routine which accesses memory in a fashion that exploits cache more effectively increasing performance.

Associated Reading / Preparation

Bryant and O'Hallaron: Ch 6 on The Memory Hierarchy. Ch 6.5 and 6.6 may be particularly helpful

Grading Policy

Credit for this HW is earned by taking the associated HW Quiz which is linked under Gradescope. The quiz will ask similar questions as those that are present in the QUESTIONS.txt file and those that complete all answers in QUESTIONS.txt should have no trouble with the quiz.

Homework and Quizzes are open resource/open collaboration. You must submit your own work but you may freely discuss HW topics with other members of the class.

See the full policies in the course syllabus.

2 Codepack

The codepack for the HW contains the following files:

File State Description
Makefile Provided Makefile to buil the matsums_main program
matvec.h Provided Header file defining some types and functions for
matvec_util.c Provided Utility functions to manipulate matrices/vectors
matsums_funcs.c EDIT Small "algorithms" to time arithmetic operations
matsums_main.c EDIT Main function that times different summing techinques
QUESTIONS.txt EDIT Questions to answer

3 Overview and Background

This HW deals with matrices and vectors. In C these are typically 2D arrays (matrices) and 1D arrays (vectors), though to increase performance of the memory system, other approaches are sometimes taken. Part of this lab is to examine the matrix_t and vector_t types provided as they may pertain to later assignments.

The lab focuses on analyzing two simple routines

  • sum_rows() which sums the rows of a matrix into an appropriately sized vector.
  • sum_cols() which sums the columns of a matrix into an appropriately sized vector.

Similar to discussion in class, these two have different performance properties that can only be explained through knowledge of the memory hierarchy.

4 What to Understand

Ensure that you understand

  • How different access patterns in memory can lead to significantly different speeds in code
  • How in some cases, re-arranging code to access memory in a different pattern can dramatically improve performance

5 Questions

Analyze the files in the provided codepack and answer the questions given in QUESTIONS.txt.

                           _________________

                            HW 11 QUESTIONS
                           _________________


Write your answers to the questions below directly in this text file to
prepare for the associated quiz. Credit for the HW is earned by
completing the associated online quiz on Gradescope.


Timing Note
===========

  This code has been timed on GRACE to verify it behaves as expected in
  most cases but as with all timing code, system load and other
  circumstances can affect the actual timing results. If you strange
  times that seem to conflict with the expected answers, report it to
  staff to get help.


PROBLEM 1: Code Overview
========================

(A) Vector and Matrix
~~~~~~~~~~~~~~~~~~~~~

  Examine the header file `matvec.h' which gives type definitions and
  declares functions associated with a simple matrix and vector type.

  For the matrix type `matrix_t' with R rows and C columns, how is a 2D
  matrix actually laid out in memory?


(B) Getting and Setting Elements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  For the `matrix_t' and `vector_t' types, convenient means to get and
  set elements is provided. This mechanism is used in the utility
  functions in `matvec_util.c' and defined in `matvec.h'.

  Describe how one would get element (R,C) of `matrix_t' or set it to
  value Z.  Why are these mechanisms not functions?


PROBLEM 2: Timing Rows vs Columns
=================================

(A) row_sums vs col_sums Timing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  Compile and run the `matsums_main' executable using the provided
  `Makefile' (typing `make' should suffice).

  Run the code on `csel-kh1250-NN.cselabs.umn.edu' to ensure you get the
  expected performance experience.

  Run this program with differing parameters which control the number of
  rows and columns in the matrix which is summed. Show some example runs
  with different parameters including on large square matrix with 16000
  rows and 8000 columns.

  Note any speed differences between the two on large matrices.


(B) row_sums vs col_sums speed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  Examine the source code for the functions `row_sums()' and
  `col_sums()' in the file `matsums_funcs.c'.  Describe why the
  timing/speed differences you observed previously are occurring based
  on features of the source code you see and the layout of the
  `matrix_t' that is being summed.


PROBLEM 3: opt_col_sums()
=========================

  Complete the function `opt_col_sums()' in file `matsums_funcs.c'. This
  function should have identical behavior to `col_sums()' (it sums the
  columns of the matrix into a provided array). However, it should be
  *optimized* so that it achieves performance near to that of the
  `row_sums()' function. To achieve this, re-arrange the loops to
  iterate as efficiently as possible with respect to the memory
  system. Feel free to ask course staff for hints on how to do this or
  do some online research.

  To time `opt_col_sums()', uncomment relevant blocks in the
  `matsums_main.c' program that are provided for it.

  Paste your source code and a copy of the timing results of running
  `matsums_main' on an 16000 by 8000 matrix.

Author: Chris Kauffman (profk@umd.edu)
Date: 2024-11-16 Sat 10:14