CSCI 2021 HW14: Wrap-up and Review
CODE DISTRIBUTION: hw14-code.zip
- Download the code distribution every HW
- See further setup instructions below
CHANGELOG: Empty
1 Rationale
This HW reviews concepts from earlier assignments and lecture to prepare for the final exam. It is optional but provides some useful practice.
Associated Reading / Preparation
Review the following sections from Bryant and O'Hallaron:
- Ch 5 on code optimizations
- Ch 6 on the memory hierarchy especially writing cache friendly code
- Ch 9 on Virtual Memory and Paging
- Ch 7 on Linking and the ELF Format
- Previous chapters mentioned on the schedule to prepare for the comprehensive portion of the final exam.
Grading Policy
- There is no material to submit for this HW and it is not worth additional credit.
- There is an Exit Survey on Canvas which is linked under the Assignments section. This is an anonymous survey. Completing it will earn 1 additional Engagemen Point
- Students are also encouraged to complete the Student Rating of Teaching for the course here: https://srt.umn.edu/blue. A sufficient response rate will trigger a final exam question to be revealed during the last lecture.
2 Review Problem 1
Consider the following C program which runs a function and reports the addresses of several entities before pausing.
#include <unistd.h> #include <stdlib.h> #include <stdio.h> int important_func(int arg1, long arg2){ printf("%p : important_func()\n",&important_func); printf("%p : arg1\n",&arg1); printf("%p : arg2\n",&arg2); printf("program pid is %d\n",getpid()); printf("press any key to continue\n"); getchar(); } int main(int argc, char *argv[]){ int x = 1; long y = 2; important_func(x, y); return 0; }
Running the program creates the following output.
> gcc demoprog.c > ./a.out 0x5575f7169169 : important_func() 0x7ffea79aa9bc : arg1 0x7ffea79aa9b0 : arg2 program pid is 11045 press any key to continue
While the program is paused, pmap
is invoked in another terminal and
produces the following. Certain lines numbered in comments on the
right.
> pmap 11045 11045: a.out #Line 00005575f7168000 4K r---- a.out # 1 00005575f7169000 4K r-x-- a.out # 2 00005575f716a000 4K r---- a.out # 3 00005575f716b000 4K r---- a.out # 4 00005575f716c000 4K rw--- a.out # 5 00005575f8117000 132K rw--- [ anon ] # 6 00007fcdd5146000 136K r---- libc-2.29.so # 7 00007fcdd5168000 1328K r-x-- libc-2.29.so # 8 00007fcdd52b4000 304K r---- libc-2.29.so # 9 00007fcdd5300000 4K ----- libc-2.29.so #10 00007fcdd5301000 16K r---- libc-2.29.so #11 00007fcdd5305000 8K rw--- libc-2.29.so #12 00007fcdd5307000 24K rw--- [ anon ] #13 00007fcdd5340000 8K r---- ld-2.29.so #14 00007fcdd5342000 124K r-x-- ld-2.29.so #15 00007fcdd5361000 32K r---- ld-2.29.so #16 00007fcdd536a000 4K r---- ld-2.29.so #17 00007fcdd536b000 4K rw--- ld-2.29.so #18 00007fcdd536c000 4K rw--- [ anon ] #19 00007ffea798b000 132K rw--- [ stack ] #20 00007ffea79cf000 12K r---- [ anon ] #21 00007ffea79d2000 4K r-x-- [ anon ] #22 total 2296K
Answer the following questions.
- Which specific lines of
pmap
output correspond to which entities in the program? Aside from the address correspondence, what information thatpmap
prints supports this correspondence? - Despite the entities
important_func()
andarg1
appearing on the same line of C code, they are assigned very different memory addresses. Why is this the case? - We learned in our discussion of assembly programming that the first
6 arguments of functions are passed in registers and registers do
not have main memory addresses. However, the program prints out an
address for
arg1
andarg2
. Explain what is going on and how this discrepancy is resolved.
SOLUTION
3 Review Problem 2
The following is a compilation session that goes wrong. Study it and the errors that ensue.
> cat do_math.c # show code for program // Demo program using math operations. #include <stdio.h> #include <math.h> #include <unistd.h> int main(int argc, char *argv[]){ double e = M_E; double cos_e = cos(e); double sin_e = sin(e); double e_squared = pow(e,2.0); printf("E is %.3f\n",e); printf("cos(E) is %.3f\n",cos_e); printf("sin(E) is %.3f\n",sin_e); printf("E^2 is %.3f\n",e_squared); printf("program pid is %d\n",getpid()); printf("press any key to continue\n"); getchar(); return 0; } > gcc -c do_math.c # compile the code with -c option > file do_math.o do_math.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped > gcc -o math_prog do_math.c # compile the program to an executable /usr/bin/ld: /tmp/ccNWf9Yu.o: in function `main': do_math.c:(.text+0x26): undefined reference to `cos' /usr/bin/ld: do_math.c:(.text+0x3d): undefined reference to `sin' /usr/bin/ld: do_math.c:(.text+0x60): undefined reference to `pow' collect2: error: ld returned 1 exit status
- Why is it that the first
gcc -c
compilation succeeds while the secondgcc -o
fails? What is the difference between these two invocations? - What role does the
math.h
header file play in this compilation process? What would happen if it were removed? - What kind of error is reported in the second set of failures? Does
it require code changes to the
do_math.c
file? - How does one "fix" this problem to produce the ultimate executable.
SOLUTION
4 Review Problem 3
Continuing the previous problem with the do_math.c
program, while
running it the program, one can invoke pmap
to gather information on
the working memory of the program. The shows two simultaneous runs of
math_prog
along with its corresponding pmap
information.
#### SHELL 1 ###### #### SHELL 2 ###### > ./math_prog > ./math_prog E is 2.718 E is 2.718 cos(E) is -0.912 cos(E) is -0.912 sin(E) is 0.411 sin(E) is 0.411 E^2 is 7.389 E^2 is 7.389 program pid is 3334355 program pid is 3335206 press any key to continue press any key to continue #### SHELL 3 ###### #### SHELL 4 ###### > pmap 3334355 > pmap 3335206 3334355: ./math_prog 3335206: ./math_prog 000055eb074de000 4K r---- math_prog 000055d42f543000 4K r---- math_prog 000055eb074df000 4K r-x-- math_prog 000055d42f544000 4K r-x-- math_prog 000055eb074e0000 4K r---- math_prog 000055d42f545000 4K r---- math_prog 000055eb074e1000 4K r---- math_prog 000055d42f546000 4K r---- math_prog 000055eb074e2000 4K rw--- math_prog 000055d42f547000 4K rw--- math_prog 000055eb09132000 132K rw--- [ anon ] 000055d42f7fe000 132K rw--- [ anon ] 00007f8a8f301000 12K rw--- [ anon ] 00007f465f83c000 12K rw--- [ anon ] 00007f8a8f304000 148K r---- libc-2.30.so 00007f465f83f000 148K r---- libc-2.30.so 00007f8a8f329000 1332K r-x-- libc-2.30.so 00007f465f864000 1332K r-x-- libc-2.30.so 00007f8a8f476000 296K r---- libc-2.30.so 00007f465f9b1000 296K r---- libc-2.30.so 00007f8a8f4c0000 4K ----- libc-2.30.so 00007f465f9fb000 4K ----- libc-2.30.so 00007f8a8f4c1000 12K r---- libc-2.30.so 00007f465f9fc000 12K r---- libc-2.30.so 00007f8a8f4c4000 12K rw--- libc-2.30.so 00007f465f9ff000 12K rw--- libc-2.30.so 00007f8a8f4c7000 16K rw--- [ anon ] 00007f465fa02000 16K rw--- [ anon ] 00007f8a8f4cb000 60K r---- libm-2.30.so 00007f465fa06000 60K r---- libm-2.30.so 00007f8a8f4da000 624K r-x-- libm-2.30.so 00007f465fa15000 624K r-x-- libm-2.30.so 00007f8a8f576000 612K r---- libm-2.30.so 00007f465fab1000 612K r---- libm-2.30.so 00007f8a8f60f000 4K r---- libm-2.30.so 00007f465fb4a000 4K r---- libm-2.30.so 00007f8a8f610000 4K rw--- libm-2.30.so 00007f465fb4b000 4K rw--- libm-2.30.so 00007f8a8f611000 8K rw--- [ anon ] 00007f465fb4c000 8K rw--- [ anon ] 00007f8a8f645000 8K r---- ld-2.30.so 00007f465fb80000 8K r---- ld-2.30.so 00007f8a8f647000 124K r-x-- ld-2.30.so 00007f465fb82000 124K r-x-- ld-2.30.so 00007f8a8f666000 32K r---- ld-2.30.so 00007f465fba1000 32K r---- ld-2.30.so 00007f8a8f66f000 4K r---- ld-2.30.so 00007f465fbaa000 4K r---- ld-2.30.so 00007f8a8f670000 4K rw--- ld-2.30.so 00007f465fbab000 4K rw--- ld-2.30.so 00007f8a8f671000 4K rw--- [ anon ] 00007f465fbac000 4K rw--- [ anon ] 00007ffcf2746000 132K rw--- [ stack ] 00007fff6782d000 132K rw--- [ stack ] 00007ffcf27fc000 12K r---- [ anon ] 00007fff67876000 12K r---- [ anon ] 00007ffcf27ff000 4K r-x-- [ anon ] 00007fff67879000 4K r-x-- [ anon ] ffffffffff600000 4K --x-- [ anon ] ffffffffff600000 4K --x-- [ anon ] total 3624K total 3624K
- The
pmap
listings show entries forlibc-2.30.so
andlibm-2.30.so
. What are these entities and are they present in thedo_math
program? - Identify several parts of the working memory of the two
math_prog
processes that are distinct from each other. Also identify several parts that are shared and describe how this sharing is possible. - Both processes report using
3624K
of memory. Do you expect that once one process finishes, there will be3264K
of memory made available? Describe why or why not.
SOLUTION
5 Review Problem 4
Below are tables showing values in Main Memory and the state of a
small Cache. Beneath this is a short code fragment that sums elements
from main memory. The code has a user parameter stride
. For strides
of 1,2,3,4, work through the code changing cache and counting how many
cache hits/misses occur with the given stride. Show the final state of
cache in each case.
Main Memory
| Addr | Addr Bits | Value | |------+----------------+-------| | 20 | 00 10 0000 | 9 | | 24 | 00 10 0100 | 10 | | 28 | 00 10 1000 | 11 | | 2C | 00 10 1100 | 12 | | 30 | 00 11 0000 | 13 | | 34 | 00 11 0100 | 14 | | 38 | 00 11 1000 | 15 | | 3C | 00 11 1100 | 16 | | 40 | 01 00 0000 | 17 | | 44 | 01 00 0100 | 18 | | 48 | 01 00 1000 | 19 | | 4C | 01 00 1100 | 20 | | 50 | 01 01 0000 | 21 | | 54 | 01 01 0100 | 22 | | 58 | 01 01 1000 | 23 | | 5C | 01 01 1100 | 24 | | 60 | 01 10 0000 | 25 | | 64 | 01 10 0100 | 26 | | 68 | 01 10 1000 | 27 | | 6C | 01 10 1100 | 28 | |------+----------------+-------| | | Tag Set Offset | |
Cache
- Direct-mapped: 1 line per set
- 16-byte lines = 4-bit offset
- 4 Sets = 2-bit index
- 8-bit Address = 2-bit tag
- Total Cache Size = 64 bytes 4 sets * 16 bytes
- Mostly cold with some data from an earlier part of the code
| | | | Blocks/Line | | Set | V | Tag | 0-3 4-7 8-11 12-15 | |-----+---+-----+-----------------------| | 00 | 1 | 00 | 1 2 3 4 | | 01 | 0 | - | - | | 10 | 0 | - | - | | 11 | 0 | - | - | |-----+---+-----+-----------------------| | Set | V | Tag | 0-3 4-7 8-11 12-15 |
Code
int *data = (int *) 0x20; // address of start of the array int stride; scanf("%d", &stride); // user enters 1, 2, 3, or other values int sum = 0; for(int i=0; i<20; i += stride){ sum += data[i]; }
Stats for Strides:
Show sum, number of hits/misses, and final cache state for strides of 1,2,3,4.
- Stride 1: sum, misses/hits, final cache state
- Stride 2: sum, misses/hits, final cache state
- Stride 3: sum, misses/hits, final cache state
- Stride 4: sum, misses/hits, final cache state
SOLUTION
6 Review Problem 5
The code in reversal_benchmark.c
implements two functions which reverse the
elements of an array. They take markedly different approaches.
void reverse_arr1(int *arr, long size){ int *tmp = malloc(sizeof(int)*size); for(long i=0; i<size; i++){ tmp[i] = arr[size-i-1]; } for(long i=0; i<size; i++){ arr[i] = tmp[i]; } free(tmp); } void reverse_arr2(int *arr, long size){ for(long i=0; i<size/2; i++){ int tmp = arr[i]; arr[i] = arr[size-i-1]; arr[size-i-1] = tmp; } }
(A) Predictions
Predict which of these two functions will run faster based on their code characteristics. Justify your answer by contrasting the features of the two reversal functions.
SOLUTION
(B) Timing
Modify the provided reversal_benchmark.c
file to perform timing calculations
on the two functions as they are called on different sizes of arrays.
Print runtimes in a tabular format.
Make sure to compile/run with a convention like the following:
> gcc -Og reversal_benchmark.c # compile with minimal optimizations > ./a.out # show usage usage: ./a.out <min_pow2> <max_pow2> <repeats> > ./a.out 10 20 100 # run benchmark, size 2^10 to 2^20, 100 repeats per SIZE rev1 rev2 1024 ??? ??? 2048 ??? ??? 4096 ??? ??? 8192 ??? ??? 16384 ??? ??? 32768 ??? ??? 65536 ??? ??? 131072 ??? ??? 262144 ??? ??? 524288 ??? ??? 1048576 ??? ???
Paste the C code you wrote below to show you how you timed the function runs.
SOLUTION
(C) Analysis
Paste the table of numbers your code produced for timing the two reversal functions. Describe if the one you predicted would be faster actually was measured to be faster.
SOLUTION
(D) Optimized Reversal
After determining the faster reversal function, modify it further in an attempt to further improve its performance.