CMSC216 Lab10: I/O Redirection / Cache Effects
- Due: 11:59pm Sun 23-Nov-2025 on Gradescope
- Approximately 1.00% of total grade
CODE DISTRIBUTION: lab10-code.zip
- Download the code distribution
- See further setup instructions below
CHANGELOG: Empty
1 Rationale
Unix maintains a table of open File Descriptors for each running
process. Using the dup() and dup2() system calls, programs can
manipulate this table to achieve interesting effects, notably
redirection of output from standard locations to other places. This
exercise demonstrates some common techniques for doing so and will
acquaint students with the basics of how the file descriptor table
works and how it is inherited by child processes.
Timing stretches of code is often done via the clock()
function. This lab introduces that function and demonstrates its use
to time loops involving several types of structs that are arranged
differently in memory. The timing allows one to observe how struct
layout affects memory cache performance which reinforces the central
idea of efficient memory access: sequential access patterns tend to be
better than strided and random access patterns.
Grading Policy
Credit for this exercise is earned by completing the code/asnwers here
and submitting a Zip of the work to Gradescope. Students are
responsible to check that the results produced locally via make test
are reflected on Gradescope after submitting their completed
Zip. Successful completion earns 1 Engagement Point.
Lab Exercises are open resource/open collaboration and students are encouraged to cooperate on labs. Students may submit work as groups of up to 5 to Gradescope: one person submits then adds the names of their group members to the submission.
See the full policies in the course syllabus.
2 Codepack
The codepack for this exercise is linked at the top of this document. Always download it and unzip/unpack it. It should contain the following files which are briefly described.
| File | Use | Description |
|---|---|---|
QUESTIONS.txt |
EDIT | Questions to answer: fill in the multiple choice selections in this file. |
switch_stdout.c |
Study | Problem 1 C file to study to answer QUIZ questions |
childout_query.c |
EDIT | Problem 1 code outline to complete |
clock_demo.c |
Provided | Problem 2 demo on how the clock() function is used |
struct_stride.c |
EDIT | Problem 2 code used to observe CPU timing differences |
QUESTIONS.txt.bk |
Backup | Backup copy of the original file to help revert if needed |
Makefile |
Build | Enables make test and make zip |
testy |
Testing | Test running scripts |
test_lab10.org |
Testing | Tests for this exercise |
gradescope-submit |
Provided | Allows submission from the command line |
3 Problem 1: I/O Redirection
Programs often need to deal with open files for reading and
writing. The UNIX Operating System (Linux included in this) maintains
a data structure called the File Descriptor Table for all open
files. Some entries in this table are automatically created like
Standard Input and Standard Output. Others are created via the
open() system call. The table is maintained in Kernel Space and can
only be altered via system calls like open() / close() / ~dup() /
dup2().
It is useful to have some diagrams of how the dup() and dup2()
system calls manipulate the table of file descriptors. The following
diagrams will be discussed in lecture and may be used by course staff
to assist students in understanding how programs like
switch_stdout.c work.
Fork and Child File Descriptors
Figure 1: Effects of open()'ing a file then calling fork() : the child and parent both refer to the same open file.
dup() and dup2() System calls
Figure 2: LEFT: Effect of calling dup() to create a duplicate file descriptor table entry. RIGHT: Effect of calling dup2() to overwrite on file descriptor entry with another.
4 Problem 2 Background: The clock() Function
The code block below illustrates the basic usage pattern for the
clock() function.
#include <time.h> // for clock() and clock_t
{
clock_t begin = clock(); // current cpu moment
Perform computation that takes a while;
clock_t end = clock(); // later cpu moment
double cpu_time = // convert into seconds
((double) (end-begin)) / CLOCKS_PER_SEC;
printf("Elapsed CPU Time: %f second\n", cpu_time);
}
A few caveats are worth pointing out.
- The
clock_ttype is often a large integer type likeunsigned longwhich is why one can perform subtraction using it. Don't rely on this being the case and just use the type indicated as shown. clock()itself returns a number corresponding to the number of CPU "ticks" which have occurred while the program runs. This requires conversion into the number of seconds shown above. It makes use of theCLOCKS_PER_SECONDconstant which is included viatime.h.- The time computed by this method is equivalent to the
usertime reported by thetimeutility: it is how much CPU time the user program has used. It is relevant to timing computational loops but is complemented by "wall time" which requires use of different timing functions likegettimeofday()to compute. WARNING: Timing code runs is inherently noisy and will vary from one run to the next.
clock()is reliable for timing computations that take around 0.001 seconds (a thousandth of a second). For times shorter than that, the variations in timing will likely be nearly as large as the total time which makes timing shorter activities unreliable.Adjust program parameters like the number of loop iterations so reported times are at least 1e-03 seconds. Ideally times should be larger, in the 1e-01 second range to be trustworthy.
5 QUESTIONS.txt File Contents
Below are the contents of the QUESTIONS.txt file for the exercise.
Follow the instructions in it to complete the QUIZ and CODE questions
for the exercise.
_________________
LAB10 QUESTIONS
_________________
Exercise Instructions
=====================
Follow the instructions below to experiment with topics related to
this exercise.
- For sections marked QUIZ, fill in an (X) for the appropriate
response in this file. Use the command `make test-quiz' to see if
all of your answers are correct.
- For sections marked CODE, complete the code indicated. Use the
command `make test-code' to check if your code is complete.
- DO NOT CHANGE any parts of this file except the QUIZ sections as it
may interfere with the tests otherwise.
- If your `QUESTIONS.txt' file seems corrupted, restore it by copying
over the `QUESTIONS.txt.bk' backup file.
- When you complete the exercises, check your answers with `make test'
and if all is well, create a zip file with `make zip' and upload it
to Gradescope. Ensure that the Autograder there reflects your local
results.
- IF YOU WORK IN A GROUP only one member needs to submit and then add
the names of their group.
PROBLEM 1 QUIZ: Questions on switch_stdout.c
============================================
Analyze the `switch_stdout.c' program. Compile and run it via
,----
| > make switch_stdout
| ...
| > ./switch_stdout
| ...
`----
Analyze the code and focus your attention on the use of `open() /
dup() / dup2()' which this program demonstrates.
Answer the following Questions about the techniques used in this
program. You may need to consult the Manual Page / Documentation on
some functions to answer confidently.
Program Output
~~~~~~~~~~~~~~
Which of the following is the output for `switch_stdout' when run?
(each of 1. 2. 3. appear on separate lines in the output)
- ( ) 1. Now you see me. 2. Now you don't! 3. How mysterious...
- ( ) 1. Now you see me. 2. Now you don't!
- ( ) 1. Now you see me. 3. How mysterious...
- ( ) 1. Now you see me.
open() system call
~~~~~~~~~~~~~~~~~~
The `open()' system call is used to open a file for writing in the
example. What is returned by this system call?
- ( ) A `FILE *' which is passed to subsequent I/O operations or
`NULL' for failure
- ( ) An integer file descriptor which is >= 0 for success and -1 for
failure
- ( ) An integer return code that is 1 for success and 0 for failure
- ( ) A `char *' which is the name of the opened file or `NULL' for
failure
Use of dup()
~~~~~~~~~~~~
Which of the following best describes how the `dup()' system call is
used in `switch_stdout.c'?
- ( ) It creates a duplicate of a file descriptor allowing standard
output to be restored to the screen late in the program.
- ( ) It manipulates the file descriptor table so output that would go
to the screen goes into a file instead.
- ( ) It duplicates an existing file creating an efficient copy of it
on disk.
- ( ) It creates a child process that prints to a file instead of the
screen.
Use of dup2()
~~~~~~~~~~~~~
Which of the following best describes how the `dup2()' system call is
used in `switch_stdout.c' when it is first called?
- ( ) It creates a duplicate of a file descriptor allowing standard
output to be restored to the screen late in the program.
- ( ) It manipulates the file descriptor table so output that would go
to the screen goes into a file instead.
- ( ) It duplicates an existing file creating an efficient copy of it
on disk.
- ( ) It creates a child process that prints to a file instead of the
screen.
printf() changes in behavior
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Good old `printf()' is used in `switch_stdout.c' in several places but
seems to change its behavior in some of these spots. Which of the
following best describes this variation in behavior?
- ( ) `printf()' is called with different arguments that cause it to
print to different destinations, sometimes standard output,
sometimes a file
- ( ) `printf()' is called the same way in each case but automatically
begins printing to a file that is `open()''d and when it is
`close()''d, `printf()' reverts to printing to the screen
- ( ) `printf()' is called the same in each case and always prints to
standard output but by changing what is in the file descriptor table
at that position, output goes to the screen or to a file.
PROBLEM 1 CODE: childout_query.c Program
========================================
Fill in the template code provided in `childout_query.c'. The intent
of the program is to build off of the previous system calls like
`fork() / exec() / stat()' and combine then with the newly introduced
`dup2()' faclity to accomplish the following:
1. Specify an output file, a query character, and a command to run on
the command line
2. Fork and Exec a child process to run the given command and redirect
output into the named file
3. Have the parent wait until the child is finished
4. The parent then reports how many bytes of output are in the child's
output file
5. The parent opens the child output file, scans and prints it all
out, and reports how many times the query_letter appears in the
output
Below is a demonstration of how the program is meant to work.
,----
| >> childout_query
| usage: childout_query <childfile> <query_letter> <child_arg0> [child_arg1] ...
|
| >> childout_query test.txt 3 seq 31 35
| Parent waiting for child to complete
| Child redirecting output to 'test.txt', then exec()ing
| test-results
| Child complete, exit code 0
| child produced 15 bytes of output
| Scanning child output for '3'
| 31
| 32
| 33
| 34
| 35
| child output contained 6 occurrences of '3'
`----
Note that the 15 bytes of output is due to there being 5 lines with 3
characters each: all lines comprise two digits and a newline (\n).
Here is another example where an `ls' command is run as the child
process.
,----
| >> childout_query x.txt u ls /
| Parent waiting for child to complete
| Child redirecting output to 'x.txt', then exec()ing
| test-results
| Child complete, exit code 0
| child produced 99 bytes of output
| Scanning child output for 'u'
| bin
| boot
| dev
| etc
| home
| lib
| lib64
| lost+found
| mnt
| opt
| proc
| root
| run
| sbin
| srv
| swapfile
| sys
| tmp
| usr
| var
| child output contained 3 occurrences of 'u'
`----
PROBLEM 2: clock_demo.c Program
===============================
Demoers will walk through the `clock_demo.c' program to show how the
`clock()' function is used in practice. Students should look
carefully at the techniques used to time the two different sections of
code and print that timing info. These will be needed to fill in the
subsequent programs.
Running the `clock_demo' program on the command line will produce
results that look like the following:
,----
| >> make clock_demo
| gcc -Wall -Werror -g -Og -Wall -Werror -g -Og clock_demo.c -o clock_demo
|
| >> ./clock_demo
| usage: ./clock_demo <arrlen> <repeats>
|
| >> ./clock_demo 1000 1000
| Summing array length 1000 with 1000 repeats, ascending
| Summing array length 1000 with 1000 repeats, descending
| method: sum ascending CPU time: 2.3750e-03 sec sum: 499500
| method: sum descending CPU time: 1.5760e-03 sec sum: 499500
|
| >> ./clock_demo 100000 1000
| Summing array length 100000 with 1000 repeats, ascending
| Summing array length 100000 with 1000 repeats, descending
| method: sum ascending CPU time: 6.6969e-02 sec sum: 704982704
| method: sum descending CPU time: 6.2286e-02 sec sum: 704982704
|
| >> ./clock_demo 100000 10000
| Summing array length 100000 with 10000 repeats, ascending
| Summing array length 100000 with 10000 repeats, descending
| method: sum ascending CPU time: 6.2730e-01 sec sum: 704982704
| method: sum descending CPU time: 6.1995e-01 sec sum: 704982704
`----
PROBLEM 2 CODE: `struct_stride.c' Program
=========================================
The provided `struct_stride.c' program has a number of TODO items in
it related to timing several computations and reporting their results.
It is best to complete these items first and then attempt to answer
the quiz questions as some questions require running the program and
observing timing results.
Use the lab's description of how the `clock()' function works to
complete TODO items and print the results.
When completed, the program can be run as show below:
,----
| >> ./struct_stride
| usage: ./struct_stride <arr_length> <num_iters>
|
| >> ./struct_stride 10000000 100
| method: int_field_base CPU time: 1.2345e-01 sec sum: 0
| method: arr_field_base CPU time: 1.2345e-01 sec sum: 0
| method: int_field_optm CPU time: 1.2345e-01 sec sum: 0
| method: arr_field_optm CPU time: 1.2345e-01 sec sum: 0
`----
NOTE: the timing information has intentionally been changed to read
1.2345e-01 as calculating this timing information is part of the lab.
PROBLEM 2 QUIZ: Timing `struct_stride.c' Runs
=============================================
NOTE: timing code varies from one machine to the next. The answers
below were tested on GRACE and appear to be stable but system load may
affect the outcome.
Relative Speed of Struct Layouts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
After adding in calls to `clock()' and code the print times, run the
`struct_strid' program.
Run the program with a large array and a modest amount of array
iterations such as using the following parameters:
,----
| ./struct_stride 6000000 100
`----
Examine the times reported.
Which option below reflects the relative speeds of the
layout/algorithm combinations?
,----
| ------SLOWEST--------------------------------------------FASTEST-----
| - ( ) arr_field_base > arr_field_optm > int_field_base > int_field_optm
| - ( ) int_field_base > int_field_optm > arr_field_base > arr_field_optm
| - ( ) arr_field_base > int_field_base > arr_field_optm > int_field_optm
| - ( ) int_field_base > arr_field_base > int_field_optm > arr_field_optm
`----
Order of Memory Access
~~~~~~~~~~~~~~~~~~~~~~
Below are several memory layouts of A/B elements to consider.
-------------------------------------------------------------------------
Byte Offset +00 +04 +08 +12 +16 +80 +84 +88 +92 +96
LAYOUT1 A0 A1 A2 A3 A4 ... B0 B1 B2 B3 B4 ...
-------------------------------------------------------------------------
-------------------------------------------------------------------
Byte Offset +00 +04 +08 +12 +16 +20 +24 +28 32 +36
LAYOUT 2 A0 B0 A1 B1 A2 B2 A3 B3 A4 B4 ...
-------------------------------------------------------------------
For each of following, indicate the best suited option.
The `int_field_base' approach code that is timed..
- ( ) Uses memory LAYOUT 1 and visit elements in the order A0, B0, A1,
B1, A2, B2, etc.
- ( ) Uses memory LAYOUT 1 and visit elements in the order A0, A1, A2,
... B0, B1, B2, etc.
- ( ) Uses memory LAYOUT 2 and visit elements in the order A0, B0, A1,
B1, A2, B2, etc.
- ( ) Uses memory LAYOUT 2 and visit elements in the order A0, A1, A2,
... B0, B1, B2, etc.
The `arr_field_base' approach code that is timed..
- ( ) Uses memory LAYOUT 1 and visit elements in the order A0, B0, A1,
B1, A2, B2, etc.
- ( ) Uses memory LAYOUT 1 and visit elements in the order A0, A1, A2,
... B0, B1, B2, etc.
- ( ) Uses memory LAYOUT 2 and visit elements in the order A0, B0, A1,
B1, A2, B2, etc.
- ( ) Uses memory LAYOUT 2 and visit elements in the order A0, A1, A2,
... B0, B1, B2, etc.
The `int_field_optm' approach code that is timed..
- ( ) Uses memory LAYOUT 1 and visit elements in the order A0, B0, A1,
B1, A2, B2, etc.
- ( ) Uses memory LAYOUT 1 and visit elements in the order A0, A1, A2,
... B0, B1, B2, etc.
- ( ) Uses memory LAYOUT 2 and visit elements in the order A0, B0, A1,
B1, A2, B2, etc.
- ( ) Uses memory LAYOUT 2 and visit elements in the order A0, A1, A2,
... B0, B1, B2, etc.
The `arr_field_optm' approach code that is timed..
- ( ) Uses memory LAYOUT 1 and visit elements in the order A0, B0, A1,
B1, A2, B2, etc.
- ( ) Uses memory LAYOUT 1 and visit elements in the order A0, A1, A2,
... B0, B1, B2, etc.
- ( ) Uses memory LAYOUT 2 and visit elements in the order A0, B0, A1,
B1, A2, B2, etc.
- ( ) Uses memory LAYOUT 2 and visit elements in the order A0, A1, A2,
... B0, B1, B2, etc.
int_field_base VS arr_field_base
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Examine the differences between the two types of structs that are used
in `struct_stride.c' called `int_field_t' and `arr_field_t'.
Now examine the first 2 code blocks that use these structs called,
`int_field_base' and `arr_field_base'. Both involve arrays and structs
which store an equal number of positive and negative
integers. However, they differ in the overall layout of those
integers. Both use loops sum the "a" numbers first then sum the "b"
numbers, then combine them for the total sum.
Which of the following are possible explanations for the timing
difference between `int_field_base' and `arr_field_base'?
- ( ) `int_field_base' must perform more loop iterations than
`arr_field_base' which will making it slower.
- ( ) `arr_field_base' uses more memory to store then number than
`int_field_base' and this additional memory increases speed.
- ( ) `int_field_base' has a memory layout that is ABABABAB so when
adding A elements, there is a "stride" through
memory. `arr_field_base' has a layout like AAAAABBBBB so adding
elements has no stride.
- ( ) `int_field_base' uses struct field access. The assembly
instructions to access array fields are slower than the assembly
instructions that access array elements. This makes `arr_field_base'
faster due to its use of plain integer arrays.
BASE vs OPTM versions
~~~~~~~~~~~~~~~~~~~~~
The last two layout/algorithm sections are labeled "optm" as they
perform a simple code transformation from their "base" version.
Select ALL of the items below that are accomplished with this
transformation.
- ( ) Fewer loop checks/increments are needed as there is only one
loop instead of 2.
- ( ) The number of loop iterations is lowered for all loops in the
optm version.
- ( ) The memory stride is eliminated for the int_field_optm as both
a/b elements are added each iteration.
- ( ) The algorithmic complexity is reduced from O(N^2) in the "base"
to O(N) in the "optm" version.
6 Submission
Follow the instructions at the end of Lab01 if you need a refresher on how to upload your completed exercise zip to Gradescope.