CMSC216 HW11: Binary I/O, Memory Mapping and pmap
- Due: 11:59pm Mon 01-Dec-2025 on Gradescope
- Approximately 0.83% of total grade
CODE DISTRIBUTION: hw11-code.zip
CHANGELOG:
- Mon Nov 24 11:10:45 AM EST 2025
- Post 637 reported broken links and a partially empty Zip file for the HW which have now been repaired. Thanks for reporting!
1 Rationale
Files are often stored in "binary format" for efficiency of storage
and access. Unlike the more familiar formatted text files, these
formats must be manipulated with binary file I/O, frequently the
low level Unix read() / write() calls. They also often require
jumping to different positions in the file, which can be done via the
lseek() system call. These are explored in this HW.
A viable alternative to file I/O is to make use of memory mapped files
through mmap(). This utilizes a system call to expose files as a
pointer into operating system managed space which holds parts of the
file in main memory. While equivalent in power to standard I/O,
mmap() avoids the need for intermediate buffers and allows pointer
arithmetic to be used to locate and alter the file.
On modern computing systems, virtual memory creates the illusion that
every program has a linear address space from 0 to some large
address. Mostly this happens behind the scenes and is managed by the
operating system but knowledge of the presence of virtual addresses
provides insight into many aspects of practical programming. One can
inspect some of the OS information on the virtual address space of a
program using utilities such as pmap.
Associated Reading / Preparation
Bryant and O'Hallaron Ch 10 covers basic I/O functions like read() /
write() as well as lseek(). These functions work equally well
for text and binary data.
Bryant and O'Hallaron: Ch 9 on Virtual Memory is informative for its
coverage of virtual memory in general. The mmap() function is discussed
in section 9.8.4. The overview of virtual memory is useful to
understand the output of pmap.
Grading Policy
Credit for this HW is earned by taking the associated HW Quiz which is
linked under Gradescope. The quiz asks questions similar to those in
the QUESTIONS.txt file, and those who complete all answers in
QUESTIONS.txt should have no trouble with the quiz.
Homework and Quizzes are open resource/open collaboration. You must submit your own work but you may freely discuss HW topics with other members of the class.
See the full policies in the course syllabus.
2 Codepack
The codepack for the HW contains the following files:
| File | Description |
|---|---|
| QUESTIONS.txt | Questions to answer |
| memory-parts/ | Directory for Problem 1 |
| Makefile | Makefile to build programs for the HW |
| memory_parts.c | Problem 1 program to analyze |
| gettysburg.txt | Problem 1 data file |
| binfiles-mmap/ | Directory for Problems 2-3 |
| Makefile | Makefile to build Problem 2-3 programs |
| department.h | Header file for programs |
| make_dept_directory.c | Problem 2-3 program to create data file |
| cse_depts.dat.bk | Backup of data file created in Problem 2-3 |
| print_department_read.c | Problem 2 program to analyze |
| print_department_mmap.c | Problem 3 program to analyze |
3 Questions
Analyze the files in the provided codepack and answer the questions
given in QUESTIONS.txt.
_________________
HW 11 QUESTIONS
_________________
Write your answers to the questions below directly in this text file to
prepare for the associated quiz. Credit for the HW is earned by
completing the associated online quiz on Gradescope.
PROBLEM 1: Virtual Memory and pmap
==================================
Code for this problem is in the `memory-parts' subdirectory.
(A) memory_parts memory areas
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Examine the source code for the provided `memory-parts/memory_parts.c'
program. Identify what region of program memory you expect the
following variables to be allocated into:
- global_arr[]
- stack_arr[]
- heap_arr
(B) Running memory_parts and pmap
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compile the `memory_parts' program using the provided Makefile.
,----
| > make memory_parts
`----
Run the program and note that it prints several pieces of information:
- The addresses of several of the variables allocated
- Its Process ID (PID) which is a unique number used to identify the
running program. This is an integer.
For example, the output might be
,----
| > ./memory_parts
| 0x5605a7c271e9 : main()
| 0x5605a7c2a0c0 : global_arr
| 0x7ffe5ff7d600 : stack_arr
| 0x5605a92442a0 : heap_arr
| 0x7f1fa7303000 : mmap'd file
| 0x600000000000 : mmap'd block1
| 0x600000001000 : mmap'd block2
| my pid is 8406
| press any key to continue
`----
so the program's PID is 8406.
The program will also stop at this point until a key is pressed. DO
NOT PRESS A KEY YET.
Open another terminal and type the following command in that new
terminal.
,----
| > pmap THE-PID-NUMBER-THAT-WAS-PRINTED-EARLIER
`----
Paste the output of pmap below.
(C) Program Addresses vs Mapped Addresses
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pmap prints out the virtual address space table for the program. The
leftmost column is a virtual address mapped by the OS for the program
to some physical location. The next column is the size of the area of
memory associated with that starting address. The 3rd column contains
the permissions the program has for the memory area: r for read, w for
write, x for execute. The final column contains any identifying
information about the memory area that pmap can discern.
Compare the addresses of variables and functions from the paused
program to the output. Try to determine the mapped area in which each
variable resides and what region of program memory that area must
belong to (stack, heap, globals, text). In some
cases, the identifying information provided by pmap may make this
obvious.
(D) Min Size of Mapped Areas
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The minimum size of any virtual area of memory appears to be 4K. Why
is this the case?
(E) Additional Observations
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Notice that in addition to the "normal" variables that are mapped,
there is also an entry for the mmap()'d file 'gettysburg.txt' in the
virtual address table. The mmap() function is explored in the next
problem but note its calling sequence which involves the use of a couple
of system calls:
1. `open()' which is a low level file opening call which returns a
numeric file descriptor.
2. `fstat()' which obtains information such as size for an open file
based on its numeric file descriptor. The `stat() / fstat()' system
calls are used to ask the Unix Operating System for information about
files such as their size, modification times, and access
permissions. This system call is studied more in Operating System
courses.
Finally there are additional calls to `mmap()' which allocate memory
to the program at a specific virtual address. Similar code to this is
often used to allocate and expand the heap area of memory for programs
in implementations of `malloc()'.
PROBLEM 2: Binary File Format w/ Read
=====================================
(A)
~~~
Compile all programs in the directory `binfiles-mmap/' with the provided
`Makefile'. Run the command
,----
| ./make_dept_directory cse_depts.dat
`----
to create the `cse_depts.dat' binary file. Examine the source code for
this program along with the header `department.h'.
- What system calls are used in `make_dept_directory.c' to create this
file?
- How is the `sizeof()' operator used to simplify some of the
computations in `make_dept_directory.c'?
- What data is in `cse_depts.dat' and how is it ordered?
(B)
~~~
Run the `print_department_read' program which takes a binary data file
and a department code to print. Show a few examples of running this
program with valid command line arguments. Include demo runs that
- Use the `cse_depts.dat' with known and unknown department codes
- Use a file other than `cse_depts.dat'
(C)
~~~
Study the source code for `print_department_read' and describe how it
initially prints the table of offsets shown below.
,----
| Dept Name: CS Offset: 104
| Dept Name: EE Offset: 2152
| Dept Name: IT Offset: 3688
`----
What specific sequence of calls leads to this information?
(D)
~~~
What system call is used to skip immediately to the location in the
file where desired contacts are located? What arguments does this
system call take? Consult the manual entry for this function to find
out how else it can be used.
PROBLEM 3: mmap() and binary files
==================================
An alternative to using standard I/O functions is "memory mapped"
files through the system call `mmap()'. The program
`print_department_mmap.c' provides the same functionality as the previous
`print_department_read.c' but uses a different mechanism.
(A)
~~~
Early in `print_department_mmap.c' an `open()' call is used as in the
previous program but it is followed shortly by a call to `mmap()' in
the lines
,----
| char *file_bytes =
| mmap(NULL, size, PROT_READ, MAP_SHARED,
| fd, 0);
`----
Look up reference documentation on `mmap()' and describe some of the
arguments to it including the `NULL' and `size' arguments. Also
describe its return value.
(B)
~~~
The initial setup of the program uses `mmap()' to assign a pointer to
variable `char *file_bytes'. This pointer will refer directly to the
bytes of the binary file.
Examine the lines
,----
| ////////////////////////////////////////////////////////////////////////////////
| // CHECK the file_header_t struct for integrity, size of department array
| file_header_t *header = (file_header_t *) file_bytes; // binary header struct is first thing in the file
`----
Explain what is happening here: what value will the variable `header'
get and how is it used in subsequent lines?
(C)
~~~
After finishing with the file header, the next section of the program
begins with the following.
,----
| ////////////////////////////////////////////////////////////////////////////////
| // SEARCH the array of department offsets for the department named
| // on the command line
|
| dept_offset_t *offsets = // after file header, array of dept_offset_t structures
| (dept_offset_t *) (file_bytes + sizeof(file_header_t));
|
`----
Explain what value the `offsets' variable is assigned and how it
is used in the remainder of the SEARCH section.
(D)
~~~
The final phase of the program begins below
,----
| ////////////////////////////////////////////////////////////////////////////////
| // PRINT out all personnel in the specified department
| ...
| contact_t *dept_contacts = (contact_t *) (file_bytes + offset);
`----
Describe what value `dept_contacts' is assigned and how the final
phase uses it.