CSCI 4061 HW07: Recursive Directory Traversal / pmap
- Due: 11:59pm Thu 6/9/2020
- Approximately 0.83% of total grade
- Homework and Quizzes are open resource/open collaboration. You must submit your own work but you may freely discuss HW topics with other members of the class.
CODE DISTRIBUTION: hw07-code.zip
CHANGELOG:
- Wed Mar 10 11:33:39 AM CST 2021
- Some minor errors such as a missing Makefile for
memory_partsand a mis-numbered problem have been corrected.
1 Rationale
A variety of applications require code to traverse an entire directory
structure visiting all files present in to perform some operation.
At the shell level, utilities like find are good at doing this. For
programs requiring lower-level interactions, such as version control
software (e.g. git, svn, cvs), the C and system call interfaces to
traversing directories are usually used. As a tree-like structure,
code that traverses directories will almost always have a recursive
character to it whether explicit or implicit.
This HW explores code to visit all files an a directory hierarchy
using standard system calls. The nftw() library call which encodes
this functionality is also examined. Completing the HW will inform on
how to accomplish this task without breaking the brain.
In addition, the pmap utility is useful to show portions of the page
table associated with a running program and is explored in one
problem.
Associated Reading / Preparation
Stevens/Rago Ch 4.20-4.23 discuss functions like opendir() /
readdir() in the context of traversing a directory hierarchy. They
allude to the nftw() function and demonstrate their own
implementation of it.
Consult the manual page on pmap for more information on it.
Grading Policy
Credit for this HW is earned by taking the associate Quiz which is
linked under Gradescope. The quiz will ask similar questions as
those that are present in the QUESTIONS.txt file and those that
complete all answers in QUESTIONS.txt should have no trouble with
the quiz.
See the full policy in the syllabus.
2 Codepack
The codepack contains the following files:
| File | Description |
|---|---|
QUESTIONS.txt |
Questions to answer |
nftw/ |
Directory for Problems 1-3 |
Makefile |
Run make to set up the required directory structure and build programs for this HW |
recursive_listing.c |
Problem 1 code to analyze |
print_file_info.c |
Problem 2 code to analyze |
nftw_listing.c |
Problem 3 code to analyze |
qsort_demo.c |
Optional file demonstrating qsort() function |
pmap-memory-parts/ |
Directory for Problem 4 |
memory_parts.c |
Problem 4 code to analyze |
3 What to Understand
Ensure that you understand
- How to perform a recursive traversal of data in a hierarchy using system calls
- Why it is usually not necessary to write your own recursive traversal code as library functions often provide this functionality if you know how to access it.
- How to determine types and other data about files with the
stat() / lstat()calls along with associated functions/macros like - The type of information that
pmapdisplays and what OS data structures it corresponds to
4 Questions
Analyze the files in the provided codepack and answer the questions
given in QUESTIONS.txt.
_________________
HW 07 QUESTIONS
_________________
- Name: (FILL THIS in)
- NetID: (THE kauf0095 IN kauf0095@umn.edu)
Write your answers to the questions below directly in this text file.
HW quiz questions will be related to the questions in this file.
Overview
========
Typing `make' will now create the subdirectory `subdir-small' which is
used in recursive traversals.
Compile and run the programs in the provide directory. A Makefile is
present to ease this task. After compiling, experiment with the
compiled programs as shown below and note the results they print.
,----
| > make
| chmod u+x make-subdir-small.sh
| ./make-subdir-small.sh
| gcc -Wall -g -c nftw_listing.c
| gcc -Wall -g -c print_file_info.c
| gcc -Wall -g -o nftw_listing nftw_listing.o print_file_info.o
| gcc -Wall -g -c recursive_listing.c
| gcc -Wall -g -o recursive_listing recursive_listing.o print_file_info.o
|
| > ./recursive_listing
| T MTIME BYTES FILENAME
| = ======================== ======== ================
| D Fri Mar 27 14:32:34 2020 4096 .
| F Thu Mar 26 14:29:47 2020 21912 ./nftw_demo
| ...
|
| > ./recursive_listing recursive_listing.c
| ...
|
| > ./recursive_listing subdir-small
| ...
|
| > ./nftw_listing
| ...
| > ./nftw_listing recursive_listing.c
| ...
| > ./nftw_listing subdir-small
| ...
`----
The bulk of this problem is to examine the provided code to determine
how library and system calls are used to carry out their task: listing
the contents of an entire directory hierarchy.
PROBLEM 1: Recursive Directory Traversal
========================================
Examine the code in `recursive_listing.c'. A number of system calls
are used within it which are worth studying.
(A) lstat()
~~~~~~~~~~~
Near the top, a call to `lstat()' is made. As the name implies, this
call is related to the `stat()' system call. Describe what these two
do and how they differ. You man need to consult the manual page for
them which is obtained via `man 2 stat' (section 2 for system calls).
(B) opendir() / readdir()
~~~~~~~~~~~~~~~~~~~~~~~~~
Later in the code, the system calls `opendir() / readdir()' are
employed. Describe what these system calls do and how they are
employed in the code. Also mention a set of files that should be
"ignored" and how this is accomplished.
(C) Base and Recursive Cases
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As the name of the file and constituent functions implies,
`recursive_listing.c' takes a recursive approach. Some actions are
performed for all situations in `recursive_print()'. State these. Then
briefly describe what the Base and Recursive cases are for this code
are and how they work together to list all files in a directory tree.
PROBLEM 2: Printing File Info
=============================
Examine the `print_file_info.c' file which contains the function used
in `recursive_listing.c' to print information on files.
(A) File Kinds
~~~~~~~~~~~~~~
Describe the technique that is used in `print_file_info()' to
determine the kind of file (regular, directory, symlink, etc.).
- Where is this information stored?
- How can one set a variable to a character like 'D' if the file is a
directory?
Note that there are two control mechanisms shown to demonstrate
determining file kinds, one is used while the other is commented. Both
work similarly.
(B) Time Printing
~~~~~~~~~~~~~~~~~
From our previous discussion of the `stat()' system call, recall that
it produces several pieces of time information about files. One of
these is the Last Modification Time (`mtime'). This field is stored
in a in `sb->st_mtime' but is in binary format.
What function is used to produce a human-readable version of `mtime'
and what is an annoyning "gotchya" associated with this formatting
function which is demonstrated `print_file_info()'?
PROBLEM 3: nftw() Library Call
==============================
Coders have been known to "reinvent the wheel" at times, often due to
their being unaware of pre-established alternatives. The code in
`recursive_listng.c' is simple enough but has a number of subtle
issues that can be crippling: what if it runs out of file descriptors?
how should it behave if the files in the directory are changing? what
if a different operation than print/total are desired?
To rectify these issues, most modern Unix systems have a library
function that performs a "File Tree Walk" recursively visiting every
file in a directory structure and allowing arbitrary actions to be
performed on it. One such function available on many Unixes is
`nftw()' with the `n' indicating a limit on the total file descriptors
used during the file tree walk.
This problem explores this interesting and useful function.
(A) nftw_listing.c
~~~~~~~~~~~~~~~~~~
Examine the `nftw_listing.c' program. Describe the similarities and
differences you see in this program to `recursive_listing.c'.
Describe what you infer to be the purpose of the `nftw()' function.
(B) Arguments to nftw()
~~~~~~~~~~~~~~~~~~~~~~~
One extremely interesting aspect of the `nftw()' function is 2nd
argument.
- Describe what is unusual about this argument and relate it to how
the `print_file_info()' function is written. You may wish to
consult `man nftw'.
- If one wanted to do something other than print file information
(such as just sum up the total files visited), how could one do this
with `nftw()'?
(C) Additional Information on nftw()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Several alternatives to `nftw()' exist including the older `ftw()'
which is a simplified version of it. Some systems (including Linux)
feature a `fts()' function which provides an interface similar to
`opendir() / readdir()' but which recursively visits an entire
directory tree rather than a single directory.
While C is not widely known for it Functional Programming facilities,
there are several places where functions are passed as arguments. The
C syntax for this is often confusing compared to functional languages
like Ocaml and Lisp, but it is usually worth the effort as
higher-order functions that take other functions as arguments can
provide tremendous flexibility.
Aside from `nftw()' taking a function to perform on files, a much more
common application of higher-order functions in C is the standard
`qsort()' (quick sort) function. It sorts arbitrary types of data
using a "comparison" function associated with the function. A
demonstration appears of this appears in `qsort_demo.c' and is worth
examining.
PROBLEM 4: Virtual Memory and pmap
==================================
(A) memory_parts memory areas
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Examine the source code for the provided `memory_parts.c'
program. Identify what region of program memory you expect the
following variables to be allocated into:
- global_arr[]
- local_arr[]
- malloc_arr
(B) Running memory_parts and pmap
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Compile the `memory_parts' using the provided Makefile.
,----
| > make memory_parts
`----
Run the program and note that it prints several pieces of information
- The addresses of several of the variables allocated
- Its Process ID (PID) which is a unique number used to identify the
running program. This is an integer.
For example, the output might be
,----
| > ./memory-parts
| 0x5605a7c271e9 : main()
| 0x5605a7c2a0c0 : global_arr
| 0x7ffe5ff7d600 : local_arr
| 0x5605a92442a0 : malloc_arr
| 0x7f1fa7303000 : mmap'd file
| 0x600000000000 : mmap'd block1
| 0x600000001000 : mmap'd block2
| my pid is 8406
| press any key to continue
`----
so the programs PID is 11160
The program will also stop at this point until a key is pressed. DO
NOT PRESS A KEY YET.
Open another terminal and type the following command in that new
terminal.
,----
| > pmap THE-PID-NUMBER-THAT-WAS-PRINTED-EARLIER
`----
Paste the output of pmap below.
(C) Program Addresses vs Mapped Addresses
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pmap prints out the virtual address space table for the program. The
leftmost column is a virtual address mapped by the OS for the program
to some physical location. The next column is the size of the area of
memory associated with that starting address. The 3rd column contains
permissions of the program has for the memory area: r for read, w for
read, x for execute. The final column is contains any identifying
information about the memory area that pmap can discern.
Compare the addresses of variables and functions from the paused
program to the output. Try to determine the virtual address space in
which each variable resides and what region of program memory that
virtual address must belong to (stack, heap, globals, text). In some
cases, the identifying information provided by pmap may make this
obvious.
(D) Min Size of Mapped Areas
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The minimum size of any virtual area of memory appears to be 4K. Why
is this the case?
(E) Additional Observations
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Notice that in addition to the "normal" variables that are mapped,
there is also an entry for the mmap()'d file 'gettysburg.txt' in the
virtual address table. The mmap() function is explored in the next
problem but note its calling sequence which involves use of a couple
system calls:
1. `open()' which is a low level file opening call which returns a
numeric file descriptor.
2. `fstat()' which obtains information such as size for an open file
based on its numeric file descriptor. The `stat()' system call was
explored earlier in the class and does the same thing provided the
name of a file.
There are additional calls to mmap() which allocate memory to the
program at a specific virtual address. Similar code to this is often
used to allocate and expand the heap area of memory for programs in
implementations of `malloc()'.
Some Unix implementations, Linux among them, include a `mremap()'
function. This function is related to `mmap()' as `realloc()' is
related to `malloc()'
- `mmap()' makes requests to map virtual memory into the address space
of the program, possibly associated with a file but sometimes just
DRAM. A specific size is specified for the mapping which remains
fixed independent of any underlying file.
- `mremap()' allows the size of the virtual memory mapping to be
changed, often to expand it if needed. This is useful in cases where
a file has been `mmap()''d but changes size: `mremap()' can be used
to grow the memory mapping to account for additions to the file.
`mremap()' may fail if there is insufficient space to grow a
specific region of mapping. However, additional options can be
passed to have the mapping moved to a new location with sufficient
size in a similar way to `realloc()'. Like `realloc()' though, this
movement invalidates all existing pointers to and into that area of
memory.