CSCI 4061 HW07: Recursive Directory Traversal / pmap
- Due: 11:59pm Thu 6/9/2020
- Approximately 0.83% of total grade
- Homework and Quizzes are open resource/open collaboration. You must submit your own work but you may freely discuss HW topics with other members of the class.
CODE DISTRIBUTION: hw07-code.zip
CHANGELOG:
- Wed Mar 10 11:33:39 AM CST 2021
- Some minor errors such as a missing Makefile for
memory_parts
and a mis-numbered problem have been corrected.
1 Rationale
A variety of applications require code to traverse an entire directory
structure visiting all files present in to perform some operation.
At the shell level, utilities like find
are good at doing this. For
programs requiring lower-level interactions, such as version control
software (e.g. git, svn, cvs), the C and system call interfaces to
traversing directories are usually used. As a tree-like structure,
code that traverses directories will almost always have a recursive
character to it whether explicit or implicit.
This HW explores code to visit all files an a directory hierarchy
using standard system calls. The nftw()
library call which encodes
this functionality is also examined. Completing the HW will inform on
how to accomplish this task without breaking the brain.
In addition, the pmap
utility is useful to show portions of the page
table associated with a running program and is explored in one
problem.
Associated Reading / Preparation
Stevens/Rago Ch 4.20-4.23 discuss functions like opendir() /
readdir()
in the context of traversing a directory hierarchy. They
allude to the nftw()
function and demonstrate their own
implementation of it.
Consult the manual page on pmap
for more information on it.
Grading Policy
Credit for this HW is earned by taking the associate Quiz which is
linked under Gradescope
. The quiz will ask similar questions as
those that are present in the QUESTIONS.txt
file and those that
complete all answers in QUESTIONS.txt
should have no trouble with
the quiz.
See the full policy in the syllabus.
2 Codepack
The codepack contains the following files:
File | Description |
---|---|
QUESTIONS.txt |
Questions to answer |
nftw/ |
Directory for Problems 1-3 |
Makefile |
Run make to set up the required directory structure and build programs for this HW |
recursive_listing.c |
Problem 1 code to analyze |
print_file_info.c |
Problem 2 code to analyze |
nftw_listing.c |
Problem 3 code to analyze |
qsort_demo.c |
Optional file demonstrating qsort() function |
pmap-memory-parts/ |
Directory for Problem 4 |
memory_parts.c |
Problem 4 code to analyze |
3 What to Understand
Ensure that you understand
- How to perform a recursive traversal of data in a hierarchy using system calls
- Why it is usually not necessary to write your own recursive traversal code as library functions often provide this functionality if you know how to access it.
- How to determine types and other data about files with the
stat() / lstat()
calls along with associated functions/macros like - The type of information that
pmap
displays and what OS data structures it corresponds to
4 Questions
Analyze the files in the provided codepack and answer the questions
given in QUESTIONS.txt
.
_________________ HW 07 QUESTIONS _________________ - Name: (FILL THIS in) - NetID: (THE kauf0095 IN kauf0095@umn.edu) Write your answers to the questions below directly in this text file. HW quiz questions will be related to the questions in this file. Overview ======== Typing `make' will now create the subdirectory `subdir-small' which is used in recursive traversals. Compile and run the programs in the provide directory. A Makefile is present to ease this task. After compiling, experiment with the compiled programs as shown below and note the results they print. ,---- | > make | chmod u+x make-subdir-small.sh | ./make-subdir-small.sh | gcc -Wall -g -c nftw_listing.c | gcc -Wall -g -c print_file_info.c | gcc -Wall -g -o nftw_listing nftw_listing.o print_file_info.o | gcc -Wall -g -c recursive_listing.c | gcc -Wall -g -o recursive_listing recursive_listing.o print_file_info.o | | > ./recursive_listing | T MTIME BYTES FILENAME | = ======================== ======== ================ | D Fri Mar 27 14:32:34 2020 4096 . | F Thu Mar 26 14:29:47 2020 21912 ./nftw_demo | ... | | > ./recursive_listing recursive_listing.c | ... | | > ./recursive_listing subdir-small | ... | | > ./nftw_listing | ... | > ./nftw_listing recursive_listing.c | ... | > ./nftw_listing subdir-small | ... `---- The bulk of this problem is to examine the provided code to determine how library and system calls are used to carry out their task: listing the contents of an entire directory hierarchy. PROBLEM 1: Recursive Directory Traversal ======================================== Examine the code in `recursive_listing.c'. A number of system calls are used within it which are worth studying. (A) lstat() ~~~~~~~~~~~ Near the top, a call to `lstat()' is made. As the name implies, this call is related to the `stat()' system call. Describe what these two do and how they differ. You man need to consult the manual page for them which is obtained via `man 2 stat' (section 2 for system calls). (B) opendir() / readdir() ~~~~~~~~~~~~~~~~~~~~~~~~~ Later in the code, the system calls `opendir() / readdir()' are employed. Describe what these system calls do and how they are employed in the code. Also mention a set of files that should be "ignored" and how this is accomplished. (C) Base and Recursive Cases ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ As the name of the file and constituent functions implies, `recursive_listing.c' takes a recursive approach. Some actions are performed for all situations in `recursive_print()'. State these. Then briefly describe what the Base and Recursive cases are for this code are and how they work together to list all files in a directory tree. PROBLEM 2: Printing File Info ============================= Examine the `print_file_info.c' file which contains the function used in `recursive_listing.c' to print information on files. (A) File Kinds ~~~~~~~~~~~~~~ Describe the technique that is used in `print_file_info()' to determine the kind of file (regular, directory, symlink, etc.). - Where is this information stored? - How can one set a variable to a character like 'D' if the file is a directory? Note that there are two control mechanisms shown to demonstrate determining file kinds, one is used while the other is commented. Both work similarly. (B) Time Printing ~~~~~~~~~~~~~~~~~ From our previous discussion of the `stat()' system call, recall that it produces several pieces of time information about files. One of these is the Last Modification Time (`mtime'). This field is stored in a in `sb->st_mtime' but is in binary format. What function is used to produce a human-readable version of `mtime' and what is an annoyning "gotchya" associated with this formatting function which is demonstrated `print_file_info()'? PROBLEM 3: nftw() Library Call ============================== Coders have been known to "reinvent the wheel" at times, often due to their being unaware of pre-established alternatives. The code in `recursive_listng.c' is simple enough but has a number of subtle issues that can be crippling: what if it runs out of file descriptors? how should it behave if the files in the directory are changing? what if a different operation than print/total are desired? To rectify these issues, most modern Unix systems have a library function that performs a "File Tree Walk" recursively visiting every file in a directory structure and allowing arbitrary actions to be performed on it. One such function available on many Unixes is `nftw()' with the `n' indicating a limit on the total file descriptors used during the file tree walk. This problem explores this interesting and useful function. (A) nftw_listing.c ~~~~~~~~~~~~~~~~~~ Examine the `nftw_listing.c' program. Describe the similarities and differences you see in this program to `recursive_listing.c'. Describe what you infer to be the purpose of the `nftw()' function. (B) Arguments to nftw() ~~~~~~~~~~~~~~~~~~~~~~~ One extremely interesting aspect of the `nftw()' function is 2nd argument. - Describe what is unusual about this argument and relate it to how the `print_file_info()' function is written. You may wish to consult `man nftw'. - If one wanted to do something other than print file information (such as just sum up the total files visited), how could one do this with `nftw()'? (C) Additional Information on nftw() ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Several alternatives to `nftw()' exist including the older `ftw()' which is a simplified version of it. Some systems (including Linux) feature a `fts()' function which provides an interface similar to `opendir() / readdir()' but which recursively visits an entire directory tree rather than a single directory. While C is not widely known for it Functional Programming facilities, there are several places where functions are passed as arguments. The C syntax for this is often confusing compared to functional languages like Ocaml and Lisp, but it is usually worth the effort as higher-order functions that take other functions as arguments can provide tremendous flexibility. Aside from `nftw()' taking a function to perform on files, a much more common application of higher-order functions in C is the standard `qsort()' (quick sort) function. It sorts arbitrary types of data using a "comparison" function associated with the function. A demonstration appears of this appears in `qsort_demo.c' and is worth examining. PROBLEM 4: Virtual Memory and pmap ================================== (A) memory_parts memory areas ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Examine the source code for the provided `memory_parts.c' program. Identify what region of program memory you expect the following variables to be allocated into: - global_arr[] - local_arr[] - malloc_arr (B) Running memory_parts and pmap ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Compile the `memory_parts' using the provided Makefile. ,---- | > make memory_parts `---- Run the program and note that it prints several pieces of information - The addresses of several of the variables allocated - Its Process ID (PID) which is a unique number used to identify the running program. This is an integer. For example, the output might be ,---- | > ./memory-parts | 0x5605a7c271e9 : main() | 0x5605a7c2a0c0 : global_arr | 0x7ffe5ff7d600 : local_arr | 0x5605a92442a0 : malloc_arr | 0x7f1fa7303000 : mmap'd file | 0x600000000000 : mmap'd block1 | 0x600000001000 : mmap'd block2 | my pid is 8406 | press any key to continue `---- so the programs PID is 11160 The program will also stop at this point until a key is pressed. DO NOT PRESS A KEY YET. Open another terminal and type the following command in that new terminal. ,---- | > pmap THE-PID-NUMBER-THAT-WAS-PRINTED-EARLIER `---- Paste the output of pmap below. (C) Program Addresses vs Mapped Addresses ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ pmap prints out the virtual address space table for the program. The leftmost column is a virtual address mapped by the OS for the program to some physical location. The next column is the size of the area of memory associated with that starting address. The 3rd column contains permissions of the program has for the memory area: r for read, w for read, x for execute. The final column is contains any identifying information about the memory area that pmap can discern. Compare the addresses of variables and functions from the paused program to the output. Try to determine the virtual address space in which each variable resides and what region of program memory that virtual address must belong to (stack, heap, globals, text). In some cases, the identifying information provided by pmap may make this obvious. (D) Min Size of Mapped Areas ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The minimum size of any virtual area of memory appears to be 4K. Why is this the case? (E) Additional Observations ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Notice that in addition to the "normal" variables that are mapped, there is also an entry for the mmap()'d file 'gettysburg.txt' in the virtual address table. The mmap() function is explored in the next problem but note its calling sequence which involves use of a couple system calls: 1. `open()' which is a low level file opening call which returns a numeric file descriptor. 2. `fstat()' which obtains information such as size for an open file based on its numeric file descriptor. The `stat()' system call was explored earlier in the class and does the same thing provided the name of a file. There are additional calls to mmap() which allocate memory to the program at a specific virtual address. Similar code to this is often used to allocate and expand the heap area of memory for programs in implementations of `malloc()'. Some Unix implementations, Linux among them, include a `mremap()' function. This function is related to `mmap()' as `realloc()' is related to `malloc()' - `mmap()' makes requests to map virtual memory into the address space of the program, possibly associated with a file but sometimes just DRAM. A specific size is specified for the mapping which remains fixed independent of any underlying file. - `mremap()' allows the size of the virtual memory mapping to be changed, often to expand it if needed. This is useful in cases where a file has been `mmap()''d but changes size: `mremap()' can be used to grow the memory mapping to account for additions to the file. `mremap()' may fail if there is insufficient space to grow a specific region of mapping. However, additional options can be passed to have the mapping moved to a new location with sufficient size in a similar way to `realloc()'. Like `realloc()' though, this movement invalidates all existing pointers to and into that area of memory.