CMSC216 Final Exam Review Exercise
CODE DISTRIBUTION: hw-review-final-code.zip
- Download the code distribution every HW
- See further setup instructions below
CHANGELOG: None
1 Review Problem
File Manifest
FILE | Notes |
---|---|
star_search.c |
Code to review / alter |
Makefile |
Makefile |
star.h |
Header with data types / includes |
stars_close.dat |
Binary data file used by program |
star_table.txt |
Text version of the binary data |
star_gen.c |
C program to generate binary data |
star_search_commented.c |
Solutions to code review |
star_search_stat.c |
|
star_search_read0.c |
|
star_search_mmap.c |
Overview
Nosfew Syscalls is prototyping some code to read a large database of
information on stars. The eventual code will need to run on a large
database where information on stars is stored in a star_t
struct in
binary format. The provided stars_close.dat
file has a small
collection of these stored in binary as an array of structs which is
used for testing. Nosfew has finished the testing version which can be
run as follows:
>> make gcc -Wall -g -Wno-unused-variable -c star_gen.c gcc -Wall -g -Wno-unused-variable -o star_gen star_gen.o gcc -Wall -g -Wno-unused-variable -c star_search.c gcc -Wall -g -Wno-unused-variable -o star_search star_search.o >> ./star_search usage: ./star_search <star_class> <filename> >> ./star_search G2 stars_close.dat 6032 bytes in file stars_close.dat | Alpha Centauri A | G2 | 4.4000 | 1 stars matched >> ./star_search M5 stars_close.dat 6032 bytes in file stars_close.dat | Barnard's Star | M5 | 5.9400 | | L 789-6 B | M5 | 11.0800 | | Struve 2398 B | M5 | 11.6000 | | L 725-32 | M5 | 12.1200 | | Ross 780 | M5 | 15.3400 | | LP 816-60 | M5 | 17.9100 | 6 stars matched >> ./star_search M3.5 stars_close.dat 6032 bytes in file stars_close.dat | Luyten's Star | M3.5 | 12.3900 | | Wolf 1061 | M3.5 | 13.9100 | | BD+68 946 | M3.5 | 14.7700 | | CD-44 11909 | M3.5 | 16.4500 | | LP 656-38 | M3.5 | 17.8500 | | L 205-128 | M3.5 | 18.9500 | | Wolf 1055 A | M3.5 | 19.1600 | | CD-36 13940 B | M3.5 | 19.7400 | 8 stars matched
The code itself is in the file star_search.c
1: // star_search.c: search stars 2: 3: #include "star.h" 4: 5: int main(int argc, char *argv[]){ 6: if(argc < 3){ 7: printf("usage: %s <star_class> <filename>\n",argv[0]); 8: return 1; 9: } 10: 11: char *star_class = argv[1]; 12: char *star_file = argv[2]; 13: 14: // find size of the data 15: int pid = fork(); 16: if(pid == 0){ 17: int outfd = open("statout.txt", O_WRONLY|O_TRUNC|O_CREAT, S_IRUSR|S_IWUSR); 18: if(outfd == -1){ 19: perror("open() failed"); 20: exit(1); 21: } 22: int ret = dup2(outfd, STDOUT_FILENO); 23: if(ret == -1){ 24: perror("dup2() failed"); 25: exit(1); 26: } 27: char *argv[] = {"stat", star_file, NULL}; 28: execvp(argv[0], argv); 29: perror("exec() failed"); 30: exit(EXIT_FAILURE); 31: } 32: 33: int status; 34: int ret = waitpid(pid, &status, 0); 35: if( !WIFEXITED(status) || WEXITSTATUS(status)!=0 ){ 36: printf("Couldn't get the size of %s\n",star_file); 37: return EXIT_FAILURE; 38: } 39: 40: // get size 41: FILE *statfile = fopen("statout.txt","r"); 42: char buf[128], *res; 43: res = fgets(buf, 128, statfile); 44: res = fgets(buf, 128, statfile); 45: if(res == NULL){ 46: printf("I/O Failed\n"); 47: fclose(statfile); 48: return EXIT_FAILURE; 49: } 50: int size; 51: sscanf(buf, " Size: %d", &size); 52: fclose(statfile); 53: printf("%d bytes in file %s\n",size,star_file); 54: 55: // get stars 56: int starfd = open(star_file, O_RDONLY); 57: if(starfd == -1){ 58: perror("open() on starfile failed"); 59: return EXIT_FAILURE; 60: } 61: int total_bytes = 0; 62: int hits = 0; 63: star_t cur_star; 64: while(total_bytes < size){ 65: int nbytes = read(starfd, &cur_star, sizeof(star_t)); 66: if( strcmp(cur_star.star_class, star_class)==0 ){ 67: printf("| %20s | %8s | %12.4f |\n", 68: cur_star.name, cur_star.star_class, cur_star.dist); 69: hits++; 70: } 71: total_bytes += nbytes; 72: } 73: close(starfd); 74: printf("%d stars matched\n",hits); 75: return 0; 76: }
Code Review
You have been tasked to do a code review for Nosfew to provide feedback and suggestions for changes which will improve the quality and maintainability of his code. Below are a series of items to address in the code review.
Remember: you are mentoring Nosfew so be gentle when offering him feedback. Beginners who are encouraged with constructive comments will be open to more coaching; discouraging criticism tends to drive folks away from seeking feedback.
(A) Commenting
Nosfew has a typical beginner tendency to comment little in his code. Add your own comments to elaborate on the different parts of the code and their intended purpose. Be a little more verbose than you might be with peer programmers to set a concrete example for Nosfew about informative commenting practices.
If you aren't sure how some of the parts of the program work, discuss with your colleagues to determine what's going on.
SOLUTION
Click for Solution
Below is one possible version of the code which includes more commentary to explain what's going to improve maintainability.
1: // star_search_commented.c: more information commenting on the 2: // progression of the computation. 3: 4: #include "star.h" 5: 6: int main(int argc, char *argv[]){ 7: if(argc < 3){ 8: printf("usage: %s <star_class> <filename>\n",argv[0]); 9: return 1; 10: } 11: 12: char *star_class = argv[1]; 13: char *star_file = argv[2]; 14: 15: // find size of the data 16: // 17: // Use a child process that exec() the "stat" command line utility 18: // get the size of the data file in bytes; uses I/O redirection to 19: // channel output into a file which will be parsed by the parent 20: // process. 21: int pid = fork(); 22: 23: if(pid == 0){ // CHILD CODE ONLY 24: int outfd = open("statout.txt", O_WRONLY|O_TRUNC|O_CREAT, S_IRUSR|S_IWUSR); 25: if(outfd == -1){ 26: perror("open() failed"); 27: exit(1); 28: } 29: int ret = dup2(outfd, STDOUT_FILENO); // redirect output into the file 30: if(ret == -1){ 31: perror("dup2() failed"); 32: exit(1); 33: } 34: char *argv[] = {"stat", star_file, NULL}; // exec the stat command 35: execvp(argv[0], argv); // child will exit here or 36: perror("exec() failed"); // on a fail, bail out 37: exit(EXIT_FAILURE); 38: } 39: 40: int status; // PARENT CODE ONLY 41: int ret = waitpid(pid, &status, 0); // wait until child is finished 42: if( !WIFEXITED(status) || WEXITSTATUS(status)!=0 ){ 43: printf("Couldn't get the size of %s\n",star_file); 44: return EXIT_FAILURE; 45: } 46: 47: // get size 48: // 49: // extract the size in bytes of star_file from the text output of 50: // stat; it's on the 2nd line after the word "Size:" 51: FILE *statfile = fopen("statout.txt","r"); // open the output file the child created 52: char buf[128], *res; 53: res = fgets(buf, 128, statfile); // skip the first line 54: res = fgets(buf, 128, statfile); // get the next line which will be " Size: <bytes> ..." 55: if(res == NULL){ 56: printf("I/O Failed\n"); 57: fclose(statfile); 58: return EXIT_FAILURE; 59: } 60: int size; 61: sscanf(buf, " Size: %d", &size); // get the size data from this line 62: fclose(statfile); // done with file 63: printf("%d bytes in file %s\n",size,star_file); // report bytes for ease of debugging 64: 65: // get stars 66: // 67: // open and process the main stars file until the end of the file; 68: // use the size in bytes obtained above to terminate the input loop 69: int starfd = open(star_file, O_RDONLY); // open original file which contains the star data 70: if(starfd == -1){ 71: perror("open() on starfile failed"); 72: return EXIT_FAILURE; 73: } 74: int total_bytes = 0; // total bytes procesed from the file 75: int hits = 0; // number of planets matching the star_type requested 76: star_t cur_star; // current star read from file 77: while(total_bytes < size){ 78: int nbytes = read(starfd, &cur_star, sizeof(star_t)); 79: if( strcmp(cur_star.star_class, star_class)==0 ){ // check for a matching star_type 80: printf("| %20s | %8s | %12.4f |\n", 81: cur_star.name, cur_star.star_class, cur_star.dist); 82: hits++; 83: } 84: total_bytes += nbytes; // tack on the total bytes as progress is made 85: } 86: close(starfd); 87: printf("%d stars matched\n",hits); 88: return 0; 89: }
(B) Size Determination
Nosfew had an irregular education concerning UNIX system calls so didn't get exposed to some of the important ones that you've had the benefit of learning. He cleverly worked around a gap in his knowledge on how to find the size in bytes of a file. Explain how he did this and also offer him guidance on what he might do instead in the future to more directly and efficiently determine the size of a file if needed. If you are feeling particularly benevolent, write revision of the program to demonstrate this technique.
SOLUTION
Click for Solution
- Nosfew used
fork()/exec()
to run the command linestat
utility in a child process. That reports file statistics about the input file which he redirected into an output file which the parent process used to extract the size. - Nosfew could avoid all this by using the
stat() / fstat()
system call which directly provides information on the file such as its size in bytes. This cuts out entirely the need to start a child process and parse any output from it significantly reducing the code for the program.
1: // star_search_stat.c: demonstrate use of the stat() system call to 2: // easily get the size of the input file. 3: 4: #include "star.h" 5: 6: int main(int argc, char *argv[]){ 7: if(argc < 3){ 8: printf("usage: %s <star_class> <filename>\n",argv[0]); 9: return 1; 10: } 11: 12: char *star_class = argv[1]; 13: char *star_file = argv[2]; 14: 15: // find size of the data 16: // 17: // Use a call to stat() to get the size of the star_file in bytes 18: struct stat statbuf; 19: int ret = stat(star_file, &statbuf); 20: if(ret == -1){ 21: perror("stat() failed"); 22: return EXIT_FAILURE; 23: } 24: int size = statbuf.st_size; 25: printf("%d bytes in file %s\n",size,star_file); // report bytes for ease of debugging 26: 27: // get stars 28: // 29: // open and process the main stars file until the end of the file; 30: // use the size in bytes obtained above to terminate the input loop 31: int starfd = open(star_file, O_RDONLY); // open original file which contains the star data 32: if(starfd == -1){ 33: perror("open() on starfile failed"); 34: return EXIT_FAILURE; 35: } 36: int total_bytes = 0; // total bytes procesed from the file 37: int hits = 0; // number of planets matching the star_type requested 38: star_t cur_star; // current star read from file 39: while(total_bytes < size){ 40: int nbytes = read(starfd, &cur_star, sizeof(star_t)); 41: if( strcmp(cur_star.star_class, star_class)==0 ){ // check for a matching star_type 42: printf("| %20s | %8s | %12.4f |\n", 43: cur_star.name, cur_star.star_class, cur_star.dist); 44: hits++; 45: } 46: total_bytes += nbytes; // tack on the total bytes as progress is made 47: } 48: close(starfd); 49: printf("%d stars matched\n",hits); 50: return 0; 51: }
(C) No Size Needed
While Nosfew is aware of the read()
system call, he is unaware of
some of its properties that make it unnecessary to know how many bytes
are in a file to process it. Re-write the main input loop in the
program to demonstrate this technique to Nosfew, that no size is
needed for this loop to function correctly.
SOLUTION
Click for Solution
read()
will return 0 on reaching the end of a file and having 0
bytes left to read. This is used easily to simplify the input loop so
that no a priori size is needed.
- Nosfew used
fork()/exec()
to run the command linestat
utility in a child process. That reports file statistics about the input file which he redirected into an output file which the parent process used to extract the size. - Nosfew could avoid all this by using the
stat() / fstat()
system call which directly provides information on the file such as its size in bytes. This cuts out entirely the need to start a child process and parse any output from it significantly reducing the code for the program.
1: // star_search_read0.c: demonstrate use of return value from read() to 2: // simplify the input loop. 3: 4: #include "star.h" 5: 6: int main(int argc, char *argv[]){ 7: if(argc < 3){ 8: printf("usage: %s <star_class> <filename>\n",argv[0]); 9: return 1; 10: } 11: 12: char *star_class = argv[1]; 13: char *star_file = argv[2]; 14: 15: // get stars 16: // 17: // open and process the main stars file until the end of the file; 18: // use the size in bytes obtained above to terminate the input loop 19: // 20: // use the return value from read() to terminate the loop 21: int starfd = open(star_file, O_RDONLY); // open original file which contains the star data 22: if(starfd == -1){ 23: perror("open() on starfile failed"); 24: return EXIT_FAILURE; 25: } 26: int hits = 0; // number of planets matching the star_type requested 27: star_t cur_star; // current star read from file 28: while(1){ 29: int nbytes = read(starfd, &cur_star, sizeof(star_t)); 30: if(nbytes == 0){ 31: break; 32: } 33: if( strcmp(cur_star.star_class, star_class)==0 ){ // check for a matching star_type 34: printf("| %20s | %8s | %12.4f |\n", 35: cur_star.name, cur_star.star_class, cur_star.dist); 36: hits++; 37: } 38: } 39: close(starfd); 40: printf("%d stars matched\n",hits); 41: return 0; 42: }
(D) Memory Mapping
It is expected that the star files that will be utilized will be large and more complex operations may be required on them than simply linearly scanning. Since you've had a good education about the memory hierarchy, you know that Memory Mapping the star data file would let the OS manage movement of large star files between permanent storage and DRAM.
Nosfew is not aware of how to memory map files. Re-write the
star_search.c
program to use a memory mapped file instead of the
traditional I/O regime to demonstrate this technique to him.
SOLUTION
Click for Solution
mmap()
can be used to map in the pages of the file after opening it and
getting the size with stat.
- Nosfew used
fork()/exec()
to run the command linestat
utility in a child process. That reports file statistics about the input file which he redirected into an output file which the parent process used to extract the size. - Nosfew could avoid all this by using the
stat() / fstat()
system call which directly provides information on the file such as its size in bytes. This cuts out entirely the need to start a child process and parse any output from it significantly reducing the code for the program.
1: // star_search_mmap.c: demonstrate use of the mmap() to memory map the 2: // data file and access it scontents. 3: 4: #include "star.h" 5: 6: int main(int argc, char *argv[]){ 7: if(argc < 3){ 8: printf("usage: %s <star_class> <filename>\n",argv[0]); 9: return 1; 10: } 11: 12: char *star_class = argv[1]; 13: char *star_file = argv[2]; 14: 15: int starfd = open(star_file, O_RDONLY); // open file which contains the star data 16: struct stat statbuf; 17: int ret = fstat(starfd, &statbuf); // use fstat() to get info on an already-open fd 18: if(ret == -1){ 19: perror("fstat() failed"); 20: return EXIT_FAILURE; 21: } 22: int size = statbuf.st_size; 23: star_t *stars = mmap(NULL, // OS assigns virtual address 24: size, // map whole file 25: PROT_READ, MAP_SHARED, // only reading file contents, could change perms later 26: starfd, 0); // map the open file starting at offset 0 27: 28: int nstars = size / sizeof(star_t); // calculate number of star_t structs in the file 29: int hits = 0; // number of planets matching the star_type requested 30: for(int i=0; i<nstars; i++){ 31: if( strcmp(stars[i].star_class, star_class)==0 ){ // check for a matching star_type 32: printf("| %20s | %8s | %12.4f |\n", 33: stars[i].name, stars[i].star_class, stars[i].dist); 34: hits++; 35: } 36: } 37: 38: munmap(stars, size); 39: close(starfd); 40: printf("%d stars matched\n",hits); 41: return 0; 42: }
(E) Multi-Threading
Once a file is memory mapped, it becomes much easier to divide work like searching among multiple threads. Nosfew is not aware of how to create and coordinate threads. Talk through a solution with him about how the Star Search program might be multi-threaded to improve performance.
- What kind of data is shared?
- What shared data needs to be coordinated as it changes?
- How would you arrange to "split up" the work of searching for stars that meet the search criteria?
- If the program continues to just print out stars, what data is shared?
- If the program instead collects the indices or names of all stars that meet the query criteria, what data needs to be shared among threads now?
No solution is provided for this problem. Write your own to show to Nosfew!