Last Updated: 2025-05-15 Thu 13:21

CMSC216 Final Exam Review Exercise

CODE DISTRIBUTION: hw-review-final-code.zip

  • Download the code distribution every HW
  • See further setup instructions below

CHANGELOG: None

1 Review Problem

File Manifest

FILE Notes
star_search.c Code to review / alter
Makefile Makefile
star.h Header with data types / includes
stars_close.dat Binary data file used by program
star_table.txt Text version of the binary data
star_gen.c C program to generate binary data
   
star_search_commented.c Solutions to code review
star_search_stat.c  
star_search_read0.c  
star_search_mmap.c  

Overview

Nosfew Syscalls is prototyping some code to read a large database of information on stars. The eventual code will need to run on a large database where information on stars is stored in a star_t struct in binary format. The provided stars_close.dat file has a small collection of these stored in binary as an array of structs which is used for testing. Nosfew has finished the testing version which can be run as follows:

>> make
gcc -Wall -g -Wno-unused-variable -c star_gen.c
gcc -Wall -g -Wno-unused-variable -o star_gen star_gen.o
gcc -Wall -g -Wno-unused-variable -c star_search.c
gcc -Wall -g -Wno-unused-variable -o star_search star_search.o

>> ./star_search
usage: ./star_search <star_class> <filename>

>> ./star_search G2 stars_close.dat 
6032 bytes in file stars_close.dat
|     Alpha Centauri A |       G2 |       4.4000 |
1 stars matched

>> ./star_search M5 stars_close.dat 
6032 bytes in file stars_close.dat
|       Barnard's Star |       M5 |       5.9400 |
|            L 789-6 B |       M5 |      11.0800 |
|        Struve 2398 B |       M5 |      11.6000 |
|             L 725-32 |       M5 |      12.1200 |
|             Ross 780 |       M5 |      15.3400 |
|            LP 816-60 |       M5 |      17.9100 |
6 stars matched

>> ./star_search M3.5 stars_close.dat 
6032 bytes in file stars_close.dat
|        Luyten's Star |     M3.5 |      12.3900 |
|            Wolf 1061 |     M3.5 |      13.9100 |
|            BD+68 946 |     M3.5 |      14.7700 |
|          CD-44 11909 |     M3.5 |      16.4500 |
|            LP 656-38 |     M3.5 |      17.8500 |
|            L 205-128 |     M3.5 |      18.9500 |
|          Wolf 1055 A |     M3.5 |      19.1600 |
|        CD-36 13940 B |     M3.5 |      19.7400 |
8 stars matched

The code itself is in the file star_search.c

 1: // star_search.c: search stars
 2: 
 3: #include "star.h"
 4: 
 5: int main(int argc, char *argv[]){
 6:   if(argc < 3){
 7:     printf("usage: %s <star_class> <filename>\n",argv[0]);
 8:     return 1;
 9:   }
10: 
11:   char *star_class = argv[1];
12:   char *star_file = argv[2];
13: 
14:   // find size of the data
15:   int pid = fork();
16:   if(pid == 0){
17:     int outfd = open("statout.txt", O_WRONLY|O_TRUNC|O_CREAT, S_IRUSR|S_IWUSR);
18:     if(outfd == -1){
19:       perror("open() failed");
20:       exit(1);
21:     }
22:     int ret = dup2(outfd, STDOUT_FILENO);
23:     if(ret == -1){
24:       perror("dup2() failed");
25:       exit(1);
26:     }
27:     char *argv[] = {"stat", star_file, NULL};
28:     execvp(argv[0], argv);
29:     perror("exec() failed");
30:     exit(EXIT_FAILURE);
31:   }
32: 
33:   int status;
34:   int ret = waitpid(pid, &status, 0);
35:   if( !WIFEXITED(status) || WEXITSTATUS(status)!=0 ){
36:     printf("Couldn't get the size of %s\n",star_file);
37:     return EXIT_FAILURE;
38:   }
39:   
40:   // get size
41:   FILE *statfile = fopen("statout.txt","r");
42:   char buf[128], *res;
43:   res = fgets(buf, 128, statfile);
44:   res = fgets(buf, 128, statfile);
45:   if(res == NULL){
46:     printf("I/O Failed\n");
47:     fclose(statfile);
48:     return EXIT_FAILURE;
49:   }
50:   int size;
51:   sscanf(buf, " Size: %d", &size);
52:   fclose(statfile);
53:   printf("%d bytes in file %s\n",size,star_file);
54: 
55:   // get stars
56:   int starfd = open(star_file, O_RDONLY);
57:   if(starfd == -1){
58:     perror("open() on starfile failed");
59:     return EXIT_FAILURE;
60:   }
61:   int total_bytes = 0;
62:   int hits = 0;
63:   star_t cur_star;
64:   while(total_bytes < size){
65:     int nbytes = read(starfd, &cur_star, sizeof(star_t));
66:     if( strcmp(cur_star.star_class, star_class)==0 ){
67:       printf("| %20s | %8s | %12.4f |\n",
68:              cur_star.name, cur_star.star_class, cur_star.dist);
69:       hits++;
70:     }
71:     total_bytes += nbytes;
72:   }
73:   close(starfd);
74:   printf("%d stars matched\n",hits);
75:   return 0;
76: }

Code Review

You have been tasked to do a code review for Nosfew to provide feedback and suggestions for changes which will improve the quality and maintainability of his code. Below are a series of items to address in the code review.

Remember: you are mentoring Nosfew so be gentle when offering him feedback. Beginners who are encouraged with constructive comments will be open to more coaching; discouraging criticism tends to drive folks away from seeking feedback.

(A) Commenting

Nosfew has a typical beginner tendency to comment little in his code. Add your own comments to elaborate on the different parts of the code and their intended purpose. Be a little more verbose than you might be with peer programmers to set a concrete example for Nosfew about informative commenting practices.

If you aren't sure how some of the parts of the program work, discuss with your colleagues to determine what's going on.

SOLUTION

Click for Solution

Below is one possible version of the code which includes more commentary to explain what's going to improve maintainability.

 1: // star_search_commented.c: more information commenting on the
 2: // progression of the computation.
 3: 
 4: #include "star.h"
 5: 
 6: int main(int argc, char *argv[]){
 7:   if(argc < 3){
 8:     printf("usage: %s <star_class> <filename>\n",argv[0]);
 9:     return 1;
10:   }
11: 
12:   char *star_class = argv[1];
13:   char *star_file = argv[2];
14:   
15:   // find size of the data
16:   //
17:   // Use a child process that exec() the "stat" command line utility
18:   // get the size of the data file in bytes; uses I/O redirection to
19:   // channel output into a file which will be parsed by the parent
20:   // process.
21:   int pid = fork();
22: 
23:   if(pid == 0){                                     // CHILD CODE ONLY
24:     int outfd = open("statout.txt", O_WRONLY|O_TRUNC|O_CREAT, S_IRUSR|S_IWUSR);
25:     if(outfd == -1){
26:       perror("open() failed");
27:       exit(1);
28:     }
29:     int ret = dup2(outfd, STDOUT_FILENO);           // redirect output into the file
30:     if(ret == -1){
31:       perror("dup2() failed");
32:       exit(1);
33:     }
34:     char *argv[] = {"stat", star_file, NULL};       // exec the stat command
35:     execvp(argv[0], argv);                          // child will exit here or
36:     perror("exec() failed");                        // on a fail, bail out
37:     exit(EXIT_FAILURE);
38:   }
39: 
40:   int status;                                       // PARENT CODE ONLY
41:   int ret = waitpid(pid, &status, 0);               // wait until child is finished
42:   if( !WIFEXITED(status) || WEXITSTATUS(status)!=0 ){
43:     printf("Couldn't get the size of %s\n",star_file);
44:     return EXIT_FAILURE;
45:   }
46:   
47:   // get size
48:   //
49:   // extract the size in bytes of star_file from the text output of
50:   // stat; it's on the 2nd line after the word "Size:"
51:   FILE *statfile = fopen("statout.txt","r");        // open the output file the child created
52:   char buf[128], *res;
53:   res = fgets(buf, 128, statfile);                  // skip the first line
54:   res = fgets(buf, 128, statfile);                  // get the next line which will be "    Size: <bytes> ..."
55:   if(res == NULL){
56:     printf("I/O Failed\n");
57:     fclose(statfile);
58:     return EXIT_FAILURE;
59:   }
60:   int size;
61:   sscanf(buf, " Size: %d", &size);                  // get the size data from this line
62:   fclose(statfile);                                 // done with file
63:   printf("%d bytes in file %s\n",size,star_file);   // report bytes for ease of debugging
64: 
65:   // get stars
66:   //
67:   // open and process the main stars file until the end of the file;
68:   // use the size in bytes obtained above to terminate the input loop
69:   int starfd = open(star_file, O_RDONLY);           // open original file which contains the star data
70:   if(starfd == -1){
71:     perror("open() on starfile failed");
72:     return EXIT_FAILURE;
73:   }
74:   int total_bytes = 0;                              // total bytes procesed from the file
75:   int hits = 0;                                     // number of planets matching the star_type requested
76:   star_t cur_star;                                  // current star read from file
77:   while(total_bytes < size){
78:     int nbytes = read(starfd, &cur_star, sizeof(star_t));
79:     if( strcmp(cur_star.star_class, star_class)==0 ){ // check for a matching star_type
80:       printf("| %20s | %8s | %12.4f |\n",
81:              cur_star.name, cur_star.star_class, cur_star.dist);
82:       hits++;
83:     }
84:     total_bytes += nbytes;                          // tack on the total bytes as progress is made
85:   }
86:   close(starfd);
87:   printf("%d stars matched\n",hits);
88:   return 0;
89: }

(B) Size Determination

Nosfew had an irregular education concerning UNIX system calls so didn't get exposed to some of the important ones that you've had the benefit of learning. He cleverly worked around a gap in his knowledge on how to find the size in bytes of a file. Explain how he did this and also offer him guidance on what he might do instead in the future to more directly and efficiently determine the size of a file if needed. If you are feeling particularly benevolent, write revision of the program to demonstrate this technique.

SOLUTION

Click for Solution
  • Nosfew used fork()/exec() to run the command line stat utility in a child process. That reports file statistics about the input file which he redirected into an output file which the parent process used to extract the size.
  • Nosfew could avoid all this by using the stat() / fstat() system call which directly provides information on the file such as its size in bytes. This cuts out entirely the need to start a child process and parse any output from it significantly reducing the code for the program.
 1: // star_search_stat.c: demonstrate use of the stat() system call to
 2: // easily get the size of the input file.
 3: 
 4: #include "star.h"
 5: 
 6: int main(int argc, char *argv[]){
 7:   if(argc < 3){
 8:     printf("usage: %s <star_class> <filename>\n",argv[0]);
 9:     return 1;
10:   }
11: 
12:   char *star_class = argv[1];
13:   char *star_file = argv[2];
14:   
15:   // find size of the data
16:   //
17:   // Use a call to stat() to get the size of the star_file in bytes
18:   struct stat statbuf;
19:   int ret = stat(star_file, &statbuf);
20:   if(ret == -1){
21:     perror("stat() failed");
22:     return EXIT_FAILURE;
23:   }
24:   int size = statbuf.st_size;
25:   printf("%d bytes in file %s\n",size,star_file);   // report bytes for ease of debugging
26: 
27:   // get stars
28:   //
29:   // open and process the main stars file until the end of the file;
30:   // use the size in bytes obtained above to terminate the input loop
31:   int starfd = open(star_file, O_RDONLY);           // open original file which contains the star data
32:   if(starfd == -1){
33:     perror("open() on starfile failed");
34:     return EXIT_FAILURE;
35:   }
36:   int total_bytes = 0;                              // total bytes procesed from the file
37:   int hits = 0;                                     // number of planets matching the star_type requested
38:   star_t cur_star;                                  // current star read from file
39:   while(total_bytes < size){
40:     int nbytes = read(starfd, &cur_star, sizeof(star_t));
41:     if( strcmp(cur_star.star_class, star_class)==0 ){ // check for a matching star_type
42:       printf("| %20s | %8s | %12.4f |\n",
43:              cur_star.name, cur_star.star_class, cur_star.dist);
44:       hits++;
45:     }
46:     total_bytes += nbytes;                          // tack on the total bytes as progress is made
47:   }
48:   close(starfd);
49:   printf("%d stars matched\n",hits);
50:   return 0;
51: }

(C) No Size Needed

While Nosfew is aware of the read() system call, he is unaware of some of its properties that make it unnecessary to know how many bytes are in a file to process it. Re-write the main input loop in the program to demonstrate this technique to Nosfew, that no size is needed for this loop to function correctly.

SOLUTION

Click for Solution

read() will return 0 on reaching the end of a file and having 0 bytes left to read. This is used easily to simplify the input loop so that no a priori size is needed.

  • Nosfew used fork()/exec() to run the command line stat utility in a child process. That reports file statistics about the input file which he redirected into an output file which the parent process used to extract the size.
  • Nosfew could avoid all this by using the stat() / fstat() system call which directly provides information on the file such as its size in bytes. This cuts out entirely the need to start a child process and parse any output from it significantly reducing the code for the program.
 1: // star_search_read0.c: demonstrate use of return value from read() to
 2: // simplify the input loop.
 3: 
 4: #include "star.h"
 5: 
 6: int main(int argc, char *argv[]){
 7:   if(argc < 3){
 8:     printf("usage: %s <star_class> <filename>\n",argv[0]);
 9:     return 1;
10:   }
11: 
12:   char *star_class = argv[1];
13:   char *star_file = argv[2];
14:   
15:   // get stars
16:   //
17:   // open and process the main stars file until the end of the file;
18:   // use the size in bytes obtained above to terminate the input loop
19:   //
20:   // use the return value from read() to terminate the loop
21:   int starfd = open(star_file, O_RDONLY);           // open original file which contains the star data
22:   if(starfd == -1){
23:     perror("open() on starfile failed");
24:     return EXIT_FAILURE;
25:   }
26:   int hits = 0;                                     // number of planets matching the star_type requested
27:   star_t cur_star;                                  // current star read from file
28:   while(1){
29:     int nbytes = read(starfd, &cur_star, sizeof(star_t));
30:     if(nbytes == 0){
31:       break;
32:     }
33:     if( strcmp(cur_star.star_class, star_class)==0 ){ // check for a matching star_type
34:       printf("| %20s | %8s | %12.4f |\n",
35:              cur_star.name, cur_star.star_class, cur_star.dist);
36:       hits++;
37:     }
38:   }
39:   close(starfd);
40:   printf("%d stars matched\n",hits);
41:   return 0;
42: }

(D) Memory Mapping

It is expected that the star files that will be utilized will be large and more complex operations may be required on them than simply linearly scanning. Since you've had a good education about the memory hierarchy, you know that Memory Mapping the star data file would let the OS manage movement of large star files between permanent storage and DRAM.

Nosfew is not aware of how to memory map files. Re-write the star_search.c program to use a memory mapped file instead of the traditional I/O regime to demonstrate this technique to him.

SOLUTION

Click for Solution

mmap() can be used to map in the pages of the file after opening it and getting the size with stat.

  • Nosfew used fork()/exec() to run the command line stat utility in a child process. That reports file statistics about the input file which he redirected into an output file which the parent process used to extract the size.
  • Nosfew could avoid all this by using the stat() / fstat() system call which directly provides information on the file such as its size in bytes. This cuts out entirely the need to start a child process and parse any output from it significantly reducing the code for the program.
 1: // star_search_mmap.c: demonstrate use of the mmap() to memory map the
 2: // data file and access it scontents.
 3: 
 4: #include "star.h"
 5: 
 6: int main(int argc, char *argv[]){
 7:   if(argc < 3){
 8:     printf("usage: %s <star_class> <filename>\n",argv[0]);
 9:     return 1;
10:   }
11: 
12:   char *star_class = argv[1];
13:   char *star_file = argv[2];
14:   
15:   int starfd = open(star_file, O_RDONLY);           // open file which contains the star data
16:   struct stat statbuf;
17:   int ret = fstat(starfd, &statbuf);                // use fstat() to get info on an already-open fd
18:   if(ret == -1){
19:     perror("fstat() failed");
20:     return EXIT_FAILURE;
21:   }
22:   int size = statbuf.st_size;
23:   star_t *stars = mmap(NULL,                        // OS assigns virtual address
24:                        size,                        // map whole file
25:                        PROT_READ, MAP_SHARED,       // only reading file contents, could change perms later
26:                        starfd, 0);                  // map the open file starting at offset 0
27: 
28:   int nstars = size / sizeof(star_t);               // calculate number of star_t structs in the file
29:   int hits = 0;                                     // number of planets matching the star_type requested
30:   for(int i=0; i<nstars; i++){
31:     if( strcmp(stars[i].star_class, star_class)==0 ){ // check for a matching star_type
32:       printf("| %20s | %8s | %12.4f |\n",
33:              stars[i].name, stars[i].star_class, stars[i].dist);
34:       hits++;
35:     }
36:   }
37: 
38:   munmap(stars, size);
39:   close(starfd);
40:   printf("%d stars matched\n",hits);
41:   return 0;
42: }

(E) Multi-Threading

Once a file is memory mapped, it becomes much easier to divide work like searching among multiple threads. Nosfew is not aware of how to create and coordinate threads. Talk through a solution with him about how the Star Search program might be multi-threaded to improve performance.

  • What kind of data is shared?
  • What shared data needs to be coordinated as it changes?
  • How would you arrange to "split up" the work of searching for stars that meet the search criteria?
  • If the program continues to just print out stars, what data is shared?
  • If the program instead collects the indices or names of all stars that meet the query criteria, what data needs to be shared among threads now?

No solution is provided for this problem. Write your own to show to Nosfew!


Web Accessibility
Author: Chris Kauffman (profk@umd.edu)
Date: 2025-05-15 Thu 13:21