CSCI 4061 Lab09: Simple Pipelines
- Due: 11:59pm Tue 3/30/2021 on Gradescope
- Approximately 1.00% of total grade
CODE DISTRIBUTION: lab09-code.zip
- Download the code distribution
- See further setup instructions below
CHANGELOG: Empty
1 Rationale
The true power of Unix pipelines is only available if one is aware of the primitive tools, particularly text processing programs, that are available on Unix. This lab gives a brief demonstration of how these can be combined in interesting ways to solve new problems such as extracting specific data with relative ease.
NOTE: A recent lecture discusses specifically how to solve this lab and watching the video recording can help students needing assistance with it.
Grading Policy
Credit for this lab is earned by completing the exercises here and
submitting a Zip of the work to Gradescope. Students are responsible
for checking that the results produced locally via make test are
reflected on Gradescope after submitting their completed
Zip. Successful completion earns 1 Engagement Point.
Lab Exercises are open resource/open collaboration and students are encouraged to cooperate on labs. Students may submit work to Gradescope in groups of up to 5: one person submits, then adds the names of their group members to the submission.
See the full policies in the course syllabus.
2 Codepack
The codepack for this lab is linked at the top of this document. Always download it and unzip/unpack it. It should contain the following files, which are briefly described below.
| File | Use | Description | 
|---|---|---|
| QUESTIONS.txt | EDIT | Questions to answer: fill in the multiple choice selections in this file. | 
| biggest_increases.sh | CREATE | Shell script to create in order to complete the CODE portion | 
| stock-apple.csv | Data | Comma Separated Value (CSV) files containing the data to process | 
| stock-gamestop.csv | Data | |
| stock-uber.csv | Data | |
| column-data.txt | Data | Simple test file to experiment on with text tools | 
| topk.sh | Provided | Shell script demonstrating a pipeline that was discussed in lecture | 
| gettysburg.txt | Data | Plain text to process using topk.sh | 
| QUESTIONS.txt.bk | Backup | Backup copy of the original file to help revert if needed | 
| Makefile | Build | Enables make test and make zip | 
| testy | Testing | Test running scripts | 
| test_lab09.org | Testing | Tests for this lab | 
3 QUESTIONS.txt File Contents
Below are the contents of the QUESTIONS.txt file for the lab.
Follow the instructions in it to complete the QUIZ and CODE questions
for the lab.
                           __________________
                            LAB 09 QUESTIONS
                           __________________
Lab Instructions
================
  Follow the instructions below to experiment with topics related to
  this lab.
  - For sections marked QUIZ, fill in an (X) for the appropriate
    response in this file. Use the command `make test-quiz' to see if
    all of your answers are correct.
  - For sections marked CODE, complete the code indicated. Use the
    command `make test-code' to check if your code is complete.
  - DO NOT CHANGE any parts of this file except the QUIZ sections as it
    may interfere with the tests otherwise.
  - If your `QUESTIONS.txt' file seems corrupted, restore it by copying
    over the `QUESTIONS.txt.bk' backup file.
  - When you complete the exercises, check your answers with `make test'
    and if all is well, create a zip file with `make zip' and upload it
    to Gradescope. Ensure that the Autograder there reflects your local
    results.
  - IF YOU WORK IN A GROUP only one member needs to submit and then add
    the names of their group.
QUIZ Text tools for Pipelines
=============================
  Experiment with the options provided below to familiarize yourself
  with some common text tools that are used in constructing pipelines.
  Which of the following uses of `sort' will sort the data in
  `column-data.txt' in reverse numerical order by the first "key"
  (column)?
  - ( ) `cat column-data.txt | sort -k 2 -r'
  - ( ) `cat column-data.txt | sort -n'
  - ( ) `cat column-data.txt | sort -rn'
  - ( ) `cat column-data.txt | sort -k 1 -n'
  Which of the following uses of `tr' will replace the `.' (period)
  character in `column-data.txt' with a space?
  - ( ) `cat column-data.txt | tr -d '.''
  - ( ) `cat column-data.txt | tr '.' ' ''
  - ( ) `cat column-data.txt | tr ' ' '.''
  - ( ) `cat column-data.txt | tr -c '.' ' ''
  Which of the following uses of `awk' will print the 3rd column
  followed by the sum of the 1st and 4th columns?
  - ( ) `cat column-data.txt | awk '{print $3,($1+$4)}''
  - ( ) `cat column-data.txt | awk '{$3,($1+$4)}''
  - ( ) `cat column-data.txt | awk 'print $3,($1+$4)''
  - ( ) `cat column-data.txt | awk '{print
    fields[3],(fields[1]+fields[4])}''
CODE biggest_increases.sh
=========================
  Use the tool knowledge that you accumulated above and do some manual
  reading to write a small shell script which does the following:
  Print the dates of the biggest single-day increases in a stock CSV
  file that is formatted like those provided.
  As an example, `stock-gamestop.csv' is a Comma Separated Value
  (CSV) file which looks like the following:
  ,----
  | Date,Open,High,Low,Close,Volume
  | 03/19/2021,"195.73","227.00","182.66","200.27","24,677,301"
  | 03/18/2021,"214.00","218.88","195.65","201.75","11,799,910"
  | 03/17/2021,"217.84","231.47","204.00","209.81","16,481,590"
  | 03/16/2021,"203.16","220.70","172.35","208.17","35,422,871"
  | ...
  `----
  Each column is divided by commas. The first column is the date, 2nd
  column the opening price for a stock, and 5th column the closing price
  for the stock.  The price increase/decrease in a single day is found
  by subtracting: Column 5 minus Column 2.
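  As a rough illustration of that subtraction, the sketch below works
  on a single row copied from the excerpt above. The variable names
  `row' and `change' are made up for this example, and stripping the
  quotes with `tr' is only one of several reasonable approaches, not
  necessarily the one intended for the lab.

```shell
# One sample row copied from the stock-gamestop.csv excerpt above
row='03/19/2021,"195.73","227.00","182.66","200.27","24,677,301"'

# Delete the double quotes, turn commas into spaces, then let awk
# subtract column 2 (Open) from column 5 (Close) and print the date.
# The commas inside the Volume column split it into extra fields, but
# columns 1 through 5 are unaffected.
change=$(printf '%s\n' "$row" \
    | tr -d '"' \
    | tr ',' ' ' \
    | awk '{printf "%.2f %s\n", $5 - $2, $1}')

echo "$change"
```

  For this row the close (200.27) is above the open (195.73), so the
  printed change is positive.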
  This program is easily completed using a Pipeline of UNIX text
  utilities, likely in a configuration like
  ,----
  | cat stock-file.csv | tool1 | tool2 | ...
  `----
  Some good choices for tools are
  - `cat' to output the entire contents of a file
  - `head / tail' to output the first few or last few lines of a file
  - `tr / sed' to transform input to newly formatted output
  - `awk' to extract individual, space-separated items on a line or
    perform simple arithmetic between line elements
  - `sort' to sort input according to alphabetic/numeric criteria and
    print the sorted output
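  To show how several of these tools chain together, here is a small
  sketch on invented inline data (the fruit names and counts are
  purely illustrative and unrelated to the stock files). It sorts by
  the second column in descending numeric order, keeps the first two
  lines, and swaps the column order.

```shell
# Invented sample data: a name and a count on each line
data='pear 3
apple 10
fig 7'

# sort -k 2 -n -r : order by the 2nd column, numerically, largest first
# head -n 2       : keep only the first two lines of the sorted output
# awk             : print the count first, then the name
top=$(printf '%s\n' "$data" \
    | sort -k 2 -n -r \
    | head -n 2 \
    | awk '{print $2, $1}')

echo "$top"
```

  Each stage reads the previous stage's output, so stages can be added
  or removed one at a time while experimenting.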
  To complete the code, place the finished pipeline in a shell script
  file called `biggest_increases.sh' which can be passed parameters that
  indicate which stock file to run on and how many of the top increases
  to show. See the provided `topk.sh' script which was discussed in
  lecture to see examples of how to formulate this shell script file.
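  For readers unfamiliar with positional parameters, the following is
  a minimal sketch and not the contents of `topk.sh'. The function
  name `first_lines' and the temporary file are hypothetical. Inside a
  script, `$1' and `$2' refer to the first and second command line
  arguments in the same way they refer to function arguments here.

```shell
# Hypothetical helper: $1 is a line count, $2 is a file name
first_lines() {
    count="$1"
    file="$2"
    head -n "$count" "$file"
}

# Build a small throwaway file to run the helper on
tmp=$(mktemp)
printf 'a\nb\nc\nd\n' > "$tmp"

# Ask for the first 2 lines of the throwaway file
out=$(first_lines 2 "$tmp")
echo "$out"

rm -f "$tmp"
```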
  Some demos of the `biggest_increases.sh' script are as follows. Note
  the output format is `INCREASE DATE' for the script.
  ,----
  | >> ./biggest_increases.sh 3 stock-gamestop.csv 
  | 59.42 01/26/2021
  | 47.01 02/24/2021
  | 39.61 03/08/2021
  | 
  | >> ./biggest_increases.sh 8 stock-gamestop.csv 
  | 59.42 01/26/2021
  | 47.01 02/24/2021
  | 39.61 03/08/2021
  | 29.19 03/09/2021
  | 22.42 01/22/2021
  | 18.36 03/11/2021
  | 15.86 03/01/2021
  | 10.98 01/13/2021
  | 
  | >> ./biggest_increases.sh 5 stock-apple.csv 
  | 5.54 09/21/2020
  | 5.11 08/21/2020
  | 4.34 10/12/2020
  | 4.04 03/01/2021
  | 3.85 09/25/2020
  |  
  | >> ./biggest_increases.sh 2 stock-uber.csv 
  | 4.53 03/19/2020
  | 3.83 12/02/2020
  `----
4 Submission
Follow the instructions at the end of Lab01 if you need a refresher on how to upload your completed lab zip to Gradescope.
5 For More Information
While not a core operating system topic such as memory management, concurrency, or process creation, pipelines of tools are extremely useful. To gain some more insight into possibilities, consider the Tool Time session which goes into some more detail on the topic.