CSCI 4061 Lab09: Simple Pipelines
- Due: 11:59pm Tue 3/30/2021 on Gradescope
- Approximately 1.00% of total grade
CODE DISTRIBUTION: lab09-code.zip
- Download the code distribution
- See further setup instructions below
CHANGELOG: Empty
1 Rationale
The true power of Unix pipelines is only available to those who know the primitive tools, particularly the text processing programs, that Unix provides. This lab briefly demonstrates how these tools can be combined to solve new problems, such as extracting specific data from a file, with relative ease.
NOTE: A recent lecture discusses specifically how to solve this lab and watching the video recording can help students needing assistance with it.
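As a small illustration of combining text tools, the sketch below counts the most frequent words on standard input. The function name `top_words` is hypothetical and was written for this example; it is not the contents of the provided topk.sh, which may be organized differently.

```shell
#!/bin/sh
# top_words K: read text on standard input, print the K most frequent words.
# Hypothetical helper for illustration only; not the provided topk.sh.
top_words() {
  tr -cs '[:alpha:]' '\n' |          # put each word on its own line
    tr '[:upper:]' '[:lower:]' |     # fold case so "The" and "the" match
    sort |                           # group identical words together
    uniq -c |                        # collapse each group to "COUNT WORD"
    sort -rn |                       # largest counts first
    head -n "$1" |                   # keep only the top K lines
    awk '{print $1, $2}'             # normalize the spacing uniq adds
}

# Demo on a short made-up sentence.
printf 'the cat and the dog and the bird\n' | top_words 2
```

Each stage does one small job and hands its output to the next, which is the pattern the rest of this lab relies on.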
Grading Policy
Credit for this lab is earned by completing the exercises here and submitting a Zip of the work to Gradescope. Students are responsible for checking that the results produced locally via make test are reflected on Gradescope after submitting their completed Zip. Successful completion earns 1 Engagement Point.
Lab Exercises are open resource/open collaboration and students are encouraged to cooperate on labs. Students may submit work to Gradescope in groups of up to 5: one person submits and then adds the names of their group members to the submission.
See the full policies in the course syllabus.
2 Codepack
The codepack for this lab is linked at the top of this document. Always download it and unzip/unpack it. It should contain the following files which are briefly described.
File | Use | Description |
---|---|---|
QUESTIONS.txt | EDIT | Questions to answer: fill in the multiple choice selections in this file. |
biggest_increases.sh | CREATE | Shell script to create to complete the CODE portion |
stock-apple.csv | Data | Comma Separated Value files which are data to process |
stock-gamestop.csv | Data | |
stock-uber.csv | Data | |
column-data.txt | Data | Simple test file to experiment on with text tools |
topk.sh | Provided | Shell script demonstrating a pipeline that was discussed in lecture |
gettysburg.txt | Data | Plain text to process using topk.sh |
QUESTIONS.txt.bk | Backup | Backup copy of the original file to help revert if needed |
Makefile | Build | Enables make test and make zip |
testy | Testing | Test running scripts |
test_lab09.org | Testing | Tests for this lab |
3 QUESTIONS.txt File Contents
Below are the contents of the QUESTIONS.txt file for the lab. Follow the instructions in it to complete the QUIZ and CODE questions for the lab.
                           __________________

                            LAB 09 QUESTIONS
                           __________________


Lab Instructions
================

  Follow the instructions below to experiment with topics related to
  this lab.
  - For sections marked QUIZ, fill in an (X) for the appropriate
    response in this file. Use the command `make test-quiz' to see if
    all of your answers are correct.
  - For sections marked CODE, complete the code indicated. Use the
    command `make test-code' to check if your code is complete.
  - DO NOT CHANGE any parts of this file except the QUIZ sections as it
    may interfere with the tests otherwise.
  - If your `QUESTIONS.txt' file seems corrupted, restore it by copying
    over the `QUESTIONS.txt.bk' backup file.
  - When you complete the exercises, check your answers with `make test'
    and if all is well, create a zip file with `make zip' and upload it
    to Gradescope. Ensure that the Autograder there reflects your local
    results.
  - IF YOU WORK IN A GROUP only one member needs to submit and then add
    the names of their group.


QUIZ Text tools for Pipelines
=============================

  Experiment with the options provided below to familiarize with some
  common text tools that are used in constructing pipelines.

  Which of the following uses of `sort' will sort the data in
  `column-data.txt' in reverse numerical order by the first "key"
  (column)?
  - ( ) `cat column-data.txt | sort -k 2 -r'
  - ( ) `cat column-data.txt | sort -n'
  - ( ) `cat column-data.txt | sort -rn'
  - ( ) `cat column-data.txt | sort -k 1 -n'

  Which of the following uses of `tr' will replace the `.' (period)
  character in `column-data.txt' with a space?
  - ( ) `cat column-data.txt | tr -d '.''
  - ( ) `cat column-data.txt | tr '.' ' ''
  - ( ) `cat column-data.txt | tr ' ' '.''
  - ( ) `cat column-data.txt | tr -c '.' ' ''

  Which of the following uses of `awk' will print the 3rd column
  followed by the sum of the 1st and 4th columns?
  - ( ) `cat column-data.txt | awk '{print $3,($1+$4)}''
  - ( ) `cat column-data.txt | awk '{$3,($1+$4)}''
  - ( ) `cat column-data.txt | awk 'print $3,($1+$4)''
  - ( ) `cat column-data.txt | awk '{print fields[3],(fields[1]+fields[4])}''


CODE biggest_increases.sh
=========================

  Use the tool knowledge that you accumulated above and do some manual
  reading to write a small shell script which does the following:

  Print the dates of the biggest single day increase in a stock file CSV
  that is formatted like those provided.

  As an example, the `stock-gamestop.csv' is a Comma Separated Value
  (CSV) file which looks like the following:
  ,----
  | Date,Open,High,Low,Close,Volume
  | 03/19/2021,"195.73","227.00","182.66","200.27","24,677,301"
  | 03/18/2021,"214.00","218.88","195.65","201.75","11,799,910"
  | 03/17/2021,"217.84","231.47","204.00","209.81","16,481,590"
  | 03/16/2021,"203.16","220.70","172.35","208.17","35,422,871"
  | ...
  `----

  Each column is divided by commas. The first column is the date, 2nd
  column the opening price for a stock, and 5th column the closing price
  for the stock. The price increase/decrease in a single day is found by
  subtracting: Column 5 minus Column 2.

  This program is easily completed using a Pipeline of UNIX text
  utilities, likely in a configuration like
  ,----
  | cat stock-file.csv | tool1 | tool2 | ...
  `----

  Some good choices for tools are
  - `cat' to output the entire contents of a file
  - `head / tail' to output the first few or last few lines of a file
  - `tr / sed' to transform input to newly formatted output
  - `awk' to extract individual, space-separated items on a line or
    perform simple arithmetic between line elements
  - `sort' to sort input according to alphabetic/numeric criteria and
    print the sorted output

  To complete the code, place the finished pipeline in a shell script
  file called `biggest_increases.sh' which can be passed parameters that
  indicate which stock file to run on and how many of the top increases
  to show. See the provided `topk.sh' script which was discussed in
  lecture to see examples of how to formulate this shell script file.

  Some demos of the `biggest_increases.sh' script are as follows. Note
  the output format is `INCREASE DATE' for the script.

  ,----
  | >> ./biggest_increases.sh 3 stocks-gamestop.csv
  | 59.42 01/26/2021
  | 47.01 02/24/2021
  | 39.61 03/08/2021
  |
  | >> ./biggest_increases.sh 8 stocks-gamestop.csv
  | 59.42 01/26/2021
  | 47.01 02/24/2021
  | 39.61 03/08/2021
  | 29.19 03/09/2021
  | 22.42 01/22/2021
  | 18.36 03/11/2021
  | 15.86 03/01/2021
  | 10.98 01/13/2021
  |
  | >> ./biggest_increases.sh 5 stocks-apple.csv
  | 5.54 09/21/2020
  | 5.11 08/21/2020
  | 4.34 10/12/2020
  | 4.04 03/01/2021
  | 3.85 09/25/2020
  |
  | >> ./biggest_increases.sh 2 stocks-uber.csv
  | 4.53 03/19/2020
  | 3.83 12/02/2020
  `----
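Separately from the QUESTIONS.txt contents above, the sketch below shows the general shape of a parameterized pipeline on a different, made-up data set. Everything in it is an assumption for illustration: the function name `widest_swings`, the file `temps-demo.csv`, and its `Date,Low,High` layout are hypothetical and are not part of the codepack. It is not the lab's `biggest_increases.sh`, and the provided `topk.sh' may structure its parameters differently.

```shell
#!/bin/sh
# widest_swings K FILE: print the K largest (High - Low) values with dates.
# Hypothetical helper; assumes FILE has a header line followed by rows
# shaped like: 03/19/2021,"41.5","63.0"
widest_swings() {
  tail -n +2 "$2" |                            # skip the header line
    tr -d '"' |                                # drop quotes around the numbers
    tr ',' ' ' |                               # commas to spaces for awk fields
    awk '{printf "%.2f %s\n", $3 - $2, $1}' |  # emit "SWING DATE"
    sort -rn |                                 # largest swing first
    head -n "$1"                               # keep only the top K rows
}

# Demo on a small made-up data file written to the current directory.
cat > temps-demo.csv <<'EOF'
Date,Low,High
03/19/2021,"41.5","63.0"
03/18/2021,"38.0","45.25"
03/17/2021,"30.0","58.5"
EOF
widest_swings 2 temps-demo.csv
```

In a standalone script file the function body would sit at the top level, with `$1` and `$2` referring to the script's own command line arguments. Note that blindly turning every comma into a space only works here because the numeric fields contain no commas of their own; a quoted field such as "24,677,301" would be split into several awk fields.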
4 Submission
Follow the instructions at the end of Lab01 if you need a refresher on how to upload your completed lab zip to Gradescope.
5 For More Information
While not a core operating system topic such as memory management, concurrency, or process creation, pipelines of tools are extremely useful. To gain some more insight into possibilities, consider the Tool Time session which goes into some more detail on the topic.