CSCI 4061 Lab09: Simple Pipelines

Due: 11:59pm Tue 3/30/2021 on Gradescope
Approximately 1.00% of total grade

CODE DISTRIBUTION: lab09-code.zip

Download the code distribution
See further setup instructions below

CHANGELOG: Empty

1. Rationale
2. Codepack
3. QUESTIONS.txt File Contents
4. Submission
5. For More Information

1 Rationale

The true power of Unix pipelines is only available if one is aware of the primitive tools, particularly text processing programs, that are available on Unix. This lab gives a brief demonstration of how these can be combined in interesting ways to solve new problems such as extracting specific data with relative ease.

NOTE: A recent lecture specifically discusses how to solve this lab; students who need help with it may find the video recording useful.

Grading Policy

Credit for this Lab is earned by completing the exercises here and submitting a Zip of the work to Gradescope. Students are responsible for checking that the results produced locally via `make test` are reflected on Gradescope after submitting their completed Zip. Successful completion earns 1 Engagement Point.

Lab Exercises are open resource/open collaboration and students are encouraged to cooperate on labs. Students may submit work to Gradescope in groups of up to 5: one person submits, then adds the names of their group members to the submission.

See the full policies in the course syllabus.

2 Codepack

The codepack for this lab is linked at the top of this document. Always download it and unzip/unpack it. It should contain the following files, each briefly described below.

File	Use	Description
`QUESTIONS.txt`	EDIT	Questions to answer: fill in the multiple choice selections in this file.
`biggest_increases.sh`	CREATE	Shell script to create for the CODE portion
`stock-apple.csv`	Data	Comma Separated Value (CSV) files containing the data to process
`stock-gamestop.csv`	Data
`stock-uber.csv`	Data
`column-data.txt`	Data	Simple test file to experiment on with text tools
`topk.sh`	Provided	Shell script demonstrating a pipeline that was discussed in lecture
`gettysburg.txt`	Data	Plain text to process using `topk.sh`
`QUESTIONS.txt.bk`	Backup	Backup copy of the original file to help revert if needed
`Makefile`	Build	Enables `make test` and `make zip`
`testy`	Testing	Test running scripts
`test_lab09.org`	Testing	Tests for this lab

3 QUESTIONS.txt File Contents

Below are the contents of the QUESTIONS.txt file for the lab. Follow the instructions in it to complete the QUIZ and CODE questions for the lab.

                           __________________

                            LAB 09 QUESTIONS
                           __________________





Lab Instructions
================

  Follow the instructions below to experiment with topics related to
  this lab.
  - For sections marked QUIZ, fill in an (X) for the appropriate
    response in this file. Use the command `make test-quiz' to see if
    all of your answers are correct.
  - For sections marked CODE, complete the code indicated. Use the
    command `make test-code' to check if your code is complete.
  - DO NOT CHANGE any parts of this file except the QUIZ sections as it
    may interfere with the tests otherwise.
  - If your `QUESTIONS.txt' file seems corrupted, restore it by copying
    over the `QUESTIONS.txt.bk' backup file.
  - When you complete the exercises, check your answers with `make test'
    and if all is well, create a zip file with `make zip' and upload it
    to Gradescope. Ensure that the Autograder there reflects your local
    results.
  - IF YOU WORK IN A GROUP only one member needs to submit and then add
    the names of their group.


QUIZ Text tools for Pipelines
=============================

  Experiment with the options provided below to familiarize yourself
  with some common text tools that are used in constructing pipelines.

  Which of the following uses of `sort' will sort the data in
  `column-data.txt' in reverse numerical order by the first "key"
  (column)?
  - ( ) `cat column-data.txt | sort -k 2 -r'
  - ( ) `cat column-data.txt | sort -n'
  - ( ) `cat column-data.txt | sort -rn'
  - ( ) `cat column-data.txt | sort -k 1 -n'

  Which of the following uses of `tr' will replace the `.' (period)
  character in `column-data.txt' with a space?
  - ( ) `cat column-data.txt | tr -d '.''
  - ( ) `cat column-data.txt | tr '.' ' ''
  - ( ) `cat column-data.txt | tr ' ' '.''
  - ( ) `cat column-data.txt | tr -c '.' ' ''

  Which of the following uses of `awk' will print the 3rd column
  followed by the sum of the 1st and 4th columns?
  - ( ) `cat column-data.txt | awk '{print $3,($1+$4)}''
  - ( ) `cat column-data.txt | awk '{$3,($1+$4)}''
  - ( ) `cat column-data.txt | awk 'print $3,($1+$4)''
  - ( ) `cat column-data.txt | awk '{print
    fields[3],(fields[1]+fields[4])}''
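
  The tools in the questions above can be tried on any small input. The
  following is a minimal sketch that uses made-up data rather than
  `column-data.txt'; the `sample_data' function and its three lines are
  assumptions for illustration only.

```shell
#!/bin/bash
# Hypothetical sample data: three whitespace-separated columns per line.
sample_data() {
  printf '%s\n' '10 3.5 apple' '2 7.25 pear' '33 1.0 fig'
}

# sort -n compares lines numerically; adding -r reverses the order.
sample_data | sort -n
sample_data | sort -rn

# tr maps each character in the first set to the matching one in the second.
sample_data | tr '.' ' '

# awk splits each line into fields $1, $2, ...; the action goes in braces.
sample_data | awk '{print $3, ($1 + $2)}'
```

  Running each pipeline separately and comparing its output against the
  input is a quick way to confirm what an option does.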


CODE biggest_increases.sh
=========================

  Use the tool knowledge that you accumulated above and do some manual
  reading to write a small shell script which does the following:

  Print the dates of the biggest single-day increases in a stock CSV
  file that is formatted like those provided.

  As an example, `stock-gamestop.csv' is a Comma Separated Value (CSV)
  file which looks like the following:

  ,----
  | Date,Open,High,Low,Close,Volume
  | 03/19/2021,"195.73","227.00","182.66","200.27","24,677,301"
  | 03/18/2021,"214.00","218.88","195.65","201.75","11,799,910"
  | 03/17/2021,"217.84","231.47","204.00","209.81","16,481,590"
  | 03/16/2021,"203.16","220.70","172.35","208.17","35,422,871"
  | ...
  `----

  Each column is divided by commas. The first column is the date, 2nd
  column the opening price for a stock, and 5th column the closing price
  for the stock.  The price increase/decrease in a single day is found
  by subtracting: Column 5 minus Column 2.
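
  As a worked instance of the subtraction, the first data row shown
  above gives 200.27 - 195.73 = 4.54. The sketch below computes this for
  two rows copied from the excerpt. The function names `sample_rows' and
  `daily_change' are made up for illustration, and deleting the quotes
  with `tr' is one possible approach rather than a required one; it
  splits the Volume field into extra pieces, which does not matter here
  because only fields 1, 2, and 5 are used.

```shell
#!/bin/bash
# Header plus two rows copied from the excerpt above.
sample_rows() {
  cat <<'EOF'
Date,Open,High,Low,Close,Volume
03/19/2021,"195.73","227.00","182.66","200.27","24,677,301"
03/18/2021,"214.00","218.88","195.65","201.75","11,799,910"
EOF
}

# daily_change (hypothetical helper): reads CSV on standard input and
# prints "CHANGE DATE" for every data row.
daily_change() {
  tail -n +2 |     # skip the header line
    tr -d '"' |    # drop quotes so the prices are plain numbers
    awk -F ',' '{printf "%.2f %s\n", $5 - $2, $1}'
}

sample_rows | daily_change
```

  A negative result means the stock closed lower than it opened on that
  day.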

  This program is easily completed using a Pipeline of UNIX text
  utilities, likely in a configuration like
  ,----
  | cat stock-file.csv | tool1 | tool2 | ...
  `----

  Some good choices for tools are
  - `cat' to output the entire contents of a file
  - `head / tail' to output the first few or last few lines of a file
  - `tr / sed' to transform input to newly formatted output
  - `awk' to extract individual, space-separated items on a line or
    perform simple arithmetic between line elements
  - `sort' to sort input according to alphabetic/numeric criteria and
    print the sorted output
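
  A few of the listed tools can be tried quickly on a tiny input. This
  is only a sketch: the `sample_lines' function and its four lines are
  made up for illustration.

```shell
#!/bin/bash
# Hypothetical four-line input: one header line and three data lines.
sample_lines() {
  printf '%s\n' 'header' '"a",1' '"b",2' '"c",3'
}

sample_lines | head -n 2        # first two lines
sample_lines | tail -n 2        # last two lines
sample_lines | tail -n +2       # from line 2 onward, which skips the header
sample_lines | sed 's/"//g'     # delete every double quote on each line
```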

  To complete the code, place the finished pipeline in a shell script
  file called `biggest_increases.sh' which can be passed parameters that
  indicate which stock file to run on and how many of the top increases
  to show. See the provided `topk.sh' script which was discussed in
  lecture to see examples of how to formulate this shell script file.
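
  The parameter handling can be sketched as follows. This is not the
  contents of `topk.sh' and not a finished solution: the function name
  `top_lines' is hypothetical, and it assumes its input file already
  holds lines in `INCREASE DATE' form. In a real script the stages that
  produce those lines from the CSV would sit between `cat' and `sort'.

```shell
#!/bin/bash
# usage (assumed argument order): ./script.sh COUNT FILE

# top_lines (hypothetical helper): print the COUNT numerically largest
# lines of FILE, largest first.
top_lines() {
  local count="$1"    # first parameter: how many lines to show
  local file="$2"     # second parameter: which file to read
  cat "$file" |
    sort -rn |        # reverse numeric order puts the largest value first
    head -n "$count"  # keep only the requested number of lines
}

# "$1" and "$2" are the script's own command-line parameters.
if [ "$#" -eq 2 ]; then
  top_lines "$1" "$2"
fi
```

  Quoting `"$1"' and `"$2"' keeps file names containing spaces intact
  when they are passed along the pipeline.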

  Some demos of the `biggest_increases.sh' script are as follows. Note
  the output format is `INCREASE DATE' for the script.
  ,----
  | >> ./biggest_increases.sh 3 stock-gamestop.csv
  | 59.42 01/26/2021
  | 47.01 02/24/2021
  | 39.61 03/08/2021
  | 
  | >> ./biggest_increases.sh 8 stock-gamestop.csv
  | 59.42 01/26/2021
  | 47.01 02/24/2021
  | 39.61 03/08/2021
  | 29.19 03/09/2021
  | 22.42 01/22/2021
  | 18.36 03/11/2021
  | 15.86 03/01/2021
  | 10.98 01/13/2021
  | 
  | >> ./biggest_increases.sh 5 stock-apple.csv
  | 5.54 09/21/2020
  | 5.11 08/21/2020
  | 4.34 10/12/2020
  | 4.04 03/01/2021
  | 3.85 09/25/2020
  | 
  | >> ./biggest_increases.sh 2 stock-uber.csv
  | 4.53 03/19/2020
  | 3.83 12/02/2020
  `----

4 Submission

Follow the instructions at the end of Lab01 if you need a refresher on how to upload your completed lab zip to Gradescope.

5 For More Information

While pipelines of tools are not a core operating system topic such as memory management, concurrency, or process creation, they are extremely useful. For more insight into the possibilities, consider the Tool Time session, which covers the topic in more detail.