CMSC216 HW03: Basic I/O + Binary Integers
- Due: 11:59pm 18-Sep-2024
- Approximately 0.83% of total grade
- Homework and Quizzes are open resource/open collaboration. You must submit your own work but you may freely discuss HW topics with other members of the class.
CODE DISTRIBUTION: hw03-code.zip
- Download the code distribution
- See further setup instructions below
CHANGELOG:
1 Rationale
Reading data from files is an essential skill in any programming
environment and can be a bit trickier in C due to its reliance on
manual memory management via malloc()
. This HW discusses a standard
two-pass input strategy for dealing with such situations.
Modern computers represent all data using bits, collections of 1's and 0's. Integer data pervades many applications so understanding the bit-level representation of integers lends insight into their strengths and limitations. The two's complement representation for integers is by far the most common used by hardware and is therefore considered specially in this HW. C provides facilities to manipulate data at the bit level using logical and shift operations which are also discussed.
Associated Reading / Preparation
Most C references will contain information on C's standard I/O
libraries which discuss functions like fopen(), fscanf(), rewind()
and fclose()
.
Bryant and O'Hallaron: Ch 2.1-2.3 on binary representations of integers in unsigned and two's complement signed format. This page also includes a summary of the Two's Complement system.
Grading Policy
Credit for this HW is earned by taking the associated HW Quiz which is
linked under Gradescope
. The quiz will ask similar questions as
those that are present in the QUESTIONS.txt
file and those that
complete all answers in QUESTIONS.txt
should have no trouble with
the quiz.
Homework and Quizzes are open resource/open collaboration. You must submit your own work but you may freely discuss HW topics with other members of the class.
See the full policies in the course syllabus.
2 Codepack
The codepack for the HW contains the following files:
File | Description |
---|---|
QUESTIONS.txt |
Questions to answer |
twopass.c |
Code for Problem 1 |
nums1.txt / nums2.txt |
Data files for Problem 2 |
convert.c |
C file for Problem 4 |
twos_complement.c |
Simple demonstration of C code for twos complement |
3 Two's Complement Arithmetic
Understanding two's complement is necessary to complete a problem
QUESTIONS.txt
. The short presentation below gives the basics of
two's complement but you should study the material presented in our
textbook and lecture to supplement it.
The principles of two's complement representation for integers are as follows.
- Fix a number of bits to represent integers such as 8 bits or 32 bits
Zero is the bit string of all 0s. In 8 bits this is
0000 0000
Positive numbers always start with a 0. In 8 bits
+91
would be0101 1011
Negative numbers always start with a 1. In 8 bits, the following is a negative number
1010 0101
To determine the value of negative number in two's complement or negate it:
- Invert the bits: ones become zeros, zeros to ones. The C operator
invert is the tilde
~
symbol. - Add 1
This process on the above negative number is as follows.
~ 1010 0101 = negative number ----------- 0101 1010 = +90 + 1 ----------- 0101 1011 = +91
So the above negative number is
-91
as 91 is its twos complement.- Invert the bits: ones become zeros, zeros to ones. The C operator
invert is the tilde
The same process works to convert from positive to negative.
~ 0101 1011 = +91 ----------- ~ 1010 0100 = negative number + 1 ----------- ~ 1010 0101 = -91
So the above negative number is
-91
as 91 is its twos complement.C's unary negation operator performs these two steps of inverting and adding 1:
int p = 91; // 0101 1011 int n = -p; // 1010 0101
The context for interpreting bits matters. When interpreting bits as an unsigned quantity, each bit represents a power of 2 that should be added.
unsigned char p = 0b10100101; // 1 0 1 0 0 1 0 1 // 128 + 0 + 32 + 0 + 0 + 4 + 0 + 1 // = 165
In contrast, the same bits when interpreted in a signed context are a negative quantity. In the 2's complement representation, the highest order bit is associated with a negative power of two.
char p = 0b10100101; // signed quantity, negative due to leading 1 // 1 0 1 0 0 1 0 1 // -128 + 0 + 32 + 0 + 0 + 4 + 0 + 1 // = -91
Due to 0 starting with a
0
, there is one more bit string available for negative numbers than for positive numbers. This explains the asymmetric range of C data types. 8-bit characters have a range of -128 to +127 which are the following bit strings+127 = 0111 1111 -128 = 1000 0000
Arithmetic in twos complement has some nice properties such as going from 0 to -1:
0000 0000 = zero + 1111 1111 = -1 ----------- 1111 1111 = -1
More details of arithmetic will be discussed later.
For wider bit widths than 8, extend the sign bit for numbers left as in the following
8-bit +91: 0101 1011 16-bit +91: 0000 0000 0101 1011 8-bit -91: 1010 0101 16-bit -91: 1111 1111 1010 0101
4 Octal and Hexadecimal Representation of Numbers
The base-10 numbering system is taught rigorously from an early age and is familiar to most students. We have seen in the earlier Two's Complement system that numbers can also written in base-2 as Binary. This is indicative of numbers being independent of any particular notation or base. Binary is frequently used in computer science but is cumbersome as it takes a LOT of digits to write a number. When bits must be written, often they are written in a base-8 Octal format or base-16 Hexadecimal format. This is in part due to their compactness: the same bits are shown with fewer printed characters. Here is a short example.
|---------+--------------+----------------+----------------+-----------------+-----------------------| | Decimal | Byte = 8bits | Byte by 4 | Hexadecimal | Byte by 3 | Octal | |---------+--------------+----------------+----------------+-----------------+-----------------------| | 87 | 01010111 | bin: 0101 0111 | 57 = 5*16 + 7 | bin: 01 010 111 | 127 = 1*8^2 + 2*8 + 7 | | | | hex: 5 7 | hex dec | oct: 1 2 7 | oct dec | | | | | | | | | 60 | 00111100 | bin: 0011 1100 | 3C = 3*16 + 12 | bin: 00 111 100 | 074 = 0*8^2 + 7*8 + 4 | | | | hex: 3 C=12 | hex dec | oct: 0 7 4 | oct dec | | | | | | | | | 226 | 11100010 | bin: 1110 0010 | E2 = 14*16 + 2 | bin: 11 100 010 | 342 = 3*8^2 + 4*8 + 2 | | | | hex: E=14 2 | hex dec | oct: 3 4 2 | oct dec | |---------+--------------+----------------+----------------+-----------------+-----------------------|
Several items are of note:
- The same number is written in 4 forms, Decimal, Binary, Hexadecimal, and Octal
- A single Hexadecimal digit ranges from decimal 0-15. Values larger
than 9 are written as letters:
A=10, B=11, C=12, D=13, E=14, F=15
- Converting between Binary and Hexadecimal is easiest when grouping bits by 4: each 4 bits corresponds to one hexadecimal digit
- Octal digits ranger from 0-7
- Converting between Binary and Octal is easiest when grouping bits by 3: each 3 bits corresponds to one octal digit
5 What to Understand
Ensure that you understand
- Basics of reading data from files and allocating space for it
- Signed vs unsigned representations
- Two's complement representation of integers
- Conversions between binary, decimal, octal, and hexadecimal notation
6 Questions
Analyze the files in the provided codepack and answer the questions
given in QUESTIONS.txt
.
_________________ HW 03 QUESTIONS _________________ Write your answers to the questions below directly in this text file to prepare for the associated quiz. Credit for the HW is earned by completing the associated online quiz on Gradescope. PROBLEM 1: Two-Pass Input and malloc() ====================================== One frequently encounters the following situation when reading files - The file contains some data that is to be read into a dynamically allocated array - There is no indication of how much data is in the file Some file formats are designed to prevent this situation by storing the number of elements they contain early in the file (see the "Treasure Map" files in a recent lab for an example of this). However, programs can usually be written to handle input files without any size information which can make creating data files more convenient. C programs which want to read file data into an array commonly use a *two-pass* input strategy when the size of array needed is not known. 1. Read through the file once to count items in the file, the allocate memory for an array of that size 2. Back up to the beginning of the file and read data from the file into the array The provided program `twopass.c' demonstrates this technique along with several other input / formatting techniques. A ~ Compile and run the `twopass.c' program on the two provided text files `nums1.txt' and `nums2.txt'. Paste the output the program produces below. B ~ Examine the function `read_all_doubles()' in `twopass.c' and describe which lines/blocks of code carry out the following actions. Use the provided comments in the file as a guide. - Opens the file for reading and checks for errors - Counts all the doubles in the file - Allocates memory for doubles in the file - Moves the file read position to the beginning of the file - Closes the file when reading is complete C ~ In your answer to the previous problem, you should have identified a use of `malloc()' in `read_all_doubles()'. Where is this memory `free()''d and why? /Note: This question is similar to a lecture exercise on when to free() memory that has been malloc()'d./ D ~ Examine how the doubles read from file are printed in the `main()' function. Discuss below the format specifiers used by `printf()' and the width/precision modifiers that are used to get a "table-like" appearance. You may wish to consult the Manual page on `printf()' such as the one here: <http://man.he.net/?topic=printf+3§ion=3> (Note: Unix systems have several `printf()' functions and "Section 3" of the manual pages describes the C version). Contrast: Arrays vs Linked Lists ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Two-pass I/O or a data file containing the number of elements to read are required when reading data into an array as the size of the array must usually be fixed ahead of time for maximal efficiency. However, when reading into other data structures, notably Linked Lists, data can be "added on" efficiently to the data structure as it is read. This is one strength of such data structures compared to arrays but there are also disadvantages to consider. Understanding the tradeoffs between the different data structures available and choosing one that is most suited to the situation is an important part of good program design. Optional Enrichment: Reading without Storing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ One minor irritation in `read_all_doubles()' is the following code: ,---- | double tmp; // temporary space to read one double | ...; | while(1){ // first input pass: indefinite loop to count doubles | int ret = fscanf(fin, "%lf", &tmp); // try to read a double | ...; | } `---- This loop is counting the doubles in an input file but discarding them. Avoiding the need to allocate the `tmp' variable is desirable and can be done using some more advanced format specifiers in `fscanf() / scanf()' that indicate "read a double but do not store it anywhere". This allows the counting loop to happen without need for any additional space for doubles. Look how to get `fscanf()' to parse items like doubles without requiring space to store the result. PROBLEM 2: Number conversions ============================= A ~ Fill in the following table of equivalences. Filling in the table from top to bottom is advantageous as earlier rows can sometimes be used to infer lower values. Feel free to make use of any ASCII table or the table.c program provided in the week 3 lecture code pack. ,---- | |-----+------+-----+-----------+--------------| | | Dec | Hex | Oct | Binary | Char | | |-----+------+-----+-----------+--------------| | | 9 | 0x09 | 11 | 0000 1001 | TAB | | | 10 | ? | ? | ? | \n (newline) | | | ? | 0x20 | ? | ? | SPACE | | | ? | ? | ? | 0011 0010 | ? | | | 65 | 0x41 | 101 | 0100 0001 | A | | | 66 | ? | ? | ? | ? | | | ? | 0x4F | 117 | ? | O | | | 80 | ? | ? | ? | P | | | 91 | ? | 133 | 0101 1011 | [ | | | 97 | 0x61 | 141 | ? | ? | | | ? | ? | 172 | 0111 1010 | z | | | 145 | 0x91 | 221 | ? | none | | | 160 | ? | ? | 1010 0000 | none | | | 180 | 0xB4 | 264 | ? | none | | | 255 | ? | ? | ? | none | | |-----+------+-----+-----------+--------------| `---- B ~ Fill in the bits, hex, and decimal values for the given examples. The first example is completed for you. Assume all of these are 32 bit unsigned integers. ,---- | COMPLETED | Binary: 0000 0000 0000 0000 0001 1000 1110 1001 | 0 0 0 0 1 8 E 9 | Hex : 0018E9 | Decimal: 6377 | | | NUMBER 1 | Binary: 0000 0000 0010 1111 0011 1010 1000 1101 | ? | Hex : ?? | Decimal: ?? | | | NUMBER 2 | Binary: ?? | 7 F 8 3 5 A 0 B | Hex : 7F835A0B | Decimal: ?? `---- PROBLEM 3: Signed Integer Conversions ===================================== A ~ Apply the steps involved in converting the following positive binary number to it's two's complement negation in 8-bit signed format. Recall the steps are - Invert the bits - Add one Apply these steps to the following number: ,---- | 0111 1100 = 0x7C = 124 (decimal) `---- The result is the two's complement representation of -124. Convert back to positive via the same process - Invert the bits - Add one to show that the original bits are gotten back. B ~ Complete the following table of equivalences assuming 8-bit twos-complement signed integers. The rightmost column is the inverse of the binary representation: flip 1's to 0's, and vice versa. ,---- | |------+------+-----------+-----------| | | Dec | Hex | Binary | Inverse | | |------+------+-----------+-----------| | | +5 | 0x05 | 0000 0101 | 1111 1010 | | | -5 | ? | 1111 1011 | ? | | | +32 | 0x20 | ? | ? | | | -32 | 0xE0 | ? | 0001 1111 | | | +127 | 0x7F | ? | ? | | | -127 | 0x81 | ? | ? | | | -128 | ? | 1000 0000 | ? | | | +2 | ? | ? | ? | | | -2 | 0xFE | ? | ? | | | +1 | 0x01 | 0000 0001 | ? | | | -1 | ? | 1111 1111 | ? | | | 0 | ? | ? | ? | | |------+------+-----------+-----------| `---- PROBLEM 4: Converting Strings to Numbers ======================================== Inspect the program in the code pack called `convert.c'. Compile and run it using ,---- | > gcc convert.c | > ./a.out `---- Describe briefly what kind of conversion is being done by the `convert()' function given. - A. What kind of data is input? - B. What result is produced by the function? - C. How is a success versus an error reported? - D. Why is this kind of conversion needed? - E. What built-in C function (useful for the assignment) does this conversion function use and how is its calling convention different from convert()?