a5 programm22

profilesehj
a5-comp2140-2018.pdf

COMP 2140 Assignment 5

Huffman Coding

Pak Ching Li

Due: Friday April 6, 2018 - 11:59 PM (Last Day of Classes)

Programming Standards

When writing code for this course, follow the programming standards, available on this course’s website on UMLearn. Failure to do so will result in the loss of marks.

Objective

In this assignment, you will use binary trees and heaps to implement the Huffman coding scheme.

Your Program

General Overview: In this assignment, your task is to implement the encoding and decoding algorithms for Huffman coding. Huffman coding will be discussed in class and therefore we will not spend much time talking about how the algorithms work.

There are three major steps to the solution that needs to be dealt with:

1. Reading in a data file and computing the frequency of characters.

2. Using these frequencies, build an Huffman encoding tree and encode the data file using it.

3. Using the encoding of the data file and the Huffman tree, decode to obtain original data file.

Important: Before you go any further, you should go through the slides from the lecture notes on Huffman coding and have them handy as you are reading the rest of the assignment.

1-Reading Data File and Frequency Computation

In this step, you are to prompt for a text data file, open it and compute (and store) the frequency of each character that occurs.

To help you in computing the frequencies, a set of helper classes in the file FrequencyList.java is provided that will do this for you. These classes will create a linked list where each node’s data consists of a character and its frequency (in the data file). In addition, the class FrequencyTester.java shows how to use these helper classes. You are free to modify the code as you please, if you want.

Note: When you are using a method to read in a line from the input file, the result does not include a newline (\n). Therefore, you have to manually add it as an input character at the end of each line (see code for FrequencyTester.java).

In this step, your program should output the frequency table of the symbols. The code to print this table is already part of the code provided. See FrequencyTester.java on how to do this. For example, on the test file smalltest.txt provided, your output should look like:

1

Symbols and their frequencies

k:1

T:1

.:2

n:2

i:1

f:1

m:1

I:1

<NEWLINE>:3

?:1

u:2

y:2

a:3

w:1

,:1

r:2

t:1

<SPACE>:8

o:4

l:2

e:5

h:4

2- Building the Huffman encoding tree

In this step, you are to build a Huffman encoding tree, which is a binary tree (but not a BST) which will encode each of the characters in your input as a sequence of zeros and ones. Initially create a Huffman tree for each data item in your linked list from the previous step and put these into the heap (see slide 13-14). To build the Huffman encoding tree, you successive remove the two items (trees) in your heap, combine them and then insert the resulting tree into the heap. The data that you need to store for this new node is the concatenation of the symbols of the two joined nodes and the sum of the frequencies of the two joined nodes. Remember that the frequency of a node is used as the priority in the heap. Repeat this process until there is one tree left in the heap. This algorithm is also stated on slide 15 of the lecture notes.

The sole tree in the heap is the Huffman encoding tree you want. This tree is used to encode the input file. The encoding for a character is obtained by traversing the tree from the root to a leaf, where traversing a left child means encoding contains a 0 and traversing a right child means encoding a 1. See slides 38-40 of lecture notes for more details.

Using the Huffman encoding tree, output the encoding of each distinct character that is encoded by the Huffman tree. This can be done by traversing the tree and printing the encoding for a character when you encounter a leaf node. For example: An output will look something like: Huffman Encoding

n:000

s:001

\n:01 t:10

e:11

Then, encode the entire input file and output the encoding. This will be a sequence of 0’s and 1’s. For each line of the input text, you should print the encoding on a separate line. For example, if the input file contains the line seentee\n, you should output 001111100010111101 when encoding using the tree given by Figure 1. In addition you should store the entire encoding of the input file in a string variable. Note: The Huffman encoding tree is not necessarily unique.

2

{n} 1

{s} 1

{n,s} 2

{\n} 3

{t} 4

{e} 4

{n,s,\n} 5

{t,e} 8

{n,s,\n,t,e} 13

Figure 1: Sample Huffman Encoding Tree

3-Decoding the file

In this last step, you should decode the encoded string of 0’s and 1’s (from the previous step ) and output the decoded text. Of course, you will need to use the Huffman encoding tree (from the previous step) for the decoding process. For example if the encoding string is 001111100010111101, then, using the tree in Figure 1 as the encoding tree, your output should be seentee followed by a newline character. The exact algorithm is discussed in the lecture notes (slides 41-43).

Note: If you decode the encoding string and the output is identical to the contents of the input file, then your program probably is working.

Classes

At a minimum, you should create Java classes to represent a heap, and a Huffman tree.

What to Hand In

You will hand in a zipped (.zip) file containing the source file A5<your last name><your student id>.java (eg. A3Wong6251989) which contains the main() method. In addition, include any other .java files that is put of your solution, including the supplied file FrequencyList.java. You may elect to create a .java file for each class you create or have all the classes contained in a single file, or something in between.

3