a5 programm22
Priority Queues and Huffman Coding
Helen Cameron
Comp 2140
Helen Cameron Huffman Coding 1/45
Huffman Encoding: An Application
Huffman encoding is used to compress files.
Helen Cameron Huffman Coding 2/45
Huffman Encoding: An Application
Most web pages use UTF-8 for character encoding.
UTF-8 uses 8 bits to encode each character — the character encodings are all the same length.
However, a file probably doesn’t use all 28 = 256 possible characters.
Helen Cameron Huffman Coding 3/45
Huffman Encoding: An Application
How Huffman encoding compresses a file:
The characters in the file are encoded with fewer bits (and different characters may be encoded with a different number of bits), so that the file takes up less space.
Idea: Encode the characters used most frequently with the fewest bits.
Helen Cameron Huffman Coding 4/45
Ambiguous and Unambiguous Codes
Suppose we have the following codes:
Letter Code a 1
b 0
c 101
Suppose we see an encoded string “101”. Does that represent “aba” or “c”?
The code is ambiguous because the code for “a” is a prefix of the code for “c” — when we see “1” in an encoded string, we can’t know whether that represents “a” or the start of “c”.
Helen Cameron Huffman Coding 5/45
Ambiguous and Unambiguous Codes
We want a code that has the prefix property: no character code is the prefix of any other character code.
Example:
Letter Code a 00
b 1
c 01
What does “011100” represent?
A code with the prefix property is unambiguous: an encoded string has only one interpretation.
Helen Cameron Huffman Coding 6/45
Huffman Encoding
A binary tree can represent an encoding:
The characters to be encoded are stored in the leaves.
The code for a character is given by the path from the root to the leaf containing the character.
Moving to a left child represents “0”.
Moving to a right child represents “1”.
Helen Cameron Huffman Coding 7/45
Example Huffman Tree
{a} {c}
{a,c} {b}
{a,b,c}
Letter Code a 00
b 1
c 01
Helen Cameron Huffman Coding 8/45
Huffman Tree
Each node represents a set of characters and the sum of the frequencies with which the characters in the set appear in the text we want to compress.
A leaf contains a one-character set and the frequency of that character in the text.
An internal node always has two children and contains the union of the sets of characters of its two children.
Helen Cameron Huffman Coding 9/45
A More Detailed Example
{f} 1
{s} 1
{f,s} 2
{c} 3
{b} 4
{u} 4
{f,s,c} 5
{b,u} 8
{f,s,c,b,u} 13
Helen Cameron Huffman Coding 10/45
Huffman Encoding
The leaves in a Huffman tree can be stored at different levels.
The leaves at different levels have codes of different lengths.
Huffman’s algorithm generates a Huffman tree where the most frequent letters are stored closest to the root (have short codes) and the least frequently used letters are stored farthest from the root (have the longest codes).
Helen Cameron Huffman Coding 11/45
Letter Frequencies
Suppose we have a text with the following characters and frequencies:
Char Freq Char Freq Char Freq
a 18 l 12 t 8 d 5 n 12 . 2 e 9 r 6 t 27 8 11 s 5 $ 4
where t represents a blank and $ represents a new line character.
Helen Cameron Huffman Coding 12/45
Huffman’s Algorithm: First Step
For each character-frequency pair, create a one-node tree (i.e., a leaf containing the character and its frequency).
{a} 18
{d} 5
{e} 9
{i} 11
{l} 12
{n} 12
{r} 6
{s} 5
{t} 8
{.} 2
{t} 27
{$} 4
Helen Cameron Huffman Coding 13/45
Huffman’s Algorithm: Second Step
Now place the one-node trees in a priority queue, where the priority of a tree is the frequency recorded in its root node. In this case, the smaller the frequency the higher the priority — the lowest frequencies come out first. (Use a min heap, not a max heap.)
{.} 2
{$} 4
{d} 5
{s} 5
{r} 6
{t} 8
{e} 9
{i} 11
{l} 12
{n} 12
{a} 18
{t} 27
(I’m just using an ordered list as a priority queue to simplify the diagrams — but you should use a min heap.)
Helen Cameron Huffman Coding 14/45
Huffman’s Algorithm: The Loop
Perform the following steps:
DeleteMin twice to get the two trees T1 and T2 with the lowest frequencies in their roots.
Create a new root and make T1 and T2 the children of the new root.
The frequency to put in the new root is the sum of the frequencies in the roots of T1 and T2.
The set of characters in the new root is the union of the sets of characters in the roots of T1 and T2.
Insert the new tree into the priority queue.
Repeat until there’s only one tree in the priority queue. The remaining tree is the Huffman encoding of the characters.
Helen Cameron Huffman Coding 15/45
Huffman’s Algorithm: Looping
The first two deleteMins give us “.” and “$” as T1 and T2, so we build the following tree:
{.} 2
{$} 4
{.,$} 6
Now we insert the new tree into the priority queue.
Helen Cameron Huffman Coding 16/45
Huffman’s Algorithm: Looping
{d} 5
{s} 5
{r} 6
{.} 2
{$} 4
{.,$} 6
{t} 8
{e} 9
{i} 11
{l} 12
{n} 12
{a} 18
{t} 27
There’s still more than one tree in the priority queue, so we loop again.
Helen Cameron Huffman Coding 17/45
Huffman’s Algorithm: Looping
The second two deleteMins give us “d” and “s” as T1 and T2, so we build the following tree:
{d} 5
{s} 5
{d,s} 10
Now we insert the new tree into the priority queue.
Helen Cameron Huffman Coding 18/45
Huffman’s Algorithm: Looping
{r} 6
{.} 2
{$} 4
{.,$} 6
{t} 8
{e} 9
{d} 5
{s} 5
{d,s} 10
{i} 11
{l} 12
{n} 12
{a} 18
{t} 27
Again, there’s more than one tree in the priority queue, so we loop again.
Helen Cameron Huffman Coding 19/45
Huffman’s Algorithm: Looping
The third two deleteMins give us “r” and “{.,$}” as T1 and T2, so we build the following tree:
{r} 6
{.} 2
{$} 4
{.,$} 6
{r,.,$} 12
Now we insert the new tree into the priority queue.
Helen Cameron Huffman Coding 20/45
Huffman’s Algorithm: Looping
{t} 8
{e} 9
{d} 5
{s} 5
{d,s} 10
{i} 11
{l} 12
{n} 12
{r} 6
{.} 2
{$} 4
{.,$} 6
{r,.,$} 12
{a} 18
{t} 27
Again, there’s more than one tree in the priority queue, so we loop again.
Helen Cameron Huffman Coding 21/45
Huffman’s Algorithm: Looping
The fourth two deleteMins give us “t” and “e” as T1 and T2, so we build the following tree:
{t} 8
{e} 9
{t,e} 17
Now we insert the new tree into the priority queue.
Helen Cameron Huffman Coding 22/45
Huffman’s Algorithm: Looping
{d} 5
{s} 5
{d,s} 10
{i} 11
{l} 12
{n} 12
{r} 6
{.} 2
{$} 4
{.,$} 6
{r,.,$} 12
{t} 8
{e} 9
{t,e} 17
{a} 18
{t} 27
Again, there’s more than one tree in the priority queue, so we loop again.
Helen Cameron Huffman Coding 23/45
Huffman’s Algorithm: Looping
The fifth two deleteMins give us “{d,s}” and “{i}” as T1 and T2, so we build the following tree:
{d} 5
{s} 5
{d,s} 10
{i} 11
{d,s,i} 21
Now we insert the new tree into the priority queue.
Helen Cameron Huffman Coding 24/45
Huffman’s Algorithm: Looping
{l} 12
{n} 12
{r} 6
{.} 2
{$} 4
{.,$} 6
{r,.,$} 12
{t} 8
{e} 9
{t,e} 17
{a} 18
{d} 5
{s} 5
{d,s} 10
{i} 11
{d,s,i} 21
{t} 27
Again, there’s more than one tree in the priority queue, so we loop again.
Helen Cameron Huffman Coding 25/45
Huffman’s Algorithm: Looping
The sixth two deleteMins give us “l” and “n” as T1 and T2, so we build the following tree:
{l} 12
{n} 12
{l,n} 24
Now we insert the new tree into the priority queue.
Helen Cameron Huffman Coding 26/45
Huffman’s Algorithm: Looping
{r} 6
{.} 2
{$} 4
{.,$} 6
{r,.,$} 12
{t} 8
{e} 9
{t,e} 17
{a} 18
{d} 5
{s} 5
{d,s} 10
{i} 11
{d,s,i} 21
{l} 12
{n} 12
{l,n} 24
{t} 27
Again, there’s more than one tree in the priority queue, so we loop again.
Helen Cameron Huffman Coding 27/45
Huffman’s Algorithm: Looping
The seventh two deleteMins give us “{r,.,$}” and “{t,e}” as T1 and T2, so we build the following tree:
{r} 6
{.} 2
{$} 4
{.,$} 6
{r,.,$} 12
{t} 8
{e} 9
{t,e} 17
{r,.,$,t,e} 29
Now we insert the new tree into the priority queue.
Helen Cameron Huffman Coding 28/45
Huffman’s Algorithm: Looping
{a} 18
{d} 5
{s} 5
{d,s} 10
{i} 11
{d,s,i} 21
{l} 12
{n} 12
{l,n} 24
{t} 27
{r} 6
{.} 2
{$} 4
{.,$} 6
{r,.,$} 12
{t} 8
{e} 9
{t,e} 17
{r,.,$,t,e} 29
Again, there’s more than one tree in the priority queue, so we loop again.
Helen Cameron Huffman Coding 29/45
Huffman’s Algorithm: Looping
The eighth two deleteMins give us “{a}” and “{d,s,i}” as T1 and T2, so we build the following tree:
{a} 18
{d} 5
{s} 5
{d,s} 10
{i} 11
{d,s,i} 21
{a,d,s,i} 39
Now we insert the new tree into the priority queue.
Helen Cameron Huffman Coding 30/45
Huffman’s Algorithm: Looping
{l} 12
{n} 12
{l,n} 24
{t} 27
{r} 6
{.} 2
{$} 4
{.,$} 6
{r,.,$} 12
{t} 8
{e} 9
{t,e} 17
{r,.,$,t,e} 29
{a} 18
{d} 5
{s} 5
{d,s} 10
{i} 11
{d,s,i} 21
{a,d,s,i} 39
Again, there’s more than one tree in the priority queue, so we loop again.
Helen Cameron Huffman Coding 31/45
Huffman’s Algorithm: Looping
The ninth two deleteMins give us “{l,n}” and “{t}” as T1 and T2, so we build the following tree:
{l} 12
{n} 12
{l,n} 24
{t} 27
{l,n,t} 51
Now we insert the new tree into the priority queue.
Helen Cameron Huffman Coding 32/45
Huffman’s Algorithm: Looping
{r} 6
{.} 2
{$} 4
{.,$} 6
{r,.,$} 12
{t} 8
{e} 9
{t,e} 17
{r,.,$,t,e} 29
{a} 18
{d} 5
{s} 5
{d,s} 10
{i} 11
{d,s,i} 21
{a,d,s,i} 39
{l} 12
{n} 12
{l,n} 24
{t} 27
{l,n,t} 51
Again, there’s more than one tree in the priority queue, so we loop again.
Helen Cameron Huffman Coding 33/45
Huffman’s Algorithm: Looping
The tenth two deleteMins give us “{r,.,$,t,e}” and “{a,d,s,i}” as T1 and T2, so we build the following tree:
{r} 6
{.} 2
{$} 4
{.,$} 6
{r,.,$} 12
{t} 8
{e} 9
{t,e} 17
{r,.,$,t,e} 29
{a} 18
{d} 5
{s} 5
{d,s} 10
{i} 11
{d,s,i} 21
{a,d,s,i} 39
{r,.,$,t,e,a,d,s,i} 68
Helen Cameron Huffman Coding 34/45
Huffman’s Algorithm: Looping
{l} 12
{n} 12
{l,n} 24
{t} 27
{l,n,t} 51
{r} 6
{.} 2
{$} 4
{.,$} 6
{r,.,$} 12
{t} 8
{e} 9
{t,e} 17
{r,.,$,t,e} 29
{a} 18
{d} 5
{s} 5
{d,s} 10
{i} 11
{d,s,i} 21
{a,d,s,i} 39
{r,.,$,t,e,a,d,s,i} 68
Again, there’s more than one tree in the priority queue, so we loop again.
Helen Cameron Huffman Coding 35/45
Huffman’s Algorithm: Looping
The final two deleteMins give us “{l,n,t}” and “{r,.,$,t,e,a,d,s,i}” as T1 and T2, so we build the following tree:
{l} 12
{n} 12
{l,n} 24
{t} 27
{l,n,t} 51
{r} 6
{.} 2
{$} 4
{.,$} 6
{r,.,$} 12
{t} 8
{e} 9
{t,e} 17
{r,.,$,t,e} 29
{a} 18
{d} 5
{s} 5
{d,s} 10
{i} 11
{d,s,i} 21
{a,d,s,i} 39
{r,.,$,t,e,a,d,s,i} 68
{l,n,t,r,.,$,t,e,a,d,s,i} 119
Helen Cameron Huffman Coding 36/45
Huffman’s Algorithm: Looping
Now there’s only one tree in the priority queue, so that tree is the Huffman code for the characters and their frequencies.
Now traverse the tree and build a table of letters and their codes, which is then used to encode the text.
Helen Cameron Huffman Coding 37/45
Encoding Text Using the Huffman Tree
To find the code for a character x:
Start at the root with an empty code for x, and move down to the leaves.
At each interior node, move to the child whose character set contains x.
If you move to a left child, add a 0-bit onto the end of the code for x.
If you move to a right child, add a 1-bit onto the end of the code for x.
Go through the text from left to right, doing this coding for each character in the text.
Helen Cameron Huffman Coding 38/45
Example: Encoding with a Huffman Tree
Encode “seen tee” with the following tree:
{n} 1
{s} 1
{n,s} 2
{t} 3
{t} 4
{e} 4
{n,s,t} 5
{t,e} 8
{n,s,t,t,e} 13
Helen Cameron Huffman Coding 39/45
Example: Encoding with a Huffman Tree
Encode “seen tee” with the following tree:
{n} 1
{s} 1
{n,s} 2
{t} 3
{t} 4
{e} 4
{n,s,t} 5
{t,e} 8
{n,s,t,t,e} 13
Answer: 001111100001101111
001︸︷︷︸ s
11︸︷︷︸ e
11︸︷︷︸ e
000︸︷︷︸ n
01︸︷︷︸ t
10︸︷︷︸ t
11︸︷︷︸ e
11︸︷︷︸ e
Helen Cameron Huffman Coding 40/45
Decoding an Encoded Text Using the Huffman Tree
The encoded text is simply a long bit-string. Go through the bit-string from left to right, performing the following steps:
Start at the root.
For each 0-bit, move to the left child of the current node.
For each 1-bit, move to the right child of the current node.
When you reach a leaf, output the character stored in the leaf (it’s the next character of the decoded text). Then return to the root, and perform the same process for the next bits of the encoded text.
Helen Cameron Huffman Coding 41/45
Example: Decoding Using the Huffman Tree
Decode “1011000010001110001” with the following tree:
{n} 1
{s} 1
{n,s} 2
{t} 3
{t} 4
{e} 4
{n,s,t} 5
{t,e} 8
{n,s,t,t,e} 13
Helen Cameron Huffman Coding 42/45
Example: Decoding Using the Huffman Tree
Decode “1011000010001110001” with the following tree:
{n} 1
{s} 1
{n,s} 2
{t} 3
{t} 4
{e} 4
{n,s,t} 5
{t,e} 8
{n,s,t,t,e} 13
Answer: “ten nets”.
Helen Cameron Huffman Coding 43/45
Huffman Practice
Give the Huffman tree and then encode the following text:
“a stitch in time saves nine”
Helen Cameron Huffman Coding 44/45
Huffman Practice: Frequencies
The frequencies of characters in our text
“a stitch in time saves nine”
Char Freq Char Freq Char Freq
a 2 i 4 t 3 c 1 m 1 v 1 e 3 n 3 t 5 h 1 s 3
Helen Cameron Huffman Coding 45/45