a5 programm22

profilesehj
09z2HeapsAndHuffman.pdf

Priority Queues and Huffman Coding

Helen Cameron

Comp 2140

Helen Cameron Huffman Coding 1/45

Huffman Encoding: An Application

Huffman encoding is used to compress files.

Helen Cameron Huffman Coding 2/45

Huffman Encoding: An Application

Most web pages use UTF-8 for character encoding.

UTF-8 uses 8 bits to encode each character — the character encodings are all the same length.

However, a file probably doesn’t use all 28 = 256 possible characters.

Helen Cameron Huffman Coding 3/45

Huffman Encoding: An Application

How Huffman encoding compresses a file:

The characters in the file are encoded with fewer bits (and different characters may be encoded with a different number of bits), so that the file takes up less space.

Idea: Encode the characters used most frequently with the fewest bits.

Helen Cameron Huffman Coding 4/45

Ambiguous and Unambiguous Codes

Suppose we have the following codes:

Letter Code a 1

b 0

c 101

Suppose we see an encoded string “101”. Does that represent “aba” or “c”?

The code is ambiguous because the code for “a” is a prefix of the code for “c” — when we see “1” in an encoded string, we can’t know whether that represents “a” or the start of “c”.

Helen Cameron Huffman Coding 5/45

Ambiguous and Unambiguous Codes

We want a code that has the prefix property: no character code is the prefix of any other character code.

Example:

Letter Code a 00

b 1

c 01

What does “011100” represent?

A code with the prefix property is unambiguous: an encoded string has only one interpretation.

Helen Cameron Huffman Coding 6/45

Huffman Encoding

A binary tree can represent an encoding:

The characters to be encoded are stored in the leaves.

The code for a character is given by the path from the root to the leaf containing the character.

Moving to a left child represents “0”.

Moving to a right child represents “1”.

Helen Cameron Huffman Coding 7/45

Example Huffman Tree

{a} {c}

{a,c} {b}

{a,b,c}

Letter Code a 00

b 1

c 01

Helen Cameron Huffman Coding 8/45

Huffman Tree

Each node represents a set of characters and the sum of the frequencies with which the characters in the set appear in the text we want to compress.

A leaf contains a one-character set and the frequency of that character in the text.

An internal node always has two children and contains the union of the sets of characters of its two children.

Helen Cameron Huffman Coding 9/45

A More Detailed Example

{f} 1

{s} 1

{f,s} 2

{c} 3

{b} 4

{u} 4

{f,s,c} 5

{b,u} 8

{f,s,c,b,u} 13

Helen Cameron Huffman Coding 10/45

Huffman Encoding

The leaves in a Huffman tree can be stored at different levels.

The leaves at different levels have codes of different lengths.

Huffman’s algorithm generates a Huffman tree where the most frequent letters are stored closest to the root (have short codes) and the least frequently used letters are stored farthest from the root (have the longest codes).

Helen Cameron Huffman Coding 11/45

Letter Frequencies

Suppose we have a text with the following characters and frequencies:

Char Freq Char Freq Char Freq

a 18 l 12 t 8 d 5 n 12 . 2 e 9 r 6 t 27 8 11 s 5 $ 4

where t represents a blank and $ represents a new line character.

Helen Cameron Huffman Coding 12/45

Huffman’s Algorithm: First Step

For each character-frequency pair, create a one-node tree (i.e., a leaf containing the character and its frequency).

{a} 18

{d} 5

{e} 9

{i} 11

{l} 12

{n} 12

{r} 6

{s} 5

{t} 8

{.} 2

{t} 27

{$} 4

Helen Cameron Huffman Coding 13/45

Huffman’s Algorithm: Second Step

Now place the one-node trees in a priority queue, where the priority of a tree is the frequency recorded in its root node. In this case, the smaller the frequency the higher the priority — the lowest frequencies come out first. (Use a min heap, not a max heap.)

{.} 2

{$} 4

{d} 5

{s} 5

{r} 6

{t} 8

{e} 9

{i} 11

{l} 12

{n} 12

{a} 18

{t} 27

(I’m just using an ordered list as a priority queue to simplify the diagrams — but you should use a min heap.)

Helen Cameron Huffman Coding 14/45

Huffman’s Algorithm: The Loop

Perform the following steps:

DeleteMin twice to get the two trees T1 and T2 with the lowest frequencies in their roots.

Create a new root and make T1 and T2 the children of the new root.

The frequency to put in the new root is the sum of the frequencies in the roots of T1 and T2.

The set of characters in the new root is the union of the sets of characters in the roots of T1 and T2.

Insert the new tree into the priority queue.

Repeat until there’s only one tree in the priority queue. The remaining tree is the Huffman encoding of the characters.

Helen Cameron Huffman Coding 15/45

Huffman’s Algorithm: Looping

The first two deleteMins give us “.” and “$” as T1 and T2, so we build the following tree:

{.} 2

{$} 4

{.,$} 6

Now we insert the new tree into the priority queue.

Helen Cameron Huffman Coding 16/45

Huffman’s Algorithm: Looping

{d} 5

{s} 5

{r} 6

{.} 2

{$} 4

{.,$} 6

{t} 8

{e} 9

{i} 11

{l} 12

{n} 12

{a} 18

{t} 27

There’s still more than one tree in the priority queue, so we loop again.

Helen Cameron Huffman Coding 17/45

Huffman’s Algorithm: Looping

The second two deleteMins give us “d” and “s” as T1 and T2, so we build the following tree:

{d} 5

{s} 5

{d,s} 10

Now we insert the new tree into the priority queue.

Helen Cameron Huffman Coding 18/45

Huffman’s Algorithm: Looping

{r} 6

{.} 2

{$} 4

{.,$} 6

{t} 8

{e} 9

{d} 5

{s} 5

{d,s} 10

{i} 11

{l} 12

{n} 12

{a} 18

{t} 27

Again, there’s more than one tree in the priority queue, so we loop again.

Helen Cameron Huffman Coding 19/45

Huffman’s Algorithm: Looping

The third two deleteMins give us “r” and “{.,$}” as T1 and T2, so we build the following tree:

{r} 6

{.} 2

{$} 4

{.,$} 6

{r,.,$} 12

Now we insert the new tree into the priority queue.

Helen Cameron Huffman Coding 20/45

Huffman’s Algorithm: Looping

{t} 8

{e} 9

{d} 5

{s} 5

{d,s} 10

{i} 11

{l} 12

{n} 12

{r} 6

{.} 2

{$} 4

{.,$} 6

{r,.,$} 12

{a} 18

{t} 27

Again, there’s more than one tree in the priority queue, so we loop again.

Helen Cameron Huffman Coding 21/45

Huffman’s Algorithm: Looping

The fourth two deleteMins give us “t” and “e” as T1 and T2, so we build the following tree:

{t} 8

{e} 9

{t,e} 17

Now we insert the new tree into the priority queue.

Helen Cameron Huffman Coding 22/45

Huffman’s Algorithm: Looping

{d} 5

{s} 5

{d,s} 10

{i} 11

{l} 12

{n} 12

{r} 6

{.} 2

{$} 4

{.,$} 6

{r,.,$} 12

{t} 8

{e} 9

{t,e} 17

{a} 18

{t} 27

Again, there’s more than one tree in the priority queue, so we loop again.

Helen Cameron Huffman Coding 23/45

Huffman’s Algorithm: Looping

The fifth two deleteMins give us “{d,s}” and “{i}” as T1 and T2, so we build the following tree:

{d} 5

{s} 5

{d,s} 10

{i} 11

{d,s,i} 21

Now we insert the new tree into the priority queue.

Helen Cameron Huffman Coding 24/45

Huffman’s Algorithm: Looping

{l} 12

{n} 12

{r} 6

{.} 2

{$} 4

{.,$} 6

{r,.,$} 12

{t} 8

{e} 9

{t,e} 17

{a} 18

{d} 5

{s} 5

{d,s} 10

{i} 11

{d,s,i} 21

{t} 27

Again, there’s more than one tree in the priority queue, so we loop again.

Helen Cameron Huffman Coding 25/45

Huffman’s Algorithm: Looping

The sixth two deleteMins give us “l” and “n” as T1 and T2, so we build the following tree:

{l} 12

{n} 12

{l,n} 24

Now we insert the new tree into the priority queue.

Helen Cameron Huffman Coding 26/45

Huffman’s Algorithm: Looping

{r} 6

{.} 2

{$} 4

{.,$} 6

{r,.,$} 12

{t} 8

{e} 9

{t,e} 17

{a} 18

{d} 5

{s} 5

{d,s} 10

{i} 11

{d,s,i} 21

{l} 12

{n} 12

{l,n} 24

{t} 27

Again, there’s more than one tree in the priority queue, so we loop again.

Helen Cameron Huffman Coding 27/45

Huffman’s Algorithm: Looping

The seventh two deleteMins give us “{r,.,$}” and “{t,e}” as T1 and T2, so we build the following tree:

{r} 6

{.} 2

{$} 4

{.,$} 6

{r,.,$} 12

{t} 8

{e} 9

{t,e} 17

{r,.,$,t,e} 29

Now we insert the new tree into the priority queue.

Helen Cameron Huffman Coding 28/45

Huffman’s Algorithm: Looping

{a} 18

{d} 5

{s} 5

{d,s} 10

{i} 11

{d,s,i} 21

{l} 12

{n} 12

{l,n} 24

{t} 27

{r} 6

{.} 2

{$} 4

{.,$} 6

{r,.,$} 12

{t} 8

{e} 9

{t,e} 17

{r,.,$,t,e} 29

Again, there’s more than one tree in the priority queue, so we loop again.

Helen Cameron Huffman Coding 29/45

Huffman’s Algorithm: Looping

The eighth two deleteMins give us “{a}” and “{d,s,i}” as T1 and T2, so we build the following tree:

{a} 18

{d} 5

{s} 5

{d,s} 10

{i} 11

{d,s,i} 21

{a,d,s,i} 39

Now we insert the new tree into the priority queue.

Helen Cameron Huffman Coding 30/45

Huffman’s Algorithm: Looping

{l} 12

{n} 12

{l,n} 24

{t} 27

{r} 6

{.} 2

{$} 4

{.,$} 6

{r,.,$} 12

{t} 8

{e} 9

{t,e} 17

{r,.,$,t,e} 29

{a} 18

{d} 5

{s} 5

{d,s} 10

{i} 11

{d,s,i} 21

{a,d,s,i} 39

Again, there’s more than one tree in the priority queue, so we loop again.

Helen Cameron Huffman Coding 31/45

Huffman’s Algorithm: Looping

The ninth two deleteMins give us “{l,n}” and “{t}” as T1 and T2, so we build the following tree:

{l} 12

{n} 12

{l,n} 24

{t} 27

{l,n,t} 51

Now we insert the new tree into the priority queue.

Helen Cameron Huffman Coding 32/45

Huffman’s Algorithm: Looping

{r} 6

{.} 2

{$} 4

{.,$} 6

{r,.,$} 12

{t} 8

{e} 9

{t,e} 17

{r,.,$,t,e} 29

{a} 18

{d} 5

{s} 5

{d,s} 10

{i} 11

{d,s,i} 21

{a,d,s,i} 39

{l} 12

{n} 12

{l,n} 24

{t} 27

{l,n,t} 51

Again, there’s more than one tree in the priority queue, so we loop again.

Helen Cameron Huffman Coding 33/45

Huffman’s Algorithm: Looping

The tenth two deleteMins give us “{r,.,$,t,e}” and “{a,d,s,i}” as T1 and T2, so we build the following tree:

{r} 6

{.} 2

{$} 4

{.,$} 6

{r,.,$} 12

{t} 8

{e} 9

{t,e} 17

{r,.,$,t,e} 29

{a} 18

{d} 5

{s} 5

{d,s} 10

{i} 11

{d,s,i} 21

{a,d,s,i} 39

{r,.,$,t,e,a,d,s,i} 68

Helen Cameron Huffman Coding 34/45

Huffman’s Algorithm: Looping

{l} 12

{n} 12

{l,n} 24

{t} 27

{l,n,t} 51

{r} 6

{.} 2

{$} 4

{.,$} 6

{r,.,$} 12

{t} 8

{e} 9

{t,e} 17

{r,.,$,t,e} 29

{a} 18

{d} 5

{s} 5

{d,s} 10

{i} 11

{d,s,i} 21

{a,d,s,i} 39

{r,.,$,t,e,a,d,s,i} 68

Again, there’s more than one tree in the priority queue, so we loop again.

Helen Cameron Huffman Coding 35/45

Huffman’s Algorithm: Looping

The final two deleteMins give us “{l,n,t}” and “{r,.,$,t,e,a,d,s,i}” as T1 and T2, so we build the following tree:

{l} 12

{n} 12

{l,n} 24

{t} 27

{l,n,t} 51

{r} 6

{.} 2

{$} 4

{.,$} 6

{r,.,$} 12

{t} 8

{e} 9

{t,e} 17

{r,.,$,t,e} 29

{a} 18

{d} 5

{s} 5

{d,s} 10

{i} 11

{d,s,i} 21

{a,d,s,i} 39

{r,.,$,t,e,a,d,s,i} 68

{l,n,t,r,.,$,t,e,a,d,s,i} 119

Helen Cameron Huffman Coding 36/45

Huffman’s Algorithm: Looping

Now there’s only one tree in the priority queue, so that tree is the Huffman code for the characters and their frequencies.

Now traverse the tree and build a table of letters and their codes, which is then used to encode the text.

Helen Cameron Huffman Coding 37/45

Encoding Text Using the Huffman Tree

To find the code for a character x:

Start at the root with an empty code for x, and move down to the leaves.

At each interior node, move to the child whose character set contains x.

If you move to a left child, add a 0-bit onto the end of the code for x.

If you move to a right child, add a 1-bit onto the end of the code for x.

Go through the text from left to right, doing this coding for each character in the text.

Helen Cameron Huffman Coding 38/45

Example: Encoding with a Huffman Tree

Encode “seen tee” with the following tree:

{n} 1

{s} 1

{n,s} 2

{t} 3

{t} 4

{e} 4

{n,s,t} 5

{t,e} 8

{n,s,t,t,e} 13

Helen Cameron Huffman Coding 39/45

Example: Encoding with a Huffman Tree

Encode “seen tee” with the following tree:

{n} 1

{s} 1

{n,s} 2

{t} 3

{t} 4

{e} 4

{n,s,t} 5

{t,e} 8

{n,s,t,t,e} 13

Answer: 001111100001101111

001︸︷︷︸ s

11︸︷︷︸ e

11︸︷︷︸ e

000︸︷︷︸ n

01︸︷︷︸ t

10︸︷︷︸ t

11︸︷︷︸ e

11︸︷︷︸ e

Helen Cameron Huffman Coding 40/45

Decoding an Encoded Text Using the Huffman Tree

The encoded text is simply a long bit-string. Go through the bit-string from left to right, performing the following steps:

Start at the root.

For each 0-bit, move to the left child of the current node.

For each 1-bit, move to the right child of the current node.

When you reach a leaf, output the character stored in the leaf (it’s the next character of the decoded text). Then return to the root, and perform the same process for the next bits of the encoded text.

Helen Cameron Huffman Coding 41/45

Example: Decoding Using the Huffman Tree

Decode “1011000010001110001” with the following tree:

{n} 1

{s} 1

{n,s} 2

{t} 3

{t} 4

{e} 4

{n,s,t} 5

{t,e} 8

{n,s,t,t,e} 13

Helen Cameron Huffman Coding 42/45

Example: Decoding Using the Huffman Tree

Decode “1011000010001110001” with the following tree:

{n} 1

{s} 1

{n,s} 2

{t} 3

{t} 4

{e} 4

{n,s,t} 5

{t,e} 8

{n,s,t,t,e} 13

Answer: “ten nets”.

Helen Cameron Huffman Coding 43/45

Huffman Practice

Give the Huffman tree and then encode the following text:

“a stitch in time saves nine”

Helen Cameron Huffman Coding 44/45

Huffman Practice: Frequencies

The frequencies of characters in our text

“a stitch in time saves nine”

Char Freq Char Freq Char Freq

a 2 i 4 t 3 c 1 m 1 v 1 e 3 n 3 t 5 h 1 s 3

Helen Cameron Huffman Coding 45/45