CSEP 590 Data Compression, Autumn 2007
Lecture 2: Adaptive Huffman Coding
Adaptive Huffman Coding
• One pass over the data.
• Symbol frequencies are accumulated during the pass.
• The Huffman tree is updated accordingly:
  – Coder: a new Huffman tree is computed after transmitting each symbol.
  – Decoder: a new Huffman tree is computed after receiving each symbol.
• The symbol set and its initial (fixed) codes must be known ahead of time.
• An NYT (not-yet-transmitted) symbol is needed to indicate that a new leaf must be added to the tree.
Optimal Tree Numbering
• a : 5, b : 2, c : 1, d : 3
[Figure: the Huffman tree for these weights; c and b are siblings at the deepest level, their parent is the sibling of d, and that subtree is the sibling of a.]
Weight the Nodes
• a : 5, b : 2, c : 1, d : 3
[Figure: the same tree with each node labeled by its weight: leaves a = 5, b = 2, c = 1, d = 3; internal nodes of weight 3 (parent of c and b) and 6 (parent of d and the weight-3 node); root of weight 11.]
Number the Nodes
• a : 5, b : 2, c : 1, d : 3
Number the nodes as they are removed from the priority queue.
[Figure: the same tree with node numbers assigned in removal order: c = 1, b = 2, d = 3, the weight-3 internal node = 4, a = 5, the weight-6 internal node = 6, root = 7.]
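The following short Python sketch (mine, not from the lecture) builds the Huffman tree for these weights with a priority queue and numbers each node in the order it is removed, reproducing the numbering above: c = 1, b = 2, d = 3, the weight-3 node = 4, a = 5, the weight-6 node = 6, root = 7. Because nodes leave the queue in pairs, siblings automatically receive consecutive numbers.

```python
import heapq
from itertools import count

# Example weights from the slide.
weights = {"a": 5, "b": 2, "c": 1, "d": 3}

order = count()          # tie-breaker so heap entries stay comparable
heap = [(w, next(order), sym) for sym, w in weights.items()]
heapq.heapify(heap)

number = 0               # node numbers, assigned in removal order
numbering = []           # (node number, weight, node)

while len(heap) > 1:
    w1, _, n1 = heapq.heappop(heap)
    number += 1
    numbering.append((number, w1, n1))
    w2, _, n2 = heapq.heappop(heap)
    number += 1
    numbering.append((number, w2, n2))
    # Merge the two lightest nodes into a new internal node.
    heapq.heappush(heap, (w1 + w2, next(order), (n1, n2)))

w, _, root = heap[0]
numbering.append((number + 1, w, root))   # the root gets the largest number

for num, w, node in numbering:
    print(num, w, node)
```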
Adaptive Huffman Principle
• In an optimal tree for n symbols there is a numbering of the nodes y_1 < y_2 < ... < y_(2n-1) such that their corresponding weights x_1, x_2, ..., x_(2n-1) satisfy:
  – x_1 ≤ x_2 ≤ ... ≤ x_(2n-1) (the weights are nondecreasing in node-number order)
  – siblings are numbered consecutively
• And vice versa: if such a numbering exists, the tree is optimal. We call this the node number invariant.
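A tiny sketch (again mine) of what the invariant says in code: reading the example tree's weights in node-number order must give a nondecreasing sequence, and each internal node's two children must carry consecutive numbers.

```python
# Node number -> weight, and the sibling pairs, read off the example tree.
weight = {1: 1, 2: 2, 3: 3, 4: 3, 5: 5, 6: 6, 7: 11}
sibling_pairs = [(1, 2), (3, 4), (5, 6)]   # children of nodes 4, 6 and 7

in_number_order = [weight[n] for n in sorted(weight)]
assert in_number_order == sorted(in_number_order), "weights must be nondecreasing"
assert all(hi == lo + 1 for lo, hi in sibling_pairs), "siblings must be consecutive"
print("node number invariant holds")
```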
Initialization
• Symbols a_1, a_2, ..., a_m have a basic prefix code, used when a symbol is first encountered.
• Example: a, b, c, d, e, f, g, h, i, j
[Figure: a fixed prefix-code tree over the ten example symbols a through j.]
Initialization
• The tree will encode up to m + 1 symbols, including NYT.
• We reserve numbers 1 to 2m + 1 for node numbering.
• The initial Huffman tree consists of a single NYT node with weight 0 and node number 2m + 1.
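A minimal sketch of this bookkeeping for the ten-symbol example alphabet; the dictionary layout is just an illustration, not the lecture's data structure.

```python
m = 10                           # alphabet size in the running example (a .. j)
reserved = range(1, 2 * m + 2)   # node numbers 1 .. 2m + 1 are reserved

# The initial tree: a single NYT node carrying weight 0 and the top number.
nyt = {"number": 2 * m + 1, "weight": 0, "symbol": None}

# Each newly seen symbol will later consume the next two numbers counting
# down: its leaf and the replacement NYT node (20 and 19, then 18 and 17, ...).
print(len(reserved), nyt)        # 21 {'number': 21, 'weight': 0, 'symbol': None}
```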
Coding Algorithm
1. If a new symbol is encountered, output the code for NYT followed by the fixed code for the symbol. Add the new symbol to the tree.
2. If an old symbol is encountered, output its code.
3. Update the tree to preserve the node number invariant.
Decoding Algorithm
1. Decode the symbol using the current tree.
2. If NYT is encountered, use the fixed code to decode the symbol. Add the new symbol to the tree.
3. Update the tree to preserve the node number invariant.
Updating the Tree
1. Let y be the leaf (symbol node) with current weight x.*
2. If y is the root, increment x by 1; otherwise:
3. Exchange y with the largest-numbered node of the same weight (unless that node is y's parent).**
4. Increment x by 1.
5. Let y be its parent, with weight x, and go to step 2.
* We never update the weight of NYT.
** This exchange preserves the node number invariant.
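Putting the coding, decoding, and update slides together, here is a compact Python sketch of the whole scheme. It is my own illustration: the class and method names are invented, the update step follows the five numbered steps above, and the caller supplies a toy fixed-length fixed code (the lecture's example uses a different, variable-length fixed code, so the exact bit patterns differ).

```python
class Node:
    """One tree node; symbol is None for internal nodes and for NYT."""
    def __init__(self, number, weight=0, symbol=None, parent=None):
        self.number, self.weight, self.symbol = number, weight, symbol
        self.parent, self.left, self.right = parent, None, None


class AdaptiveHuffman:
    def __init__(self, fixed_code):
        self.fixed_code = fixed_code                   # symbol -> fixed bit string
        self.fixed_len = len(next(iter(fixed_code.values())))   # fixed-length code assumed
        m = len(fixed_code)
        self.root = self.nyt = Node(2 * m + 1)         # single NYT node, number 2m + 1
        self.leaf = {}                                 # symbol -> its leaf node
        self.nodes = [self.root]                       # all nodes, for weight-class search

    def code_of(self, node):
        """Bit string on the path from the root to node (0 = left, 1 = right)."""
        bits = []
        while node.parent is not None:
            bits.append("0" if node is node.parent.left else "1")
            node = node.parent
        return "".join(reversed(bits))

    def add_symbol(self, sym):
        """Replace NYT by an internal node whose children are a new NYT and a leaf for sym."""
        old = self.nyt
        old.left = Node(old.number - 2, parent=old)                # new NYT
        old.right = Node(old.number - 1, symbol=sym, parent=old)   # new leaf
        self.nyt, self.leaf[sym] = old.left, old.right
        self.nodes += [old.left, old.right]
        return old.right

    def _swap(self, a, b):
        """Exchange the positions (and numbers) of two subtrees."""
        a.number, b.number = b.number, a.number
        pa, pb = a.parent, b.parent
        a_side = "left" if pa.left is a else "right"
        b_side = "left" if pb.left is b else "right"
        setattr(pa, a_side, b)
        setattr(pb, b_side, a)
        a.parent, b.parent = pb, pa

    def update(self, y):
        """Updating the Tree, steps 1-5: swap with the largest-numbered node of
        equal weight (unless it is the parent), increment, move to the parent."""
        while y is not None:
            if y.parent is not None:
                cand = max((n for n in self.nodes
                            if n.weight == y.weight and n is not y.parent),
                           key=lambda n: n.number)
                if cand is not y:
                    self._swap(y, cand)
            y.weight += 1
            y = y.parent

    def encode(self, text):
        out = []
        for sym in text:
            if sym in self.leaf:                     # old symbol: output its current code
                leaf = self.leaf[sym]
                out.append(self.code_of(leaf))
            else:                                    # new symbol: NYT code + fixed code
                out.append(self.code_of(self.nyt))
                out.append(self.fixed_code[sym])
                leaf = self.add_symbol(sym)
            self.update(leaf)                        # restore the node number invariant
        return "".join(out)

    def decode(self, bits):
        rev_fixed = {v: k for k, v in self.fixed_code.items()}
        out, i = [], 0
        while i < len(bits):
            node = self.root
            while node.left is not None:             # walk down to a leaf
                node = node.left if bits[i] == "0" else node.right
                i += 1
            if node is self.nyt:                      # NYT: the fixed code follows
                sym = rev_fixed[bits[i:i + self.fixed_len]]
                i += self.fixed_len
                leaf = self.add_symbol(sym)
            else:
                sym, leaf = node.symbol, node
            out.append(sym)
            self.update(leaf)
        return "".join(out)


# Round-trip check on the lecture's example string, with a toy 4-bit fixed code.
fixed = {s: format(i, "04b") for i, s in enumerate("abcdefghij")}
bits = AdaptiveHuffman(fixed).encode("aabcdad")
assert AdaptiveHuffman(fixed).decode(bits) == "aabcdad"
print(bits)
```

The linear scan for the largest-numbered node of equal weight keeps the sketch short; real implementations maintain the weight classes explicitly so the update runs quickly.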
Example
• Encode aabcdad over the alphabet {a, b, ..., j}. With m = 10 symbols, the initial tree is the single NYT node of weight 0 and node number 21.
• a is new, so the coder outputs the code for NYT (empty, since NYT is the root) followed by the fixed code for a.
  output = 000
Example (continued)
• After adding a, the root (node 21, weight 1) has children NYT (node 19, weight 0) and a (node 20, weight 1).
  output = 000
• The second a is already in the tree, so its current code, 1, is output.
  output = 0001
• Update: a's weight becomes 2, then the root's weight becomes 2.
• b is new: output the code for NYT (0) followed by the fixed code for b (001).
  output = 00010001
• The NYT leaf becomes an internal node (keeping number 19 and weight 0) with children a new NYT (node 17, weight 0) and a leaf for b (node 18, weight 0).
• Update: b's weight becomes 1.
Example (continued)
• The update continues up the tree: node 19's weight becomes 1, then the root's weight becomes 3.
  output = 00010001
• c is new: output the code for NYT (00) followed by the fixed code for c (010).
  output = 0001000100010
• The NYT leaf becomes an internal node (keeping number 17) with children a new NYT (node 15, weight 0) and a leaf for c (node 16, weight 0).
• Update: c's weight becomes 1, node 17's weight becomes 1, and node 19's weight becomes 2.
Example (continued)
• The root's weight becomes 4.
• d is new: output the code for NYT (000) followed by the fixed code for d (011).
  output = 0001000100010000011
• The NYT leaf becomes an internal node (keeping number 15) with children a new NYT (node 13, weight 0) and a leaf for d (node 14, weight 0).
• Update: d's weight becomes 1, then node 15's weight becomes 1.
• Higher up the path the update rule twice finds a larger-numbered node of the same weight, so two exchanges are performed (marked "exchange!" on the slides) before the remaining weights on the path to the root are incremented.