Courses

# Data Compression-Introduction to Data Compression Notes | EduRev

## : Data Compression-Introduction to Data Compression Notes | EduRev

``` Page 1

1
CSEP 590
Data Compression
Autumn 2007
CSEP 590 - Lecture 2 - Autumn 2007 2
• One pass
• During the pass calculate the frequencies
• Update the Huffman tree accordingly
– Coder – new Huffman tree computed after transmitting the
symbol
– Decoder – new Huffman tree computed after receiving the
symbol
• Symbol set and their initial codes must be known
• Need NYT (not yet transmitted symbol) to indicate a
new leaf is needed in the tree.
CSEP 590 - Lecture 2 - Autumn 2007 3
Optimal Tree Numbering
• a : 5, b: 2, c : 1, d : 3
a
c b
d
CSEP 590 - Lecture 2 - Autumn 2007 4
Weight the Nodes
• a : 5, b: 2, c : 1, d : 3
11
6
3
1 2
5
3
a
c b
d
CSEP 590 - Lecture 2 - Autumn 2007 5
Number the Nodes
• a : 5, b: 2, c : 1, d : 3
11
6
3
1 2
5
3
a
c b
d
Number the nodes as they are removed from the
priority queue.
2
7
5
6
3
4
1
CSEP 590 - Lecture 2 - Autumn 2007 6
• In an optimal tree for n symbols there is a
numbering of the nodes y
1
<y
2
<... <y
2n-1
such
that their corresponding weights x
1
,x
2
, ... , x
2n-1
satisfy:
– x
1
< x
2
< ... < x
2n-1
– siblings are numbered consecutively
• And vice versa
– That is, if there is such a numbering then the tree is
optimal.  We call this the node number invariant.
Page 2

1
CSEP 590
Data Compression
Autumn 2007
CSEP 590 - Lecture 2 - Autumn 2007 2
• One pass
• During the pass calculate the frequencies
• Update the Huffman tree accordingly
– Coder – new Huffman tree computed after transmitting the
symbol
– Decoder – new Huffman tree computed after receiving the
symbol
• Symbol set and their initial codes must be known
• Need NYT (not yet transmitted symbol) to indicate a
new leaf is needed in the tree.
CSEP 590 - Lecture 2 - Autumn 2007 3
Optimal Tree Numbering
• a : 5, b: 2, c : 1, d : 3
a
c b
d
CSEP 590 - Lecture 2 - Autumn 2007 4
Weight the Nodes
• a : 5, b: 2, c : 1, d : 3
11
6
3
1 2
5
3
a
c b
d
CSEP 590 - Lecture 2 - Autumn 2007 5
Number the Nodes
• a : 5, b: 2, c : 1, d : 3
11
6
3
1 2
5
3
a
c b
d
Number the nodes as they are removed from the
priority queue.
2
7
5
6
3
4
1
CSEP 590 - Lecture 2 - Autumn 2007 6
• In an optimal tree for n symbols there is a
numbering of the nodes y
1
<y
2
<... <y
2n-1
such
that their corresponding weights x
1
,x
2
, ... , x
2n-1
satisfy:
– x
1
< x
2
< ... < x
2n-1
– siblings are numbered consecutively
• And vice versa
– That is, if there is such a numbering then the tree is
optimal.  We call this the node number invariant.
2
CSEP 590 - Lecture 2 - Autumn 2007 7
Initialization
• Symbols a
1
, a
2
, ... ,a
m
have a basic prefix code,
used when symbols are first encountered.
• Example: a, b ,c, d, e, f, g, h, i, j
0
0
0
0
0
0
0
0
0
1 1
1
1
1
1
1
1
1
a
h g
f e d c b
j i
CSEP 590 - Lecture 2 - Autumn 2007 8
Initialization
• The tree will encode up to m + 1 symbols
including NYT.
• We reserve numbers 1 to 2m + 1 for node
numbering.
• The initial Huffman tree consists of a single
node
0
2m + 1
weight
node number
NYT
CSEP 590 - Lecture 2 - Autumn 2007 9
Coding Algorithm
1. If a new symbol is encountered then output the
code for NYT followed by the fixed code for the
symbol.  Add the new symbol to the tree.
2. If an old symbol is encountered then output its
code.
3. Update the tree to preserve the node number
invariant.
CSEP 590 - Lecture 2 - Autumn 2007 10
Decoding Algorithm
1. Decode the symbol using the current tree.
2. If NYT is encountered then use the fixed code to
decode the symbol.  Add the new symbol to the
tree.
3. Update the tree to preserve the node number
invariant.
CSEP 590 - Lecture 2 - Autumn 2007 11
Updating the Tree
1. Let y be leaf (symbol) with current weight x.*
2. If y the root update x by 1, otherwise,
3. Exchange y with the largest numbered node with
the same weight (unless it is the parent).**
4. Update x by 1
5. Let y be the parent with its weight x and go to 2.
*We never update the weight of NYT
** This exchange will preserve the node number invariant
CSEP 590 - Lecture 2 - Autumn 2007 12
Example
• aabcdad in alphabet {a,b,..., j}
0 21
output = 000
fixed code
NYT
fixed code for a
Page 3

1
CSEP 590
Data Compression
Autumn 2007
CSEP 590 - Lecture 2 - Autumn 2007 2
• One pass
• During the pass calculate the frequencies
• Update the Huffman tree accordingly
– Coder – new Huffman tree computed after transmitting the
symbol
– Decoder – new Huffman tree computed after receiving the
symbol
• Symbol set and their initial codes must be known
• Need NYT (not yet transmitted symbol) to indicate a
new leaf is needed in the tree.
CSEP 590 - Lecture 2 - Autumn 2007 3
Optimal Tree Numbering
• a : 5, b: 2, c : 1, d : 3
a
c b
d
CSEP 590 - Lecture 2 - Autumn 2007 4
Weight the Nodes
• a : 5, b: 2, c : 1, d : 3
11
6
3
1 2
5
3
a
c b
d
CSEP 590 - Lecture 2 - Autumn 2007 5
Number the Nodes
• a : 5, b: 2, c : 1, d : 3
11
6
3
1 2
5
3
a
c b
d
Number the nodes as they are removed from the
priority queue.
2
7
5
6
3
4
1
CSEP 590 - Lecture 2 - Autumn 2007 6
• In an optimal tree for n symbols there is a
numbering of the nodes y
1
<y
2
<... <y
2n-1
such
that their corresponding weights x
1
,x
2
, ... , x
2n-1
satisfy:
– x
1
< x
2
< ... < x
2n-1
– siblings are numbered consecutively
• And vice versa
– That is, if there is such a numbering then the tree is
optimal.  We call this the node number invariant.
2
CSEP 590 - Lecture 2 - Autumn 2007 7
Initialization
• Symbols a
1
, a
2
, ... ,a
m
have a basic prefix code,
used when symbols are first encountered.
• Example: a, b ,c, d, e, f, g, h, i, j
0
0
0
0
0
0
0
0
0
1 1
1
1
1
1
1
1
1
a
h g
f e d c b
j i
CSEP 590 - Lecture 2 - Autumn 2007 8
Initialization
• The tree will encode up to m + 1 symbols
including NYT.
• We reserve numbers 1 to 2m + 1 for node
numbering.
• The initial Huffman tree consists of a single
node
0
2m + 1
weight
node number
NYT
CSEP 590 - Lecture 2 - Autumn 2007 9
Coding Algorithm
1. If a new symbol is encountered then output the
code for NYT followed by the fixed code for the
symbol.  Add the new symbol to the tree.
2. If an old symbol is encountered then output its
code.
3. Update the tree to preserve the node number
invariant.
CSEP 590 - Lecture 2 - Autumn 2007 10
Decoding Algorithm
1. Decode the symbol using the current tree.
2. If NYT is encountered then use the fixed code to
decode the symbol.  Add the new symbol to the
tree.
3. Update the tree to preserve the node number
invariant.
CSEP 590 - Lecture 2 - Autumn 2007 11
Updating the Tree
1. Let y be leaf (symbol) with current weight x.*
2. If y the root update x by 1, otherwise,
3. Exchange y with the largest numbered node with
the same weight (unless it is the parent).**
4. Update x by 1
5. Let y be the parent with its weight x and go to 2.
*We never update the weight of NYT
** This exchange will preserve the node number invariant
CSEP 590 - Lecture 2 - Autumn 2007 12
Example
• aabcdad in alphabet {a,b,..., j}
0 21
output = 000
fixed code
NYT
fixed code for a
3
CSEP 590 - Lecture 2 - Autumn 2007 13
Example
0
19
output = 000
1
1
20
0 1
21
a NYT
CSEP 590 - Lecture 2 - Autumn 2007 14
Example
0
output = 0001
1
1
0 1
21
a NYT
19
20
CSEP 590 - Lecture 2 - Autumn 2007 15
Example
0
output = 0001
2
1
0 1
21
a NYT
19
20
CSEP 590 - Lecture 2 - Autumn 2007 16
Example
0
output = 00010001
2
2
0 1
21
NYT
fixed code for b
a NYT
19
20
CSEP 590 - Lecture 2 - Autumn 2007 17
Example
output = 00010001
2
2
0 1
21
a
20
0
19
a
0
17 0 18
0 1
b NYT
CSEP 590 - Lecture 2 - Autumn 2007 18
Example
output = 00010001
2
2
0 1
21
a
20
0
19
a
0
17 1 18
0 1
b NYT
Page 4

1
CSEP 590
Data Compression
Autumn 2007
CSEP 590 - Lecture 2 - Autumn 2007 2
• One pass
• During the pass calculate the frequencies
• Update the Huffman tree accordingly
– Coder – new Huffman tree computed after transmitting the
symbol
– Decoder – new Huffman tree computed after receiving the
symbol
• Symbol set and their initial codes must be known
• Need NYT (not yet transmitted symbol) to indicate a
new leaf is needed in the tree.
CSEP 590 - Lecture 2 - Autumn 2007 3
Optimal Tree Numbering
• a : 5, b: 2, c : 1, d : 3
a
c b
d
CSEP 590 - Lecture 2 - Autumn 2007 4
Weight the Nodes
• a : 5, b: 2, c : 1, d : 3
11
6
3
1 2
5
3
a
c b
d
CSEP 590 - Lecture 2 - Autumn 2007 5
Number the Nodes
• a : 5, b: 2, c : 1, d : 3
11
6
3
1 2
5
3
a
c b
d
Number the nodes as they are removed from the
priority queue.
2
7
5
6
3
4
1
CSEP 590 - Lecture 2 - Autumn 2007 6
• In an optimal tree for n symbols there is a
numbering of the nodes y
1
<y
2
<... <y
2n-1
such
that their corresponding weights x
1
,x
2
, ... , x
2n-1
satisfy:
– x
1
< x
2
< ... < x
2n-1
– siblings are numbered consecutively
• And vice versa
– That is, if there is such a numbering then the tree is
optimal.  We call this the node number invariant.
2
CSEP 590 - Lecture 2 - Autumn 2007 7
Initialization
• Symbols a
1
, a
2
, ... ,a
m
have a basic prefix code,
used when symbols are first encountered.
• Example: a, b ,c, d, e, f, g, h, i, j
0
0
0
0
0
0
0
0
0
1 1
1
1
1
1
1
1
1
a
h g
f e d c b
j i
CSEP 590 - Lecture 2 - Autumn 2007 8
Initialization
• The tree will encode up to m + 1 symbols
including NYT.
• We reserve numbers 1 to 2m + 1 for node
numbering.
• The initial Huffman tree consists of a single
node
0
2m + 1
weight
node number
NYT
CSEP 590 - Lecture 2 - Autumn 2007 9
Coding Algorithm
1. If a new symbol is encountered then output the
code for NYT followed by the fixed code for the
symbol.  Add the new symbol to the tree.
2. If an old symbol is encountered then output its
code.
3. Update the tree to preserve the node number
invariant.
CSEP 590 - Lecture 2 - Autumn 2007 10
Decoding Algorithm
1. Decode the symbol using the current tree.
2. If NYT is encountered then use the fixed code to
decode the symbol.  Add the new symbol to the
tree.
3. Update the tree to preserve the node number
invariant.
CSEP 590 - Lecture 2 - Autumn 2007 11
Updating the Tree
1. Let y be leaf (symbol) with current weight x.*
2. If y the root update x by 1, otherwise,
3. Exchange y with the largest numbered node with
the same weight (unless it is the parent).**
4. Update x by 1
5. Let y be the parent with its weight x and go to 2.
*We never update the weight of NYT
** This exchange will preserve the node number invariant
CSEP 590 - Lecture 2 - Autumn 2007 12
Example
• aabcdad in alphabet {a,b,..., j}
0 21
output = 000
fixed code
NYT
fixed code for a
3
CSEP 590 - Lecture 2 - Autumn 2007 13
Example
0
19
output = 000
1
1
20
0 1
21
a NYT
CSEP 590 - Lecture 2 - Autumn 2007 14
Example
0
output = 0001
1
1
0 1
21
a NYT
19
20
CSEP 590 - Lecture 2 - Autumn 2007 15
Example
0
output = 0001
2
1
0 1
21
a NYT
19
20
CSEP 590 - Lecture 2 - Autumn 2007 16
Example
0
output = 00010001
2
2
0 1
21
NYT
fixed code for b
a NYT
19
20
CSEP 590 - Lecture 2 - Autumn 2007 17
Example
output = 00010001
2
2
0 1
21
a
20
0
19
a
0
17 0 18
0 1
b NYT
CSEP 590 - Lecture 2 - Autumn 2007 18
Example
output = 00010001
2
2
0 1
21
a
20
0
19
a
0
17 1 18
0 1
b NYT
4
CSEP 590 - Lecture 2 - Autumn 2007 19
Example
output = 00010001
2
2
0 1
21
a
20
0
19
a
1
17 1 18
0 1
b NYT
CSEP 590 - Lecture 2 - Autumn 2007 20
Example
2
3
0 1
21
a
20
0
19
a
1
17 1 18
0 1
b
output = 0001000100010
NYT
fixed code for c
NYT
CSEP 590 - Lecture 2 - Autumn 2007 21
Example
19
output = 0001000100010
2
3
20
0 1
21
a
1
17 1 18
0 1
0 a
0
15
0
16
0 1
NYT
a
b
c
CSEP 590 - Lecture 2 - Autumn 2007 22
Example
19
output = 0001000100010
2
3
20
0 1
21
a
1
17 1 18
0 1
0 a
0
15
1
16
0 1
NYT
a
b
c
CSEP 590 - Lecture 2 - Autumn 2007 23
Example
19
output = 0001000100010
2
3
20
0 1
21
a
1
17 1 18
0 1
0 a
1
15
1
16
0 1
NYT
a
b
c
CSEP 590 - Lecture 2 - Autumn 2007 24
Example
19
output = 0001000100010
2
3
20
0 1
21
a
2
17 1 18
0 1
0 a
1
15
1
16
0 1
NYT
a
b
c
Page 5

1
CSEP 590
Data Compression
Autumn 2007
CSEP 590 - Lecture 2 - Autumn 2007 2
• One pass
• During the pass calculate the frequencies
• Update the Huffman tree accordingly
– Coder – new Huffman tree computed after transmitting the
symbol
– Decoder – new Huffman tree computed after receiving the
symbol
• Symbol set and their initial codes must be known
• Need NYT (not yet transmitted symbol) to indicate a
new leaf is needed in the tree.
CSEP 590 - Lecture 2 - Autumn 2007 3
Optimal Tree Numbering
• a : 5, b: 2, c : 1, d : 3
a
c b
d
CSEP 590 - Lecture 2 - Autumn 2007 4
Weight the Nodes
• a : 5, b: 2, c : 1, d : 3
11
6
3
1 2
5
3
a
c b
d
CSEP 590 - Lecture 2 - Autumn 2007 5
Number the Nodes
• a : 5, b: 2, c : 1, d : 3
11
6
3
1 2
5
3
a
c b
d
Number the nodes as they are removed from the
priority queue.
2
7
5
6
3
4
1
CSEP 590 - Lecture 2 - Autumn 2007 6
• In an optimal tree for n symbols there is a
numbering of the nodes y
1
<y
2
<... <y
2n-1
such
that their corresponding weights x
1
,x
2
, ... , x
2n-1
satisfy:
– x
1
< x
2
< ... < x
2n-1
– siblings are numbered consecutively
• And vice versa
– That is, if there is such a numbering then the tree is
optimal.  We call this the node number invariant.
2
CSEP 590 - Lecture 2 - Autumn 2007 7
Initialization
• Symbols a
1
, a
2
, ... ,a
m
have a basic prefix code,
used when symbols are first encountered.
• Example: a, b ,c, d, e, f, g, h, i, j
0
0
0
0
0
0
0
0
0
1 1
1
1
1
1
1
1
1
a
h g
f e d c b
j i
CSEP 590 - Lecture 2 - Autumn 2007 8
Initialization
• The tree will encode up to m + 1 symbols
including NYT.
• We reserve numbers 1 to 2m + 1 for node
numbering.
• The initial Huffman tree consists of a single
node
0
2m + 1
weight
node number
NYT
CSEP 590 - Lecture 2 - Autumn 2007 9
Coding Algorithm
1. If a new symbol is encountered then output the
code for NYT followed by the fixed code for the
symbol.  Add the new symbol to the tree.
2. If an old symbol is encountered then output its
code.
3. Update the tree to preserve the node number
invariant.
CSEP 590 - Lecture 2 - Autumn 2007 10
Decoding Algorithm
1. Decode the symbol using the current tree.
2. If NYT is encountered then use the fixed code to
decode the symbol.  Add the new symbol to the
tree.
3. Update the tree to preserve the node number
invariant.
CSEP 590 - Lecture 2 - Autumn 2007 11
Updating the Tree
1. Let y be leaf (symbol) with current weight x.*
2. If y the root update x by 1, otherwise,
3. Exchange y with the largest numbered node with
the same weight (unless it is the parent).**
4. Update x by 1
5. Let y be the parent with its weight x and go to 2.
*We never update the weight of NYT
** This exchange will preserve the node number invariant
CSEP 590 - Lecture 2 - Autumn 2007 12
Example
• aabcdad in alphabet {a,b,..., j}
0 21
output = 000
fixed code
NYT
fixed code for a
3
CSEP 590 - Lecture 2 - Autumn 2007 13
Example
0
19
output = 000
1
1
20
0 1
21
a NYT
CSEP 590 - Lecture 2 - Autumn 2007 14
Example
0
output = 0001
1
1
0 1
21
a NYT
19
20
CSEP 590 - Lecture 2 - Autumn 2007 15
Example
0
output = 0001
2
1
0 1
21
a NYT
19
20
CSEP 590 - Lecture 2 - Autumn 2007 16
Example
0
output = 00010001
2
2
0 1
21
NYT
fixed code for b
a NYT
19
20
CSEP 590 - Lecture 2 - Autumn 2007 17
Example
output = 00010001
2
2
0 1
21
a
20
0
19
a
0
17 0 18
0 1
b NYT
CSEP 590 - Lecture 2 - Autumn 2007 18
Example
output = 00010001
2
2
0 1
21
a
20
0
19
a
0
17 1 18
0 1
b NYT
4
CSEP 590 - Lecture 2 - Autumn 2007 19
Example
output = 00010001
2
2
0 1
21
a
20
0
19
a
1
17 1 18
0 1
b NYT
CSEP 590 - Lecture 2 - Autumn 2007 20
Example
2
3
0 1
21
a
20
0
19
a
1
17 1 18
0 1
b
output = 0001000100010
NYT
fixed code for c
NYT
CSEP 590 - Lecture 2 - Autumn 2007 21
Example
19
output = 0001000100010
2
3
20
0 1
21
a
1
17 1 18
0 1
0 a
0
15
0
16
0 1
NYT
a
b
c
CSEP 590 - Lecture 2 - Autumn 2007 22
Example
19
output = 0001000100010
2
3
20
0 1
21
a
1
17 1 18
0 1
0 a
0
15
1
16
0 1
NYT
a
b
c
CSEP 590 - Lecture 2 - Autumn 2007 23
Example
19
output = 0001000100010
2
3
20
0 1
21
a
1
17 1 18
0 1
0 a
1
15
1
16
0 1
NYT
a
b
c
CSEP 590 - Lecture 2 - Autumn 2007 24
Example
19
output = 0001000100010
2
3
20
0 1
21
a
2
17 1 18
0 1
0 a
1
15
1
16
0 1
NYT
a
b
c
5
CSEP 590 - Lecture 2 - Autumn 2007 25
Example
19 2
4
20
0 1
21
a
2
17 1 18
0 1
0 a
1
15
1
16
0 1
NYT
a
b
c
output = 0001000100010000011
NYT
fixed code for d
CSEP 590 - Lecture 2 - Autumn 2007 26
Example
19 2
4
20
0 1
21
a
2
17 1 18
0 1
a
1
15
1
16
0 1
NYT
a
b
c
output = 0001000100010000011
0 a
13
0
14
0 1
d
0
CSEP 590 - Lecture 2 - Autumn 2007 27
Example
19 2
4
20
0 1
21
a
2
17 1 18
0 1
a
1
15
1
16
0 1
NYT
a
b
c
output = 0001000100010000011
0 a
13
1
14
0 1
d
0
CSEP 590 - Lecture 2 - Autumn 2007 28
Example
19 2
4
20
0 1
21
a
2
17 1 18
0 1
a
1
15
1
16
0 1
NYT
a
b
c
output = 0001000100010000011
0 a
13
1
14
0 1
d
1
exchange!
CSEP 590 - Lecture 2 - Autumn 2007 29
Example
19 2
4
20
0 1
21
2
17 1 18
0 1
a
1
15
1
16
0 1
NYT
a
b
c
output = 0001000100010000011
0 a
13
1
14
0 1
d
1
CSEP 590 - Lecture 2 - Autumn 2007 30
Example
19 2
4
20
0 1
21
2
17 1 18
0 1
a
2
15
1
16
0 1
NYT
a
b
c
output = 0001000100010000011
0 a
13
1
14
0 1
d
1
exchange!
```
Offer running on EduRev: Apply code STAYHOME200 to get INR 200 off on our premium plan EduRev Infinity!