Page 1
Hashing
Hashing is a common method of accessing data records using the hash table.
Hashing can be used to build, search, or delete from a table.
Hash Table: A hash table is a data structure that stores records in an array, called a
hash table. Hash table can be used for quick insertion and searching.
Load Factor: The ratio of the number of items in a table to the table's size is called
the load factor.
Hash Function:
• It is a method for computing table index from key.
• A good hash function is simple, so it can be computed quickly.
• The major advantage of hash tables is their speed.
• If the hash function is slow, this speed will be degraded.
• The purpose of a hash function is to take a range of key values and transform
them into index values in such a way that the key values are distributed
randomly across all the indices of the hash table.
• Goal of hash function:The probability of any two keys hashing to the same
slot is 1/N.
There are many hash functions approaches as follows:
Division Method:
• Mapping a key K into one of m slots by taking the remainder of K divided by
m.
h (K) = K mod m
• Example: Assume a table has 8 slots (m=8). Using division method, insert the
following elements into the hash table. 36,18, 72, 43, and 6 are inserted in the
Page 2
Hashing
Hashing is a common method of accessing data records using the hash table.
Hashing can be used to build, search, or delete from a table.
Hash Table: A hash table is a data structure that stores records in an array, called a
hash table. Hash table can be used for quick insertion and searching.
Load Factor: The ratio of the number of items in a table to the table's size is called
the load factor.
Hash Function:
• It is a method for computing table index from key.
• A good hash function is simple, so it can be computed quickly.
• The major advantage of hash tables is their speed.
• If the hash function is slow, this speed will be degraded.
• The purpose of a hash function is to take a range of key values and transform
them into index values in such a way that the key values are distributed
randomly across all the indices of the hash table.
• Goal of hash function:The probability of any two keys hashing to the same
slot is 1/N.
There are many hash functions approaches as follows:
Division Method:
• Mapping a key K into one of m slots by taking the remainder of K divided by
m.
h (K) = K mod m
• Example: Assume a table has 8 slots (m=8). Using division method, insert the
following elements into the hash table. 36,18, 72, 43, and 6 are inserted in the
order.
36 % S = 4
18 % 8 = 2
72 % 8 = 0
43 % 8 = 3
6 % 8 = 6
[0] [1] [2]
[3] [4] [5] [6]
[7]
72
18 43
36
6
Mid-Square Method:
Mapping a key K into one of m slots, by getting the some middle digits from value
K2.
h(k) = K2 and get middle (logio m) digits
Example: 3121 is a key and square of 3121 is 9740641. Middle part is 406 (with a
table size of 1000)
Folding Method:
Divide the key K into some sections, besides the last section, have same length.
Then, add these sections together.
• Shift folding (123456 is folded as 12+34+56)
• Folding at the boundaries (123456 is folded as 12+43+56)
Radix Hashing:
Transform the number into another base and then divide by the maximum address.
• Example: Addresses from 0 to 99, key = 453 in base 11 = 382. Hash address =
382 mod 99 = 85.
Problems with hashing:
• Collision: No matter what the hash function, there is the possibility that two
different keys could resolve to the same hash address. This situation is
known as a collision.
• Handling the Collisions: The following techniques can be used to handle the
collisions.
° Chaining
° Double hashing (Re-hashing)
° Open Addressing (Linear probing, Quadratic probing, Random probing),
etc.
Chaining:
Page 3
Hashing
Hashing is a common method of accessing data records using the hash table.
Hashing can be used to build, search, or delete from a table.
Hash Table: A hash table is a data structure that stores records in an array, called a
hash table. Hash table can be used for quick insertion and searching.
Load Factor: The ratio of the number of items in a table to the table's size is called
the load factor.
Hash Function:
• It is a method for computing table index from key.
• A good hash function is simple, so it can be computed quickly.
• The major advantage of hash tables is their speed.
• If the hash function is slow, this speed will be degraded.
• The purpose of a hash function is to take a range of key values and transform
them into index values in such a way that the key values are distributed
randomly across all the indices of the hash table.
• Goal of hash function:The probability of any two keys hashing to the same
slot is 1/N.
There are many hash functions approaches as follows:
Division Method:
• Mapping a key K into one of m slots by taking the remainder of K divided by
m.
h (K) = K mod m
• Example: Assume a table has 8 slots (m=8). Using division method, insert the
following elements into the hash table. 36,18, 72, 43, and 6 are inserted in the
order.
36 % S = 4
18 % 8 = 2
72 % 8 = 0
43 % 8 = 3
6 % 8 = 6
[0] [1] [2]
[3] [4] [5] [6]
[7]
72
18 43
36
6
Mid-Square Method:
Mapping a key K into one of m slots, by getting the some middle digits from value
K2.
h(k) = K2 and get middle (logio m) digits
Example: 3121 is a key and square of 3121 is 9740641. Middle part is 406 (with a
table size of 1000)
Folding Method:
Divide the key K into some sections, besides the last section, have same length.
Then, add these sections together.
• Shift folding (123456 is folded as 12+34+56)
• Folding at the boundaries (123456 is folded as 12+43+56)
Radix Hashing:
Transform the number into another base and then divide by the maximum address.
• Example: Addresses from 0 to 99, key = 453 in base 11 = 382. Hash address =
382 mod 99 = 85.
Problems with hashing:
• Collision: No matter what the hash function, there is the possibility that two
different keys could resolve to the same hash address. This situation is
known as a collision.
• Handling the Collisions: The following techniques can be used to handle the
collisions.
° Chaining
° Double hashing (Re-hashing)
° Open Addressing (Linear probing, Quadratic probing, Random probing),
etc.
Chaining:
• A chain is simply a linked list of all the elements with the same hash key.
• A linked list is created at each index in the hash table.
• Hash Function: h (K) = K mod m
• Example: Assume a table has 8 slots (m=8). Using the chaining, insert the
following elements into the hash table. 36,18, 72, 43, 6,10, 5, and 15 are
inserted in the order.
Hash key = key % table size
36 % S
=
4
13 % S
=
2
72 % s
=
0
43 % s
=
3
6 % s
=
6
10 % s
=
2
5 % s
=
5
15 % s
=
7
• A data item's key is hashed to the index in simple hashing, and the item is
inserted into the linked list at that index.
• Other items that hash to the same index are simply added to the linked list.
• Cost is proportional to length of list.
• Average length = N / M, where N is size of array and M is the number of linked
lists.
• Theorem: Let a = N / M > 1 be average length of list. For any t > 1, probability
that list length > t o r is exponentially small in t.
• Analysis of Chaining:
° If M is too large, then too many empty chains.
° If M is too small, then chains are too long.
• There is no need to search for empty cells in the array or table.
• Advantages of Chaining: Unbounded elements can be stored (No bound to the
size of table).
Page 4
Hashing
Hashing is a common method of accessing data records using the hash table.
Hashing can be used to build, search, or delete from a table.
Hash Table: A hash table is a data structure that stores records in an array, called a
hash table. Hash table can be used for quick insertion and searching.
Load Factor: The ratio of the number of items in a table to the table's size is called
the load factor.
Hash Function:
• It is a method for computing table index from key.
• A good hash function is simple, so it can be computed quickly.
• The major advantage of hash tables is their speed.
• If the hash function is slow, this speed will be degraded.
• The purpose of a hash function is to take a range of key values and transform
them into index values in such a way that the key values are distributed
randomly across all the indices of the hash table.
• Goal of hash function:The probability of any two keys hashing to the same
slot is 1/N.
There are many hash functions approaches as follows:
Division Method:
• Mapping a key K into one of m slots by taking the remainder of K divided by
m.
h (K) = K mod m
• Example: Assume a table has 8 slots (m=8). Using division method, insert the
following elements into the hash table. 36,18, 72, 43, and 6 are inserted in the
order.
36 % S = 4
18 % 8 = 2
72 % 8 = 0
43 % 8 = 3
6 % 8 = 6
[0] [1] [2]
[3] [4] [5] [6]
[7]
72
18 43
36
6
Mid-Square Method:
Mapping a key K into one of m slots, by getting the some middle digits from value
K2.
h(k) = K2 and get middle (logio m) digits
Example: 3121 is a key and square of 3121 is 9740641. Middle part is 406 (with a
table size of 1000)
Folding Method:
Divide the key K into some sections, besides the last section, have same length.
Then, add these sections together.
• Shift folding (123456 is folded as 12+34+56)
• Folding at the boundaries (123456 is folded as 12+43+56)
Radix Hashing:
Transform the number into another base and then divide by the maximum address.
• Example: Addresses from 0 to 99, key = 453 in base 11 = 382. Hash address =
382 mod 99 = 85.
Problems with hashing:
• Collision: No matter what the hash function, there is the possibility that two
different keys could resolve to the same hash address. This situation is
known as a collision.
• Handling the Collisions: The following techniques can be used to handle the
collisions.
° Chaining
° Double hashing (Re-hashing)
° Open Addressing (Linear probing, Quadratic probing, Random probing),
etc.
Chaining:
• A chain is simply a linked list of all the elements with the same hash key.
• A linked list is created at each index in the hash table.
• Hash Function: h (K) = K mod m
• Example: Assume a table has 8 slots (m=8). Using the chaining, insert the
following elements into the hash table. 36,18, 72, 43, 6,10, 5, and 15 are
inserted in the order.
Hash key = key % table size
36 % S
=
4
13 % S
=
2
72 % s
=
0
43 % s
=
3
6 % s
=
6
10 % s
=
2
5 % s
=
5
15 % s
=
7
• A data item's key is hashed to the index in simple hashing, and the item is
inserted into the linked list at that index.
• Other items that hash to the same index are simply added to the linked list.
• Cost is proportional to length of list.
• Average length = N / M, where N is size of array and M is the number of linked
lists.
• Theorem: Let a = N / M > 1 be average length of list. For any t > 1, probability
that list length > t o r is exponentially small in t.
• Analysis of Chaining:
° If M is too large, then too many empty chains.
° If M is too small, then chains are too long.
• There is no need to search for empty cells in the array or table.
• Advantages of Chaining: Unbounded elements can be stored (No bound to the
size of table).
• Disadvantage of Chaining: Too many linked lists (overhead of linked lists)
Open Addressing:
In open addressing, when a data item can’t be placed at the hashed index value,
another location in the array is sought. We’ll explore three methods of open
addressing, which vary in the method used to find the next empty location. These
methods are linear probing, quadratic probing, and double hashing.
Linear Probing:
• When using a linear probing method the item will be stored in the next
available slot in the table, assuming that the table is not already full.
• This is implemented via a linear searching for an empty slot, from the point of
collision.
• If the end of table is reached during the linear search, the search will wrap
around to the beginning of the table and continue from there.
• Example: Assume a table has 8 slots (m=8). Using Linear probing, insert the
following elements into the hash table. 36,18, 72, 43, 6,10, 5, and 15 are
inserted in the order.
Hash key = key % table size
36 % 8
=
4
18 % 8
=
2
72 % 8
=
0
43 % 8
=
3
6 % 8
=
6
10 % 8
=
2
5 % 8
=
5
15 % 8
=
7
[ 0]
[1]
[ 2]
[3]
[4]
[5]
[S ]
[7]
72
15
18
43
36
10
6
5
Relationship between probe length (P) and load factor (L) for linear probing:
° For a successful search: P = (1 +1 /(1-L)2) /2
Page 5
Hashing
Hashing is a common method of accessing data records using the hash table.
Hashing can be used to build, search, or delete from a table.
Hash Table: A hash table is a data structure that stores records in an array, called a
hash table. Hash table can be used for quick insertion and searching.
Load Factor: The ratio of the number of items in a table to the table's size is called
the load factor.
Hash Function:
• It is a method for computing table index from key.
• A good hash function is simple, so it can be computed quickly.
• The major advantage of hash tables is their speed.
• If the hash function is slow, this speed will be degraded.
• The purpose of a hash function is to take a range of key values and transform
them into index values in such a way that the key values are distributed
randomly across all the indices of the hash table.
• Goal of hash function:The probability of any two keys hashing to the same
slot is 1/N.
There are many hash functions approaches as follows:
Division Method:
• Mapping a key K into one of m slots by taking the remainder of K divided by
m.
h (K) = K mod m
• Example: Assume a table has 8 slots (m=8). Using division method, insert the
following elements into the hash table. 36,18, 72, 43, and 6 are inserted in the
order.
36 % S = 4
18 % 8 = 2
72 % 8 = 0
43 % 8 = 3
6 % 8 = 6
[0] [1] [2]
[3] [4] [5] [6]
[7]
72
18 43
36
6
Mid-Square Method:
Mapping a key K into one of m slots, by getting the some middle digits from value
K2.
h(k) = K2 and get middle (logio m) digits
Example: 3121 is a key and square of 3121 is 9740641. Middle part is 406 (with a
table size of 1000)
Folding Method:
Divide the key K into some sections, besides the last section, have same length.
Then, add these sections together.
• Shift folding (123456 is folded as 12+34+56)
• Folding at the boundaries (123456 is folded as 12+43+56)
Radix Hashing:
Transform the number into another base and then divide by the maximum address.
• Example: Addresses from 0 to 99, key = 453 in base 11 = 382. Hash address =
382 mod 99 = 85.
Problems with hashing:
• Collision: No matter what the hash function, there is the possibility that two
different keys could resolve to the same hash address. This situation is
known as a collision.
• Handling the Collisions: The following techniques can be used to handle the
collisions.
° Chaining
° Double hashing (Re-hashing)
° Open Addressing (Linear probing, Quadratic probing, Random probing),
etc.
Chaining:
• A chain is simply a linked list of all the elements with the same hash key.
• A linked list is created at each index in the hash table.
• Hash Function: h (K) = K mod m
• Example: Assume a table has 8 slots (m=8). Using the chaining, insert the
following elements into the hash table. 36,18, 72, 43, 6,10, 5, and 15 are
inserted in the order.
Hash key = key % table size
36 % S
=
4
13 % S
=
2
72 % s
=
0
43 % s
=
3
6 % s
=
6
10 % s
=
2
5 % s
=
5
15 % s
=
7
• A data item's key is hashed to the index in simple hashing, and the item is
inserted into the linked list at that index.
• Other items that hash to the same index are simply added to the linked list.
• Cost is proportional to length of list.
• Average length = N / M, where N is size of array and M is the number of linked
lists.
• Theorem: Let a = N / M > 1 be average length of list. For any t > 1, probability
that list length > t o r is exponentially small in t.
• Analysis of Chaining:
° If M is too large, then too many empty chains.
° If M is too small, then chains are too long.
• There is no need to search for empty cells in the array or table.
• Advantages of Chaining: Unbounded elements can be stored (No bound to the
size of table).
• Disadvantage of Chaining: Too many linked lists (overhead of linked lists)
Open Addressing:
In open addressing, when a data item can’t be placed at the hashed index value,
another location in the array is sought. We’ll explore three methods of open
addressing, which vary in the method used to find the next empty location. These
methods are linear probing, quadratic probing, and double hashing.
Linear Probing:
• When using a linear probing method the item will be stored in the next
available slot in the table, assuming that the table is not already full.
• This is implemented via a linear searching for an empty slot, from the point of
collision.
• If the end of table is reached during the linear search, the search will wrap
around to the beginning of the table and continue from there.
• Example: Assume a table has 8 slots (m=8). Using Linear probing, insert the
following elements into the hash table. 36,18, 72, 43, 6,10, 5, and 15 are
inserted in the order.
Hash key = key % table size
36 % 8
=
4
18 % 8
=
2
72 % 8
=
0
43 % 8
=
3
6 % 8
=
6
10 % 8
=
2
5 % 8
=
5
15 % 8
=
7
[ 0]
[1]
[ 2]
[3]
[4]
[5]
[S ]
[7]
72
15
18
43
36
10
6
5
Relationship between probe length (P) and load factor (L) for linear probing:
° For a successful search: P = (1 +1 /(1-L)2) /2
° For an unsuccessful search: P = (1 +1 /(1-L))/2
• Analysis of Linear Probing:
° If load factor is too small then too many empty cells.
Quadratic Probing:
If the collision occurs, instead of moving one cell, move i2 cells from the point of
collision, where i is the number of attempts to resolve the collision. Example:
• Example: Assume a table has 10 slots. Using Quadratic probing, insert the
following elements in the given order.
89,18, 49, 58 and 69 are inserted into a hash table.
89% 10 = 9
18% 10 = 8
49% 10 = 9
49 = 9 is occupied, so 1 attempt
needed. Insert at9+l2 = 10 = 0
location.
58 % 10 = 8 3 attempts - 32 = 9 spots
5 8 = 8 is occupied, and next 3
locations are occupied.
So 8 + 32 = 17 = 7 location.
69 % 10 = 9 2 attempts - 22 = 4 spots
\
Read More