Formula Sheets: Hashing | Algorithms - Computer Science Engineering (CSE) PDF Download

Download, print and study this document offline
Please wait while the PDF view is loading
 Page 1


Hashing F orm ula Sheet
Hashing Basics
• Hash F unction : h(k) = k mod m , where k is k ey , m is table size (preferably prim e).
• Load F actor : a =
n
m
, where n is n um b er of k eys, m is table size.
• Exp ected Collisions : E[ collisions]˜
n(n-1)
2m
for uniform hashing.
• T able Size : m= n to minimize collisions, ideally m˜ 2n for op en addressing.
Separate Chaining (Collision Resolution)
• Data Structure : Link ed list p er slot.
• Time Complexit y :
– A v erage Case (successful searc h): O(1+a) , where a is load factor.
– A v erage Case (unsuccessful searc h): O(a) .
– W orst Case: O(n) (all k eys in one c hain).
• Space Complexit y : O(n+m) , where n is n um b er of k eys, m is table size.
• Chain Length : L
a vg
= a , L
max
= O(n) in w orst case.
Op en A ddressing (Collision Resolution)
• Probing T ec hniques :
– Linear Probing: h(k,i) = (h(k)+i) mod m , where i is prob e n um b er.
– Quadratic Probing: h(k,i) = (h(k)+c
1
i+c
2
i
2
) mod m , where c
1
,c
2
are constan ts.
– Double Hashing: h(k,i) = (h
1
(k)+i·h
2
(k)) mod m , where h
2
(k) is second hash function.
• Time Complexit y :
– A v erage Case (successful searc h): O
(
1
1-a
)
.
– A v erage Case (unsuccessful searc h): O
(
1
(1-a)
2
)
.
– W orst Case: O(n) (table nearly full).
• Space Complexit y : O(m) , where m is table size.
• Cluster F ormation (Linear Probing) : E[ cluster-size]?
1
(1-a)
2
.
Extendible Hashing
• Directory Size : 2
d
, where d is global depth.
• Buc k et Split : Occurs when buc k et o v erflo ws, lo cal depth l= d .
• Time Complexit y :
– Searc h/Insert/Delete: O(1) a v erage, O(logn) w orst case due to directory expansion.
• Space Complexit y : O(n+2
d
) , where n is n um b e r of k eys, 2
d
is directory size.
• Directory Doubling : When l > d , d = d+1 , S
directory
= 2·S
previous
.
1
Page 2


Hashing F orm ula Sheet
Hashing Basics
• Hash F unction : h(k) = k mod m , where k is k ey , m is table size (preferably prim e).
• Load F actor : a =
n
m
, where n is n um b er of k eys, m is table size.
• Exp ected Collisions : E[ collisions]˜
n(n-1)
2m
for uniform hashing.
• T able Size : m= n to minimize collisions, ideally m˜ 2n for op en addressing.
Separate Chaining (Collision Resolution)
• Data Structure : Link ed list p er slot.
• Time Complexit y :
– A v erage Case (successful searc h): O(1+a) , where a is load factor.
– A v erage Case (unsuccessful searc h): O(a) .
– W orst Case: O(n) (all k eys in one c hain).
• Space Complexit y : O(n+m) , where n is n um b er of k eys, m is table size.
• Chain Length : L
a vg
= a , L
max
= O(n) in w orst case.
Op en A ddressing (Collision Resolution)
• Probing T ec hniques :
– Linear Probing: h(k,i) = (h(k)+i) mod m , where i is prob e n um b er.
– Quadratic Probing: h(k,i) = (h(k)+c
1
i+c
2
i
2
) mod m , where c
1
,c
2
are constan ts.
– Double Hashing: h(k,i) = (h
1
(k)+i·h
2
(k)) mod m , where h
2
(k) is second hash function.
• Time Complexit y :
– A v erage Case (successful searc h): O
(
1
1-a
)
.
– A v erage Case (unsuccessful searc h): O
(
1
(1-a)
2
)
.
– W orst Case: O(n) (table nearly full).
• Space Complexit y : O(m) , where m is table size.
• Cluster F ormation (Linear Probing) : E[ cluster-size]?
1
(1-a)
2
.
Extendible Hashing
• Directory Size : 2
d
, where d is global depth.
• Buc k et Split : Occurs when buc k et o v erflo ws, lo cal depth l= d .
• Time Complexit y :
– Searc h/Insert/Delete: O(1) a v erage, O(logn) w orst case due to directory expansion.
• Space Complexit y : O(n+2
d
) , where n is n um b e r of k eys, 2
d
is directory size.
• Directory Doubling : When l > d , d = d+1 , S
directory
= 2·S
previous
.
1
Rabin-Karp Algorithm (String Matc hing)
• Hash F unction : h(s) =
?
m-1
i=0
s[i]·b
m-1-i
mod q , where s is substring, b is base (e.g., 256), q
is prime mo dulus.
• Rolling Hash : h(s
i+1
) = (h(s
i
)·b-s[i]·b
m
+s[i+m]) mod q .
• Time Complexit y :
– A v erage Case: O(n+m) , where n is text length, m is pattern length.
– W orst Case: O(n·m) (spurious hits).
• Space Complexit y : O(1) .
• F alse P ositiv e Probabilit y : P
false
˜
1
q
, reduced b y large q .
Hash T able Op erations (STL Con text)
• Insert/Searc h/Delete (C++ STL unordered
m
ap) :
• A v erage C ase: O(1) .
• W orst Cas e: O(n) due to collisions.
Rehashing : T riggered when a > a
max
(e.g., 1.0 in STL), T
rehash
= O(n) .
Buc k et Coun t : m˜ n initially , gro ws as m
new
= 2·m
old
.
Applications (e.g., Coun t P airs with Giv en Sum)
• Problem : Find pairs (a
i
,a
j
) in arra y suc h that a
i
+a
j
= sum.
• Hash-Based Solution :
– Time Complexit y: O(n) , using hash table to store frequency of elem en ts.
– Space Complexit y: O(n) .
– F requency Lo okup: f(x) = coun t[ sum-x] , where coun t is hash table.
– P air Coun t: C =
?
f(a
i
)·f( sum-a
i
) (adjust for i?= j ).
P e rformance Metrics
• Load F actor Impact :
– Separate Chaining: a= 1 for optimal p erformance.
– Op en A ddressing: a= 0.7 to a v oid clustering.
• Collision Probabilit y : P
collision
˜ 1-e
-
n
2
2m
, for n k eys in m slots.
• Searc h Time Comparison :
– Hash T able: O(1) a v erage vs. BST: O(logn) .
– Hash T able: O(n) w orst case vs. BST: O(n) w orst case (un balanced).
• Space E?iciency : S
hash
= O(n+m) vs. S
BST
= O(n) .
• Rehashing Ov erhead: T
rehash
? n , amortized O(1) p er insertion.
2
Read More
81 videos|113 docs|34 tests
Related Searches

Previous Year Questions with Solutions

,

pdf

,

study material

,

Viva Questions

,

mock tests for examination

,

Semester Notes

,

past year papers

,

Objective type Questions

,

MCQs

,

Formula Sheets: Hashing | Algorithms - Computer Science Engineering (CSE)

,

Formula Sheets: Hashing | Algorithms - Computer Science Engineering (CSE)

,

Free

,

practice quizzes

,

Summary

,

video lectures

,

Important questions

,

ppt

,

Formula Sheets: Hashing | Algorithms - Computer Science Engineering (CSE)

,

shortcuts and tricks

,

Sample Paper

,

Extra Questions

,

Exam

;