Open App

Computer Science Engineering (CSE) Exam > Computer Science Engineering (CSE) Notes > Algorithms > Hashing

Hashing | Algorithms - Computer Science Engineering (CSE) PDF Download

Q: 1. What is hashing in computer science engineering?

Ans. Hashing is a technique used in computer science engineering to convert data into a fixed-size numerical value or key. It is commonly used to index and retrieve data in a database or to implement data structures such as hash tables. The process involves applying a hash function to the input data, which generates a unique hash code or hash value.

Q: 2. How does hashing work in computer science engineering?

Ans. In computer science engineering, hashing works by applying a hash function to the input data. The hash function takes the input and generates a fixed-size hash code or hash value, which is usually a numerical representation. This hash value is used as an index to store and retrieve data in a hash table or database. The goal of hashing is to minimize collisions, where different input data generates the same hash value.

Q: 3. What are the advantages of using hashing in computer science engineering?

Ans. There are several advantages of using hashing in computer science engineering. Firstly, it allows for efficient storage and retrieval of data, as the hash value serves as an index to quickly locate the desired data. Secondly, hashing provides a constant-time complexity for search, insert, and delete operations in hash tables when collisions are minimized. Additionally, it allows for faster data access compared to other data structures like arrays or linked lists.

Q: 4. How does collision handling work in hashing?

Ans. Collision handling is an important aspect of hashing to deal with situations where different input data generates the same hash value. There are various techniques to handle collisions, such as separate chaining and open addressing. In separate chaining, each hash table index stores a linked list of elements with the same hash value. In open addressing, if a collision occurs, the algorithm probes the next available index in the hash table until an empty slot is found.

Q: 5. What is a hash table in computer science engineering?

Ans. A hash table is a data structure used in computer science engineering to store and retrieve data efficiently. It consists of an array of fixed-size slots or buckets, where each slot is associated with a unique hash value. The hash value serves as an index to store and locate the data in the hash table. Hash tables are commonly used in various applications, such as database indexing, caches, and symbol tables, due to their fast access and search operations.

Introduction

Suppose we want to design a system for storing employee records keyed using phone numbers. And we want following queries to be performed efficiently:

Insert a phone number and corresponding information.
Search a phone number and fetch the information.
Delete a phone number and related information.

We can think of using the following data structures to maintain information about different phone numbers.

Array of phone numbers and records.
Linked List of phone numbers and records.
Balanced binary search tree with phone numbers as keys.
Direct Access Table.

For arrays and linked lists, we need to search in a linear fashion, which can be costly in practice. If we use arrays and keep the data sorted, then a phone number can be searched in O(Logn) time using Binary Search, but insert and delete operations become costly as we have to maintain sorted order.
With balanced binary search tree, we get moderate search, insert and delete times. All of these operations can be guaranteed to be in O(Logn) time.
Another solution that one can think of is to use a direct access table where we make a big array and use phone numbers as index in the array. An entry in array is NIL if phone number is not present, else the array entry stores pointer to records corresponding to phone number. Time complexity wise this solution is the best among all, we can do all operations in O(1) time. For example to insert a phone number, we create a record with details of given phone number, use phone number as index and store the pointer to the created record in table.
This solution has many practical limitations. First problem with this solution is extra space required is huge. For example if phone number is n digits, we need O(m * 10n) space for table where m is size of a pointer to record. Another problem is an integer in a programming language may not store n digits.
Due to above limitations Direct Access Table cannot always be used. Hashing is the solution that can be used in almost all such situations and performs extremely well compared to above data structures like Array, Linked List, Balanced BST in practice. With hashing we get O(1) search time on average (under reasonable assumptions) and O(n) in worst case.
Hashing is an improvement over Direct Access Table. The idea is to use hash function that converts a given phone number or any other key to a smaller number and uses the small number as index in a table called hash table.

Hash Function: A function that converts a given big phone number to a small practical integer value. The mapped integer value is used as an index in hash table. In simple terms, a hash function maps a big number or string to a small integer that can be used as index in hash table.

A good hash function should have following properties:

Efficiently computable.
Should uniformly distribute the keys (Each table position equally likely for each key)

For example for phone numbers a bad hash function is to take first three digits. A better function is consider last three digits. Please note that this may not be the best hash function. There may be better ways.

Hash Table: An array that stores pointers to records corresponding to a given phone number. An entry in hash table is NIL if no existing phone number has hash function value equal to the index for the entry.

Collision Handling: Since a hash function gets us a small number for a big key, there is possibility that two keys result in same value. The situation where a newly inserted key maps to an already occupied slot in hash table is called collision and must be handled using some collision handling technique. Following are the ways to handle collisions:

Chaining: The idea is to make each cell of hash table point to a linked list of records that have same hash function value. Chaining is simple, but requires additional memory outside the table.
Open Addressing: In open addressing, all elements are stored in the hash table itself. Each table entry contains either a record or NIL. When searching for an element, we one by one examine table slots until the desired element is found or it is clear that the element is not in the table.

Separate Chaining

What is Collision?
Since a hash function gets us a small number for a key which is a big integer or string, there is a possibility that two keys result in the same value. The situation where a newly inserted key maps to an already occupied slot in the hash table is called collision and must be handled using some collision handling technique.

What are the chances of collisions with large table?
Collisions are very likely even if we have big table to store keys. An important observation is Birthday Paradox. With only 23 persons, the probability that two people have the same birthday is 50%.

How to handle Collisions?
There are mainly two methods to handle collision:

Separate Chaining
Open Addressing

1. Separate Chaining: The idea is to make each cell of hash table point to a linked list of records that have same hash function value.
Let us consider a simple hash function as “key mod 7” and sequence of keys as 50, 700, 76, 85, 92, 73, 101.

C++ program for hashing with chaining

Advantages:

Simple to implement.
Hash table never fills up, we can always add more elements to the chain.
Less sensitive to the hash function or load factors.
It is mostly used when it is unknown how many and how frequently keys may be inserted or deleted.

Disadvantages:

Cache performance of chaining is not good as keys are stored using a linked list. Open addressing provides better cache performance as everything is stored in the same table.
Wastage of Space (Some Parts of hash table are never used)
If the chain becomes long, then search time can become O(n) in the worst case.
Uses extra space for links.

Performance of Chaining: Performance of hashing can be evaluated under the assumption that each key is equally likely to be hashed to any slot of table (simple uniform hashing).
m = Number of slots in hash table
n = Number of keys to be inserted in hash table
Load factor α = n/m
Expected time to search = O(1 + α)
Expected time to delete = O(1 + α)
Time to insert = O(1)
Time complexity of search insert and delete is
O(1) if α is O(1)

Data Structures For Storing Chains:

Linked lists
- Search: O(l) where l = length of linked list
- Delete: O(l)
- Insert: O(l)
- Not cache friendly
Dynamic Sized Arrays ( Vectors in C++, ArrayList in Java, list in Python)
- Search: O(l) where l = length of array
- Delete: O(l)
- Insert: O(l)
- Cache friendly
Self Balancing BST ( AVL Trees, Red Black Trees)
- Search: O(log(l))
- Delete: O(log(l))
- Insert: O(l)
- Not cache friendly

2. Open Addressing: Like separate chaining, open addressing is a method for handling collisions. In Open Addressing, all elements are stored in the hash table itself. So at any point, the size of the table must be greater than or equal to the total number of keys (Note that we can increase table size by copying old data if needed).
Insert(k): Keep probing until an empty slot is found. Once an empty slot is found, insert k.
Search(k): Keep probing until slot’s key doesn’t become equal to k or an empty slot is reached.
Delete(k): Delete operation is interesting. If we simply delete a key, then the search may fail. So slots of deleted keys are marked specially as “deleted”.
The insert can insert an item in a deleted slot, but the search doesn’t stop at a deleted slot.
Open Addressing is done in the following ways:
(i) Linear Probing: In linear probing, we linearly probe for next slot. For example, the typical gap between two probes is 1 as seen in the example below.
Let hash(x) be the slot index computed using a hash function and S be the table size
If slot hash(x) % S is full, then we try (hash(x) + 1) % S
If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
..................................................
..................................................
Let us consider a simple hash function as “key mod 7” and a sequence of keys as 50, 700, 76, 85, 92, 73, 101.
Hashing | Algorithms - Computer Science Engineering (CSE) Challenges in Linear Probing:

Primary Clustering: One of the problems with linear probing is Primary clustering, many consecutive elements form groups and it starts taking time to find a free slot or to search for an element.
Secondary Clustering: Secondary clustering is less severe, two records only have the same collision chain (Probe Sequence) if their initial position is the same.

(ii) Quadratic Probing We look for i2‘th slot in i’th iteration.
Let hash(x) be the slot index computed using hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1 * 1) % S
If (hash(x) + 1 * 1) % S is also full, then we try (hash(x) + 2 * 2) % S
If (hash(x) + 2 * 2) % S is also full, then we try (hash(x) + 3 * 3) % S
..................................................
..................................................
(iii) Double Hashing We use another hash function hash2(x) and look for i*hash2(x) slot in i’th rotation.
Let hash(x) be the slot index computed using hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
..................................................
..................................................

Comparison of above three:
Linear probing has the best cache performance but suffers from clustering. One more advantage of Linear probing is easy to compute.
Quadratic probing lies between the two in terms of cache performance and clustering.
Double hashing has poor cache performance but no clustering. Double hashing requires more computation time as two hash functions need to be computed.
Hashing | Algorithms - Computer Science Engineering (CSE)

Performance of Open Addressing: Like Chaining, the performance of hashing can be evaluated under the assumption that each key is equally likely to be hashed to any slot of the table (simple uniform hashing)
m = Number of slots in the hash table
n = Number of keys to be inserted in the hash table
Load factor α = n / m ( < 1 )
Expected time to search/insert/delete < 1 / (1 - α)
So Search, Insert and Delete take (1 / (1 - α)) time

The document Hashing | Algorithms - Computer Science Engineering (CSE) is a part of the Computer Science Engineering (CSE) Course Algorithms.

All you need of Computer Science Engineering (CSE) at this link: Computer Science Engineering (CSE)

	Algorithms 81 videos\|80 docs\|33 tests

Algorithms

81 videos|80 docs|33 tests

Join Course for Free

Top Courses for Computer Science Engineering (CSE)

View all

FAQs on Hashing - Algorithms - Computer Science Engineering (CSE)

1. What is hashing in computer science engineering?

Ans. Hashing is a technique used in computer science engineering to convert data into a fixed-size numerical value or key. It is commonly used to index and retrieve data in a database or to implement data structures such as hash tables. The process involves applying a hash function to the input data, which generates a unique hash code or hash value.

2. How does hashing work in computer science engineering?

Ans. In computer science engineering, hashing works by applying a hash function to the input data. The hash function takes the input and generates a fixed-size hash code or hash value, which is usually a numerical representation. This hash value is used as an index to store and retrieve data in a hash table or database. The goal of hashing is to minimize collisions, where different input data generates the same hash value.

3. What are the advantages of using hashing in computer science engineering?

Ans. There are several advantages of using hashing in computer science engineering. Firstly, it allows for efficient storage and retrieval of data, as the hash value serves as an index to quickly locate the desired data. Secondly, hashing provides a constant-time complexity for search, insert, and delete operations in hash tables when collisions are minimized. Additionally, it allows for faster data access compared to other data structures like arrays or linked lists.

4. How does collision handling work in hashing?

Ans. Collision handling is an important aspect of hashing to deal with situations where different input data generates the same hash value. There are various techniques to handle collisions, such as separate chaining and open addressing. In separate chaining, each hash table index stores a linked list of elements with the same hash value. In open addressing, if a collision occurs, the algorithm probes the next available index in the hash table until an empty slot is found.

5. What is a hash table in computer science engineering?

Ans. A hash table is a data structure used in computer science engineering to store and retrieve data efficiently. It consists of an array of fixed-size slots or buckets, where each slot is associated with a unique hash value. The hash value serves as an index to store and locate the data in the hash table. Hash tables are commonly used in various applications, such as database indexing, caches, and symbol tables, due to their fast access and search operations.

Related Exams

Computer Science Engineering (CSE)

About this Document

	4.68/5 Rating
	Dec 22, 2024 Last updated

Document Description: Hashing for Computer Science Engineering (CSE) 2024 is part of Algorithms preparation. The notes and questions for Hashing have been prepared according to the Computer Science Engineering (CSE) exam syllabus. Information about Hashing covers topics like Introduction and Hashing Example, for Computer Science Engineering (CSE) 2024 Exam. Find important definitions, questions, notes, meanings, examples, exercises and tests below for Hashing.

Introduction of Hashing in English is available as part of our Algorithms for Computer Science Engineering (CSE) & Hashing in Hindi for Algorithms course. Download more important topics related with notes, lectures and mock test series for Computer Science Engineering (CSE) Exam by signing up for free. Computer Science Engineering (CSE): Hashing | Algorithms - Computer Science Engineering (CSE)

Description

Full syllabus notes, lecture & questions for Hashing | Algorithms - Computer Science Engineering (CSE) - Computer Science Engineering (CSE) | Plus excerises question with solution to help you revise complete syllabus for Algorithms | Best notes, free PDF download

Information about Hashing

In this doc you can find the meaning of Hashing defined & explained in the simplest way possible. Besides explaining types of Hashing theory, EduRev gives you an ample number of questions to practice Hashing tests, examples and also practice Computer Science Engineering (CSE) tests

	Algorithms 81 videos\|80 docs\|33 tests

Algorithms

81 videos|80 docs|33 tests

Join Course for Free

Download as PDF

Explore Courses for Computer Science Engineering (CSE) exam

Top Courses for Computer Science Engineering (CSE)

Explore Courses

Signup for Free!

Signup to see your scores go up within 7 days! Learn & Practice with 1000+ FREE Notes, Videos & Tests.

Start learning for Free

10M+ students study on EduRev

past year papers

Hashing | Algorithms - Computer Science Engineering (CSE)

ppt

Important questions

video lectures

Hashing | Algorithms - Computer Science Engineering (CSE)

Free

Sample Paper

MCQs

study material

Extra Questions

mock tests for examination

Viva Questions

Hashing | Algorithms - Computer Science Engineering (CSE)

practice quizzes

Semester Notes

Objective type Questions

Exam

pdf

Summary

shortcuts and tricks

Previous Year Questions with Solutions

;

Additional Information about Hashing for Computer Science Engineering (CSE) Preparation

Hashing Free PDF Download

The Hashing is an invaluable resource that delves deep into the core of the Computer Science Engineering (CSE) exam. These study notes are curated by experts and cover all the essential topics and concepts, making your preparation more efficient and effective. With the help of these notes, you can grasp complex subjects quickly, revise important points easily, and reinforce your understanding of key concepts. The study notes are presented in a concise and easy-to-understand manner, allowing you to optimize your learning process. Whether you're looking for best-recommended books, sample papers, study material, or toppers' notes, this PDF has got you covered. Download the Hashing now and kickstart your journey towards success in the Computer Science Engineering (CSE) exam.

Importance of Hashing

The importance of Hashing cannot be overstated, especially for Computer Science Engineering (CSE) aspirants. This document holds the key to success in the Computer Science Engineering (CSE) exam. It offers a detailed understanding of the concept, providing invaluable insights into the topic. By knowing the concepts well in advance, students can plan their preparation effectively. Utilize this indispensable guide for a well-rounded preparation and achieve your desired results.

Hashing Notes

Hashing Notes offer in-depth insights into the specific topic to help you master it with ease. This comprehensive document covers all aspects related to Hashing. It includes detailed information about the exam syllabus, recommended books, and study materials for a well-rounded preparation. Practice papers and question papers enable you to assess your progress effectively. Additionally, the paper analysis provides valuable tips for tackling the exam strategically. Access to Toppers' notes gives you an edge in understanding complex concepts. Whether you're a beginner or aiming for advanced proficiency, Hashing Notes on EduRev are your ultimate resource for success.

Hashing Computer Science Engineering (CSE) Questions

The "Hashing Computer Science Engineering (CSE) Questions" guide is a valuable resource for all aspiring students preparing for the Computer Science Engineering (CSE) exam. It focuses on providing a wide range of practice questions to help students gauge their understanding of the exam topics. These questions cover the entire syllabus, ensuring comprehensive preparation. The guide includes previous years' question papers for students to familiarize themselves with the exam's format and difficulty level. Additionally, it offers subject-specific question banks, allowing students to focus on weak areas and improve their performance.

Study Hashing on the App

Students of Computer Science Engineering (CSE) can study Hashing alongwith tests & analysis from the EduRev app, which will help them while preparing for their exam. Apart from the Hashing, students can also utilize the EduRev App for other study materials such as previous year question papers, syllabus, important questions, etc. The EduRev App will make your learning easier as you can access it from anywhere you want. The content of Hashing is prepared as per the latest Computer Science Engineering (CSE) syllabus.

Education Revolution