Hashing

Hash Table is a data structure which stores data in an associative manner. In a hash table, data are stored in an array, where each data value is placed at an index computed from its key. Access to data becomes very fast if we know the index of the desired data.

Hash tables provide very fast insertion and search operations irrespective of the size of the data, when a suitable hashing method and collision-resolution scheme are used. A hash table uses an array as storage and a hash function to compute the index at which an element should be inserted or searched for.

Hashing

Hashing is a technique to convert a (typically large) range of key values into a (smaller) range of array indices. A simple and commonly used hash function for integer keys is the division method, which uses the modulo operator:

index = key % SIZE

Example: consider a hash table of size 20. The following items (in the format (key, value)) are to be stored.

Hashing
  • (1, 20)
  • (2, 70)
  • (42, 80)
  • (4, 25)
  • (12, 44)
  • (14, 32)
  • (17, 11)
  • (13, 78)
  • (37, 98)
Sr. No.KeyHashArray Index
111 % 20 = 11
222 % 20 = 22
34242 % 20 = 22
444 % 20 = 44
51212 % 20 = 1212
61414 % 20 = 1414
71717 % 20 = 1717
81313 % 20 = 1313
93737 % 20 = 1717

Collision

A collision occurs when two different keys map to the same array index. Collisions are inevitable when the key space is larger than the table size. A collision-resolution strategy is required to handle such cases and still provide efficient operations.

Linear Probing (Open Addressing)

Linear probing is an open addressing collision-resolution method. If the computed index is already occupied, linear probing searches the next cell (index+1), then index+2, and so on, wrapping around the table, until an empty cell is found.

Continuing the previous example, using linear probing to place items results in the following final indices:

Sr. No.KeyHashArray IndexAfter Linear Probing, Array Index
111 % 20 = 111
222 % 20 = 222
34242 % 20 = 223
444 % 20 = 444
51212 % 20 = 121212
61414 % 20 = 141414
71717 % 20 = 171717
81313 % 20 = 131313
93737 % 20 = 171718

Notes about linear probing:

  • Linear probing can cause primary clustering: long runs of consecutive occupied slots form and may increase probe lengths.
  • Insertion and search probe sequences are identical for a given key, so maintaining deleted slots requires care (see Delete below).

Other Collision-Resolution Methods

  • Chaining (Separate Chaining) - each table index stores a pointer to a linked list (or other container) of entries that hash to the same index. Insertions append to the list. This allows the table size to remain fixed while handling arbitrary numbers of keys.
  • Quadratic probing - a form of open addressing where the probe sequence is index + c1·i + c2·i² (i = probe number). It reduces primary clustering but may have other trade-offs and requires careful choice of table size.
  • Double hashing - uses a second hash function to compute the probe step size; probe sequence becomes index + i·h2(key). Double hashing tends to produce good probe distribution and avoids clustering better than linear probing.

Basic Operations

The primary operations supported by a hash table are:

  • Search - find an element given its key.
  • Insert - add an element (key, value) to the table.
  • Delete - remove an element with a given key from the table.

DataItem

We can represent an entry (data item) stored in the hash table as a struct containing the key and associated data:

struct DataItem { int data; int key; };

Hash Method

A simple hash function using the division method:

int hashCode(int key){ return key % SIZE; }

Search Operation

To search for an element with a given key, compute its hash index and check the slot. If the key at that slot does not match, use the chosen collision-resolution probing sequence (for example, linear probing) to continue searching until the element is found or an empty slot is reached.

Example (Linear Probing Search)

struct DataItem *search(int key) { // get the hash int hashIndex = hashCode(key); // move in array until an empty while(hashArray[hashIndex] != NULL) { if(hashArray[hashIndex]->key == key) return hashArray[hashIndex]; // go to next cell ++hashIndex; // wrap around the table hashIndex %= SIZE; } return NULL; }

Insert Operation

To insert a new (key, data) pair, compute the hash index for the key. If the computed index is empty or marked deleted, place the item there. Otherwise, follow the probing sequence until an available slot is found.

Example (Linear Probing Insert)

void insert(int key,int data) { struct DataItem *item = (struct DataItem*) malloc(sizeof(struct DataItem)); item->data = data; item->key = key; // get the hash int hashIndex = hashCode(key); // move in array until an empty or deleted cell while(hashArray[hashIndex] != NULL && hashArray[hashIndex]->key != -1) { // go to next cell ++hashIndex; // wrap around the table hashIndex %= SIZE; } hashArray[hashIndex] = item; }

Delete Operation

Deletion must preserve the property that probe sequences used during search still find items inserted earlier. One common approach in open addressing is to place a special dummy item (a tombstone) at the deleted position rather than setting it to NULL. This allows searches that would otherwise skip over reinserted elements to continue correctly.

Example (Linear Probing Delete)

struct DataItem* delete(struct DataItem* item) { int key = item->key; // get the hash int hashIndex = hashCode(key); // move in array until an empty while(hashArray[hashIndex] != NULL) { if(hashArray[hashIndex]->key == key) { struct DataItem* temp = hashArray[hashIndex]; // assign a dummy item at deleted position hashArray[hashIndex] = dummyItem; return temp; } // go to next cell ++hashIndex; // wrap around the table hashIndex %= SIZE; } return NULL; }

Alternatives to using tombstones are:

  • Re-inserting items that follow the deleted slot (requires additional work but can remove tombstones), and
  • Using chaining, where deletion is simply removing the node from the chain.

Load Factor, Complexity and Rehashing

Load factor, usually denoted α, is defined as:

α = number_of_elements / table_size

It helps determine expected performance. Typical guidelines:

  • Keep α reasonably small (often < 0.7 for open addressing) to ensure constant-time average operations.
  • If using chaining, average-case time for search/insert/delete is O(1 + α) because each bucket holds on average α elements.
  • For open addressing, average-case time is O(1) when α is bounded away from 1; worst-case time is O(n) when many collisions occur or when table is nearly full.
  • When α grows beyond a threshold, perform rehashing: allocate a larger table (often about twice the size, preferably a prime number for division-method hashing) and reinsert all existing elements using the new hash function/size.

Properties of a Good Hash Function

  • Deterministic: same key always produces same index.
  • Uniform distribution: spreads keys evenly across table indices to minimise collisions.
  • Fast to compute.
  • Avoids clustering and patterns present in input data.
  • For integer keys, common methods are the division method (key mod m) and the multiplication method. For strings or complex keys, combine character contributions (for example polynomial rolling hash) and then reduce to index.

Applications

  • Symbol tables in compilers and interpreters (mapping names to information).
  • Implementing associative arrays, dictionaries and maps.
  • Caches and memoisation tables.
  • Databases and indexing where quick key-based lookup is required.
  • Sets and membership testing.

Summary: Hash tables provide efficient key-based storage and retrieval by converting keys to array indices using hash functions. Collisions must be handled using strategies such as chaining or open addressing (linear probing, quadratic probing, double hashing). Proper choice of hash function, table size and load factor management (rehashing) are essential to maintain average-case O(1) performance for insert, search and delete operations.

The document Hashing is a part of the Computer Science Engineering (CSE) Course Algorithms.
All you need of Computer Science Engineering (CSE) at this link: Computer Science Engineering (CSE)

FAQs on Hashing

1. What is hashing in computer science engineering (CSE)?
Ans. Hashing in computer science engineering (CSE) is a technique used to map data to a fixed-size array called a hash table. It is used to efficiently store and retrieve data by converting the data into an index of an array using a hash function. This allows for quick access and retrieval of data based on its unique key.
2. How does hashing work in computer science engineering (CSE)?
Ans. Hashing works by taking an input, such as a data value or key, and applying a hash function to it. The hash function converts the input into a fixed-size hash code, which is used as an index in the hash table. The data is then stored at this index, allowing for fast retrieval in constant time.
3. What are the advantages of using hashing in computer science engineering (CSE)?
Ans. There are several advantages of using hashing in computer science engineering (CSE): - Fast data retrieval: Hashing allows for constant-time retrieval of data, as the data can be directly accessed using the hash code as an index. - Efficient storage: Hashing allows for efficient storage of data, as it eliminates the need for sequential searching or sorting of data. - Collision resolution: Hashing techniques also provide methods to handle collisions, which occur when two different inputs produce the same hash code. These collisions can be resolved using techniques such as chaining or open addressing. - Security: Hash functions are widely used in computer security, such as password hashing, digital signatures, and data integrity checks.
4. What are the different types of hash functions used in computer science engineering (CSE)?
Ans. There are various types of hash functions used in computer science engineering (CSE), including: - Division method: This method calculates the hash code by dividing the key by the size of the hash table and taking the remainder as the hash code. - Multiplication method: This method involves multiplying the key by a constant value between 0 and 1, and then taking the fractional part of the result as the hash code. - Folding method: This method involves dividing the key into several parts and performing some arithmetic operations on those parts to obtain the hash code. - Digit extraction method: This method extracts certain digits from the key and uses them to calculate the hash code. - Mid-square method: This method involves squaring the key and then extracting a portion of the resulting value as the hash code.
5. How is collision resolution handled in hashing in computer science engineering (CSE)?
Ans. Collision resolution in hashing is the process of handling situations where two different inputs produce the same hash code. There are various methods to handle collisions, including: - Chaining: In this method, each index of the hash table contains a linked list of data elements. When a collision occurs, the data is stored in the linked list at the corresponding index. - Open addressing: In this method, when a collision occurs, the hash function is applied again to find the next available index in the hash table. This process continues until an empty slot is found. - Linear probing: This is a type of open addressing where the next available index is searched linearly, i.e., by incrementing the index by a fixed value until an empty slot is found. - Quadratic probing: This is another type of open addressing where the next available index is searched using a quadratic function, which involves incrementing the index by a quadratic value until an empty slot is found. - Double hashing: This method involves applying a second hash function to the key when a collision occurs, which helps in finding the next available index in the hash table.
Explore Courses for Computer Science Engineering (CSE) exam
Get EduRev Notes directly in your Google search
Related Searches
Free, Previous Year Questions with Solutions, Extra Questions, Objective type Questions, practice quizzes, study material, past year papers, Hashing, Sample Paper, Semester Notes, pdf , Hashing, Summary, shortcuts and tricks, Viva Questions, Important questions, Hashing, video lectures, MCQs, mock tests for examination, ppt, Exam;