String Hashing

Introduction

String hashing is a powerful technique used in competitive programming to efficiently process and compare strings. It involves converting a string into a numeric value called a hash, which can be used for various tasks like duplicate detection, substring matching, and more. In this article, we will explore the concept of string hashing in detail, including its calculation, applications, and algorithms.

Calculation of the Hash of a String

To calculate the hash of a string, we map each character to a number and combine those numbers into a single value. There are various ways to do this; one simple approach uses the ASCII values of the characters. We assign a prime number to each position in the string, multiply it by the ASCII value of the character at that position, and sum the products to obtain the hash of the string.

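Here's an example implementation in Python:

def calculateHash(string):
    # One prime weight per position; the list is reused cyclically for strings longer than 27 characters.
    primeNumbers = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103]
    hashValue = 0
    for i, char in enumerate(string):
        # ord(char) is the character's code point, which equals its ASCII value for ASCII text.
        hashValue += primeNumbers[i % len(primeNumbers)] * ord(char)
    return hashValue

In this code, 'primeNumbers' is an array of prime numbers used as per-position weights, and the built-in 'ord(char)' returns the ASCII value of the character 'char'. For example, calculateHash("ab") = 2 * 97 + 3 * 98 = 488. This simple scheme is easy to follow but collides readily: "ad" and "db" both hash to 494, so equal hashes must never be taken as proof that two strings are equal.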

Example Tasks

Search for Duplicate Strings in an Array of Strings

String hashing can efficiently detect duplicate strings in an array. We calculate the hash of each string and store it in a hash table. When two strings produce the same hash, they are either true duplicates or a hash collision (different strings with the same hash), so we compare the actual strings to confirm the duplication.

Here's an example implementation in Python:

def findDuplicateStrings(strings):
    # Map each hash to the distinct strings already seen with that hash.
    seenByHash = {}
    duplicates = set()
    for string in strings:
        bucket = seenByHash.setdefault(calculateHash(string), [])
        if string in bucket:
            # Same hash and same characters: a confirmed duplicate.
            duplicates.add(string)
        else:
            # First occurrence, or a collision with a different string.
            bucket.append(string)
    return duplicates

In this code, 'strings' is the array we want to search for duplicates. The dictionary 'seenByHash' maps each hash to the list of distinct strings that produced it. When a string's hash is already present, we compare the string against the stored ones and add it to the 'duplicates' set only if the text really matches, so a collision never produces a false duplicate.

Fast Hash Calculation of Substrings of a Given String

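String hashing can also be used to calculate the hash of any substring in constant time after linear preprocessing. Instead of recomputing the hash of a substring from scratch, we store the hash of every prefix of the string and combine two of them.

This works for a polynomial hash, not for the position-weighted sum used by 'calculateHash' above. Consider a string 'S' of length 'n', and define the hash of the prefix ending at index 'k' as:

hash(S[0:k]) = S[0] * prime^k + S[1] * prime^(k-1) + ... + S[k]

In practice this value is reduced modulo a large number 'm'. To calculate the hash of a substring 'S[i:j]' (inclusive, with i > 0), we can use the following formula:

hash(S[i:j]) = hash(S[0:j]) - hash(S[0:i-1]) * prime^(j-i+1)

Here, 'hash(S[0:j])' represents the hash of the prefix ending at index 'j', and 'hash(S[0:i-1])' represents the hash of the prefix ending at index 'i-1'. Multiplying the shorter prefix hash by prime^(j-i+1) aligns its terms with the same terms inside the longer prefix hash, so the subtraction cancels everything before index 'i' and leaves the hash of the substring. For i = 0 the substring is itself a prefix, so its hash is simply 'hash(S[0:j])'. When working modulo 'm', the result of the subtraction must also be reduced modulo 'm' and kept non-negative.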
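
The prefix-hash idea above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a definitive implementation: the names 'BASE', 'MOD', 'buildPrefixHashes', and 'substringHash' are hypothetical, and the base 131 and modulus 2^61 - 1 are example values. The prefix array is shifted by one position (prefix[k] covers the first k characters) so that the case i = 0 needs no special handling.

```python
BASE = 131            # assumed base, larger than any ASCII code
MOD = (1 << 61) - 1   # assumed modulus, a large prime

def buildPrefixHashes(s):
    # prefix[k] is the polynomial hash of the first k characters of s,
    # so prefix[0] == 0 stands for the empty prefix.
    prefix = [0] * (len(s) + 1)
    # power[k] is BASE**k reduced modulo MOD.
    power = [1] * (len(s) + 1)
    for k, ch in enumerate(s):
        prefix[k + 1] = (prefix[k] * BASE + ord(ch)) % MOD
        power[k + 1] = (power[k] * BASE) % MOD
    return prefix, power

def substringHash(prefix, power, i, j):
    # Hash of s[i..j] (inclusive). Python's % always returns a
    # non-negative result here, so no extra adjustment is needed.
    return (prefix[j + 1] - prefix[i] * power[j - i + 1]) % MOD

prefix, power = buildPrefixHashes("abracadabra")
# "abra" occurs at indices 0..3 and 7..10, so both hashes agree.
sameHash = substringHash(prefix, power, 0, 3) == substringHash(prefix, power, 7, 10)
```

After the O(n) preprocessing, every call to 'substringHash' takes constant time.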

Applications of Hashing

String hashing finds applications in various problem scenarios in competitive programming:

  • Anagram detection: Two strings are anagrams when they have the same character frequencies, so hashing an order-independent form of each string (such as its sorted characters) lets us group anagrams quickly.
  • Pattern matching: By hashing the pattern and sliding a window over the text, we can find matches in expected linear time.
  • Substring matching: Hashing allows efficient substring comparison by calculating hashes for substrings and comparing those instead of the characters.
  • Cryptography: Cryptographic hash functions play a vital role in ensuring data integrity and security. They are a different family from the simple hashes discussed here, which are not secure.

Determine the Number of Different Substrings in a String

Given a string, we can determine the number of different substrings it contains using string hashing. We calculate the hash of all substrings of the string and store them in a set data structure. Since a set only keeps unique elements, the size of the set will give us the count of different substrings.

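def countDifferentSubstrings(string):
    substrings = set()
    n = len(string)
    for i in range(n):
        for j in range(i, n):
            # Hash the substring string[i..j]; equal substrings always get equal hashes.
            substrings.add(calculateHash(string[i:j+1]))
    return len(substrings)

In this code, we iterate over all possible substrings of the input string and calculate their hashes. We add each hash to the 'substrings' set, which discards repeats, and return the size of the set. The count is exact only if no two different substrings share a hash, so the result is only as reliable as the hash function. Recomputing each hash from scratch also makes the loop O(n^3) overall; updating the hash incrementally as 'j' grows reduces it to O(n^2).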
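
As a sketch of a faster variant, the version below extends a polynomial hash by one character per step instead of rehashing each substring, which brings the running time down to O(n^2). The function name 'countDistinctSubstringsRolling' and the default 'base' and 'mod' values are assumptions made for this example. It stores the substring length next to the hash so that substrings of different lengths can never be confused with one another.

```python
def countDistinctSubstringsRolling(s, base=131, mod=(1 << 61) - 1):
    seen = set()
    n = len(s)
    for i in range(n):
        h = 0
        for j in range(i, n):
            # Extend the hash of s[i..j-1] to s[i..j] in O(1).
            h = (h * base + ord(s[j])) % mod
            # Pair the hash with the length of the substring.
            seen.add((j - i + 1, h))
    return len(seen)

count = countDistinctSubstringsRolling("abab")
# The distinct substrings are a, b, ab, ba, aba, bab, abab.
```

The result is still probabilistic in general: two different substrings of the same length could collide, although this is unlikely with a large modulus.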

Improve No-Collision Probability

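To reduce the chances of hash collisions, choose the parameters of the hash carefully. For a polynomial hash, the base is usually chosen larger than the largest character value, and the modulus should be a large prime, because the number of possible hash values limits how well different strings can be told apart. With a modulus near 10^9, collisions become likely once tens of thousands of hashes are compared with one another (the birthday effect), so a 64-bit modulus or two independent hashes are commonly used instead.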
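
One common way to lower the collision probability further is to compute two hashes with different parameters and treat the pair as the hash. The sketch below illustrates the idea; the name 'doubleHash' and the particular bases and moduli are assumptions chosen for this example, not values prescribed by the article.

```python
MODS = (1_000_000_007, 998_244_353)   # two assumed prime moduli
BASES = (131, 137)                    # two assumed bases

def doubleHash(s):
    h1 = h2 = 0
    for ch in s:
        # Two independent polynomial hashes over the same characters.
        h1 = (h1 * BASES[0] + ord(ch)) % MODS[0]
        h2 = (h2 * BASES[1] + ord(ch)) % MODS[1]
    # Two strings are treated as equal only if both components match.
    return (h1, h2)

differs = doubleHash("abc") != doubleHash("acb")
```

A collision now requires both components to coincide, which makes the effective hash space roughly the product of the two moduli.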

Practice Problems with Solutions

Problem 1: Given an array of strings, find all pairs of strings that are anagrams of each other.

Calculate the hash of each string after sorting its characters. Use a hash table to group anagrams based on their calculated hashes.
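
A minimal sketch of this approach is shown below. It relies on Python's built-in dictionary, which hashes the sorted form of each string internally, rather than on a hand-written hash function. The name 'findAnagramPairs' is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def findAnagramPairs(strings):
    # Anagrams share the same sorted character sequence, so that
    # sequence serves as the grouping key in the hash table.
    groups = defaultdict(list)
    for s in strings:
        groups["".join(sorted(s))].append(s)
    pairs = []
    for group in groups.values():
        # Every pair of strings within one group is an anagram pair.
        pairs.extend(combinations(group, 2))
    return pairs

pairs = findAnagramPairs(["listen", "silent", "enlist", "google", "gogole"])
```

Here 'pairs' holds the three pairs formed from "listen", "silent", and "enlist", plus the pair ("google", "gogole").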

Problem 2: Given a string 'S' and a pattern 'P', find all occurrences of the pattern in the string.

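Calculate the hash of the pattern 'P' and use the sliding window technique to calculate the hash of each substring of 'S' that has the same length as 'P'. Compare the hashes to find candidate matches, and confirm each candidate by comparing the characters, since equal hashes do not guarantee equal strings.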
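
The sliding-window approach can be sketched as follows (it is commonly known as the Rabin-Karp algorithm). The function name 'findOccurrences' and the default 'base' and 'mod' values are assumptions made for this example. Each hash match is confirmed by a direct string comparison so that a collision cannot produce a false match.

```python
def findOccurrences(text, pattern, base=131, mod=(1 << 61) - 1):
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Weight of the leading character of a window of length m.
    high = pow(base, m - 1, mod)
    patternHash = windowHash = 0
    for k in range(m):
        patternHash = (patternHash * base + ord(pattern[k])) % mod
        windowHash = (windowHash * base + ord(text[k])) % mod
    matches = []
    for start in range(n - m + 1):
        # Compare hashes first, then confirm with the actual characters.
        if windowHash == patternHash and text[start:start + m] == pattern:
            matches.append(start)
        if start + m < n:
            # Slide the window: drop text[start], append text[start + m].
            windowHash = ((windowHash - ord(text[start]) * high) * base
                          + ord(text[start + m])) % mod
    return matches

positions = findOccurrences("abracadabra", "abra")
```

For this input, 'positions' is [0, 7].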

Problem 3: Given a string 'S', find the longest palindromic substring within 'S'.

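A substring is a palindrome exactly when it reads the same in both directions. Calculate prefix hashes for 'S' and for the reverse of 'S', so that the forward hash and the backward hash of any substring can each be obtained in constant time. A substring whose two hashes match is a palindrome candidate; confirm it by comparing characters, and keep track of the longest palindromic substring encountered.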
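
One possible way to apply hashing to this problem is sketched below: a substring is treated as a palindrome candidate when its hash in 'S' equals the hash of the matching range in the reverse of 'S', and each candidate is then confirmed character by character. The function name 'longestPalindromeByHash' and the default 'base' and 'mod' values are assumptions for this example, and the sketch runs in roughly O(n^2) time rather than being an optimal solution.

```python
def longestPalindromeByHash(s, base=131, mod=(1 << 61) - 1):
    n = len(s)
    if n == 0:
        return ""
    r = s[::-1]
    power = [1] * (n + 1)
    fwd = [0] * (n + 1)   # prefix hashes of s
    bwd = [0] * (n + 1)   # prefix hashes of the reversed string
    for k in range(n):
        power[k + 1] = power[k] * base % mod
        fwd[k + 1] = (fwd[k] * base + ord(s[k])) % mod
        bwd[k + 1] = (bwd[k] * base + ord(r[k])) % mod

    def rangeHash(prefix, i, j):
        # Hash of the half-open range [i, j) of the underlying string.
        return (prefix[j] - prefix[i] * power[j - i]) % mod

    best = s[0]
    for i in range(n):
        # Only consider substrings longer than the best one found so far.
        for j in range(i + len(best) + 1, n + 1):
            # s[i:j] read backwards is r[n - j:n - i].
            if rangeHash(fwd, i, j) == rangeHash(bwd, n - j, n - i):
                candidate = s[i:j]
                # Confirm by comparison to rule out a hash collision.
                if candidate == candidate[::-1]:
                    best = candidate
    return best

longest = longestPalindromeByHash("forgeeksskeegfor")
```

For this input, 'longest' is "geeksskeeg". When several palindromes share the maximum length, the sketch returns the one that starts earliest.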

Note: Remember to implement the necessary string hashing functions and choose appropriate data structures to solve these problems efficiently.

Conclusion

String hashing is a powerful technique for processing and comparing strings in competitive programming. By assigning unique numeric values to characters, we can calculate hashes and perform various operations efficiently. From duplicate detection to substring matching, string hashing finds applications in a wide range of problem scenarios. Understanding the concepts and implementing the algorithms discussed in this article will enable you to leverage the power of string hashing in your competitive programming journey.
