Table of contents
Introduction
Calculation of the Hash of a String
Example Tasks
Applications of Hashing
Practice Problems with Solutions
String hashing is a powerful technique used in competitive programming to efficiently process and compare strings. It involves converting a string into a numeric value called a hash, which can be used for various tasks like duplicate detection, substring matching, and more. In this article, we will explore the concept of string hashing in detail, including its calculation, applications, and algorithms.
To calculate the hash of a string, we map each character to a numeric value and combine those values into a single number. A common choice for the per-character value is the ASCII code. In the simple scheme shown here, each position in the string is assigned a prime number, the prime is multiplied by the ASCII value of the character at that position, and the products are summed to obtain the hash of the string.
Here's an example algorithm to calculate the hash of a string:
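def calculateHash(string):
    # Prime weights, one per position; the table is reused cyclically for longer strings.
    primeNumbers = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103]
    hashValue = 0
    for i in range(len(string)):
        # Weight the character code at position i by the prime for that position.
        hashValue += primeNumbers[i % len(primeNumbers)] * ord(string[i])
    return hashValue
In this algorithm, written here in Python, 'primeNumbers' is an array of prime numbers used as positional weights. Each prime is multiplied by the character code at the same position, and the index wraps around so that strings longer than the array do not cause an out-of-range access. The built-in function 'ord(char)' returns the code of the character 'char', which equals its ASCII value for ASCII characters.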
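The positional sum above is easy to follow, but different strings can share a value: with the weights 2 and 3, both "da" and "ac" hash to 491. A commonly used alternative is a polynomial hash. The following is a minimal sketch of that alternative, not the scheme described above; the names `polynomial_hash`, `BASE`, and `MOD`, and the constants chosen for them, are assumptions made for illustration.

```python
MOD = 1_000_000_007   # assumed large prime modulus
BASE = 131            # assumed base, larger than any ASCII code


def polynomial_hash(s, base=BASE, mod=MOD):
    """Hash s as s[0]*base^(n-1) + ... + s[n-1], reduced modulo mod."""
    value = 0
    for ch in s:
        # Horner's rule: shift the running value by one position, then add the next character.
        value = (value * base + ord(ch)) % mod
    return value


# The two strings that collide under the positional sum get different values here.
print(polynomial_hash("da") != polynomial_hash("ac"))  # True
```

Because each step multiplies the running value by the base, the position of a character affects its contribution, which is what makes the prefix-based substring formula possible.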
String hashing can efficiently detect duplicate strings in an array. We calculate the hash of each string and store it in a hash table or a set data structure. Equal strings always have equal hashes, but two different strings can also share a hash (a collision), so when we encounter a hash we have already seen, we compare the actual strings to confirm the duplication.
Here's an example implementation in Python:
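def findDuplicateStrings(strings):
    seenHashes = {}  # hash value -> set of distinct strings seen with that hash
    duplicates = set()
    for string in strings:
        stringHash = calculateHash(string)
        candidates = seenHashes.setdefault(stringHash, set())
        if string in candidates:
            # Same hash and same text: a genuine duplicate, not a collision.
            duplicates.add(string)
        else:
            candidates.add(string)
    return duplicates
In this code, 'strings' is the array of strings we want to search for duplicates. 'seenHashes' maps each hash seen so far to the strings that produced it. When a hash is already present, we compare against the stored strings and add the current string to the 'duplicates' set only if its text matches, so two different strings that merely share a hash are not reported.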
String hashing can also be used to calculate the hash of substrings efficiently. Instead of recomputing the hash of a substring from scratch, we can utilize the hash of the original string and some mathematical properties.
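Consider a string 'S' of length 'n'. Let 'hash(S)' be the hash of the whole string. The formula below assumes a polynomial hash, in which each prefix hash is built from the previous one as hash(S[0:k]) = hash(S[0:k-1]) * prime + value(S[k]), usually reduced modulo a large number; it does not hold for the positional-prime sum computed by 'calculateHash'. To calculate the hash of a substring 'S[i:j]' (both ends inclusive), we can use the following formula:
hash(S[i:j]) = hash(S[0:j]) - hash(S[0:i-1]) * prime^(j-i+1)
Here, 'hash(S[0:j])' represents the hash of the prefix ending at index 'j', and 'hash(S[0:i-1])' represents the hash of the prefix ending at index 'i-1'. For 'i = 0' the substring is itself a prefix and nothing is subtracted. Multiplying the shorter prefix hash by 'prime^(j-i+1)' aligns it with the leading characters of the longer prefix, so the subtraction leaves only the contribution of the substring. When working modulo 'm', the result is reduced modulo 'm' and adjusted to be non-negative.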
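The formula can be sketched in Python as follows, assuming a polynomial hash with a modulus. The helper names `build_prefix_hashes` and `substring_hash`, and the constants `BASE` and `MOD`, are hypothetical choices for this illustration. Here `prefix[k]` holds the hash of the first `k` characters, so `prefix[i]` plays the role of 'hash(S[0:i-1])' in the formula.

```python
MOD = 1_000_000_007   # assumed large prime modulus
BASE = 131            # assumed base, larger than any ASCII code


def build_prefix_hashes(s, base=BASE, mod=MOD):
    """Return (prefix, power): prefix[k] hashes s[:k], power[k] = base**k % mod."""
    n = len(s)
    prefix = [0] * (n + 1)
    power = [1] * (n + 1)
    for k, ch in enumerate(s):
        prefix[k + 1] = (prefix[k] * base + ord(ch)) % mod
        power[k + 1] = (power[k] * base) % mod
    return prefix, power


def substring_hash(prefix, power, i, j, mod=MOD):
    """Hash of s[i..j] (both ends inclusive) in O(1) from the precomputed tables."""
    # Python's % always returns a non-negative result, so no extra adjustment is needed.
    return (prefix[j + 1] - prefix[i] * power[j - i + 1]) % mod


prefix, power = build_prefix_hashes("abracadabra")
# "abra" occurs at indices 0..3 and 7..10, so both ranges give the same hash.
print(substring_hash(prefix, power, 0, 3) == substring_hash(prefix, power, 7, 10))  # True
```

Building the tables takes time linear in the length of the string, after which any substring hash costs a constant number of operations.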
String hashing finds applications in various problem scenarios in competitive programming:
Given a string, we can determine the number of different substrings it contains using string hashing. We calculate the hash of all substrings of the string and store them in a set data structure. Since a set only keeps unique elements, the size of the set will give us the count of different substrings.
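def countDifferentSubstrings(string):
    substrings = set()
    n = len(string)
    for i in range(n):
        for j in range(i, n):
            # Hash the substring string[i..j] and record the value.
            currentHash = calculateHash(string[i:j+1])
            substrings.add(currentHash)
    return len(substrings)
In this code, we iterate over all possible substrings of the input string and calculate their hashes. We add each hash to the 'substrings' set to ensure uniqueness. Finally, we return the count of different substrings by evaluating the size of the set. Note that the set stores hashes rather than the substrings themselves, so the result is exact only if no two different substrings share a hash. Recomputing every hash from scratch also makes this version cubic in the length of the string; prefix hashes or incrementally extended hashes reduce the cost to quadratic.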
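As a sketch of a quadratic variant, the hash of each substring can be extended one character at a time instead of being recomputed. This version swaps in a polynomial hash rather than 'calculateHash'; the function name `count_distinct_substrings` and the constants `BASE` and `MOD` are assumptions for illustration, and the count is still only as reliable as the hash is collision-free.

```python
MOD = 1_000_000_007   # assumed large prime modulus
BASE = 131            # assumed base, larger than any ASCII code


def count_distinct_substrings(s):
    """Count distinct substrings of s, assuming no hash collisions occur."""
    seen = set()
    for i in range(len(s)):
        current = 0
        for j in range(i, len(s)):
            # Extend the hash of s[i..j-1] by one character to get the hash of s[i..j].
            current = (current * BASE + ord(s[j])) % MOD
            # Store the length as well, so only equal-length substrings can clash.
            seen.add((j - i + 1, current))
    return len(seen)


print(count_distinct_substrings("abab"))  # 7: a, b, ab, ba, aba, bab, abab
```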
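To reduce the chances of hash collisions, it is recommended to use a polynomial hash with a base larger than the alphabet size and a large prime modulus, such as 10^9 + 7 or 10^9 + 9. With a modulus 'm' and a well-chosen base, two different strings produce the same hash value with a probability of roughly 1/m, so a larger modulus reduces the likelihood of a collision.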
Problem 1: Given an array of strings, find all pairs of strings that are anagrams of each other.
Calculate the hash of each string after sorting its characters. Use a hash table to group anagrams based on their calculated hashes.
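A minimal sketch of this idea, assuming a polynomial hash over the sorted characters; `signature_hash`, `group_anagrams`, `BASE`, and `MOD` are hypothetical names and constants. The sketch returns groups of mutual anagrams, from which the pairs follow directly.

```python
from collections import defaultdict

MOD = 1_000_000_007   # assumed large prime modulus
BASE = 131            # assumed base, larger than any ASCII code


def signature_hash(word):
    """Polynomial hash of the word's characters in sorted order."""
    value = 0
    for ch in sorted(word):
        value = (value * BASE + ord(ch)) % MOD
    return value


def group_anagrams(words):
    """Return groups (size >= 2) of words whose sorted characters hash alike."""
    buckets = defaultdict(list)
    for word in words:
        # Anagrams sort to the same string, so they always land in the same bucket.
        # A collision between non-anagrams would merge buckets; comparing the
        # sorted strings inside a bucket would rule that out.
        buckets[signature_hash(word)].append(word)
    return [group for group in buckets.values() if len(group) > 1]


print(group_anagrams(["tea", "eat", "ate", "tan", "nat", "bat"]))
# [['tea', 'eat', 'ate'], ['tan', 'nat']]
```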
Problem 2: Given a string 'S' and a pattern 'P', find all occurrences of the pattern in the string.
Calculate the hash of the pattern 'P' and use the sliding window technique to calculate the hash of substrings of the string 'S'. Compare the hashes to find matches.
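The sliding window approach is often called Rabin-Karp matching. The sketch below assumes a polynomial hash; `find_occurrences`, `BASE`, and `MOD` are illustrative names and constants, and every hash match is confirmed with a direct comparison so that a collision cannot produce a false occurrence.

```python
MOD = 1_000_000_007   # assumed large prime modulus
BASE = 131            # assumed base, larger than any ASCII code


def find_occurrences(text, pattern):
    """Return the start indices at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(BASE, m - 1, MOD)  # weight of the window's leading character
    pattern_hash = window_hash = 0
    for k in range(m):
        pattern_hash = (pattern_hash * BASE + ord(pattern[k])) % MOD
        window_hash = (window_hash * BASE + ord(text[k])) % MOD
    matches = []
    for start in range(n - m + 1):
        # Compare the text only when the hashes agree.
        if window_hash == pattern_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start + m < n:
            # Slide the window: drop text[start], then append text[start + m].
            window_hash = (window_hash - ord(text[start]) * high) % MOD
            window_hash = (window_hash * BASE + ord(text[start + m])) % MOD
    return matches


print(find_occurrences("abracadabra", "abra"))  # [0, 7]
```

Each window update costs a constant number of operations, so the hash comparisons take time linear in the length of the text; the direct comparisons add work only where the hashes agree.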
Problem 3: Given a string 'S', find the longest palindromic substring within 'S'.
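Calculate prefix hashes of 'S' and of its reverse. A substring is a palindrome when it reads the same in both directions, so its hash in 'S' must equal the hash of the corresponding range in the reversed string; when the two hashes match, confirm by comparing characters to rule out a collision. Keep track of the longest palindromic substring encountered.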
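One way to bring hashing into this problem is to compare the hash of a substring with the hash of the same range read backwards. The following is a rough sketch under that assumption, using a polynomial hash; `prefix_hashes`, `longest_palindrome`, `BASE`, and `MOD` are hypothetical names and constants. It tries lengths from longest to shortest, so it performs a quadratic number of constant-time hash checks rather than aiming for the best known running time.

```python
MOD = 1_000_000_007   # assumed large prime modulus
BASE = 131            # assumed base, larger than any ASCII code


def prefix_hashes(s):
    """prefix[k] is the polynomial hash of the first k characters of s."""
    prefix = [0] * (len(s) + 1)
    for k, ch in enumerate(s):
        prefix[k + 1] = (prefix[k] * BASE + ord(ch)) % MOD
    return prefix


def longest_palindrome(s):
    """Return the leftmost longest palindromic substring of s."""
    n = len(s)
    forward = prefix_hashes(s)
    backward = prefix_hashes(s[::-1])
    power = [1] * (n + 1)
    for k in range(n):
        power[k + 1] = (power[k] * BASE) % MOD

    def range_hash(prefix, i, j):
        # Hash of positions i..j (inclusive) of the string behind `prefix`.
        return (prefix[j + 1] - prefix[i] * power[j - i + 1]) % MOD

    for length in range(n, 0, -1):
        for i in range(n - length + 1):
            j = i + length - 1
            # s[i..j] read backwards sits at positions n-1-j .. n-1-i of the reversed string.
            if range_hash(forward, i, j) == range_hash(backward, n - 1 - j, n - 1 - i):
                candidate = s[i:j + 1]
                if candidate == candidate[::-1]:  # guard against hash collisions
                    return candidate
    return ""


print(longest_palindrome("forgeeksskeegfor"))  # geeksskeeg
```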
Note: Remember to implement the necessary string hashing functions and choose appropriate data structures to solve these problems efficiently.
String hashing is a powerful technique for processing and comparing strings in competitive programming. By assigning unique numeric values to characters, we can calculate hashes and perform various operations efficiently. From duplicate detection to substring matching, string hashing finds applications in a wide range of problem scenarios. Understanding the concepts and implementing the algorithms discussed in this article will enable you to leverage the power of string hashing in your competitive programming journey.