- How open addressing works: There are several ways of doing open addressing. Methods like linear probing and quadratic probing use a linear or quadratic functions to find a sequence of slots, for example a linear function, h(k) = (h’(k) + i )mod m finds sequences starting from h’(k) and linearly incrementing by i. However this has problems when 2 keys have the same initial h’(k), they will have the same probing sequence.
- Another alternative is to use double hashing, where the probing sequence is found by a function of two hash functions, for example: h(k) = (h1(k) + ih2(k)) mod m, since the probing sequence depends on two hash functions, the sequence would be less likely the same for two keys resulting in better search and insertion time.
Universal Hashing. For a fixed hash function, an adversary can easily choose a set of keys that can hash to the same bucket. To overcome this problem, universal hashing is performed. In universal hashing, the hash function is chosen randomly from a set of hash functions before the data is inserted. For example: If m is the number of keys, then the hash function could be something like, h(k) = (a.k + b) mod m where a and b are chosen randomly before insertion. The idea behind universal hashing is that, by choosing the hash function randomly, the probability that the chosen hash function would be bad resulting in undesirable collisions would be low.
Perfect Hashing. In case of perfect hashing, the linear time search of linked lists during collision is reduced/avoided by using hashtable of hashtables. Both the primary and secondary hashtables use universal hashing to decide the hashfunction. Therefore the primary hashtable will be having one randomly chosen hashfunction and each of the secondary hashtables will be having one randomly chosen hashfunction.
Idea behind Perfect hashing. It has been proved that if n is the number of keys and m is the number of slots in a hashtable, then if m = n^2 and if the hash function is chosen randomly, then the probability of collisions of two keys in this hashtable would be lesser than ½ which is low.
Why we need two levels of hashtables in Perfect hashing. Since it is not affordable to have an hashtable of capacity n^2, we use a secondary hashtable. The number of elements colliding onto the same secondary hashtable is low since we use universal hashing, and therefore we can afford to have nj ^ 2 capacity secondary hashtables where nj is the number of elements hashing to the same secondary hashtable Sj.
What will be the memory requirement for perfect hashing. It has been proven that, when both primary and secondary hashtables use universal hashing and when secondary hashtables have nj^2 capacity, then the total memory slots required for perfect hashing would be upper bounded by 2n.
Note. Collisions on secondary hashtables although have low probability, still needs to be resolved by linked lists.