Hashing Tutorial

Section 6.4 - Double Hashing

Both pseudo-random probing and quadratic probing eliminate primary clustering, which is the name given to the the situation when keys share substantial segments of a probe sequence. If two keys hash to the same home position, however, then they will always follow the same probe sequence for every collision resolution method that we have seen so far. The probe sequences generated by pseudo-random and quadratic probing (for example) are entirely a function of the home position, not the original key value. This is because function p ignores its input parameter K for these collision resolution methods. If the hash function generates a cluster at a particular home position, then the cluster remains under pseudo-random and quadratic probing. This problem is called secondary clustering.

To avoid secondary clustering, we need to have the probe sequence make use of the original key value in its decision-making process. A simple technique for doing this is to return to linear probing by a constant step size for the probe function, but to have that constant be determined by a second hash function, h2. Thus, the probe sequence would be of the form p(K, i) = i * h2(K). This method is called double hashing.

Use this applet to try out double hashing for yourself. Insert several values that all hash to the same slot. You should see that they follow different probe sequences.

A good implementation of double hashing should ensure that all of the probe sequence constants are relatively prime to the table size M. This can be achieved easily. One way is to select M to be a prime number, and have h2 return a value in the range 1 <= h2(K) <= M-1. Another way is to set M = 2m for some value m and have h2 return an odd value between 1 and 2m.