# Hashing Tutorial

## Section 5 - Collision Resolution

We now turn to the most commonly used form of hashing: closed hashing with no bucketing, and a collision resolution policy that can potentially use any slot in the hash table.

During insertion, the goal of collision resolution is to find a free
slot in the hash table when the home position for the record is
already occupied.
We can view any collision resolution method as generating a sequence
of hash table slots that can potentially hold the record.
The first slot in the sequence will be the home position for the key.
If the home position is occupied, then the collision resolution policy
goes to the next slot in the sequence.
If this is occupied as well, then another slot must be found, and
so on.
This sequence of slots is known as the
**probe sequence**, and it is generated by some
**probe function** that we will call **p**.
Insertion works as follows:

// Insert e into hash table HT void hashInsert(const Key& k, const Elem& e) { int home; // Home position for e int pos = home = h(k); // Init probe sequence for (int i=1; EMPTYKEY != (HT[pos]).key(); i++) { pos = (home + p(k, i)) % M; // probe if (k == HT[pos].key()) { cout << "Duplicates not allowed\n"; return; } } HT[pos] = e; }

Method "hashInsert" first checks to see if the home slot for the
key is empty.
If the home slot is occupied, then we use the probe function
**p**(*k*, *i*) to locate a free slot in the table.
Function **p** has two parameters, the key *k* and a
count *i* of where in the probe sequence we wish to be.
That is, to get the first position in the probe sequence after the
home slot for key *K*, we call **p**(*K*, 1).
For the next slot in the probe sequence, call **p**(*K*, 2).
Note that the probe function returns an offset from the original home
position, rather than a slot in the hash table.
Thus, the `for` loop in "hashInsert" is computing positions
in the table at each iteration by adding the value returned from the
probe function to the home position.
The *i*th call to **p** returns the *i*th offset to be used.

Searching in a hash table follows the same probe sequence that was followed when inserting records. In this way, a record not in its home position can be recovered. A C++ implementation for the search procedure is as follows.

// Search for the record with Key K bool hashSearch(const Key& K, Elem& e) const { int home; // Home position for K int pos = home = h(K); // Initial position is the home slot for (int i = 1; (K != (HT[pos]).key()) && (EMPTYKEY != (HT[pos]).key()); i++) pos = (home + p(K, i)) % M; // Next on probe sequence if (K == (HT[pos]).key()) { // Found it e = HT[pos]; return true; } else return false; // K not in hash table }

Both the insert and the search routines assume that at least one slot on the probe sequence of every key will be empty. Otherwise they will continue in an infinite loop on unsuccessful searches. Thus, the hash system should keep a count of the number of records stored, and refuse to insert into a table that has only one free slot.

The discussion on bucket hashing presented a simple method of
collision resolution.
If the home position for the record is occupied, then move down
the bucket until a free slot is found.
This is an example of a technique for collision resolution
known as **linear probing**.
The probe function for simple linear probing is
**p**(*K*, *i*) = *i*.
That is, the *i*th offset on the probe sequence is just *i*,
meaning that the *i*th step is simply to move down *i*
slots in the table.
Once the bottom of the table is reached, the probe sequence
wraps around to the beginning of the table.
Linear probing has the virtue that all slots in the table will be
candidates for inserting a new record before the probe sequence
returns to the home position.

Use this applet to try out linear probing for yourself. Can you see any reason why this might not be the best approach to collision resolution?

While linear probing is probably
the first idea that comes to mind when considering collision
resolution policies, it is not the only one possible.
Probe function **p** allows us many options for how to do collision
resolution.
In fact, linear probing is one of the worst collision resolution
methods.
The main problem is illustrated by the figure below.
Here, we see a hash table of ten slots used to store four-digit
numbers.
The hash function used is **h**(*K*) = *K* mod 10.
The four values 1001, 9050, 9877, and 2037 are inserted into the table.
Then the value 1059 is added to the hash table.

The ideal behavior for a collision resolution mechanism is that each empty slot in the table will have equal probability of receiving the next record inserted (assuming that every slot in the table has equal probability of being hashed to initially). In this example, assume that the hash function gives each slot (roughly) equal probability of being the home position for the next key. However, consider what happens to the next record if its key has its home position at slot 0. Linear probing will send the record to slot 2. The same will happen to records whose home position is at slot 1. A record with home position at slot~2 will remain in slot 2. Thus, the probability is 3/10 that the next record inserted will end up in slot 2. In a similar manner, records hashing to slots 7 or 8 will end up in slot 9. However, only records hashing to slot 3 will be stored in slot 3, yielding one chance in ten of this happening. Likewise, there is only one chance in ten that the next record will be stored in slot 4, one chance in ten for slot 5, and one chance in ten for slot 6. Thus, the resulting probabilities are not equal.

To make matters worse, if the next record ends up in slot 9
(which already has a higher than normal chance of happening),
then the following record will end up in slot 2 with probability
6/10.
This is illustrated in the right side of the figure.
This tendency of linear probing to cluster items together is known as
**primary clustering**.
Small clusters tend to merge into big clusters, making the problem
worse.
The objection to primary clustering is that it leads to
long probe sequences.

Now go back up to the applet, and enter the numbers shown in the figure. Be sure to click the "Show Count" button before you do this, so that you can see how many home slots will probe to a given free slot.