# Hashing Tutorial

## Section 6 - Improved Collision Resolution Methods

## Section 6.1 - Linear Probing by Steps

How can we avoid primary clustering?
One possible improvement might be to use linear probing,
but to skip slots by some constant *c* other than 1.
This would make the probe function
**p**(*K*, *i*) = *ci*,
and so the *i*th slot in the probe sequence will be
**h**(*K*) + *ic*) mod *M*.
In this way, records with adjacent home positions will not follow the
same probe sequence.
For example, if we were to skip by twos, then our offsets from the home
slot would be 2, then 4, then 6, and so on.

One quality of a good probe sequence is that it will cycle through
all slots in the hash table before returning to the home position.
Clearly linear probing (which "skips" slots by one each time) does this.
Unfortunately, not all values for *c* will make this happen.
For example, if *c* = 2 and the table contains an even number of
slots, then any key whose home position is in an even slot will have
a probe sequence that cycles through only the even slots.
Likewise, the probe sequence for a key whose home position is in an
odd slot will cycle through the odd slots.
Thus, this combination of table size and linear probing constant
effectively divides the records into two sets stored in two
disjoint sections of the hash table.
So long as both sections of the table contain the same number of records,
this is not really important.
However, just from chance it is likely that one section will become
fuller than the other, leading to more collisions and poorer
performance for those records.
The other section would have fewer records, and thus better
performance.
But the overall system performance will be degraded,
as the additional cost to the side that is more full outweighs the
improved performance of the less-full side.

Constant *c* must be relatively prime to *M* to generate a
linear probing sequence that visits all slots in the table
(that is, *c* and *M* must share no factors).
For a hash table of size *M* = 10, if *c*
is any one of 1, 3, 7, or 9,
then the probe sequence will visit all slots for any key.
When *M* = 11, any value for *c* between 1 and 10 generates a
probe sequence that visits all slots for every key.

This applet will let you try different step sizes for collision resolution.
As you try this, be sure to pick keys to insert that will generate lots of
collisions so that you can try this out.
A good way to do this is to pick keys that are mostly multiples of the table
size, which will hash to slot 0 in the table.
Try some values for *c* that will visit every slot in the table
if there are enough collisions.
Try some values for *c* that will not visit every slot in the table.
What happens when we pick a bad value for *c* and the subset of slots
reachable are all full?

This applet shows one performance effect of using a collision resolution method that generates disjoint tables. The graph shows the probability that an insert operation will fail as the hash table fills up. When the probe function visits every slot of the table, insertion can never fail unless the hash table is full. But if the probe function does not visit every slot (such as when the step size for linear probing is 2 and the table size is even), then it is possible for one of the sub-tables to become full before the entire table is full. In this graph, the step size is always 2.

Consider the situation where *c* = 2 and we wish to insert a record
with key *k*_{1} such that
**h**(*k*_{1}) = 3.
The probe sequence for *k*_{1} is 3, 5, 7, 9, and so on.
If another key *k*_{2} has home position at slot 5,
then its probe sequence will be 5, 7, 9, and so on.
The probe sequences of *k*_{1} and *k*_{2}
are linked together in a manner that contributes to clustering.
In other words, linear probing with a value of *c* > 1 does not
solve the problem of primary clustering.
We would like to find a probe function that does not link
keys together in this way.
We would prefer that the probe sequence for *k*_{1}
after the first step on the sequence should not be identical to the
probe sequence of *k*_{2}.
Instead, their probe sequences should diverge.