Published by Patrick Mutisya · 8 days ago
Hashing is a technique for locating a record directly by applying a hash function to a key value. The result of the function is an address (or bucket number) in a file, allowing constant‑time average access.
A hash function \$h\$ maps a key \$k\$ from a large universe \$U\$ to an index in a smaller range \$[0, m-1]\$, where \$m\$ is the number of buckets (or slots) in the file.
Typical forms include:
Because \$m\$ is usually much smaller than \$|U|\$, different keys can produce the same hash value. Two main families of techniques handle these collisions.
All records are stored directly in the hash table. When a collision occurs, the algorithm probes other slots according to a systematic sequence until an empty slot is found.
Each bucket contains a linked list (or another secondary structure) of all records that hash to the same index.
| Method | Space Overhead | Average Search Cost (successful) | Average Search Cost (unsuccessful) | Typical Use Cases |
|---|---|---|---|---|
| Linear probing | Low (single array) | \$\frac{1}{2}\left(1+\frac{1}{1-\alpha}\right)\$ | \$\frac{1}{2}\left(1+\frac{1}{(1-\alpha)^2}\right)\$ | When memory is tight and load factor \$\alpha < 0.7\$ |
| Quadratic probing | Low | Similar to linear probing, but reduces primary clustering | Similar to linear probing | When clustering must be limited but array size is fixed |
| Double hashing | Low | Close to optimal for open addressing | Close to optimal for open addressing | High performance requirements with moderate load factor |
| Separate chaining | Higher (pointers or secondary structures) | \$1 + \frac{\alpha}{2}\$ | \$1 + \alpha\$ | When dynamic growth is needed or load factor may exceed 1 |
Suppose we store student records keyed by a 7‑digit student number. We choose \$m = 101\$ (a prime) and use the division method:
\$h(k) = k \bmod 101\$
For the key \$k = 1234567\$:
\$h(1234567) = 1234567 \bmod 101 = 1234567 - 101 \times \left\lfloor\frac{1234567}{101}\right\rfloor = 1234567 - 101 \times 12223 = 84\$
The record is placed in bucket 84. If another student has key \$k = 9876543\$ and also hashes to 84, a collision occurs and is resolved using the chosen strategy (e.g., separate chaining).
Question: A file uses separate chaining with \$m = 13\$ buckets. The hash function is \$h(k) = k \bmod 13\$. Insert the keys 27, 40, 13, 55, 68, and 81. Show the resulting bucket contents and calculate the load factor.
Answer outline: