Linear Hashing
In this blog post, I will give an introduction to a hashing methodology called Linear Hashing, which is one of the Dramatic Hashing methods.
Linear Hashing
Linear Hashing is a dynamic hashing scheme. It allows the hash table size to grow in a linear fashion; one bucket at a time, and that is where the method gets its name from. At any given point of time, this method works with at most two hashing functions. The hash function changes its nature underneath dynamically and the hash table algorithms take care of using the right hashing algorithm for insert(), lookup(), remove() operations on a given key. Linear Hashing uses the chaining or overflow bucket approach. This along with the fact that hash table size grows by exactly one bucket at a time are two important facts about this method. ## The mathematical fundamental of Linear Hashing
We assume that there is a sequence of KEYs: 5, 9 13 and we have the following two hash functions:
H1(KEY) = KEY \% 4
H2(KEY) = KEY \% 8
Now calculate the VALUEs with the two hash functions:
H1():
5,9,13 \% 4 = 1
H2():
5,13 \% 8 = 5
9 \% 8=1
We can conclude from the above equations that:
KEY \% n = M
KEY \% 2n = M \ or \ M \ + \ n
Highlevel points
 We start with “N” as the initial size of the hash table where “N” is some power of $2$. Note that “N” is the number of main hash buckets in the hash table and not the overflow buckets.
 Buckets are numbered through 0 to N1

N=2^{I}
, obviously I is an integer.  The hash function
H_{i}(K)
will first compute a bitstring R using the KEY K.  We use the least I bits in the bitstring R as the bucket index for the given KEY K. An example: H(2) = 010, I = 2, where we store the record [2,Value] in bucket[10].
 An attempt to insert a record into a full bucket will cause a split(or you may say resize). A split pointer S indicates which bucket to split.

load factor = (number of items in the hash table)/(number of buckets)
. Load factor indicates the degree of fullness. Usually we use this load factor as the trigger for a split.  When a resize happens, we split a particular bucket B by create a new bucket B*, and rehashing the contents of B, after which the contents will be distributed among B and B*. The split pointer is incremented to point to the subsequent bucket after every split.
 A threshold value of load factor L is set (say 75% or something). It implies that whenever the hash table is 75% full, the hash table will be resized. In case of any hashing scheme, this would mean increasing the size of the hash table by a certain number of new buckets followed by a rehash. However, in the case of linear hashing, this would imply:
 Creating a new bucket B where B = N + B; N is the then size of the hash table, and B is the bucket we are splitting.
 Rehashing the contents of bucket B (pointed to by S) between B and B*. This is what is meant by the split.
One of the most important facts about Linear Hashing that distinguishes the technique from its counterparts is the bucket that is split is not necessarily the bucket that overflowed.  The split is done in a linear bucket number order. The trigger for the split can be anything. It may even be every time we attach an overflow bucket to a particular bucket’s chain, we split. But yes, the bucket that we will split will always be the one pointed to by S which may not be the bucket that just overflowed.
 This is completely different from extendible hashing technique where we always split the bucket that overflowed.
A simple example
Let\'s start with a simple example where the hash table has 2 buckets numbered 0 and 1. We assume that each bucket can store 2 records (KV items) before we chain out and create an overflow bucket. Similarly, each overflow bucket will store only 1 record. We use MOD as the hash function and integers as the KEYs for the hash table. Following operations are carries out:
Operation  KEY  R=H(KEY) 

Put  2  010 
Put  1  001 
Put  3  011 
Put  4  100 
The hignlighted LSB in the bitstring R is really the bucket index and is also the result of hash function H1(K)=K%2. Remember that:
 The hash table\'s size is
N=2^{1}
,  so I = 1, and we use the least one bit of R as the bucket index for the given key
In order to make the representation simple, the value (say V) of each [K, V] record is hidden in the following tables.
Split pointer  Bucket number  Records  Overflow KEYs 

*  0  2,4  
1  1,3 
Now we Get(2), H(2) = 0 => bucket 0 Let\'s look into split(). To trigger split, let\'s do one more Put():
Operation  KEY  R=H(KEY) 

Put  5  101 
H(5)=5%2=1 (or the least one bit of R is 1). Bucket 1 is full, so create an overflow bucket and allocate memory for it, then store this KV pair in this new bucket. Link the new bucket to the overflow chain of bucket
 Now the hash table looks like:
Split pointer  Bucket number  Records  Overflow KEYs 

*  0  2,4  
1  1,3  5 
We split the bucket pointed by split pointer S (bucket 0) before returning from Put().  Create a new bucket B*
Split pointer  Bucket number  Records  Overflow KEYs 

*  0 (B)  2,4  
1  1,3  5  
x (B*) 
 Rehash items in B
 This is done using an alternate hash function H2(Key)=Key%(2*N). This is because we no longer have 2 buckets but 3 now and so I should be 2.
 Another representation of H2(Key)=Key%(2*N): using one more bit of bitstring of the Key.
 2=(010), 4=(100), so store [2,V] in bucket[10] and [4,V] in bucket[00].
 Increment S and now it points to bucket[1].
Split pointer  Bucket number  Records  Overflow KEYs 

00  4  
*  1  1,3  5 
10  2 
Let\'s talk further about split().
Operation  KEY  R=H(KEY) 

Put  14  1110 
Wait, which is our choice? H1() or H2()? H1(14)=0 (00). The result is smaller than(before) split pointer. So H2(14)=2 (10). The [14, V] goes to bucket[10] then.
Split pointer  Bucket number  Records  Overflow KEYs 

  00  4  
*  1  1,3  5 
  10  2,14 
New request comes:
Operation  KEY  R=H(KEY) 

Get  2  0010 
H1(2)=0 (00) is smaller than(before) split pointer. So H2(2) = (10). Bucket[10] stores [2, V]! Hence, the split pointer not only indicates which bucket to split but also is a flag which tells us what should we choose between H1() or H2(). Let\'s do one more Put().
Operation  KEY  R=H(KEY) 

Put  15  1111 
H1(15)=1 (01). Why we just use H1()? It is because the resulting bucket[01] has not been split(in the current round).
 We create a new overflow bucket, allocate memory for it, and then link it to the chain.
Split pointer  Bucket number  Records  Overflow KEYs 

00  4  
*  1  1,3  [5], [15] 
10  2,14 
Note that [5] and [15] are stored in two overflow buckets.  Split bucket[1]
 Now split pointer points to bucket B
 Create a new bucket B*
 Rehash items in bucket B
 We have 4 buckets now, I = 2 is enough. So H2() is still valid now.
 H2(1)=01, H2(3)=11, H2(5)=01, H2(15)=11
Spliting:
Split pointer  Bucket number  Records  Overflow KEYs 

00  4  
*  1 (B)  1,3  [5], [15] 
10  2,14  
x (B*) 
After split:
Split pointer  Bucket number  Records  Overflow KEYs 

*  00  4  
01 (B)  1,5  
10  2,14  
11 (B*)  3,15 
We started out with N=2 buckets. Now both the buckets are split, and hash table size grew to 4 buckets. This completes a single round of linear hashing. Now N = 4, and the split pointer points to bucket[00]. We can start a new round. In the new round, H1(Key) = old H2(Key) = Key%4, H2(Key) = Key%8. That means we use 2 LSBs(before split) or 4 LSBs(after split) as bucket index. Now have a look of another round.
Operation  KEY  R=H(KEY) 

Put  8  1000 
Put  12  1100 
Split pointer  Bucket number  Records  Overflow KEYs 

*  00  4,8  12 
  01  1,5  
  10  2,14  
  11  3,15 
Split
Split pointer  Bucket number  Records  Overflow KEYs 

*  000 (B)  4,8  12 
01  1,5  
10  2,14  
11  3,15  
x (B*) 
4: 01008: 100012:1100
Split pointer  Bucket number  Records  Overflow KEYs 

000 (B)  4  
*  01  1,5  
10  2,14  
11  3,15  
100 (B*)  4,12 
This round of linear hashing will continue until buckets[01] [10] [11] are split, and the table at the end would be 8 buckets.