Data Structures- Hashing

HASHING
BY
B.HEMALATHA , AP-CSE
VELAMMAL ENGINEERING COLLEGE

Topics to be discussed
•HASHING
•HASH FUNCTION
•COLLISION
•COLLISION HANDLING
•REHASHING
•EXTENDIBLE HASHING
•APPLICATIONS

Hashing
• Hashing is the process of indexing and retrieving an element (data) in a
data structure so that it can be found quickly, using a hash key (hash
value) generated by a hash function.

Example 1: Hashing - Phone book
• Hash table size m = 5
• Hash function h(k) = (length of the key k) mod 5

Example 2: Hashing
• Keys k = 89, 64, 35, 100, 47
• Hash table size m = 10
• Hash function h(k) = k mod 10

Key    h(k) = k % 10
89     9
64     4
35     5
100    0
47     7

Resulting hash table:
Index  Key
0      100
1
2
3
4      64
5      35
6
7      47
8
9      89
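
The placement in Example 2 can be reproduced with a minimal Python sketch. The function name h and the use of None for an empty slot are illustrative choices, not part of the slides.

```python
def h(k, m=10):
    """Division-method hash from Example 2: h(k) = k mod m."""
    return k % m

keys = [89, 64, 35, 100, 47]
m = 10
table = [None] * m           # None marks an empty slot (assumption for this sketch)

for k in keys:
    table[h(k, m)] = k       # these particular keys never collide

for index, key in enumerate(table):
    print(index, "" if key is None else key)
```

Each key lands directly in the slot given by its last decimal digit, which matches the table above.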

Why hashing?
• Many applications deal with large amounts of data,
 e.g. search engines and web pages
• Requirement: time-critical look-ups
• Look-ups can be implemented with data structures such as
a. Arrays and lists: linear time for look-ups, O(n)
b. BST: linear time for look-ups in the worst case, O(n)
c. Hash tables: look-ups in near-constant time, O(1)
• Solution: hash tables with hashing bring searching down to
near-constant time on average

Hashing revisited
• Keys: elements to be stored
• Hash function: maps keys to hash values
• Hash value (hash key): index in the range 0 to m-1
• Hash table: data structure that stores the elements (array of size m)

Hash Function
• The mapping of keys to indices of a hash table is called a hash function
  Keys → hash key in the range 0 to m-1, where m is the size of the hash table
• Comprises 2 maps
  Hash code map: key → integer
  Compression map: integer → hash index in the range 0, ..., m-1

Hash Function
• A hash function h maps keys of a given type to integers in a
fixed interval [0, ..., m-1]
• h(k) is the hash value of k

Good Hash Function
• Quick to compute
• Map equal keys to equal indices
• Distributes keys uniformly throughout the table
• Minimises probability of COLLISION
[Figure: a KEY passes through the HASH FUNCTION to give a HASH KEY; a collision arises when KEY 1 and KEY 2 give the SAME HASH KEY]

Hash Function
• Must deal with non-integer keys
• Integer cast: interpret the bits of the key as an integer
• Character sum: use the sum of the ASCII values of the characters in a
string as the integer
• Component sum: partition the bits of the key into parts of fixed length
and combine the components into one integer by summing them
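
The hash code maps listed above can be sketched in Python as follows. The function names, the 4-byte part size and the big-endian byte order are assumptions made for illustration only.

```python
def ascii_sum(s):
    """Hash code map: sum of the character codes of a string."""
    return sum(ord(ch) for ch in s)

def component_sum(data, part_size=4):
    """Split a bytes object into fixed-length parts, read each part as an
    integer, and add the parts together."""
    total = 0
    for start in range(0, len(data), part_size):
        total += int.from_bytes(data[start:start + part_size], "big")
    return total

def hash_index(s, m):
    """Compression map: reduce the integer hash code to an index in 0..m-1."""
    return ascii_sum(s) % m

print(ascii_sum("abc"))        # 97 + 98 + 99 = 294
print(hash_index("abc", 10))   # 294 % 10 = 4
```

Note that a plain character sum gives the same code to any two strings made of the same characters (for example "abc" and "cba"), so such keys always collide.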

Hash Function
• Mid-square method: pick a fixed number of bits from the middle of k²
• Division method: h(k) = k mod m
where k = key and m = TableSize
Note: choosing m to be prime helps distribute the keys
more uniformly
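
A minimal sketch of both methods, assuming keys that fit in key_bits bits and an index of r bits (both parameter names are hypothetical, introduced only for this example):

```python
def division_hash(k, m):
    """Division method: h(k) = k mod m."""
    return k % m

def mid_square_hash(k, r, key_bits=16):
    """Mid-square method: square k, then keep the r bits in the middle
    of the (2 * key_bits)-bit result."""
    squared = k * k
    shift = (2 * key_bits - r) // 2        # number of low-order bits to drop
    return (squared >> shift) & ((1 << r) - 1)

print(division_hash(100, 7))                 # 100 % 7 = 2
print(mid_square_hash(12, 4, key_bits=8))    # 144 >> 6 = 2, low 4 bits = 2
```

Taking bits from the middle of the square lets every bit of the key influence the index, which is the usual motivation for this method.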

Hash Function for Division method

Hash Table
For TableSize = m and hash function h(k) = k mod m
• m prime (good): helps distribute keys uniformly
• m a power of 2 (bad): keys with the same low-order bits get the same
hash value
LOAD FACTOR - a measure of how full the table is
• α = n / m, where n is the number of stored elements and m is the table size
• The load factor is usually kept at α < 1
• As α grows, the hash table becomes slower
• Keeping α bounded maintains O(1) operations
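
The effect of the table size and the load factor formula can be checked with a small sketch. The key set (multiples of 8) is made up for illustration.

```python
from collections import Counter

def load_factor(n, m):
    """alpha = n / m: number of stored elements over table size."""
    return n / m

def slot_counts(keys, m):
    """Count how many keys fall into each slot under h(k) = k mod m."""
    return Counter(k % m for k in keys)

keys = [8, 24, 40, 56, 72, 88]    # all share the same three low-order bits

print(slot_counts(keys, 8))       # m a power of 2: every key lands in slot 0
print(slot_counts(keys, 7))       # m prime: the six keys land in six different slots
print(load_factor(len(keys), 12)) # 6 / 12 = 0.5
```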

Collision
• A collision occurs when two keys map to the same hash value
[Figure: KEY 1 and KEY 2 pass through the HASH FUNCTION and give the SAME HASH KEY]

Example - Collision
Insert keys 89, 18, 49, 58, 69
h(k) = k mod TableSize = k % 10

Insert 89: h(89) = 89 % 10 = 9
Insert 18: h(18) = 18 % 10 = 8
Insert 49: h(49) = 49 % 10 = 9
Collision occurs as slot 9 is already occupied by 89

Index  After 89  After 18  Attempting 49
0
1
2
3
4
5
6
7
8                18        18
9      89        89        89

1.Open Hashing - Separate Chaining
• Collisions are handled by keeping elements with the same hash value
in a list
• Each cell of the hash table points to a linked list of the elements
mapped to the same hash value

Example - Separate Chaining
Insert keys 89, 27, 49, 55, 69, 45
h(k) = k mod TableSize = k % 10

Key    h(k) = k % 10
89     9
27     7
49     9
55     5
69     9
45     5

Resulting table (each new element is inserted at the front of its list):
Index  List
0
1
2
3
4
5      45 → 55
6
7      27
8
9      69 → 49 → 89

Separate Chaining - Operations
• Search
  - hash function h(k) determines which list to traverse
  - search the appropriate list
• Insert
  - hash function h(k) determines which list to insert into
  - check the list for the element
  - a new element is inserted at the front of the list
  - a duplicate element is not inserted again: an extra data member is
    kept and incremented
• Delete
  - hash function h(k) determines which list to traverse
  - search the appropriate list
  - delete the node from the list
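
The three operations can be sketched in Python as below. This is an illustration rather than a reference implementation: Python lists stand in for linked lists, duplicates are simply ignored instead of counted, and the class name ChainedHashTable is hypothetical.

```python
class ChainedHashTable:
    """Separate chaining with h(k) = k mod m."""

    def __init__(self, m=10):
        self.m = m
        self.slots = [[] for _ in range(m)]   # one (initially empty) chain per slot

    def _h(self, k):
        return k % self.m

    def search(self, k):
        # The hash function selects the chain; the chain is then scanned.
        return k in self.slots[self._h(k)]

    def insert(self, k):
        chain = self.slots[self._h(k)]
        if k not in chain:            # duplicates are ignored in this sketch
            chain.insert(0, k)        # new element goes to the front of the list

    def delete(self, k):
        chain = self.slots[self._h(k)]
        if k in chain:
            chain.remove(k)

table = ChainedHashTable(10)
for k in [89, 27, 49, 55, 69, 45]:
    table.insert(k)

print(table.slots[5])   # [45, 55]
print(table.slots[9])   # [69, 49, 89]
```

The printed chains match the separate chaining example with keys 89, 27, 49, 55, 69, 45.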

Separate Chaining
• Advantages
  - Can hold more elements than the table has slots
  - Simple to implement
• Disadvantages
  - Searching for an element in a linked list takes O(n) in the worst case
  - Expensive: needs an extra data structure and links, and leaves more
    memory unused
  - Cache performance of chaining is poor because keys are stored in
    linked lists

2. Closed Hashing or Open Addressing
• All elements are stored in the hash table itself (n < m)
• Each table entry contains either an element or null
• Collisions are handled by systematically probing to find an
alternative empty slot
• The hash function is modified to take the probe number i as a second parameter

Open Addressing or Closed Hashing
• When a collision occurs, probing is done
• The hash function is modified for probing:
h_i(k) = ( h(k) + f(i) ) mod TableSize, with f(0) = 0
• Function f is the collision resolution strategy
• Probing: slots h_0(k), h_1(k), h_2(k), . . . are tried in succession
until an empty slot is found
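
The general probing formula can be sketched as a generator that takes the strategy f as a parameter. The name probe_sequence is hypothetical, and h(k) = k mod m is assumed as the base hash function.

```python
def probe_sequence(k, m, f):
    """Yield h_i(k) = (h(k) + f(i)) mod m for i = 0, 1, ..., m-1,
    assuming h(k) = k mod m."""
    home = k % m
    for i in range(m):
        yield (home + f(i)) % m

linear = list(probe_sequence(49, 10, lambda i: i))          # f(i) = i
quadratic = list(probe_sequence(49, 10, lambda i: i * i))   # f(i) = i * i

print(linear[:3])      # [9, 0, 1]
print(quadratic[:4])   # [9, 0, 3, 8]
```

Because f(0) = 0 in both cases, the first slot tried is always the home slot h(k).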

Open addressing strategies
• Linear probing
• Quadratic probing
• Double hashing

Linear Probing
Collision resolution strategy
Function f(i) = i where i is the probe parameter
Hashing function
h_i(k) = [ h(k) + f(i) ] mod TableSize
= [ h(k) + i ] mod TableSize
Probe sequence: i iterating from 0 until an alternative empty slot is found
0th probe = h(k) mod TableSize
1st probe = [ h(k) + 1 ] mod TableSize
2nd probe = [ h(k) + 2 ] mod TableSize
. . .
ith probe = [ h(k) + i ] mod TableSize

Linear probing
Insert keys 89, 18, 49, 58, 69
h_i(k) = [ h(k) + i ] mod TableSize = [ h(k) + i ] % 10

Insert 89: i=0: h_0(89) = [ h(89) + 0 ] % 10 = [ 9 + 0 ] % 10 = 9
Insert 18: i=0: h_0(18) = [ h(18) + 0 ] % 10 = [ 8 + 0 ] % 10 = 8
Insert 49: i=0: h_0(49) = [ h(49) + 0 ] % 10 = [ 9 + 0 ] % 10 = 9
           (collision occurs as slot 9 is occupied by 89)
           i=1: h_1(49) = [ h(49) + 1 ] % 10 = [ 9 + 1 ] % 10 = 0

Index  After 89  After 18  After 49
0                          49
1
2
3
4
5
6
7
8                18        18
9      89        89        89
Linear probing (contd.)
Insert keys 89, 18, 49, 58, 69
h_i(k) = [ h(k) + i ] mod TableSize = [ h(k) + i ] % 10

Insert 58:
i=0: h_0(58) = [ h(58) + 0 ] % 10 = [ 8 + 0 ] % 10 = 8 (collision)
i=1: h_1(58) = [ h(58) + 1 ] % 10 = [ 8 + 1 ] % 10 = 9 (collision)
i=2: h_2(58) = [ h(58) + 2 ] % 10 = [ 8 + 2 ] % 10 = 0 (collision)
i=3: h_3(58) = [ h(58) + 3 ] % 10 = [ 8 + 3 ] % 10 = 1

Insert 69:
i=0: h_0(69) = [ h(69) + 0 ] % 10 = 9 (collision)
i=1: h_1(69) = [ h(69) + 1 ] % 10 = 0 (collision)
i=2: h_2(69) = [ h(69) + 2 ] % 10 = 1 (collision)
i=3: h_3(69) = [ h(69) + 3 ] % 10 = 2

Index  After 58  After 69
0      49        49
1      58        58
2                69
3
4
5
6
7
8      18        18
9      89        89

Insertion Routine
LinearProbeInsert(k)
    if (table is full) error
    probe = h(k)                      // probe = current location
    while (table[probe] is occupied)
        probe = (probe + 1) mod m     // try the next slot, wrapping around
    table[probe] = k
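
A runnable Python version of the insertion routine might look like this. It assumes h(k) = k mod m and uses None for an empty slot; the name linear_probe_insert is illustrative.

```python
EMPTY = None   # marker for an unused slot (assumption for this sketch)

def linear_probe_insert(table, k):
    """Insert k using linear probing; return the slot that was used."""
    m = len(table)
    if all(slot is not EMPTY for slot in table):
        raise OverflowError("hash table is full")
    probe = k % m                      # home slot h(k)
    while table[probe] is not EMPTY:
        probe = (probe + 1) % m        # step to the next slot, wrapping around
    table[probe] = k
    return probe

table = [EMPTY] * 10
for k in [89, 18, 49, 58, 69]:
    linear_probe_insert(table, k)

print(table)   # [49, 58, 69, None, None, None, None, None, 18, 89]
```

The final table agrees with the linear probing example: 49, 58 and 69 end up in slots 0, 1 and 2.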

Lookup in linear probing
• Continue looking at successive locations (probing) until
  k is found (successful search), or
  an empty location is encountered (unsuccessful search)

Index:  0   1   2   3   4   5   6   7   8   9
Key:                        65  46  17  55

Search 55: h(55) = 5; slots 5, 6 and 7 hold other keys, slot 8 holds 55: FOUND
Search 6: h(6) = 6; slots 6, 7 and 8 hold other keys, slot 9 is EMPTY: UNSUCCESSFUL SEARCH

Search Routine
LinearProbeSearch(k)
    if (table is empty) error
    probe = h(k)                      // probe = current location
    while (table[probe] is occupied and table[probe] != k)
        probe = (probe + 1) mod m
    if (table[probe] == k)
        return probe
    else
        return not found

Deletion in Linear Probing
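• Search for the key to be deleted
• Delete the key
• Mark the location with a marker / flag (X) so that later searches
continue probing past it
• Rehash if the table accumulates many markers
Delete 15: h(15) = 5; probes h(k), h(k)+1 and h(k)+2 find 15 in slot 7

Before deletion:
Index:  0   1   2   3   4   5   6   7   8   9
Key:                        65  46  15  58

After deletion:
Index:  0   1   2   3   4   5   6   7   8   9
Key:                        65  46  X   58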
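
Search and deletion with a marker can be sketched together in Python. The sketch assumes h(k) = k mod m, integer keys, None for an empty slot and the string "X" as the deletion marker. Key 35 in slot 9 is added here purely for illustration, to show a search that has to pass over the marker.

```python
EMPTY = None
DELETED = "X"    # marker left behind by a deletion

def linear_probe_search(table, k):
    """Return the slot holding k, or None if k is not in the table."""
    m = len(table)
    probe = k % m
    for _ in range(m):                 # at most m probes, even if the table is full
        if table[probe] is EMPTY:
            return None                # an empty slot ends the search
        if table[probe] == k:          # the marker never equals an integer key
            return probe
        probe = (probe + 1) % m
    return None

def linear_probe_delete(table, k):
    """Replace k with the marker; return the slot it occupied (or None)."""
    probe = linear_probe_search(table, k)
    if probe is not None:
        table[probe] = DELETED
    return probe

table = [EMPTY] * 10
table[5], table[6], table[7], table[8], table[9] = 65, 46, 15, 58, 35

linear_probe_delete(table, 15)         # slot 7 now holds "X"
print(table[7])                        # X
print(linear_probe_search(table, 35))  # 9: the search continues past the marker
```

If slot 7 were reset to empty instead of marked, the search for 35 would stop at slot 7 and wrongly report the key as missing.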

Linear Probing
• Advantages
  - Uses less memory than chaining
  - Simple to implement
  - Best cache performance
  - For any α < 1, insertion succeeds
• Disadvantages
  - Primary clustering leads to a larger number of probes
  - Look-up performance degrades quickly for α > ½

Example table (keys cluster in slots 9, 0, 1 and 2):
Index  Key
0      30
1      90
2      41
3
4
5      55
6
7
8      68
9      49

Quadratic Probing
Collision resolution strategy
Function f(i) = i² where i is the probe parameter
Hashing function
h_i(k) = [ h(k) + f(i) ] mod TableSize
= [ h(k) + i² ] mod TableSize
Probe sequence: i iterating from 0
0th probe = h(k) mod TableSize
1st probe = [ h(k) + 1 ] mod TableSize
2nd probe = [ h(k) + 4 ] mod TableSize
3rd probe = [ h(k) + 9 ] mod TableSize
. . .
ith probe = [ h(k) + i² ] mod TableSize

Quadratic Probing
Insert keys 89, 18, 49, 58, 69
h_i(k) = [ h(k) + i² ] mod TableSize = [ h(k) + i² ] % 10

Insert 89: i=0: h_0(89) = [ h(89) + 0² ] % 10 = [ 9 + 0 ] % 10 = 9
Insert 18: i=0: h_0(18) = [ h(18) + 0² ] % 10 = [ 8 + 0 ] % 10 = 8
Insert 49: i=0: h_0(49) = [ h(49) + 0² ] % 10 = 9
           (collision occurs as slot 9 is occupied by 89)
           i=1: h_1(49) = [ h(49) + 1² ] % 10 = 0

Index  After 89  After 18  After 49
0                          49
1
2
3
4
5
6
7
8                18        18
9      89        89        89

Quadratic probing (contd.)
Insert keys 89, 18, 49, 58, 69
h_i(k) = [ h(k) + i² ] mod TableSize = [ h(k) + i² ] % 10

Insert 58:
i=0: h_0(58) = [ h(58) + 0² ] % 10 = 8 (collision)
i=1: h_1(58) = [ h(58) + 1² ] % 10 = 9 (collision)
i=2: h_2(58) = [ h(58) + 2² ] % 10 = 2

Insert 69:
i=0: h_0(69) = [ h(69) + 0² ] % 10 = 9 (collision)
i=1: h_1(69) = [ h(69) + 1² ] % 10 = 0 (collision)
i=2: h_2(69) = [ h(69) + 2² ] % 10 = 3

Index  After 58  After 69
0      49        49
1
2      58        58
3                69
4
5
6
7
8      18        18
9      89        89
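
The quadratic probing insertions above can be reproduced with a short Python sketch. It assumes h(k) = k mod m and None for an empty slot; the function name is illustrative.

```python
def quadratic_probe_insert(table, k):
    """Insert k using h_i(k) = (h(k) + i*i) mod m; return the slot used."""
    m = len(table)
    home = k % m
    for i in range(m):
        probe = (home + i * i) % m
        if table[probe] is None:
            table[probe] = k
            return probe
    # Quadratic probing does not visit every slot, so this can happen
    # even when the table still has free slots.
    raise OverflowError("no empty slot found on the probe sequence")

table = [None] * 10
for k in [89, 18, 49, 58, 69]:
    quadratic_probe_insert(table, k)

print(table)   # [49, None, 58, 69, None, None, None, None, 18, 89]
```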

Lookup in Quadratic Probing
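• Continue looking at offset locations (probing) until
  k is found (successful search), or
  an empty location is encountered (unsuccessful search)

Index:  0   1   2   3   4   5   6   7   8   9
Key:                        65  46  17      55

Search 55: h(55) = 5; slots 5 and 5+1 = 6 hold other keys, slot 5+4 = 9 holds 55: FOUND
Search 6: h(6) = 6; slots 6 and 6+1 = 7 hold other keys, slot (6+4) % 10 = 0 is EMPTY: UNSUCCESSFUL SEARCH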

Deletion in Quadratic Probing
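• Search for the key to be deleted
• Delete the key
• Mark the location with a marker / flag (X) so that later searches
continue probing past it
• Rehash if the table accumulates many markers
Delete 15: h(15) = 5; probes h(k), h(k)+1 and h(k)+4 find 15 in slot 9

Before deletion:
Index:  0   1   2   3   4   5   6   7   8   9
Key:                        65  46      58  15

After deletion:
Index:  0   1   2   3   4   5   6   7   8   9
Key:                        65  46      58  X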

Quadratic Probing
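• Advantage
  • Avoids primary clustering
• Disadvantages
  • Secondary clustering: keys with the same initial hash value probe the
    same sequence of locations when looking for an empty location
  • Probes may not try all locations in the table; an empty slot is
    guaranteed to be found only if the table size is prime and the table is
    less than half full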

Double Hashing
• Uses 2 hash functions h1(k) and h2(k)
• h1(k) gives the first position to check
h1(k) = k mod TableSize
• h2(k) determines the offset
h2(k) = R - (k mod R) where R is a prime smaller than
TableSize
• Collision resolution strategy
Function f(i) = i * h2(k)
• Hashing function
h_i(k) = [ h1(k) + f(i) ] mod TableSize
       = [ h1(k) + i * h2(k) ] mod TableSize
Double Hashing
Hashing function
h_i(k) = [ h1(k) + i * h2(k) ] mod TableSize
where h1(k) = k mod TableSize and h2(k) = R - (k mod R)
Probe sequence: i iterating from 0
0th probe = h1(k) mod TableSize
1st probe = [ h1(k) + 1 * h2(k) ] mod TableSize
2nd probe = [ h1(k) + 2 * h2(k) ] mod TableSize
3rd probe = [ h1(k) + 3 * h2(k) ] mod TableSize
. . .
ith probe = [ h1(k) + i * h2(k) ] mod TableSize

Double Hashing
Insert keys 89, 18, 49, 58, 69
h_i(k) = [ h1(k) + i * h2(k) ] % 10, with R = 7

KEY                                    89  18  49  58  69
h1(k) = k % 10                         9   8   9   8   9
h2(k) = R - (k mod R) = 7 - (k % 7)    2   3   7   5   1

For i=0
h_0(89) = (9 + 0*2) % 10 = 9
h_0(18) = (8 + 0*3) % 10 = 8
h_0(49) = (9 + 0*7) % 10 = 9 (collision)
h_0(58) = (8 + 0*5) % 10 = 8 (collision)
h_0(69) = (9 + 0*1) % 10 = 9 (collision)
For i=1
h_1(49) = (9 + 1*7) % 10 = 6
h_1(58) = (8 + 1*5) % 10 = 3
h_1(69) = (9 + 1*1) % 10 = 0

HASH TABLE
Index:  0   1   2   3   4   5   6   7   8   9
Key:    69          58          49      18  89

Double Hashing
DoubleHashingInsert(k)
    if (table is full) error
    probe = h1(k)                     // probe = current location
    offset = h2(k)                    // step size between probes
    while (table[probe] is occupied)
        probe = (probe + offset) mod m
    table[probe] = k
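
A runnable Python version of the routine, under the same assumptions as the worked example (h1(k) = k mod m, h2(k) = R - (k mod R) with R = 7, None for an empty slot). The probe limit of m steps is an addition of this sketch, so that it stops when the step size cannot reach a free slot.

```python
def double_hash_insert(table, k, R=7):
    """Insert k using double hashing; return the slot that was used."""
    m = len(table)
    probe = k % m              # h1(k): first position to check
    offset = R - (k % R)       # h2(k): step size, always between 1 and R
    for _ in range(m):
        if table[probe] is None:
            table[probe] = k
            return probe
        probe = (probe + offset) % m
    raise OverflowError("no empty slot reachable with this offset")

table = [None] * 10
for k in [89, 18, 49, 58, 69]:
    double_hash_insert(table, k)

print(table)   # [69, None, None, 58, None, None, 49, None, 18, 89]
```

Because h2(k) is never 0, a colliding key always moves to a different slot on the next probe.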

Double Hashing
• If the table size is not prime, it is possible to run out of alternative
locations prematurely
• Advantages
  • Distributes keys more uniformly than linear probing
  • Reduces clustering
  • Allows for smaller tables (higher load factors) than linear or
    quadratic probing, but at the expense of a higher cost to compute
    the next probe
• Disadvantages
  • As the table fills up, performance degrades
  • Computing two hash functions is time-consuming
  • Poor cache performance

Rehashing
• Rehashing is done when
  • The table is mostly full and operations are getting slow
  • An insertion fails
  • The load factor exceeds its bound
• Steps for rehashing
  • Build another hash table with an increased TableSize
  • Re-insert every element, recomputing its hash value with the new
    hash function

Example - Rehashing
• Before: hash table with linear probing, TableSize m = 7,
  with input 13, 15, 6, 24
• The same hash table with linear probing after 23 is inserted
• AFTER REHASHING: TableSize m = 17
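
The rehashing example can be sketched in Python as follows. The slides only give the table sizes 7 and 17; choosing the new size as the first prime at or above twice the old size is an assumption of this sketch, as are the helper names.

```python
def next_prime(n):
    """Smallest prime greater than or equal to n."""
    def is_prime(x):
        return x >= 2 and all(x % d for d in range(2, int(x ** 0.5) + 1))
    while not is_prime(n):
        n += 1
    return n

def insert(table, k):
    """Linear probing insert with h(k) = k mod m."""
    m = len(table)
    probe = k % m
    while table[probe] is not None:
        probe = (probe + 1) % m
    table[probe] = k

def rehash(old_table):
    """Build a larger table and re-insert every element into it."""
    new_table = [None] * next_prime(2 * len(old_table))
    for k in old_table:
        if k is not None:
            insert(new_table, k)       # hash value recomputed for the new size
    return new_table

table = [None] * 7
for k in [13, 15, 6, 24, 23]:
    insert(table, k)
print(table)        # [6, 15, 23, 24, None, None, 13]

table = rehash(table)
print(len(table))   # 17
```

After rehashing, the five keys occupy slots 6, 7, 8, 13 and 15 of the 17-slot table, so the load factor drops from 5/7 to 5/17.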

Extendible Hashing
• When the table gets too full
  • Rehashing can be done, but it is expensive
  • Extendible hashing can be used instead
• Extendible hashing
  • Allows a search in 2 disk accesses
  • Insertions also require few disk accesses
• It is a dynamic hashing method that uses
  • Directory
  • Buckets

Extendible Hashing
• Directory
  • Array with 2^d entries, where d is the number of directory levels,
    called the global depth
  • Global depth d: the number of bits used from each hash value
  • d bits are used to choose the directory entry for key
    insertion and searching
  • Can grow, but its size is always a power of 2
  • Each entry holds a bucket address (pointer), which is used to access a bucket
  • Multiple directory entries may point to the same bucket
• Bucket
  • Has a local depth d′ that indicates how many of the d bits of the hash
    value are actually used to indicate membership in the bucket
  • Keys are stored in buckets

Example – Extendible Hashing Searching
[Figure: directory with 4 entries (d = global depth) whose pointers lead to buckets, each labelled with its local depth d′]
• Hash function h(k) = k mod 4
• To search 15: h(15) = 15 % 4 = 3 (11 in binary),
  which points to bucket D

Extendible Hashing Insertion
• Assume each hashed key is a sequence of four binary digits.
➯ Store values 0001, 1001, 1100
• As d = 1, the first bit of the key is used for the directory look-up
[Figure: directory with global depth d = 1 pointing to Bucket A and Bucket B, which hold 0001, 1001, 1100]

Extendible Hashing Insertion (contd.)
[Figure: directory with Bucket A and Bucket B]

Extendible Hashing Insertion (contd.)
• Insert 1111: the directory grows by one level

Overflow Handling during Insertion
• If overflow occurs
  • Case 1: local depth of the overflown bucket = global depth before
    the split
    • Directory doubles (grows) and global depth is incremented (d++)
    • Bucket is split into two and local depth is incremented (d′++)
    • Keys are redistributed between the split buckets
  • Case 2: local depth of the overflown bucket < global depth before
    the split
    • Bucket is split into two and local depth is incremented (d′++)
    • No change in the directory (d remains the same)
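
The two overflow cases can be sketched in Python. This is a simplified in-memory illustration, not a disk-based implementation: it assumes integer keys, uses the low-order d bits of the key as the directory index (that is, h(k) = k mod 2^d, as in the k mod 4 example), and a bucket capacity of 2. All class and attribute names are hypothetical.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []

class ExtendibleHashTable:
    def __init__(self, bucket_capacity=2):
        self.capacity = bucket_capacity
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]       # 2^d entries, d = 1

    def _index(self, k):
        return k & ((1 << self.global_depth) - 1)     # low d bits = k mod 2^d

    def search(self, k):
        return k in self.directory[self._index(k)].keys

    def insert(self, k):
        bucket = self.directory[self._index(k)]
        if k in bucket.keys:
            return
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(k)
            return
        # Overflow. Case 1: local depth == global depth, so the directory doubles.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Both cases: split the bucket and increment its local depth.
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = sibling           # redirect half of the pointers
        # Redistribute the old keys, then insert k (this may split again).
        old_keys, bucket.keys = bucket.keys, []
        for key in old_keys + [k]:
            self.insert(key)

t = ExtendibleHashTable()
for k in [1, 3, 2, 5]:     # inserting 5 overflows a bucket with d' = d: Case 1
    t.insert(k)
print(t.global_depth)      # 2
for k in [4, 6]:           # inserting 6 overflows a bucket with d' < d: Case 2
    t.insert(k)
print(t.global_depth)      # 2 (the directory did not grow)
```

In Case 2 only the pointers of the overflown bucket change, which is why the directory size stays at 4 entries.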

Example - Overflow Handling during Insertion
Inserting 63
• h(63) = 63 % 4 = 3 (11 in binary), which points to bucket D, which overflows
• As d = d′, Case 1 applies: the directory is doubled and bucket D is split
• The global depth d is incremented, and the local depth d′ of each
  bucket produced by the split is incremented
• After the split, h(63) = 63 % 8 = 7 (111 in binary),
  which points to bucket D′

Example - Extendible Hashing Insertion
After inserting 17 and 13
• h(17) = 17 % 8 = 1 (001), which points to bucket B
• h(13) = 13 % 8 = 5 (101), which points to bucket B′

Extendible Hashing Deletion
• If deletions cause a bucket to be substantially less than
full
  • Find a buddy bucket to collapse with
  • Two buckets are buddies if:
    • They are at the same depth
    • Their initial bit strings are the same
    • Collapsing them will fit all records in one bucket
• Collapse if a bucket is empty

Example - Extendible Hashing Deletion

Extendible Hashing
• Advantages
• Key search takes only one disk access if the directory can be
kept in RAM, otherwise it takes two
• Disadvantages
• Doubling the directory is a costly operation
• Directory may outgrow main memory

Applications
• Compilers use hash tables to keep track of declared variables
• On-line spell checkers
  • "Hash" an entire dictionary
  • Quickly check whether words are spelled correctly, in constant
    time

Applications
• Password checkers
