HASHING
By Abdul Ghaffar Khan
Contents
- Basic Concepts
- Hashing Functions
- Collision Resolution Techniques
Hashing: The Basic Idea
We would like to build a data structure for which both the insertion and find operations are O(1) in the worst case. If we cannot guarantee O(1) performance in the worst case, then we make it our design objective to achieve O(1) performance in the average case.
To meet the objective of constant-time insert and find operations, we need a way to do them without performing a search. That is, given an item x, we need to be able to determine directly from x the array position where it is to be stored.
Hashing: The Basic Idea
Hash tables are widely used data structures because insertion, deletion, and search can all be implemented to run in constant average time. The general model of a hash table is:
[Figure: general model of a hash table]
Hashing: The Basic Idea
- Items in a hash table have two parts: a key used for indexing and one or more data fields. Typically, the key is built from one or more of the data fields.
- The number of cells in the table is TableSize. Note that the table might be empty.
- The number of items is the number of cells actually in use.
- As with arrays, each cell in the table is indexed by a number, 0…TableSize-1.
- The index is obtained from the key by a mapping function known as the hash function, which ideally should provide a fast method for computing a unique cell index for each key.
Hash function
- Must return a valid table location.
- Should be easy to implement.
- Should ideally be a 1-to-1 mapping (to avoid collisions): if key1 != key2 then hash(key1) != hash(key2).
- A collision occurs when two distinct keys hash to the same location in the array.
- Should distribute the keys evenly: any key value k is equally likely to hash to any of the m array locations.
Standard Hash Function
hashValue = key mod TableSize
Example (TableSize = 1000):
  4112041: 12041 mod 1000 = 41
  4163490: 63490 mod 1000 = 490
TableSize should be a prime number for an even distribution.
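The effect of a prime table size can be seen on a small set of keys. The sketch below is illustrative only: the class name ModHashDemo, the helper DistinctCells, and the sample keys are assumptions, not part of the slides. It counts how many different cells the keys 10, 20, ..., 70 occupy for a table of size 10 and for a table of prime size 11.

```csharp
using System;
using System.Collections.Generic;

static class ModHashDemo
{
    // Division-method hash; assumes key is non-negative.
    public static int Hash(int key, int tableSize) => key % tableSize;

    // Hypothetical helper: counts how many different cells a set of keys lands in.
    public static int DistinctCells(int[] keys, int tableSize)
    {
        var cells = new HashSet<int>();
        foreach (int k in keys) cells.Add(Hash(k, tableSize));
        return cells.Count;
    }

    static void Main()
    {
        int[] keys = { 10, 20, 30, 40, 50, 60, 70 };
        Console.WriteLine(Hash(12041, 1000));       // 41
        Console.WriteLine(DistinctCells(keys, 10)); // 1: every key lands in cell 0
        Console.WriteLine(DistinctCells(keys, 11)); // 7: every key gets its own cell
    }
}
```

With size 10 all seven keys collide in one cell, while the prime size 11 spreads them over seven cells.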
Hash Function Examples
Typically the keys are strings, encoded in either ASCII or Unicode. Here is a typical hash function:

    public static int Hash(string key, int tableSize)
    {
        int hashVal = 0;
        char c;
        // Add up the character codes of the key.
        for (int i = 0; i < key.Length; i++)
        {
            c = key[i];
            hashVal += (int)c;
        }
        // Reduce the sum to a valid index in 0..tableSize-1.
        return hashVal % tableSize;
    }
Hash Function Examples
hash = (k_0 + 27*k_1 + 27^2*k_2 + ...) mod TableSize
Example: 3-character key
hash = (k_0 + 27*k_1 + 27^2*k_2) mod TableSize
hash = (k_0 + 27*(k_1 + 27*k_2)) mod TableSize

    // Assumes key has at least three characters; only the first three are used.
    public static int HashFig53(string key, int tableSize)
    {
        int aNumber;
        aNumber = key[0] + 27 * key[1] + 729 * key[2]; // 729 = 27^2
        aNumber = aNumber % tableSize;
        return aNumber;
    }
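The nested form of the formula extends naturally to keys of any length. The following is a minimal sketch, not the slides' own code: the class name PolyHashDemo is an assumption, and reducing modulo tableSize at every step is a choice made here to keep the running value small.

```csharp
using System;

static class PolyHashDemo
{
    // Evaluates (k_0 + 27*k_1 + 27^2*k_2 + ...) mod tableSize by nesting the
    // multiplications, starting from the last character.
    // Assumes tableSize is at most a few million so 27 * hashVal fits in an int.
    public static int Hash(string key, int tableSize)
    {
        int hashVal = 0;
        for (int i = key.Length - 1; i >= 0; i--)
        {
            hashVal = (27 * hashVal + key[i]) % tableSize;
        }
        return hashVal;
    }

    static void Main()
    {
        // 'a' = 97, 'b' = 98, 'c' = 99: 97 + 27*98 + 729*99 = 74914
        Console.WriteLine(Hash("abc", 10007)); // 4865
    }
}
```

For a 3-character key this returns the same value as HashFig53, and it also accepts shorter or longer keys.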
Hash Function Examples
Example: We wish to implement a searchable container that holds character strings from a set of strings K. Suppose we define a function h as given by the table:
[Table: value of h(x) for each string x in K]
Then we can implement the searchable container using a table of length n = 12. To insert item x, we simply store it at position h(x)-1 of the table. Similarly, to locate item x, we simply check whether it is found at position h(x)-1.
Collision
When an element is inserted, if it hashes to the same value as an already inserted element, then we have a collision.
Collision resolution techniques:
- Separate Chaining
- Open Addressing: Linear Probing, Quadratic Probing, Double Hashing
Separate Chaining
This is a technique used to resolve collisions. The idea is to keep all items that hash to the same value in a sorted list.
It is a nice example of a data structure that is implemented as a combination of two data structures: a hash table and a set of sorted linked lists. The operations are therefore implemented in terms of those data structures. For example, to find an item in the table:
1. Use the hash value as an index into the table to find the i-th linked list.
2. Traverse that list to find the element.
Example: assume hash(x) = x mod 10.
[Figure: separate-chaining table for hash(x) = x mod 10]
Separate Chaining
- Load factor λ = number of elements / table size
- Average length of a list = λ
- Successful search cost: about 1 + (λ/2) link traversals
- The cost depends on λ
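The two-step find and the load factor can be sketched as follows. This is a minimal illustration for integer keys, not a definitive implementation: the class name ChainedTable, its members, and the sample keys are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

class ChainedTable
{
    private readonly LinkedList<int>[] cells;
    public int Count { get; private set; }

    public ChainedTable(int tableSize)
    {
        cells = new LinkedList<int>[tableSize];
        for (int i = 0; i < tableSize; i++) cells[i] = new LinkedList<int>();
    }

    private int Hash(int x) => x % cells.Length; // assumes x >= 0

    // Step 1: pick the list by hash value. Step 2: traverse it.
    public bool Contains(int x) => cells[Hash(x)].Contains(x);

    // Keeps each list sorted in ascending order and ignores duplicates.
    public void Insert(int x)
    {
        LinkedList<int> list = cells[Hash(x)];
        var node = list.First;
        while (node != null && node.Value < x) node = node.Next;
        if (node != null && node.Value == x) return; // already present
        if (node == null) list.AddLast(x); else list.AddBefore(node, x);
        Count++;
    }

    // Load factor = number of elements / table size.
    public double LoadFactor => (double)Count / cells.Length;

    static void Main()
    {
        var table = new ChainedTable(10);
        foreach (int x in new[] { 0, 81, 1, 64, 4, 25, 36, 16, 49, 9 }) table.Insert(x);
        Console.WriteLine(table.Contains(81)); // True
        Console.WriteLine(table.LoadFactor);   // 1
    }
}
```

Ten keys in ten cells give a load factor of 1, so the lists hold one element on average even though some cells hold two and others none.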
Open Addressing
- No linked lists: all items are stored in the array itself.
- If a collision occurs, alternative locations are tried until an empty cell is found: try h_0(x), h_1(x), h_2(x), …
- h_i(x) = (hash(x) + f(i)) mod TableSize
- f(i) is the collision resolution strategy.
- Requires a bigger table: λ should be below 0.5.
Linear Probing
If a collision occurs, try the next cell sequentially.
f(i) = i
h_i(x) = (hash(x) + i) mod TableSize
Try hash(x) mod TableSize, (hash(x) + 1) mod TableSize, (hash(x) + 2) mod TableSize, (hash(x) + 3) mod TableSize, …
Linear Probing
Insert: 89, 18, 49, 58, 69 (TableSize = 10, hash(x) = x mod 10)
- 89 is inserted directly into cell 9.
- 18 is inserted directly into cell 8.
- 49 has a collision at cell 9 and is finally put into cell 0.
- 58 has collisions at cells 8, 9, 0 and is finally put into cell 1.
- 69 has collisions at cells 9, 0, 1 and is finally put into cell 2.

Cell:   0   1   2   3   4   5   6   7   8   9
Value:  49  58  69  -   -   -   -   -   18  89
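The insertion trace above can be reproduced with a short sketch. The class name LinearProbingDemo and the use of a nullable int array (null meaning an empty cell) are assumptions chosen for illustration.

```csharp
using System;

static class LinearProbingDemo
{
    // Inserts key using h_i(x) = (hash(x) + i) mod TableSize with hash(x) = x mod TableSize.
    // Returns the cell used, or -1 if every cell is occupied. Assumes key >= 0.
    public static int Insert(int?[] table, int key)
    {
        int size = table.Length;
        for (int i = 0; i < size; i++)
        {
            int cell = (key % size + i) % size;
            if (table[cell] == null)
            {
                table[cell] = key;
                return cell;
            }
        }
        return -1;
    }

    static void Main()
    {
        var table = new int?[10];
        foreach (int key in new[] { 89, 18, 49, 58, 69 })
            Console.WriteLine($"{key} -> cell {Insert(table, key)}");
    }
}
```

Run on the keys from the slide, it places 89, 18, 49, 58, 69 in cells 9, 8, 0, 1, 2 respectively.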
Primary Clustering
- Blocks of occupied cells (called clusters) form in the table.
- A collision occurs whenever a key hashes to any cell inside a cluster.
- Several probes may then be needed before a free cell is found.
- The new item is added to the cluster, making it larger.
Linear Probing: Problems
- Primary clustering.
- Normal deletion cannot be performed: some later find operations would fail, because the chain of probes that leads to the data is cut. Use lazy deletion instead.
- Insertion cost = number of probes to find an empty cell ≈ 1/(fraction of empty cells) = 1/(1-λ). This estimate ignores the effect of clustering.
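Lazy deletion can be sketched by giving every cell a state. The following is an illustrative outline for distinct integer keys with linear probing; the class name LazyTable and the three-state enum are assumptions, not the slides' own design.

```csharp
using System;

class LazyTable
{
    private enum State { Empty, Occupied, Deleted }
    private readonly int[] keys;
    private readonly State[] states; // every cell starts as Empty

    public LazyTable(int size) { keys = new int[size]; states = new State[size]; }

    // Returns the cell holding key, or -1. Deleted cells do not stop the search.
    public int Find(int key)
    {
        int size = keys.Length;
        for (int i = 0; i < size; i++)
        {
            int cell = (key % size + i) % size;
            if (states[cell] == State.Empty) return -1;
            if (states[cell] == State.Occupied && keys[cell] == key) return cell;
        }
        return -1;
    }

    // Assumes key is not already in the table; reuses Deleted cells.
    public bool Insert(int key)
    {
        int size = keys.Length;
        for (int i = 0; i < size; i++)
        {
            int cell = (key % size + i) % size;
            if (states[cell] != State.Occupied)
            {
                keys[cell] = key;
                states[cell] = State.Occupied;
                return true;
            }
        }
        return false;
    }

    // Marks the cell as Deleted instead of emptying it, so probe chains stay intact.
    public void Remove(int key)
    {
        int cell = Find(key);
        if (cell >= 0) states[cell] = State.Deleted;
    }

    static void Main()
    {
        var table = new LazyTable(10);
        foreach (int key in new[] { 89, 18, 49, 58, 69 }) table.Insert(key);
        table.Remove(49);                    // cell 0 becomes Deleted
        Console.WriteLine(table.Find(58));   // 1: the search passes through cell 0
        Console.WriteLine(table.Find(49));   // -1
    }
}
```

If cell 0 were simply emptied, the search for 58 would stop there and wrongly report that 58 is missing.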
Quadratic Probing
- Eliminates primary clustering.
- f(i) = i^2
- h_i(x) = (hash(x) + i^2) mod TableSize
- Try hash(x) mod TableSize, (hash(x) + 1^2) mod TableSize, (hash(x) + 2^2) mod TableSize, (hash(x) + 3^2) mod TableSize, …
- The table must be at most half full and the table size must be prime; otherwise insertion may fail (every probe collides).
Quadratic Probing
Insert: 89, 18, 49, 58, 69 (TableSize = 10, hash(x) = x mod 10)
- Insert 89: try cell 9.
- Insert 18: try cell 8.
- Insert 49: try cells 9, 0.
- Insert 58: try cells 8, 9, 2.
- Insert 69: try cells 9, 0, 3.

Cell:   0   1   2   3   4   5   6   7   8   9
Value:  49  -   58  69  -   -   -   -   18  89
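Quadratic Probing
Insert: 10, 20, 30, 40, 50, 60, 70 (TableSize = 10, hash(x) = x mod 10)
- Insert 10: try cell 0.
- Insert 20: try cells 0, 1.
- Insert 30: try cells 0, 1, 4.
- Insert 40: try cells 0, 1, 4, 9.
- Insert 50: try cells 0, 1, 4, 9, 6 (16).
- Insert 60: try cells 0, 1, 4, 9, 6 (16), 5 (25).
- Insert 70: try cells 0, 1, 4, 9, 6 (16), 5 (25), 6 (36), 9 (49), 4 (64), 1 (81), 0 (100), 1 (121), 4 (144), 9 (169), 6 (196), …
The insertion of 70 never succeeds: i^2 mod 10 only takes the values 0, 1, 4, 5, 6, 9, so cells 2, 3, 7, and 8 are never probed even though they are empty.

Cell:   0   1   2   3   4   5   6   7   8   9
Value:  10  20  -   -   30  60  50  -   -   40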
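Both quadratic probing traces can be checked with a short sketch. The class name QuadraticProbingDemo and the nullable int array are assumptions made for illustration. Because (i + TableSize)^2 and i^2 leave the same remainder, trying i = 0 to TableSize-1 is enough to visit every cell the probe sequence can reach.

```csharp
using System;

static class QuadraticProbingDemo
{
    // Inserts key using h_i(x) = (hash(x) + i^2) mod TableSize with hash(x) = x mod TableSize.
    // Returns the cell used, or -1 if no reachable cell is empty. Assumes key >= 0.
    public static int Insert(int?[] table, int key)
    {
        int size = table.Length;
        for (int i = 0; i < size; i++)
        {
            int cell = (key % size + i * i) % size;
            if (table[cell] == null)
            {
                table[cell] = key;
                return cell;
            }
        }
        return -1;
    }

    static void Main()
    {
        var first = new int?[10];
        foreach (int key in new[] { 89, 18, 49, 58, 69 })
            Console.WriteLine($"{key} -> cell {Insert(first, key)}");

        var second = new int?[10];
        foreach (int key in new[] { 10, 20, 30, 40, 50, 60, 70 })
            Console.WriteLine($"{key} -> cell {Insert(second, key)}");
    }
}
```

The first run yields cells 9, 8, 0, 2, 3. The second yields 0, 1, 4, 9, 6, 5 and then -1 for 70, although four cells of the table are still empty.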
Quadratic Probing: Secondary Clustering
Elements that hash to the same position probe the same sequence of alternative cells and are placed in the next available one, forming a cluster.
In the first example, inserting 89, 49, 69 forms one secondary cluster, and inserting 18, 58 forms another.
Double Hashing
f(i) = i * hash_2(x)
h_i(x) = (hash(x) + i * hash_2(x)) mod TableSize
Try hash(x) mod TableSize, (hash(x) + hash_2(x)) mod TableSize, (hash(x) + 2*hash_2(x)) mod TableSize, …
Example: hash_2(x) = R - (x mod R), where R is a prime number smaller than TableSize.
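Double Hashing
Insert: 89, 18, 49, 58, 69, 23 (TableSize = 10, hash(x) = x mod 10, R = 7)
hash_2(49) = 7 - (49 mod 7) = 7
hash_2(58) = 7 - (58 mod 7) = 5
hash_2(69) = 7 - (69 mod 7) = 1
hash_2(23) = 7 - (23 mod 7) = 5
- Insert 49: try 9, (9 + 7) mod 10 = 6.
- Insert 58: try 8, (8 + 5) mod 10 = 3.
- Insert 69: try 9, (9 + 1) mod 10 = 0.
- Insert 23: try 3, (3 + 5) mod 10 = 8, (3 + 10) mod 10 = 3, (3 + 15) mod 10 = 8, …
The insertion of 23 fails: the step 5 and the table size 10 share a common factor, so the probe sequence only alternates between cells 3 and 8, which are both occupied.

Cell:   0   1   2   3   4   5   6   7   8   9
Value:  69  -   -   58  -   -   49  -   18  89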
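The double hashing trace can be reproduced with the sketch below. The class name DoubleHashingDemo and the nullable int array are assumptions made for illustration; R = 7 and hash_2(x) = R - (x mod R) are taken from the example.

```csharp
using System;

static class DoubleHashingDemo
{
    const int R = 7; // prime smaller than the table size used in the example

    public static int Hash2(int x) => R - (x % R);

    // Inserts key using h_i(x) = (hash(x) + i * hash_2(x)) mod TableSize.
    // Returns the cell used, or -1 if no reachable cell is empty. Assumes key >= 0.
    public static int Insert(int?[] table, int key)
    {
        int size = table.Length;
        int step = Hash2(key);
        for (int i = 0; i < size; i++)
        {
            int cell = (key % size + i * step) % size;
            if (table[cell] == null)
            {
                table[cell] = key;
                return cell;
            }
        }
        return -1;
    }

    static void Main()
    {
        var table = new int?[10];
        foreach (int key in new[] { 89, 18, 49, 58, 69, 23 })
            Console.WriteLine($"{key} -> cell {Insert(table, key)}");
    }
}
```

The keys land in cells 9, 8, 6, 3, 0, and the call for 23 returns -1, matching the failed insertion in the trace.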
Rehashing
- When the table is too full, create a new table at least twice as big (with a prime size), compute the new hash value of each element, and insert it into the new table.
- Rehash when the table is half full, when an insertion fails, or when a chosen load factor is reached.
- Because of lazy deletion, deleted cells are also counted when the load factor is calculated.
- Rehashing takes O(N) time, but the cost is shared by the preceding N/2 insertions, so it adds only a constant cost to each insertion.
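The rehashing step can be sketched as follows. This is an illustrative outline only: the class name RehashDemo, the helper NextPrime, the use of linear probing for the reinsertions, and the sample keys are all assumptions.

```csharp
using System;

static class RehashDemo
{
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }

    // Smallest prime that is at least n.
    public static int NextPrime(int n)
    {
        while (!IsPrime(n)) n++;
        return n;
    }

    // Places a non-negative key with linear probing; the table must have a free cell.
    static void Place(int?[] table, int key)
    {
        int cell = key % table.Length;
        while (table[cell] != null) cell = (cell + 1) % table.Length;
        table[cell] = key;
    }

    // Builds a table at least twice as big, with prime size, and reinserts every key
    // using its hash value for the new size.
    public static int?[] Rehash(int?[] old)
    {
        var bigger = new int?[NextPrime(2 * old.Length)];
        foreach (int? key in old)
            if (key != null) Place(bigger, key.Value);
        return bigger;
    }

    static void Main()
    {
        // Table of size 7 holding 6, 15, 24, 13 (hash(x) = x mod 7, linear probing).
        int?[] old = { 6, 15, null, 24, null, null, 13 };
        int?[] bigger = Rehash(old);
        Console.WriteLine(bigger.Length); // 17
    }
}
```

Each key is visited once, which is where the O(N) rehashing time comes from; in the new table of size 17 the keys 6, 15, 24, 13 move to cells 6, 15, 7, 13.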