Things to be discussed here,
 Introduction
 Hash Table
 Implementation
 Practice Problem
Introduction
If you have programming experience and don't know what a Hash Table is,
chances are you use them regularly without realizing it. They are so useful and
practical that many programming languages have a hash table construct built
into the syntax.
They are referred to as associative arrays (PHP) or dictionaries
(Python) or hashes (PERL). By creating your hash table, you can tailor it to
your specific needs with optimal efficiency. In this article, I'll try to
provide a basic understanding of hash tables. In the end, I'll provide a full
implementation of a hash table in C++.
“If I were stranded
on a desert island and could only take one data structure with me, it would be
a hash table.”
 Peter Van Der Linden, Author of Deep C Secrets
Hash Table
So, what is a hash table? Think of it as arrays on steroids. It’s a
versatile data structure that can solve many kinds of problems, and it can also
store a huge variety of data types and quantities super efficiently.
A hash table is made up of two different parts: An array (which
you're already familiar with) and a hash function (which is probably a
new concept!). So why these parts? The array is what holds our data and the
hash function decides where exactly our data goes in the array  and how
we'll get it out when the time comes.
Hashing is a technique that is used to uniquely identify a specific object from a group of similar objects. Some examples of how hashing is used in
our lives include:
 In universities, each the student is assigned a unique roll number that can be used to retrieve
information about them.
 In libraries, each book is
assigned a unique number that can be used to determine information about
the book, such as its exact position in the library or the users it has
been issued to etc.
In both these examples, the students and books were hashed to a unique
number.
More formally, the Hash table is a data structure that represents data
in the form of keyvalue pairs. Each key is mapped to a value in the
hash table. The keys are used for indexing the values/data.
Hash Function
Any function that can be used to map a dataset of arbitrary size to a
dataset of a fixed size that falls into the hash table is called a hash
function. The values returned by a hash function are called hash values, hash
codes, hash sums, or simply hashes.
To achieve a good hashing mechanism, it is important to have a good hash
function with the following basic requirements:
 Easy to compute: It should
be easy to compute and must not become an algorithm.
 Uniform distribution: It
should provide a uniform distribution across the hash table and should not
result in clustering.
 Fewer collisions: Collisions
occur when pairs of elements are mapped to the same hash value. These
should be avoided.
Note: Irrespective of how good a hash function is, collisions are bound to
occur. Therefore, to maintain the performance of a hash table, it is important
to manage collisions through various collision resolution techniques.
Operations

Average Case

Worst Case

space

O(n)

O(n)

insert

O(1)

O(n)

lookup

O(1)

O(n)

delete

O(1)

O(n)

Strengths
 Fast lookups. Lookups take O(1) time on average.
 Flexible keys. Most data types can be
used for keys if they're hashable.
Weaknesses
 Slow worstcase lookups. Lookups take O(n) time in the worst case.
 Unordered. Keys aren't stored in a
special order. If you're looking for the smallest key, the largest key, or
all the keys in a range, you'll need to look through every key to find it.
 Singledirectional lookups. While you can look up the value
for a given key in O(1)
time, looking up the keys for a given value requires looping
through the whole dataset  O(n) time.
 Not cachefriendly. Many hash table
implementations use linked lists, which don't put data next to each other
in memory.
Collision Problem
Let’s just take a quick pause to think about hash functions for a second
 since we’re converting data of any length x into data of fixed size y,
dataset x will be way larger than dataset y. There are infinite
entries in dataset x, but limited ones in dataset y because all
the numbers must be a certain size or length. In other words, there could be
two or more pieces of data in dataset x that will hash to the same
integer in dataset y.
This is called the collision problem, and here are some ways to
deal with it:
 Separate Chaining
 Linear probing
 Quadratic Probing
 Double Hashing
Separate Chaining
When two or more data collide and hash to the same index, we
could simply link the data up into a linked list at that index of the array, as
you can see above.
Of course, this is the kind of situation we’re going to want to avoid
because instead of using constant time O(1) to find a key and its corresponding
information in the array, we end up needing O(n) time to walk through the linked
lists to find our actual values.
Linear Probing
Another way to deal with collisions is to walk down the array from the
collided index one by one until you get to the next free slot.
Quadratic Probing
Quadratic probing is like linear probing and the only difference is
instead of going one by one we follow an arbitrary polynomial to get to the
next unoccupied index.
Double hashing
Double hashing is like linear probing and the only difference is the
interval between successive jumps. Here, the interval between jumps is computed
by using two hash functions.
Dynamic array
resizing
Suppose we keep adding more items to our hash map. As the number of keys
and values in our hash map exceeds the number of indices in the underlying
array, hash collisions become inevitable.
To mitigate this, we could expand our underlying array whenever things
start to get crowded. That requires allocating a larger array and rehashing all
our existing keys to figure out their new position in O(n) time.
Applications
Hash tables are implemented where
 Constant time lookup and
insertion is required.
 Cryptographic applications
are used.
 Indexing data is required.
For example,
 Associative arrays: Hash tables are commonly
used to implement many types of inmemory tables. They are used to
implement associative arrays (arrays whose indices are arbitrary strings
or other complicated objects).
 Database indexing: Hash tables may also be
used as diskbased data structures and database indices (such as in DBM).
 Caches: Hash tables can be used to
implement caches i.e. auxiliary data tables that are used to speed up
access to data, which is primarily stored in slower media.
 Object representation: Several dynamic languages,
such as Perl, Python, JavaScript, and Ruby use hash tables to implement
objects.
 Hash Functions are used in
various algorithms to make their computing faster.
Implementation in C++
The implementation of Hash Table using separate chaining to resolve
collisions is given here,
If you have any doubts or issues in the implementation of other types of
hashing, feel free to comment below.
Practice Problems
 Bear and Three Musketeers ( 574B )
 Gargari and Bishops ( 463C )
 King's Path ( 242C )
 Train and Peter ( 8A )
 Correcting Mistakes ( 533E )
If you have any doubt or any content related to the abovediscussed topics, feel free to drop in the comment section
Thank You for Reading
Thank You for Reading
Comments
Post a Comment