In this blog post, we will investigate the use of hashing algorithms to quickly locate a record in a large database.
Let’s consider the database of members of a social network such as Instagram, Twitter or Facebook.
Every time the user uses the website or their smartphone app to access the social network, the server retrieves the user’s login name and password. It then has to find the record matching the given username to access their data and verify their password.
Large social networks have millions of members. Locating a single record in a table that contains millions of records can be time consuming especially if you perform a linear search to locate the record. (Linear Big O Notation). Alternative algorithms such as a binary search can speed up the process (Logarithmic Big O Notation), however a binary search only works when the data is sorted in alphabetical order. This is unlikely to be the case for a social network members table as members will have joined the network at different dates, hence the data will not be sorted in alphabetical order of their usernames. The assumption is that the records are stored in chronological order (based on the date they first signed in).
Using a Hashing Algorithm
To solve this problem, social networks use hashing algorithms when storing and accessing data in a large database.
A hashing algorithm is a complex mathematical calculation that takes an input (a.k.a. the key) (in this case the username of the member) and returns a value called a hash value or hash. When used for memory addressing the hash value generated is the memory location of where the record is stored.
Let’s consider a very basic hashing algorithm used to identify the memory location of a new member who has just signed up.
Our hashing algorithm will add up all the ASCII values of each character of their username. We will assume our database will contain around 50 memory locations. So our hashing algorithm will use the remainder of dividing the total ASCII value by 50 to get a unique number between 0 and 50.
For instance, for a user called James Bond whose username is “Bond”, the resulting hash value would be:
hashKey("Bond") = (ASCII("B") + ASCII("o") + ASCII ("n") + ASCII("d")) MOD 50 = ( 66 + 111 + 110 + 100 ) MOD 50 = 387 MOD 50 = 37
The details can be stored on location 37 of the members table (hash table) provided below.
Every time the user James Bond accesses the social network, he will provide his username. As part of the login process, the server will use the hashing algorithm on this username to quickly calculate the memory location of where James Bond’s record is stored. This is a very efficient approach to access the data. (Constant Big O notation).
Collisions
The hash values generated by a hashing algorithm should be fairly unique. However there will be occasions where two input values return the same hash value. For instance, let’s consider what would happen if a new user, Austin Powers signs in and their username is aPowers:
hashkey("aPowers") = (ASCII("a") + ASCII("P") + ASCII ("o") + ASCII("w") + ASCII("e") + ASCII("r") + ASCII("s")) MOD 50 = ( 97 + 80 + 111 + 119 + 101 + 114 + 115 ) MOD 50 = 737 MOD 50 = 37
This hash is the same as the hash generated for James Bond’s username. This create a collision. In this case, instead of overwritting the content of memory location 37 (already used by James Bond) the algorithm will jump to location 37 and carry on a linear search from there to find the next empty memory location. This could be memory location 38 if it’s empty, 39 if not, and so on till an empty location is found.
This will slightly slow down the process when trying to locate Austin Powers’ record but would still be far more efficient than without using a hashing algorithm.
Complex hashing algorithms are designed the minimize the risk of collisions: The effectiveness of a hashing algorithm is based on the total number of unique values the algorithm will generate to minimize the risks of collisions.
Hashing Algorithm in Action!
We have implemented our basic hashing algorithm as described above.
Your task is to use it to generate the hash values for these new members of our social network.
Using these hash values, you will then be able to add their information on the members hash table (see “Members Table” tab below). You will notice that a few members have already been stored in this table.
Warning: On occasion the generated hash may generate a collision with an existing member. In this case you you will have to use the next available location to store the members detail in the members table.
The new members to add are:
- Jason Bourne, username: jBourne
- Johny English, username: English
- Lara Croft, username: lCroft
Also, a few users are logging in. Can you use the hashing algorithm to quickly locate their records in the members hash table and access their details.
The users are:
- jBauer
- eHunt
- Holmes
Memory Location | Username | Firstname | Lastname |
0 | |||
1 | |||
2 | |||
3 | |||
4 | |||
5 | |||
6 | |||
7 | |||
8 | |||
9 | |||
10 | |||
11 | |||
12 | |||
13 | |||
14 | |||
15 | |||
16 | |||
17 | |||
18 | |||
19 | |||
20 | |||
21 | |||
22 | |||
23 | |||
24 | |||
25 | |||
26 | |||
27 | |||
28 | |||
29 | |||
30 | |||
31 | |||
32 | |||
33 | |||
34 | |||
35 | |||
36 | |||
37 | |||
38 | |||
39 | |||
40 | |||
41 | |||
42 | |||
43 | |||
44 | |||
45 | |||
46 | |||
47 | |||
48 | |||
49 | |||
50 |