We learnt in an earlier article that encryption transforms plaintext into ciphertext by applying an algorithm. The ciphertext is usually as long as or longer than the plaintext, and its purpose is to hide the plaintext altogether. A hash is a slightly different concept. When a hashing algorithm (or hash function) is applied to a plaintext, the output is usually a value of fixed size, however long the input is. With a good hash function, different inputs are very unlikely to produce the same output. Because the output has a fixed size, though, some different inputs must share a hash value (a so-called collision), and some simple hashing algorithms produce such duplicates very easily. We will see one such example below.
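The fixed-size property can be seen with SHA-256, used here only as a widely available example (this article does not single out any particular algorithm). The minimal sketch below uses Python's standard `hashlib` module; the helper name `sha256_hex` and the sample strings are illustrative choices, not part of any standard.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text (UTF-8 encoded) as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

samples = ["A", "Aman", "A much longer piece of plaintext " * 100]
for text in samples:
    digest = sha256_hex(text)
    # The input length varies, but the digest is always 64 hex characters (256 bits).
    print(f"input length {len(text):5d} -> digest length {len(digest)}: {digest[:16]}...")
```

A one-character input and an input several thousand characters long both produce a 64-character digest.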
Let us try to understand hashing by way of an example. Suppose there is a hash function that takes an input string and returns the sum of the ASCII values of its characters. Therefore:
- Aman will yield a result of 381 (A=65, m=109, a=97, n=110)
- John will yield a result of 399 (J=74, o=111, h=104, n=110)
Now if we change Aman to Amam, the result changes to 380 (A=65, m=109, a=97, m=109). Note that this hash function (taking the sum of ASCII values) is not a reliable one, as it generates the same value for Aman, Anam, Aamn, Aanm, and every other rearrangement of the same letters. It is used here only for illustration. Real-world algorithms are much more complex, and for well-designed modern ones such as SHA-256 it is computationally infeasible to find two different inputs that produce the same hash value.
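The toy hash described above can be written in a few lines of Python. This is only a sketch of the illustrative function from this article (the name `ascii_sum_hash` is hypothetical); it reproduces the values computed by hand and shows why rearranged letters always collide.

```python
def ascii_sum_hash(text: str) -> int:
    """Toy hash: the sum of the character codes of text.
    NOT suitable for any real use, because anagrams always collide."""
    return sum(ord(ch) for ch in text)

for word in ["Aman", "John", "Amam"]:
    print(word, ascii_sum_hash(word))  # Aman 381, John 399, Amam 380

# Rearranging the same letters gives the same sum, i.e. a collision.
for word in ["Aman", "Anam", "Aamn", "Aanm"]:
    print(word, ascii_sum_hash(word))  # 381 every time
```

Because addition ignores order, the function cannot tell anagrams apart, which is exactly the weakness real hash functions are designed to avoid.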
Thus we see that with a good hash function, any change in the input, even a single character, leads to a change in the calculated hash value. This property is used to check the integrity of data by sharing the hash value along with the original message. The recipient generates a fresh hash value from the received message and compares it with the hash value received with the message. If both match, the message has not been tampered with; if they differ, it has. This check is only meaningful if the hash value itself reaches the recipient unaltered, since anyone who can change the message in transit could otherwise replace the hash as well.
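The sender/recipient check described above can be sketched as follows, again assuming SHA-256 as the hash function. The function names and the sample message are hypothetical. In practice a keyed construction such as HMAC, or a digital signature, is typically used so that an attacker cannot simply recompute the hash after altering the message; this sketch shows only the basic compare-the-hashes idea.

```python
import hashlib
import hmac

def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()

def verify(received: bytes, received_hash: str) -> bool:
    """Recompute the hash of the received message and compare it with the
    hash that arrived alongside it (compare_digest avoids timing leaks)."""
    return hmac.compare_digest(sha256_hex(received), received_hash)

# Sender side: send the message together with its hash.
message = b"Transfer 100 to Aman"
sent_hash = sha256_hex(message)

# Recipient side.
print(verify(message, sent_hash))                  # True: unchanged
print(verify(b"Transfer 900 to Aman", sent_hash))  # False: altered

# A one-letter change (Aman -> Amam) produces a completely different digest.
print(sha256_hex(b"Aman"))
print(sha256_hex(b"Amam"))
```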
Similarly, passwords can be stored in hashed form, so that even if there is a breach and an attacker gains access to the website's internal user database, they do not directly obtain the users' passwords. When a user logs in, the system hashes the password they supply and compares the result with the hash stored in the database.
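The login check above is simplified. Real systems usually add a random per-user salt and use a deliberately slow password-hashing function, so that identical passwords get different stored hashes and guessing is expensive. The sketch below assumes PBKDF2-HMAC-SHA256 from Python's `hashlib` (bcrypt, scrypt and Argon2 are common alternatives); the function names and the iteration count are illustrative assumptions, not a recommendation for any particular system.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # illustrative work factor; tune to current guidance

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, hash) to store. A random salt makes equal passwords hash differently."""
    salt = os.urandom(16)
    pw_hash = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, pw_hash

def check_password(password: str, salt: bytes, stored_hash: bytes) -> bool:
    """Hash the supplied password with the stored salt and compare with the stored hash."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_hash)

# Registration: store salt and hash, never the password itself.
salt, stored = hash_password("correct horse")

# Login attempts.
print(check_password("correct horse", salt, stored))  # True
print(check_password("wrong guess", salt, stored))    # False
```

The database only ever holds the salt and the derived hash, so a leaked row does not reveal the password directly.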
Hash functions are also used inside programming languages wherever unique keys need to be mapped onto other data, for example in dictionaries in the Python programming language.
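As a small illustration of the dictionary case: Python calls the built-in `hash()` on each key to decide where to store and later find its value. Note that `hash()` is not a cryptographic hash, and string hashes are randomized between interpreter runs by default, so only equality within one run is relied on here. The dictionary contents are made up for the example.

```python
# Python dicts (and sets) call hash() on each key to locate its slot.
ages = {"Aman": 30, "John": 25}

print(hash("Aman") == hash("Aman"))  # True: same key, same hash within one run
print(ages["John"])                  # 25: lookup hashes "John" to find its slot

# Mutable objects such as lists are unhashable, so they cannot be dictionary keys.
try:
    ages[["Aman"]] = 1
except TypeError as err:
    print("TypeError:", err)
```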
Note that, unlike encryption, where the original message can be obtained by decryption, the original message cannot be retrieved from a hash value: hashing is a one-way process.
For the Wikipedia entry on hash functions, click here.
For more posts in The Cyber Cops project, click here.