What Is a Hashing Algorithm? A Look at Hash Functions
Hashing functions play a vital role in digital security: they do everything from providing tamper resistance for email communications to securing your software supply chain.
Just look back at your day so far and try to think how many times you have accessed a website, received an online message, or sent an email. Did you worry that someone could intercept your message and tamper with it? For most of us, the answer would be no. The major credit here goes to the use of hashing algorithms — a technological invention that plays a key role in taking cryptography where it is today.
Do you want to know the importance of hashing algorithms to the world of security? Just ask Yahoo. In 2016, it was discovered that hackers had breached the company and obtained the data of 500 million users. Research from Venafi shows that “a surprising number” of Yahoo’s digital certificates used weak hashing algorithms (i.e., MD5), which can leave an organization exposed to data breaches and other security issues.
But what is a hashing algorithm? How do companies use hashing functions and why is hashing important to data security? Let’s break it all down, starting with a quick hashing definition. Then we’ll move on to talking about some of the ways you can use hashing within your organization.
What Is Hashing? A Hashing Definition
In cryptography, hashing is a process that takes data of any size and applies a mathematical function to it to produce a fixed-length string of letters and numbers. No matter how large or small the input data is, a given hashing algorithm always produces a hash output of the same length.
More specifically, hashing takes input data (such as a password or a file) and pairs it with a hashing algorithm (i.e., a mathematical algorithm or hashing function) to create a unique output of a specific length. This output, called a hash value or a hash digest, represents the original data without making that original data known or available to access. This can be used for everything from data organization to file integrity verification.
Because this process is practically irreversible, hashing is considered a one-way cryptographic function.
Hashing Algorithms Create Outputs of Fixed Lengths
Let’s look at the SHA-256 hashes of two different words:
Dinosaur — bc6435e5ee1f976d1a61d19bc3f42164e4d0be09fdeac62b992b67b71e731203
Dog — 0eb129bf94594aaeee66e38361d7be212cd927c3df4dd92e3ded2e0da0c7ad88
As you can see, despite the obvious difference in the length of the text, you get outputs of the same size. But what happens if you make a tiny change to one of the words? Let’s see how these changes affect the hash values for the same word:
Original input: Dinosaur — bc6435e5ee1f976d1a61d19bc3f42164e4d0be09fdeac62b992b67b71e731203
Input variation #1: DInosaur — 9e4a334d6d23566ddedbba353057fe8b1ff6d8999cc3eff2222beec170de1558
Input variation #2: DinosauR — a5ed440e539385d0fad5ae3debb994ccb0092f20375c05b16a7860a9bbb739fb
Input variation #3: DinosAur — 05ca80d5308ce846f13eb4e9ee31deb7f5ea93eb347b87c83d5cfe573b6c2ac0
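Do you see how the hashes are completely different for the same word? The only difference is that we’ve capitalized some of the letters. So, the takeaway here is that even slightly different inputs produce entirely different hashes. (Strictly speaking, two different inputs can share a hash, because there are more possible inputs than fixed-length outputs, but with a strong algorithm such collisions are practically impossible to find.)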
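If you want to reproduce this kind of comparison yourself, here is a minimal sketch using Python’s standard hashlib module. The helper name sha256_hex is only an illustrative choice, and the sketch assumes the words are encoded as UTF-8 before hashing.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as hexadecimal text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


words = ["Dinosaur", "Dog", "DInosaur", "DinosauR", "DinosAur"]
digests = {word: sha256_hex(word) for word in words}

for word, digest in digests.items():
    # Every digest is 64 hex characters (256 bits), whatever the input length.
    print(f"{word:<10} {len(digest)} chars  {digest}")
```

Each line of output should show a 64-character digest, and no two digests in the list should match.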
Where You Can Find Hashing In Action
Because of this unique functionality, hashes are used in identifying or comparing files or databases. For example, when you create a password for a new email account, the email service provider saves the hash value of your password on its server. So, whenever you try to log in, it compares the hash value of the password you just entered to the hash value it has saved on its server. It grants you access if both pieces of data match.
Hashing functions are often used in conjunction with digital signatures. Some of the uses of hashing we’ll talk about later include:
- Storing passwords on servers,
- Protecting the integrity of specific data in SSL/TLS handshakes,
- Providing data integrity assurance in emails and messaging apps, and
- Using hashing functions as part of code signing certificates & digital signature processes.
Why Hashing (and Hash Functions) Matter
Hash functions are important to data security because they allow users to verify the integrity of the files or online communications they receive. Hashing doesn’t stop a file from being changed, but it makes unauthorized changes detectable, so your users can tell when something has been altered.
You can think of hashing as the digital equivalent of tamper-resistant product packaging; if anyone tries to mess with it, then the person who buys the package will recognize that it’s been altered. No matter the input or size, hashing makes sure that it’s packed in the same sized packaging, although no packaging is identical.
What Makes a Strong Hash Algorithm (Hash Function)
Hashing functions play a significant role in cryptography as they carry important characteristics that help in data authentication and keep sensitive data (like passwords) secure. These characteristics include the following:
Irreversibility
Hashing algorithms are one-way functions — you can’t figure out the original input data using the hash value. This means that you can easily convert an input into a hash, but you can’t derive the input from its hash value (so long as the hashing algorithm isn’t broken).
This is important because hashes are used in many functionalities, such as storing passwords on public servers. Because it’s irreversible, bad guys can’t reverse engineer the password from the hash even if they get their hands on the password hash database. Now you get why companies (good ones) don’t store your passwords in plaintext.
Determinism
A hashing algorithm is deterministic: the same input always produces the same hash value, and the output length of a given algorithm is always the same, regardless of the size of the input. The fixed length comes in handy for allocating space for the digest in a data structure, file format, or network protocol field, because you know exactly how much room it’ll require. It also helps to keep hackers from knowing how large the original input was because all outputs, regardless of how long or short the original input is, are the same length.
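Both properties are easy to check with Python’s hashlib. The sketch below is only an illustration: it hashes the same bytes twice, feeds them to the hasher in two pieces, and compares digest lengths for a tiny and a large input. The sample data is arbitrary.

```python
import hashlib

data = b"The quick brown fox jumps over the lazy dog"

# Same input, hashed twice: the digests are identical.
first = hashlib.sha256(data).hexdigest()
second = hashlib.sha256(data).hexdigest()

# Feeding the same bytes in pieces gives the same digest too.
pieces = hashlib.sha256()
pieces.update(data[:10])
pieces.update(data[10:])
chunked = pieces.hexdigest()

# Output length does not depend on input length.
short_len = len(hashlib.sha256(b"a").digest())
long_len = len(hashlib.sha256(b"a" * 1_000_000).digest())

print(first == second == chunked)  # True
print(short_len, long_len)         # 32 32
```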
Collision Resistance
In hashing, a collision occurs when two different inputs produce identical hash values. In such a case, cybercriminals can fool the computer into believing that they have the original input data, even if they don’t have it. This is known as a hash collision attack.
Let’s say attackers have found another input whose hash value is the same as the hash value of your password. They can now submit that other input, which is not the same thing as your password, and the system will accept it because the hashes match. That lets them log into your account even though they don’t have your password in the first place.
Another concern is rainbow tables. Bad guys can create massive numbers of precomputed password-hash combination “chains” and then winnow the information down to storing just the first and last password-hash combination, creating what’s known as a rainbow table. This table enables them to quickly look up unsalted password hashes to figure out the original password inputs.
This is why all hashing algorithms must be resistant to collisions. For password hashing, one way to mitigate the risk of rainbow table attacks is to use salting. (We’ll speak about salting more in a minute.)
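To make the lookup idea concrete, here is a simplified sketch. It builds a plain precomputed hash-to-password dictionary rather than a true rainbow table with chains, and the list of common passwords is made up for illustration. It then shows that the same table stops matching once a random salt is mixed in.

```python
import hashlib
import os


def unsalted_hash(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def salted_hash(password: str, salt: bytes) -> str:
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


# Hypothetical list of common passwords an attacker might precompute.
common_passwords = ["123456", "password", "TweetyBird", "qwerty"]
lookup_table = {unsalted_hash(p): p for p in common_passwords}

# A stolen unsalted hash is recovered with a single dictionary lookup.
stolen_unsalted = unsalted_hash("TweetyBird")
recovered = lookup_table.get(stolen_unsalted)

# With a random per-user salt, the same table no longer matches.
salt = os.urandom(16)
stolen_salted = salted_hash("TweetyBird", salt)
recovered_salted = lookup_table.get(stolen_salted)

print(recovered)         # TweetyBird
print(recovered_salted)  # None
```

A single fast SHA-256 pass is used here only to keep the example short. Real password storage normally combines the salt with a deliberately slow function.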
Avalanche Effect
As we saw in the earlier example of the word ‘Dinosaur,’ even the smallest change in the input results in a significant change in the hash value output. This keeps anyone from working out how similar two inputs are by comparing their hashes. This is known as the ‘avalanche effect’ because it’s similar to the concept of how the tiniest change or shift in the snow build-ups on a mountainside can trigger an avalanche.
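One rough way to measure the avalanche effect is to count how many bits differ between two digests. The sketch below does this for two of the inputs used earlier. The helper differing_bits is a name chosen for this example, and for a well-behaved 256-bit hash you would expect roughly half of the bits to change.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = hashlib.sha256(b"Dinosaur").digest()
variant = hashlib.sha256(b"DInosaur").digest()

changed = differing_bits(original, variant)
print(f"{changed} of {len(original) * 8} bits differ")
```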
Speed
When a user enters the password for their account, they expect to log in almost instantly, so a single hash calculation can’t be noticeably slow. However, not all hashing functions are supposed to be as quick as possible. Some use cases require hashing functions to be deliberately slow. This is seen in the calculation of a password hash. In this situation, you want the calculation to be slower to make it harder (i.e., more time consuming) for attackers to brute-force users’ passwords (if the password hash database gets stolen) or carry out rainbow table attacks.
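The difference between a fast hash and a deliberately slow one can be seen with a rough timing sketch. It compares one SHA-256 call with PBKDF2 from Python’s hashlib. The iteration count is an assumption picked for illustration, and the timings will vary from machine to machine.

```python
import hashlib
import os
import time

password = b"TweetyBird"
salt = os.urandom(16)

# One pass of a general-purpose hash: very fast.
start = time.perf_counter()
hashlib.sha256(salt + password).digest()
fast_seconds = time.perf_counter() - start

# A key-stretching function repeats the work many times on purpose.
# The iteration count is an illustrative assumption; tune it for your hardware.
iterations = 200_000
start = time.perf_counter()
hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
slow_seconds = time.perf_counter() - start

print(f"single SHA-256:    {fast_seconds:.6f} s")
print(f"PBKDF2 x {iterations}: {slow_seconds:.6f} s")
```

A legitimate user pays the slow cost once per login, while an attacker guessing millions of passwords pays it for every guess.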
Types of Hashing Algorithms
When it comes to types of hashing algorithms, the major ones are classified in their families. Let’s have a look at four of the most prevalent families of hashing algorithms.
- Secure Hash Algorithm (SHA): This family of hashes is one of the most widely used today. It encompasses the SHA-1, SHA-2, and SHA-3 algorithms, which also have their own sub-families. SHA-2 and SHA-3 are in use today, while SHA-1 has been deprecated. The SHA-2 family includes SHA-224, SHA-256, SHA-384, and SHA-512, and SHA-3 includes SHA3-224, SHA3-256, SHA3-384, and SHA3-512.
- Message Digest (MD): Once quite popular, the older MD algorithms are no longer recommended because they’ve been broken. The MD family contains MD2, MD4, MD5, and MD6. MD5 was one of the most widely used, but it’s no longer considered secure because it isn’t collision resistant.
- Microsoft LANMAN: Microsoft LANMAN is a LAN Manager hashing algorithm that legacy Windows systems used to store passwords. It relied on the DES algorithm to execute hashing, but its implementation wasn’t secure enough, which led to vulnerabilities. As a result, this algorithm has been deprecated.
- Windows NTHash: Windows NTHash, also known as a Unicode hash or NTLM hash, is commonly deployed in Windows systems. NTHash also comes with its share of vulnerabilities, but it’s still a crucial part of Windows systems. NTLMv2 is the latest version of the NTLM authentication protocol that relies on NTHash, as NTLMv1 has mostly been deprecated.
Apart from these, some other popular hashing algorithms include BLAKE3, RIPEMD-160, and WHIRLPOOL.
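For the SHA-2 and SHA-3 variants named above, the number in each name is the digest length in bits. Here is a small sketch that confirms this with Python’s hashlib, using the module’s own algorithm names (for example, sha3_256 for SHA3-256). The input word is arbitrary.

```python
import hashlib

algorithms = [
    "sha224", "sha256", "sha384", "sha512",
    "sha3_224", "sha3_256", "sha3_384", "sha3_512",
]

digest_bits = {}
for name in algorithms:
    digest = hashlib.new(name, b"Dinosaur").digest()
    digest_bits[name] = len(digest) * 8
    print(f"{name:<9} {digest_bits[name]} bits")
```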
Hashing and Encryption Are NOT the Same
People often get confused between hashing and encryption. We can’t blame them, seeing as how they seem quite similar when it comes to their functionalities. However, there’s a stark difference between the two of them:
- Hashing is a one-way function or process. What this means is that once an input gets hashed, there’s no way back: a hash can’t be reversed or decrypted (because there’s nothing to decrypt). It’s one of the things that make hashes so unique.
- Encryption, on the other hand, is a two-way method. It’s an entirely different process: when something is encrypted, it’s supposed to be decrypted later with the right key, meaning it’s essentially reverted to its original form.
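The contrast can be sketched in a few lines. The “encryption” below is a toy XOR with a random key as long as the message, chosen only because it needs nothing beyond the standard library. It is an illustration of reversibility, not something to use in production, where a vetted cipher from a maintained library would be the usual choice.

```python
import hashlib
import secrets

message = b"Meet me at noon"

# Toy two-way transformation: XOR with a random key as long as the message.
key = secrets.token_bytes(len(message))
ciphertext = bytes(m ^ k for m, k in zip(message, key))
decrypted = bytes(c ^ k for c, k in zip(ciphertext, key))

# Hashing: there is no key and no function that maps the digest back.
digest = hashlib.sha256(message).hexdigest()

print(decrypted == message)  # True
print(digest)
```

Anyone holding the key can recover the message from the ciphertext, whereas the digest can only be checked by hashing a candidate input and comparing.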
Now that you know what a hashing function is and how hash functions are beneficial to businesses, let’s explore how they work.
How Do Hashing Algorithms Work?
As we touched on earlier, a hashing function takes a piece of data and runs it through a mathematical algorithm. The input data size doesn’t matter — whether you’re hashing a password or the entire Encyclopaedia Britannica, that hashing algorithm will generate a fixed-length output. This means that both inputs will result in outputs of equal length.
In many algorithms, this is made possible by dividing the input data into several smaller blocks of equal size. If the final block doesn’t contain enough data to reach that size, it’s filled out with a string of 1s and 0s, known as padding.
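As one concrete case, SHA-256 works on 64-byte blocks and pads the message with a single 1 bit, then 0 bits, then the original length stored in the last 8 bytes. The sketch below reproduces only that padding and block-splitting step, not the hash computation itself. The function names sha256_pad and split_blocks are made up for this example.

```python
BLOCK_SIZE = 64  # SHA-256 processes the input in 64-byte (512-bit) blocks


def sha256_pad(message: bytes) -> bytes:
    """Pad a message the way SHA-256 does before splitting it into blocks."""
    bit_length = len(message) * 8
    padded = message + b"\x80"  # a single 1 bit followed by seven 0 bits
    # Add 0 bytes until exactly 8 bytes remain free in the last block.
    padded += b"\x00" * ((56 - len(padded) % BLOCK_SIZE) % BLOCK_SIZE)
    # The final 8 bytes hold the original length in bits (big-endian).
    padded += bit_length.to_bytes(8, "big")
    return padded


def split_blocks(padded: bytes) -> list[bytes]:
    """Cut the padded message into equal-size blocks."""
    return [padded[i:i + BLOCK_SIZE] for i in range(0, len(padded), BLOCK_SIZE)]


blocks = split_blocks(sha256_pad(b"abc"))
print(len(blocks), len(blocks[0]))  # 1 64
```

Other hash families use different block sizes and padding rules, so treat this as an example of the idea rather than a general recipe.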
Here’s a slightly more advanced graphic that illustrates how hashing works when taking an input and dividing it into multiple smaller data blocks:
Okay, having this foundational knowledge about hashing is good. But how does hashing play a role in your everyday data security?
4 Ways Your Company Can Use Hashing Algorithms to Secure Data
Since its inception, hashing has been a revelation for modern-day computing. Not only has it had a transformative impact on security, but it also has found uses in many other functions. Let’s explore four applications of hashing algorithms in the world of security.
1. Ensures Data Integrity in SSL/TLS Handshake
Hashing plays a crucial role in SSL/TLS handshakes, which are used for data encryption and identity verification on websites. In the TLS protocol, the server creates a digital signature by computing a one-way hash over handshake data, including random values generated during the handshake. This hash is signed with the private key of the SSL/TLS certificate and then verified by the client using the certificate’s public key. This helps to ensure the integrity of data that’s part of the handshake.
2. Allows Users to Securely Store Their Passwords on Your Site
Thanks to their irreversibility and collision resistance, hashing algorithms come in quite handy when it comes to storing passwords on a server. When a user registers a user ID and password, the former is stored in plaintext while the latter is stored in its ‘salted,’ hashed version in the server database. Whenever a user tries to log in, their user ID is looked up in the database. If it’s found, the hash of the entered password is compared with the stored password hash. If those also match, the user is given access to the respective account.
But what do we mean by salting? Here, the term ‘salting’ means that the website adds a random value (the salt), unique to each user, to the password before hashing it and stores that salt alongside the resulting hash. As a simplified illustration, it’s like changing the password “TweetyBird” to “TweetyBird1” before hashing. This changes the entire hash of the password, so identical passwords no longer produce identical hashes, which protects against hash table lookups and rainbow table attacks.
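Here is a minimal sketch of salted password storage and verification using only Python’s standard library. PBKDF2 with SHA-256, the 16-byte salt, and the iteration count are assumptions made for this example, and the function names are hypothetical. A real deployment would typically rely on a maintained password hashing library and current guidance for the work factor.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor; an assumption, tune for your hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for storage; the salt is random per user."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare digests."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)


salt, stored = hash_password("TweetyBird")
print(verify_password("TweetyBird", salt, stored))  # True
print(verify_password("tweetybird", salt, stored))  # False
```

Only the salt and the digest are stored. The comparison uses hmac.compare_digest so that it takes the same time whether or not the values match.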
3. Assures Data Integrity in Emails & Messaging Apps
Your data is the most vulnerable when it’s being transmitted. An attacker might want to intercept and tamper with the data for their benefit. This is where hashing algorithms come into play.
Let’s say you’re sending a message to someone but want to ensure that the receiver receives the message in its original unaltered format. To do that, you can send both the message and its hash value to the recipient. The recipient can then compute the hash of the message they received and compare it with the hash value you sent. If the two match, the message wasn’t modified while being transmitted. In practice, the hash itself also needs protection (for example, through a digital signature) so that an attacker can’t simply replace both the message and its hash.
Such data integrity checks are usually implemented in emails and messaging apps, although they run in the background within the email or chat clients, so users don’t see them happening.
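The compare-the-hashes step can be sketched as follows. The first part mirrors the description above with a plain SHA-256 digest. The second part swaps in a keyed hash (HMAC), which is an addition for illustration: it assumes the two parties already share a secret key, so an attacker can’t recompute a matching value after altering the message. The messages and the key are made up.

```python
import hashlib
import hmac


def digest_of(message: bytes) -> str:
    return hashlib.sha256(message).hexdigest()


# Sender side: compute the digest and send it with the message.
message = b"Wire $100 to account 12345"
sent_digest = digest_of(message)

# Recipient side: re-hash what arrived and compare with the digest received.
received_ok = message
received_tampered = b"Wire $900 to account 12345"
intact = hmac.compare_digest(digest_of(received_ok), sent_digest)
altered = hmac.compare_digest(digest_of(received_tampered), sent_digest)
print(intact)   # True
print(altered)  # False

# Keyed variant: only someone holding the shared key can produce a valid tag.
shared_key = b"hypothetical-shared-secret"
tag = hmac.new(shared_key, message, hashlib.sha256).hexdigest()
check = hmac.new(shared_key, received_tampered, hashlib.sha256).hexdigest()
print(hmac.compare_digest(tag, check))  # False
```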
4. Ensures File Integrity Assurance Through Code Signing Certificates & Digital Signatures
Another major use of hashing algorithms comes in code signing certificates. Code signing certificates are used to provide a unique identity through a digital signature for various files such as applets, macros, plug-ins, code, and other executable files before they’re published on the internet. This combination of hashing and digital signatures helps you to provide identity assurance and data integrity assurance:
- Verifiable identity assures users that the software creator is who they claim to be, and
- Data integrity lets the user know that the software hasn’t been tampered with since it was signed.
Here, a hashing algorithm is applied to the code/executable, which then results in creating a hash digest. That resulting hash value is then signed using the certificate’s private key, which results in creating the digital signature.
A digital signature, code signing certificate, and hash function information form a signature block, which is placed in the software that an end-user receives. Thus, the end user’s computer first checks the authenticity of the certificate and then of the hash value using the hash function — providing two-layered authentication.
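The hashing half of this process can be sketched with the standard library alone. The example below hashes a file in chunks and compares the result with a digest recorded earlier. It deliberately leaves out the signing and certificate validation steps, which need a cryptography library and a real certificate. The temporary file stands in for a software package, and file_sha256 is a name chosen for this example.

```python
import hashlib
import hmac
import os
import tempfile


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files don't need to fit in memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# Stand-in for a software package (hypothetical content).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"pretend this is an installer")
    path = tmp.name

# Digest recorded when the publisher prepared the release.
published_digest = file_sha256(path)

# Someone alters the file after release.
with open(path, "ab") as handle:
    handle.write(b" plus injected code")

tampered = not hmac.compare_digest(file_sha256(path), published_digest)
os.remove(path)
print(tampered)  # True
```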
Be sure to check out our resource on code signing best practices.
What Users See When They Install Your Digitally Signed Software
When users download or try to install software applications, one of the two following warning messages typically appears. Here’s a side-by-side comparison of what users see when they try to install unsigned software versus digitally signed software:
Code signing enables your verified organization’s name to display as the verified publisher in Windows User Account Control (UAC) pop-ups.
Both unsigned software and executables that are signed using standard code signing certificates trigger Windows Defender SmartScreen warnings. Here’s an example of an unsigned software application:
But what if you don’t want Windows Defender SmartScreen warning to trigger? Certain types of code signing certificates (i.e., extended validation code signing certificates) can get rid of the SmartScreen messages altogether because Windows operating systems and browsers automatically trust them. This means that these warning messages will no longer display when users download and install your software.
Concluding Thoughts on Hashing Algorithms and Functions
Whether we’re browsing the internet, buying something online, or signing documents digitally, virtually all of us use hashing algorithms at one moment or another (even when we don’t know we’re using them).
If you’re a website administrator, a developer, or a digital business owner, it becomes really important for you to understand the significance these algorithms hold. Thus, you must ensure that you’re using the right hashing algorithms and cryptographic tools in all the right places — be it in the form of an SSL/TLS certificate, code signing certificate, document signing certificate, or password hashing process.
We hope this article has helped you gain a deeper understanding of what a hashing algorithm is and how it helps you protect your data and your organization’s reputation.