A hash function, as you all know, takes data of any size and produces a digest that “represents” it.
Many people think of a hash function as a one-to-one function, meaning the same original value will produce the same digest value every time, and each digest value is produced by only one original value.
This is only half-true. The same original value does indeed produce the same digest value every time – this is why you can store the hash value of a password in the DB and validate the entered password against it. But the second part does not hold: in theory, the same digest value can be produced by different original values.
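The deterministic half can be sketched in a couple of lines with Python's hashlib (note this is only an illustration of determinism – real password storage should use a salted, slow KDF such as bcrypt or Argon2, not a bare hash):

```python
import hashlib

# The same input always yields the same digest (determinism),
# which is what makes a stored password hash comparable later.
stored = hashlib.sha256(b"correct horse battery staple").hexdigest()
entered = hashlib.sha256(b"correct horse battery staple").hexdigest()
print(stored == entered)  # True
```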
Why? Because no matter what the size of the original data is, you always get a digest value of the same length, which depends on the algorithm being used.
SHA256, for example, always produces a 256-bit digest.
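You can see the fixed output length directly – inputs ranging from empty to a megabyte all come out as 256 bits:

```python
import hashlib

# Digest length is fixed regardless of input size:
# SHA-256 always emits 256 bits (32 bytes, 64 hex characters).
for data in [b"", b"a", b"x" * 1_000_000]:
    digest = hashlib.sha256(data).hexdigest()
    print(len(digest) * 4, "bits")
```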
And since there are infinitely many possible inputs but only a finite number of possible digests, the pigeonhole principle guarantees that collisions must exist.
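A toy way to see the pigeonhole principle in action is to shrink the digest space artificially – truncate SHA-256 to its first byte, so there are only 256 possible digests, and a collision is guaranteed within 257 distinct inputs:

```python
import hashlib

# Truncate SHA-256 to 1 byte: only 256 possible "digests",
# so hashing at most 257 distinct inputs must collide.
seen = {}
for i in range(1000):
    data = str(i).encode()
    tiny_digest = hashlib.sha256(data).digest()[:1]  # 8-bit toy hash
    if tiny_digest in seen:
        print(f"collision: {seen[tiny_digest]!r} and {data!r}")
        break
    seen[tiny_digest] = data
```

Real hash functions have digest spaces of 2^256 or more, which is why the same argument takes astronomically long to play out against them.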
So why is this true mostly only in theory? Because the digest length is so large that, for a secure algorithm, it is computationally infeasible to find 2 different original values that will produce the same digest value. (Note that MD5 no longer qualifies – collisions for MD5 can be generated in seconds, which is exactly why it is considered broken; what remains hard even for MD5 is finding a second input that matches a given digest.)
In practice, collisions should never occur for secure hash functions.
About a month and a half ago, Google announced a shocking discovery: they found a mathematical weakness in the SHA1 algorithm, and used it to produce, in practice, 2 different values that yield the same SHA1 digest value.
They nicknamed that collision attack SHAttered.
What are the consequences? As Google says in their article: “The attacker could then use this collision to deceive systems that rely on hashes into accepting a malicious file in place of its benign counterpart. For example, two insurance contracts with drastically different terms.”
That said, it does not give you a practical way to brute-force SHA1 outright, as “the SHA-1 shattered attack is still more than 100,000 times faster than a brute force attack which remains impractical.”
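That “more than 100,000 times faster” figure checks out with back-of-the-envelope arithmetic: SHAttered took roughly 2^63 SHA1 computations (the quintillion figure below), while a generic birthday brute force needs about 2^80:

```python
# Rough cost comparison between SHAttered and a generic
# birthday-attack brute force against SHA-1.
brute_force = 2 ** 80  # generic collision search
shattered = 2 ** 63    # ~ nine quintillion computations
print(brute_force // shattered)  # 131072, i.e. >100,000x faster
```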
The research took 2 years to complete and consumed a staggering amount of computational resources:
· Nine quintillion (9,223,372,036,854,775,808) SHA1 computations in total
· 6,500 years of CPU computation to complete the attack first phase
· 110 years of GPU computation to complete the second phase
Google announced that they will publish a tool that allows anyone to create 2 different PDF files that produce the same SHA1 digest value.
What does that mean in our everyday PT lives? SHA1 is no longer safe. It is not merely weakened, as previously believed – it is significantly weakened and should be avoided altogether.
Although it is still hard to brute-force SHA1 passwords, it should not be used for password storage nor for signature verification.
If you do encounter SHA1 being used to validate the authenticity and integrity of a file, put a medium-high finding in your report saying the algorithm is obsolete and should be replaced.
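When recommending a fix, the drop-in replacement is straightforward. A minimal sketch of chunked file hashing with SHA-256 via Python's hashlib (the helper name `hash_file` is mine, not a standard API):

```python
import hashlib

def hash_file(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks; prefer SHA-256 (or SHA-3) over SHA-1."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in chunks so arbitrarily large files fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```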
Have a great day