What are the important points about cryptographic hash functions?

一曲冷凌霜 提交于 2019-11-28 20:48:47
Accipitridae

You may be confused, because the answer to the question you cite is confusing. One of the requirements for a cryptographic hash function is that it should be preimage resistant. That is, if you know MD5(x) but not the message x, then it is difficult to find any x' (either equal x or different from x) such that MD5(x') = MD5(x).

Being preimage resistant is a different property than being reversible. A function is reversible if given y = f(x) there is exactly one x which fits (whether this is easy or not). For example define f(x) = x mod 10. Then f is not reversible. From f(x) = 7 you can't determine whether x was 17, 27 or something else. But f is not preimage resistant, since values x' such that f(x) = 7 are easy to find. x' = 17, 27, 12341237 etc all work.

When doing crypto you usually need functions that are preimage resistant (and other properties such as collision resistance), not just something that is not reversible.

Warning: Long answer

I think all of these answers are missing a very important property of cryptographic hash functions: Not only is it impossible to compute the original message that was hashed to get a given hash, it's impossible to compute any message that would hash to a given hash value. This is called preimage resistance.

(By "impossible" - I mean that no one knows how to do it in less time than it takes to guess every possible message until you guess the one that was hashed into your hash.)

(Despite popular belief in the insecurity of MD5, MD5 is still preimage resistant. Anyone who doesn't believe me is free to give me anything that hashes to 2aaddf751bff2121cc51dc709e866f19. What MD5 doesn't have is collision resistance, which is something else entirely.)

Now, if the only reason you can't "work backwards" in a cryptographic hash function was because the hash function discards data to create the hash, then it would not guarantee preimage resistance: You can still "work backwards", and just insert random data wherever the hash function discards data, and while you wouldn't come up with the original message, you'd still come up with a message that hashes to the desired hash value. But you can't.

So the question becomes: Why not? (Or, in other words, how do you make a function preimage resistant?)

The answer is that cryptographic hash functions simulate chaotic systems. They take your message, break it into blocks, mix those blocks around, have some of the blocks interact with each other, mix those blocks around, and repeat that a lot of times (well, one cryptographic hash function does that; others have their own methods). Since the blocks interact with each other, block C not only has to interact with block D to produce block A, but it has to interact with block E to produce block B. Now, sure, you can find values of blocks C, D, E that would produce the blocks A and B in your hash value, but as you go further back, suddenly you need a block F that interacts with C to make D, and with E to make B, and no such block can do both at the same time! You must have guessed wrong values for C, D, and E.

While not all cryptographic hash functions are exactly as described above with block interaction, they have the same idea: That if you try to "work backwards", you're going to end up with a whole lot of dead ends, and the time it takes for you to try enough values to generate a preimage is on the order of hundreds to millions of years (depending on the hash function), not much better than the time it would take just to try messages until you find one that works.

1: The primary purpose of a hash is to map a very, very large space to a smaller but still very large space (e.g., MD5, which will take 'anything' and convert it into a space of size 2^128 -- big, but not nearly as big as aleph-0.)

In addition to other features, good hashes fill the destination space homogeneously. Bad hashes fill the space in a clumpy fashion, coming up with the same hash for many common inputs.

Imagine the idiotic hash function sum(), which just adds all the digits of the input number: it succeeds in mapping down, but there are a bunch of collisions (inputs with the same output, like 3 and 12 and 21) at the low end of the output space and the upper end of the space is nearly empty. As a result it makes very poor use of the space, is easy to crack, etc.

So a good hash that makes even use of the destination space will make it difficult to find two inputs with the same output, just by the odds: if MD5 were perfect, the odds that two inputs would have the same output would be 2^-128. That's pretty decent odds: the best you can do without resorting to a larger output space. (In truth MD5 isn't perfect, which is one of the things that makes it vulnerable.)

But it will still be true that a huge number of inputs will map to any given hash, because the input space is 'infinite', and dividing infinity by 2^128 still gives you infinity.

2: Yes, hashes always cause data loss, except in the case where your output space is the same as, or larger than, your input space -- and in that case you probably didn't need to hash!

3: For smaller inputs, best practice is to salt the input. Actually, that's good practice for any cryptographic hashing, because otherwise an attacker can feed you specific inputs and try to figure out which hash you are using. 'Salt' is just a set of additional information that you append (or prepend) to your input; you then hash the result.

edit: In cryptography, it is also important that the hash function is resistant to preimage attacks, intuitively, that is hard to guess the input for a given output even knowing many other input/output pairs. The "sum" function could probably be guessed rather easily (but since it destroys data still might not be easy to reverse).

These are the properties of hash functions in general.

A word of caution though, MD5 shouldn't be used anymore because of vulnerabilities that have been found in it. Check the 'Vulnerabilities' section and external links detailing these attacks. http://en.wikipedia.org/wiki/Md5 You can make an MD5 collision by changing only 128 bits in a message.

SHA-1 is safe for simple hashing although there are some attacks that would make it weaker against well-funded entities (Governments, large corporations)

SHA-256 is a safe starting point against technology for the next couple decades.

Yet the consensus answer to the question "why aren't MD5 hash values reversible?" is because "an infinite number of input strings will generate the same output."

This is true for any hash function, but it is not the essence of a cryptographic hash function.

For short input strings such as passwords it is theoretically possible to reverse a cryptographic hash function, but it ought to be computationally infeasible. I.e. your computation would run too long to be useful.

The reason for this infeasibility is that the input is so thoroughly "mixed together" in the hash value that it becomes impossible to disentangle it with any less effort than the brute force attack of computing the hash value for all inputs

"why aren't MD5 hash values reversible?" is because "an infinite number of input strings >will generate the same output"

this is the reason that it isn't possible to reverse the hash function (get the same input). cryptographic hash functions are collision resistant, that means that it's also hard to find another input value that maps to the same output (if your hash function was mod 2 : 134 mod 2 = 0; now you can't get the 134 back from the result, but we can stil find number 2 with the same output value (134 and 2 collide)).

When the input is smaller than the block size, padding is used to fit it to the block size.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!