I was reading this tutorial for a simple PHP login system.
In the end it recommends that you should encrypt your password using md5().
Though I know this is a be
You're missing the important step - the salt. This is a unique (per user, ideally) bit of extra data that you add to the password before hashing it.
http://en.wikipedia.org/wiki/Salt_%28cryptography%29
Your idea (salting) is well known and is actually well-implemented in the PHP language. If you use the crypt() function it allows you to specify a string to hash, a method to encrypt (in some cases), and a salt. For example,
$x = crypt('insecure_password', $salt);
Returns a hashed and salted password ready for storage. Passwords get cracked the same way that we check if they're right: we check the hash of what the user inputs against the hash of their password in the database. If they match, they're authenticated (AFAIK this is the most common way to do this, if not the only). Insecure passwords (like password
) that use dictionary words can be cracked by comparing their hash to hashes of common passwords. Secure passwords cannot be cracked this way, but can still be cracked. Adding a salt to the password makes it much more difficult to crack: since the hacker most likely doesn't know what the salt is, his dictionary attack won't work.
md5
(or better put: hash algorithms in general) are used to safely store passwords in database. The most important thing to know about hashes is: Hashes are not encryptions per se. (they are one-way-encryptions at most). If you encrypt something, you can get the data back with the key you used. A hash generates a fixed-length value from an arbitrary input (like a string), which can be used to see if the same input was used.
Hashes are used to store sensitive, repeatly entered data in a storage device. Doing this, nobody can recreate the original input from the hash data, but you can hash an incoming password and compare it to the value in the database, and see if both are the same, if so, the password was correct.
You already pointed out, that there possibilites to break the algorithm, either by using a database of value/hash pairs or producing collisions (different values resulting in the hash value). You can obscure this a bit by using a salt, thus modifying the algorithm. But if the salt is known, it can be used to break the algorithm again.
In addition to providing salt (or seed), the md5 is a complex hashing algorithm which uses mathematical rules to produce a result that is specifically not reversable because of the mathematical changes and dataloss in throughput.
http://en.wikipedia.org/wiki/Cryptographic_hash_function
Firstly, "hashing" (using a cryptographic one way function) is not "encrypting". In encryption, you can reverse the process (decryption). In hashing, there is (theoretically) no feasible way of reversing the process.
A hash is some function f such that v cannot be determined from f(v) easily.
The point of using hashing for authentication is that you (or someone seeing the hash value) do not have any feasible way (again, theoretically) of knowing the password. However, you can still verify that the user knows his password. (Basically, the user proves that he knows v such that f(v) is the stored hash).
The weakness of simply hashing (aside from weak hash functions) is that people can compile tables of passwords and their corresponding hash and use them to (effectively) get the inverse of the hash function. Salting prevents this because then a part of the input value to the hash is controlled and so tables have to be compiled for that particular salt.
So practically, you store a salt and a hash value, and authenticate by hashing a combination of the salt and the password and comparing that with your hash value.
For a decent hash the attacker won't be reversing the hash, they'll be using a rainbow table, which is essentially a brute-force method made useful if everyone uses the same hash function.
The idea of a rainbow table is that since hashing is fast I can hash every possible value you could use as a password, store the result, and have a map of which hash connects to which password. If everyone just takes their passwords and hashes them with MD5 then my hash table is good for any set of password hashes I can get my hands on!
This is where salting comes in. If I take the password the user enters and add some data which is different for every user, then that list of pre-determined hashes is useless since the hash is of both the password and some random data. The data for the salt could be stored right beside the password and even if I get both it doesn't help me get the password back since I still have to essentially brute force the hash separately for every single user - I can't form a single rainbow table to attack all the hashes at once.
Of course, ideally an attacker won't get the list of hashed passwords in the first place, but some employees will have access so it's not possible to secure the password database entirely.