When is it appropriate to use CRC for error detection versus more modern hashing functions such as MD5 or SHA1? Is the former easier to implement on embedded hardware?
Only use CRC if computational resources are very tight (e.g. some embedded environments) or if you need to store or transmit many output values and space/bandwidth is scarce (a CRC is usually 32-bit, where an MD5 output is 128-bit, SHA-1 is 160-bit, and other SHA variants run up to 512-bit).
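To make those size differences concrete, here is a minimal Python sketch (standard library only) that prints the digest widths mentioned above:

```python
import zlib
import hashlib

data = b"The quick brown fox jumps over the lazy dog"

# CRC32 is a 32-bit integer; the hash functions return byte strings.
print("CRC32  :", 32, "bits ->", hex(zlib.crc32(data)))
print("MD5    :", hashlib.md5(data).digest_size * 8, "bits")
print("SHA-1  :", hashlib.sha1(data).digest_size * 8, "bits")
print("SHA-512:", hashlib.sha512(data).digest_size * 8, "bits")
```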
Never use CRC for security checks as a CRC is very easy to "fake".
Even for accidental error detection (rather than malicious change detection), hashes are better than a simple CRC. Partly because of the simple way a CRC is calculated, and partly because CRC values are usually shorter than common hash outputs (so have a much smaller range of possible values), it is much more likely that, in a situation with two or more errors, one error will mask another, leaving you with the same CRC despite multiple errors.
In short: unless you have reason not to use a decent hash algorithm, avoid simple CRCs.
I came across a smart use of CRC recently. The author of the jdupe file-duplicate identification and removal tool (also the author of the popular EXIF tool jhead) uses it during the first pass through the files: a CRC is computed over the first 32K of each file to flag files that appear to be the same (candidates must also have the same size), and those files are added to a list on which to do a full binary comparison. It speeds up checking large media files.
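I don't know the tool's actual implementation, but here is a minimal Python sketch of the same idea (the helper names are made up):

```python
import os
import zlib
from collections import defaultdict
from filecmp import cmp  # byte-by-byte comparison with shallow=False

def quick_key(path, prefix_size=32 * 1024):
    """Cheap first-pass key: (file size, CRC32 of the first 32K)."""
    with open(path, "rb") as f:
        prefix = f.read(prefix_size)
    return (os.path.getsize(path), zlib.crc32(prefix))

def find_duplicates(paths):
    """Group candidates by the cheap key, then confirm with a full compare."""
    candidates = defaultdict(list)
    for path in paths:
        candidates[quick_key(path)].append(path)

    duplicates = []
    for group in candidates.values():
        # Only files with the same size and prefix CRC need the
        # expensive full byte-for-byte comparison.
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if cmp(a, b, shallow=False):
                    duplicates.append((a, b))
    return duplicates
```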
Let's start with the basics.
In cryptography, a hashing algorithm converts many bits to fewer bits through a digest operation. Hashes are used to confirm the integrity of messages and files.
All hashing algorithms generate collisions: a collision is when several many-bit inputs produce the same fewer-bit output. The cryptographic strength of a hashing algorithm is defined by the inability of an attacker to determine what the output is going to be for a given input, because if they could, they could construct a file whose hash matches that of a legitimate file and compromise the assumed integrity of the system. The difference between CRC32 and MD5 is that MD5 generates a larger hash that's harder to predict.
When you want to implement message integrity (meaning the message hasn't been tampered with in transit), the inability to predict collisions is an important property. A 32-bit hash can describe only about 4 billion different messages or files using 4 billion unique hash values, so by the pigeonhole principle, if you have 4 billion and 1 files you are guaranteed at least 1 collision, and a terabyte of data offers room for billions of collisions. If I'm an attacker and I can predict what that 32-bit hash is going to be, I can construct an infected file that collides with the target file, i.e. that has the same hash.
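As an illustration of how small a 32-bit hash space is, here is a quick sketch (using random inputs and the standard library) that finds a CRC32 collision by brute force; by the birthday bound you expect one after roughly 2^16 = 65,536 random inputs, which takes well under a second on a modern machine:

```python
import os
import zlib

def find_crc32_collision():
    """Generate random 16-byte strings until two share a CRC32.
    By the birthday bound this takes around sqrt(2**32) ~ 65,536 tries."""
    seen = {}
    while True:
        msg = os.urandom(16)
        crc = zlib.crc32(msg)
        if crc in seen and seen[crc] != msg:
            return seen[crc], msg, crc
        seen[crc] = msg

a, b, crc = find_crc32_collision()
print(f"{a.hex()} and {b.hex()} both have CRC32 {crc:#010x}")
```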
Additionally, if I'm doing 10 Mbps transmission, the probability of a packet getting corrupted just right to bypass CRC32 and continue along to the destination and execute is very low. Let's say at 10 Mbps I get 10 errors per second. If I ramp that up to 1 Gbps, now I'm getting 1,000 errors per second, and at 1 exabit per second the same error ratio gives 10^12 errors per second. Say we have a collision rate of 1 in 1,000,000 transmission errors, meaning 1 in a million transmission errors results in the corrupt data getting through undetected. At 10 Mbps I'd get bad data through every 100,000 seconds, or about once a day. At 1 Gbps it'd happen about every 17 minutes. At 1 exabit per second, we're talking about a million times a second.
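A back-of-the-envelope check of those numbers (assuming, as above, that the bit error ratio stays fixed as the link speeds up):

```python
# Undetected-corruption rate at different link speeds, assuming a
# constant error ratio of 10 errors/sec at 10 Mbps (1 error per 10**6
# bits) and 1 undetected collision per 10**6 errors.
ERRORS_PER_BIT = 10 / 10e6          # 10 errors/sec at 10 Mbps
COLLISION_RATE = 1 / 1_000_000      # 1 in a million errors slips through

for name, bps in [("10 Mbps", 10e6), ("1 Gbps", 1e9), ("1 Ebps", 1e18)]:
    undetected_per_sec = bps * ERRORS_PER_BIT * COLLISION_RATE
    if undetected_per_sec >= 1:
        print(f"{name}: ~{undetected_per_sec:,.0f} undetected errors/sec")
    else:
        print(f"{name}: one undetected error every {1 / undetected_per_sec:,.0f} sec")
```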
If you pop open Wireshark you'll see your typical Ethernet frame carries a CRC32 (the frame check sequence), while the IP and TCP headers each carry a 16-bit checksum, and that's in addition to what the higher-layer protocols may do; e.g. IPsec might use MD5 or SHA for integrity checking on top of the above. There are several layers of error checking in typical network communications, and they STILL goof now and again at sub-10 Mbps speeds.
Cyclic Redundancy Check (CRC) comes in several common versions and several uncommon ones, but it is generally designed just to tell when a message or file has been damaged in transit (multiple bits flipping). CRC32 by itself is not a very good error-checking protocol by today's standards in large-scale enterprise environments because of the collision rate: the average user's hard drive can have upwards of 100k files, and a company's file shares can hold tens of millions. The ratio of hash space to the number of files is just too low. CRC32 is computationally cheap to implement, whereas MD5 isn't.
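To show how little machinery CRC32 needs, here is a straightforward bit-at-a-time implementation of the standard reflected CRC-32 (polynomial 0xEDB88320, the one zlib and Ethernet use). Real implementations use a lookup table or a hardware instruction, but the core loop is just shifts and XORs:

```python
import zlib

def crc32_bitwise(data: bytes) -> int:
    """Bit-at-a-time CRC-32 (reflected, poly 0xEDB88320, init/final 0xFFFFFFFF)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # If the low bit is set, shift and XOR in the polynomial.
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

msg = b"123456789"
assert crc32_bitwise(msg) == zlib.crc32(msg)  # standard check value 0xCBF43926
print(hex(crc32_bitwise(msg)))
```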
MD5 was designed to stop intentional use of collisions to make a malicious file look benign. It's now considered insecure because its hash space has been sufficiently mapped to enable some attacks, and some collisions are predictable. SHA-1 and SHA-2 are the newer kids on the block (though SHA-1 has since been broken in practice as well).
For file verification, MD5 is being used by a lot of vendors because it can handle multi-gigabyte or multi-terabyte files quickly, on top of the general OS use and support of CRC32. Do not be surprised if, within the next decade, filesystems start using MD5 for error checking.
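A typical file-verification pattern, sketched in Python (the file name and expected value below are made up; reading in chunks keeps memory flat even for very large files):

```python
import hashlib

def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through MD5 one megabyte at a time."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

# Compare against the checksum the vendor published alongside the download.
expected = "9e107d9d372bb6826bd81d3542a419d6"  # hypothetical value
if md5_of_file("vendor-image.iso") == expected:  # hypothetical file name
    print("checksum OK")
```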
CRC is designed to catch unintentional changes in the data. That is, it's good for detecting accidental errors, but useless as a way of making sure data was not maliciously tampered with.
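One reason it's useless against tampering is that CRC is linear: for equal-length messages, XOR-ing inputs XORs their CRCs in a predictable way, so an attacker can compute exactly how a modification will change the checksum. A small Python demonstration of the identity (the message contents are made up):

```python
import zlib

# For equal-length messages a, b, c:
#   crc32(a ^ b ^ c) == crc32(a) ^ crc32(b) ^ crc32(c)
# so the effect of flipping any set of bits on the CRC is fully
# predictable, and a tampered message can be adjusted to keep the
# original checksum.
a = b"transfer $0000100 to account 12345678"
b = b"transfer $9999999 to account 12345678"
c = b"transfer $0000000 to account 00000000"  # arbitrary same-length message

xor3 = bytes(x ^ y ^ z for x, y, z in zip(a, b, c))
assert zlib.crc32(xor3) == zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
print("CRC32 is linear: forgeries are easy to engineer")
```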
CRC32 is faster and the hash is only 32 bits long.
Use it when you just want a quick and light checksum; CRC is what Ethernet uses.
If you need more reliability, it's preferable to use a modern hashing function.
CRC32 is way faster and sometimes has hardware support (e.g. the SSE4.2 CRC32 instruction on Nehalem processors, which computes the CRC-32C variant). Really, the only time you'd use it is if you're interfacing with hardware or if you're really tight on performance.