I've got an image library on Amazon S3. For each image, I md5 the source URL on my server plus a timestamp to get a unique filename. Since S3 can't have subdirectories, I need to keep all of these images in a single flat "folder". What are the chances of an MD5 collision?
A rough rule of thumb for collisions is the square root of the range of values. Your MD5 signature is presumably 128 bits long, so you're likely to start seeing collisions once you get somewhere beyond 2^64 images.
So wait, is it:
md5(filename) + timestamp
or:
md5(filename + timestamp)
If the former, you are most of the way to a GUID, and I wouldn't worry about it. If the latter, then see Karg's post about how you will run into collisions eventually.
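For concreteness, here's a minimal Python sketch of the two constructions, assuming hashlib and a Unix timestamp (the URL and the "-" separator are placeholders, not anything from the question):

```python
import hashlib
import time

url = "http://example.com/cat.jpg"   # placeholder source URL
ts = str(int(time.time()))

# Former: md5(url) + timestamp. The timestamp lives outside the hash,
# so two uploads at different times can never share a key even if the
# MD5 parts were to collide.
key_former = hashlib.md5(url.encode()).hexdigest() + "-" + ts

# Latter: md5(url + timestamp). Everything is folded into one 128-bit
# value, so uniqueness rests entirely on MD5 never colliding.
key_latter = hashlib.md5((url + ts).encode()).hexdigest()

print(key_former)
print(key_latter)
```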
Although random MD5 collisions are exceedingly rare, if your users can provide files (that will be stored verbatim) then they can engineer collisions to occur. That is, they can deliberately create two files with the same MD5sum but different data. Make sure your application can handle this case in a sensible way, or perhaps use a stronger hash like SHA-256.
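If you do switch to a stronger hash, the change is a one-liner. A sketch assuming Python's hashlib and a hypothetical user-supplied file:

```python
import hashlib

def object_key(data: bytes) -> str:
    # SHA-256 produces a 256-bit digest and, unlike MD5, has no known
    # way for a user to deliberately craft two inputs with the same hash.
    return hashlib.sha256(data).hexdigest()

with open("upload.jpg", "rb") as f:   # hypothetical user-supplied file
    print(object_key(f.read()))
```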
An MD5 collision is extremely unlikely. If you have 9 trillion MD5s, there is only about one chance in 9 trillion that any two of them collide (by the birthday approximation, roughly n^2 / 2^129).
S3 can have subdirectories. Just put a "/" in the key name, and you can access the files as if they were in separate directories. I use this to store user files in separate folders based on their user ID in S3.
For example: "mybucket/users/1234/somefile.jpg". It's not exactly the same as a directory in a file system, but the S3 API has some features that let it work almost the same. I can ask it to list all files that begin with "users/1234/" and it will show me all the files in that "directory".
The probability of just two hashes accidentally colliding is 1/2^128, which is 1 in 340 undecillion 282 decillion 366 nonillion 920 octillion 938 septillion 463 sextillion 463 quintillion 374 quadrillion 607 trillion 431 billion 768 million 211 thousand 456.
However, if you keep all the hashes, then the probability is a bit higher, thanks to the birthday paradox. To have a 50% chance of any hash colliding with any other hash, you need 2^64 hashes. This means that to get a collision, on average, you'll need to hash 6 billion files per second for 100 years.
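Here's a rough way to check those numbers yourself; a sketch of the birthday approximation, not an exact calculation:

```python
import math

BITS = 128            # MD5 digest length in bits
SPACE = 2 ** BITS     # number of possible digests

def collision_probability(n: int) -> float:
    # Birthday approximation: P ~ n^2 / (2 * SPACE), valid for n << 2^64
    return n * n / (2 * SPACE)

# Number of hashes for a ~50% chance of at least one collision:
# sqrt(2 * ln 2 * SPACE), about 1.18 * 2^64
n_half = math.sqrt(2 * math.log(2) * SPACE)

# How long that takes at 6 billion hashes per second
years = n_half / 6e9 / (3600 * 24 * 365)

print(f"9 trillion hashes -> P(collision) ~ {collision_probability(9 * 10**12):.1e}")
print(f"hashes needed for a 50% chance: {n_half:.2e}")
print(f"years at 6e9 hashes/sec: {years:.0f}")
```

The last figure comes out on the order of a century, which is where the "6 billion files per second for 100 years" claim comes from.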