hash-collision

Uniquely identifying URLs with one 64-bit number

一世执手 submitted on 2019-12-09 06:28:48
Question: This is basically a math problem, but very programming-related: if I have 1 billion strings containing URLs, and I take the first 64 bits of the MD5 hash of each of them, what kind of collision frequency should I expect? How does the answer change if I have only 100 million URLs? It seems to me that collisions will be extremely rare, but these things tend to be confusing. Would I be better off using something other than MD5? Mind you, I'm not looking for security, just a good fast hash
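The estimate rests on the usual assumption that the first 64 bits of MD5 behave like uniformly random values, which is an idealization rather than a guarantee. Under that assumption the birthday bound gives the expected number of colliding pairs as n(n-1)/2 divided by 2^64. The sketch below evaluates it for both URL counts from the question. The helper names are hypothetical.

```python
import math

def expected_collisions(n: int, bits: int) -> float:
    """Expected number of colliding pairs among n values drawn
    uniformly at random from a space of 2**bits values."""
    return n * (n - 1) / 2 / 2.0 ** bits

def prob_any_collision(n: int, bits: int) -> float:
    """Birthday approximation of P(at least one collision)."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-n * (n - 1) / 2 / 2.0 ** bits)

for n in (100_000_000, 1_000_000_000):
    print(f"{n:>13,d} URLs: expected pairs ~ {expected_collisions(n, 64):.2e}, "
          f"P(any) ~ {prob_any_collision(n, 64):.2e}")
```

With these formulas, 1 billion URLs give about 0.027 expected colliding pairs, which is roughly a 2.7% chance of at least one. 100 million URLs give about 2.7e-4, because the count scales with n squared.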

Is a hash result ever the same as the source value?

不问归期 submitted on 2019-12-07 08:47:34
Question: This is more of a cryptography theory question, but is it possible that the result of a hash algorithm will ever be the same value as the source? For example, say I have a string: baf34551fecb48acc3da868eb85e1b6dac9de356. If I take its SHA1 hash, the result is: 4d2f72adbafddfe49a726990a1bcb8d34d3da162. In theory, is there ever a case where these two values would match? I'm not asking about SHA1 specifically here; it's just my example. I'm just wondering if hashing algorithms are built in
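One way to build intuition is to shrink the problem. Suppose a hash is modeled as a random function on N possible values. Each input then maps to itself with probability 1/N, so the expected number of fixed points is exactly 1, and the chance of having none is roughly 1/e. The hypothetical sketch below truncates SHA-1 to 4 hex characters so that all 65,536 four-character hex inputs can be checked exhaustively. It only illustrates the random-function model and says nothing about how full SHA-1 is designed.

```python
import hashlib

def toy_hash(s: str, hex_digits: int = 4) -> str:
    """Hypothetical toy: SHA-1 truncated to its first `hex_digits` hex
    characters, shrinking the output space so a full search is feasible."""
    return hashlib.sha1(s.encode("ascii")).hexdigest()[:hex_digits]

def find_fixed_points(hex_digits: int = 4) -> list[str]:
    """Return every hex string x of length `hex_digits` with toy_hash(x) == x."""
    found = []
    for value in range(16 ** hex_digits):
        candidate = format(value, f"0{hex_digits}x")
        if toy_hash(candidate, hex_digits) == candidate:
            found.append(candidate)
    return found

print(find_fixed_points())
```

Rerunning with other truncation lengths should typically turn up zero, one, or a few fixed points, in line with the expected count of 1.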

512-bit hash vs. four 128-bit hashes

余生长醉 submitted on 2019-12-07 05:06:32
Question: Interestingly, I haven't found much information about any test or experiment comparing the collision chances of a single 512-bit hash like Whirlpool with a concatenation of four 128-bit hashes like MD5, SHA1, etc. A simultaneous collision in four 128-bit hashes seems less probable than a collision in a single 512-bit hash when the hashed data is fairly small, only about 100 characters on average. But it's just a guess with no basis, because I haven't performed any test. What do you think?
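Consider the idealized model where every hash behaves like a random function. In that model, collision odds depend only on the total number of output bits, so one 512-bit hash and four independent 128-bit hashes concatenated come out the same. Input length does not enter the formula at all. The sketch below evaluates the standard birthday approximation, and the one-trillion input count is a hypothetical figure. The model says nothing about deliberate attacks. Joux's multicollision results show that concatenating iterated hashes such as MD5 and SHA1 resists attackers far less than the combined length suggests.

```python
import math

def collision_probability(n: float, bits: int) -> float:
    """Approximate P(at least one collision) among n inputs for an ideal
    hash with `bits` output bits: 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = n * (n - 1) / 2.0 ** (bits + 1)
    return -math.expm1(-exponent)  # accurate even for tiny exponents

n = 1e12  # hypothetical: one trillion ~100-character inputs
print("single 512-bit hash    :", collision_probability(n, 512))
print("4 x 128-bit concatenated:", collision_probability(n, 4 * 128))
print("one 128-bit hash alone :", collision_probability(n, 128))
```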

Is a hash result ever the same as the source value?

走远了吗. submitted on 2019-12-05 17:53:36
Question: This is more of a cryptography theory question, but is it possible that the result of a hash algorithm will ever be the same value as the source? For example, say I have a string: baf34551fecb48acc3da868eb85e1b6dac9de356. If I take its SHA1 hash, the result is: 4d2f72adbafddfe49a726990a1bcb8d34d3da162. In theory, is there ever a case where these two values would match? I'm not asking about SHA1 specifically here; it's just my example. I'm just wondering if hashing algorithms are built in such a way as to prevent this.
Answer: Well, it would depend on the hashing algorithm, but I'd be surprised to
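The sketch below checks the question's own example. The question does not say how the 40-character string is fed to SHA1. The first hypothetical helper hashes its ASCII text, and the second treats it as the 20 raw bytes it spells in hex. In either case, a match would make the input a fixed point of SHA1. For a 160-bit output, hitting one by chance is astronomically unlikely.

```python
import hashlib

EXAMPLE = "baf34551fecb48acc3da868eb85e1b6dac9de356"

def is_fixed_point_text(s: str) -> bool:
    """True if SHA-1 of the ASCII text of s equals s itself."""
    return hashlib.sha1(s.encode("ascii")).hexdigest() == s

def is_fixed_point_raw(s: str) -> bool:
    """True if SHA-1 of the raw bytes spelled by hex string s equals s."""
    return hashlib.sha1(bytes.fromhex(s)).hexdigest() == s

print(hashlib.sha1(EXAMPLE.encode("ascii")).hexdigest())
print(is_fixed_point_text(EXAMPLE), is_fixed_point_raw(EXAMPLE))
```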

Are hash collisions with different file sizes just as likely as same file size?

限于喜欢 submitted on 2019-12-04 02:49:26
Question: I'm hashing a large number of files, and to guard against hash collisions I'm also storing each file's original size. That way, even if there's a hash collision, it's extremely unlikely that the file sizes will also be identical. Is this sound (i.e., is a hash collision equally likely to involve a file of any size), or do I need another piece of information (if a colliding file is more likely to have the same length as the original)? Or, more generally: is every file just as likely to produce a particular hash, regardless of its size?
Answer: Depends on your hash function, but in general, files that are of the same size
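The composite key from the question can be sketched as follows. SHA-256 is a stand-in, because the question does not name its hash, and the helper names are hypothetical. With this key, a stored match requires both equal sizes and equal digests. How much the size check actually adds depends on the hash function, as the answer notes.

```python
import hashlib
import os
from collections import defaultdict

def file_key(path: str, chunk_size: int = 1 << 20) -> tuple[int, str]:
    """Return (size in bytes, SHA-256 hex digest) for the file at `path`.
    The file is read in chunks, so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()

def group_duplicates(paths: list[str]) -> dict[tuple[int, str], list[str]]:
    """Group paths by (size, digest); keep only keys shared by 2+ files."""
    groups = defaultdict(list)
    for path in paths:
        groups[file_key(path)].append(path)
    return {key: members for key, members in groups.items() if len(members) > 1}
```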

What Exactly Is a Hash Collision?

时间秒杀一切 submitted on 2019-12-03 08:52:02
Question: Hash collision (or hashing collision) in HashMap is not a new topic, and I've come across several blogs and discussion boards explaining how to produce a hash collision or how to avoid it, in ways that are detailed but ambiguous. I recently came across this question in an interview. I had a lot of things to explain, but I found it really hard to give a precise explanation. Sorry if my questions are repeated here; please route me to the precise answer: What exactly is a hash collision? Is it a
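In a hash table, "collision" can mean two related things. Two distinct keys can have the same hash value. More commonly, distinct hash values can land in the same bucket once they are reduced to the table size. The sketch below shows both with a deliberately weak hypothetical hash (the sum of character codes). A plain modulo stands in for the table's real index computation. Java's HashMap, for instance, masks a bit-spread hash instead.

```python
def toy_hash(key: str) -> int:
    """Deliberately weak hypothetical hash: the sum of character codes."""
    return sum(ord(ch) for ch in key)

def bucket_index(key: str, capacity: int = 16) -> int:
    """Reduce a hash to one of `capacity` buckets with a simple modulo."""
    return toy_hash(key) % capacity

# Hash collision: different keys, identical hash value.
print(toy_hash("ab"), toy_hash("ba"))          # 195 195
# Bucket collision: different hash values, same bucket after reduction.
print(toy_hash("a"), toy_hash("q"))            # 97 113
print(bucket_index("a"), bucket_index("q"))    # 1 1
```

In both cases a correct table still keeps the keys apart, because lookups compare the stored keys for equality after finding the bucket.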

Uniquely identifying URLs with one 64-bit number

那年仲夏 submitted on 2019-12-03 08:22:05
Question: This is basically a math problem, but very programming-related: if I have 1 billion strings containing URLs, and I take the first 64 bits of the MD5 hash of each of them, what kind of collision frequency should I expect? How does the answer change if I have only 100 million URLs? It seems to me that collisions will be extremely rare, but these things tend to be confusing. Would I be better off using something other than MD5? Mind you, I'm not looking for security, just a good fast hash function. Also, native support in MySQL is nice.
EDIT: not quite a duplicate. If the first 64 bits of the MD5
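Below is a minimal sketch of the 64-bit identifier itself. It assumes URLs are encoded as UTF-8 and that the first 8 digest bytes are read big-endian. Read that way, the value should equal what MySQL returns for CONV(LEFT(MD5(url), 16), 16, 10) when the stored string uses the same encoding, so it could live in a BIGINT UNSIGNED column. Treat that correspondence as an assumption to verify against your own schema. `url_id` and `url_id_from_hex` are hypothetical names.

```python
import hashlib

def url_id(url: str) -> int:
    """Hypothetical helper: the first 64 bits of MD5(url) as an unsigned int.
    Assumes UTF-8 encoding and big-endian byte order."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def url_id_from_hex(url: str) -> int:
    """Same value via the first 16 hex characters, mirroring LEFT(MD5(url), 16)."""
    return int(hashlib.md5(url.encode("utf-8")).hexdigest()[:16], 16)

print(url_id("https://example.com/"))
```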

substr md5 collision

非 Y 不嫁゛ submitted on 2019-12-01 17:39:22
Question: I need a 4-character hash. At the moment I am taking the first 4 characters of an md5() hash. The strings I hash are 80 characters long or less. Will this lead to collisions? Or, what is the chance of a collision, assuming I'll hash fewer than 65,536 (16^4) different elements?
Answer: Surprisingly high, indeed. As you can see from this graph of an approximate collision probability (formula from the Wikipedia page), with just a few hundred elements your probability of having a collision is over 50%. Note, of course, that if you're facing the possibility of an attacker providing the string, you can
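The answer's graph can be reproduced numerically. The sketch below computes the exact birthday probability for 16^4 = 65,536 equally likely prefixes, assuming MD5 prefixes are uniformly distributed. It then searches for the item count at which a collision becomes more likely than not. The standard approximation sqrt(2 · 65,536 · ln 2) puts that count near 300, consistent with "a few hundred". The helper names are hypothetical.

```python
BUCKETS = 16 ** 4  # 4 hex characters = 16 bits = 65,536 possible values

def collision_probability(n: int, buckets: int = BUCKETS) -> float:
    """Exact P(at least one collision) when n items fall uniformly into
    `buckets` slots: 1 - prod over i < n of (buckets - i) / buckets."""
    if n > buckets:
        return 1.0  # pigeonhole principle: a collision is guaranteed
    p_unique = 1.0
    for i in range(n):
        p_unique *= (buckets - i) / buckets
    return 1.0 - p_unique

def items_for_even_odds(buckets: int = BUCKETS) -> int:
    """Smallest n whose collision probability reaches 50%."""
    n = 1
    while collision_probability(n, buckets) < 0.5:
        n += 1
    return n

print(items_for_even_odds())
```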

substr md5 collision

淺唱寂寞╮ submitted on 2019-12-01 16:35:47
Question: I need a 4-character hash. At the moment I am taking the first 4 characters of an md5() hash. The strings I hash are 80 characters long or less. Will this lead to collisions? Or, what is the chance of a collision, assuming I'll hash fewer than 65,536 (16^4) different elements?
Answer: Surprisingly high, indeed. As you can see from this graph of an approximate collision probability (formula from the Wikipedia page), with just a few hundred elements your probability of having a collision is
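The effect is also cheap to check empirically. The hypothetical helpers below keep the first 4 hex characters of md5() as the question does. They then count how many strings reuse a prefix that was already seen. For 1,000 distinct inputs the birthday estimate is about 1000 · 999 / 2 / 65,536 ≈ 7.6 colliding pairs, so a handful of collisions is typical, though the exact count depends on the inputs. Past 65,536 inputs, collisions are guaranteed by the pigeonhole principle.

```python
import hashlib

def md5_prefix(s: str, length: int = 4) -> str:
    """First `length` hex characters of md5(s), as in the question."""
    return hashlib.md5(s.encode("utf-8")).hexdigest()[:length]

def count_collisions(strings: list[str], length: int = 4) -> int:
    """Count strings whose prefix was already produced by an earlier string."""
    seen = set()
    collisions = 0
    for s in strings:
        prefix = md5_prefix(s, length)
        if prefix in seen:
            collisions += 1
        else:
            seen.add(prefix)
    return collisions

inputs = [f"item-{i}" for i in range(1000)]  # hypothetical test strings
print(count_collisions(inputs))
```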

Does a HashMap collision cause a resize?

半城伤御伤魂 submitted on 2019-12-01 08:46:45
Question: When there is a collision during a put in a HashMap, is the map resized, or is the entry added to a list in that particular bucket?
Answer: When you say 'collision', do you mean the same hashcode? The hashcode is used to determine which bucket in a HashMap is used, and the bucket is made up of a linked list of all the entries whose hashcodes map to that bucket. The entries are then compared for equality (using .equals()) before being returned or replaced (get/put). Note that this applies to HashMap specifically (since that's the one you asked about); with other implementations, YMMV.
Answer: Either could happen; it
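Here is a toy separate-chaining map that keeps the two ideas in the question apart. It is not Java's actual HashMap. A collision just appends the entry to the bucket's list. Growth happens only when the entry count exceeds capacity × load factor. The defaults of 16 buckets and 0.75 match the values documented for java.util.HashMap. The class and method names are hypothetical, and Java's implementation differs in detail; for example, it can convert long chains into trees.

```python
class ChainedHashMap:
    """Hypothetical toy separate-chaining hash map. Collisions extend a
    bucket's list; resizing is driven only by the load factor."""

    def __init__(self, capacity: int = 16, load_factor: float = 0.75):
        self._buckets = [[] for _ in range(capacity)]
        self._load_factor = load_factor
        self._size = 0

    def _index(self, key) -> int:
        return hash(key) % len(self._buckets)

    def put(self, key, value) -> None:
        bucket = self._buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # same key: replace, no growth
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # new key, possibly a collision: chain it
        self._size += 1
        if self._size > self._load_factor * len(self._buckets):
            self._resize(2 * len(self._buckets))

    def get(self, key, default=None):
        for k, v in self._buckets[self._index(key)]:
            if k == key:                  # equality check after locating the bucket
                return v
        return default

    def _resize(self, new_capacity: int) -> None:
        entries = [entry for bucket in self._buckets for entry in bucket]
        self._buckets = [[] for _ in range(new_capacity)]
        for key, value in entries:        # rehash every entry into the new table
            self._buckets[self._index(key)].append((key, value))

    def capacity(self) -> int:
        return len(self._buckets)
```

For example, with 16 buckets the integer keys 0 and 16 land in the same bucket but do not trigger a resize. The 13th distinct key does, because 13 exceeds 16 × 0.75 = 12.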