Is this an appropriate use of python's built-in hash function?

独自空忆成欢 提交于 2019-11-29 03:24:37
phihag

Python's hash function is designed for speed, and maps into a 64-bit space. Due to the birthday paradox, this means you'll likely get a collision at about 5 billion entries (probably way earlier, since the hash function is not cryptographical). Also, the precise definition of hash is up to the Python implementation, and may be architecture- or even machine-specific. Don't use it you want the same result on multiple machines.

md5 is designed as a cryptographic hash function; even slight perturbations in the input totally change the output. It also maps into a 128-bit space, which makes it unlikely you'll ever encounter a collision at all unless you're specifically looking for one.

If you can handle collisions (i.e. test for equality between all members in a bucket, possibly by using a cryptographic algorithm like MD5 or SHA2), Python's hash function is perfectly fine.

One more thing: To save space, you should store the data in binary form if you write it to disk. (i.e. struct.pack('!q', hash('abc')) / hashlib.md5('abc').digest()).

As a side note: is is not equivalent to == in Python. You mean ==.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!