Hashing and encryption technique for a huge data set containing phone numbers

馋奶兔 提交于 2019-12-05 08:08:26

Generate a key for your data set (16 or 32 bytes) and keep it secret. Use Hmac-sha1 on your data with this key, and base 64 encode that and you have a random unique string per phonenumber that isn't reversable (without the key).

Example (Hmac-Sha1 with 256bit key) using Keyczar:

Create random secret key:

$> python keyczart.py create --location=path_to_key_set --purpose=sign
$> python keyczart.py addkey --location=path_to_key_set --status=primary

Anonymize phone number:

from keyczar import keyczar

def anonymize(phone_num):
  signer = keyczar.Signer.Read("path_to_key_set");
  return signer.Sign(phone_num)

If you're going to use cryptography, you want to apply a pseudorandom function to each phone number and throw away the key. Collision-resistant hashes such as SHA-256 do not provide the right security guarantees. Really, though, are there that many different phone numbers that you can't just construct incrementally a map representing an actually random function?

sort your data by the respective column and start counting distinct values ... replace the actual values with their respective counter value ... collision free ... one way ...

"So, basically I would need a one-way hashing function but without redundancy - Each phone number should map to unique value --Two phones numbers should not map to a same value."

This screams for a solution based on a cryptographic hash function. MD5 and SHA-1 are the best known examples, and work wonderfully for this. You will read that "MD5 has been cracked", but for your purpose that doesn't matter.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!