Safe enough 8-character short unique random string

甜味超标 2021-02-01 01:34

I am trying to generate short, unique, random 8-character filenames for, let's say, thousands of files, with a negligible chance of name collisions. Is this method safe enough?



        
7 answers
  • 2021-02-01 02:09

    Which method has fewer collisions, is faster and is easier to read?

    TLDR

    random.choices() is a bit faster and has about 3 orders of magnitude fewer collisions, but is IMO slightly harder to read.

    import string   
    import uuid
    import random
    
    def random_choice():
        alphabet = string.ascii_lowercase + string.digits
        return ''.join(random.choices(alphabet, k=8))
    
    def truncated_uuid4():
        return str(uuid.uuid4())[:8]
    

    Test collisions

    def test_collisions(fun):
        out = set()
        count = 0
        for _ in range(10_000_000):  # 10 million draws, as in the reported results
            new = fun()
            if new in out:
                count += 1
            else:
                out.add(new)
        print(count)
    
    test_collisions(random_choice)
    test_collisions(truncated_uuid4)
    

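    Results of a single run with 10 million draws of 8-character strings, random.choices vs truncated uuid4. Note that random.choices draws from the 36 characters abcdefghijklmnopqrstuvwxyz0123456789, while a truncated uuid4 contains only the 16 hex digits 0123456789abcdef, so its name space is much smaller:

    • collisions: 17 vs 11632
    • time (seconds): 37 vs 63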
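
    These counts are close to what the birthday approximation predicts: for n draws from N equally likely names, the expected number of colliding pairs is roughly n(n-1)/(2N). A minimal sketch of that arithmetic follows. The function name expected_collisions is made up for illustration, and the formula is an approximation that assumes uniform, independent draws.

```python
def expected_collisions(draws, namespace_size):
    """Approximate expected number of colliding pairs (birthday approximation)."""
    return draws * (draws - 1) / (2 * namespace_size)


BASE36_NAMES = 36 ** 8  # 8 characters from a-z and 0-9
HEX_NAMES = 16 ** 8     # 8 hex digits, as in a truncated uuid4

if __name__ == "__main__":
    for label, size in (("random.choices", BASE36_NAMES),
                        ("truncated uuid4", HEX_NAMES)):
        # Compare 10 million draws (the benchmark) with 10 thousand (the question)
        print(label,
              expected_collisions(10_000_000, size),
              expected_collisions(10_000, size))
```

    For 10 million draws this gives about 17.7 and about 11,641 collisions respectively. For 10,000 files with the 36-character alphabet it gives roughly 0.00002, so a collision among a few thousand names is very unlikely, though never impossible.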
  • 2021-02-01 02:18

    You can try this

    import random
    import string

    # 26 lowercase letters plus 10 digits
    uid_chars = string.ascii_lowercase + string.digits
    uid_length = 8

    def short_uid():
        # Pick uid_length characters independently and uniformly at random
        return ''.join(random.choice(uid_chars) for _ in range(uid_length))
    

    eg:

    print(short_uid())
    nogbomcv
    
  • 2021-02-01 02:23

    From Python 3.6 on, you should probably use the secrets module. secrets.token_urlsafe() seems to work just fine for your case, and it draws from a cryptographically secure random source.
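
    A minimal sketch of how that might look for 8-character names. The function names token_name and alphabet_name are made up for illustration. It relies on 6 random bytes encoding to exactly 8 URL-safe base64 characters.

```python
import secrets
import string


def token_name():
    # 6 random bytes encode to exactly 8 URL-safe base64 characters
    return secrets.token_urlsafe(6)


def alphabet_name(length=8, alphabet=string.ascii_lowercase + string.digits):
    # secrets.choice picks each character using the OS's secure random source
    return ''.join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    print(token_name())
    print(alphabet_name())
```

    Note that token_urlsafe() output is mixed-case and may contain "-" and "_". If the names have to survive a case-insensitive file system, the restricted-alphabet variant is probably the safer choice.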

  • 2021-02-01 02:23

    I am using hashids to convert a timestamp into a unique id. (You can even convert it back to a timestamp if you want).

    The drawback is that the timestamp is truncated to whole seconds, so if you create more than one ID within the same second you will get a duplicate. If you generate them with enough time in between, this is an option.

    Here is an example:

    from hashids import Hashids
    from datetime import datetime
    hashids = Hashids(salt="lorem ipsum dolor sit amet", alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890")
    print(hashids.encode(int(datetime.today().timestamp())))  # 'QJW60PJ1' when I ran it
    
  • 2021-02-01 02:27

    Is there a reason you can't use tempfile to generate the names?

    Functions like mkstemp and NamedTemporaryFile are absolutely guaranteed to give you unique names; nothing based on random bytes is going to give you that.
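
    A minimal sketch of the mkstemp route, assuming the files live in a directory you control. The helper name create_unique_file is made up for illustration.

```python
import os
import tempfile


def create_unique_file(directory, prefix="", suffix=""):
    # mkstemp creates the file atomically, so the returned name cannot
    # already belong to another file in the directory
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=suffix, dir=directory)
    os.close(fd)  # close the low-level handle; reopen the path when needed
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        paths = [create_unique_file(workdir, suffix=".dat") for _ in range(5)]
        print([os.path.basename(p) for p in paths])
```

    In recent CPython versions the random part of the name appears to be 8 characters long, but that is an implementation detail rather than a documented guarantee.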

    If for some reason you don't actually want the file created yet (e.g., you're generating filenames to be used on some remote server or something), you can't be perfectly safe, but mktemp is still safer than random names.

    Or just keep a 48-bit counter stored in some "global enough" location, so you guarantee going through the full cycle of names before a collision, and you also guarantee knowing when a collision is going to happen.
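
    A rough sketch of the counter idea. Where the counter is stored and how it is incremented safely is left out here. The 36-symbol alphabet and the names ALPHABET, NAME_LENGTH and counter_to_name are assumptions for illustration. With 36 symbols, 8 characters cover 36**8 names; the 48-bit figure corresponds to a 64-symbol alphabet.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase  # 36 symbols, assumed for this sketch
NAME_LENGTH = 8


def counter_to_name(counter):
    """Encode a non-negative counter as a fixed-width base-36 name."""
    if not 0 <= counter < len(ALPHABET) ** NAME_LENGTH:
        raise ValueError("counter exhausted: every 8-character name has been used")
    chars = []
    for _ in range(NAME_LENGTH):
        # Peel off the least significant base-36 digit each round
        counter, digit = divmod(counter, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return ''.join(reversed(chars))


if __name__ == "__main__":
    print(counter_to_name(0), counter_to_name(36))
```

    Because the mapping is one-to-one, two different counter values can never produce the same name, and the ValueError marks the exact point at which names would start to repeat.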

    They're all safer, and simpler, and much more efficient than reading urandom and doing an md5.

    If you really do want to generate random names, ''.join(random.choice(my_charset) for _ in range(8)) is also going to be simpler than what you're doing, and more efficient. Even urlsafe_b64encode(os.urandom(6)) is just as random as the MD5 hash, and simpler and more efficient.
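
    Spelled out as runnable code, the two one-liners might look like the sketch below. The contents of my_charset and the function names are assumptions for illustration.

```python
import os
import random
import string
from base64 import urlsafe_b64encode

my_charset = string.ascii_lowercase + string.digits  # assumed character set


def random_name():
    # Not cryptographically secure, but simple and fast
    return ''.join(random.choice(my_charset) for _ in range(8))


def urandom_name():
    # 6 bytes = 48 bits, which base64-encodes to 8 characters without padding
    return urlsafe_b64encode(os.urandom(6)).decode('ascii')


if __name__ == "__main__":
    print(random_name())
    print(urandom_name())
```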

    The only benefit of the cryptographic randomness and/or cryptographic hash function is in avoiding predictability. If that's not an issue for you, why pay for it? And if you do need to avoid predictability, you almost certainly need to avoid races and other much simpler attacks, so avoiding mkstemp or NamedTemporaryFile is a very bad idea.

    Not to mention that, as Root points out in a comment, if you need security, MD5 doesn't actually provide it.

  • 2021-02-01 02:28

    You can try the shortuuid library.

    Install with : pip install shortuuid

    Then it is as simple as :

    >>> import shortuuid
    >>> shortuuid.uuid()
    'vytxeTZskVKR7C7WgdSP3d'
    