Create (sane/safe) filename from any (unsafe) string

醉酒成梦 2020-12-28 12:45

I want to create a sane/safe filename (i.e. somewhat readable, no "strange" characters, etc.) from some random Unicode string (which might contain just anything).

11 Answers
  • 2020-12-28 13:20

    Here is what I came up with, inspired by uglycoyote's answer:

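    import time

    MAX_LEN = 200  # truncate once the sanitized name grows past this length

    def make_safe_filename(s):
        def safe_char(c):
            # Keep letters, digits and dots; replace everything else.
            return c if c.isalnum() or c == '.' else "_"

        safe = ""
        last_replaced = False
        for c in s:
            if len(safe) > MAX_LEN:
                # Append a millisecond timestamp so truncated names are unlikely to collide.
                return safe + "_" + str(time.time_ns() // 1000000)

            safe_c = safe_char(c)
            replaced = c != safe_c
            # Collapse a run of replaced characters into a single underscore.
            if not last_replaced or not replaced:
                safe += safe_c
            last_replaced = replaced
        return safe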
    

    And to test:

    print(make_safe_filename("hello you crazy $#^#& 2579 people!!! : hi!!!" * 20) + ".gif")
    
  • 2020-12-28 13:23

    Python:

    for c in r'[]/\;,><&*:%=+@!#^()|?^':
        filename = filename.replace(c,'')
    

    (This is just an example of characters you may want to remove.) The r prefix makes the literal a raw string, so the backslash is taken literally and is removed as well.

    Edit: regex solution in Python:

    import re
    # Inside a character class, the brackets and the backslash must be escaped.
    filename = re.sub(r'[\[\]/\\;,><&*:%=+@!#^()|?]', '', filename)
    
  • 2020-12-28 13:24

    I admit there are two schools of thought regarding DIY vs. dependencies. I come from the school that prefers not to reinvent wheels and likes to see canonical approaches to simple tasks like this. To wit, I am a fan of the pathvalidate library:

    https://pypi.org/project/pathvalidate/

    It includes a function, sanitize_filename(), which does what you're after.

    I would prefer this to any of the numerous home-baked solutions. Ideally, I'd like to see a sanitizer in os.path that is sensitive to filesystem differences and does no unnecessary sanitising. I imagine pathvalidate takes the conservative approach and produces filenames that are valid on at least NTFS and ext4, but it's hard to imagine it bothers with old DOS constraints.
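
    To illustrate the kind of filesystem differences mentioned above, here is a minimal standard-library sketch. It is not how pathvalidate works internally. The names portable_filename and replacement are hypothetical, and the rules encoded are only the well-known Windows ones (reserved characters, trailing dots and spaces, legacy device names), which are stricter than what ext4 requires.

```python
import re

# Characters Windows rejects in file names, plus ASCII control characters.
# On ext4 only '/' and NUL are forbidden, and both are covered here.
_RESERVED_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

# Legacy DOS device names that Windows still reserves, with or without extension.
_RESERVED_NAMES = {"CON", "PRN", "AUX", "NUL"} | {
    f"{prefix}{n}" for prefix in ("COM", "LPT") for n in range(1, 10)
}


def portable_filename(name, replacement="_"):
    """Hypothetical helper: make `name` acceptable on both NTFS and ext4."""
    cleaned = _RESERVED_CHARS.sub(replacement, name)
    # Windows drops trailing dots and spaces, so remove them up front.
    cleaned = cleaned.rstrip(". ")
    # Compare the part before the first dot against the reserved device names.
    stem = cleaned.split(".", 1)[0]
    if stem.upper() in _RESERVED_NAMES:
        cleaned = replacement + cleaned
    # Never return an empty name.
    return cleaned or replacement
```

    For example, portable_filename('a<b>:c.txt') gives 'a_b__c.txt' and portable_filename('con.txt') gives '_con.txt'. The sketch does not enforce a maximum length.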

  • 2020-12-28 13:25

    There are a few reasonable answers here, but in my case I want to take a string that might contain spaces and punctuation and, rather than just removing those characters, replace them with underscores. Even though spaces are allowed in filenames on most OSes, they are problematic. Also, I didn't want a period in the original string to pass through into the filename, since it would create "extra extensions" that I might not want (I'm appending the extension myself).

    def make_safe_filename(s):
        def safe_char(c):
            if c.isalnum():
                return c
            else:
                return "_"
        return "".join(safe_char(c) for c in s).rstrip("_")
    
    print(make_safe_filename( "hello you crazy $#^#& 2579 people!!! : die!!!" ) + ".gif")
    

    prints:

    hello_you_crazy_______2579_people______die___.gif
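
    If the runs of underscores in that output are unwanted, they can be collapsed into one. This is a minimal sketch, assuming Python 3, where \w on a str matches Unicode letters and digits as well as the underscore. The name make_safe_filename_collapsed is hypothetical and is not part of the code above.

```python
import re


def make_safe_filename_collapsed(s):
    # [\W_]+ matches a whole run of characters that are neither letters
    # nor digits, so each run becomes a single underscore.
    return re.sub(r"[\W_]+", "_", s).rstrip("_")


print(make_safe_filename_collapsed("hello you crazy $#^#& 2579 people!!! : die!!!") + ".gif")
# prints: hello_you_crazy_2579_people_die.gif
```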

  • 2020-12-28 13:26

    Another approach is to specify a replacement for each unwanted symbol. This way the filename may look more readable.

    >>> substitute_chars = {'/':'-', ' ':''}
    >>> filename = 'Cedric_Kelly_12/10/2020 7:56 am_317168.pdf'
    >>> "".join(substitute_chars.get(c, c) for c in filename)
    'Cedric_Kelly_12-10-20207:56am_317168.pdf'
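
    The same mapping can also be applied with str.maketrans and str.translate instead of a generator expression. This is only an alternative sketch of the idea above; the variable names table and result are illustrative.

```python
# Map each unwanted character to its replacement; an empty string deletes it.
substitute_chars = {'/': '-', ' ': ''}
table = str.maketrans(substitute_chars)

filename = 'Cedric_Kelly_12/10/2020 7:56 am_317168.pdf'
result = filename.translate(table)
print(result)
# prints: Cedric_Kelly_12-10-20207:56am_317168.pdf
```

    Note that the colon survives in both versions. Windows does not allow it in filenames, so an entry such as ':': '-' may be worth adding to the dictionary.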
    
  • 2020-12-28 13:32

    My requirements were conservative (the generated filenames needed to be valid on multiple operating systems, including some ancient mobile OSs). I ended up with:

        "".join([c for c in text if re.match(r'\w', c)])
    

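    That whitelists the alphanumeric characters (a-z, A-Z, 0-9) and the underscore, provided \w is restricted to ASCII. In Python 3, \w applied to a str also matches non-ASCII letters and digits unless the re.ASCII flag is passed. The regular expression can be compiled and cached for efficiency if there are a lot of strings to be matched. In my case, it wouldn't have made any significant difference.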
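
    The compile-once idea might look like the sketch below. It swaps the per-character re.match for a single substitution with a precompiled pattern and passes re.ASCII so that only a-z, A-Z, 0-9 and the underscore survive. The names _NOT_WORD and keep_word_chars are hypothetical.

```python
import re

# re.ASCII restricts \w (and therefore \W) to the ASCII definition;
# without it, Python 3 would also keep non-ASCII letters and digits.
_NOT_WORD = re.compile(r'\W+', re.ASCII)


def keep_word_chars(text):
    """Hypothetical helper: drop everything except a-z, A-Z, 0-9 and '_'."""
    return _NOT_WORD.sub('', text)


print(keep_word_chars('héllo wörld_1.txt'))
# prints: hllowrld_1txt
```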
