How can I safely encode a string in Java to use as a filename?

失恋的感觉 2020-11-29 22:53

I'm receiving a string from an external process. I want to use that String to make a filename, and then write to that file. Here's my code snippet to do this:



        
  • 2020-11-29 23:22

    You could remove the invalid characters (such as '/', '\', '?' and '*') and then use the result as the filename.
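    A minimal sketch of that idea, assuming only the four characters listed need to go (a given filesystem may reject more, e.g. ':' or control characters on Windows). The class and method names are hypothetical.

```java
// Hypothetical helper: deletes '/', '\', '?' and '*' from a candidate filename.
class BlacklistStripper {
    static String stripInvalidChars(String name) {
        // Regex class [/\\?*]: the backslash is escaped once for Java and once for the regex.
        return name.replaceAll("[/\\\\?*]", "");
    }
}
```

    Stripping is lossy: "a?b" and "ab" end up as the same filename, which is one reason to prefer an escaping scheme.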

  • 2020-11-29 23:23

    This is probably not the most efficient way, but it shows how to do it using Java 8 streams:

    import java.util.stream.Collectors;
    
    // Whitespace becomes '_'; then only letters, digits, '-' and '_' are kept.
    private static String sanitizeFileName(String name) {
        return name
                .chars()
                .mapToObj(i -> (char) i)
                .map(c -> Character.isWhitespace(c) ? '_' : c)
                .filter(c -> Character.isLetterOrDigit(c) || c == '-' || c == '_')
                .map(String::valueOf)
                .collect(Collectors.joining());
    }
    

    The solution could be improved by collecting into a StringBuilder (for example with a custom collector), so you do not have to convert each lightweight character into a heavyweight String.
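    A sketch of that improvement. Rather than a full custom Collector, it uses the three-argument IntStream.collect overload with a StringBuilder. It also switches from chars() to codePoints() so characters outside the Basic Multilingual Plane are not split into surrogate halves; that switch is an addition, not part of the answer above. The class name is hypothetical.

```java
class StreamSanitizer {
    // Same rules as above: whitespace becomes '_', then only letters,
    // digits, '-' and '_' survive. Operates on code points, not chars.
    static String sanitizeFileName(String name) {
        return name.codePoints()
                .map(c -> Character.isWhitespace(c) ? '_' : c)
                .filter(c -> Character.isLetterOrDigit(c) || c == '-' || c == '_')
                // Append survivors into one StringBuilder; no per-character Strings.
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }
}
```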

  • 2020-11-29 23:26

    For those looking for a general solution, these might be common criteria:

    • The filename should resemble the string.
    • The encoding should be reversible where possible.
    • The probability of collisions should be minimized.

    To achieve this, we can use a regex to match illegal characters, percent-encode them, and then constrain the length of the encoded string.

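    import java.nio.charset.StandardCharsets;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    
    private static final Pattern PATTERN = Pattern.compile("[^A-Za-z0-9_\\-]");
    
    private static final int MAX_LENGTH = 127;
    
    public static String escapeStringAsFilename(String in){
    
        StringBuffer sb = new StringBuffer();
    
        // Apply the regex.
        Matcher m = PATTERN.matcher(in);
    
        while (m.find()) {
    
            // Percent-encode each UTF-8 byte of the matched character as exactly
            // two hex digits (%XX), the form URLDecoder can reverse.
            StringBuilder replacement = new StringBuilder();
            for (byte b : m.group().getBytes(StandardCharsets.UTF_8)) {
                replacement.append(String.format("%%%02X", b & 0xFF));
            }
    
            m.appendReplacement(sb, replacement.toString());
        }
        m.appendTail(sb);
    
        String encoded = sb.toString();
    
        // Truncate the string, without cutting a %XX escape in half.
        int end = Math.min(encoded.length(), MAX_LENGTH);
        int lastPercent = encoded.lastIndexOf('%', end - 1);
        if (lastPercent >= 0 && lastPercent > end - 3) {
            end = lastPercent;
        }
        return encoded.substring(0, end);
    }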
    

    Patterns

    The pattern above is a conservative subset of the POSIX portable filename character set (A-Z, a-z, 0-9, '.', '_' and '-'): it leaves out the dot.

    If you want to allow the dot character, use:

    private static final Pattern PATTERN = Pattern.compile("[^A-Za-z0-9_\\-\\.]");
    

    Just be wary of names like "." and "..", which refer to the current and parent directory.
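    One way to handle those two names, sketched under the assumption that you use the dot-permitting pattern and decode with URLDecoder: after escaping, percent-encode the dots of a name that is exactly "." or "..". The class and method names are hypothetical.

```java
class DotNameGuard {
    // "." and ".." are directory references, not usable file names.
    // Encoding their dots as %2E keeps the name reversible via URLDecoder.
    static String guardDotNames(String encoded) {
        if (encoded.equals(".") || encoded.equals("..")) {
            return encoded.replace(".", "%2E");
        }
        return encoded;
    }
}
```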

    If you want to avoid collisions on case-insensitive filesystems, you'll need to escape capitals:

    private static final Pattern PATTERN = Pattern.compile("[^a-z0-9_\\-]");
    

    Or escape lowercase letters:

    private static final Pattern PATTERN = Pattern.compile("[^A-Z0-9_\\-]");
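    To see why escaping one case helps, here is a compact sketch using the first of these two patterns ([^a-z0-9_\\-]). It relies on Matcher.replaceAll(Function), available since Java 9, and for brevity assumes ASCII input (each matched char becomes a single %XX). The class name is hypothetical.

```java
import java.util.regex.Pattern;

class CaseSafeEscape {
    // Lowercase-only whitelist: capitals, like every other character
    // outside [a-z0-9_-], are percent-encoded.
    private static final Pattern PATTERN = Pattern.compile("[^a-z0-9_\\-]");

    // Simplified for ASCII input: each matched char becomes %XX.
    static String escape(String in) {
        return PATTERN.matcher(in)
                .replaceAll(r -> String.format("%%%02X", (int) r.group().charAt(0)));
    }
}
```

    Here escape("File") yields "%46ile" while escape("file") stays "file", so the two no longer collide on a filesystem that ignores case.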
    

    Rather than using a whitelist, you may choose to blacklist reserved characters for your specific filesystem. For example, this regex suits FAT32 filesystems:

    // Control characters 0x00-0x1F are also invalid on FAT32, so they are escaped too.
    private static final Pattern PATTERN = Pattern.compile("[\\x00-\\x1F%\\.\"\\*/:<>\\?\\\\\\|\\+,;=\\[\\]]");
    

    Length

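    On Android, 127 characters is the safe limit. Many filesystems allow 255, though some count bytes rather than characters (ext4 allows 255 bytes per name), which only matters if unescaped non-ASCII characters can reach the filename.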

    If you prefer to keep the tail rather than the head of your string, use:

    // Truncate the string.
    int start = Math.max(0,encoded.length()-MAX_LENGTH);
    return encoded.substring(start,encoded.length());
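    Either truncation can cut a percent escape in half: a dangling "%E" at the end makes URLDecoder throw IllegalArgumentException, and stray hex digits at the start silently decode as literal text. A sketch of tail truncation that avoids this, assuming every escape is exactly three characters ('%' plus two hex digits) and that '%' only appears as the start of an escape (true here, since every pattern above escapes '%' itself). The names are hypothetical.

```java
class TailTruncation {
    // Keeps at most the last maxLength characters of an encoded name without
    // starting in the middle of a three-character %XX escape.
    static String keepTail(String encoded, int maxLength) {
        int start = Math.max(0, encoded.length() - maxLength);
        // A '%' one or two positions before start means start is inside an
        // escape; skip forward to the end of that escape.
        if (start >= 1 && encoded.charAt(start - 1) == '%') {
            start += 2;
        } else if (start >= 2 && encoded.charAt(start - 2) == '%') {
            start += 1;
        }
        return encoded.substring(start);
    }
}
```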
    

    Decoding

    To convert the filename back to the original string, use:

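    // Throws the checked UnsupportedEncodingException; on Java 10+ you can
    // call URLDecoder.decode(filename, StandardCharsets.UTF_8) instead.
    URLDecoder.decode(filename, "UTF-8");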
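    A self-contained round-trip sketch. It assumes the escaper writes every escaped character as its UTF-8 bytes, each as exactly two hex digits (%XX), which is the form URLDecoder expects; single-digit escapes like "%9" or multi-digit ones like "%4E2D" would not decode back correctly. It needs Java 10+ for URLDecoder.decode(String, Charset). The names are hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RoundTrip {
    private static final Pattern PATTERN = Pattern.compile("[^A-Za-z0-9_\\-]");

    // Percent-encodes each UTF-8 byte of every non-whitelisted character.
    static String escape(String in) {
        Matcher m = PATTERN.matcher(in);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            StringBuilder replacement = new StringBuilder();
            for (byte b : m.group().getBytes(StandardCharsets.UTF_8)) {
                replacement.append(String.format("%%%02X", b & 0xFF));
            }
            m.appendReplacement(sb, replacement.toString());
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Reverses escape(); '+' never appears unescaped, so URLDecoder's
    // '+'-to-space rule cannot interfere.
    static String unescape(String filename) {
        return URLDecoder.decode(filename, StandardCharsets.UTF_8);
    }
}
```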
    

    Limitations

    Because longer strings are truncated, two different inputs can map to the same filename, and decoding a truncated name yields only part of the original string (or a corrupted one, if the cut fell inside an escape).
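    One common mitigation for the collision risk (not part of this answer; the names and the 8-hex-digit suffix length are arbitrary choices) is to reserve a few characters of the limit for a short hash of the full original string. A sketch using SHA-256, which every Java SE implementation is required to provide:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HashedTruncation {
    private static final int SUFFIX_LENGTH = 9; // "_" + 8 hex digits

    // Returns encoded unchanged if it fits; otherwise truncates it and appends
    // "_" plus the first 4 bytes of SHA-256(original) in hex.
    // Assumes maxLength > SUFFIX_LENGTH and that escapes are exactly "%XX".
    static String truncateWithHash(String original, String encoded, int maxLength) {
        if (encoded.length() <= maxLength) {
            return encoded;
        }
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-256")
                    .digest(original.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is mandatory in Java SE
        }
        StringBuilder suffix = new StringBuilder("_");
        for (int i = 0; i < 4; i++) {
            suffix.append(String.format("%02x", digest[i] & 0xFF));
        }
        int end = maxLength - SUFFIX_LENGTH;
        // Don't cut a %XX escape in half.
        int lastPercent = encoded.lastIndexOf('%', end - 1);
        if (lastPercent >= 0 && lastPercent > end - 3) {
            end = lastPercent;
        }
        return encoded.substring(0, end) + suffix;
    }
}
```

    Truncated names were not reversible anyway; the hash only makes it unlikely that two different long strings share a filename.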
