How to remove illegal characters from path and filenames?

离开以前 2020-11-22 17:18

I need a robust and simple way to remove illegal path and file characters from a simple string. I've used the below code but it doesn't seem to do anything, what am I missing?

29 Answers
  • 2020-11-22 17:36

    Here is a function which replaces all illegal characters in a file name by a replacement character:

    public static string ReplaceIllegalFileChars(string FileNameWithoutPath, char ReplacementChar)
    {
      const string IllegalFileChars = "*?/\\:<>|\"";
      StringBuilder sb = new StringBuilder(FileNameWithoutPath.Length);
      char c;
    
      for (int i = 0; i < FileNameWithoutPath.Length; i++)
      {
        c = FileNameWithoutPath[i];
        if (IllegalFileChars.IndexOf(c) >= 0)
        {
          c = ReplacementChar;
        }
        sb.Append(c);
      }
      return (sb.ToString());
    }
    

    For example the underscore can be used as a replacement character:

    NewFileName = ReplaceIllegalFileChars(FileName, '_');
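
    Note that the hard-coded list above omits the control characters (codes 0 to 31), which Windows also rejects in file names. A minimal sketch of the same loop driven by Path.GetInvalidFileNameChars() instead; the class and method names here are hypothetical, and the returned set depends on the platform the code runs on:

    ```csharp
    using System;
    using System.IO;
    using System.Text;

    public static class FileNameSketch
    {
        // Hypothetical variant of ReplaceIllegalFileChars: same loop, but the
        // illegal set comes from the framework instead of a hard-coded constant.
        public static string ReplaceInvalidFileNameChars(string fileName, char replacement)
        {
            char[] invalid = Path.GetInvalidFileNameChars(); // platform-dependent
            var sb = new StringBuilder(fileName.Length);
            foreach (char c in fileName)
            {
                // Swap in the replacement only when c is in the invalid set.
                sb.Append(Array.IndexOf(invalid, c) >= 0 ? replacement : c);
            }
            return sb.ToString();
        }
    }
    ```

    Called as FileNameSketch.ReplaceInvalidFileNameChars(FileName, '_'), it behaves like the function above for the characters both lists share.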
    
  • 2020-11-22 17:37

    I absolutely prefer the idea of Jeff Yates. It will work perfectly, if you slightly modify it:

    string regex = String.Format("[{0}]", Regex.Escape(new string(Path.GetInvalidFileNameChars())));
    Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);
    

    The improvement is just to escape the automatically generated regex.
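
    The snippet only builds the regex; applying it is a single Replace call. A small sketch, assuming a hypothetical wrapper class RegexSanitizer that caches the compiled pattern in a static field:

    ```csharp
    using System;
    using System.IO;
    using System.Text.RegularExpressions;

    public static class RegexSanitizer
    {
        // Character class built from the framework's invalid file name characters.
        // Regex.Escape protects characters such as '\', '*', '?' and '|'.
        private static readonly Regex InvalidChars = new Regex(
            "[" + Regex.Escape(new string(Path.GetInvalidFileNameChars())) + "]",
            RegexOptions.Compiled | RegexOptions.CultureInvariant);

        // Removes every invalid character.
        public static string Remove(string fileName)
        {
            return InvalidChars.Replace(fileName, "");
        }

        // Replaces every invalid character with the given string.
        public static string Replace(string fileName, string replacement)
        {
            return InvalidChars.Replace(fileName, replacement);
        }
    }
    ```

    Building the Regex once and reusing it avoids recompiling the pattern on every call.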

  • 2020-11-22 17:37
    public static bool IsValidFilename(string testName)
    {
        return !new Regex("[" + Regex.Escape(new String(System.IO.Path.GetInvalidFileNameChars())) + "]").IsMatch(testName);
    }
    
  • 2020-11-22 17:38

    Scanning over the answers here, they all** seem to involve using a char array of invalid filename characters.

    Granted, this may be micro-optimising - but for the benefit of anyone who might be looking to check a large number of values for being valid filenames, it's worth noting that building a hashset of invalid chars will bring about notably better performance.

    I have been very surprised (shocked) in the past just how quickly a hashset (or dictionary) outperforms iterating over a list. With strings, it's a ridiculously low number (about 5-7 items from memory). With most other simple data (object references, numbers etc) the magic crossover seems to be around 20 items.

    There are about 40 invalid characters in the array returned by Path.GetInvalidFileNameChars() on Windows. I did a search today and there's quite a good benchmark here on StackOverflow that shows the hashset will take a little over half the time of an array/list for 40 items: https://stackoverflow.com/a/10762995/949129
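
    To check the claim on your own machine, here is a rough timing sketch. The class and method names are hypothetical, and a single Stopwatch run is not a rigorous benchmark; results depend heavily on the runtime and hardware, so treat the numbers as indicative only:

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.IO;

    public static class LookupTiming
    {
        // Linear scan of the char array for every character of the input.
        public static int CountInvalidWithArray(string text, char[] invalid)
        {
            int count = 0;
            foreach (char c in text)
                if (Array.IndexOf(invalid, c) >= 0) count++;
            return count;
        }

        // Hash lookup for every character of the input.
        public static int CountInvalidWithSet(string text, HashSet<char> invalid)
        {
            int count = 0;
            foreach (char c in text)
                if (invalid.Contains(c)) count++;
            return count;
        }

        // Times both approaches on the same input and reports elapsed ticks.
        public static string Run(int length)
        {
            char[] invalidArray = Path.GetInvalidFileNameChars();
            var invalidSet = new HashSet<char>(invalidArray);
            string sample = new string('a', length) + "/";

            var sw = Stopwatch.StartNew();
            int viaArray = CountInvalidWithArray(sample, invalidArray);
            long arrayTicks = sw.ElapsedTicks;

            sw.Restart();
            int viaSet = CountInvalidWithSet(sample, invalidSet);
            long setTicks = sw.ElapsedTicks;

            return "array: " + arrayTicks + " ticks, set: " + setTicks
                + " ticks, matches: " + viaArray + "/" + viaSet;
        }
    }
    ```

    Calling LookupTiming.Run(1000000) a few times and discarding the first result gives a fairer picture, since the first call includes JIT compilation.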

    Here's the helper class I use for sanitising paths. I forget now why I had the fancy replacement option in it, but it's there as a cute bonus.

    Additional bonus method "IsValidLocalPath" too :)

    (** those which don't use regular expressions)

    public static class PathExtensions
    {
        private static HashSet<char> _invalidFilenameChars;
        private static HashSet<char> InvalidFilenameChars
        {
            get { return _invalidFilenameChars ?? (_invalidFilenameChars = new HashSet<char>(Path.GetInvalidFileNameChars())); }
        }
    
    
        /// <summary>Replaces characters in <c>text</c> that are not allowed in file names with the 
        /// specified replacement character.</summary>
        /// <param name="text">Text to make into a valid filename. The same string is returned if 
        /// it is valid already.</param>
        /// <param name="replacement">Replacement character, or NULL to remove bad characters.</param>
        /// <param name="fancyReplacements">TRUE to replace quotes and slashes with the non-ASCII characters ” and ⁄.</param>
        /// <returns>A string that can be used as a filename. If the output string would otherwise be empty, "_" is returned.</returns>
        public static string ToValidFilename(this string text, char? replacement = '_', bool fancyReplacements = false)
        {
            StringBuilder sb = new StringBuilder(text.Length);
            HashSet<char> invalids = InvalidFilenameChars;
            bool changed = false;
    
            for (int i = 0; i < text.Length; i++)
            {
                char c = text[i];
                if (invalids.Contains(c))
                {
                    changed = true;
                    char repl = replacement ?? '\0';
                    if (fancyReplacements)
                    {
                        if (c == '"') repl = '”'; // U+201D right double quotation mark
                        else if (c == '\'') repl = '’'; // U+2019 right single quotation mark
                        else if (c == '/') repl = '⁄'; // U+2044 fraction slash
                    }
                    if (repl != '\0')
                        sb.Append(repl);
                }
                else
                    sb.Append(c);
            }
    
            if (sb.Length == 0)
                return "_";
    
            return changed ? sb.ToString() : text;
        }
    
    
        /// <summary>
        /// Returns TRUE if the specified path is a valid, local filesystem path.
        /// </summary>
        /// <param name="pathString"></param>
        /// <returns></returns>
        public static bool IsValidLocalPath(this string pathString)
        {
            // From solution at https://stackoverflow.com/a/11636052/949129
            Uri pathUri;
            Boolean isValidUri = Uri.TryCreate(pathString, UriKind.Absolute, out pathUri);
            return isValidUri && pathUri != null && pathUri.IsLoopback;
        }
    }
    
  • 2020-11-22 17:38

    I created an extension method that combines several suggestions:

    1. Holding illegal characters in a hash set
    2. Keeping only characters in the ASCII range 32 to 127, since Path.GetInvalidFileNameChars does not include every invalid character possible with codes from 0 to 255. See here and MSDN
    3. Possibility to define the replacement character

    Source:

    public static class FileNameCorrector
    {
        private static HashSet<char> invalid = new HashSet<char>(Path.GetInvalidFileNameChars());
    
        public static string ToValidFileName(this string name, char replacement = '\0')
        {
            var builder = new StringBuilder();
            foreach (var cur in name)
            {
                if (cur > 31 && cur < 128 && !invalid.Contains(cur))
                {
                    builder.Append(cur);
                }
                else if (replacement != '\0')
                {
                    builder.Append(replacement);
                }
            }
    
            return builder.ToString();
        }
    }
    
  • 2020-11-22 17:39

    I don't think the question has been fully answered yet. The answers only describe cleaning a file name OR a path, not both. Here is my solution:

    private static string CleanPath(string path)
    {
        string regexSearch = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
        Regex r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
        List<string> split = path.Split('\\').ToList();
        string returnValue = split.Aggregate(string.Empty, (current, s) => current + (r.Replace(s, "") + @"\"));
        returnValue = returnValue.TrimEnd('\\');
        return returnValue;
    }
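
    One caveat with the method above: on Windows ':' is among the characters returned by Path.GetInvalidFileNameChars(), so a drive prefix such as "C:" loses its colon. A sketch of a variant that leaves the drive segment alone; it assumes Windows-style paths with '\' separators, and the class and method names are hypothetical:

    ```csharp
    using System;
    using System.IO;
    using System.Linq;

    public static class PathCleanerSketch
    {
        // Assumption: '\' is the only separator and an optional drive prefix
        // such as "C:" may appear as the first segment.
        public static string CleanWindowsPath(string path)
        {
            char[] invalid = Path.GetInvalidFileNameChars(); // platform-dependent
            string[] segments = path.Split('\\');

            for (int i = 0; i < segments.Length; i++)
            {
                string s = segments[i];
                bool isDrive = i == 0 && s.Length == 2
                    && char.IsLetter(s[0]) && s[1] == ':';
                if (isDrive) continue; // keep "C:" intact

                // Drop every character that is invalid in a file name.
                segments[i] = new string(
                    s.Where(c => Array.IndexOf(invalid, c) < 0).ToArray());
            }

            return string.Join("\\", segments);
        }
    }
    ```

    Unlike the original, this sketch keeps a trailing backslash if the input has one; add TrimEnd('\\') to the result if that is not wanted.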
    