I know that / is illegal in Linux, and the following are illegal in Windows (I think): * . " / \ [
Difficulties with defining what is legal and what is not have already been addressed, and whitelists were suggested. But Windows supports more-than-8-bit characters. Wikipedia states, for example, that the modifier letter colon "[(see 7. below) is] sometimes used in Windows filenames as it is identical to the colon in the Segoe UI font used for filenames. The [inherited ASCII] colon itself is not permitted."
Therefore, I want to present a much more liberal approach that uses Unicode characters to replace the "illegal" ones. In my comparable use case I found the result far more readable. Look for example into this block. Plus, you can even restore the original content from it. Possible choices and research are provided in the following list:
1. Instead of * (U+002A * ASTERISK), you can use one of the many listed, for example U+2217 ∗ (ASTERISK OPERATOR) or the full-width asterisk U+FF0A ＊.
2. Instead of . you can use ⋅ U+22C5 DOT OPERATOR.
3. Instead of " you can use “ U+201C LEFT DOUBLE QUOTATION MARK.
4. Instead of / (U+002F / SOLIDUS), you can use ∕ U+2215 DIVISION SLASH.
5. Instead of \ (U+005C \ REVERSE SOLIDUS), you can use ⧵ U+29F5 REVERSE SOLIDUS OPERATOR.
6. Instead of [ (U+005B LEFT SQUARE BRACKET) and ] (U+005D RIGHT SQUARE BRACKET), you can use for example U+FF3B ［ FULLWIDTH LEFT SQUARE BRACKET and U+FF3D ］ FULLWIDTH RIGHT SQUARE BRACKET.
7. Instead of : you can use U+2236 ∶ RATIO (for mathematical usage) or U+A789 ꞉ MODIFIER LETTER COLON (see colon (letter): sometimes used in Windows filenames as it is identical to the colon in the Segoe UI font used for filenames; the colon itself is not permitted).
8. Instead of ; you can use U+037E ; GREEK QUESTION MARK.
9. For |, there are some good substitutes such as U+0964 । DEVANAGARI DANDA, U+2223 ∣ DIVIDES or U+01C0 ǀ LATIN LETTER DENTAL CLICK (Wikipedia). The box drawing characters also contain various other options.
10. Instead of , (U+002C COMMA), you can use for example ‚ U+201A SINGLE LOW-9 QUOTATION MARK.
11. For ? (U+003F ? QUESTION MARK), these are good candidates: U+FF1F ？ FULLWIDTH QUESTION MARK or U+FE56 ﹖ SMALL QUESTION MARK (there are two more in the Dingbats block; search for "question").
In Windows 10 (2019), the following characters are rejected with this error message when you try to type them:
A file name can't contain any of the following characters:
\ / : * ? " < > |
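The substitution idea from the list above can be sketched in Python for exactly these nine characters. This is only an illustration under assumptions: the names encode_name and decode_name are made up here, the replacements for < and > (the fullwidth forms U+FF1C and U+FF1E) are picked for this sketch because no substitute for them is suggested in the list, and the round trip only works if the original name does not already contain one of the replacement characters.

```python
# Map each character from the Windows message to a look-alike.
# Escapes are used so the exact code points are unambiguous.
REPLACEMENTS = {
    "\\": "\u29f5",  # REVERSE SOLIDUS OPERATOR
    "/": "\u2215",   # DIVISION SLASH
    ":": "\ua789",   # MODIFIER LETTER COLON
    "*": "\u2217",   # ASTERISK OPERATOR
    "?": "\uff1f",   # FULLWIDTH QUESTION MARK
    '"': "\u201c",   # LEFT DOUBLE QUOTATION MARK
    "<": "\uff1c",   # FULLWIDTH LESS-THAN SIGN (assumption, not from the list)
    ">": "\uff1e",   # FULLWIDTH GREATER-THAN SIGN (assumption, not from the list)
    "|": "\u2223",   # DIVIDES
}

_ENCODE = str.maketrans(REPLACEMENTS)
_DECODE = str.maketrans({v: k for k, v in REPLACEMENTS.items()})


def encode_name(name: str) -> str:
    """Replace the forbidden characters with their look-alikes."""
    return name.translate(_ENCODE)


def decode_name(name: str) -> str:
    """Undo encode_name (assumes the input held no look-alikes before)."""
    return name.translate(_DECODE)


print(encode_name("What is 1/2 * 4?.txt"))
```

Keeping the mapping in a single dictionary means the reverse table is derived from it, so the two directions cannot drift apart.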
In Unix shells, you can quote almost every character in single quotes ('). The exceptions are the single quote itself and control characters, which you cannot express because the backslash (\) is not expanded inside single quotes. You can still get a single quote into the string by concatenating single-quoted and double-quoted strings, as in 'I'"'"'m', which can be used to access a file called "I'm" (a double quote is also possible here).
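For scripts, Python's standard shlex.quote applies the same concatenation technique, so such strings do not have to be built by hand. A small sketch, using the example file name from above:

```python
import shlex

name = "I'm"
quoted = shlex.quote(name)

# shlex.quote wraps the string in single quotes and splices each
# embedded single quote in via a double-quoted segment.
print(quoted)  # 'I'"'"'m'

# Splitting the quoted form the way a POSIX shell would gives the name back.
assert shlex.split(quoted) == [name]
```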
So you should avoid all control characters, because they are too difficult to enter in the shell. The rest is still funny, especially file names starting with a dash, because most commands read those as options unless you put two dashes (--) before them or prefix them with ./, which also hides the leading -.
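When a program passes file names on to other commands, both workarounds can be applied mechanically. A minimal sketch, where the helper names protect_leading_dash and build_argv are hypothetical, and where the -- convention is assumed to be honored by the command being called (most are, but not all):

```python
def protect_leading_dash(name: str) -> str:
    """Prefix ./ so a relative name starting with '-' is not read as an option."""
    return "./" + name if name.startswith("-") else name


def build_argv(command: str, *names: str) -> list:
    """Build an argument list that ends option parsing with '--' before the names."""
    return [command, "--", *names]


print(protect_leading_dash("-rf"))  # ./-rf
print(build_argv("rm", "-rf"))      # ['rm', '--', '-rf']
```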
If you want to be nice, don't use any of the characters that the shell and typical commands use as syntactical elements. Some of these are position dependent: you can still use -, but not as the first character, and you should use . as the first character only when you mean it ("hidden file"). When you are mean, your file names are VT100 escape sequences ;-), so that an ls garbles the output.
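A check along these lines can flag names that are not "nice" in the sense described here. It is a sketch only: the function name naming_problems and the set of shell metacharacters are assumptions, not a complete list for any particular shell.

```python
import unicodedata

# Characters many shells treat specially (illustrative, not exhaustive).
SHELL_SPECIAL = set("*?[]{}()<>|&;$`\"'\\!#~ \t\n")


def naming_problems(name: str) -> list:
    """Return human-readable reasons why a file name is awkward in a shell."""
    problems = []
    # Category "Cc" covers the C0/C1 control characters, including ESC.
    if any(unicodedata.category(ch) == "Cc" for ch in name):
        problems.append("contains control characters")
    if name.startswith("-"):
        problems.append("starts with a dash")
    if name.startswith("."):
        problems.append("starts with a dot (hidden file)")
    if any(ch in SHELL_SPECIAL for ch in name):
        problems.append("contains shell metacharacters")
    return problems


print(naming_problems("report.txt"))
print(naming_problems("-\x1b[31mred"))
```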
Instead of creating a blacklist of characters, you could use a whitelist. All things considered, the range of characters that make sense in a file or directory name context is quite short, and unless you have some very specific naming requirements your users will not hold it against your application if they cannot use the whole ASCII table.
It does not solve the problem of reserved names in the target file system, but with a whitelist it is easier to mitigate the risks at the source.
In that spirit, this is a range of characters that can be considered safe:
And any additional safe characters you wish to allow. Beyond this, you just have to enforce some additional rules regarding spaces and dots. This is usually sufficient:
This already allows quite complex and nonsensical names. For example, these names would be possible with these rules, and be valid file names in Windows/Linux:
A...........ext
B -.- .ext
In essence, even with so few whitelisted characters you should still decide what actually makes sense, and validate/adjust the name accordingly. In one of my applications, I used the same rules as above but stripped any duplicate dots and spaces.
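The whitelist approach might look like the following Python sketch. The concrete character set and rules are not spelled out above, so everything here is an assumption chosen to be consistent with the two example names: letters, digits, underscore, hyphen, space and dot are allowed, and a name has to start with a letter or digit. The collapse step mirrors the stripping of duplicate dots and spaces mentioned above.

```python
import re

# Assumed whitelist: the first character is a letter or digit, the rest
# may also contain underscore, hyphen, dot and space.
VALID_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_\-. ]*")


def is_valid(name: str) -> bool:
    """True if the whole name matches the assumed whitelist."""
    return VALID_NAME.fullmatch(name) is not None


def collapse(name: str) -> str:
    """Collapse runs of dots and runs of spaces to a single character."""
    name = re.sub(r"\.{2,}", ".", name)
    return re.sub(r" {2,}", " ", name)


for candidate in ["A...........ext", "B -.- .ext", ".hidden", "a/b.txt"]:
    print(repr(candidate), is_valid(candidate), repr(collapse(candidate)))
```

Using fullmatch rather than match with a trailing $ avoids accepting a name that ends in a newline.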
I had the same need, was looking for a recommendation or standard references, and came across this thread. My current blacklist of characters that should be avoided in file and directory names is:
$CharactersInvalidForFileName = {
"pound" -> "#",
"left angle bracket" -> "<",
"dollar sign" -> "$",
"plus sign" -> "+",
"percent" -> "%",
"right angle bracket" -> ">",
"exclamation point" -> "!",
"backtick" -> "`",
"ampersand" -> "&",
"asterisk" -> "*",
"single quotes" -> "'",
"pipe" -> "|",
"left bracket" -> "{",
"question mark" -> "?",
"double quotes" -> "\"",
"equal sign" -> "=",
"right bracket" -> "}",
"forward slash" -> "/",
"colon" -> ":",
"back slash" -> "\\",
"blank spaces" -> " ",
"at sign" -> "@"
};
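For comparison, the same blacklist can be applied in a few lines of Python. This is a sketch under assumptions: the name find_blacklisted is hypothetical, the quote entries are taken to mean the plain ASCII ' and ", and the "blank spaces" entry is taken to mean a single space.

```python
# The 22 characters from the blacklist above, as one set
# (assuming plain ASCII quotes and a space for "blank spaces").
BLACKLIST = set("#<$+%>!`&*'|{?\"=}/:\\ @")


def find_blacklisted(name: str) -> list:
    """Return the blacklisted characters in name, in order of first appearance."""
    found = []
    for ch in name:
        if ch in BLACKLIST and ch not in found:
            found.append(ch)
    return found


print(find_blacklisted("report_2019.txt"))  # []
print(find_blacklisted("50% off?.txt"))     # ['%', ' ', '?']
```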
For Windows, you can check this using PowerShell:
$PathInvalidChars = [System.IO.Path]::GetInvalidPathChars() #36 chars
To display the UTF-8 byte values, you can convert the characters:
$enc = [system.Text.Encoding]::UTF8
$PathInvalidChars | foreach { $enc.GetBytes($_) }
$FileNameInvalidChars = [System.IO.Path]::GetInvalidFileNameChars() #41 chars
$FileOnlyInvalidChars = @(':', '*', '?', '\', '/') #5 chars - as a difference
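The relationship between the two arrays can be checked with a quick Python sketch. The contents below are an assumption about what the two calls return on the classic .NET Framework, chosen to match the counts in the comments above. Other .NET versions may return different sets, so treat the PowerShell output on your own machine as authoritative.

```python
# Assumed contents (not read from .NET): control characters 0-31 plus " < > |
path_invalid = {chr(i) for i in range(32)} | set('"<>|')

# Assumed: the file-name set adds the five separator and wildcard characters.
file_name_invalid = path_invalid | set(":*?\\/")

# Characters that are invalid in a file name but not in a path.
file_only_invalid = file_name_invalid - path_invalid

print(len(path_invalid))          # 36
print(len(file_name_invalid))     # 41
print(sorted(file_only_invalid))  # ['*', '/', ':', '?', '\\']
```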