I\'m making a cross-platform application that renames files based on data retrieved online. I\'d like to sanitize the Strings I took from a web API for the current platform.
As suggested elsewhere, this is not usually what you want to do. It is usually best to create a temporary file using a secure method such as File.createTempFile().
You should not do this with a whitelist and only keep 'good' characters. If the file is made up of only Chinese characters then you will strip everything out of it. We can't use a whitelist for this reason, we have to use a blacklist.
Linux pretty much allows anything which can be a real pain. I would just limit Linux to the same list that you limit Windows to so you save yourself headaches in the future.
Using this C# snippet on Windows I produced a list of characters that are not valid on Windows. There are quite a few more characters in this list than you may think (41) so I wouldn't recommend trying to create your own list.
foreach (char c in new string(Path.GetInvalidFileNameChars()))
{
Console.Write((int)c);
Console.Write(",");
}
Here is a simple Java class which 'cleans' a file name.
public class FileNameCleaner {
final static int[] illegalChars = {34, 60, 62, 124, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 58, 42, 63, 92, 47};
static {
Arrays.sort(illegalChars);
}
public static String cleanFileName(String badFileName) {
StringBuilder cleanName = new StringBuilder();
for (int i = 0; i < badFileName.length(); i++) {
int c = (int)badFileName.charAt(i);
if (Arrays.binarySearch(illegalChars, c) < 0) {
cleanName.append((char)c);
}
}
return cleanName.toString();
}
}
EDIT: As Stephen suggested you probably also should verify that these file accesses only occur within the directory you allow.
The following answer has sample code for establishing a custom security context in Java and then executing code in that 'sandbox'.
How do you create a secure JEXL (scripting) sandbox?