In Java, what is the most efficient way of removing given characters from a String? Currently, I have this code:
private static String processWord(String x)
Here's a late answer, just for fun.
In cases like this, I would suggest aiming for readability over speed. Of course you can be super-readable but too slow, as in this super-concise version:
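private static String processWord(String x) {
    // ] and [ are escaped: Java treats a bare [ inside a character class as the start of a nested class
    return x.replaceAll("[\\]\\[(){},.;!?<>%]", "");
}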
This is slow because every time you call this method, the regex is compiled again. So you can precompile the regex:
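import java.util.regex.Pattern;

// Compiled once, when the class is loaded; ] and [ are escaped so they are literal members of the class
private static final Pattern UNDESIRABLES = Pattern.compile("[\\]\\[(){},.;!?<>%]");
private static String processWord(String x) {
    return UNDESIRABLES.matcher(x).replaceAll("");
}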
This should be fast enough for most purposes, assuming the JVM's regex engine optimizes the character class lookup. This is the solution I would use, personally.
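For reference, here is the precompiled version as a small self-contained class you can paste and run. The class name `WordCleaner` and the sample input are just for illustration; the character set is the one from the snippets above, with the brackets escaped.

```java
import java.util.regex.Pattern;

final class WordCleaner {
    // Compiled once when the class is loaded, then reused on every call.
    private static final Pattern UNDESIRABLES =
            Pattern.compile("[\\]\\[(){},.;!?<>%]");

    static String processWord(String x) {
        // A Matcher is cheap to create; compiling the Pattern is the costly part.
        return UNDESIRABLES.matcher(x).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(processWord("Hello, [world]!")); // prints "Hello world"
    }
}
```

A `Pattern` is immutable and safe to share between threads, so keeping it in a `static final` field is fine.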
Now without profiling, I wouldn't know whether you could do better by making your own character (actually codepoint) lookup table:
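// Size is an assumption: one slot per UTF-16 char value (a true code point table would need Character.MAX_CODE_POINT + 1 slots)
private static final boolean[] CHARS_TO_KEEP = new boolean[Character.MAX_VALUE + 1];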
Fill this once and then iterate, making your resulting string. I'll leave the code to you. :)
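If you do want to try it, here is a minimal sketch of the lookup-table idea. It assumes the same set of characters as the regex versions and a table indexed by UTF-16 `char` rather than by full code point; every character being removed is in the Basic Multilingual Plane, so surrogate pairs simply pass through untouched. The class name `LookupTableCleaner` and the `UNDESIRABLES` string are made up for the example.

```java
import java.util.Arrays;

final class LookupTableCleaner {
    // Assumption: same characters as the regex versions.
    private static final String UNDESIRABLES = "[](){},.;!?<>%";

    // One slot per UTF-16 char value; true means "keep this char".
    private static final boolean[] CHARS_TO_KEEP =
            new boolean[Character.MAX_VALUE + 1];

    static {
        // Fill the table once: keep everything, then knock out the undesirables.
        Arrays.fill(CHARS_TO_KEEP, true);
        for (int i = 0; i < UNDESIRABLES.length(); i++) {
            CHARS_TO_KEEP[UNDESIRABLES.charAt(i)] = false;
        }
    }

    static String processWord(String x) {
        // Single pass over the input, appending only the chars we keep.
        StringBuilder sb = new StringBuilder(x.length());
        for (int i = 0; i < x.length(); i++) {
            char c = x.charAt(i);
            if (CHARS_TO_KEEP[c]) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Whether this actually beats the precompiled regex is something only a profiler can tell you. The table costs 64K booleans up front in exchange for a constant-time check per character.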
Again, I wouldn't dive into this kind of optimization: the code becomes harder to read. Is performance that much of a concern? Also remember that the JVM JIT-compiles hot code, so it will run faster after warming up. Use a good profiler before deciding anything.
One thing that should be mentioned is that the example in the original question performs poorly because it creates a whole bunch of temporary strings! Unless the compiler or JIT optimizes all that away, that particular solution will perform the worst.
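To make the temporary-string point concrete, here is a rough sketch. The question's code is not shown in full here, so `chained` is only a hypothetical stand-in for that style of solution, and the four-character set is picked purely for illustration. `singlePass` shows the alternative of building the result once.

```java
final class TemporaryStrings {
    // Hypothetical chained style: each replace call that finds a match
    // returns a new intermediate String, which is then thrown away.
    static String chained(String x) {
        return x.replace(",", "")
                .replace(".", "")
                .replace("!", "")
                .replace("?", "");
    }

    // Single pass: one StringBuilder and one final String, however many
    // characters are in the removal set.
    static String singlePass(String x) {
        String undesirables = ",.!?";
        StringBuilder sb = new StringBuilder(x.length());
        for (int i = 0; i < x.length(); i++) {
            char c = x.charAt(i);
            if (undesirables.indexOf(c) < 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(chained("Wait, what?!"));    // prints "Wait what"
        System.out.println(singlePass("Wait, what?!")); // prints "Wait what"
    }
}
```

Both give the same result. The difference is that the chained version scans the string once per character in the set and can allocate a fresh String each time.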