Efficiently removing specific characters (some punctuation) from Strings in Java?

青春惊慌失措 2020-12-10 17:14

In Java, what is the most efficient way of removing given characters from a String? Currently, I have this code:

private static String processWord(String x)          


        
7 answers
  •  醉梦人生
    2020-12-10 17:31

    Here's a late answer, just for fun.

    In cases like this, I would suggest aiming for readability over speed. Of course you can be super-readable but too slow, as in this super-concise version:

    private static String processWord(String x) {
        // Brackets are escaped: inside a Java character class an unescaped '[' opens a nested class.
        return x.replaceAll("[\\]\\[(){},.;!?<>%]", "");
    }
    

    This is slow because the regex is recompiled every time you call the method. You can precompile it once instead:

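    import java.util.regex.Pattern;
    
    // Compiled once and reused on every call. Brackets are escaped so that
    // '[' is not read as the start of a nested character class.
    private static final Pattern UNDESIRABLES = Pattern.compile("[\\]\\[(){},.;!?<>%]");
    
    private static String processWord(String x) {
        return UNDESIRABLES.matcher(x).replaceAll("");
    }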
    

    This should be fast enough for most purposes, assuming the JVM's regex engine optimizes the character class lookup. This is the solution I would use, personally.

    Without profiling, I can't say whether you would do better by building your own character (actually code point) lookup table:

    // Assumed size: one slot per ASCII code point (the original left the size out).
    private static final boolean[] CHARS_TO_KEEP = new boolean[128];
    

    Fill this once and then iterate, making your resulting string. I'll leave the code to you. :)
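
    For readers who want a starting point, here is a minimal sketch of the lookup-table idea. It is not the answerer's code: the class name `PunctuationFilter`, the `REMOVED_CHARS` constant, and the table size of 128 (ASCII only) are assumptions made for illustration. Characters outside the table are kept, so surrogate pairs pass through untouched.

```java
import java.util.Arrays;

// Hypothetical class name, used only for this sketch.
final class PunctuationFilter {
    // Assumption: the same characters that the regex version removes.
    private static final String REMOVED_CHARS = "[](){},.;!?<>%";

    // Assumption: one slot per ASCII code point; true means "keep this character".
    private static final boolean[] CHARS_TO_KEEP = new boolean[128];

    static {
        // Filled once, when the class is loaded.
        Arrays.fill(CHARS_TO_KEEP, true);
        for (int i = 0; i < REMOVED_CHARS.length(); i++) {
            CHARS_TO_KEEP[REMOVED_CHARS.charAt(i)] = false;
        }
    }

    static String processWord(String x) {
        StringBuilder sb = new StringBuilder(x.length());
        for (int i = 0; i < x.length(); i++) {
            char c = x.charAt(i);
            // Anything outside the table (non-ASCII) is always kept.
            if (c >= CHARS_TO_KEEP.length || CHARS_TO_KEEP[c]) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(processWord("Hello, (world)!")); // prints "Hello world"
    }
}
```

    Whether this beats the precompiled pattern depends on the input and the JVM, so it is worth measuring before adopting it.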

    Again, I wouldn't dive into this kind of optimization, because the code becomes too hard to read. Is performance that much of a concern? Also remember that the JVM JIT-compiles hot code, so it runs faster after warming up; measure with a good profiler before deciding.

    One thing worth mentioning is that the example in the original question performs poorly because it creates a whole bunch of temporary strings. Unless the compiler optimizes all of that away, that solution will perform the worst.
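
    The question's code is cut off above, so the following only illustrates the general pattern this paragraph describes. It assumes a chain of `String.replace` calls as a hypothetical stand-in and contrasts it with a single pass over the input; the class and method names are made up for the sketch.

```java
// Hypothetical class name, used only for this sketch.
final class TempStringDemo {
    // Stand-in for a chained approach: each call that changes the input
    // can allocate a fresh intermediate String.
    static String chainedReplace(String x) {
        return x.replace(",", "")
                .replace(".", "")
                .replace("!", "")
                .replace("?", "");
    }

    // Single pass: one StringBuilder, one result String.
    static String singlePass(String x) {
        StringBuilder sb = new StringBuilder(x.length());
        for (int i = 0; i < x.length(); i++) {
            char c = x.charAt(i);
            // Append only characters that are not in the removal set.
            if (",.!?".indexOf(c) < 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

    Both methods return the same result; the difference is how many intermediate objects are created along the way, which matters mostly when the method is called very often.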
