How to unescape a Java string literal in Java?

庸人自扰 2020-11-22 01:35

I'm processing some Java source code using Java. I'm extracting the string literals and feeding them to a function taking a String. The problem is that I need to pass the

11 Answers
  • 2020-11-22 02:27

    I came across a similar problem, wasn't satisfied with the presented solutions either, and implemented this one myself.

    Also available as a Gist on GitHub:

    /**
     * Unescapes a string that contains standard Java escape sequences.
     * <ul>
     * <li><strong>&#92;b &#92;f &#92;n &#92;r &#92;t &#92;" &#92;'</strong> :
     * BS, FF, NL, CR, TAB, double and single quote.</li>
     * <li><strong>&#92;X &#92;XX &#92;XXX</strong> : Octal character
     * specification (0 - 377, 0x00 - 0xFF).</li>
     * <li><strong>&#92;uXXXX</strong> : Hexadecimal based Unicode character.</li>
     * </ul>
     * 
     * @param st
     *            A string optionally containing standard java escape sequences.
     * @return The translated string.
     */
    public String unescapeJavaString(String st) {
    
        StringBuilder sb = new StringBuilder(st.length());
    
        for (int i = 0; i < st.length(); i++) {
            char ch = st.charAt(i);
            if (ch == '\\') {
                char nextChar = (i == st.length() - 1) ? '\\' : st
                        .charAt(i + 1);
                // Octal escape?
                if (nextChar >= '0' && nextChar <= '7') {
                    String code = "" + nextChar;
                    i++;
                    if ((i < st.length() - 1) && st.charAt(i + 1) >= '0'
                            && st.charAt(i + 1) <= '7') {
                        code += st.charAt(i + 1);
                        i++;
                        // A third digit is only legal when the first is 0-3 (max \377)
                        if (code.charAt(0) <= '3' && (i < st.length() - 1)
                                && st.charAt(i + 1) >= '0' && st.charAt(i + 1) <= '7') {
                            code += st.charAt(i + 1);
                            i++;
                        }
                    }
                    sb.append((char) Integer.parseInt(code, 8));
                    continue;
                }
                switch (nextChar) {
                case '\\':
                    ch = '\\';
                    break;
                case 'b':
                    ch = '\b';
                    break;
                case 'f':
                    ch = '\f';
                    break;
                case 'n':
                    ch = '\n';
                    break;
                case 'r':
                    ch = '\r';
                    break;
                case 't':
                    ch = '\t';
                    break;
                case '\"':
                    ch = '\"';
                    break;
                case '\'':
                    ch = '\'';
                    break;
                // Hex Unicode: u????
                case 'u':
                    if (i >= st.length() - 5) {
                        ch = 'u';
                        break;
                    }
                    int code = Integer.parseInt(
                            "" + st.charAt(i + 2) + st.charAt(i + 3)
                                    + st.charAt(i + 4) + st.charAt(i + 5), 16);
                    sb.append(Character.toChars(code));
                    i += 5;
                    continue;
                }
                i++;
            }
            sb.append(ch);
        }
        return sb.toString();
    }
    
  • 2020-11-22 02:28

    For the record, if you use Scala, you can do:

    StringContext.treatEscapes(escaped)
    
  • 2020-11-22 02:37

    I'm a little late to this, but I thought I'd share my solution since I needed the same functionality. I decided to use the Java Compiler API, which makes it slower but keeps the results accurate. Basically, I generate a class on the fly, compile it, and return the results. Here is the method:

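    // Needs: java.io.*, java.lang.reflect.*, java.net.URI, java.util.Collections,
    // javax.tools.*, javax.tools.JavaCompiler.CompilationTask, javax.tools.JavaFileObject.Kind.
    // StringUtils (used below) is presumably the class that encloses this method.
    public static String[] unescapeJavaStrings(String... escaped) {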
        //class name
        final String className = "Temp" + System.currentTimeMillis();
        //build the source
        final StringBuilder source = new StringBuilder(100 + escaped.length * 20).
                append("public class ").append(className).append("{\n").
                append("\tpublic static String[] getStrings() {\n").
                append("\t\treturn new String[] {\n");
        for (String string : escaped) {
            source.append("\t\t\t\"");
            //we escape non-escaped quotes here to be safe 
            //  (but something like \\" will fail, oh well for now)
            for (int i = 0; i < string.length(); i++) {
                char chr = string.charAt(i);
                if (chr == '"' && (i == 0 || string.charAt(i - 1) != '\\')) {
                    source.append('\\');
                }
                source.append(chr);
            }
            source.append("\",\n");
        }
        source.append("\t\t};\n\t}\n}\n");
        //obtain compiler
        final JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        //local stream for output
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        //local stream for error
        ByteArrayOutputStream err = new ByteArrayOutputStream();
        //source file
        JavaFileObject sourceFile = new SimpleJavaFileObject(
                URI.create("string:///" + className + Kind.SOURCE.extension), Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) throws IOException {
                return source;
            }
        };
        //target file
        final JavaFileObject targetFile = new SimpleJavaFileObject(
                URI.create("string:///" + className + Kind.CLASS.extension), Kind.CLASS) {
            @Override
            public OutputStream openOutputStream() throws IOException {
                return out;
            }
        };
        //file manager proxy, with most parts delegated to the standard one 
        JavaFileManager fileManagerProxy = (JavaFileManager) Proxy.newProxyInstance(
                StringUtils.class.getClassLoader(), new Class[] { JavaFileManager.class },
                new InvocationHandler() {
                    //standard file manager to delegate to
                    private final JavaFileManager standard = 
                        compiler.getStandardFileManager(null, null, null); 
                    @Override
                    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                        if ("getJavaFileForOutput".equals(method.getName())) {
                            //return the target file when it's asking for output
                            return targetFile;
                        } else {
                            return method.invoke(standard, args);
                        }
                    }
                });
        //create the task
        CompilationTask task = compiler.getTask(new OutputStreamWriter(err), 
                fileManagerProxy, null, null, null, Collections.singleton(sourceFile));
        //call it
        if (!task.call()) {
            throw new RuntimeException("Compilation failed, output:\n" + 
                    new String(err.toByteArray()));
        }
        //get the result
        final byte[] bytes = out.toByteArray();
        //load class
        Class<?> clazz;
        try {
            //custom class loader for garbage collection
            clazz = new ClassLoader() { 
                protected Class<?> findClass(String name) throws ClassNotFoundException {
                    if (name.equals(className)) {
                        return defineClass(className, bytes, 0, bytes.length);
                    } else {
                        return super.findClass(name);
                    }
                }
            }.loadClass(className);
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
        //reflectively call method
        try {
            return (String[]) clazz.getDeclaredMethod("getStrings").invoke(null);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
    

    It takes an array so you can unescape in batches. So the following simple test succeeds:

    public static void main(String[] meh) {
        if ("1\02\03\n".equals(unescapeJavaStrings("1\\02\\03\\n")[0])) {
            System.out.println("Success");
        } else {
            System.out.println("Failure");
        }
    }
    
  • 2020-11-22 02:37

    Java 13 added a method which does this: String#translateEscapes.

    It was a preview feature in Java 13 and 14, but was promoted to a full feature in Java 15.
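
    A minimal sketch of how this could be used, assuming Java 15 or later. The class name TranslateEscapesDemo and the helper unescape are made up for illustration; only String#translateEscapes comes from the JDK. Note that, per its Javadoc, this method does not translate Unicode escapes such as \u0020, because those are handled by the compiler rather than by the string-literal rules.

```java
public class TranslateEscapesDemo {

    /**
     * Unescapes the body of a Java string literal (without the surrounding quotes).
     * Hypothetical helper: it only delegates to String#translateEscapes.
     */
    static String unescape(String literalBody) {
        // Handles \b \t \n \f \r \s \" \' \\ and octal escapes (Java 15+).
        // Throws IllegalArgumentException if an escape sequence is malformed.
        return literalBody.translateEscapes();
    }

    public static void main(String[] args) {
        // In source code, "\\n" is the two characters backslash + n,
        // which is what you get when extracting a literal from a source file.
        String raw = "line1\\nline2\\ttabbed \\101";
        System.out.println(unescape(raw));
    }
}
```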

  • 2020-11-22 02:37

    If you are reading Unicode-escaped chars from a file, you will have a tough time, because the text is read literally: the backslash arrives as an ordinary character instead of starting an escape:

    my_file.txt

    Blah blah...
    Column delimiter=;
    Word delimiter=\u0020 #This is just unicode for whitespace
    
    .. more stuff
    

    Here, when you read line 3 from the file the string/line will have:

    "Word delimiter=\u0020 #This is just unicode for whitespace"
    

    and the char[] in the string will show:

    {...., '=', '\\', 'u', '0', '0', '2', '0', ' ', '#', 'T', 'h', ...}
    

    Commons StringEscapeUtils.unescapeXml() will not unescape this for you (I tried it; it handles XML entities, not Java escapes). You'll have to do it manually as described here.

    So the six-character substring "\u0020" should become the single char '\u0020'.
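
    One way to do that manual step, as a rough sketch: find each backslash-u sequence followed by four hex digits and replace it with the corresponding char. The class name UnicodeEscapeDecoder and the method decode are hypothetical. As a simplification, it does not check whether the backslash is itself escaped (as in \\u0020), which a full implementation of the Java rules would have to do.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeEscapeDecoder {

    // A backslash, one or more 'u', then exactly four hex digits.
    private static final Pattern UNICODE_ESCAPE =
            Pattern.compile("\\\\u+([0-9a-fA-F]{4})");

    /** Replaces every \\uXXXX sequence in the input with the char it denotes. */
    static String decode(String input) {
        Matcher m = UNICODE_ESCAPE.matcher(input);
        StringBuilder out = new StringBuilder(input.length());
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous match and this one unchanged.
            out.append(input, last, m.start());
            // Turn the four hex digits into a single UTF-16 code unit.
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        // Copy whatever follows the last match.
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Simulates a line read from a file: a real backslash followed by u0020.
        String line = "Word delimiter=\\u0020 #This is just unicode for whitespace";
        System.out.println(decode(line));
    }
}
```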

    But if you pass this "\u0020" as the delimiter to String.split, as in "... ..... ..".split(columnDelimiterReadFromFile), it will work directly: split treats its argument as a regex, and the regex engine understands \u0020 by itself, so the escaped string read from the file is exactly what the pattern needs. (Confused?)
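
    A small sketch of that behaviour, with made-up names (SplitWithEscapedDelimiter, splitWords). The delimiter string holds the six literal characters backslash, u, 0, 0, 2, 0, as it would after being read from the file:

```java
import java.util.Arrays;

public class SplitWithEscapedDelimiter {

    /** Splits text using a delimiter that is still in its escaped, as-read form. */
    static String[] splitWords(String text, String delimiterFromFile) {
        // String.split compiles its argument as a regular expression, and
        // java.util.regex interprets \\uXXXX as the character with that code.
        return text.split(delimiterFromFile);
    }

    public static void main(String[] args) {
        // "\\u0020" in source is six chars: backslash, 'u', '0', '0', '2', '0'.
        String delimiter = "\\u0020";
        System.out.println(Arrays.toString(splitWords("one two three", delimiter)));
        // prints [one, two, three]
    }
}
```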
