“Unmappable character for encoding UTF-8” error

失恋的感觉 2020-11-27 03:21

I'm getting a compile error in the following method.

public static boolean isValidPasswd(String passwd) {
    String reg = "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-


        
10 answers
  • 2020-11-27 03:44

    I ran into this issue while using Eclipse. Adding the source encoding to my pom.xml file resolved it. http://ctrlaltsolve.blogspot.in/2015/11/encoding-properties-in-maven.html

  • 2020-11-27 03:48

    You have an encoding problem with your source code file. It is probably ISO-8859-1 encoded, but the compiler is set to use UTF-8. This causes errors for any character whose byte representation differs between UTF-8 and ISO-8859-1. That is the case for every character outside ASCII, for example ¬ NOT SIGN.
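
To make the byte difference concrete, here is a minimal sketch that prints the bytes of the NOT SIGN (U+00AC) in both encodings. The class and helper names are made up for illustration, and the character is written as a Unicode escape so the example itself compiles regardless of the file encoding.

```java
import java.nio.charset.StandardCharsets;

final class NotSignBytes {
    // Formats a byte array as space-separated uppercase hex, e.g. "C2 AC".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // NOT SIGN, spelled as an escape so this file contains only ASCII bytes.
        String notSign = "\u00AC";
        System.out.println(hex(notSign.getBytes(StandardCharsets.ISO_8859_1))); // AC
        System.out.println(hex(notSign.getBytes(StandardCharsets.UTF_8)));      // C2 AC
        // An ASCII character is encoded identically in both charsets.
        System.out.println(hex("A".getBytes(StandardCharsets.ISO_8859_1)));     // 41
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_8)));          // 41
    }
}
```

A lone 0xAC byte is not a valid UTF-8 sequence, which is why a compiler reading the file as UTF-8 rejects it.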

    You can simulate this with the following program. It takes your line of source code, encodes it as an ISO-8859-1 byte array, and then decodes those bytes "wrongly" as UTF-8. You can see at which position the line gets corrupted. I added 2 spaces to your source line so that position 74 lands on ¬ NOT SIGN, which is the only character in the line that produces different bytes in ISO-8859-1 and UTF-8. I guess this matches the indentation in the real source file.

     String reg = "      String reg = \"^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=¬.,-])(?=[^\\s]+$).{8,24}$\";";
     String corrupt=new String(reg.getBytes("ISO-8859-1"),"UTF-8");
     System.out.println(corrupt+": "+corrupt.charAt(74));
     System.out.println(reg+": "+reg.charAt(74));     
    

    which results in the following output (messed up because of markup):

    String reg = "^(?=.[0-9])(?=.[a-z])(?=.[A-Z])(?=.[~#;:?/@&!"'%*=�.,-])(?=[^\s]+$).{8,24}$";: �

    String reg = "^(?=.[0-9])(?=.[a-z])(?=.[A-Z])(?=.[~#;:?/@&!"'%*=¬.,-])(?=[^\s]+$).{8,24}$";: ¬

    See "live" at https://ideone.com/ShZnB

    To fix this, save the source files with UTF-8 encoding.
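
If you would rather convert the file programmatically than re-save it from an editor, something along these lines should work. This is only a sketch: it assumes the file really is ISO-8859-1 (running it on a file that is already UTF-8 would double-encode the non-ASCII characters), and the class and method names are invented for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ReencodeSource {
    // Reads the file as ISO-8859-1 and rewrites it in place as UTF-8.
    // Assumption: the input is ISO-8859-1; nothing here detects the encoding.
    static void latin1ToUtf8(Path file) throws IOException {
        byte[] raw = Files.readAllBytes(file);
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Demo on a temporary file: a comment marker followed by the NOT SIGN.
        Path tmp = Files.createTempFile("Sample", ".java");
        Files.write(tmp, "// \u00AC".getBytes(StandardCharsets.ISO_8859_1));
        latin1ToUtf8(tmp);
        // Three ASCII bytes plus two bytes for the NOT SIGN in UTF-8.
        System.out.println(Files.readAllBytes(tmp).length); // 5
        Files.delete(tmp);
    }
}
```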

  • 2020-11-27 03:52

    The compiler is using the UTF-8 character encoding to read your source file, but the file must have been written by an editor using a different encoding. Open your file in an editor set to the UTF-8 encoding, fix the quote mark, and save it again.

    Alternatively, you can look up the Unicode code point of the character and use a Unicode escape in the source code. For example, the character A can be replaced with the Unicode escape \u0041.
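
Applied to the NOT SIGN, whose code point is U+00AC, the escape approach might look like the sketch below. The character class is a shortened, hypothetical stand-in for the one in the question, and the class and method names are invented for the example.

```java
final class UnicodeEscapeDemo {
    // Hypothetical character class for illustration. The NOT SIGN is written
    // as an escape, so the source file contains only ASCII bytes.
    static final String SPECIALS = "[~#;:?/@&!%*=\u00AC.,-]";

    // True if the whole string is exactly one of the special characters.
    static boolean isSpecial(String s) {
        return s.matches(SPECIALS);
    }

    public static void main(String[] args) {
        System.out.println(isSpecial("\u00AC"));                // true
        System.out.println(isSpecial("a"));                     // false
        System.out.println("\u00AC".codePointAt(0) == 0xAC);    // true
    }
}
```

Because the file is then pure ASCII, it compiles the same way whether the compiler reads it as UTF-8 or ISO-8859-1.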

    By the way, you don't need to use the begin- and end-line anchors ^ and $ when using the matches() method. The entire sequence must be matched by the regular expression when using the matches() method. The anchors are only useful with the find() method.
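
A small sketch of the difference, using a simple digits pattern rather than the password regex from the question (the class and helper names are made up):

```java
import java.util.regex.Pattern;

final class AnchorDemo {
    static final Pattern DIGITS = Pattern.compile("[0-9]+");
    static final Pattern ANCHORED = Pattern.compile("^[0-9]+$");

    // matches() succeeds only if the pattern covers the entire input.
    static boolean fullMatch(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    // find() succeeds if the pattern matches anywhere in the input.
    static boolean partialMatch(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch(DIGITS, "abc123"));      // false
        System.out.println(partialMatch(DIGITS, "abc123"));   // true
        System.out.println(partialMatch(ANCHORED, "abc123")); // false
        System.out.println(fullMatch(DIGITS, "123"));         // true
        System.out.println(fullMatch(ANCHORED, "123"));       // true
    }
}
```

With matches(), the anchored and unanchored patterns behave the same; the anchors only change the result for find().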

  • 2020-11-27 03:53

    The Java compiler assumes that your input is UTF-8 encoded, either because you specified it to be or because it's your platform default encoding.

    However, the data in your .java files is not actually encoded in UTF-8. The problem is probably the ¬ character. Make sure your editor (or IDE) of choice actually saves its files in UTF-8 encoding.
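
One way to check what the editor actually wrote is to decode the file's bytes strictly as UTF-8 and see whether that fails. A minimal sketch, with invented class and method names; in practice you would pass it the bytes read from your .java file:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Check {
    // Returns true if the bytes form a valid UTF-8 sequence.
    // The decoder is set to REPORT errors instead of silently replacing them.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // NOT SIGN as a single ISO-8859-1 byte: not valid UTF-8.
        System.out.println(isValidUtf8(new byte[] {(byte) 0xAC}));              // false
        // NOT SIGN as the two-byte UTF-8 sequence: valid.
        System.out.println(isValidUtf8(new byte[] {(byte) 0xC2, (byte) 0xAC})); // true
    }
}
```

Note that a pure-ASCII file passes this check under either encoding, so a "true" result only rules out the kind of corruption described here.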
