Java, Using Scanner to input characters as UTF-8, can't print text


I can convert a String to a UTF-8 byte array, but I can't convert it back to a String that matches the original.

public static void main(String[] args) {

    Scanner


                      
2 Answers

忘了有多久
2021-01-19 22:28

There are several problems with the provided code:


Your code does not guarantee that the byte array it gets from the String is UTF-8 encoded.

byte[] theByteArray = stringToConvert.getBytes();


returns a byte array encoded with the platform's default charset, as described in the Javadoc. What you actually want is:

byte[] theByteArray = stringToConvert.getBytes("UTF-8");
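
To make the difference concrete, here is a minimal sketch; the class name GetBytesDemo, the helper utf8Bytes and the sample character are my own, not from the question. It uses the getBytes(Charset) overload with StandardCharsets.UTF_8, which does the same thing as getBytes("UTF-8") but does not throw a checked exception.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GetBytesDemo {
    // Hypothetical helper: encode with an explicit charset so the
    // result does not depend on the platform the code runs on.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // 'é', written as an escape to avoid source-encoding issues

        // The default charset (and so the first length) varies by platform and JDK.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("default length:  " + text.getBytes().length);

        // UTF-8 always encodes U+00E9 as two bytes.
        System.out.println("UTF-8 length:    " + utf8Bytes(text).length);
    }
}
```

If the two lengths differ on your machine, the default charset is not UTF-8, and a bare getBytes() call will not give you UTF-8 bytes.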

You should check the documentation for System.out.println():

System.out.println(theByteArray);


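is calling System.out.println(Object x), which prints the result of String.valueOf(x), that is, x.toString() for a non-null object. Arrays do not override toString(), so you get the default from Object: the class name, an '@', and the object's hash code in hexadecimal. That is not the contents of the array.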

So when you see output of the form:


  INPUT :
  
  [B@5f1121f6
  
  inputText


What you are seeing is the default toString() output for theByteArray ([B is the JVM's name for byte[], and 5f1121f6 is the hash code in hexadecimal), followed by the input line of text.
Note also how toString() behaves. Strings in Java are immutable; none of String's methods alter the String, and calling theByteArray.toString(); on its own only returns a string representation of theByteArray. The returned value is discarded unless you assign it to a variable:

String arrayAsString = theByteArray.toString();


However, as described above, the returned String will be the type-and-hash-code form (such as [B@5f1121f6), not the contents of theByteArray. To print the contents as text, decode the bytes into a String:

String convertedString = new String(theByteArray, Charset.forName("UTF-8"));
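
As a rough illustration of the three ways of printing a byte array discussed above, here is a small sketch. The class PrintBytesDemo and its helpers contents and decode are names I made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrintBytesDemo {
    // Hypothetical helper: the numeric value of each byte, e.g. "[104, 105]".
    static String contents(byte[] bytes) {
        return Arrays.toString(bytes);
    }

    // Hypothetical helper: decode the bytes back into text as UTF-8.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] theByteArray = "hi".getBytes(StandardCharsets.UTF_8);

        // Default Object.toString(): "[B@" plus a hash code that varies per run.
        System.out.println(theByteArray.toString());

        // The individual byte values.
        System.out.println(contents(theByteArray)); // [104, 105]

        // The original text.
        System.out.println(decode(theByteArray)); // hi
    }
}
```

Arrays.toString is handy for debugging, while new String(bytes, charset) is what you want when the goal is to get the text back.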



Assuming your requirements are to print the converted String and then print the original String, your code should look something like this:

// Imports needed at the top of the file:
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Scanner;

public static void main(String[] args) {

    Scanner h = new Scanner(System.in);
    System.out.println("INPUT : ");
    String stringToConvert = h.nextLine();

    try {
        // Array of the UTF-8 representation of the given String
        byte[] theByteArray;
        theByteArray = stringToConvert.getBytes("UTF-8");

        // The converted String, decoded with the same charset
        System.out.println(new String(theByteArray, Charset.forName("UTF-8")));
    } catch (UnsupportedEncodingException e) {
        // Thrown when the named charset is not supported
        e.printStackTrace();
    }

    // The original String
    System.out.println(stringToConvert);
}
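
One more thing that may matter here, although I can't tell from the question whether it applies: new Scanner(System.in) decodes the input with the platform default charset too. If the input stream really delivers UTF-8 bytes (an assumption; that depends on your terminal or on whatever is piped in), you can say so with the Scanner(InputStream, String) constructor. A minimal sketch, with a made-up class ScannerCharsetDemo and helper readFirstLine, using a ByteArrayInputStream to stand in for System.in:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class ScannerCharsetDemo {
    // Hypothetical helper: read one line, decoding the stream as UTF-8
    // instead of relying on the platform default charset.
    static String readFirstLine(InputStream in) {
        Scanner scanner = new Scanner(in, "UTF-8");
        return scanner.nextLine();
    }

    public static void main(String[] args) {
        // Simulated input: the UTF-8 bytes of "こんにちは" followed by a newline.
        byte[] simulated = "\u3053\u3093\u306b\u3061\u306f\n".getBytes(StandardCharsets.UTF_8);

        String line = readFirstLine(new ByteArrayInputStream(simulated));

        // Five characters were decoded from fifteen bytes.
        System.out.println(line.length()); // 5
    }
}
```

With real console input you would pass System.in instead of the ByteArrayInputStream.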

                                                                        
                                                        
            
            
              
                
终归单人心
2021-01-19 22:29

String s = new String(theByteArray);


should really be

String s = new String(theByteArray, Charset.forName("UTF-8"));


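The underlying issue here is that String constructors aren't smart. The String(byte[]) constructor cannot detect which charset the bytes were encoded with, so it decodes them with the platform's default charset. On Java 18 and later that default is UTF-8, but on older versions it depends on the system and is often a single-byte encoding such as windows-1252 or ISO-8859-1. This is why plain A-Za-z looks correct (those characters have the same byte values in these encodings) but everything else begins to fail.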

byte is a signed 8-bit type that runs from -128 to 127, and UTF-8 encodes every non-ASCII character as a sequence of two to four bytes that must be decoded together. The String constructor cannot infer this from a byte array, so with a single-byte default charset it decodes each byte individually (which is why basic ASCII alphanumerics always work: each of them is a single byte with the same value in both encodings).
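
To see the multi-byte sequences for yourself, something along these lines prints the UTF-8 bytes of a string in hexadecimal. The class Utf8BytesDemo and the helper hex are hypothetical names for this sketch.

```java
import java.nio.charset.StandardCharsets;

public class Utf8BytesDemo {
    // Hypothetical helper: the UTF-8 bytes of s as space-separated hex pairs.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so negative byte values print as 80..FF.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("A"));      // 41
        System.out.println(hex("\u3053")); // E3 81 93  (the character こ)
    }
}
```

The single character こ becomes three bytes, all of which are negative as Java byte values, so a decoder that treats each byte as its own character cannot reproduce it.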

Example:

// Requires: import java.nio.charset.StandardCharsets;
String text = "こんにちは";
byte[] array = text.getBytes(StandardCharsets.UTF_8);
String s = new String(array, StandardCharsets.UTF_8);
System.out.println(s); // Prints as expected
String sISO = new String(array, StandardCharsets.ISO_8859_1);
System.out.println(sISO); // Prints mojibake such as 'ããã«ã¡ã¯'

                                                                        
                                                        
            
            
              
                