SAXParser fails when response contains Hindi or other special characters

遥遥无期 2021-01-21 05:57

I am using a SAX parser to parse an XML response, but it throws an exception:

ExpatParser$ParseException : (not well formed) invalid token

Is there any

5 answers
  • 2021-01-21 06:36

    First Answer

    The ampersand character (&) and the left angle bracket (<) MUST NOT appear in your XML output in their literal form, except when used as markup delimiters or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively.

    The right angle bracket (>) may be represented using the string "&gt;", and MUST, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.

    Please check your XML; it seems to contain these special characters (&, <, >) unescaped.
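If the XML is generated by your own server code, the escaping rule above can be applied to text content before it is written out. This is a minimal sketch; the class and method names (XmlEscape, escapeText) are hypothetical, and it only covers element text, not attribute values (which would also need quotes escaped). Devanagari characters are not markup characters, so they pass through unchanged: escaping alone will not fix an encoding mismatch.

```java
public final class XmlEscape {

    private XmlEscape() {}

    /**
     * Hypothetical helper: escapes text for use as XML element content.
     * Attribute values are NOT covered (quotes are left alone).
     */
    public static String escapeText(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    // Only required inside "]]>", but escaping every '>' is simpler and still valid
                    out.append("&gt;");
                    break;
                default:
                    // Everything else, including Devanagari, is copied as-is
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeText("Tom & Jerry <b>")); // prints Tom &amp; Jerry &lt;b&gt;
    }
}
```

A single pass over the characters means the "&amp;" produced for one character is never escaped again by a later rule.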

    After discussion with Vaibhav Jani, here is the sample XML file:

    <?xml version="1.0"?>
    <first_screen>
        <first_screen_object id="1">
            <name><![CDATA[मानक हिन्दी]]></name>
            <desc><![CDATA[मानक हिन्दीमानक हिन्दी मानक हिन्दी मानक हिन्दी मानक हिन्दी मानक हिन्दी मानक हिन्दी  मानक हिन्दी मानक हिन्दी मानक हिन्दी मानक हिन्दी]]></desc>
        </first_screen_object>

        <first_screen_object id="2">
            <name><![CDATA[मानक हिन्दी]]></name>
            <desc><![CDATA[मानक हिन्दीमानक हिन्दी मानक हिन्दी मानक हिन्दी मानक हिन्दी मानक हिन्दी मानक हिन्दी  मानक हिन्दी मानक हिन्दी मानक हिन्दी मानक हिन्दी]]></desc>
        </first_screen_object>

        <first_screen_object id="3">
            <name><![CDATA[मानक हिन्दी]]></name>
            <desc><![CDATA[मानक हिन्दीमानक हिन्दी मानक हिन्दी मानक हिन्दी मानक हिन्दी मानक हिन्दी मानक हिन्दी  मानक हिन्दी मानक हिन्दी मानक हिन्दी मानक हिन्दी]]></desc>
        </first_screen_object>
    </first_screen>
    

    And this is the SAX parser for the sample XML:

    import java.io.InputStream;
    import org.apache.http.HttpResponse;
    import org.apache.http.client.HttpClient;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.DefaultHttpClient;
    import android.sax.Element;
    import android.sax.EndTextElementListener;
    import android.sax.RootElement;
    import android.util.Xml;
    
    public class HindiParser {
    
        // Constructor
        public HindiParser() {
    
        }
    
        public static InputStream getInputStreamFromUrl(String url) {
            InputStream content = null;
            try {
                HttpGet httpGet = new HttpGet(url);
                HttpClient httpclient = new DefaultHttpClient();
                // Execute the HTTP GET request
                HttpResponse response = httpclient.execute(httpGet);
                // Raw response bytes; decoding is left to the XML parser
                content = response.getEntity().getContent();
            } catch (Exception e) {
                // Log the failure; the caller receives null
                e.printStackTrace();
            }
            return content;
        }
    
        /*
         * Parses the sample XML shown above (served from the pastebin URL used
         * in parse()) and prints the <name> and <desc> text of every
         * <first_screen_object>.
         */
    
        public void parse() {
            RootElement root = new RootElement("first_screen");
            Element firstScreenElement = root.getChild("first_screen_object");
            firstScreenElement.getChild("name").setEndTextElementListener(
                    new EndTextElementListener() {
                        public void end(String body) {
                            System.out.println("Name is " + body);
                        }
                    });
            firstScreenElement.getChild("desc").setEndTextElementListener(
                    new EndTextElementListener() {
                        public void end(String body) {
                            System.out.println("Description is " + body);
                        }
                    });

            try {
                // Tell the parser explicitly that the response bytes are UTF-8
                Xml.parse(
                        getInputStreamFromUrl("http://pastebin.com/raw.php?i=M6zrbJ0W"),
                        Xml.Encoding.UTF_8, root.getContentHandler());
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    
    }
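For testing off-device, the same kind of parse can be sketched with the standard javax.xml.parsers SAX API instead of android.sax. This assumes a trimmed-down copy of the sample, with the Hindi text written as Unicode escapes so the Java source file's own encoding cannot garble it; FirstScreenSaxParser, SAMPLE, HINDI and parseNames are illustrative names, not part of the answer. The bytes are handed to the parser unchanged, so it decodes them according to the encoding="UTF-8" declaration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class FirstScreenSaxParser {

    // The Hindi text from the sample, written as Unicode escapes
    static final String HINDI = "\u092E\u093E\u0928\u0915 \u0939\u093F\u0928\u094D\u0926\u0940";

    // Trimmed-down sample; the declaration says UTF-8
    static final String SAMPLE = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<first_screen>"
            + "<first_screen_object id=\"1\"><name><![CDATA[" + HINDI + "]]></name></first_screen_object>"
            + "<first_screen_object id=\"2\"><name><![CDATA[" + HINDI + " & <b>]]></name></first_screen_object>"
            + "</first_screen>";

    /** Returns the text of every <name> element, in document order. */
    public static List<String> parseNames(byte[] body) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.setLength(0); // collect fresh text for each element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may arrive in several chunks
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("name".equals(qName)) {
                    names.add(text.toString());
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // Pass raw bytes: the parser decodes them using the XML declaration
            parser.parse(new ByteArrayInputStream(body), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Encode with the same charset the declaration names
        byte[] body = SAMPLE.getBytes(StandardCharsets.UTF_8);
        for (String name : parseNames(body)) {
            System.out.println("Name is " + name);
        }
    }
}
```

Because characters() may be called more than once per element, the handler accumulates text and only reads it in endElement. The second <name> also shows that & and < are fine inside a CDATA section.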
    
  • 2021-01-21 06:44

    Try android.util.Xml.parse(InputStream, Xml.Encoding, ContentHandler):
    First argument (InputStream): HttpResponse.getEntity().getContent()
    Second argument (Xml.Encoding): Xml.Encoding.UTF_8
    Third argument (ContentHandler): your handler
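android.util.Xml is only available on Android. On a plain JVM, a roughly comparable way to fix the charset yourself (an assumption, not what this answer describes) is to decode the stream with an InputStreamReader and give the parser characters instead of bytes. According to the SAX InputSource documentation, when a character stream is supplied the parser reads it directly and disregards the document's encoding declaration. ExplicitUtf8Parse and parseAsUtf8 are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ExplicitUtf8Parse {

    /** Decodes the stream as UTF-8 in our own code, then parses it; returns all character data. */
    public static String parseAsUtf8(InputStream in) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        // The charset is chosen here, not guessed by the parser
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(reader), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        byte[] body = "<name>\u092E\u093E\u0928\u0915</name>".getBytes(StandardCharsets.UTF_8);
        System.out.println(parseAsUtf8(new ByteArrayInputStream(body)));
    }
}
```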

  • 2021-01-21 06:48

    What encoding are you using?

    If you are using ISO-8859-1, try UTF-8 instead, and make sure the declaration matches the bytes actually sent:

    <?xml version="1.0" encoding="UTF-8"?>
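Why UTF-8 rather than ISO-8859-1 here: ISO-8859-1 only covers code points up to U+00FF, so Devanagari text cannot be represented in it at all. This sketch (class and method names hypothetical) checks that with a CharsetEncoder and builds a document whose declaration names the charset actually used to produce its bytes; what matters is that the two agree.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    /** True if every character of text can be represented in ISO-8859-1 (Latin-1). */
    public static boolean fitsLatin1(String text) {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        return latin1.canEncode(text);
    }

    /** Builds a document whose declaration names the charset actually used for its bytes. */
    public static byte[] utf8Document(String body) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + body;
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String hindi = "\u092E\u093E\u0928\u0915 \u0939\u093F\u0928\u094D\u0926\u0940";
        System.out.println("Fits in ISO-8859-1: " + fitsLatin1(hindi)); // false: Devanagari is outside Latin-1
        System.out.println("UTF-8 document size: " + utf8Document("<name>" + hindi + "</name>").length + " bytes");
    }
}
```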
    
  • 2021-01-21 06:49

    I'm not entirely sure that it will solve your problem, but I'd set the charset on the InputSource using its setEncoding() method:

    // xr is your XMLReader; byteArrayInputStream holds the raw response bytes
    InputSource inputSource = new InputSource(byteArrayInputStream);
    inputSource.setEncoding("UTF-8");

    xr.parse(inputSource);
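A self-contained version of that snippet, assuming xr is an XMLReader obtained from SAXParserFactory and the response bytes are already in memory (InputSourceEncoding and parseUtf8Bytes are hypothetical names). Per the SAX documentation, setEncoding has no effect when a character stream is supplied; how a parser treats a hint that conflicts with the document's own declaration may vary, so it is safest when the two agree.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class InputSourceEncoding {

    /** Parses raw bytes, telling the parser explicitly that they are UTF-8; returns the character data. */
    public static String parseUtf8Bytes(byte[] body) {
        StringBuilder text = new StringBuilder();
        try {
            XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            xr.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            // Byte stream plus an explicit encoding hint
            InputSource inputSource = new InputSource(new ByteArrayInputStream(body));
            inputSource.setEncoding("UTF-8");
            xr.parse(inputSource);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        byte[] body = "<desc>\u0939\u093F\u0928\u094D\u0926\u0940</desc>".getBytes(StandardCharsets.UTF_8);
        System.out.println(parseUtf8Bytes(body));
    }
}
```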
    
  • 2021-01-21 06:49

    This should solve the problem:

    // is: the InputStream holding the XML response
    InputSource inputSource = new InputSource(is);
    inputSource.setEncoding("ISO-8859-1");
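A caveat, assuming the server actually sends UTF-8 (the question does not say): every byte value is a valid ISO-8859-1 character, so with this setting the decoding step itself can no longer fail, but each Hindi character then arrives as three unrelated Latin-1 characters. This sketch shows the effect with plain String decoding rather than a parser (a simplification); Latin1Mojibake, decodeAsLatin1 and repair are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Mojibake {

    /** Decodes UTF-8 bytes as if they were ISO-8859-1: never fails, since every byte maps to a char. */
    public static String decodeAsLatin1(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    /** Undoes decodeAsLatin1: chars 0-255 map back to the same byte values, which are then read as UTF-8. */
    public static String repair(String mojibake) {
        return new String(mojibake.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String hindi = "\u0939\u093F\u0928\u094D\u0926\u0940";
        String garbled = decodeAsLatin1(hindi.getBytes(StandardCharsets.UTF_8));
        System.out.println(garbled.equals(hindi));         // false
        System.out.println(repair(garbled).equals(hindi)); // true
    }
}
```

The round trip works only because nothing was lost; it is a diagnostic, not a substitute for declaring and decoding UTF-8 correctly.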
    