How to tell if a string is XML?


We have a string field which can contain XML or plain text. The XML has no header and no root element, i.e. it is not well-formed.



We need t


                      


      
      
        
7 Answers

        
                         				            
            
           
            
                              
                
              
              
                
遥遥无期
2021-01-11 11:11
              
            
            
                                                                       
If reliability is the goal, the best option is to call XmlDocument.LoadXml and see whether it accepts the string. A full parse may be expensive, but it is the only reliable test: any character you skip over could be the one that makes the data ill-formed XML.
                                                                        
                                                        
            
            
              
                
春和景丽
2021-01-11 11:14
              
            
            
                                                                       
One possibility is to combine both solutions: run your redact method's check first, and only try to load the string as XML inside the if. That way you only parse strings that are likely to be well-formed XML, and most non-XML entries are discarded cheaply.
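
A minimal sketch of that two-stage idea follows. The pre-filter here (trimmed string starts with '<' and ends with '>') is an assumption about what the asker's check does, and the names XmlSniffer, LooksLikeXml and IsXmlDocument are hypothetical.

```csharp
using System;
using System.Xml;

public static class XmlSniffer
{
    // Cheap pre-filter (assumed shape of the asker's check): the trimmed
    // string must start with '<' and end with '>'.
    public static bool LooksLikeXml(string value)
    {
        if (string.IsNullOrWhiteSpace(value)) return false;
        string trimmed = value.Trim();
        return trimmed[0] == '<' && trimmed[trimmed.Length - 1] == '>';
    }

    // Full check: only strings that pass the pre-filter reach the parser.
    public static bool IsXmlDocument(string value)
    {
        if (!LooksLikeXml(value)) return false;
        try
        {
            new XmlDocument().LoadXml(value);
            return true;
        }
        catch (XmlException)
        {
            // The parser rejected it, so treat the value as plain text.
            return false;
        }
    }
}
```

Note that LoadXml expects a single root element, so a rootless fragment such as `<a/><b/>` passes the pre-filter but is still rejected by the parse step.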
                                                                        
                                                        
            
            
              
                
我在风中等你
2021-01-11 11:16
              
            
            
                                                                       
If the XML has no root element (i.e. it is an XML fragment rather than a full document), then the following is a perfectly valid sample too, yet your detector would not match it:

foo<bar/>baz


In fact, any text string would be a valid XML fragment: take a document that is just a root element wrapping some text, strip the root element's tags, and plain text is what remains.
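
One way to handle fragments in .NET is to read the string with an XmlReader whose ConformanceLevel is set to Fragment, which permits multiple top-level elements and top-level text. The sketch below (the class and method names are hypothetical) additionally requires at least one element, since plain text on its own also parses as a fragment.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlFragmentCheck
{
    // Returns true if the string parses as an XML fragment AND contains at
    // least one element. Plain text parses too, but has no elements.
    public static bool ContainsXmlMarkup(string value)
    {
        if (string.IsNullOrEmpty(value)) return false;

        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment
        };

        try
        {
            using (var reader = XmlReader.Create(new StringReader(value), settings))
            {
                bool sawElement = false;
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element) sawElement = true;
                }
                return sawElement;
            }
        }
        catch (XmlException)
        {
            // Not even a well-formed fragment (e.g. a stray '<' or '&').
            return false;
        }
    }
}
```

Whether "parses as a fragment and has an element" is the right definition of XML for this field is a judgment call; it is offered only as one possible cut-off.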
                                                                        
                                                        
            
            
              
                
我寻月下人不归
2021-01-11 11:18
              
            
            
                                                                       
It depends on how accurate a test you want. Given that your data lacks the usual <?xml ...?> declaration, you are already trying to detect something that is not strictly XML. Ideally you would run the text through a full XML parser (LoadXml, as you suggest); anything it rejects is not XML. The question is whether you mind accepting a non-XML string. For instance, are you OK with accepting

  <the quick brown fox jumped over the lazy dog's back>


as XML and stripping it? If so, your technique is fine. If not, you have to decide how tight a test you want and code a recognizer with that degree of tightness.
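
As an illustration of a middle degree of tightness, the sketch below only checks that the string begins with something shaped like a start tag: a name, optional name="value" attributes, then '>' or '/>'. The pattern and the TagRecognizer name are hypothetical, and the check deliberately ignores the rest of the string, so it is looser than a parser but tighter than testing for angle brackets alone.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagRecognizer
{
    // A plausible start tag at the beginning of the string:
    // '<', a name, zero or more name="value" attributes, then '>' or '/>'.
    private static readonly Regex StartTag = new Regex(
        @"^\s*<[A-Za-z_][\w.\-:]*(\s+[A-Za-z_][\w.\-:]*\s*=\s*(""[^""]*""|'[^']*'))*\s*/?>",
        RegexOptions.CultureInvariant);

    public static bool StartsWithPlausibleTag(string value)
    {
        return value != null && StartTag.IsMatch(value);
    }
}
```

With this pattern the fox sentence is rejected, because "quick" is not followed by an attribute value, while strings such as `<a href="x">text</a>` or `<bar/>` are accepted.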
                                                                        
                                                        
            
            
              
                
[愿得一人]
2021-01-11 11:18
              
            
            
                                                                       
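using System.Xml;

bool isXml;
try
{
    // LoadXml throws XmlException if the string is not a well-formed document.
    XmlDocument myDoc = new XmlDocument();
    myDoc.LoadXml(myString);
    isXml = true;
}
catch (XmlException)
{
    // Not well-formed XML: treat the value as plain text.
    isXml = false;
}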

                                                                        
                                                        
            
            
              
                
梦如初夏
2021-01-11 11:21
              
            
            
                                                                       
How is the data coming to you? What is the other type of data surrounding it? Perhaps there is a better way; perhaps you can tokenise the data you control, and then infer that anything that is not within those tokens is XML, but we'd need to know more.

Failing a cute solution like that, I think what you have is fine (for validating that it starts and ends with those characters).

We need to know more about the data format really.
                                                                        
                                                        
            
            
              
                