HTML to List using XMLWorker

栀梦 2020-11-29 13:20

Could somebody please provide an example of parsing HTML into a list of elements using XMLWorkerHelper in iTextSharp (C#)?

The JAVA version as given in the documenta

1 answer
  • 2020-11-29 14:12

    You need to implement the IElementHandler interface in a class of your own:

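    using System.Collections.Generic;
    using iTextSharp.text;
    using iTextSharp.tool.xml;
    using iTextSharp.tool.xml.pipeline;

    public class SampleHandler : IElementHandler {
        //Generic list that collects every parsed element
        public List<IElement> elements = new List<IElement>();
        //Called by the parser for each item it produces; add its elements to the list
        public void Add(IWritable w) {
            if (w is WritableElement) {
                elements.AddRange(((WritableElement)w).Elements());
            }
        }
    }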
    

    Instead of reading from a file stream, the example below parses a string. To parse a file, replace the StringReader with a StreamReader.

        string html = "<html><head><title>Test Document</title></head><body><p>This is a test. <strong>Bold <em>and italic</em></strong></p><ol><li>Dog</li><li>Cat</li></ol></body></html>";
        //Instantiate our handler
        var mh = new SampleHandler();
        //Bind a reader to our text
        using (TextReader sr = new StringReader(html)) {
            //Parse
            XMLWorkerHelper.GetInstance().ParseXHtml(mh, sr);
        }
    
        //Loop through each element
        foreach (var element in mh.elements) {
            //Loop through each chunk in each element
            foreach (var chunk in element.Chunks) {
                //Do something
            }
        }
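
For the file case mentioned above, a minimal sketch might look like the following. It assumes the iTextSharp and XMLWorker assemblies are referenced; the `FileParser` class, the `ParseFile` method name and the `input.html` path are hypothetical and only there for illustration. The handler is repeated so that the sketch compiles on its own.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.tool.xml;
using iTextSharp.tool.xml.pipeline;

// Same handler as in the answer, repeated so this file is self-contained
public class SampleHandler : IElementHandler {
    public List<IElement> elements = new List<IElement>();

    public void Add(IWritable w) {
        if (w is WritableElement) {
            elements.AddRange(((WritableElement)w).Elements());
        }
    }
}

public static class FileParser {
    // Hypothetical helper: parses the XHTML file at 'path' into a list of elements
    public static List<IElement> ParseFile(string path) {
        var handler = new SampleHandler();
        // StreamReader is a TextReader, so it can be passed where the StringReader was used
        using (TextReader reader = new StreamReader(path)) {
            XMLWorkerHelper.GetInstance().ParseXHtml(handler, reader);
        }
        return handler.elements;
    }
}
```

Calling `FileParser.ParseFile("input.html")` (a placeholder path) would then return the same kind of list that the string example fills, so it can be looped over in the same way.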
    