Parse image src with HTML Agility Pack

Submitted by 烈酒焚心 on 2019-12-12 19:07:33

Question


Hi, I am trying to parse a web page with HTML Agility Pack to get the src of an image. This is the structure of the page:

<div class="post_body"> 
    <div style="text-align: center;"> 
        <a href="http://www.engadget.com/2012/02/29/qualcomm-windows-8/">
            <img src="http://www.blogcdn.com/www.engadget.com/media/2012/02/201202297192-1330536971.jpg" style="border-width: 0px; border-style: solid; margin: 4px;">
        </a>
    </div>
</div>

I am using this code to try to get the src:

HtmlWeb hw = new HtmlWeb();
HtmlDocument doc = hw.Load("http://www.engadget.com/2012/02/29/qualcomm-windows-8");

HtmlNode baseNode = doc.DocumentNode.SelectSingleNode("//div[@class='post_content permalink ']");
string Description = baseNode.SelectSingleNode("//div[@class='post_body']").InnerText.Replace("\n", "").Replace("\r", "").Trim();

string href = baseNode.SelectSingleNode("//div[@class='post_body']//img[@src]").InnerText;

However, the string is always returned as null :/

Any ideas? Maybe I have a bad XPath expression?


Answer 1:


Any ideas? Maybe I have a bad XPath expression?

Yes, there are a few problems:

//div[@class='post_content permalink ']

This selects nothing, because the provided document has no div whose class attribute has the value 'post_content permalink '.

SelectSingleNode("//div[@class='post_body']//img[@src]").InnerText;  

Even if the img element is found, it has no children, so it has no InnerText. The URL is stored in the src attribute, not in the element's text.

Solution:

You want something like this:

HtmlNode img = doc.DocumentNode.SelectSingleNode("//div[@class='post_body']//img[@src]");

string srcUrl = img.Attributes["src"].Value;
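
For illustration, the fix above can be put together as a small self-contained sketch. It assumes the HtmlAgilityPack NuGet package is referenced. The class name ImageSrcExample, the helper GetFirstImageSrc, and the inline sample markup (copied from the question) are hypothetical and only there so the example runs without fetching the live page. It also guards against SelectSingleNode returning null when nothing matches.

```csharp
using System;
using HtmlAgilityPack;

public static class ImageSrcExample
{
    // Hypothetical helper: returns the src of the first <img> inside
    // <div class="post_body">, or null when no such image exists.
    public static string GetFirstImageSrc(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when the XPath matches nothing,
        // so check the result before reading from it.
        HtmlNode img = doc.DocumentNode.SelectSingleNode(
            "//div[@class='post_body']//img[@src]");
        if (img == null)
        {
            return null;
        }

        // Read the attribute value; an <img> element has no inner text.
        return img.GetAttributeValue("src", string.Empty);
    }

    public static void Main()
    {
        // Sample markup mirroring the structure shown in the question.
        string html = @"<div class=""post_body"">
    <div style=""text-align: center;"">
        <a href=""http://www.engadget.com/2012/02/29/qualcomm-windows-8/"">
            <img src=""http://www.blogcdn.com/www.engadget.com/media/2012/02/201202297192-1330536971.jpg"">
        </a>
    </div>
</div>";

        Console.WriteLine(GetFirstImageSrc(html) ?? "(no image found)");
    }
}
```

Note that an XPath expression starting with // searches the whole document even when it is called on a child node such as baseNode; to search only below a node, start the expression with .// instead.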


Source: https://stackoverflow.com/questions/9506588/parse-image-src-with-html-agilty-pack
