Html Agility Pack get specific content from a <li> tag

核能气质少年 submitted on 2019-12-23 04:24:19

Question


I need some text from this page: https://www.amazon.com/dp/B074J9SSPD. Specifically, I need to extract the data under the "About the Product" section.

I tried:

HtmlWeb web = new HtmlWeb();
HtmlDocument doc = new HtmlDocument();
doc = web.Load("https://amazon.com/dp/B074J9SSPD");

foreach(var node in doc.DocumentNode.SelectNodes("//li[@class='showHiddenFeatureBullets']") {
  string ar = node.InnerText;
  HtmlAttribute att = node.Attributes["class"];
  MessageBox.Show(ar.ToString());
  if (att.Value.Contains("showHiddenFeatureBullets")) {

  }
}

Please suggest the right way to do this; I'm getting a blank string.


Answer 1:


Your original code (before that first edit) worked for me; it was just missing the closing parenthesis on the foreach loop. I also broke the nodes out into their own variable to make it easier to read, but this should work for you. I tested it locally and it worked for me.

// Requires the HtmlAgilityPack NuGet package (using HtmlAgilityPack;).
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load("https://amazon.com/dp/B074J9SSPD");

// Select the <li> elements whose class is exactly "showHiddenFeatureBullets".
var aboutProductNodes = doc.DocumentNode.SelectNodes("//li[@class='showHiddenFeatureBullets']");

foreach (var node in aboutProductNodes)
{
    string ar = node.InnerText;
    HtmlAttribute att = node.Attributes["class"];
    MessageBox.Show(ar.Trim());
    if (att.Value.Contains("showHiddenFeatureBullets"))
    {
        // Placeholder kept from the question; always true given the XPath above.
    }
}

However, I would suggest looking into the Amazon API. The scraping approach worked about half the time; the other half, Amazon responded with a message telling me to use their API instead of scraping the site. That may have been part of your problem too.

https://developer.amazon.com/services-and-apis
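
When Amazon returns its notice instead of the product page, there is no matching <li> in the response, and SelectNodes returns null rather than an empty collection, so a foreach over the result throws. Below is a minimal sketch of a guarded version. The class name FeatureBullets and the method Extract are hypothetical names chosen for illustration; they are not part of Html Agility Pack. The method takes the HTML as a string so it can be tried without a network call, and it uses contains() in the XPath on the assumption that the class attribute may list more than one class.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class FeatureBullets
{
    // Hypothetical helper: returns the trimmed text of every matching <li>,
    // or an empty list when the page contains no such element.
    public static List<string> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // contains() also matches when the class attribute holds several classes.
        var nodes = doc.DocumentNode.SelectNodes(
            "//li[contains(@class, 'showHiddenFeatureBullets')]");

        var result = new List<string>();
        if (nodes == null)
        {
            // SelectNodes returns null when nothing matches the XPath.
            return result;
        }

        foreach (var node in nodes)
        {
            string text = node.InnerText.Trim();
            if (text.Length > 0)
            {
                result.Add(text);
            }
        }
        return result;
    }
}
```

An empty list from Extract is a hint that the response was not the product page, which would also explain a blank result that only shows up some of the time.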



Source: https://stackoverflow.com/questions/52670970/html-agility-pack-get-specific-content-from-a-li-tag
