Querying HTML using Yahoo YQL

盖世英雄少女心 2020-12-28 09:37

While trying to parse HTML with Yahoo Query Language (YQL) and its XPath support, I could not extract “text()” or attribute values.

1 Answer
  • 2020-12-28 10:25

    YQL requires the XPath expression to evaluate to an itemPath rather than node text. Once you have an itemPath, you can project various values from the tree.

    In other words, an itemPath should point to a node in the resulting HTML rather than to text content or attributes. YQL returns all matching nodes and their children when you select * from the data.
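The node-versus-text distinction can be illustrated outside YQL. The sketch below is only an analogy using Python's standard xml.etree.ElementTree, not YQL itself; the sample markup and the helper name select_nodes are hypothetical. It shows a path that selects whole elements, children included, rather than text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup standing in for a fetched page.
SAMPLE = """
<html><body>
  <div><h3><a href="/questions/1">First <b>question</b></a></h3></div>
  <div><h3><a href="/questions/2">Second question</a></h3></div>
</body></html>
""".strip()


def select_nodes(markup, path):
    """Return the elements matched by path (hypothetical helper)."""
    root = ET.fromstring(markup)
    return root.findall(path)


nodes = select_nodes(SAMPLE, ".//div/h3/a")
for node in nodes:
    # Each match is a whole element, so serializing it shows its subtree.
    print(ET.tostring(node, encoding="unicode").strip())
```

Each result here is an element object with its attributes and children still attached, which is roughly the role the itemPath plays before any projection is applied.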

    Example:

    select * from html where url="http://stackoverflow.com" and xpath='//div/h3/a'
    

    This returns all the a elements matching the XPath. To get the text content, project it out using:

    select content from html where url="http://stackoverflow.com" and xpath='//div/h3/a'
    

    "content" returns the text content held within the node.

    To project out attributes, specify them relative to the XPath expression. In this case you need href, which is an attribute of a:

    select href from html where url="http://stackoverflow.com" and xpath='//div/h3/a'
    

    This returns:

    <results> <a href="/questions/663973/putting-a-background-pictures-with-leds"/> <a href="/questions/663013/advantages-and-disadvantages-of-popular-high-level-languages"/> .... </results>

    If you need both the href attribute and the text content, you can execute the following YQL query:

    select href, content from html where url="http://stackoverflow.com" and xpath='//div/h3/a'
    

    This returns:

    <results> <a href="/questions/663950/double-pointer-const-issue-issue">double pointer const issue issue</a>... </results>
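For readers who want to try the select-then-project pattern locally, here is a minimal Python sketch of the same idea. It is an analogy, not a YQL client: the function name project and the sample markup are hypothetical, and treating "content" as all text inside the node is an assumption about how the projection behaves.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the first link reuses the href and text from the
# result shown above, the second is invented for illustration.
SAMPLE = """
<html><body>
  <div><h3><a href="/questions/663950/double-pointer-const-issue-issue">double pointer const issue issue</a></h3></div>
  <div><h3><a href="/questions/0/example">example question</a></h3></div>
</body></html>
""".strip()


def project(markup, path, fields):
    """Select nodes with path, then pull the named fields from each node."""
    rows = []
    for node in ET.fromstring(markup).findall(path):
        row = {}
        for field in fields:
            if field == "content":
                # Assumption: "content" means all text held within the node.
                row[field] = "".join(node.itertext())
            else:
                # Any other field name is read as an attribute of the node.
                row[field] = node.get(field)
        rows.append(row)
    return rows


rows = project(SAMPLE, ".//div/h3/a", ["href", "content"])
for row in rows:
    print(row["href"], "->", row["content"])
```

The path only ever identifies nodes; which values come back is decided by the field list, mirroring how the select clause works in the queries above.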
    

    Hope that helps. Let me know if you have more questions on YQL.
