Question
I am trying to build an image scraping tool that lets the user scrape all the images on a given page using XPath, process the scraped images to find which ones have an alt attribute and which don't, and return the result as two separate lists in one JSON object,
i.e. {alted:["",""],nonAlted:["",""]}
Now comes my problem: although I am able to scrape the page, retrieve all the images, and separate them into the alted and nonAlted categories, I can't put them in the response object!
To clarify my issue further, here is the code I use in the execute block of my YQL table:
query = "select * from html where url='http://www.mysite.com/page-path' and xpath='//li'";
var result = y.query(query);
y.log(result.results..img.(@alt));
var querieselement = <urls/>;
querieselement.query = result.results..img.(@alt);
response.object = querieselement;
So my question is: how can I set the response object to contain the processed list of images? Note that after running the query the result doesn't show any data, although the log shows the list. I hope someone can point me to the cause of the problem.
P.S. The reason I mentioned "resources usage" in the title is that I know I could make two separate calls, one for each image category, but that means scraping the same page twice, which I think is inefficient.
P.S. I would also be glad if someone could help me understand the meaning of these two lines:
querieselement = <urls/>;
querieselement.query = result.results..img.(@alt);
Why "<urls/>", and why "querieselement.query"? I don't know what they are supposed to do, yet they seem to be doing a critical job, since changing them breaks the code.
Thanks.
Answer 1:
So my question is how can I set the response object to contain the processed list of the images
Use a stylesheet rather than an XPath selector:
select * from xslt where url="http://www.mysite.com/page-path" and stylesheet="http://www.mysite.com/page-path.xsl"
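Define the stylesheet like this (the alt and noalt arrays are assumed to be declared elsewhere in the output page):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Suppress the page's text nodes so only the generated script blocks are emitted -->
<xsl:template match="text()"/>
<!-- Images that have an alt attribute: record the alt text -->
<xsl:template match="img[@alt]">
<xsl:for-each select="@alt">
<script>
alt.push("<xsl:value-of select="."/>");
</script>
</xsl:for-each>
</xsl:template>
<!-- Images without an alt attribute: record the src -->
<xsl:template match="img[not(@alt)]">
<xsl:for-each select="@src">
<script>
noalt.push("<xsl:value-of select="."/>");
</script>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>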
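For comparison, here is a rough sketch of the same split done in a single pass, producing the {alted:[...],nonAlted:[...]} shape the question asks for. It is plain JavaScript rather than the E4X dialect of the YQL execute block, so treat it as an illustration of the logic only. The partitionByAlt helper and the sample records are hypothetical, and it assumes the images have already been extracted into plain objects with src and (optionally) alt properties. Like img[@alt] in the stylesheet, it checks only whether the attribute is present, so an empty alt="" still counts as alted.

```javascript
// Hypothetical helper: one pass over the images, no second fetch of the page.
// `images` is assumed to be an array of plain objects like { src: "...", alt: "..." }.
function partitionByAlt(images) {
  const result = { alted: [], nonAlted: [] };
  for (const img of images) {
    // Mirror the XPath test img[@alt]: only the presence of the attribute matters.
    const hasAlt = Object.prototype.hasOwnProperty.call(img, "alt");
    (hasAlt ? result.alted : result.nonAlted).push(img.src);
  }
  return result;
}

// Made-up sample data standing in for the scraped <img> elements.
const sampleImages = [
  { src: "/logo.png", alt: "Logo" },
  { src: "/spacer.gif" },
  { src: "/divider.png", alt: "" },
];

console.log(JSON.stringify(partitionByAlt(sampleImages)));
```

Because both lists are filled in the same loop, the page only has to be fetched and walked once, which is the resource concern raised in the question.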
Source: https://stackoverflow.com/questions/13461474/performing-image-scrapping-using-yql-with-lowest-resources-usage-possible-i-e-l