Is it possible for Scrapy to get plain text from raw HTML data?

悲&欢浪女 2021-02-12 17:27

For example:

scrapy shell http://scrapy.org/
content = hxs.select('//*[@id="content"]').extract()[0]
print content

Then, I get the following raw HTML, when what I want is the plain text.

3 Answers
  •  忘掉有多难
    2021-02-12 18:06

    At the moment, I don't think you need to install any third-party library. Scrapy provides this functionality through selectors.
    Suppose we have this selector over nested markup:

    from scrapy.selector import Selector
    sel = Selector(text='<a href="#">Click here to go to the <strong>Next Page</strong></a>')
    

    We can get all of the text nodes, including those of nested elements, using:

    text_content = sel.xpath("//a[1]//text()").extract()
    # which results in [u'Click here to go to the ', u'Next Page']
    

    Then you can join them together easily:

    # the first fragment already ends with a space, so join on an empty string
    ''.join(text_content)
    # Click here to go to the Next Page
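
On real pages the extracted text nodes usually carry newlines and indentation, so a plain join tends to leave stray whitespace. The following is a minimal sketch of one way to handle that, not part of Scrapy itself: the helper name `join_text_nodes` and the `messy` input are made up for illustration, and the fragments are written out literally so that the snippet runs without Scrapy installed.

```python
def join_text_nodes(fragments):
    """Concatenate extracted text nodes and collapse whitespace runs.

    `fragments` is a list of strings such as the one returned by
    sel.xpath("//a[1]//text()").extract(). The helper name is hypothetical.
    """
    # Join on an empty string first: text nodes carry their own spacing.
    joined = "".join(fragments)
    # str.split() with no arguments splits on any run of whitespace and
    # ignores leading/trailing whitespace; re-joining with a single space
    # therefore normalizes the result.
    return " ".join(joined.split())


# The fragments from the example above, written out literally.
text_content = ["Click here to go to the ", "Next Page"]
plain_text = join_text_nodes(text_content)
print(plain_text)

# A messier, made-up input resembling what real pages tend to produce.
messy = ["\n    Click here\n", "  to go to the ", "Next Page\n"]
print(join_text_nodes(messy))
```

If you would rather stay inside XPath, something like `sel.xpath("normalize-space(//a[1])").extract_first()` should give a similar result for a single element, since `normalize-space()` is a standard XPath 1.0 function; treat that as a suggestion to try rather than a tested recipe.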
    
