Is it possible for Scrapy to get plain text from raw HTML data?

悲&欢浪女 2021-02-12 17:27

For example:

scrapy shell http://scrapy.org/
content = hxs.select('//*[@id="content"]').extract()[0]
print content

Then, I get the following...

3 Answers

忘掉有多难 (OP)

2021-02-12 18:06
At the moment, I don't think you need to install any third-party library. Scrapy provides this functionality through its selectors.
Assume this selector, built from a link that contains a nested element:
```
from scrapy.selector import Selector
sel = Selector(text='<a href="#">Click here to go to the <strong>Next Page</strong></a>')
```
we can get the entire text using:
```
text_content = sel.xpath("//a[1]//text()").extract()
# which results in ['Click here to go to the ', 'Next Page']
```
then you can join them together easily:
```
# the first fragment already ends with a space, so join without a separator
''.join(text_content)
# 'Click here to go to the Next Page'
```
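On real pages the extracted text nodes usually carry newlines and indentation as well, so a plain join can leave ragged spacing. Here is a minimal sketch of one way to tidy that up; `join_text_nodes` is a hypothetical helper name, not part of Scrapy, and the sample fragments are assumed to be what `.extract()` returned above:

```python
def join_text_nodes(fragments):
    """Join text-node strings and collapse runs of whitespace into single spaces."""
    # Concatenate first so spacing already present in the fragments is kept,
    # then split on any whitespace and re-join to drop newlines and indentation.
    return " ".join("".join(fragments).split())


# Fragments as they might come back from sel.xpath("//a[1]//text()").extract()
fragments = ["Click here to go to the ", "Next Page"]
print(join_text_nodes(fragments))
```

Note that concatenating this way can fuse words when two neighbouring elements have no whitespace between them in the HTML, so it may need adjusting for your markup.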