Strip style attributes with Nokogiri

I'm scraping an HTML page with Nokogiri and I want to strip out all style attributes.
How can I achieve this? (I'm not using Rails, so I can't use its sanitize method ...)

3 Answers

误落风尘

2020-12-14 03:58
I tried the answer from Phrogz but could not get it to work (I was using a document fragment, though I'd have thought it should work the same).

The "//" at the start didn't seem to check all nodes as I would expect. In the end I did something a bit more long-winded, but it worked. For the record, in case anyone else has the same trouble, here is my solution (dirty though it is):
```
require 'nokogiri'

# my_html holds the HTML string to clean up
doc = Nokogiri::HTML::Document.new
body_dom = doc.fragment(my_html)

# Strip out any attributes we don't want
body_dom.xpath('.//*[@align]|*[@align]').each do |tag|
  tag.remove_attribute('align')
end
```

慢半拍i

2020-12-14 04:08
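```
require 'nokogiri'

# Sample markup for illustration
html = '<p class="post"><span style="font-size: x-small">bla bla</span></p>'
doc = Nokogiri::HTML(html)
doc.xpath('//@style').remove
puts doc.css('.post')
#=> <p class="post"><span>bla bla</span></p>
```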
Edited to show that you can just call NodeSet#remove instead of having to use .each(&:remove).

Note that if you have a DocumentFragment instead of a Document, Nokogiri has a longstanding bug where searching from a fragment does not work as you would expect. The workaround is to use:
```
doc.xpath('@style|.//@style').remove
```

深忆病人

2020-12-14 04:15
This works with both a document and a document fragment:
```
doc = Nokogiri::HTML::DocumentFragment.parse(...)
```
or
```
doc = Nokogiri::HTML(...)
```
To delete all the 'style' attributes, call:
```
doc.css('*').remove_attr('style')
```