I can't remove whitespaces from a string parsed by Nokogiri

Submitted by 心已入冬 on 2019-12-04 13:13:26

Question


I can't remove whitespace from a string.

My HTML is:

<p class='your-price'>
Cena pro Vás: <strong>139&nbsp;<small>Kč</small></strong>
</p>

My code is:

#encoding: utf-8
require 'rubygems'
require 'mechanize'

agent = Mechanize.new
site  = agent.get("http://www.astratex.cz/podlozky-pod-raminka/doplnky")
price = site.search("//p[@class='your-price']/strong/text()")

val = price.first.text  # => "139 "
val.strip               # => "139 "
val.gsub(" ", "")       # => "139 "

gsub, strip, etc. don't work. Why, and how do I fix this?

val.class      # => String
val.dump       # => "\"139\\u{a0}\""      (note the \u{a0}!)
val.encoding   # => #<Encoding:UTF-8>

__ENCODING__               # => #<Encoding:UTF-8>
Encoding.default_external  # => #<Encoding:UTF-8>

I'm using Ruby 1.9.3, so Unicode shouldn't be a problem.


Answer 1:


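strip only removes ASCII whitespace, and the character you've got here is a Unicode no-break space (U+00A0). Likewise, gsub(" ", "") matches only the ordinary ASCII space (U+0020), so it leaves the no-break space untouched.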
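To confirm this diagnosis, the trailing character can be inspected by code point. The following is a minimal sketch, assuming a string literal that stands in for the text node returned by Nokogiri (the variable names are illustrative only):

```ruby
# Stand-in for the value from the question: "139" followed by U+00A0.
val = "139\u00a0"

# The last code point is 0xA0 (no-break space), not 0x20 (ASCII space).
# to_a keeps this working on Ruby 1.9.3, where codepoints returns an enumerator.
last_codepoint = val.codepoints.to_a.last

# Neither call targets U+00A0, so both return an unchanged string.
stripped = val.strip
replaced = val.gsub(" ", "")

puts format("last code point: U+%04X", last_codepoint)
puts "strip changed the string: #{stripped != val}"
puts "gsub(' ', '') changed the string: #{replaced != val}"
```

Checking the code point this way gives the same information as val.dump in the question, but makes it easy to compare against 0x20 directly.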

Removing the character is easy. You can use gsub by providing a regex with the character code: gsub(/\u00a0/, '')

You could also call gsub(/[[:space:]]/, '') to remove all Unicode whitespace. For details, check the Ruby Regexp documentation.
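Both suggestions can be sketched side by side. The example below assumes a string literal in place of the scraped value, and unicode_strip is a hypothetical helper name (not part of Ruby or Nokogiri) for the case where only leading and trailing whitespace should go:

```ruby
# Stand-in for the scraped text: "139" followed by a no-break space.
val = "139\u00a0"

# Option 1: remove only the no-break space (U+00A0).
without_nbsp = val.gsub(/\u00a0/, '')

# Option 2: remove every character in the POSIX [[:space:]] class,
# which also matches U+00A0 in a UTF-8 string.
without_space = val.gsub(/[[:space:]]/, '')

# Hypothetical helper: like strip, but trims any [[:space:]] character
# from both ends and leaves interior whitespace alone.
def unicode_strip(str)
  str.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')
end

price = unicode_strip(" 139\u00a0").to_i

puts without_nbsp.inspect
puts without_space.inspect
puts price
```

Option 1 is the narrower fix; option 2 and the helper are more forgiving if the page later mixes in other whitespace characters.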



Source: https://stackoverflow.com/questions/14127484/i-cant-remove-whitespaces-from-a-string-parsed-by-nokogiri
