nokogiri

How to parse XML with nokogiri without losing HTML entities?

左心房为你撑大大i posted on 2020-01-04 13:46:55
Question: If you look at the AFTER section of the output below, Ruby is removing all the HTML entities. How can I parse XML with Nokogiri without losing HTML entities? --- BEFORE --- <blog:entryFull> <p><iframe src="http://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F39858946&amp;show_artwork=true" width="100%" height="166" frameborder="no" scrolling="no"></iframe></p></blog:entryFull> --- AFTER --- <blog:entryFull> piframe src="http://w.soundcloud.com/player/?url=http%3A%2F

Is there a way to escape non-alphanumeric characters in Nokogiri css?

倾然丶 夕夏残阳落幕 posted on 2020-01-04 09:12:32
Question: I have an anchor tag: file.html#stuff-morestuff-CHP-1-SECT-2.1 Trying to pull the referenced content in Nokogiri with documentFragment.at_css('#stuff-morestuff-CHP-1-SECT-2.1') fails with the error: unexpected '.1' after '[#<Nokogiri::CSS::Node:0x007fd1a7df9b40 @type=:CONDITIONAL_SELECTOR, @value=[#<Nokogiri::CSS::Node:0x007fd1a7df9b90 @type=:ELEMENT_NAME, @value=["*"]>, #<Nokogiri::CSS::Node:0x007fd1a7df9cd0 @type=:ID, @value=["#unixnut4-CHP-1-SECT-2"]>]>]' (Nokogiri::CSS::SyntaxError) Just

Nokogiri installation fails on Elastic Beanstalk

半世苍凉 posted on 2020-01-03 17:11:43
Question: I'm trying to deploy my Rails application with AWS Elastic Beanstalk. I've created the instance and all, but when I try to deploy the app using aws.push I get the following errors in the event log: 2014-09-22 01:23:40 UTC+0550 ERROR [Instance: i-744edb4a Module: AWSEBAutoScalingGroup ConfigSet: null] Command failed on instance. Return code: 1 Output: Error occurred during build: Command hooks failed. 2014-09-22 01:23:39 UTC+0550 ERROR Script /opt/elasticbeanstalk/hooks/appdeploy/pre/10_bundle

Is it possible to omit the processing instruction from an XML document using Nokogiri::XML::Builder [duplicate]

混江龙づ霸主 posted on 2020-01-02 21:11:06
Question: This question already has answers here and was closed 7 years ago. Possible duplicate: Print an XML document without the XML header line at the top. I'm trying to create a fragment of XML using Nokogiri::XML::Builder, but I can't find any documentation on how to exclude the processing instruction (<?xml version=...). Can anyone point me in the right direction? Answer 1: Now I can answer: doc.to_xml :save_with => Nokogiri::XML::Node::SaveOptions::NO_DECLARATION Source: https://stackoverflow.com/questions

how to use nokogiri methods .xpath & .at_xpath

非 Y 不嫁゛ posted on 2020-01-02 04:47:06
Question: I'm learning how to use Nokogiri and a few questions came up based on the code below: require 'rubygems' require 'mechanize' post_agent = WWW::Mechanize.new post_page = post_agent.get('http://www.vbulletin.org/forum/showthread.php?t=230708') puts "\nabsolute path with tbody gives nil" puts post_page.parser.xpath('/html/body/div/div/div/div/div/table/tbody/tr/td/div[2]').xpath('text()').to_s.strip.inspect puts "\n.at_xpath gives an empty string" puts post_page.parser.at_xpath("//div[@id='posts

Convert HTML to plain text and maintain structure/formatting, with ruby

放肆的年华 posted on 2020-01-02 04:36:05
Question: I'd like to convert HTML to plain text. I don't want to just strip the tags, though; I'd like to intelligently retain as much formatting as possible: inserting line breaks for <br> tags, detecting paragraphs and formatting them as such, etc. The input is pretty simple, usually well-formatted HTML (not entire documents, just a bunch of content, usually with no anchors or images). I could put together a couple of regexes that get me 80% there but figured there might be some existing solutions with

running nokogiri in Jruby vs. just ruby

只谈情不闲聊 posted on 2020-01-01 22:29:54
Question: I found a startling difference in CPU and memory usage. It seems garbage collection is not happening when I run the following Nokogiri script: require 'rubygems' require 'nokogiri' require 'open-uri' def getHeader() doz = Nokogiri::HTML(open('http://losangeles.craigslist.org/wst/reb/1484772751.html')) puts doz.xpath("html[1]\/body[1]\/h2[1]") end (1..10000).each do |a| getHeader() end When run in JRuby, CPU consumption is over 10, and memory consumption % rises with time (starts from

Using XPath with HTML or XML fragment?

若如初见. posted on 2020-01-01 19:30:51
Question: I am new to Nokogiri and XPath, and I am trying to access all comments in an HTML or XML fragment. The XPaths .//comment() and //comment() work when I am not using the fragment function, but they do not find anything with a fragment. With a tag instead of a comment, it works with the first XPath. By trial and error, I realized that in this case comment() finds only top-level comments, while .//comment() and some others find only inner comments. Am I doing something wrong? What am I missing? Can