nokogiri | 易学教程

How to parse XML with nokogiri without losing HTML entities?

Question: If you look at the output in the AFTER section below, Ruby is removing all the HTML entities. How can I parse XML with Nokogiri without losing HTML entities? --- BEFORE --- <blog:entryFull> <p><iframe src="http://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F39858946&show_artwork=true" width="100%" height="166" frameborder="no" scrolling="no"></iframe></p></blog:entryFull> --- AFTER --- <blog:entryFull> piframe src="http://w.soundcloud.com/player/?url=http%3A%2F

Is there a way to escape non-alphanumeric characters in Nokogiri css?

Question: I have an anchor tag: file.html#stuff-morestuff-CHP-1-SECT-2.1 When I try to pull the referenced content with Nokogiri, documentFragment.at_css('#stuff-morestuff-CHP-1-SECT-2.1') fails with the error: unexpected '.1' after '[#<Nokogiri::CSS::Node:0x007fd1a7df9b40 @type=:CONDITIONAL_SELECTOR, @value=[#<Nokogiri::CSS::Node:0x007fd1a7df9b90 @type=:ELEMENT_NAME, @value=["*"]>, #<Nokogiri::CSS::Node:0x007fd1a7df9cd0 @type=:ID, @value=["#unixnut4-CHP-1-SECT-2"]>]>]' (Nokogiri::CSS::SyntaxError) Just

Nokogiri installation fails on Elastic Beanstalk

Question: I'm trying to deploy my Rails application with AWS Elastic Beanstalk. I've created the instance and everything, but when I try to deploy the app using aws.push I get the following errors in the event log: 2014-09-22 01:23:40 UTC+0550 ERROR [Instance: i-744edb4a Module: AWSEBAutoScalingGroup ConfigSet: null] Command failed on instance. Return code: 1 Output: Error occurred during build: Command hooks failed. 2014-09-22 01:23:39 UTC+0550 ERROR Script /opt/elasticbeanstalk/hooks/appdeploy/pre/10_bundle

Is it possible to omit the processing instruction from an XML document using Nokogiri::XML::Builder [duplicate]

Question: Possible duplicate: Print an XML document without the XML header line at the top. I'm trying to create a fragment of XML using Nokogiri::XML::Builder, but I can't find any documentation on how to exclude the processing instruction (<?xml version=...). Can anyone point me in the right direction?

Answer 1: Now I can answer: doc.to_xml :save_with => Nokogiri::XML::Node::SaveOptions::NO_DECLARATION

Source: https://stackoverflow.com/questions

How to use nokogiri methods .xpath & .at_xpath

Question: I'm learning how to use Nokogiri, and a few questions came to me based on the code below:

    require 'rubygems'
    require 'mechanize'
    post_agent = WWW::Mechanize.new
    post_page = post_agent.get('http://www.vbulletin.org/forum/showthread.php?t=230708')
    puts "\nabsolute path with tbody gives nil"
    puts post_page.parser.xpath('/html/body/div/div/div/div/div/table/tbody/tr/td/div[2]').xpath('text()').to_s.strip.inspect
    puts "\n.at_xpath gives an empty string"
    puts post_page.parser.at_xpath("//div[@id='posts

Convert HTML to plain text and maintain structure/formatting, with ruby

Question: I'd like to convert HTML to plain text. I don't want to just strip the tags, though; I'd like to intelligently retain as much formatting as possible: inserting line breaks for <br> tags, detecting paragraphs and formatting them as such, etc. The input is pretty simple, usually well-formatted HTML (not entire documents, just a bunch of content, usually with no anchors or images). I could put together a couple of regexes that get me 80% there but figured there might be some existing solutions with

Running nokogiri in JRuby vs. just Ruby

Question: I found a startling difference in CPU and memory consumption. It seems garbage collection is not happening when I run the following Nokogiri script:

    require 'rubygems'
    require 'nokogiri'
    require 'open-uri'
    def getHeader()
      doz = Nokogiri::HTML(open('http://losangeles.craigslist.org/wst/reb/1484772751.html'))
      puts doz.xpath("html[1]\/body[1]\/h2[1]")
    end
    (1..10000).each do |a|
      getHeader()
    end

When run in JRuby, CPU consumption is over 10, and memory consumption % rises with time (starts from

Using XPath with HTML or XML fragment?

Question: I am new to Nokogiri and XPath, and I am trying to access all comments in an HTML or XML fragment. The XPaths .//comment() and //comment() work when I am not using the fragment function, but they do not find anything with a fragment. With a tag instead of a comment, it works with the first XPath. By trial and error, I realized that in this case comment() finds only top-level comments, while .//comment() and some others find only inner comments. Am I doing something wrong? What am I missing? Can