nokogiri | 易学教程

How to get text after or before certain tags using Nokogiri

Question: I have an HTML document, something like this:

    <root><template>title</template>
    <h level="3" i="3">Something</h>
    <template element="1"><title>test</title></template>
    # one
    # two
    # three
    # four
    <h level="4" i="5">something1</h>
    some random test
    <template element="1"><title>test</title></template>
    # first
    # second
    # third
    # fourth
    <template element="2"><title>testing</title></template>

I want to extract:

    # one
    # two
    # three
    # four
    # first
    # second
    # third
    # fourth
    </root>

In other words, I want

How to navigate the DOM using Nokogiri

Question: I'm trying to fill the variables parent_element_h1 and parent_element_h2. Can anyone help me use Nokogiri to get the information I need into those variables?

    require 'rubygems'
    require 'nokogiri'

    value = Nokogiri::HTML.parse(<<-HTML_END)
    "<html>
      <body>
        <p id='para-1'>A</p>
        <div class='block' id='X1'>
          <h1>Foo</h1>
          <p id='para-2'>B</p>
        </div>
        <p id='para-3'>C</p>
        <h2>Bar</h2>
        <p id='para-4'>D</p>
        <p id='para-5'>E</p>
        <div class='block' id='X2'>
          <p id='para-6'>F</p>
        </div>
      </body>
    </html>"
    HTML

Transform XML with XSLT and preserve CDATA (in Ruby)

Question: I am trying to convert a document with content like the following into another document, leaving the CDATA exactly as it was in the first document, but I haven't figured out how to preserve the CDATA with XSLT.

Initial XML:

    <node>
      <subNode>
        <![CDATA[ HI THERE ]]>
      </subNode>
      <subNode>
        <![CDATA[ SOME TEXT ]]>
      </subNode>
    </node>

Final XML:

    <newDoc>
      <data>
        <text>
          <![CDATA[ HI THERE ]]>
        </text>
        <text>
          <![CDATA[ SOME TEXT ]]>
        </text>
      </data>
    </newDoc>

I've tried something like this, but no luck,

Can't install Nokogiri for Ruby in Windows

Question: I know this is simple but I just can't figure it out. I need to run a script in Ruby and it requires Nokogiri. I do have some experience in other languages but not in Ruby. Here is my system:

Ruby 2.0.0-p195 (x64) is installed at C:\Programs\RubyLanguage
Ruby Development Kit (mingw64-64-4.7.2-20130224-1432) is installed at C:\Programs\RubyDevKit

When I run gem install nokogiri I get this error:

    ERROR: Error installing nokogiri: The 'nokogiri' native gem requires installed build tools. Please

ERROR: While executing gem … (TypeError) incompatible marshal file format (can't be read)

Question: I encountered this issue when I ran bundle install with Ruby version 2.4.4 on macOS Mojave:

    Fetching nokogiri 1.8.5
    Installing nokogiri 1.8.5 with native extensions
    Gem::Ext::BuildError: ERROR: Failed to build gem native extension.
    ERROR: cannot discover where libxml2 is located on your system. please make sure `pkg-config` is installed.

So I ran xcode-select --install. But then when I ran gem install nokogiri I got the following output:

    ERROR: While executing gem ... (TypeError) incompatible

How do I do a regex search in Nokogiri for text that matches a certain beginning?

Question: Given:

    require 'rubygems'
    require 'nokogiri'

    value = Nokogiri::HTML.parse(<<-HTML_END)
    "<html>
      <body>
        <p id='para-1'>A</p>
        <div class='block' id='X1'>
          <h1>Foo</h1>
          <p id='para-2'>B</p>
        </div>
        <p id='para-3'>C</p>
        <h2>Bar</h2>
        <p id='para-4'>D</p>
        <p id='para-5'>E</p>
        <div class='block' id='X2'>
          <p id='para-6'>F</p>
        </div>
      </body>
    </html>"
    HTML_END

I want to do something like what I can do in Hpricot:

    divs = value.search('//div[@id^="para-"]')

How do I do a pattern search for elements in XPath

Nokogiri vs Hpricot?

Question: Which one would you choose? My important attributes are (not in order):

Support and future enhancements.
Community and general knowledge base (on the Internet).
Comprehensive (i.e., proven to parse a wide range of *.*ml pages).
Performance.
Memory footprint (runtime, not the code-base).

Answer 1: Pick Nokogiri, for all points and especially point one: Hpricot is no longer maintained. Meta answer: See ruby-toolbox to get an idea of the popularity of different tools in a given area.

Answer 2: Only pick

How do I wrap HTML untagged text with <p> tag using Nokogiri?

Question: I have to parse an HTML document into different new files. The problem is that there are text nodes which have not been wrapped with "<p>" tags; instead they have "<br>" tags at the end of each paragraph. I want to wrap this text with <p> tags using Nokogiri:

    <div id="f15"><b>Footnote 15</b>: Catullus iii, 12.</div>
    <div class="pgmonospaced pgheader"><br/>
    <br/>
    End of the Project abc<br/>
    <br/>
    *** END OF THIS PROJECT XYZ ***<br/>
    <br/>
    ***** This file should be named new file.html... ****

&#146; is getting converted as “\u0092” by nokogiri in ruby on rails

Question: I have an HTML page which has the following line with some HTML entities like "&#146;".

    # Here I am not pasting the whole HTML page content, just the line with the issue
    html_file = "<html>....<body><p>they&#146;re originally intended to describe the spread of of viral diseases, but they&#146;re nice analogies for how web/SN apps grow.<p> ...</body></html>"
    doc = Nokogiri::HTML(html_file)
    body = doc.xpath('//body')
    body_content = body[0].inner_html
    puts body_content

Result: These terms come from the fields of
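Code points in the range U+0080 to U+009F are C1 control characters, and they often turn out to be Windows-1252 bytes that were treated as Unicode code points (0x92 is the right single quotation mark in Windows-1252). The excerpt is truncated and does not show a fix, so the following is only a possible post-processing workaround in plain Ruby. The helper name fix_c1 is hypothetical.

```ruby
# Hypothetical helper: reinterpret stray C1 control characters as
# Windows-1252 bytes and convert them to their proper Unicode characters.
def fix_c1(text)
  text.gsub(/[\u0080-\u009F]/) do |ch|
    byte = [ch.ord].pack('C')
    begin
      byte.force_encoding('Windows-1252').encode('UTF-8')
    rescue EncodingError
      ch # a few bytes are undefined in Windows-1252; leave those untouched
    end
  end
end

puts fix_c1("they\u0092re nice analogies")
```

Applied to the text extracted from the parsed document, this maps "\u0092" to the typographic apostrophe U+2019 and leaves all other characters unchanged.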