nokogiri | 易学教程

Get text directly inside a tag in Nokogiri

Question: I have some HTML that looks like: <dt> <a href="#">Hello</a> (2009) </dt> I already have all my HTML loaded into a variable called record. I need to parse out the year, i.e. 2009, if it exists. How can I get the text inside the dt tag but not the text inside the a tag? I've used record.search("dt").inner_text and this gives me everything. It's a trivial question but I haven't managed to figure this out. Answer 1: To get all the direct children with text, but not any further sub-children, you can

How to parse consecutive tags with Nokogiri?

Question: I have HTML code like this: <div id="first"> <dt>Label1</dt> <dd>Value1</dd> <dt>Label2</dt> <dd>Value2</dd> ... </div> My code does not work: doc.css("first").each do |item| label = item.css("dt") value = item.css("dd") end It shows all the <dt> tags first and then the <dd> tags, but I need "label: value" pairs. Answer 1: First of all, your HTML should have the <dt> and <dd> elements inside a <dl> : <div id="first"> <dl> <dt>Label1</dt> <dd>Value1</dd> <dt>Label2</dt> <dd>Value2</dd> ... </dl> </div> but

trying to get content inside cdata tags in xml file using nokogiri

Question: I have seen several things on this, but nothing has seemed to work so far. I am parsing XML from a URL using Nokogiri on Rails 3 and Ruby 1.9.2. A snippet of the XML looks like this: <NewsLineText> <![CDATA[ Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee. ]]> </NewsLineText> I am trying to parse this out to get the text associated with the NewsLineText: r = node.at_xpath('.//newslinetext') if node.at_xpath('.//newslinetext') s = node.at_xpath('.

Using Nokogiri to Split Content on BR tags

Question: I have a snippet of code I'm trying to parse with Nokogiri that looks like this: <td class="j"> <a title="title text1" href="http://link1.com">Link 1</a> (info1), Blah 1,<br> <a title="title text2" href="http://link2.com">Link 2</a> (info1), Blah 1,<br> <a title="title text2" href="http://link3.com">Link 3</a> (info2), Blah 1 Foo 2,<br> </td> I have access to the source of the td.j using something like this: data_items = doc.css("td.j") My goal is to split each of those lines up into an array

Install Nokogiri 1.6.1 under Ruby 2.0.0p353 (rvm based installation) fails (OSX Mavericks)?

Question: I've tried to install Nokogiri 1.6.1 under Ruby and RVM but it is failing with the following error: Gem::Installer::ExtensionBuildError: ERROR: Failed to build gem native extension. /Users/lmo0/.rvm/rubies/ruby-2.0.0-p353/bin/ruby extconf.rb Extracting libxml2-2.8.0.tar.gz into tmp/x86_64-apple-darwin13.0.0/ports/libxml2/2.8.0... OK Running 'configure' for libxml2 2.8.0... OK Running 'compile' for libxml2 2.8.0... OK Running 'install' for libxml2 2.8.0... OK Activating libxml2 2.8.0 (from /Users

How do I validate XHTML with nokogiri?

Question: I've found a few posts alluding to the fact that you can validate XHTML against its DTD using the nokogiri gem. Whilst I've managed to use it to parse XHTML successfully (looking for 'a' tags etc.), I'm struggling to validate documents. For me, this: doc = Nokogiri::XML(Net::HTTP.get(URI.parse("http://www.w3.org"))) puts doc.validate results in a whole heap of: [ #<Nokogiri::XML::SyntaxError: No declaration for element html>, #<Nokogiri::XML::SyntaxError: No declaration for attribute xmlns of

Screen scraping through nokogiri or hpricot

Question: I'm trying to get the actual value of a given XPath. I have the following code in a sample.rb file: require 'rubygems' require 'nokogiri' require 'open-uri' doc = Nokogiri::HTML(open('http://www.changebadtogood.com/')) desc "Trying to get the value of given xapth" task :sample do begin doc.xpath('//*[@id="view_more"]').each do |link| puts link.content end rescue Exception => e puts "error" end end Output is: View more issues .. When I try to get the value for a different XPath, such as:

Parsing XML to get the population of Albania?

Question: I am trying to learn how to use Nokogiri and parse XML files, but I can't seem to get past this issue I am having. I have this XML file with information about countries such as population, name, religion, inflation etc.: <cia> <continent id='europe' name='Europe'/> <continent id='asia' name='Asia'/> <continent id='northAmerica' name='North America'/> <continent id='australia' name='Australia/Oceania'/> <continent id='southAmerica' name='South America'/> <continent id='africa' name='Africa

How to access multiple <p> tags one at a time

Question: I have the following HTML: <div id="test_id"> <p>Some words.</p> <p>Some more words.</p> <p>Even more words.</p> </div> If I parse the HTML using: doc = Nokogiri::HTML(open("http://my_url")) and run doc.css('#test_id').text in the console I get: => "Some words.\nSome more words.\nEven more words" How do I get the first <p> element only? I think I figured it out with .children: doc.css('#test_id').children[0].text Is this the correct way to do this? Answer 1: The problem is that you're not using

Parse html GET via open() with nokogiri - redirect exception

Question: I'm trying to learn Ruby, so I'm following an exercise from Google's developer site. I'm trying to parse some links. In the case of a successful redirection (given that I know it is only possible to get redirected once), I get "redirection forbidden". I noticed that I go from an http link to an https link. Any concrete idea how I could implement this in Ruby, since Google's exercise is for Python? error: ruby fix.rb redirection forbidden: http://code.google.com/edu/languages/google