nokogiri

Get text directly inside a tag in Nokogiri

孤者浪人 posted on 2019-12-30 02:44:07
Question: I have some HTML that looks like: <dt> <a href="#">Hello</a> (2009) </dt> I already have all my HTML loaded into a variable called record. I need to parse out the year, i.e. 2009, if it exists. How can I get the text inside the dt tag but not the text inside the a tag? I've used record.search("dt").inner_text and this gives me everything. It's a trivial question but I haven't managed to figure this out. Answer 1: To get all the direct children with text, but not any further sub-children, you can

How to parse consecutive tags with Nokogiri?

你离开我真会死。 posted on 2019-12-30 02:33:08
Question: I have HTML code like this: <div id="first"> <dt>Label1</dt> <dd>Value1</dd> <dt>Label2</dt> <dd>Value2</dd> ... </div> My code does not work: doc.css("first").each do |item| label = item.css("dt") value = item.css("dd") end It shows all the <dt> tags first and then the <dd> tags, and I need "label: value". Answer 1: First of all, your HTML should have the <dt> and <dd> elements inside a <dl>: <div id="first"> <dl> <dt>Label1</dt> <dd>Value1</dd> <dt>Label2</dt> <dd>Value2</dd> ... </dl> </div> but

trying to get content inside cdata tags in xml file using nokogiri

两盒软妹~` posted on 2019-12-29 08:27:29
Question: I have seen several things on this, but nothing has seemed to work so far. I am parsing XML from a URL using Nokogiri on Rails 3 with Ruby 1.9.2. A snippet of the XML looks like this: <NewsLineText> <![CDATA[ Anna Kendrick is ''obsessed'' with 'Game of Thrones' and loves to cook, particularly creme brulee. ]]> </NewsLineText> I am trying to parse this out to get the text associated with the NewsLineText: r = node.at_xpath('.//newslinetext') if node.at_xpath('.//newslinetext') s = node.at_xpath('.

Using Nokogiri to Split Content on BR tags

情到浓时终转凉″ posted on 2019-12-29 07:43:10
Question: I have a snippet of code I'm trying to parse with Nokogiri that looks like this: <td class="j"> <a title="title text1" href="http://link1.com">Link 1</a> (info1), Blah 1,<br> <a title="title text2" href="http://link2.com">Link 2</a> (info1), Blah 1,<br> <a title="title text2" href="http://link3.com">Link 3</a> (info2), Blah 1 Foo 2,<br> </td> I have access to the source of the td.j using something like this: data_items = doc.css("td.j") My goal is to split each of those lines up into an array

Install Nokogiri 1.6.1 under Ruby 2.0.0p353 (rvm based installation) fails (OSX Mavericks)?

落花浮王杯 posted on 2019-12-29 07:38:07
Question: I've tried to install Nokogiri 1.6.1 under Ruby and RVM, but it fails with the following error: Gem::Installer::ExtensionBuildError: ERROR: Failed to build gem native extension. /Users/lmo0/.rvm/rubies/ruby-2.0.0-p353/bin/ruby extconf.rb Extracting libxml2-2.8.0.tar.gz into tmp/x86_64-apple-darwin13.0.0/ports/libxml2/2.8.0... OK Running 'configure' for libxml2 2.8.0... OK Running 'compile' for libxml2 2.8.0... OK Running 'install' for libxml2 2.8.0... OK Activating libxml2 2.8.0 (from /Users

How do I validate XHTML with nokogiri?

 ̄綄美尐妖づ posted on 2019-12-28 15:02:57
Question: I've found a few posts alluding to the fact that you can validate XHTML against its DTD using the nokogiri gem. Whilst I've managed to use it to parse XHTML successfully (looking for 'a' tags etc.), I'm struggling to validate documents. For me, this: doc = Nokogiri::XML(Net::HTTP.get(URI.parse("http://www.w3.org"))) puts doc.validate results in a whole heap of: [ #<Nokogiri::XML::SyntaxError: No declaration for element html>, #<Nokogiri::XML::SyntaxError: No declaration for attribute xmlns of

Screen scraping through nokogiri or hpricot

可紊 posted on 2019-12-25 18:11:10
Question: I'm trying to get the actual value of a given XPath. I have the following code in a sample.rb file: require 'rubygems' require 'nokogiri' require 'open-uri' doc = Nokogiri::HTML(open('http://www.changebadtogood.com/')) desc "Trying to get the value of given xapth" task :sample do begin doc.xpath('//*[@id="view_more"]').each do |link| puts link.content end rescue Exception => e puts "error" end end The output is: View more issues .. When I try to get the value for a different XPath, such as:

Parsing XML to get the population of Albania?

别说谁变了你拦得住时间么 posted on 2019-12-25 08:56:33
Question: I am trying to learn how to use Nokogiri and parse XML files, but I can't seem to get past this issue. I have this XML file with information about countries, such as population, name, religion, inflation etc.: <cia> <continent id='europe' name='Europe'/> <continent id='asia' name='Asia'/> <continent id='northAmerica' name='North America'/> <continent id='australia' name='Australia/Oceania'/> <continent id='southAmerica' name='South America'/> <continent id='africa' name='Africa

How to access multiple <p> tags one at a time

这一生的挚爱 posted on 2019-12-25 08:23:54
Question: I have the following HTML: <div id="test_id"> <p>Some words.</p> <p>Some more words.</p> <p>Even more words.</p> </div> If I parse the HTML using: doc = Nokogiri::HTML(open("http://my_url")) and run doc.css('#test_id').text in the console, I get: => "Some words.\nSome more words.\nEven more words" How do I get the first <p> element only? I think I figured it out with .children: doc.css('#test_id').children[0].text Is this the correct way to do this? Answer 1: The problem is that you're not using

Parse html GET via open() with nokogiri - redirect exception

此生再无相见时 posted on 2019-12-25 06:39:05
Question: I'm trying to learn Ruby, so I'm following a Google developer exercise. I'm trying to parse some links. In the case of a successful redirection (given that I know it is only possible to get redirected once), I get "redirection forbidden". I noticed that I go from an http link to an https link. Any concrete idea how I could implement this in Ruby, given that Google's exercise is for Python? Error: ruby fix.rb redirection forbidden: http://code.google.com/edu/languages/google
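One possible workaround, sketched with Net::HTTP from the standard library: follow the Location header manually instead of relying on open-uri, which in the version used above refuses the http to https redirect. The method name fetch_following_redirects and the default limit of 5 are illustrative choices, not part of the question.

```ruby
require "net/http"
require "uri"

# Fetch a URL, following redirects (including http -> https) up to `limit` hops.
def fetch_following_redirects(url, limit = 5)
  raise ArgumentError, "too many redirects" if limit <= 0

  response = Net::HTTP.get_response(URI.parse(url))

  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    # Resolve a relative Location header against the current URL.
    next_url = URI.join(url, response["location"]).to_s
    fetch_following_redirects(next_url, limit - 1)
  else
    raise "unexpected response: #{response.code}"
  end
end

# Usage (needs network access, so it is left commented out):
#   body = fetch_following_redirects("http://example.com/")
#   doc  = Nokogiri::HTML(body)
```

The hop limit guards against redirect loops; the returned body can then be handed to Nokogiri::HTML as a string.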