nokogiri | 易学教程

Encoding issue when using Nokogiri replace

Question: I have this code:

    # encoding: utf-8
    require 'nokogiri'

    s = "<a href='/path/to/file'>Café Verona</a>".encode('UTF-8')
    puts "Original string: #{s}"

    @doc = Nokogiri::HTML::DocumentFragment.parse(s)
    links = @doc.css('a')
    only_text = 'Café Verona'.encode('UTF-8')
    puts "Replacement text: #{only_text}"

    links.first.replace(only_text)
    puts @doc.to_html

However, the output is this:

    Original string: <a href='/path/to/file'>Café Verona</a>
    Replacement text: Café Verona
    CafÃ© Verona

Why does the text in
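The garbled last line has the shape produced when UTF-8 bytes are read as ISO-8859-1 (Latin-1). The following is a minimal pure-Ruby sketch of that byte-level effect. It only reproduces the symptom and makes no claim about what Nokogiri does internally. The variable names are illustrative.

```ruby
# encoding: utf-8

text = 'Café Verona'

# In UTF-8, "é" is stored as the two bytes C3 A9.
e_bytes = 'é'.bytes.map { |b| format('%02X', b) }
puts e_bytes.inspect

# Relabel the same bytes as ISO-8859-1 (no bytes change), then transcode
# to UTF-8 so the result can be printed: C3 becomes "Ã" and A9 becomes "©".
garbled = text.dup.force_encoding('ISO-8859-1').encode('UTF-8')
puts garbled
```

If this is what happens in the question, the replacement string's bytes are intact and only the encoding label applied to them along the way is wrong.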

Heroku app crashes, logs say “No such file to load -- nokogiri (LoadError)”

Question: I had a working app and added Nokogiri to parse some XML; it runs fine locally. My Gemfile includes:

    gem 'nokogiri'

I ran bundle install and verified that my Gemfile.lock includes:

    DEPENDENCIES
      ...
      nokogiri

In my controller class I added the following (I didn't think I had to, but I got an error locally if I didn't):

    class MydealController < ApplicationController
      require 'rubygems'
      require 'open-uri'
      require 'nokogiri'

When I use my browser to get the URL in MydealController that uses Nokogiri

    doc = Nokogiri::XML(getresult)

Using SAX Parser to get several sub-nodes?

Question: I have a large local XML file (24 GB) with a structure like this:

    <id>****</id>
    <url> ****</url> (several times within an id...)

I need a result like this:

    id1;url1
    id1;url2
    id1;url3
    id2;url4
    ....

I wanted to use Nokogiri with either the SAX parser or the Reader, since I can't load the whole file into memory. I am using a Ruby Rake task to execute the code. My code with SAX is:

    task :fetch_saxxml => :environment do
      require 'nokogiri'
      require 'open-uri'

      class MyDocument < Nokogiri::XML::SAX:

Remove unnecessary temporary files after gem install nokogiri [duplicate]

Question: This question already has answers here: Can I delete some folders of nokogiri and capybara-webkit inside of my rvm gemset? (2 answers) Closed 5 years ago.

I have to use Nokogiri for some XML processing. For this I created an RVM gemset specific to the project and installed Nokogiri with gem install nokogiri. No problems so far. But when I look into ~/.rvm/gems/ruby-...@nokogiri/gems/nokogiri-.../ext/nokogiri/ and its subfolders, I see 140 MB worth of files in the filesystem. Is there some generic

Iterating through multiple URLs to parse HTML with Nokogiri

Question: What I'm trying to do is scrape the names and prices of items from multiple vendors using Nokogiri. I'm passing the CSS selectors (to find the names and prices) to Nokogiri as method arguments. Any guidance on how to pass multiple URLs to the "scrape" method while also passing the other arguments (e.g. vendor, item_path)? Or am I going about this completely the wrong way? Here is the code:

    require 'rubygems' # Load Ruby Gems
    require 'nokogiri' # Load Nokogiri
    require 'open-uri' # Load Open

Generating XML with cdata using Ox?

Question: I need to generate XML using Ox but didn't get much help from the documentation. I need to generate XML like this:

    <Jobpostings>
      <Postings>
        <Posting>
          <JobTitle><cdata>Programmer Analyst 3-IT</cdata></JobTitle>
          <Location><cdata>Romania,Bucharest...</cdata></Location>
          <CountryCode><cdata>US</cdata> </CountryCode>
          <JobDescription><cdata>class technology to develop.</cdata></JobDescription>
        </Posting>
      </Postings>
    </Jobpostings>

I have the data inside the tags as strings in variables like this:

How to get 'value' of select tag based on content of select tag, using Nokogiri

Question: How would one get the contents of the 'value' attribute of a select tag, based on the content of the select tag (i.e. the text wrapped by option), using Nokogiri? For example, given the following HTML:

    <select id="options" name="options">
      <option value="1">First Option - 4</option>
      <option value="2">Second Option - 5</option>
      <option value="3">Third Option - 6</option>
    </select>

I would like to be able to specify a string (e.g. 'First Option') and have the contents of the 'value' attribute

What is the absolutely cheapest way to select a child node in Nokogiri?

Question: I know that there are dozens of ways to select the first child element in Nokogiri, but which is the cheapest? I can't get around using Node#children, which sounds awfully expensive. Say that there are 10000 child nodes, and I don't want to touch the 9999 others...

Answer 1: Node#child is the fastest way to get the first child element. However, if the node you're looking for is NOT the first (e.g., the 99th), then there is no faster way to select that node than to call #children and index into it.

Adjusting timeouts for Nokogiri connections

Question: Why does Nokogiri wait a couple of seconds (3-5) when the server is busy and I'm requesting pages one by one, but when these requests are in a loop, Nokogiri does not wait and throws the timeout message? I'm using a timeout block wrapping the request, but Nokogiri does not wait for that time at all. Any suggested procedure for this?

    # this is a method from the eng class
    def get_page(url,page_type)
      begin
        timeout(10) do
          # Get a Nokogiri::HTML::Document for the page we’re interested in...
          @@doc =
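Nokogiri only parses what it is given. The waiting happens in whatever fetches the page, so the time limit and any retry policy belong around that fetch. The sketch below shows one possible retry wrapper built on Ruby's standard Timeout module. It is not the asker's get_page method, which is cut off above. The helper name with_retries and its parameters are hypothetical, and the block simulates a busy server instead of making a real request.

```ruby
require 'timeout'

# Hypothetical helper: runs the block under a time limit and retries after
# a pause when the limit is exceeded. Re-raises once attempts run out.
def with_retries(attempts: 3, limit: 10, pause: 1)
  tries = 0
  begin
    tries += 1
    Timeout.timeout(limit) { yield }
  rescue Timeout::Error
    raise if tries >= attempts
    sleep pause
    retry
  end
end

# Simulated fetch: the first two calls are too slow, the third succeeds.
calls = 0
page = with_retries(attempts: 3, limit: 0.5, pause: 0) do
  calls += 1
  sleep 2 if calls < 3
  '<html><body>ok</body></html>'
end

puts "fetched after #{calls} calls"
```

In real use the block would fetch and parse the page, for example Nokogiri::HTML(URI.open(url, read_timeout: 10)) with open-uri loaded, in which case open-uri's own timeout options may be enough without the outer Timeout block.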