nokogiri

Encoding issue when using Nokogiri replace

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-23 09:27:41
Question: I have this code:

    # encoding: utf-8
    require 'nokogiri'

    s = "<a href='/path/to/file'>Café Verona</a>".encode('UTF-8')
    puts "Original string: #{s}"

    @doc = Nokogiri::HTML::DocumentFragment.parse(s)
    links = @doc.css('a')
    only_text = 'Café Verona'.encode('UTF-8')
    puts "Replacement text: #{only_text}"
    links.first.replace(only_text)
    puts @doc.to_html

However, the output is this:

    Original string: <a href='/path/to/file'>Café Verona</a>
    Replacement text: Café Verona
    Café Verona

Why does the text in [...]

Heroku app crashes, logs say "No such file to load -- nokogiri (LoadError)"

烈酒焚心 submitted on 2019-12-23 07:40:30
Question: I had a working app and added Nokogiri to parse some XML. It runs fine locally. My Gemfile includes:

    gem 'nokogiri'

I ran bundle install and verified that my Gemfile.lock includes:

    DEPENDENCIES
    ...
    nokogiri

In my controller class I added the following (I didn't think I had to, but I got an error locally if I didn't):

    class MydealController < ApplicationController
      require 'rubygems'
      require 'open-uri'
      require 'nokogiri'

When I use my browser to get the URL in MydealController that uses Nokogiri:

    doc = Nokogiri::XML(getresult)

[...]

Using SAX Parser to get several sub-nodes?

最后都变了- submitted on 2019-12-23 04:38:18
Question: I have a large local XML file (24 GB) with a structure like this:

    <id>****</id>
    <url> ****</url>
    (several times within an id...)

I need a result like this:

    id1;url1
    id1;url2
    id1;url3
    id2;url4
    ....

I wanted to use Nokogiri with either the SAX parser or the Reader, since I can't load the whole file into memory. I am using a Ruby Rake task to execute the code. My code with SAX is:

    task :fetch_saxxml => :environment do
      require 'nokogiri'
      require 'open-uri'

      class MyDocument < Nokogiri::XML::SAX:

Remove unnecessary temporary files after gem install nokogiri [duplicate]

馋奶兔 submitted on 2019-12-23 03:26:23
Question: This question already has answers here: Can I delete some folders of nokogiri and capybara-webkit inside of my rvm gemset? (2 answers) Closed 5 years ago.

I have to use Nokogiri for some XML processing. For this I created an RVM gemset specific to the project and installed Nokogiri with gem install nokogiri. No problems so far. But when I look into ~/.rvm/gems/ruby-...@nokogiri/gems/nokogiri-.../ext/nokogiri/ and its subfolders, I see 140 MB worth of files in the filesystem. Is there some generic [...]

Iterating through multiple URLs to parse HTML with Nokogiri

醉酒当歌 submitted on 2019-12-23 03:15:08
Question: What I'm trying to do is scrape the names and prices of items from multiple vendors using Nokogiri. I'm passing the CSS selectors (to find the names and prices) to Nokogiri as method arguments. Any guidance on how to pass multiple URLs to the "scrape" method while also passing the other arguments (e.g. vendor, item_path)? Or am I going about this completely the wrong way? Here is the code:

    require 'rubygems' # Load Ruby Gems
    require 'nokogiri' # Load Nokogiri
    require 'open-uri' # Load Open [...]

Generating XML with cdata using Ox?

血红的双手。 submitted on 2019-12-23 02:47:30
Question: I need to generate XML using Ox but didn't get much help from the documentation. I need to generate XML like this:

    <Jobpostings>
      <Postings>
        <Posting>
          <JobTitle><cdata>Programmer Analyst 3-IT</cdata></JobTitle>
          <Location><cdata>Romania,Bucharest...</cdata></Location>
          <CountryCode><cdata>US</cdata> </CountryCode>
          <JobDescription><cdata>class technology to develop.</cdata></JobDescription>
        </Posting>
      </Postings>
    </Jobpostings>

I have the data inside the tags as strings in variables like this: [...]

How to get 'value' of select tag based on content of select tag, using Nokogiri

纵然是瞬间 submitted on 2019-12-23 01:52:34
Question: How would one get the contents of the 'value' attribute of an option inside a select tag, based on the content of that option (i.e. the text wrapped by option), using Nokogiri? For example, given the following HTML:

    <select id="options" name="options">
      <option value="1">First Option - 4</option>
      <option value="2">Second Option - 5</option>
      <option value="3">Third Option - 6</option>
    </select>

I would like to be able to specify a string (e.g. 'First Option') and have the contents of the 'value' attribute [...]

What is the absolutely cheapest way to select a child node in Nokogiri?

自古美人都是妖i submitted on 2019-12-23 01:37:28
Question: I know that there are dozens of ways to select the first child element in Nokogiri, but which is the cheapest? I can't get around using Node#children, which sounds awfully expensive. Say that there are 10000 child nodes, and I don't want to touch the 9999 others...

Answer 1: Node#child is the fastest way to get the first child element. However, if the node you're looking for is NOT the first (e.g., the 99th), then there is no faster way to select that node than to call #children and index into it.

Adjusting timeouts for Nokogiri connections

主宰稳场 submitted on 2019-12-22 18:37:32
Question: Why does Nokogiri wait for a couple of seconds (3-5) when the server is busy and I'm requesting pages one by one, but when these requests are in a loop, Nokogiri does not wait and throws the timeout message? I'm wrapping the request in a timeout block, but Nokogiri does not wait for that time at all. Any suggested procedure for this?

    # this is a method from the eng class
    def get_page(url,page_type)
      begin
        timeout(10) do
          # Get a Nokogiri::HTML::Document for the page we're interested in...
          @@doc =
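The rest of `get_page` is cut off, so this is only a sketch of one common pattern rather than a fix for the asker's code: wrap the fetch in `Timeout.timeout` and retry a limited number of times. Nokogiri itself only parses; the waiting happens in whatever fetches the page (assumed here to be something like open-uri). The helper name `fetch_with_retry` and its parameters are hypothetical, and the fetch is simulated so the sketch runs without a network.

```ruby
require 'timeout'

# Hypothetical helper: runs the block under a time limit and retries
# when it times out, up to `attempts` tries in total.
def fetch_with_retry(url, attempts: 3, seconds: 10, pause: 0)
  tries = 0
  begin
    tries += 1
    Timeout.timeout(seconds) { yield url }
  rescue Timeout::Error
    raise if tries >= attempts # give up after the last attempt
    sleep(pause)               # optional back-off before retrying
    retry
  end
end

# Simulated fetch: times out twice, then succeeds.
calls = 0
page = fetch_with_retry('http://example.com/page') do |url|
  calls += 1
  raise Timeout::Error, 'simulated busy server' if calls < 3
  "<html>#{url}</html>"
end

puts page
```

In real use the block would do the actual request and parse, for example `Nokogiri::HTML(URI.open(url, open_timeout: 5, read_timeout: 10))`, where `open_timeout` and `read_timeout` are open-uri options that bound the connection and read phases separately.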