open_uri / Nokogiri redirection problems

Submitted by 给你一囗甜甜゛ on 2019-12-10 18:55:19

Question


I am using Nokogiri to scrape web pages. It works fine unless the page redirects to another URL.

When I scrape this site: https://www.cardcomplete.com/besuchen-isie-uns-auf-facebook/

I get this error:

/home/balint/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/open-uri.rb:224:in `open_loop': redirection forbidden: https://www.cardcomplete.com/besuchen-isie-uns-auf-facebook/ -> http://www.facebook.com/cardcomplete (RuntimeError)

When I instead try to scrape the redirect target (http://www.facebook.com/cardcomplete), I get the same error, but this time for the redirect to the https version of the Facebook page:

/home/balint/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/open-uri.rb:224:in `open_loop': redirection forbidden: http://www.facebook.com/cardcomplete -> https://www.facebook.com/cardcomplete (RuntimeError)

Of course, scraping the https version of the facebook page works.

I installed the open_uri_redirections gem, which fixes the Facebook http -> https redirection but not the redirect from the first link:

doc = Nokogiri::HTML(open('https://www.cardcomplete.com/besuchen-isie-uns-auf-facebook/', :allow_redirections => :safe))

How can I solve this?
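
The two failing redirects differ in direction, which may explain why `:safe` handles one but not the other. The following is a minimal sketch that only compares the URL schemes taken from the two error messages above; it makes no network requests. `redirect_direction` is a hypothetical helper written for illustration, not part of open-uri or open_uri_redirections.

```ruby
require 'uri'

# Hypothetical helper (not part of open-uri or open_uri_redirections):
# classifies a redirect by comparing the schemes of the two URLs.
def redirect_direction(from, to)
  from_scheme = URI.parse(from).scheme.to_s.downcase
  to_scheme   = URI.parse(to).scheme.to_s.downcase

  if from_scheme == to_scheme
    :same_scheme
  elsif from_scheme == 'http' && to_scheme == 'https'
    :http_to_https   # upgrade to a secure connection
  elsif from_scheme == 'https' && to_scheme == 'http'
    :https_to_http   # downgrade to an insecure connection
  else
    :other
  end
end

# URL pairs copied from the two error messages in the question.
first  = redirect_direction('https://www.cardcomplete.com/besuchen-isie-uns-auf-facebook/',
                            'http://www.facebook.com/cardcomplete')
second = redirect_direction('http://www.facebook.com/cardcomplete',
                            'https://www.facebook.com/cardcomplete')

puts first   # https_to_http
puts second  # http_to_https
```

The first redirect is a downgrade from https to http, while the second is an upgrade. Assuming the gem's README is accurate, `:allow_redirections => :safe` only permits the upgrade direction, and `:allow_redirections => :all` is the option intended to permit both. Following a downgrade means the final request travels over plain HTTP, so that option is worth using only when that is acceptable.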

Source: https://stackoverflow.com/questions/30764104/open-uri-nokogiri-redirection-problems
