urlopen

macOS Sierra / Python 2.7.13 URLError: <urlopen error EOF occurred in violation of protocol (_ssl.c:661)>

我怕爱的太早我们不能终老 submitted on 2019-12-12 03:27:21
Question: I have been searching for and trying everything I could find on Stack Overflow, but no joy. I am new to Python, so I apologize now for my ignorance, but I am very eager/excited to learn.

macOS Sierra v10.12.5 (early 2011)
Python v2.7.13
urllib==1.21.1
urllib2==1498656401.94
urllib3==1.21.1
Homebrew installed

Here is the error I am receiving:

    Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 12:39:47)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "copyright", "credits" or "license()" for more
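
A quick way to narrow this down (not part of the original post) is to check which OpenSSL the interpreter was built against; on older macOS system Pythons it is often a 0.9.8 build that cannot negotiate TLS 1.2, which produces exactly this kind of handshake EOF. A minimal diagnostic sketch:

    import ssl

    print(ssl.OPENSSL_VERSION)                  # an "OpenSSL 0.9.8..." build is too old for TLS 1.2
    print(hasattr(ssl, 'PROTOCOL_TLSv1_2'))     # False usually means modern HTTPS handshakes will fail

If the TLS 1.2 check fails, the usual remedy is a Python built against a newer OpenSSL (for example a Homebrew or python.org build) rather than a code change.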

How to fix Python urlopen error [Errno 8] when using eventlet.green

限于喜欢 submitted on 2019-12-11 13:58:44
Question: Python novice here. I'm making a lot of asynchronous HTTP requests using eventlet and urllib2. At the top of my file I have:

    import eventlet
    import urllib
    from eventlet.green import urllib2

Then I make a lot of asynchronous HTTP requests that succeed with this line:

    conn = urllib2.urlopen(signed_url, None)

And all of a sudden, I get this error:

    URLError: <urlopen error [Errno 8] nodename nor servname provided, or not known>

This error occurs on the same urllib2.urlopen line, which is weird
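
Errno 8 is a name-resolution failure. One hedged guess, if the same URL resolves fine in isolation, is that firing too many green requests at once exhausts the resolver or local sockets. A sketch that caps concurrency with a GreenPool and retries once; the fetch helper and URL list are placeholders, not the asker's code:

    import eventlet
    from eventlet.green import urllib2

    pool = eventlet.GreenPool(50)            # cap concurrent requests so DNS lookups are not exhausted

    def fetch(url):
        try:
            return urllib2.urlopen(url, None).read()
        except urllib2.URLError:
            eventlet.sleep(1.0)              # brief backoff, then a single retry
            return urllib2.urlopen(url, None).read()

    signed_urls = ["http://example.com/a", "http://example.com/b"]   # placeholder URLs
    for body in pool.imap(fetch, signed_urls):
        pass                                 # process each response body here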

Scraping the second page of a website in Python does not work

久未见 submitted on 2019-12-11 11:43:08
Question: Let's say I want to scrape the data here. I can do it nicely using urlopen and BeautifulSoup in Python 2.7. Now if I want to scrape data from the second page with this address, what I get is the data from the first page! I looked at the page source of the second page using Chrome's "view page source", and the content belongs to the first page! How can I scrape the data from the second page?

Answer 1: The page is of a quite asynchronous nature; there are XHR requests forming the search results,
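
A sketch of the approach the answer points at: find the XHR call in the browser's Network tab and request that endpoint directly instead of the HTML page. The URL and parameters below are placeholders, not the site's real API:

    import json
    import urllib2

    xhr_url = "http://example.com/search/results?page=2"   # placeholder: copy the real XHR URL from the Network tab
    req = urllib2.Request(xhr_url, headers={"X-Requested-With": "XMLHttpRequest"})
    data = json.loads(urllib2.urlopen(req).read())          # assumes the endpoint returns JSON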

HTML data is hidden from urllib

∥☆過路亽.° submitted on 2019-12-11 05:14:51
Question: How do I get the real content from this page: http://kursuskatalog.au.dk/da/course/74960/105E17-Demokrati-og-diktatur-i-komparativt-perspektiv ? All I get from the code below is some links to JavaScript and CSS files. Is there a way out of this?

    from urllib.request import urlopen
    html = urlopen("http://kursuskatalog.au.dk/da/course/74960/105E17-Demokrati-og-diktatur-i-komparativt-perspektiv")
    print(html.read())

Best regards, Kresten

Answer 1: Content in this URL is created with JavaScript after page
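
One common workaround, not taken from the original answer, is to let a real browser execute the JavaScript and then read the rendered DOM, for example with Selenium (assumes selenium and a browser driver are installed):

    from selenium import webdriver

    driver = webdriver.Firefox()   # or webdriver.Chrome(), depending on what is installed
    driver.get("http://kursuskatalog.au.dk/da/course/74960/105E17-Demokrati-og-diktatur-i-komparativt-perspektiv")
    print(driver.page_source)      # now contains the JavaScript-generated markup
    driver.quit()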

Web-scraping JavaScript page with Python

佐手、 submitted on 2019-12-11 03:09:47
Question: I'm trying to develop a simple web scraper. I want to extract text without the HTML code. In fact, I achieve this goal, but I have seen that on some pages where JavaScript is loaded I don't obtain good results. For example, if some JavaScript code adds text, I can't see it, because when I call

    response = urllib2.urlopen(request)

I get the original text without the added one (because JavaScript is executed on the client). So, I'm looking for some ideas to solve this problem.

Answer 1: EDIT 30
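
A hedged sketch of one way to handle this: render the page in a browser via Selenium, then strip the tags with BeautifulSoup to keep the text only. The URL is a placeholder, and both libraries are assumed to be installed:

    from bs4 import BeautifulSoup
    from selenium import webdriver

    driver = webdriver.Firefox()
    driver.get("http://example.com/page-built-with-javascript")   # placeholder URL
    soup = BeautifulSoup(driver.page_source, "html.parser")
    driver.quit()
    print(soup.get_text(separator="\n", strip=True))              # text only, including JS-added content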

Gibberish from urlopen

♀尐吖头ヾ submitted on 2019-12-10 22:15:10
Question: I am trying to read some UTF-8 files from the addresses in the code below. It works for most of them, but for some files urllib2 (and urllib) is unable to read. The obvious answer here is that the second file is corrupt, but the strange thing is that IE reads them both with no problem at all. The code has been tested on both XP and Linux, with identical results. Any suggestions?

    import urllib2
    #This works:
    f=urllib2.urlopen("http://www.gutenberg.org/cache/epub/145/pg145.txt")
    line=f
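
One possible explanation, offered as an assumption rather than the confirmed cause: the "gibberish" responses may be gzip-compressed bodies, which IE decompresses transparently while urllib2 does not. A Python 2 sketch that checks for this:

    import gzip
    import urllib2
    from StringIO import StringIO

    f = urllib2.urlopen("http://www.gutenberg.org/cache/epub/145/pg145.txt")
    data = f.read()
    if f.info().get("Content-Encoding") == "gzip":
        data = gzip.GzipFile(fileobj=StringIO(data)).read()   # decompress before decoding
    text = data.decode("utf-8")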

Why can urllib2.urlopen not open pages like “http://localhost/new-post#comment-29”?

人盡茶涼 submitted on 2019-12-10 10:29:32
Question: I'm curious, how come I get a 404 error running this line:

    urllib2.urlopen("http://localhost/new-post#comment-29")

while everything works fine surfing http://localhost/new-post#comment-29 in any browser... Does the urlopen method not parse URLs with "#" in them? Anybody know?

Answer 1: In the HTTP protocol, the fragment (from # onwards) is not sent to the server across the network: it's locally retained by the browser and used, once the server's response is fully received, to somehow "visually locate"
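
A small illustration of the answer: split the fragment off before making the request, since it never reaches the server anyway (Python 2 names shown):

    import urllib2
    from urlparse import urldefrag      # Python 2; in Python 3 it is urllib.parse.urldefrag

    url, fragment = urldefrag("http://localhost/new-post#comment-29")
    # url == 'http://localhost/new-post', fragment == 'comment-29'
    response = urllib2.urlopen(url)     # request only the document; locating the comment is a client-side concern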

Use “byte-like object” from urlopen.read with JSON?

元气小坏坏 submitted on 2019-12-09 07:38:00
Question: Just trying to test out very simple Python JSON commands, but I'm having some trouble.

    urlopen('http://www.similarsitesearch.com/api/similar/ebay.com').read()

should output

    '{"num":20,"status":"ok","r0":"http:\\/\\/www.propertyroom.com\\/","r1":"http:\\/\\/www.ubid.com\\/","r2":"http:\\/\\/www.bidcactus.com\\/","r3":"http:\\/\\/www.etsy.com\\/","r4":"http:\\/\\/us.ebid.net\\/","r5":"http:\\/\\/www.bidrivals.com\\/","r6":"http:\\/\\/www.ioffer.com\\/","r7":"http:\\/\\/www.shopgoodwill.com\\/",
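
The usual fix on Python 3, shown as a minimal sketch: read() returns bytes, so decode them before handing the result to json.loads (recent Python versions also accept bytes directly):

    import json
    from urllib.request import urlopen

    raw = urlopen("http://www.similarsitesearch.com/api/similar/ebay.com").read()   # bytes
    data = json.loads(raw.decode("utf-8"))                                          # decode to str, then parse
    print(data["status"], data["num"])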

Why does urllib.urlopen().read() not correspond to the page's source code?

烈酒焚心 submitted on 2019-12-09 02:35:02
Question: I'm trying to fetch the following webpage:

    import urllib
    urllib.urlopen("http://www.gallimard-jeunesse.fr/searchjeunesse/advanced/(order)/author?catalog[0]=1&SearchAction=1").read()

The result does not correspond to what I see when inspecting the source code of the webpage using Google Chrome, for example. Could you tell me why this happens and how I could improve my code to overcome the problem? Thank you for your help.

Answer 1: What you are getting from urlopen is the raw webpage, meaning no
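
One thing worth ruling out, offered as an assumption rather than the accepted explanation: some sites serve different HTML to clients that do not send browser-like headers. A sketch that adds a User-Agent; it will not help if the difference comes from JavaScript executed in the browser:

    import urllib2

    url = ("http://www.gallimard-jeunesse.fr/searchjeunesse/advanced/"
           "(order)/author?catalog[0]=1&SearchAction=1")
    req = urllib2.Request(url, headers={"User-Agent": "Mozilla/5.0"})   # pretend to be a browser
    print(urllib2.urlopen(req).read())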

503 error when trying to access Google Patents using python

柔情痞子 submitted on 2019-12-08 08:17:56
Question: Earlier today I was able to pull data from Google Patents using the code below:

    import urllib2
    url = 'http://www.google.com/search?tbo=p&q=ininventor:"John-Mudd"&hl=en&tbm=pts&source=lnt&tbs=ptso:us'
    req = urllib2.Request(url, headers={'User-Agent' : "foobar"})
    response = urllib2.urlopen(req)

Now when I go to run it I get the following 503 error. I had only looped through this code maybe 30 times (I'm trying to get all the patents owned by a list of 30 people).

    HTTPError                                 Traceback (most
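
A sketch of one way to cope with intermittent 503s, which typically indicate rate limiting: space the requests out and retry with exponential backoff. The retry counts and delays below are arbitrary placeholders, and this is not a way around the site's terms of service:

    import time
    import urllib2

    def fetch_with_backoff(req, tries=4, delay=5):
        for attempt in range(tries):
            try:
                return urllib2.urlopen(req)
            except urllib2.HTTPError as e:
                if e.code != 503 or attempt == tries - 1:
                    raise                            # not a rate-limit error, or out of retries
                time.sleep(delay * (2 ** attempt))   # wait 5s, 10s, 20s, ... before retrying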