urllib

socket ResourceWarning using urllib in Python 3

China☆狼群 submitted on 2019-12-18 15:54:21
Question: I am using urllib.request.urlopen() to GET from a web service I'm trying to test. This returns an HTTPResponse object, which I then read() to get the response body. But I always see a ResourceWarning about an unclosed socket from socket.py. Here's the relevant function:

```python
from urllib.request import Request, urlopen

def get_from_webservice(url):
    """GET from the webservice."""
    req = Request(url, method="GET", headers=HEADERS)
    with urlopen(req) as rsp:
        body = rsp.read().decode('utf-8')
        return body
```
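The `with` block in the question should already close the response, so one debugging tactic (my suggestion, not from the truncated thread) is to promote ResourceWarning to an error so the leaking call site fails loudly. A minimal offline sketch; the `opener` callable and the URL are stand-ins so no network is needed:

```python
import contextlib
import io
import warnings

def fetch(url, opener):
    """Fetch a URL body, ensuring the underlying response gets closed."""
    # contextlib.closing calls close() even if the response object is not
    # itself a context manager; with urlopen's response this is equivalent
    # to the `with urlopen(req) as rsp:` form in the question.
    with contextlib.closing(opener(url)) as rsp:
        return rsp.read().decode("utf-8")

# Promote ResourceWarning to an error so an unclosed socket fails loudly.
warnings.simplefilter("error", ResourceWarning)

# Offline stand-in for urlopen so the sketch runs without a network.
body = fetch("http://example.invalid/", lambda url: io.BytesIO(b"ok"))
```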

Python urllib urlencode problem with æøå

人盡茶涼 submitted on 2019-12-18 12:37:54
Question: How can I urlencode a string with special chars æøå? e.g.

```python
urllib.urlencode('http://www.test.com/q=testæøå')
```

I get this error: "not a valid non-string sequence or mapping object".

Answer 1: You should pass a dictionary to urlencode, not a string. See the correct example below:

```python
from urllib import urlencode
print 'http://www.test.com/?' + urlencode({'q': 'testæøå'})
```

Answer 2: urlencode is intended to take a dictionary, for example:

```python
>>> q = u'\xe6\xf8\xe5'  # u'æøå'
>>> params = {'q': q.encode('utf-8')}
>>>
```
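Both answers above are Python 2. For reference, a Python 3 sketch of the same fix (the URL is the question's example): `urllib.parse.urlencode` takes a mapping and percent-encodes non-ASCII values as UTF-8 by default.

```python
from urllib.parse import urlencode, parse_qs

# urlencode takes a mapping (or a sequence of pairs), not a raw string;
# non-ASCII values are percent-encoded as UTF-8 in Python 3.
query = urlencode({"q": "testæøå"})
url = "http://www.test.com/?" + query
```

`parse_qs(query)` round-trips the value back to the original string.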

Python urllib2 automatic form filling and retrieval of results

久未见 submitted on 2019-12-18 11:56:15
Question: I'm looking to be able to query a site for warranty information on the machine this script would be running on. It should be able to fill out a form if needed (as in the case of, say, HP's service site) and then retrieve the resulting web page. I already have the bits in place to parse the resulting HTML that is reported back; I'm just having trouble with what needs to be done in order to POST the data that needs to be put in the fields and then being able to retrieve
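The answer is cut off above, but the core of a form POST in the standard library looks like this sketch (Python 3 `urllib.request`; the URL and field names are invented for illustration):

```python
from urllib.parse import urlencode
from urllib.request import Request

# Form fields are urlencoded and sent as bytes in the request body;
# these field names are hypothetical, not HP's real form fields.
fields = {"serial": "ABC123", "country": "US"}
data = urlencode(fields).encode("ascii")
req = Request("http://example.invalid/warranty", data=data,
              headers={"Content-Type": "application/x-www-form-urlencoded"})
# urlopen(req) would perform the request; a Request carrying a body
# defaults to the POST method.
```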

opening a url with urllib in python 3

◇◆丶佛笑我妖孽 submitted on 2019-12-18 11:06:17
Question: I'm trying to open the URL of this API from the Sunlight Foundation and return the data from the page in JSON. This is the code I've produced, minus the parentheses around myapikey:

```python
import urllib.request.urlopen
import json
urllib.request.urlopen("https://sunlightlabs.github.io/congress/legislators?api_key='(myapikey)")
```

and I'm getting this error:

```
Traceback (most recent call last):
  File "<input>", line 1, in <module>
ImportError: No module named request.urlopen
```

What am I doing wrong? I've
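The error comes from the first line: dotted `import` syntax can only name modules, and `urlopen` is a function inside the `urllib.request` module. A corrected sketch:

```python
# `import urllib.request.urlopen` raises ImportError because the dotted
# import form can only name modules; urlopen is a function.
import urllib.request               # import the module...
from urllib.request import urlopen  # ...or bind the function directly

# urllib.request.urlopen(url) and urlopen(url) are then the same callable
# (no network request is made in this sketch).
```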

Urllib and validation of server certificate

我的梦境 submitted on 2019-12-18 10:29:11
Question: I use Python 2.6 and request the Facebook API (https). I guess my service could be a target of man-in-the-middle attacks. I discovered this morning, reading the urllib module documentation again, this warning: "When opening HTTPS URLs, it is not attempted to validate the server certificate. Use at your own risk!" Do you have hints / URLs / examples for doing full certificate validation? Thanks for your help.

Answer 1: You could create a urllib2 opener which can do the validation for you using a
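The question targets Python 2.6, where the standard library offered no validation. For readers on modern Python (2.7.9+/3.4.3+), `urlopen` validates certificates by default, and an explicit `ssl` context makes the policy visible; a sketch (no connection is opened here):

```python
import ssl

# ssl.create_default_context() yields a context that verifies the peer's
# certificate chain against the system CA store and checks the hostname.
ctx = ssl.create_default_context()
# urllib.request.urlopen("https://graph.facebook.com/...", context=ctx)
# would then refuse any peer whose certificate fails validation.
```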

hangs on open url with urllib (python3)

早过忘川 submitted on 2019-12-18 08:09:12
Question: I try to open a URL with Python 3:

```python
import urllib.request
fp = urllib.request.urlopen("http://lebed.com/")
mybytes = fp.read()
mystr = mybytes.decode("utf8")
fp.close()
print(mystr)
```

But it hangs on the second line. What's the reason for this problem and how can I fix it?

Answer 1: I suppose the reason is that the site does not allow robot visits. You need to fake a browser visit by sending browser headers along with your request:

```python
import urllib.request
url = "http://lebed.com/"
req = urllib
```
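The answer's code is cut off above; a sketch of the header-faking approach it describes (URL from the question; the User-Agent string is just an example browser identifier):

```python
from urllib.request import Request

# Supply a browser-like User-Agent so the server does not treat the
# request as a bot; the value here is a hypothetical example.
url = "http://lebed.com/"
req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
# urlopen(req, timeout=10) would then send the header; passing a timeout
# also guarantees the call cannot hang forever.
```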

Download file using urllib in Python with the wget -c feature

早过忘川 submitted on 2019-12-18 04:14:50
Question: I am writing software in Python to download HTTP PDFs from a database. Sometimes the download stops with this message:

```
retrieval incomplete: got only 3617232 out of 10689634 bytes
```

How can I ask the download to restart where it stopped, using the 206 Partial Content HTTP feature? I can do it using wget -c and it works pretty well, but I would like to implement it directly in my Python software. Any ideas? Thank you.

Answer 1: You can request a partial download by sending a GET with the Range
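A sketch of the Range-header resume the answer starts to describe (the URL is invented; the byte count is the question's example). A server that honors the header replies 206 Partial Content; a 200 status means it ignored the range and is resending the whole file, so the local file must be truncated rather than appended to:

```python
from urllib.request import Request

have = 3617232  # bytes already on disk, per the question's error message
req = Request("http://example.invalid/file.pdf",
              headers={"Range": "bytes=%d-" % have})
# with urlopen(req) as rsp:
#     mode = "ab" if rsp.status == 206 else "wb"
#     ... open the local file with `mode` and write rsp.read() ...
```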

What is the global default timeout

ε祈祈猫儿з submitted on 2019-12-18 01:43:07
Question: Python 3.4. I'm trying to find out what the default timeout in urllib.request.urlopen() is. Its signature is:

```python
urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
```

The doc says it is the "global default timeout", and looking at the code it is socket._GLOBAL_DEFAULT_TIMEOUT. Still, what is the actual value in seconds?

Answer 1: I suspect this is implementation-dependent. That said, for CPython: from socket.create_connection, "If no timeout is supplied, the
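To make the answer's point concrete: in CPython the sentinel is not a number of seconds at all; it means "fall back to `socket.getdefaulttimeout()`", which is `None` unless someone has changed it, and `None` means block indefinitely. A sketch:

```python
import socket

# socket._GLOBAL_DEFAULT_TIMEOUT is a private sentinel object, not a
# duration. urlopen passes it through to socket.create_connection, which
# then consults socket.getdefaulttimeout().
default = socket.getdefaulttimeout()  # None in a fresh interpreter

# Passing an explicit timeout is the usual way to avoid blocking forever:
# urllib.request.urlopen(url, timeout=10.0)
```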

Force python mechanize/urllib2 to only use A requests?

女生的网名这么多〃 submitted on 2019-12-17 22:42:39
Question: Here is a related question, but I could not figure out how to apply its answer to mechanize/urllib2: how to force python httplib library to use only A requests

Basically, given this simple code:

```python
#!/usr/bin/python
import urllib2
print urllib2.urlopen('http://python.org/').read(100)
```

Wireshark reports the following:

```
0.000000 10.102.0.79 -> 8.8.8.8 DNS Standard query A python.org
0.000023 10.102.0.79 -> 8.8.8.8 DNS Standard query AAAA python.org
0.005369 8.8.8.8 -> 10.102.0.79 DNS
```
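One common workaround (my sketch, not from the truncated thread): wrap `socket.getaddrinfo` so every lookup is pinned to `AF_INET`. With the family fixed to IPv4, the resolver generally has no reason to issue AAAA queries, and both urllib2 and mechanize go through this call:

```python
import socket

_orig_getaddrinfo = socket.getaddrinfo

def ipv4_only_getaddrinfo(host, port, family=0, *args, **kwargs):
    # Force the IPv4 address family regardless of what the caller asked for.
    return _orig_getaddrinfo(host, port, socket.AF_INET, *args, **kwargs)

socket.getaddrinfo = ipv4_only_getaddrinfo
```

Monkey-patching a stdlib function is blunt but effective here, since urllib offers no public hook for the address family.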

Python urllib vs httplib?

删除回忆录丶 submitted on 2019-12-17 17:26:30
Question: When would someone use httplib and when urllib? What are the differences? I think I read that urllib uses httplib. I am planning to make an app that will need to make HTTP requests, and so far I have only used httplib.HTTPConnection for requests; reading about urllib, I see I can use it for requests too, so what's the benefit of one over the other?

Answer 1: urllib (particularly urllib2) handles many things by default or has appropriate libs to do so. For example, urllib2 will follow redirects
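A sketch of the two layers using their Python 3 names (httplib became `http.client`; urllib/urllib2 became `urllib.request`): the low-level class models a single connection to one host, while the high-level layer adds redirects, proxies, and auth handlers on top of it. No connection is opened here:

```python
import http.client
import urllib.request

# Low level: one raw connection object to a single host (httplib's heir).
conn = http.client.HTTPConnection("example.com", 80)

# High level: a Request that urlopen would dispatch through its chain of
# redirect/proxy/auth handlers (urllib2's heir). Internally this layer
# ends up driving an http.client connection.
req = urllib.request.Request("http://example.com/")
```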