urlopen

Is there a way to scrape Amazon Product Listing page using Python?

霸气de小男生 submitted on 2019-12-08 06:45:53

Question: I'm trying to scrape product listing pages that display the vendors and prices of particular products, but urllib.urlopen isn't working: it works on all other pages on Amazon, so I'm wondering if Amazon's bots prevent scraping on product listing pages. Can anyone verify this? Using Chrome I can still view the page source... Here's an example of a product listing page I would want to scrape: http://www.amazon.com/gp/offer-listing/B007E84H96/ref=dp_olp_new?ie=UTF8&condition=new

Answer 1:
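A common workaround (not verified against Amazon's current behavior) is that servers often reject requests carrying Python's default User-Agent; sending a browser-like header can change the response. A minimal sketch, assuming Python 3's urllib.request (urllib2.Request in Python 2), with the URL taken from the question:

```python
import urllib.request

# Hypothetical workaround: send a browser-like User-Agent, since some
# sites block the default Python one. URL is from the question.
url = ("http://www.amazon.com/gp/offer-listing/B007E84H96/"
       "ref=dp_olp_new?ie=UTF8&condition=new")
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
# html = urllib.request.urlopen(req).read()  # network call, left commented
```

Whether this works for the listing pages specifically would need testing; Amazon may also block based on request rate or other fingerprints.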

How to pass parameter to Url with Python urlopen

浪尽此生 submitted on 2019-12-08 00:08:27

Question: I'm new to Python programming. My problem is that my program doesn't seem to pass/encode the parameter properly to the ASP file that I've created. This is my sample code:

    import urllib.request
    url = 'http://www.sample.com/myASP.asp'
    full_url = url + "?data='" + str(sentData).replace("'", '"').replace(" ", "%20").replace('"', "%22") + "'"
    print(full_url)
    response = urllib.request.urlopen(full_url)
    print(response)

The output gives me something like: http://www.sample.com
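Rather than hand-rolling the escapes with a chain of .replace() calls, the standard library's urllib.parse.urlencode can build the query string. A sketch, with sentData reconstructed from the output shown in the question:

```python
import json
import urllib.parse

# sentData reconstructed from the output shown in the question.
sentData = {"mykey": [{"idno": "id123", "name": "ej"}]}

# urlencode percent-escapes the value, replacing the manual .replace() chain.
params = urllib.parse.urlencode({"data": json.dumps(sentData)})
full_url = "http://www.sample.com/myASP.asp?" + params
# response = urllib.request.urlopen(full_url)  # network call, left commented
```

Note that urlencode escapes spaces as "+" rather than "%20"; both are valid in a query string, and the ASP side should decode them identically.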

unbuffered urllib2.urlopen

匆匆过客 submitted on 2019-12-07 11:17:29

I have a client for a web interface to a long-running process. I'd like the output from that process to be displayed as it comes. This works great with urllib.urlopen(), but that doesn't have a timeout parameter. On the other hand, with urllib2.urlopen() the output is buffered. Is there an easy way to disable that buffering? A quick hack that occurred to me is to use urllib.urlopen() with threading.Timer() to emulate a timeout, but that's only a quick and dirty hack.

urllib2 is buffered when you just call read(); you can specify a size to read and thereby disable the buffering, for example:

    import urllib2
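The chunked reading the answer describes can be wrapped in a small generator: each read(size) call returns once that much data (or EOF) arrives, instead of waiting for the whole body. A sketch in Python 3 terms (urllib2 in Python 2), with the network call left commented and an offline demonstration using an in-memory buffer:

```python
import io
import urllib.request  # urllib2 in Python 2

def stream(resp, chunk_size=1024):
    """Yield successive chunks from a file-like response as they arrive."""
    while True:
        chunk = resp.read(chunk_size)
        if not chunk:  # empty bytes means EOF
            break
        yield chunk

# Real use (network call, left commented):
# for chunk in stream(urllib.request.urlopen("http://example.com/", timeout=10)):
#     print(chunk)

# Offline demonstration with an in-memory buffer:
chunks = list(stream(io.BytesIO(b"x" * 2500)))
```

In Python 3 (and Python 2.6+), urlopen also accepts a timeout argument directly, which removes the need for the threading.Timer() hack.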

Using urlopen to open list of urls

允我心安 submitted on 2019-12-06 15:26:29

Question: I have a Python script that fetches a webpage and mirrors it. It works fine for one specific page, but I can't get it to work for more than one. I assumed I could put multiple URLs into a list and then feed that to the function, but I get this error:

    Traceback (most recent call last):
      File "autowget.py", line 46, in <module>
        getUrl()
      File "autowget.py", line 43, in getUrl
        response = urllib.request.urlopen(url)
      File "/usr/lib/python3.2/urllib/request.py", line 139, in urlopen
        return opener
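The likely cause of the traceback is passing the whole list to urlopen, which accepts a single URL; iterating over the list and opening each URL in turn avoids it. A sketch, assuming Python 3's urllib.request (the URLs are placeholders):

```python
import urllib.error
import urllib.request

# Placeholder URLs; urlopen takes one URL at a time, so iterate.
urls = ["http://example.com/", "http://example.org/"]

def mirror(url_list):
    """Fetch each URL in turn, skipping any that fail."""
    pages = {}
    for url in url_list:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                pages[url] = resp.read()
        except urllib.error.URLError as exc:
            print("failed:", url, exc)
    return pages

# pages = mirror(urls)  # network call, left commented
```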

How to reliably process web-data in Python

戏子无情 submitted on 2019-12-06 13:16:51

Question: I'm using the following code to get data from a website:

    time_out = 4

    def tryconnect(turl, timer=time_out, retries=10):
        urlopener = None
        sitefound = 1
        tried = 0
        while (sitefound != 0) and tried < retries:
            try:
                urlopener = urllib2.urlopen(turl, None, timer)
                sitefound = 0
            except urllib2.URLError:
                tried += 1
        if urlopener:
            return urlopener
        else:
            return None

    [...]

    urlopener = tryconnect('www.example.com')
    if not urlopener:
        return None
    try:
        for line in urlopener:
            do stuff
    except httplib
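The retry loop can be written more compactly with a for loop over the retry count. This is a sketch of the same idea in Python 3 terms (urllib.request/urllib.error in place of urllib2), not the poster's exact code:

```python
import urllib.error
import urllib.request  # urllib2 in Python 2

def tryconnect(turl, timer=4, retries=10):
    """Return an open response, or None once all retries fail."""
    for _ in range(retries):
        try:
            return urllib.request.urlopen(turl, None, timer)
        except urllib.error.URLError:
            continue
    return None
```

Returning directly on success removes the sitefound/tried bookkeeping from the original while loop.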

How to pass parameter to Url with Python urlopen

浪尽此生 submitted on 2019-12-06 11:21:45

I'm new to Python programming. My problem is that my program doesn't seem to pass/encode the parameter properly to the ASP file that I've created. This is my sample code:

    import urllib.request
    url = 'http://www.sample.com/myASP.asp'
    full_url = url + "?data='" + str(sentData).replace("'", '"').replace(" ", "%20").replace('"', "%22") + "'"
    print(full_url)
    response = urllib.request.urlopen(full_url)
    print(response)

The output gives me something like:

    http://www.sample.com/myASP.asp?data='{%22mykey%22:%20[{%22idno%22:%20%22id123%22,%20%22name%22:%20%22ej%22}]}'

The asp file
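If the goal is the %20/%22-style escaping shown in the output, urllib.parse.quote percent-escapes the JSON payload without the manual .replace() chain; note it also escapes braces, brackets, and colons, so the result is equivalent but not byte-identical to the string in the question. A sketch:

```python
import json
import urllib.parse

# sentData reconstructed from the output shown in the question.
sentData = {"mykey": [{"idno": "id123", "name": "ej"}]}

# quote() uses %20 for spaces (urlencode would use '+'), matching the
# question's output style, though it also escapes '{', '[' and ':'.
payload = urllib.parse.quote(json.dumps(sentData))
full_url = "http://www.sample.com/myASP.asp?data=" + payload
```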

Urllib's urlopen breaking on some sites (e.g. StackApps api): returns garbage results

好久不见. submitted on 2019-12-05 02:31:08

I'm using urllib2's urlopen function to try to get a JSON result from the StackOverflow API. The code I'm using:

    >>> import urllib2
    >>> conn = urllib2.urlopen("http://api.stackoverflow.com/0.8/users/")
    >>> conn.readline()

The result I'm getting:

    '\x1f\x8b\x08\x00\x00\x00\x00\x00\x04\x00\xed\xbd\x07`\x1cI\x96%&/m\xca{\x7fJ\...

I'm fairly new to urllib, but this doesn't seem like the result I should be getting. I've tried it in other places and I get what I expect (the same as visiting the address with a browser gives me: a JSON object). Using urlopen on other sites (e.g. "http://google.com")
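The leading '\x1f\x8b' bytes are the gzip magic number, so the API is most likely returning a gzip-compressed body (possibly unconditionally, regardless of the Accept-Encoding header). Decompressing recovers the JSON. A sketch with an offline check, since urllib2 does not decompress automatically:

```python
import gzip
import io

def read_maybe_gzipped(resp):
    """Read a response body, decompressing it if it is gzip-compressed."""
    data = resp.read()
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        data = gzip.GzipFile(fileobj=io.BytesIO(data)).read()
    return data

# Real use (network call, left commented):
# import urllib2
# conn = urllib2.urlopen("http://api.stackoverflow.com/0.8/users/")
# print(read_maybe_gzipped(conn))
```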

python urllib2.urlopen(url) process block

落花浮王杯 submitted on 2019-12-04 22:54:59

I am using urllib2.urlopen() and my process is getting blocked. I am aware that urllib2.urlopen() has a default timeout. How can I make the call non-blocking? The backtrace is:

    (gdb) bt
    #0  0x0000003c6200dc35 in recv () from /lib64/libpthread.so.0
    #1  0x00002b88add08137 in ?? () from /usr/lib64/python2.6/lib-dynload/_socketmodule.so
    #2  0x00002b88add0830e in ?? () from /usr/lib64/python2.6/lib-dynload/_socketmodule.so
    #3  0x000000310b2d8e19 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0

If your problem is that you need urllib to finish reading, the read() operation is a blocking operation in
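Since the backtrace shows the process stuck in recv(), a socket-level timeout is what unblocks it. Two options, sketched here (urllib2 in Python 2, urllib.request in Python 3; the per-call timeout exists since Python 2.6):

```python
import socket

# Option 1: a process-wide default timeout for every new socket.
socket.setdefaulttimeout(10)

# Option 2: per call (Python 2.6+ / Python 3):
# import urllib.request
# resp = urllib.request.urlopen("http://example.com/", timeout=5)

# Either way, the timeout applies to each recv() during read(), so a
# stalled read raises socket.timeout instead of blocking forever.
```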

How to reliably process web-data in Python

拜拜、爱过 submitted on 2019-12-04 18:25:22

I'm using the following code to get data from a website:

    time_out = 4

    def tryconnect(turl, timer=time_out, retries=10):
        urlopener = None
        sitefound = 1
        tried = 0
        while (sitefound != 0) and tried < retries:
            try:
                urlopener = urllib2.urlopen(turl, None, timer)
                sitefound = 0
            except urllib2.URLError:
                tried += 1
        if urlopener:
            return urlopener
        else:
            return None

    [...]

    urlopener = tryconnect('www.example.com')
    if not urlopener:
        return None
    try:
        for line in urlopener:
            do stuff
    except httplib.IncompleteRead:
        print 'incomplete'
        return None
    except socket.timeout:
        print 'socket'
        return None
    return stuff

Is
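The read loop with its exception handling can be isolated into a helper. This sketch keeps the question's httplib.IncompleteRead and socket.timeout cases (http.client in Python 3) and uses an in-memory buffer to demonstrate the success path offline:

```python
import http.client  # httplib in Python 2
import io
import socket

def read_lines(urlopener):
    """Collect lines, returning None on the failures the question handles."""
    lines = []
    try:
        for line in urlopener:
            lines.append(line)
    except http.client.IncompleteRead:
        return None
    except socket.timeout:
        return None
    return lines

# Offline check with an in-memory stand-in for the response object:
demo = read_lines(io.BytesIO(b"line1\nline2\n"))
```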

python urllib2 urlopen response

浪子不回头ぞ submitted on 2019-12-04 10:03:15

Question: python urllib2 urlopen response:

    <addinfourl at 1081306700 whose fp = <socket._fileobject object at 0x4073192c>>

expected:

    {"token":"mYWmzpunvasAT795niiR"}

Answer 1: You need to bind the resulting file-like object to a variable; otherwise the interpreter just dumps it via repr:

    >>> import urllib2
    >>> urllib2.urlopen('http://www.google.com')
    <addinfourl at 18362520 whose fp = <socket._fileobject object at 0x106b250>>
    >>>
    >>> f = urllib2.urlopen('http://www.google.com')
    >>> f
    <addinfourl at
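As the answer says, the repr appears because the response object itself, not its body, was displayed. Calling .read() returns the body bytes, which can then be parsed as JSON. A sketch using the token value from the question as a stand-in for what .read() would return:

```python
import json

# f = urllib2.urlopen(...) returns a file-like object; printing it shows
# its repr. Call f.read() to get the body. The literal below stands in
# for what .read() would return, using the token from the question.
body = b'{"token":"mYWmzpunvasAT795niiR"}'
data = json.loads(body)
```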