urllib2

urllib2.HTTPError: HTTP Error 400: Bad Request - Python

匆匆过客 submitted on 2019-12-31 05:48:05
Question: I'm trying to POST using urllib and urllib2, but it keeps giving me this error:

Traceback (most recent call last):
  File "/Users/BaDRaN/Desktop/untitled text.py", line 39, in <module>
    response = urllib2.urlopen(request)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req,
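An HTTP 400 from a POST is often the server rejecting a body that was not form-encoded as expected. A minimal Python 3 sketch of a correctly encoded POST (`urllib.request`/`urllib.parse` stand in for urllib2 here; the URL and field names are placeholders, not from the question):

```python
import urllib.parse
import urllib.request

# Form fields to POST -- placeholder names for illustration only.
fields = {"username": "example", "password": "secret"}

# urlencode() produces "username=example&password=secret"; the request body
# must be bytes, so encode it before attaching it to the Request.
data = urllib.parse.urlencode(fields).encode("utf-8")

# Passing data= turns the request into a POST (it would be a GET otherwise).
request = urllib.request.Request("http://example.com/login", data=data)

print(request.get_method())  # POST
```

Sending unencoded text (or a Python dict) as `data` is a common cause of 400 responses, so verifying the body bytes before calling `urlopen` is a cheap first check.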

Error - urlopen error [Errno 8] _ssl.c:504: EOF occurred in violation of protocol

こ雲淡風輕ζ submitted on 2019-12-30 07:52:12
Question: My aim is to extract the HTML from all the links on the first page after entering a Google search term. I work behind a proxy, so this is my approach: 1. I first used mechanize to enter the search term in the form; I've set the proxies and robots correctly. 2. After extracting the links, I used an opener with urllib2.ProxyHandler globally to open the URLs individually. However, this gives me an error I'm not able to figure out: urlopen error [Errno 8] _ssl.c:504: EOF occurred in
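This SSL "EOF occurred in violation of protocol" error usually means the server (or an intercepting proxy) aborted the TLS handshake, often over a protocol-version mismatch. In Python 3 (3.7+) an explicit `ssl.SSLContext` can be handed to the HTTPS handler to control the negotiated version; a sketch that only builds the context and opener, with a placeholder proxy address (no request is made):

```python
import ssl
import urllib.request

# Build a client-side TLS context and require TLS 1.2 or newer.  Some servers
# and intercepting proxies drop the handshake for older protocol versions,
# which surfaces client-side as "EOF occurred in violation of protocol".
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2

# Route HTTPS traffic through the corporate proxy as well (address is a placeholder).
proxy = urllib.request.ProxyHandler({"https": "http://proxy.example.com:8080"})
https = urllib.request.HTTPSHandler(context=context)
opener = urllib.request.build_opener(proxy, https)
# opener.open("https://www.google.com/...")  # would use the context above
```

In Python 2.7, the equivalent workaround was typically a custom `HTTPSConnection` subclass that pinned `ssl.wrap_socket` to a specific protocol version.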

Need to install urllib2 for Python 3.5.1

我的未来我决定 submitted on 2019-12-29 11:40:15
Question: I'm running Python 3.5.1 on a Mac and want to use urllib2. I tried installing it but was told that it was split into urllib.request and urllib.error in Python 3. My command (run from the framework bin directory for now, because it's not in my path): sudo ./pip3 install urllib.request. It returns: "Could not find a version that satisfies the requirement urllib.request (from versions: ) No matching distribution found for urllib.request". I got the same error before when I tried to install
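Neither urllib2 nor urllib.request is installable from PyPI, which is why pip finds no distribution: in Python 3 these modules are part of the standard library and are simply imported. A sketch of the rename:

```python
# Python 2:  import urllib2
# Python 3:  the same functionality ships with the interpreter, split up:
import urllib.request   # urlopen, Request, build_opener, ...
import urllib.error     # HTTPError, URLError

# The names urllib2 exposed are now attributes of urllib.request:
print(hasattr(urllib.request, "urlopen"))                         # True
print(issubclass(urllib.error.HTTPError, urllib.error.URLError))  # True
```

Nothing to `pip install`; if code still says `import urllib2`, either port the imports as above or run it under Python 2.7.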

Python and urllib

谁都会走 submitted on 2019-12-29 07:39:11
Question: I'm trying to download a zip file ("tl_2008_01001_edges.zip") from a Census FTP site using urllib. What form is the zip file in when I get it, and how do I save it? I'm fairly new to Python and don't understand how urllib works. This is my attempt:

import urllib, sys
zip_file = urllib.urlretrieve("ftp://ftp2.census.gov/geo/tiger/TIGER2008/01_ALABAMA/Autauga_County/", "tl_2008_01001_edges.zip")

If I know the list of ftp folders (or counties in this case), can I run through the ftp site list
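`urlretrieve` copies the remote resource straight into a local file, so the first argument must be the full URL of the file itself (the attempt above points at the directory), and looping over counties is just a matter of building those URLs. A Python 3 sketch; the helper function and county list are illustrative, and nothing is downloaded until the commented call runs:

```python
import urllib.request  # urllib.urlretrieve became urllib.request.urlretrieve in Python 3

BASE = "ftp://ftp2.census.gov/geo/tiger/TIGER2008/01_ALABAMA"

def edges_url(county_dir, fips):
    """Build the full URL of one county's edges zip (helper name is ours)."""
    return "{}/{}/tl_2008_{}_edges.zip".format(BASE, county_dir, fips)

# (directory name, FIPS code) pairs -- extend with the full county list.
counties = [("Autauga_County", "01001"), ("Baldwin_County", "01003")]

for county_dir, fips in counties:
    url = edges_url(county_dir, fips)
    print(url)
    # Uncomment to download: urlretrieve writes the raw zip bytes to the given
    # local filename and returns (filename, headers).
    # urllib.request.urlretrieve(url, "tl_2008_{}_edges.zip".format(fips))
```

The saved file is an ordinary zip archive on disk; `zipfile.ZipFile` can open it afterwards.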

python urllib2 - wait for page to finish loading/redirecting before scraping?

笑着哭i submitted on 2019-12-28 17:42:32
Question: I'm learning to make web scrapers and want to scrape TripAdvisor for a personal project, grabbing the HTML using urllib2. However, I'm running into a problem: with the code below, the HTML I get back is not correct, since the page seems to take a second to redirect (you can verify this by visiting the URL); instead I get the code of the page that briefly appears first. Is there some behavior or parameter to set to make sure the page has completely finished loading/redirecting before
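urllib2 follows ordinary HTTP 3xx redirects by itself, but it never executes JavaScript or waits on a `<meta http-equiv="refresh">` delay, so a page that redirects client-side has to be followed by hand (or scraped with a browser driver such as Selenium). A sketch that pulls the target URL out of a meta-refresh tag in fetched HTML; the regex is a simplification that assumes the common attribute order:

```python
import re

def meta_refresh_target(html):
    """Return the URL from a <meta http-equiv="refresh"> tag, or None.

    Client-side redirects like this are invisible to urllib2/urllib.request,
    which only follow real HTTP 3xx responses.
    """
    match = re.search(
        r'<meta[^>]*http-equiv=["\']?refresh["\']?[^>]*'
        r'content=["\']?\s*\d+\s*;\s*url=([^"\'>]+)',
        html, re.IGNORECASE)
    return match.group(1).strip() if match else None

page = ('<html><head><meta http-equiv="refresh" '
        'content="1;url=http://example.com/real"></head></html>')
print(meta_refresh_target(page))  # http://example.com/real
```

When a redirect is driven by JavaScript rather than a meta tag, no amount of waiting helps a plain HTTP client; the target URL must be found in the page source or the scrape moved to a real browser.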

Web scraping: basic use of the urllib2 library

南笙酒味 submitted on 2019-12-28 02:39:57
Basic use of the urllib2 library. Web scraping means reading the network resource identified by a URL from the network stream and saving it locally. Python has many libraries for fetching web pages; we'll start with urllib2. urllib2 ships with Python 2.7 (nothing to download; just import it). urllib2 official documentation: https://docs.python.org/2/library/urllib2.html urllib2 source: https://hg.python.org/cpython/file/2.7/Lib/urllib2.py In Python 3.x, urllib2 was renamed urllib.request.

urlopen. Let's start with some code:

# urllib2_urlopen.py
# import the urllib2 library
import urllib2
# send a request to the given URL; returns the server's response as a file-like object
response = urllib2.urlopen("http://www.baidu.com")
# the file-like object supports the usual file methods, e.g. read() returns the whole body as a string
html = response.read()
# print the string
print html

Run the script and the result is printed: Power@PowerMac ~$: python urllib2_urlopen.py In fact
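In Python 3 the same three lines use urllib.request, and `read()` returns bytes rather than str, so decoding is explicit. Sketched here against a `data:` URL (standing in for http://www.baidu.com) so the example runs without network access:

```python
import urllib.request

# urllib.request.urlopen replaces urllib2.urlopen; the data: URL below stands
# in for a real http:// address so no network access is needed.
response = urllib.request.urlopen("data:text/plain;charset=utf-8,hello")

# read() returns bytes in Python 3, so decode to get a str.
html = response.read().decode("utf-8")
print(html)  # hello
```

The response object still behaves like a file: `read()`, `readline()`, and iteration all work as in the Python 2 example above.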

opening websites using urllib2 from behind corporate firewall - 11004 getaddrinfo failed

坚强是说给别人听的谎言 submitted on 2019-12-28 02:11:15
Question: I am trying to access a website from behind a corporate firewall using the code below:

password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, url, username, password)
auth_handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
conn = urllib2.urlopen('http://python.org')

I'm getting the error URLError: <urlopen error [Errno 11004] getaddrinfo failed>. I have tried with different handlers (tried ProxyHandler
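`[Errno 11004] getaddrinfo failed` is a DNS failure: the hostname was never resolved, which behind a corporate firewall usually means the request must go through the proxy rather than authenticate against the target site. A Python 3 sketch combining `ProxyHandler` (so resolution happens on the proxy) with proxy basic auth; the proxy address and credentials are placeholders:

```python
import urllib.request  # urllib2's handler classes live here in Python 3

proxy_url = "http://proxy.example.com:8080"  # placeholder proxy address
username, password = "user", "secret"        # placeholder credentials

# Route all HTTP/HTTPS traffic through the proxy, so hostname resolution
# happens on the proxy instead of the locked-down client machine.
proxy_handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})

# Authenticate against the *proxy* (407 challenges), not the target site --
# HTTPBasicAuthHandler alone answers 401s from the destination server.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, proxy_url, username, password)
proxy_auth = urllib.request.ProxyBasicAuthHandler(password_mgr)

opener = urllib.request.build_opener(proxy_handler, proxy_auth)
urllib.request.install_opener(opener)
# urllib.request.urlopen("http://python.org")  # now goes via the proxy
```

The same handler names exist in urllib2 for Python 2.7, so the sketch ports back by replacing `urllib.request` with `urllib2`.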

Python handling socket.error: [Errno 104] Connection reset by peer

时光怂恿深爱的人放手 submitted on 2019-12-27 11:55:09
Question: When using Python 2.7 with urllib2 to retrieve data from an API, I get the error [Errno 104] Connection reset by peer. What's causing the error, and how should it be handled so that the script does not crash?

ticker.py:

def urlopen(url):
    response = None
    request = urllib2.Request(url=url)
    try:
        response = urllib2.urlopen(request).read()
    except urllib2.HTTPError as err:
        print "HTTPError: {} ({})".format(url, err.code)
    except urllib2.URLError as err:
        print "URLError: {} ({})".format(url,
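Connection reset by peer arrives as `socket.error` (errno 104), which is not a subclass of `urllib2.URLError`, so the handlers above never catch it and the script crashes. Adding an explicit except clause, optionally with a retry, keeps it alive. A sketch (in Python 3, `ConnectionResetError` is an `OSError` subclass and `socket.error` is an alias of `OSError`, so one clause covers both; `fetch` and the flaky fake below are stand-ins for the real request):

```python
import socket
import time

def fetch_with_retry(fetch, url, retries=3, delay=0.0):
    """Call fetch(url), retrying when the peer resets the connection.

    In Python 2 the reset surfaces as socket.error with errno 104; in
    Python 3 it is ConnectionResetError, caught by the same except clause.
    """
    for attempt in range(retries):
        try:
            return fetch(url)
        except socket.error as err:
            print("Connection reset ({}), attempt {}/{}".format(err, attempt + 1, retries))
            time.sleep(delay)
    return None  # give up after the last retry instead of crashing

# A fake fetcher that fails twice, then succeeds -- demonstration only.
calls = []
def flaky(url):
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionResetError(104, "Connection reset by peer")
    return "payload"

result = fetch_with_retry(flaky, "http://api.example.com")
print(result)  # payload
```

The usual cause is the server or an intermediary dropping the TCP connection (rate limiting, idle timeouts, abrupt restarts), so a short backoff between retries is often worth setting via `delay`.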

Python scraping preparation, part 3: the urllib2 module

限于喜欢 submitted on 2019-12-27 11:08:32
The default User-Agent of urllib/urllib2 is Python-urllib/2.7, which is easily detected as a scraper, so we construct a request object using the Request method. 1. Inspect the header information. 2. Set a User-Agent that imitates a browser. Request takes three parameters in total; besides the required url, there are two more:

data (empty by default): data submitted along with the url (e.g. data to POST); when present, the HTTP request changes from "GET" to "POST".
headers (empty by default): a dictionary of HTTP header key-value pairs to send.

# _*_ coding:utf-8 _*_
import urllib2
# the User-Agent is the first step in the scraper/anti-scraper contest
ua_headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36'}
# construct a request object with urllib2.Request()
request = urllib2.Request('http://www.baidu.com/', headers=ua_headers)
# send the request to the given url
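The same `url`/`data`/`headers` trio exists on `urllib.request.Request` in Python 3; a sketch that builds the request and reads the header back (no request is sent):

```python
import urllib.request

# Imitate a browser; the default would be "Python-urllib/3.x".
ua_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/60.0.3112.101 Safari/537.36",
}

# urllib2.Request became urllib.request.Request.
request = urllib.request.Request("http://www.baidu.com/", headers=ua_headers)

# Request stores header names capitalized, so query with "User-agent".
print(request.get_header("User-agent"))
print(request.get_method())  # GET -- no data= was given
```

`urllib.request.urlopen(request)` would then send it with the spoofed User-Agent.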

Reading a specific CSV column value and passing it to curl/urllib as a param in Python

╄→гoц情女王★ submitted on 2019-12-25 14:26:07
Question: Hi all, I have a scenario to handle through Python scripts that requires reading a value from a CSV file and using it in a curl command (equivalent in Python: urllib), as follows. Here is the CSV file, for example:

one two three four
1 3 5 7
2 3 5 7

I wrote a very simple Python program:

import csv
with open('Activity_PSR.csv','rU') as csvFile:
    reader=csv.reader(csvFile,delimiter=',')
    for row in reader:
        print row[2]
csvFile.close()

Once I get the output of this column, I want to store it in a
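Collecting the column and attaching a value to a URL query string is the urllib equivalent of interpolating it into a curl command. A Python 3 sketch; the sample text stands in for Activity_PSR.csv, and the endpoint and parameter name are placeholders:

```python
import csv
import io
import urllib.parse

# Sample data standing in for Activity_PSR.csv (column names from the question).
sample = "one,two,three,four\n1,3,5,7\n2,3,5,7\n"

def column_values(csv_text, index):
    """Collect one column (by position) from CSV text, skipping the header."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=",")
    next(reader)  # skip the header row
    return [row[index] for row in reader]

values = column_values(sample, 2)
print(values)  # ['5', '5']

# Pass a value as a URL parameter -- the urllib counterpart of
# `curl "http://api.example.com/activity?three=5"`.  urlencode() also
# percent-escapes values that are not URL-safe.
url = "http://api.example.com/activity?" + urllib.parse.urlencode({"three": values[0]})
print(url)  # http://api.example.com/activity?three=5
```

With a real file, replace the `io.StringIO` wrapper with `open('Activity_PSR.csv', newline='')`; note the `csvFile.close()` in the attempt above is redundant, since `with` already closes the file.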