urllib2

Google App Engine Ubuntu 14.04 urlfetch 500 / 200 issue (Python 2.7)

这一生的挚爱 submitted on 2019-12-24 06:29:20
Question: I hope this saves somebody some time; I am posting it because I found very little about this URLFetch error. On a working Google Places application I suddenly started receiving:

WARNING 2017-06-28 23:09:40,971 urlfetch_stub.py:550] Stripped prohibited headers from URLFetch request: ['Host']

The update to Google Cloud SDK 161.0.0 was kind enough to inform me that my version of Python was out of date; Ubuntu 14.04 is frozen at Python 2.7.6.

sudo apt-get install build-essential checkinstall
sudo apt
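For context, the warning means the URLFetch stub removed a 'Host' header that the application (or a library it uses) set explicitly. A minimal sketch of a URLFetch call that simply leaves 'Host' to App Engine; the URL and headers here are placeholders, not the poster's code:

# Python 2.7 App Engine runtime. Omitting 'Host' avoids the stripped-header
# warning because App Engine sets that header itself.
from google.appengine.api import urlfetch

result = urlfetch.fetch(
    url='https://maps.googleapis.com/maps/api/place/details/json',  # placeholder
    headers={'Accept': 'application/json'},  # no 'Host' entry here
    deadline=10)
if result.status_code == 200:
    print result.content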

urllib2 data sending

孤街醉人 submitted on 2019-12-24 04:07:28
Question: I recently wrote this with help from SO. Could someone now tell me how to make it actually log onto the board? It brings everything up, just in a logged-out state.

import urllib, urllib2

logindata = urllib.urlencode({'username': 'x', 'password': 'y'})
page = urllib2.urlopen("http://www.woarl.com/board/index.php", logindata)
pagesource = page.read()
print pagesource

Answer 1: Someone recently asked the same question you're asking. If you read through the answers to that
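The usual reason the board comes back logged out is that the session cookie set by the login POST is discarded before the next request. A hedged sketch (Python 2) using cookielib to keep the cookie, reusing the form field names and URL from the question:

import urllib, urllib2, cookielib

jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
logindata = urllib.urlencode({'username': 'x', 'password': 'y'})
opener.open("http://www.woarl.com/board/index.php", logindata)   # POST login; jar keeps the cookie
page = opener.open("http://www.woarl.com/board/index.php")       # subsequent request is logged in
print page.read()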

Using certifi module with urllib2?

隐身守侯 submitted on 2019-12-24 01:16:49
Question: I'm having trouble downloading HTTPS pages with the urllib2 module, which seems to result from urllib2's inability to access the system's certificate store. One way around this is to download HTTPS pages with pycurl instead, pointing it at the certifi module's CA bundle. The following is the start of an example of doing so:

def download_web_page_with_curl(url_website):
    from pycurl import Curl, CAINFO, URL
    from certifi import where
    from cStringIO import StringIO
    response = StringIO()
    curl = Curl()
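The snippet above is cut off; a possible completion (a sketch under the same imports, not the poster's exact code) points pycurl at certifi's CA bundle and collects the body in a buffer:

from pycurl import Curl, CAINFO, URL, WRITEFUNCTION
from certifi import where
from cStringIO import StringIO

def download_web_page_with_curl(url_website):
    response = StringIO()
    curl = Curl()
    curl.setopt(URL, url_website)
    curl.setopt(CAINFO, where())                # verify HTTPS against certifi's bundle
    curl.setopt(WRITEFUNCTION, response.write)  # accumulate the body in the buffer
    curl.perform()
    curl.close()
    return response.getvalue()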

Python urllib urlretrieve behind proxy

*爱你&永不变心* submitted on 2019-12-24 00:58:55
Question: I looked through the urllib documentation, but everything I could find on proxies was related to urlopen. I want to download a PDF from a given URL and store it locally, but through a specific proxy server. My approach so far, which did not work:

import urllib2

proxies = {'http': 'http://123.96.220.2:81'}
opener = urllib2.FancyURLopener(proxies)
download = opener.urlretrieve(URL, file_name)

The error is: AttributeError: FancyURLopener instance has no attribute 'urlretrieve'

Answer 1: I believe you can
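Two hedged fixes, assuming URL and file_name stand for the PDF's address and the local target path. FancyURLopener lives in urllib (not urllib2), and its download method is retrieve(), not urlretrieve(); alternatively, urllib2 can route through the proxy with a ProxyHandler:

import urllib, urllib2

URL = "http://example.com/document.pdf"   # placeholder
file_name = "document.pdf"                # placeholder
proxies = {'http': 'http://123.96.220.2:81'}

# Fix 1: urllib.FancyURLopener with retrieve()
opener = urllib.FancyURLopener(proxies)
opener.retrieve(URL, file_name)

# Fix 2: urllib2 with a ProxyHandler, writing the bytes out manually
proxy_opener = urllib2.build_opener(urllib2.ProxyHandler(proxies))
with open(file_name, 'wb') as f:
    f.write(proxy_opener.open(URL).read())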

Scraping and Parsing Web Pages with Python

ぃ、小莉子 submitted on 2019-12-23 21:29:56
[IT168 technical feature] Applications such as search engines, file indexers, document converters, data retrieval, and site backup or migration frequently need to parse web pages (i.e., HTML files). In fact, with the various modules that Python provides, we can parse and process HTML documents without the help of a web server or a web browser. This article describes in detail how to scrape and parse web pages with Python. First, we introduce a Python module that simplifies opening HTML documents located locally or on the Web; then we discuss how to use Python modules to quickly parse the data in an HTML file in order to handle specific content such as links, images, and cookies. Finally, we give an example of normalizing the format tags of an HTML file, which shows that processing HTML content with Python is a very simple matter.

1. Parsing URLs

With Python's built-in urlparse module, we can easily break a URL into its components and later reassemble those components into a URL. This is very handy when working with HTML documents.

import urlparse

parsedTuple = urlparse.urlparse(
    "http://www.google.com/search?hl=en&q=urlparse&btnG=Google+Search")
unparsedURL = urlparse
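To round out the truncated snippet, a short sketch (Python 2) of splitting a URL into its six components and reassembling it with urlunparse, as the paragraph describes:

import urlparse

parsedTuple = urlparse.urlparse(
    "http://www.google.com/search?hl=en&q=urlparse&btnG=Google+Search")
print parsedTuple.scheme   # 'http'
print parsedTuple.netloc   # 'www.google.com'
print parsedTuple.path     # '/search'
print parsedTuple.query    # 'hl=en&q=urlparse&btnG=Google+Search'

unparsedURL = urlparse.urlunparse(parsedTuple)  # reassemble the components
print unparsedURL          # the original URL again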

Unable to save image from web using urllib2

蓝咒 submitted on 2019-12-23 19:59:16
Question: I want to save some images from a website using Python's urllib2, but when I run the code it saves something else. This is my code:

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}
url = "http://m.jaaar.com/"
r = urllib2.Request(url, headers=headers)
page = urllib2.urlopen(r).read()
soup = BeautifulSoup(page)
imgTags = soup.findAll('img')
imgTags = imgTags[1:]
for imgTag in imgTags:
    imgUrl = "http://www.jaaar.com" + imgTag['src']
    imgUrl =
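Assuming the image URLs resolve correctly, the missing piece is usually fetching each image with the same headers and writing the raw bytes in binary mode. A minimal sketch (Python 2; the helper name is ours, not the poster's):

import os, urllib2

def save_image(img_url, headers, out_dir='.'):
    req = urllib2.Request(img_url, headers=headers)
    data = urllib2.urlopen(req).read()                  # raw image bytes
    name = os.path.basename(img_url) or 'image.jpg'     # crude filename guess
    with open(os.path.join(out_dir, name), 'wb') as f:  # binary mode matters
        f.write(data)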

Python urllib2.urlopen freezes script infinitely even though timeout is set

▼魔方 西西 submitted on 2019-12-23 18:59:30
Question: The function urllib2.urlopen freezes. So my question is simple: why does urlopen freeze my script forever even though a timeout is set? How can I fetch data from a URL (in this case: http://api.own3d.tv/live?channel=FnaticTV) without any possibility of my Python process freezing for all eternity? This is the part where it freezes (in own3d.py):

# Try three times to make contact
while True:
    try:
        # Connect to API
        # Right here! It freezes here
        connection = urllib2.urlopen(request, timeout=10)
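The timeout argument only bounds individual socket operations, so stages it does not cover (a hung DNS lookup, for example) can still block indefinitely. A hedged, Unix-only sketch of enforcing a hard deadline around the call with SIGALRM:

import signal, urllib2

def _deadline(signum, frame):
    raise IOError("request exceeded hard deadline")

signal.signal(signal.SIGALRM, _deadline)
signal.alarm(15)   # hard limit, a little above the urlopen timeout
try:
    connection = urllib2.urlopen(
        "http://api.own3d.tv/live?channel=FnaticTV", timeout=10)
    data = connection.read()
finally:
    signal.alarm(0)  # always cancel the pending alarm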

Python: Problems getting past the login page of an .aspx site

旧城冷巷雨未停 submitted on 2019-12-23 18:24:30
Question: Problem: I have searched several websites, blogs, etc. for a solution but did not find what I was looking for. The problem, in short, is that I would like to scrape a site, but to get to it I have to get past a login page. What I did: I managed to use urllib2 and httplib to open the page, but even after logging in (with no errors displayed), the redirect that the browser performs after login does not happen. My code was not too different from what was displayed here:
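ASP.NET login pages typically require the hidden __VIEWSTATE and __EVENTVALIDATION fields to be posted back along with the credentials, and the session cookie must be kept for the post-login redirect to stick. A hedged sketch (Python 2; the URL and credential field names are placeholders):

import urllib, urllib2, cookielib
from BeautifulSoup import BeautifulSoup

login_url = "http://example.com/Login.aspx"   # placeholder
jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))

# 1. GET the login page and harvest the hidden ASP.NET form fields.
soup = BeautifulSoup(opener.open(login_url).read())
form = dict((inp['name'], inp.get('value', ''))
            for inp in soup.findAll('input', {'type': 'hidden'}))

# 2. Add the credentials (field names vary per site; inspect the form).
form['txtUsername'] = 'user'      # placeholder field name
form['txtPassword'] = 'secret'    # placeholder field name

# 3. POST it all back; the cookie jar carries the session through the redirect.
response = opener.open(login_url, urllib.urlencode(form))
print response.geturl()           # should now be past the login page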

Load a web page in Python AFTER JavaScript executes

浪尽此生 submitted on 2019-12-23 05:29:29
Question: I am trying to get the definition of Spanish words (like a dictionary) based on what the user inputs. The idea would be:

>>> hola
'1. interj. U. como salutación familiar.'

I first tried urllib2, but since the definition appears only after JavaScript executes (makes sense, duh), it didn't work. I also tried Selenium, but from what I understood it has to open a browser window, right? I need it to be like urllib2: invisible. If you want to try it, the page where I search for the definition is http:
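One way to stay "invisible" in that era was Selenium with the (now-deprecated) PhantomJS headless browser, which executes the page's JavaScript without opening a window. A hedged sketch; since the dictionary URL is cut off in the question, a placeholder URL and an assumed CSS selector are used:

from selenium import webdriver

driver = webdriver.PhantomJS()   # requires the phantomjs binary on PATH
driver.get("http://example-dictionary.test/?q=hola")                  # placeholder URL
definition = driver.find_element_by_css_selector(".definition").text  # assumed selector
driver.quit()
print(definition)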

Big requests issue: GET doesn't release/reset TCP connections, loop crashes

这一生的挚爱 submitted on 2019-12-23 03:20:44
Question: I'm using Python 3.3 and the requests module to scrape links from an arbitrary web page. My program works as follows: I have a list of URLs which, at the beginning, holds just the starting URL. The program loops over that list and passes each URL to a procedure GetLinks, where I use requests.get and BeautifulSoup to extract all links. Before that procedure appends links to my URL list, it passes them to another procedure, testLinks, to see whether each is an internal, external, or broken link. In the
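A common cause of this symptom is that each bare requests.get opens a connection that is never returned to the pool. Reusing one Session and closing every response usually fixes the exhaustion; a hedged sketch of what GetLinks might look like (parsing details elided, names ours):

import requests
from bs4 import BeautifulSoup

session = requests.Session()   # one pooled session for the whole crawl

def get_links(url):
    response = session.get(url, timeout=10)
    try:
        soup = BeautifulSoup(response.text)
        return [a.get('href') for a in soup.find_all('a') if a.get('href')]
    finally:
        response.close()       # hand the connection back to the pool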