urllib

How to programmatically log into website in Python

Submitted on 2019-12-24 09:57:46
Question: I have searched all over the Internet, looked at many examples, and tried every one I've found, yet none of them work for me, so please don't think this is a duplicate; I need help with my specific case. I'm trying to log into a website using Python (in this instance v2.7, but I'm not opposed to using a more recent version; I've just been able to find the most information on 2.7). I need to fill out a short form, consisting simply of a username and password. The form
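A minimal sketch of such a form login with the standard library, shown in Python 3 (where the urllib2-era pieces live in urllib.request). The URL and the field names "username"/"password" are placeholders, not the asker's actual site; inspect the login <form> for the real action URL and input names.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

LOGIN_URL = "https://example.com/login"  # hypothetical placeholder

def encode_form(username, password):
    """Encode credentials as an application/x-www-form-urlencoded body."""
    fields = {"username": username, "password": password}
    return urllib.parse.urlencode(fields).encode("ascii")

def make_opener():
    # The cookie jar holds the session cookie a successful login sets,
    # so later requests made through this opener stay authenticated.
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar()))

# Usage (requires network):
#   opener = make_opener()
#   opener.open(LOGIN_URL, data=encode_form("alice", "s3cret"))
```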

Upload images from a web page

Submitted on 2019-12-24 09:53:01
Question: I want to implement a feature similar to http://www.tineye.com/parse?url=yahoo.com, which lets a user upload images from any web page. The main problem for me is that it takes too much time for web pages with a large number of images. I'm doing this in Django (using curl or urllib) with the following scheme: grab the HTML of the page (takes about 1 sec for big pages): file = urllib.urlopen(requested_url) html_string = file.read() Parse it with an HTML parser (BeautifulSoup), looking for img tags, and
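A dependency-free sketch of that scheme in Python 3 (the question uses Python 2's urllib.urlopen plus BeautifulSoup; the stdlib html.parser stands in here so the example is self-contained):

```python
import urllib.request
from html.parser import HTMLParser

class ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a page."""
    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

def find_image_urls(html_string):
    parser = ImgCollector()
    parser.feed(html_string)
    return parser.sources

# Usage (requires network):
#   with urllib.request.urlopen(requested_url) as f:
#       urls = find_image_urls(f.read().decode("utf-8", "replace"))
```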

Python - Issue Scraping with BeautifulSoup

Submitted on 2019-12-24 09:51:11
Question: I'm trying to scrape the Stack Overflow jobs page using Beautiful Soup 4 and urllib as a personal project. I'm facing an issue where I'm trying to scrape all the links to the 50 jobs listed on each page. I'm using a regex to identify these links. Even though I reference the tag properly, I am facing these two specific issues: instead of the 50 links clearly visible in the source code, I get only 25 results each time as my output (after accounting for and removing an initial irrelevant link)
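As a sketch of the regex approach (the "/jobs/<id>/" href pattern is an assumption about the page's markup, not taken from the question), deduplicating while preserving order, since job listings often emit the same link twice, e.g. once for the title and once for the company, which can halve or double the apparent count:

```python
import re

# Assumed link shape; adjust the pattern to the page's actual markup.
JOB_LINK_RE = re.compile(r'href="(/jobs/\d+/[^"]*)"')

def extract_job_links(html_string):
    # Deduplicate while preserving first-seen order.
    seen, links = set(), []
    for href in JOB_LINK_RE.findall(html_string):
        if href not in seen:
            seen.add(href)
            links.append(href)
    return links
```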

Python error TypeError: must be string or buffer, not instance [closed]

Submitted on 2019-12-24 09:07:10
Question: (Closed as off-topic 3 years ago.) I am trying to download some images which are listed in a QListWidget. I am passing the links to urllib, but it's giving me the error TypeError: must be string or buffer, not instance. I tried looking it up here but couldn't find any solution. Here is my code. Thanks def downloadStuff(self): files = self.listWidget
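A sketch of the likely fix: iterating a QListWidget yields QListWidgetItem objects, and urllib needs the URL as a plain string, so call .text() on each item first. The names below follow the snippet in the question, Python 3's urllib.request is used, and the PyQt widget itself is deliberately not imported here.

```python
import urllib.request

def to_url_string(item):
    """Return a str URL whether given a str or an item-like object
    exposing .text() (such as a QListWidgetItem)."""
    return item if isinstance(item, str) else item.text()

def download_stuff(list_widget, dest_template="image{}.jpg"):
    # list_widget is assumed to offer count() and item(i), as QListWidget does.
    for i in range(list_widget.count()):
        url = to_url_string(list_widget.item(i))
        urllib.request.urlretrieve(url, dest_template.format(i))
```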

How can I download and read a URL with universal newlines?

Submitted on 2019-12-24 07:28:11
Question: I was using urllib.urlopen with Python 2.7, but I need to process the downloaded HTML document and the newlines it contains (within a <pre> element). The urllib docs indicate urlopen will not use universal newlines. How can I do this? Answer 1: Unless the HTML file is already on your disk, urlopen() will correctly handle all newline formats ( \n , \r\n and \r ) in the HTML file you want to parse (that is, it will convert them to \n ), according to the urllib docs: "If the URL does not have a
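If you want \n-only newlines regardless of what the server sent or which Python version is in play, normalizing after the read is a simple sketch: str.splitlines() treats all three newline conventions alike.

```python
import urllib.request

def normalize_newlines(text):
    # splitlines() breaks on \n, \r\n, and \r alike, so rejoining with
    # \n collapses every newline convention to \n.
    return "\n".join(text.splitlines())

def read_normalized(url):
    with urllib.request.urlopen(url) as f:
        return normalize_newlines(f.read().decode("utf-8", "replace"))
```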

How to Google in Python Using urllib or requests

Submitted on 2019-12-24 07:27:17
Question: What is the proper way to Google something in Python 3? I have tried requests and urllib for a Google page. When I simply do res = requests.get("https://www.google.com/#q=" + query), it doesn't come back with the same HTML as when I inspect the Google page in Safari. The same happens with urllib, and something similar happens when I use Bing. I am familiar with AJAX; however, it seems that that is now deprecated. Answer 1: In Python, if you do not specify the user agent header in HTTP requests manually
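A sketch of the answer's point in Python 3. Note also that everything after "#" is a URL fragment that is never sent to the server, which is one reason the fetched HTML differs from what the browser shows; the User-Agent string below is just an example.

```python
import urllib.parse
import urllib.request

def build_search_request(query):
    # Use the real /search path with a proper query string; "#q=" is a
    # fragment and never reaches the server.
    url = "https://www.google.com/search?" + urllib.parse.urlencode({"q": query})
    # A browser-like User-Agent avoids the stripped-down fallback page.
    return urllib.request.Request(
        url, headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"})

# Usage (requires network):
#   with urllib.request.urlopen(build_search_request("python urllib")) as f:
#       html = f.read().decode("utf-8", "replace")
```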

what does urllib.request.urlopen() do?

Submitted on 2019-12-24 06:49:15
Question: In Python 3, does the urlopen function from the urllib.request module retrieve the target of the URL, or does it just open a connection to the URL as a file handle, or have I completely lost it? I would like to understand how it works. Basically, I want to find the time taken to download a file from a URL. How do I go about it? Here is my code: VERSION 1 import urllib import time start = time.time() with urllib.request.urlopen('http://mirror.hactar.bz/lastsync') as f: lastsync = f.read() #Do i need this line
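A sketch answering both halves: urlopen() returns a file-like response object as soon as the headers arrive, and the body is transferred as you .read() it, so the timer must span the read to measure the whole download. The mirror URL is the asker's own.

```python
import time
import urllib.request

def timed_download(url):
    start = time.time()
    with urllib.request.urlopen(url) as f:
        data = f.read()          # the body is actually downloaded here
    return data, time.time() - start

# Usage (requires network):
#   data, seconds = timed_download('http://mirror.hactar.bz/lastsync')
#   print(len(data), "bytes in", seconds, "s")
```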

Google App Engine Ubuntu 14.04 urlfetch 500 / 200 issue (Python 2.7)

Submitted on 2019-12-24 06:29:20
Question: I hope this saves somebody some time. I'm posting because I found very little concerning this URLFetch error. I was suddenly receiving "WARNING 2017-06-28 23:09:40,971 urlfetch_stub.py:550] Stripped prohibited headers from URLFetch request: ['Host']" on a working Google Places application. The update to Google Cloud SDK 161.0.0 was kind enough to inform me that my version of Python was out of date. Ubuntu 14.04 is frozen at Python 2.7.6. sudo apt-get install build-essential checkinstall sudo apt

urllib exception http.client.BadStatusLine

Submitted on 2019-12-24 02:39:15
Question: I can't for the life of me figure out why I can't catch this exception. I'm following this guide. def get_team_names(get_team_id_url, team_id): print(get_team_id_url + team_id) try: response = urllib.request.urlopen(get_team_id_url + team_id) except urllib.error.HTTPError as e: print(e.code) print(e.read()) except urllib.error.URLError as e: print(e.code) print(e.read()) exception: Traceback (most recent call last): File "queue_cleaner_main.py", line 60, in <module> sys.exit(main()) File
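A sketch of why the handler misses: http.client.BadStatusLine comes from the HTTP layer beneath urllib and is not a subclass of URLError, so it sails past handlers that only catch urllib.error exceptions; catching its parent class http.client.HTTPException covers it. Note also that URLError carries .reason rather than .code (only HTTPError has .code), so the original URLError branch would itself fail if it ever ran.

```python
import http.client
import urllib.error
import urllib.request

def get_team_names(get_team_id_url, team_id):
    try:
        response = urllib.request.urlopen(get_team_id_url + team_id)
    except urllib.error.HTTPError as e:
        print(e.code)              # HTTPError does carry a status code
        return None
    except urllib.error.URLError as e:
        print(e.reason)            # URLError has .reason, not .code
        return None
    except http.client.HTTPException as e:
        print(type(e).__name__)    # covers BadStatusLine and friends
        return None
    return response.read()
```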