urlopen

Python check if website exists

爱⌒轻易说出口 submitted on 2019-11-26 12:00:00
Question: I want to check whether a certain website exists. This is what I'm doing:

    import urllib2

    user_agent = 'Mozilla/20.0.1 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}
    link = "http://www.abc.com"
    req = urllib2.Request(link, headers=headers)
    page = urllib2.urlopen(req).read()  # ERROR 402 generated here!

If the page doesn't exist (error 402, or any other error), what can I do on the page = ... line to make sure that the page I'm reading does exist?

Answer 1: You can use HEAD
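A minimal sketch of the HEAD approach in Python 3, where urllib2 was split into urllib.request and urllib.error. The function name website_exists and the rule that any 2xx or 3xx status counts as "exists" are assumptions for illustration. A HEAD request asks the server for headers only, so the page body is never downloaded. An HTTPError (raised for 4xx/5xx statuses such as 402 or 404) or a URLError (DNS failure, refused connection, timeout) is treated as "does not exist".

```python
import urllib.error
import urllib.request


def website_exists(url, timeout=5.0):
    """Return True if a HEAD request to url gets a 2xx or 3xx response (assumed rule)."""
    req = urllib.request.Request(
        url,
        method="HEAD",  # headers only; no body is transferred
        headers={"User-Agent": "Mozilla/5.0 (compatible; example-checker)"},
    )
    try:
        # urlopen follows redirects, so resp.status is the final status code
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except urllib.error.HTTPError:
        # The server answered, but with a 4xx/5xx status (e.g. 402, 404, 500)
        return False
    except (urllib.error.URLError, OSError):
        # No usable answer at all: DNS failure, refused connection, timeout, ...
        return False
```

Some servers reject HEAD with 405 Method Not Allowed; if that matters for the sites being checked, a fallback GET request would be needed.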

Let JSON object accept bytes or let urlopen output strings

ぃ、小莉子 submitted on 2019-11-26 10:16:26
Question: With Python 3, I am requesting a JSON document from a URL:

    response = urllib.request.urlopen(request)

The response object is a file-like object with read and readline methods. Normally a JSON object can be created from a file opened in text mode:

    obj = json.load(fp)

What I would like to do is:

    obj = json.load(response)

However, this does not work because urlopen returns a file object in binary mode. A workaround is, of course:

    str_response = response.read().decode('utf-8')
    obj = json.loads(str_response)

but this feels bad... Is there a better way that I can transform a bytes file object to a string
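One way to avoid the intermediate string is to wrap the binary response in a decoding reader, so json.load sees a text-mode stream. The sketch below uses codecs.getreader, with an io.BytesIO object standing in for the urlopen response so it runs offline; load_json_response is a hypothetical helper name, and UTF-8 is assumed as the default encoding.

```python
import codecs
import io
import json


def load_json_response(response, encoding="utf-8"):
    """Parse JSON from a binary file-like object such as urlopen()'s response."""
    # getreader returns a StreamReader class; wrapping the byte stream in it
    # gives json.load an object whose read() returns str instead of bytes.
    reader = codecs.getreader(encoding)
    return json.load(reader(response))


# Offline stand-in for urllib.request.urlopen(request): an in-memory binary stream.
fake_response = io.BytesIO('{"name": "Ñandú", "ok": true}'.encode("utf-8"))
data = load_json_response(fake_response)
```

On Python 3.6 and later, json.load(response) also works directly, because json.loads accepts bytes and detects UTF-8, UTF-16, or UTF-32 itself. If the server declares a charset, response.headers.get_content_charset() returns it and it can be passed as the encoding argument.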

How can I speed up fetching pages with urllib2 in python?

懵懂的女人 submitted on 2019-11-26 07:58:55
Question: I have a script that fetches several web pages and parses the info. (For an example, see http://bluedevilbooks.com/search/?DEPT=MATH&CLASS=103&SEC=01) I ran cProfile on it and, as I assumed, urlopen takes up a lot of time. Is there a way to fetch the pages faster, or a way to fetch several pages at once? I'll do whatever is simplest, as I'm new to Python and web development. Thanks in advance! :) UPDATE: I have a function called fetchURLs(), which I use to make an array of the URLs I
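Because urlopen spends most of its time waiting on the network, fetching several pages concurrently with a thread pool usually helps. A minimal Python 3 sketch using concurrent.futures from the standard library; fetch_all and the default of 8 worker threads are assumptions, and each URL is mapped to either its body bytes or the exception that fetching it raised, so one failed page does not stop the rest.

```python
import concurrent.futures
import urllib.request


def fetch(url, timeout=10.0):
    """Download one page and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_all(urls, max_workers=8):
    """Fetch many URLs concurrently; map each URL to its bytes or the exception raised."""
    results = {}
    # Blocking socket reads release the GIL, so threads overlap the network waits.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for fut in concurrent.futures.as_completed(futures):
            url = futures[fut]
            try:
                results[url] = fut.result()
            except Exception as exc:  # record the failure and keep going
                results[url] = exc
    return results
```

Keeping max_workers modest is a deliberate choice: it speeds things up without flooding a single server with simultaneous requests. The parsing step can then run over results once all downloads finish.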

How to fetch a non-ascii url with Python urlopen?

纵饮孤独 submitted on 2019-11-26 04:43:08
Question: I need to fetch data from a URL with non-ASCII characters, but urllib2.urlopen refuses to open the resource and raises:

    UnicodeEncodeError: 'ascii' codec can't encode character u'\u0131' in position 26: ordinal not in range(128)

I know the URL is not standards-compliant, but I have no way to change it. How can I access a resource pointed to by a URL containing non-ASCII characters using Python?

Edit: In other words, can urlopen open a URL like http://example.org/Ñöñ-ÅŞÇİİ/, and if so, how?
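A common Python 3 approach is to convert the URL into an ASCII-only form before calling urlopen: percent-encode the path, query, and fragment as UTF-8, and encode the hostname with IDNA. The helper iri_to_uri below is a hypothetical sketch of that; it assumes the server expects UTF-8 in the path, leaves existing %XX escapes untouched, and does not handle IPv6 literal hosts.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left unescaped in each component (RFC 3986). "%" is included so
# already-escaped sequences are not encoded a second time.
_PATH_SAFE = "/%:@!$&'()*+,;="
_QUERY_SAFE = _PATH_SAFE + "?"


def iri_to_uri(iri):
    """Return an ASCII-only version of a URL that may contain non-ASCII characters."""
    parts = urlsplit(iri)
    # Hostnames are encoded with IDNA (punycode), not percent-encoding.
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    # Keep any "user:password@" prefix, percent-encoding it as well.
    userinfo, at, _ = parts.netloc.rpartition("@")
    netloc = quote(userinfo, safe=":%") + at + host
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe=_PATH_SAFE),  # quote() encodes str as UTF-8 by default
        quote(parts.query, safe=_QUERY_SAFE),
        quote(parts.fragment, safe=_QUERY_SAFE),
    ))
```

With this helper, urllib.request.urlopen(iri_to_uri("http://example.org/Ñöñ-ÅŞÇİİ/")) would request http://example.org/%C3%91%C3%B6%C3%B1-%C3%85%C5%9E%C3%87%C4%B0%C4%B0/ instead of raising UnicodeEncodeError.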

Web-scraping JavaScript page with Python

有些话、适合烂在心里 submitted on 2019-11-25 21:41:36
Question: I'm trying to develop a simple web scraper. I want to extract text without the HTML code. I have achieved this, but on some pages where content is loaded by JavaScript I don't get good results. For example, if some JavaScript code adds some text, I can't see it, because when I call

    response = urllib2.urlopen(request)

I get the original text without the added text (because JavaScript is executed in the client). So I'm looking for some ideas to solve this problem.

Answer 1:
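The effect described in the question can be reproduced offline: urlopen returns only the HTML the server sends, and any text a script inserts exists there only as JavaScript source. The sketch below uses the standard-library html.parser to extract visible text while skipping script and style contents; TextExtractor, visible_text, and the sample page are illustrative assumptions. Recovering the script-inserted text needs something that actually executes JavaScript, such as a headless browser driven by Selenium or Playwright, or a direct request to the data endpoint the script itself calls.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes from static HTML, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the whitespace-joined text a client that ignores JavaScript would see."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: the paragraph reads "Loading..." until a browser runs the script.
page = """<html><body><p id="msg">Loading...</p>
<script>document.getElementById("msg").textContent = "Hello from JS";</script>
</body></html>"""

print(visible_text(page))  # prints: Loading...
```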