urlopen | 易学教程

Python check if website exists

Question: I wanted to check if a certain website exists. This is what I'm doing:

    user_agent = 'Mozilla/20.0.1 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}
    link = "http://www.abc.com"
    req = urllib2.Request(link, headers=headers)
    page = urllib2.urlopen(req).read()  # ERROR 402 generated here!

If the page doesn't exist (error 402, or whatever other errors), what can I do in the page = ... line to make sure that the page I'm reading does exist?

Answer 1: You can use HEAD
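A minimal sketch of the HEAD idea, assuming Python 3 (`urllib.request` rather than the `urllib2` used in the question). The function name `site_exists`, the `opener` parameter, and the User-Agent string are illustrative choices, not part of the original post. A HEAD request asks only for the headers, so nothing is downloaded if the page is missing.

```python
import urllib.error
import urllib.request


def site_exists(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if a HEAD request to `url` succeeds, False otherwise.

    `opener` is a hypothetical hook so the function can be exercised
    without network access; by default it is urllib.request.urlopen.
    """
    req = urllib.request.Request(
        url,
        headers={"User-Agent": "Mozilla/5.0 (compatible; example-checker)"},
        method="HEAD",  # headers only, no response body
    )
    try:
        with opener(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except urllib.error.HTTPError:
        # The server answered with an error status (402, 404, 500, ...).
        return False
    except urllib.error.URLError:
        # The server could not be reached at all (DNS failure, refused, ...).
        return False


# Usage (performs a real request, so it is left commented out):
# if site_exists("http://www.abc.com"):
#     page = urllib.request.urlopen("http://www.abc.com").read()
```

Some servers reject HEAD requests even for pages that exist, so a fallback to a normal GET may be needed in practice.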

Let JSON object accept bytes or let urlopen output strings

Question: With Python 3 I am requesting a JSON document from a URL.

    response = urllib.request.urlopen(request)

The response object is a file-like object with read and readline methods. Normally a JSON object can be created with a file opened in text mode.

    obj = json.load(fp)

What I would like to do is:

    obj = json.load(response)

This, however, does not work, as urlopen returns a file object in binary mode. A workaround is of course:

    str_response = response.read().decode('utf-8')
    obj = json.loads(str_response)

but this feels bad... Is there a better way that I can transform a bytes file object to a string
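One possible approach, sketched here with an in-memory `io.BytesIO` standing in for the `urlopen` response: wrap the binary file object in a decoding reader from the standard `codecs` module, so that `json.load` sees text. The helper name `load_json_from_binary` is made up for this example, and UTF-8 is an assumption about the server's encoding.

```python
import codecs
import io
import json


def load_json_from_binary(fp, encoding="utf-8"):
    """Parse JSON from a binary file-like object such as a urlopen response."""
    # codecs.getreader returns a StreamReader class; instantiating it with
    # the binary stream yields a text-mode view that decodes on read().
    reader = codecs.getreader(encoding)(fp)
    return json.load(reader)


# Stand-in for `response = urllib.request.urlopen(request)`.
fake_response = io.BytesIO('{"name": "caf\u00e9", "count": 1}'.encode("utf-8"))
obj = load_json_from_binary(fake_response)
print(obj["name"], obj["count"])
```

On Python 3.6 and later, `json.loads` also accepts `bytes` directly, so the explicit decode is only required on older versions.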

How can I speed up fetching pages with urllib2 in python?

Question: I have a script that fetches several web pages and parses the info. (An example can be seen at http://bluedevilbooks.com/search/?DEPT=MATH&CLASS=103&SEC=01 ) I ran cProfile on it, and as I assumed, urlopen takes up a lot of time. Is there a way to fetch the pages faster? Or a way to fetch several pages at once? I'll do whatever is simplest, as I'm new to Python and web development. Thanks in advance! :) UPDATE: I have a function called fetchURLs(), which I use to make an array of the URLs I
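Since the time is spent waiting on the network, one simple option is to issue the requests from a small thread pool. The sketch below assumes Python 3 and uses the standard `concurrent.futures` module; `fetch`, `fetch_all`, and the `fetch_one` parameter are names invented for this example and are unrelated to the `fetchURLs()` function mentioned in the question.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=10):
    """Download one URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_all(urls, fetch_one=fetch, max_workers=8):
    """Fetch several URLs concurrently; results keep the order of `urls`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs fetch_one in worker threads and yields results
        # in input order, regardless of which download finishes first.
        return list(pool.map(fetch_one, urls))


# Usage (performs real requests, so it is left commented out):
# pages = fetch_all(["http://example.org/a", "http://example.org/b"])
```

Threads suit this case because the work is I/O-bound; the pool size of 8 is an arbitrary starting point, not a recommendation from the original thread.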

How to fetch a non-ascii url with Python urlopen?

Question: I need to fetch data from a URL with non-ASCII characters, but urllib2.urlopen refuses to open the resource and raises:

    UnicodeEncodeError: 'ascii' codec can't encode character u'\u0131' in position 26: ordinal not in range(128)

I know the URL is not standards compliant, but I have no way to change it. How can I access a resource pointed to by a URL containing non-ASCII characters using Python? Edit: In other words, can urlopen open a URL like http://example.org/Ñöñ-ÅŞÇİİ/ and, if so, how?
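A common way around this is to convert the URL to an ASCII-only form before passing it to `urlopen`: percent-encode the path, query, and fragment as UTF-8, and IDNA-encode the host. The sketch below assumes Python 3 (`urllib.parse`), assumes the server expects UTF-8 in the path, and assumes the host part carries no user info; the function name `to_ascii_url` is hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url):
    """Return an ASCII-only equivalent of `url` suitable for urlopen."""
    parts = urlsplit(url)
    # Internationalized host names are encoded with IDNA (punycode).
    netloc = parts.netloc.encode("idna").decode("ascii")
    # "%" is kept safe so already-escaped sequences are not escaped twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


encoded = to_ascii_url("http://example.org/Ñöñ-ÅŞÇİİ/")
print(encoded)
```

The encoded string can then be handed to `urllib.request.urlopen` in place of the original URL.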

Web-scraping JavaScript page with Python

Question: I'm trying to develop a simple web scraper. I want to extract text without the HTML code. In fact, I have achieved this goal, but I have seen that on some pages where JavaScript is loaded I didn't obtain good results. For example, if some JavaScript code adds some text, I can't see it, because when I call

    response = urllib2.urlopen(request)

I get the original text without the added one (because JavaScript is executed in the client). So, I'm looking for some ideas to solve this problem.

Answer 1:
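To make the problem concrete, here is a small sketch (Python 3, standard library only) of extracting visible text from HTML exactly as a plain `urlopen` call would receive it. The class `TextExtractor`, the function `visible_text`, and the sample markup are invented for illustration. The script in the sample is never executed, so the text it would add is absent from the result; obtaining it requires something that actually runs JavaScript, such as a browser driven from Python.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text of `html` with markup and scripts removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# What the server sends: the second paragraph is filled in only by JavaScript.
sample = (
    "<html><body><p>Static text</p><p id='out'></p>"
    "<script>document.getElementById('out').textContent = 'Added by JS';"
    "</script></body></html>"
)
print(visible_text(sample))
```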