urlopen

TypeError: urlopen() got multiple values for keyword argument 'body' while executing tests through Selenium and Python on Kubuntu 14.04

[亡魂溺海] Submitted on 2019-11-29 13:45:45
I'm trying to run a Selenium script in Python on Kubuntu 14.04. I get the same error whether I use chromedriver or geckodriver:

    Traceback (most recent call last):
      File "vse.py", line 15, in <module>
        driver = webdriver.Chrome(chrome_options=options, executable_path=r'/root/Desktop/chromedriver')
      File "/usr/local/lib/python3.4/dist-packages/selenium/webdriver/chrome/webdriver.py", line 75, in __init__
        desired_capabilities=desired_capabilities)
      File "/usr/local/lib/python3.4/dist-packages/selenium/webdriver/remote/webdriver.py", line 156, in __init__
        self.start_session(capabilities, …
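The excerpt is cut off before any answer, but this particular TypeError is commonly attributed to a version mismatch between selenium and the urllib3 it drives (an assumption here, not stated above). A minimal first diagnostic, assuming both packages are importable:

    import selenium
    import urllib3

    # If selenium is old relative to urllib3, their remote-connection code
    # can end up passing 'body' both positionally and by keyword. Upgrading
    # both usually clears the error:  pip install --upgrade selenium urllib3
    print(selenium.__version__)
    print(urllib3.__version__)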

Does urllib2.urlopen() cache stuff?

删除回忆录丶 Submitted on 2019-11-29 06:00:42
The Python documentation doesn't mention this. I've recently been testing a website by simply refreshing it and using urllib2.urlopen() to extract certain content, and I notice that sometimes, after I update the site, urllib2.urlopen() doesn't seem to pick up the newly added content. So I wonder: it does cache stuff somewhere, right? Answer 1: "So I wonder it does cache stuff somewhere, right?" It doesn't. If you don't see new data, that could have many reasons. Most bigger web services use server-side caching for performance reasons, for example caching proxies like Varnish and Squid, or application-level caching. If the …
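Following the answer's point that any staleness usually comes from caches between you and the origin, a minimal sketch of asking intermediaries not to serve a cached copy (the URL is a placeholder, and whether this helps depends entirely on the server):

    import urllib2

    # Request headers asking proxies/servers to bypass their caches.
    req = urllib2.Request('http://example.com/page',
                          headers={'Cache-Control': 'no-cache',
                                   'Pragma': 'no-cache'})
    content = urllib2.urlopen(req).read()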

timeout for urllib2.urlopen() in pre Python 2.6 versions

眉间皱痕 Submitted on 2019-11-29 01:45:50
Question: The urllib2 documentation says the timeout parameter was added in Python 2.6. Unfortunately my code base runs on Python 2.5 and 2.4 platforms. Is there any alternate way to simulate a timeout? All I want is to let the code talk to the remote server for a fixed amount of time. Perhaps an alternative built-in library? (I don't want to install anything third-party, like pycurl.) Answer 1: You can set a global timeout for all socket operations (including HTTP requests) by using socket …
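The answer is cut off at "socket", but it is evidently heading toward socket.setdefaulttimeout(), which predates Python 2.6 and affects every new socket, including the ones urllib2 opens. A minimal sketch:

    import socket
    import urllib2

    # Applies process-wide to all sockets created afterwards.
    socket.setdefaulttimeout(10)  # seconds

    try:
        data = urllib2.urlopen('http://example.com/').read()
    except urllib2.URLError, e:
        # On Python 2.4/2.5 a timed-out connect typically surfaces as a
        # URLError wrapping socket.timeout.
        print(e.reason)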

how to deal with ® in url for urllib2.urlopen?

╄→尐↘猪︶ㄣ Submitted on 2019-11-27 15:54:27
I received a URL from BeautifulSoup: https://www.packtpub.com/virtualization-and-cloud/citrix-xenapp®-75-desktop-virtualization-solutions, i.e. url = u'https://www.packtpub.com/virtualization-and-cloud/citrix-xenapp\xae-75-desktop-virtualization-solutions'. I want to feed it back into urllib2.urlopen again: import urllib2; source = urllib2.urlopen(url).read(). The error I get: UnicodeEncodeError: 'gbk' codec can't encode character u'\xae' in position 43: illegal multibyte sequence. Thus I tried source = urllib2.urlopen(url.encode("utf-8")).read(). That got a page source, but it is different from …
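One approach that matches the shape of this error: UTF-8-encode the unicode URL, then percent-escape the non-ASCII bytes while leaving the URL-structure characters ':' and '/' alone. A sketch using the question's own URL, assuming the site accepts the percent-encoded form:

    import urllib
    import urllib2

    url = u'https://www.packtpub.com/virtualization-and-cloud/citrix-xenapp\xae-75-desktop-virtualization-solutions'

    # Percent-escape the UTF-8 bytes; \xae becomes %C2%AE. Keeping ':' and
    # '/' in the safe set preserves the scheme and path separators.
    safe_url = urllib.quote(url.encode('utf-8'), safe=':/')
    source = urllib2.urlopen(safe_url).read()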

Python check if website exists

我是研究僧i Submitted on 2019-11-27 06:33:24
I want to check whether a certain website exists. This is what I'm doing:

    user_agent = 'Mozilla/20.0.1 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}
    link = "http://www.abc.com"
    req = urllib2.Request(link, headers=headers)
    page = urllib2.urlopen(req).read()   # error 404 generated here!

If the page doesn't exist (error 404, or whatever other error), what can I do in the page = ... line to make sure that the page I'm reading does exist? Answer: You can use a HEAD request instead of GET. It will only download the headers, not the content. Then you can check the response status …
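A sketch of the HEAD-request idea the answer describes, using httplib directly so only the headers come back (the host is the question's placeholder):

    import httplib

    conn = httplib.HTTPConnection('www.abc.com')
    conn.request('HEAD', '/')
    response = conn.getresponse()

    # 2xx/3xx usually means the page is there; 404 means it is not.
    exists = response.status < 400
    print('%s %s' % (response.status, response.reason))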

Tell urllib2 to use custom DNS

99封情书 Submitted on 2019-11-27 03:36:20
I'd like to tell urllib2.urlopen (or a custom opener) to use 127.0.0.1 (or ::1) to resolve addresses, without changing my /etc/resolv.conf. One possible solution is to use a tool like dnspython to query addresses and httplib to build a custom URL opener, but I'd prefer to tell urlopen to use a custom nameserver directly. Any suggestions? Answer 1: Name resolution is ultimately handled by socket.create_connection: urllib2.urlopen -> httplib.HTTPConnection -> socket.create_connection. Though once the "Host:" header has been set, you can resolve the host yourself and pass the IP address on …
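A rough sketch of what the answer describes: do the lookup yourself (the resolver below is a hypothetical stand-in for a dnspython query against 127.0.0.1), connect to the resulting IP, and carry the real name in the Host header:

    import urllib2

    def resolve(hostname):
        # Hypothetical: replace with e.g. dnspython querying 127.0.0.1.
        return '93.184.216.34'

    host = 'example.org'
    ip = resolve(host)

    # Connect by IP but keep the original Host header so virtual hosting
    # still works. (Plain HTTP only; HTTPS would additionally need the
    # right SNI and certificate name, which this sketch ignores.)
    req = urllib2.Request('http://%s/' % ip, headers={'Host': host})
    page = urllib2.urlopen(req).read()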

How can I speed up fetching pages with urllib2 in python?

爷,独闯天下 Submitted on 2019-11-26 21:35:48
I have a script that fetches several web pages and parses the info (an example can be seen at http://bluedevilbooks.com/search/?DEPT=MATH&CLASS=103&SEC=01). I ran cProfile on it and, as I assumed, urlopen takes up a lot of the time. Is there a way to fetch the pages faster, or a way to fetch several pages at once? I'll do whatever is simplest, as I'm new to Python and web development. Thanks in advance! :) UPDATE: I have a function called fetchURLs(), which I use to build a list of the URLs I need, so something like urls = fetchURLs(). The URLs are all XML files from the Amazon and eBay APIs (which …
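The usual answer is to overlap the downloads, since urlopen time is mostly network wait. A minimal thread-based sketch, reusing the question's fetchURLs() helper (assumed to return a list of URL strings):

    import threading
    import urllib2

    def fetch(url, results, i):
        results[i] = urllib2.urlopen(url).read()

    urls = fetchURLs()                      # the question's own helper
    results = [None] * len(urls)
    threads = [threading.Thread(target=fetch, args=(u, results, i))
               for i, u in enumerate(urls)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # 'results' now holds each page body, in the same order as 'urls'.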

How to fetch a non-ascii url with Python urlopen?

谁说胖子不能爱 Submitted on 2019-11-26 16:09:54
I need to fetch data from a URL with non-ASCII characters, but urllib2.urlopen refuses to open the resource and raises: UnicodeEncodeError: 'ascii' codec can't encode character u'\u0131' in position 26: ordinal not in range(128). I know the URL is not standards-compliant, but I have no way to change it. How can I access a resource pointed to by a URL containing non-ASCII characters in Python? Edit: in other words, can/how does urlopen open a URL like http://example.org/Ñöñ-ÅŞÇİİ/ ? Answer 1: Strictly speaking, URIs can't contain non-ASCII characters; what you have there is an IRI. To convert an IRI …
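The answer breaks off at "To convert an IRI", but the standard recipe it is leading up to is: IDNA-encode the host and percent-encode the path, query, and fragment. A sketch, with ports and userinfo ignored for brevity:

    # -*- coding: utf-8 -*-
    import urllib
    import urllib2
    import urlparse

    def iri_to_uri(iri):
        # Split the IRI, encode each piece into its ASCII-safe form, and
        # reassemble a genuine URI.
        scheme, netloc, path, query, fragment = urlparse.urlsplit(iri)
        return urlparse.urlunsplit((
            scheme.encode('ascii'),
            netloc.encode('idna'),                       # hostname -> punycode
            urllib.quote(path.encode('utf-8')),          # path -> %-escapes
            urllib.quote(query.encode('utf-8'), safe='=&'),
            urllib.quote(fragment.encode('utf-8')),
        ))

    data = urllib2.urlopen(iri_to_uri(u'http://example.org/Ñöñ-ÅŞÇİİ/')).read()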