urlopen

How do I set cookies using Python urlopen?

好久不见 · Posted on 2019-12-20 17:28:07

Question: I am trying to fetch an HTML site using Python urlopen. I am getting this error:

HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop

The code:

from urllib2 import Request, urlopen
request = Request(url)
response = urlopen(request)

I understand that the server redirects to another URL and that it is looking for a cookie. How do I set the cookie it is looking for so I can read the HTML?

Answer 1: Here's an example from the Python documentation, adjusted to …
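A sketch of the cookie-jar approach that answer points at, using Python 3's urllib.request (the successor to urllib2). The URL is left out because the question doesn't name it; the commented call shows where it would go:

```python
import http.cookiejar
import urllib.request

# A CookieJar records every Set-Cookie header the server sends during the
# redirect, and the opener replays those cookies on each follow-up request,
# so the redirect loop resolves on its own.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# hypothetical usage -- substitute the redirecting URL from the question:
# html = opener.open("http://example.com/needs-cookie").read()
```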

Parsing HTTP Response in Python

半城伤御伤魂 · Posted on 2019-12-18 10:42:09

Question: I want to manipulate the information at THIS url. I can successfully open it and read its contents, but what I really want to do is throw out all the stuff I don't want and manipulate the stuff I want to keep. Is there a way to convert the string into a dict so I can iterate over it? Or do I just have to parse it as is (str type)?

from urllib.request import urlopen
url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'
response = urlopen(url)
print(response.read())  # returns string
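Since the endpoint returns JSON, the string can indeed become a dict via the standard json module. A minimal sketch; the sample payload below is invented for the offline demonstration, not Quandl's actual schema:

```python
import json
from urllib.request import urlopen

def fetch_json(url):
    # read() returns bytes; decode, then parse into native dicts and lists
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))

# offline demonstration of the parsing step with a made-up payload:
payload = b'{"column_names": ["Date", "Value"], "data": [["2013-01-01", 16502.4]]}'
record = json.loads(payload.decode("utf-8"))
print(record["column_names"])  # ['Date', 'Value']
```

Once parsed, `record` is an ordinary dict, so keys can be iterated and unwanted entries dropped with normal dict operations.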

TypeError: urlopen() got multiple values for keyword argument 'body' while executing tests through Selenium and Python on Kubuntu 14.04

≯℡__Kan透↙ · Posted on 2019-12-18 08:01:02

Question: I'm trying to run a Selenium script in Python on Kubuntu 14.04. I get this error message with both chromedriver and geckodriver:

Traceback (most recent call last):
  File "vse.py", line 15, in <module>
    driver = webdriver.Chrome(chrome_options=options, executable_path=r'/root/Desktop/chromedriver')
  File "/usr/local/lib/python3.4/dist-packages/selenium/webdriver/chrome/webdriver.py", line 75, in __init__
    desired_capabilities=desired_capabilities)
  File "/usr/local/lib/python3.4/dist…
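This particular error usually comes from a version mismatch between selenium and the urllib3 it calls into (upgrading both together is the common fix). The TypeError itself just means one parameter received both a positional and a keyword value, which a few lines of plain Python can reproduce; the function below is a stand-in, not urllib3's real signature:

```python
# stand-in for urllib3's urlopen(); real signatures vary by version
def urlopen(method, url, body=None):
    return body

try:
    # 'body' arrives positionally AND as a keyword -> the same TypeError
    urlopen("POST", "http://example.com", "data", body="data")
except TypeError as exc:
    message = str(exc)

print(message)
```

When an old selenium passes arguments positionally to a newer urllib3 whose parameters have shifted, the library collides with itself in exactly this way.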

Tell urllib2 to use custom DNS

六月ゝ 毕业季﹏ · Posted on 2019-12-17 07:15:18

Question: I'd like to tell urllib2.urlopen (or a custom opener) to use 127.0.0.1 (or ::1) to resolve addresses. I wouldn't change my /etc/resolv.conf, however. One possible solution is to use a tool like dnspython to query addresses and httplib to build a custom URL opener. I'd prefer telling urlopen to use a custom nameserver, though. Any suggestions?

Answer 1: It looks like name resolution is ultimately handled by socket.create_connection:

urllib2.urlopen -> httplib.HTTPConnection -> socket.create…
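One way to act on that call chain without touching /etc/resolv.conf is to wrap socket.getaddrinfo, which create_connection ultimately calls. The override table below is hypothetical; a real setup might consult dnspython at that point instead:

```python
import socket

DNS_OVERRIDES = {"example.internal": "127.0.0.1"}  # hypothetical mapping

_original_getaddrinfo = socket.getaddrinfo

def patched_getaddrinfo(host, port, *args, **kwargs):
    # rewrite the hostname before the OS resolver ever sees it
    return _original_getaddrinfo(DNS_OVERRIDES.get(host, host), port, *args, **kwargs)

socket.getaddrinfo = patched_getaddrinfo

# urllib2.urlopen -> httplib -> socket.create_connection -> getaddrinfo,
# so urlopen() now resolves overridden names through the table above
info = socket.getaddrinfo("example.internal", 80)
print(info[0][4][0])  # 127.0.0.1
```

Patching a module-level function like this is process-wide, so it affects every library in the interpreter, not just urllib2.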

how to fix beautifulsoup ssl CERTIFICATE_VERIFY_FAILED error

老子叫甜甜 · Posted on 2019-12-13 09:49:34

Question: Code:

import requests
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

html = urlopen("https://www.familyeducation.com/baby-names/browse-origin/surname/german")
soup = BeautifulSoup(html)
metadata = soup.find_all('meta')

Error:

urlopen error [SSL: CERTIFICATE_VERIFY_FAILED]

Answer 1: For this error, check out this answer: urllib and "SSL: CERTIFICATE_VERIFY_FAILED" Error. But you don't always need urlopen for an HTML request; you can also send the request through the requests lib.
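If the verification failure has to be bypassed rather than fixed (for instance, the machine's CA bundle is broken), urlopen accepts an ssl context with the checks disabled. A sketch; note that repairing the certificate store (e.g. installing the certifi package) is the safer long-term fix:

```python
import ssl
from urllib.request import urlopen

# a context that skips hostname and certificate checks -- use only when
# you consciously accept not authenticating the server
context = ssl.create_default_context()
context.check_hostname = False          # must be disabled before verify_mode
context.verify_mode = ssl.CERT_NONE

# html = urlopen("https://www.familyeducation.com/baby-names/browse-origin/surname/german",
#                context=context).read()
```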

Login to a website using python

余生长醉 · Posted on 2019-12-12 21:56:13

Question: I am trying to log in to this page using Python. Here is my code:

from urllib2 import urlopen
from bs4 import BeautifulSoup
import requests
import sys

URL = 'http://coe2.annauniv.edu/result/index.php'
soup = BeautifulSoup(urlopen(URL))
for hit in soup.findAll(attrs={'class': 's2'}):
    print hit.contents[0].strip()

RegisterNumber = raw_input("enter the register number")
DateofBirth = raw_input("enter the date of birth [DD-MM-YYYY]")
login_input = raw_input("enter the what is()?")

def main():
    # …
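A login like this is usually done by POSTing the form fields back to the page. A stdlib sketch in Python 3; the field names below are hypothetical, so read the real <input> names out of the login <form> (the BeautifulSoup pass above can do that) before using it:

```python
import urllib.parse
import urllib.request

URL = 'http://coe2.annauniv.edu/result/index.php'

# hypothetical form fields -- inspect the page's HTML for the real names
payload = {
    "register_no": "2013123456",
    "dob": "01-01-1995",
}
data = urllib.parse.urlencode(payload).encode("ascii")

# supplying a data argument turns the request into a POST
request = urllib.request.Request(URL, data=data)
print(request.get_method())  # POST

# response = urllib.request.urlopen(request)
# print(response.read())
```

If the site sets a session cookie at login, combine this with an HTTPCookieProcessor-based opener so the cookie survives into later requests.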

Why does text retrieved from pages sometimes look like gibberish?

你。 · Posted on 2019-12-12 17:42:05

Question: I'm using urllib and urllib2 in Python to open and read webpages, but sometimes the text I get is unreadable. For example, if I run this:

import urllib
text = urllib.urlopen('http://tagger.steve.museum/steve/object/141913').read()
print text

I get some unreadable text. I've read these posts: "Gibberish from urlopen" and "Does python urllib2 automatically uncompress gzip data fetched from webpage?" but can't seem to find my answer. Thank you in advance for your help!

UPDATE: I fixed the problem by …
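The "gibberish" in cases like this is typically a gzip-compressed body that urllib hands back verbatim (browsers decompress it silently). A sketch of detecting and undoing the compression, shown with Python 3's urllib.request, with the round-trip demonstrated offline:

```python
import gzip
import urllib.request

def read_page(url):
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
        # urllib does not decompress for you the way a browser does
        if resp.headers.get("Content-Encoding") == "gzip":
            raw = gzip.decompress(raw)
    return raw

# offline demonstration of the decompression step:
body = "readable text".encode("utf-8")
restored = gzip.decompress(gzip.compress(body))
print(restored.decode())  # readable text
```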

Python urllib freezes with specific URL

你离开我真会死。 · Posted on 2019-12-12 14:21:21

Question: I am trying to fetch a page, and urlopen hangs and never returns anything, although the web page is very light and can be opened with any browser without any problems:

import urllib.request

with urllib.request.urlopen("http://www.planalto.gov.br/ccivil_03/_Ato2007-2010/2008/Lei/L11882.htm") as response:
    print(response.read())

This simple code just freezes while retrieving the response, but if you try to open http://www.planalto.gov.br/ccivil_03/_Ato2007-2010/2008/Lei/L11882.htm it opens without …
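A common culprit when a single URL stalls only from scripts is the server holding connections that present urllib's default "Python-urllib" User-Agent. Sending a browser-like header, plus a timeout so the call can never hang indefinitely, is a reasonable first experiment; the header explanation is an assumption, not a documented rule of that server:

```python
import urllib.request

URL = "http://www.planalto.gov.br/ccivil_03/_Ato2007-2010/2008/Lei/L11882.htm"

# browser-like User-Agent (assumption: the server stalls urllib's default one)
request = urllib.request.Request(URL, headers={"User-Agent": "Mozilla/5.0"})
print(request.get_header("User-agent"))  # Mozilla/5.0

# the timeout turns an indefinite hang into a catchable exception
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.read())
```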

python urllib2.urlopen(url) process block

[亡魂溺海] · Posted on 2019-12-12 09:27:22

Question: I am using urllib2.urlopen() and my process is getting blocked. I am aware that urllib2.urlopen() has a default timeout. How do I make the call non-blocking? The backtrace is:

(gdb) bt
#0  0x0000003c6200dc35 in recv () from /lib64/libpthread.so.0
#1  0x00002b88add08137 in ?? () from /usr/lib64/python2.6/lib-dynload/_socketmodule.so
#2  0x00002b88add0830e in ?? () from /usr/lib64/python2.6/lib-dynload/_socketmodule.so
#3  0x000000310b2d8e19 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0

Answer 1: …
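Two stdlib knobs make the call return instead of sitting in recv() forever. Shown here with Python 3's urllib.request, but urllib2.urlopen in Python 2.6 accepts the same timeout argument:

```python
import socket
import urllib.request

# per-call timeout: raises an exception instead of blocking indefinitely
# response = urllib.request.urlopen("http://example.com", timeout=10)

# process-wide default, which also covers library code that never
# passes a timeout of its own:
socket.setdefaulttimeout(10)
print(socket.getdefaulttimeout())  # 10.0
```

Neither option makes urlopen truly asynchronous; for that, the fetch has to move to a worker thread or an async HTTP client.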