urllib2

kaggle kernels: urllib.request.urlopen not working for any url

眉间皱痕 submitted on 2020-02-04 02:57:41
Question: What is the best way to handle fetching a list of URLs in Kaggle kernels? I first tried testing with google.com.

First method, using urllib.request:

    import urllib.request
    resp = urllib.request.urlopen('http://www.google.com')

This led to gaierror and "urlopen error [Errno -2] Name or service not known".

Second method, using requests:

    import requests
    resp = requests.get('http://www.google.com')

This led to "gaierror: [Errno -3] Temporary failure in name resolution" and "Failed to …
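Both tracebacks are DNS failures, which in a Kaggle kernel usually means the notebook simply has no network access (internet access is a per-kernel setting that is off by default). With connectivity enabled, a defensive loop over the URL list might look like this sketch; fetch and the URL list are illustrative names, not from the question:

    import urllib.request
    from urllib.error import URLError

    def fetch(url):
        # fail soft per URL so one unresolvable host doesn't abort the whole run
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.read()
        except URLError as err:
            print('skipping %s: %s' % (url, err.reason))
            return None

    for url in ['http://www.google.com']:
        fetch(url)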

How to make api call that requires login in web2py?

左心房为你撑大大i submitted on 2020-02-02 14:52:47
Question: I want to access APIs from the application. Those APIs have the decorator @auth.requires_login(). I am calling the API from a controller at demo_app/controllers/plugin_task/task:

    url = request.env.http_origin + URL('api', 'bind_task')
    page = urllib2.Request(url)
    page.add_header('cookie', request.env.http_cookie)
    response = urllib2.urlopen(page)

Demo API (api.py):

    @auth.requires_login()
    @request.restful()
    def bind_task():
        response.view = 'generic.json'
        return dict(GET=_bind_task)

    def _bind_task(**get_params): …
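The pattern above hinges on forwarding the caller's session cookie so that @auth.requires_login() sees an authenticated session on the internal request. A minimal sketch of the same call that also parses the JSON the restful API returns; request and URL() are web2py controller globals, everything else is standard library:

    import json
    import urllib2

    # forward the current user's session cookie so the decorated API
    # treats this server-side request as logged in
    req = urllib2.Request(request.env.http_origin + URL('api', 'bind_task'))
    req.add_header('Cookie', request.env.http_cookie)
    data = json.loads(urllib2.urlopen(req).read())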

How to set TCP_NODELAY flag when loading URL with urllib2?

半腔热情 submitted on 2020-01-30 04:09:37
Question: I am using urllib2 to load a web page. My code is:

    httpRequest = urllib2.Request("http://www....com")
    pageContent = urllib2.urlopen(httpRequest)
    pageContent.readline()

How can I get hold of the socket properties to set TCP_NODELAY? On a plain socket I would use:

    socket.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

Answer 1: If you need access to such a low-level property on the socket used, you'll have to overload some objects. First, you'll need to create a subclass of …
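The answer is cut off, but the subclassing approach it starts to describe is well known: give urllib2 a connection class whose connect() sets the option on the freshly created socket. A sketch for Python 2, with example.com standing in for the real URL:

    import socket
    import httplib
    import urllib2

    class NoDelayHTTPConnection(httplib.HTTPConnection):
        def connect(self):
            # open the socket as usual, then disable Nagle's algorithm on it
            httplib.HTTPConnection.connect(self)
            self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    class NoDelayHTTPHandler(urllib2.HTTPHandler):
        def http_open(self, req):
            return self.do_open(NoDelayHTTPConnection, req)

    opener = urllib2.build_opener(NoDelayHTTPHandler)
    pageContent = opener.open('http://www.example.com')
    pageContent.readline()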

[Python] Web crawler (2): using urllib2 to fetch web page content from a specified URL

僤鯓⒐⒋嵵緔 submitted on 2020-01-25 22:13:03
Reposted from: http://blog.csdn.net/pleasecallmewhy/article/details/8923067 Web scraping means reading the network resource named by a URL out of the network stream and saving it locally. It is much like using a program to imitate the browser: the URL is sent to the server as the content of an HTTP request, and the server's response is then read back. In Python we use the urllib2 component to fetch web pages. urllib2 is Python's component for fetching URLs (Uniform Resource Locators); it exposes a very simple interface in the form of the urlopen function. The simplest urllib2 program takes only four lines. Create a file urllib2_test01.py to get a feel for what urllib2 does:

    import urllib2
    response = urllib2.urlopen('http://www.baidu.com/')
    html = response.read()
    print html

Press F5 and you can see the result of the run. If we open the Baidu home page, right-click, and choose "View page source" (Firefox or Chrome both work), we find exactly the same content. In other words, these four lines print everything the browser receives when we visit Baidu. That is the simplest possible urllib2 example. Besides "http:", a URL can just as well use "ftp:", "file: …
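A quick sketch of the non-HTTP schemes the truncated sentence is introducing; the path is a hypothetical local file, not from the original post:

    import urllib2

    # urlopen speaks more than http:, e.g. a local file via the file: scheme
    response = urllib2.urlopen('file:///tmp/example.txt')
    print response.read()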

Urllib2 using Tor and socks in python

孤街浪徒 submitted on 2020-01-25 18:00:32
Question: I'm trying to crawl websites in Python through Tor. I tried the code below, which prints the IP Tor is using; running it two or three times gives me different IPs from different countries. I want IPs from a specific country, e.g. India. Can this be done with Tor and SOCKS?

    import socks
    import socket
    import urllib2

    socks.setdefaultproxy(socks.PROXY_TYPE_HTTP, "127.0.0.1", 9050)
    socket.socket = socks.socksocket
    print urllib2.urlopen('http://my-ip.herokuapp.com').read()

Answer 1: To get an IP from a specific country …
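The truncated answer is heading toward pinning Tor's exit nodes by country, which is done in Tor's own configuration rather than in Python. A sketch of the usual torrc entries, assuming the default config location and India's country code:

    # /etc/tor/torrc -- only build circuits that exit in India
    ExitNodes {in}
    StrictNodes 1

After editing torrc, restart the tor service so new circuits pick up the restriction; the Python code stays unchanged.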

Python crawler tutorial: classic code examples for beginners

丶灬走出姿态 submitted on 2020-01-25 15:56:05
Example 3: a targeted crawler for stock data. The program structure is as follows:

1. Get the list of stock codes from a website (requests library, re library)
2. Iterate over every stock and fetch its details from a stock-information site
3. Collect the details in a dict and write them to a text file (a sketch of this step follows the code)

The code (the excerpt is cut off inside getStockList):

    # Targeted crawler for stock data
    """
    Created on Thu Oct 12 16:12:48 2017

    @author: DONG LONG RUI
    """
    import requests
    from bs4 import BeautifulSoup
    import re
    #import traceback

    def getHTMLText(url, code='utf-8'):  # parameter code defaults to 'utf-8' (the encoding)
        try:
            r = requests.get(url, timeout=30)
            r.raise_for_status()
            #r.encoding = r.apparent_encoding
            r.encoding = code
            return r.text
        except:
            return ''

    def getStockList(lst, stockURL):
        html = getHTMLText(stockURL, 'GB2312')
        soup …
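The excerpt ends before steps 2 and 3, so here is a minimal, hypothetical sketch of the write-out step the structure describes: one dict per stock, appended as a line to a text file. The function name, fields, and file name are illustrative, not from the tutorial:

    def writeStockInfo(info, fpath):
        # append one stock's dict as a single line of the output file
        with open(fpath, 'a', encoding='utf-8') as f:
            f.write(str(info) + '\n')

    writeStockInfo({'code': '600000', 'price': '12.34'}, 'stock_info.txt')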

Python - Don't follow redirect on one URL only

纵然是瞬间 submitted on 2020-01-23 11:13:28
Question: How can I prevent urllib2 from following a redirect for one chosen URL only? I found this snippet while browsing, but it seems to work globally, and I only want to disable redirects for a certain URL:

    import urllib2

    class RedirectHandler(urllib2.HTTPRedirectHandler):
        def http_error_302(self, req, fp, code, msg, headers):
            result = urllib2.HTTPError(req.get_full_url(), code, msg, headers, fp)
            result.status = code
            return result
        http_error_301 = http_error_303 = http …
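The snippet only acts globally if it is installed with urllib2.install_opener(). A sketch of scoping it to one URL instead: build a dedicated opener with the handler and use it just for that request, leaving the default opener untouched (the example.com URLs are placeholders):

    import urllib2

    class NoRedirectHandler(urllib2.HTTPRedirectHandler):
        def http_error_302(self, req, fp, code, msg, headers):
            # hand back the redirect response itself instead of following it
            result = urllib2.HTTPError(req.get_full_url(), code, msg, headers, fp)
            result.status = code
            return result
        http_error_301 = http_error_303 = http_error_307 = http_error_302

    # redirects disabled only where this opener is used
    no_redirect = urllib2.build_opener(NoRedirectHandler)
    resp = no_redirect.open('http://example.com/special')

    # everything else keeps the default redirect-following behaviour
    other = urllib2.urlopen('http://example.com/other')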

HTTPS request via urllib2 fails behind NTLM proxy

空扰寡人 submitted on 2020-01-22 03:12:08
Question: Via Python's urllib2 I try to get data over HTTPS while I am behind a corporate NTLM proxy. I run:

    proxy_url = ('http://user:pw@ntlmproxy:port/')
    proxy_handler = urllib2.ProxyHandler({'http': proxy_url})
    opener = urllib2.build_opener(proxy_handler, urllib2.HTTPHandler)
    urllib2.install_opener(opener)
    f = urllib2.urlopen('https://httpbin.org/ip')
    myfile = f.read()
    print myfile

but as an error I get:

    urllib2.URLError: <urlopen error [Errno 8] _ssl.c:507: EOF occurred in violation of protocol>

How can …
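One thing worth ruling out before anything NTLM-specific: the ProxyHandler above only maps the http scheme, so the https:// request bypasses the proxy entirely, which can produce exactly this kind of TLS failure on a locked-down network. A sketch that routes both schemes through the proxy (user, pw, ntlmproxy, and port are the question's placeholders). Note that urllib2 itself cannot perform NTLM proxy authentication; if the proxy insists on NTLM, the common workaround is a local translating proxy such as CNTLM between the script and the corporate proxy:

    import urllib2

    proxy_url = 'http://user:pw@ntlmproxy:port/'
    # map both schemes; otherwise https:// URLs skip the proxy entirely
    proxy_handler = urllib2.ProxyHandler({'http': proxy_url, 'https': proxy_url})
    opener = urllib2.build_opener(proxy_handler)
    urllib2.install_opener(opener)
    print urllib2.urlopen('https://httpbin.org/ip').read()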