urllib2

Basic Usage of the urllib2 Module (Part 4)

爷,独闯天下 submitted on 2020-01-03 04:24:24
Basic usage of the urllib2 library. So-called web scraping means reading the network resource identified by a URL out of the network stream and saving it locally. Python has many libraries for fetching web pages; we will start with urllib2.

urllib2 ships with Python 2.7 (nothing to install; just import it).
urllib2 official documentation: https://docs.python.org/2/library/urllib2.html
urllib2 source: https://hg.python.org/cpython/file/2.7/Lib/urllib2.py
In Python 3.x, urllib2 became urllib.request.

urlopen

Let's start with some code:

    # urllib2_urlopen.py
    # import the urllib2 library
    import urllib2
    # send a request to the given URL; the server's response comes back as a file-like object
    response = urllib2.urlopen("http://www.baidu.com")
    # file-like objects support the usual file methods; read() returns the entire body as a string
    html = response.read()
    # print the string
    print html

Run the Python script and the result is printed:

    Power@PowerMac ~$: python urllib2_urlopen.py

In fact
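Since urllib2 became urllib.request in Python 3, here is a sketch of the same example in Python 3 syntax. A `data:` URL stands in for the real address so the snippet runs without network access; in practice you would pass an `http://` URL exactly as above.

```python
# Python 3 equivalent of the urllib2 example: urllib2.urlopen became
# urllib.request.urlopen. The data: URL below is a stand-in so the code
# runs offline; substitute an http:// URL in real use.
from urllib.request import urlopen

response = urlopen("data:text/plain;charset=utf-8,Hello%20world")
html = response.read()          # read() returns bytes in Python 3, not str
print(html.decode("utf-8"))     # -> Hello world
```

Note the two behavioral differences from Python 2: `print` is a function, and `read()` yields `bytes` that must be decoded explicitly.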

unable to send data using urllib and urllib2 (python)

吃可爱长大的小学妹 submitted on 2020-01-03 02:01:04
Question: Hello everybody (first post here). I am trying to send data to a webpage. This webpage requests two fields (a file and an e-mail address); if everything is OK, the webpage returns a page saying "everything is ok" and sends a file to the provided e-mail address. I execute the code below and get nothing in my e-mail account.

    import urllib, urllib2
    params = urllib.urlencode({'uploaded': open('file'), 'email': 'user@domain.com'})
    req = urllib2.urlopen('http://webpage.com', params)
    print req.read()
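One likely culprit (an assumption, since the accepted answer is truncated above): `urlencode()` coerces every value with `str()`, so `open('file')` sends the file object's repr rather than the file's contents; a real file upload needs a multipart/form-data body. A Python 3 sketch of the stringification, using an in-memory `io.StringIO` in place of a real file:

```python
# urlencode() calls str() on each value: a file object becomes its repr,
# not its contents, so the server never receives the file's data.
import io
from urllib.parse import urlencode

fake_file = io.StringIO("file contents")   # stand-in for open('file')
params = urlencode({"uploaded": fake_file, "email": "user@domain.com"})
print(params)  # 'uploaded' is the object's repr, percent-encoded
# A proper upload needs multipart/form-data, e.g. with the third-party
# requests library: requests.post(url, files={...}, data={...})
```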

UnicodeEncodeError in urllib2

≡放荡痞女 submitted on 2020-01-02 20:42:20
Question: I hit a UnicodeEncodeError while crawling a Wikipedia dump JSON file. Here are my code snippet and the error message. It seems the character 'é' causes this problem, but I do not know how to solve it.

    import urllib2
    import json
    # List of philosophers' names: mergel list
    # print mergel
    i = 0
    for name in mergel:
        # Use the API to get the page content in a format that we like.
        # https://en.wikipedia.org/w/api.php?action=query&titles=Spider-Man&prop=revisions&rvprop=content&format=json
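Since the code above is truncated, the usual fix is shown here only as a sketch: percent-encode the non-ASCII title before interpolating it into the URL. The example uses Python 3's `urllib.parse.quote`; the Python 2 equivalent is `urllib.quote(name.encode('utf-8'))`. The philosopher name is a hypothetical placeholder.

```python
# Percent-encode non-ASCII characters so the URL is pure ASCII before
# it is passed to urlopen; quote() UTF-8-encodes str input in Python 3.
from urllib.parse import quote

name = "Étienne Bonnot de Condillac"   # hypothetical name with accented chars
url = ("https://en.wikipedia.org/w/api.php?action=query&titles="
       + quote(name)
       + "&prop=revisions&rvprop=content&format=json")
print(url)  # accented characters are now ASCII-safe %XX escapes
```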

Python requests API is not fetching data inside table bodies

橙三吉。 submitted on 2020-01-02 18:06:49
Question: I am trying to scrape a webpage to get table values from the text data returned in the requests response.

    </thead>
    <tbody class="stats"></tbody>
    <tbody class="annotation"></tbody>
    </table>
    </div>

There is actually some data present inside those tbody elements, but I am unable to access it using requests. Here is my code:

    server = "http://www.ebi.ac.uk/QuickGO/GProtein"
    header = {'User-agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5'}
    payloads = {'ac':

urllib2.urlopen fails in Django

人走茶凉 submitted on 2020-01-02 12:47:13
Question: I use urllib2.urlopen(url) to get HTML content. The URL is http://127.0.0.1:8000/m.html/ . This method succeeds in getting the HTML content. But in Django, if I try to get the HTML content, it stops in the call urllib2.urlopen('http://127.0.0.1:8000/m.html/'). It just stops: it does not report an error, and the server also stops. I don't know why it works in a standalone file but has problems in Django.

Answer 1: The Django development server is single-threaded. It can't both serve the view that

python urllib2 and unicode

六月ゝ 毕业季﹏ submitted on 2020-01-02 10:15:13
Question: I would like to collect information from the results given by a search engine, but I can only put plain ASCII text, not Unicode, in the query part of the URL.

    import urllib2
    a = "바둑"
    a = a.decode("utf-8")
    type(a)
    #Out[35]: unicode
    url = "http://search.naver.com/search.naver?where=nexearch&query=%s" % (a)
    url2 = urllib2.urlopen(url)

gives this error:

    #UnicodeEncodeError: 'ascii' codec can't encode characters in position 39-40: ordinal not in range(128)

Answer 1: Encode the Unicode data to UTF-8, then URL-encode: from
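The answer above is cut off, so here is a sketch of what "encode to UTF-8, then URL-encode" looks like, written with Python 3 names. In Python 2 the idiom is `urllib.quote(a.encode('utf-8'))`; in Python 3, `urllib.parse.quote()` UTF-8-encodes str input itself, so the explicit `.encode('utf-8')` step disappears.

```python
# Percent-encode the Hangul query so the final URL is pure ASCII.
from urllib.parse import quote

a = "바둑"
query = quote(a)  # UTF-8 bytes, percent-encoded
url = "http://search.naver.com/search.naver?where=nexearch&query=%s" % query
print(query)  # %EB%B0%94%EB%91%91
```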

Python Crawler, Part 1

喜你入骨 submitted on 2020-01-02 05:25:53
Basic usage of the urllib library. Now, little friends, let's take our first real steps down the crawler path together.

1. Pull down a web page in no time

How do we fetch a page? We simply retrieve its content by URL. Although what we see in a browser is a series of beautiful pages, they are really rendered by the browser: underneath, a page is a chunk of HTML plus JS and CSS. If a web page were a person, HTML would be the skeleton, JS the muscles, and CSS the clothes. The most important part therefore lives in the HTML, so let's write an example that pulls a page down:

    import urllib2
    response = urllib2.urlopen("http://www.baidu.com")
    print response.read()

That's right, you read correctly: the real program is just two lines. Save it as demo.py, change into its directory, and run the following command to see the result for yourself:

    python demo.py

See, the page's source has been pulled down. Satisfying, isn't it?

2. Analyzing how the page was fetched

Let's analyze those two lines of code. The first line is

    response = urllib2.urlopen("http://www.baidu.com")

Here we call the urlopen method from the urllib2 library and pass in a URL, in this case the Baidu home page, using the HTTP protocol. You can of course replace HTTP with FTP, FILE, HTTPS, and so on; it merely denotes a kind of access-control protocol

making requests to localhost from inside docker container

倾然丶 夕夏残阳落幕 submitted on 2020-01-02 04:07:05
Question: I have an application running on my localhost at port 8080, and some Python code that consumes that service. The code runs fine on my base system, but as soon as I put it inside a Docker container I get urllib2.URLError: <urlopen error [Errno 111] Connection refused>. I have another application that exposes an API at port 6543; same problem. I assume I need to tell Docker that it's allowed to reach certain localhost ports. How do I do that? Here are some more specific details: I can

Download a file from https with authentication

北城以北 submitted on 2020-01-02 03:27:48
Question: I have a Python 2.6 script that downloads a file from a web server. I want this script to pass a username and password (for authentication before fetching the file), and I am passing them as part of the URL as follows:

    import urllib2
    response = urllib2.urlopen("http://'user1':'password'@server_name/file")

However, I am getting a syntax error in this case. Is this the correct way to go about it? I am pretty new to Python and coding in general. Can anybody help me out? Thanks!

Answer 1: I suppose
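Since the answer above is truncated, here is a sketch of the two standard approaches, written with Python 3 names (in Python 2.6 the same classes live directly in `urllib2`). The host, username, and password are placeholders from the question, not real credentials.

```python
# Two standard ways to do HTTP Basic auth with the standard library.
import base64
from urllib.request import (Request, HTTPBasicAuthHandler,
                            HTTPPasswordMgrWithDefaultRealm, build_opener)

url = "https://server_name/file"   # placeholder host from the question

# Option 1: a password manager + auth handler (answers 401 challenges)
mgr = HTTPPasswordMgrWithDefaultRealm()
mgr.add_password(None, url, "user1", "password")
opener = build_opener(HTTPBasicAuthHandler(mgr))
# response = opener.open(url)     # actual network call, commented out here

# Option 2: send the Authorization header pre-emptively
token = base64.b64encode(b"user1:password").decode("ascii")
req = Request(url, headers={"Authorization": "Basic " + token})
print(req.get_header("Authorization"))  # Basic dXNlcjE6cGFzc3dvcmQ=
```

Note that the quotes inside the original URL (`'user1':'password'`) would be sent literally even if the `user:password@host` form were accepted, which is one reason the handler-based approach is preferred.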

Urllib's urlopen breaking on some sites (e.g. StackApps api): returns garbage results

只愿长相守 submitted on 2020-01-02 01:46:32
Question: I'm using urllib2's urlopen function to try to get a JSON result from the StackOverflow API. The code I'm using:

    >>> import urllib2
    >>> conn = urllib2.urlopen("http://api.stackoverflow.com/0.8/users/")
    >>> conn.readline()

The result I'm getting:

    '\x1f\x8b\x08\x00\x00\x00\x00\x00\x04\x00\xed\xbd\x07`\x1cI\x96%&/m\xca{\x7fJ\...

I'm fairly new to urllib, but this doesn't seem like the result I should be getting. I've tried it in other places and I get what I expect (the same as visiting the
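Those leading bytes `\x1f\x8b` are the gzip magic number: the API returned a gzip-compressed body. A sketch of decompressing such a response follows; the payload here is locally compressed stand-in JSON so the snippet runs offline, whereas in practice `raw` would be `conn.read()`.

```python
# A \x1f\x8b prefix means the body is gzip-compressed; decompress it
# before treating it as JSON text.
import gzip
import io

raw = gzip.compress(b'{"users": []}')   # stand-in for conn.read()
assert raw[:2] == b"\x1f\x8b"           # same signature as in the question

body = gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
print(body.decode("utf-8"))  # {"users": []}
# Alternatively, send an 'Accept-Encoding: identity' request header to
# ask the server for an uncompressed response in the first place.
```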