urllib

python urllib module

泪湿孤枕 submitted on 2019-12-05 05:36:02

In Python, the urllib module provides a high-level interface that you can use to download and read remote data. As an example, let's fetch the Sina homepage HTML and print it. There are two ways to do this.

1. urlopen(url, data=None, proxies=None)

urlopen(url [, data]) -> open file-like object

Creates a file-like object representing the remote url, which you can then read like a local file to fetch the remote data. The url parameter is the path to the remote data, usually a URL; data is data submitted to the url with a POST request; proxies sets a proxy. urlopen returns a file-like object.

```python
#!/usr/bin/python2.5
import urllib

url = "http://www.sina.com"
data = urllib.urlopen(url).read()
print data
```

Running it:

root@10.1.6.200:~# python gethtml.py
<!Doctype html>
<!--[30,131,1] published at 2013-04-11 23:15:33 from #150 by system-->
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=gb2312" /
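The snippet above is Python 2. On Python 3, urlopen moved to urllib.request; a minimal sketch of the equivalent (the helper name `fetch` and the gb2312 hint are my own additions, not from the original post):

```python
from urllib.request import urlopen

def fetch(url, encoding='utf-8'):
    """Read a URL (http://, or file:// for local tests) and return decoded text."""
    # urlopen still returns a file-like object, but read() yields bytes
    # in Python 3, so we decode before returning
    with urlopen(url) as sock:
        return sock.read().decode(encoding, errors='replace')

# e.g. fetch("http://www.sina.com", encoding='gb2312') mirrors the
# Python 2 example above (the page declares charset=gb2312)
```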

Build a query string using urlencode in Python

点点圈 submitted on 2019-12-05 03:56:52

I am trying to build a URL so that I can send a GET request to it using the urllib module. Let's suppose my final_url should be:

url = "www.example.com/find.php?data=http%3A%2F%2Fwww.stackoverflow.com&search=Generate+value"

To achieve this I tried the following:

```python
>>> initial_url = "http://www.stackoverflow.com"
>>> search = "Generate+value"
>>> params = {"data": initial_url, "search": search}
>>> query_string = urllib.urlencode(params)
>>> query_string
'search=Generate%2Bvalue&data=http%3A%2F%2Fwww.stackoverflow.com'
```

Now if you compare my query_string with the format of final_url you can
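The `+` got double-encoded to `%2B` because the value was pre-encoded. Letting urlencode do all the quoting itself avoids this: pass the raw string "Generate value" and its default quote_plus turns the space into `+`. A sketch using Python 3's urllib.parse (on Python 2 the same function lives at urllib.urlencode):

```python
from urllib.parse import urlencode

# pass the *unencoded* value; urlencode percent-encodes the URL in
# "data" and turns the space in "search" into '+'
params = {"data": "http://www.stackoverflow.com", "search": "Generate value"}
query_string = urlencode(params)
print(query_string)
# data=http%3A%2F%2Fwww.stackoverflow.com&search=Generate+value

final_url = "http://www.example.com/find.php?" + query_string
```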

Basic usage of urllib (overview)

烂漫一生 submitted on 2019-12-05 03:15:53

1. urllib.request

1) urlopen

```python
from urllib import request

r = request.urlopen('http://www.baidu.com/')
# status code
print(r.status)
# response headers
print(r.getheaders())
print('=' * 30)
# page source
print(r.read().decode('utf-8'))
```

Note: passing data (bytes) to urlopen() makes it a POST request; timeout sets the timeout.

2) Request

```python
from urllib import request

# build a request object
req = request.Request('https://www.cnblogs.com/')
# open the page
r = request.urlopen(req)
print(r.read().decode('utf-8'))
```

Note: data (bytes; a dict must go dict -> str -> bytes), headers={}, method=. Use a Handler to implement authentication, cookies, proxies, and so on.

2. urllib.error, for handling exceptions

from urllib.error import URLError, HTTPError

Handle them with try...except. Note
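The try...except pattern mentioned above might look like the sketch below. The `.invalid` hostname is a deliberately unresolvable placeholder, and HTTPError has to be caught before URLError because it is a subclass of it:

```python
from urllib import request
from urllib.error import URLError, HTTPError

def fetch(url):
    """Return the page text, or None if the request fails."""
    try:
        with request.urlopen(url, timeout=5) as r:
            return r.read().decode('utf-8')
    except HTTPError as e:   # the server answered, but with an error status
        print('HTTP error:', e.code)
    except URLError as e:    # no answer at all: DNS failure, refused connection, ...
        print('URL error:', e.reason)
    return None

print(fetch('http://nonexistent.invalid/'))  # prints a URL error, then None
```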

Check for `urllib.urlretrieve(url, file_name)` Completion Status

三世轮回 submitted on 2019-12-05 03:05:08

Question: How do I check whether urllib.urlretrieve(url, file_name) has completed before allowing my program to advance to the next statement? Take for example the following code snippet:

```python
import traceback
import sys
import Image
from urllib import urlretrieve

try:
    print "Downloading gif....."
    urlretrieve(imgUrl, "tides.gif")
    # Allow time for image to download/save:
    time.sleep(5)
    print "Gif Downloaded."
except:
    print "Failed to Download new GIF"
    raw_input('Press Enter to exit...')
    sys.exit()

try:
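In fact urlretrieve is synchronous: it does not return until the file is fully written (or an exception is raised), so the time.sleep(5) above is unnecessary. A sketch against a local file:// URL so it runs without a network (the .gif name and contents are just stand-ins):

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlretrieve  # 'from urllib import urlretrieve' in Python 2

# create a local source file to stand in for the remote image
src = tempfile.NamedTemporaryFile(delete=False, suffix='.gif')
src.write(b'GIF89a demo bytes')
src.close()

dest = src.name + '.copy'
# urlretrieve blocks until the transfer finishes, so the very next
# statement can safely assume the file is complete
path, headers = urlretrieve(Path(src.name).as_uri(), dest)
print(os.path.getsize(path) == os.path.getsize(src.name))  # True
```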

Urllib's urlopen breaking on some sites (e.g. StackApps api): returns garbage results

好久不见. submitted on 2019-12-05 02:31:08

I'm using urllib2's urlopen function to try to get a JSON result from the StackOverflow API. The code I'm using:

```python
>>> import urllib2
>>> conn = urllib2.urlopen("http://api.stackoverflow.com/0.8/users/")
>>> conn.readline()
```

The result I'm getting:

'\x1f\x8b\x08\x00\x00\x00\x00\x00\x04\x00\xed\xbd\x07`\x1cI\x96%&/m\xca{\x7fJ\...

I'm fairly new to urllib, but this doesn't seem like the result I should be getting. I've tried it in other places and I get what I expect (the same as visiting the address with a browser gives me: a JSON object). Using urlopen on other sites (e.g. "http://google.com"
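Those leading bytes \x1f\x8b are the gzip magic number: this particular API serves gzip-compressed bodies, so the response has to be decompressed before parsing. An offline sketch of the check and the fix (the JSON payload is made up for illustration):

```python
import gzip

# stand-in for the compressed bytes the API returns
body = gzip.compress(b'{"users": []}')

# gzip streams always start with the magic bytes 1f 8b
if body[:2] == b'\x1f\x8b':
    body = gzip.decompress(body)

text = body.decode('utf-8')
print(text)  # {"users": []}
```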

urllib.parse

我的未来我决定 submitted on 2019-12-05 02:10:22

1. urlparse: parse a URL

```python
from urllib import parse

url = "https://book.qidian.com/info/1004608738"
result = parse.urlparse(url=url)
print(result)
```

Result:

ParseResult(scheme='https', netloc='book.qidian.com', path='/info/1004608738', params='', query='', fragment='')

scheme: the protocol
netloc: the domain
path: the path
params: parameters
query: the query string, typically the parameters of a GET request
fragment: the anchor, used to jump straight to a given position within the page

2. urlunparse: build a URL back from its parts

```python
from urllib import parse

url_params = ('https', 'book.qidian.com', '/info/1004608738', '', '', '')
_url = parse.urlunparse(url_params)
print(_url)  # https://book.qidian.com/info/1004608738
```

3. urljoin: join URLs

from
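The urljoin example is cut off above; a sketch of its typical behaviour, reusing the same base URL (the joined paths are illustrative):

```python
from urllib.parse import urljoin

base = 'https://book.qidian.com/info/1004608738'

# an absolute path replaces the whole path of the base URL
print(urljoin(base, '/catalog'))    # https://book.qidian.com/catalog

# a relative path replaces only the last segment
print(urljoin(base, '2005310812'))  # https://book.qidian.com/info/2005310812
```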

Python urllib urlopen not working

不打扰是莪最后的温柔 submitted on 2019-12-05 01:57:44

I am just trying to fetch data from a live web page using the urllib module, so I wrote a simple example. Here is my code:

```python
import urllib

sock = urllib.request.urlopen("http://diveintopython.org/")
htmlSource = sock.read()
sock.close()
print (htmlSource)
```

But I got an error like:

Traceback (most recent call last):
  File "D:\test.py", line 3, in <module>
    sock = urllib.request.urlopen("http://diveintopython.org/")
AttributeError: 'module' object has no attribute 'request'

You are reading the wrong documentation or running the wrong Python interpreter version: you tried to use the Python 3 library in Python 2.
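One way to make such a script run under either interpreter is to try the Python 3 location first and fall back, a common compatibility sketch:

```python
try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib2 import urlopen          # Python 2

# now urlopen("http://diveintopython.org/") works under both versions
```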

AttributeError: module 'urllib' has no attribute 'parse'

我是研究僧i submitted on 2019-12-04 22:17:31

python 3.5.2

Code 1:

```python
import urllib

s = urllib.parse.quote('"')
print(s)
```

It gave this error:

AttributeError: module 'urllib' has no attribute 'parse'

Code 2:

```python
from urllib.parse import quote
# import urllib
# s = urllib.parse.quote('"')
s = quote('"')
print(s)
```

It works...

Code 3:

```python
from flask import Flask
# from urllib.parse import quote
# s = quote('"')
import urllib

s = urllib.parse.quote('"')
print(s)
```

It works, too. Is it because of flask? Why don't I have the error anymore? Is it a bug?

The urllib package serves as a namespace only. There are other modules under urllib, like request and parse. For
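An offline way to see the namespace behaviour: the parse attribute appears on the package only once some module has imported the submodule, and presumably flask (or something it pulls in) does exactly that, which is why code 3 works:

```python
import urllib
import urllib.parse   # without this line, urllib.parse raises AttributeError

# once any module has executed "import urllib.parse", the submodule is
# bound as an attribute of the urllib package for every importer
s = urllib.parse.quote('"')
print(s)   # %22
```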

NameError: name 'urllib' is not defined

佐手、 submitted on 2019-12-04 22:09:00

Question:

CODE:

```python
import networkx as net
from urllib.request import urlopen

def read_lj_friends(g, name):
    # fetch the friend-list from LiveJournal
    response = urllib.urlopen('http://www.livejournal.com/misc/fdata.bml?user=' + name)
```

ERROR:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'urllib' is not defined

Answer 1: You've imported urlopen directly, so you should refer to it like that rather than via urllib:

response = urlopen('...')

Answer 2: You can also try in Python 3:
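Concretely, the function above just needs to call the name it imported. A sketch; the helper name and the defensive quote() of the user name are my additions, not from the original answer:

```python
from urllib.request import urlopen
from urllib.parse import quote

def lj_friends_url(name):
    # build the LiveJournal fdata URL; quote() guards odd characters
    return 'http://www.livejournal.com/misc/fdata.bml?user=' + quote(name)

# inside read_lj_friends, the call then becomes:
# response = urlopen(lj_friends_url(name))
print(lj_friends_url('test_user'))
```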

I'm trying to get proxies out of a web page using regex in Python

北城以北 submitted on 2019-12-04 21:23:39

```python
import urllib.request
import re

page = urllib.request.urlopen("http://www.samair.ru/proxy/ip-address-01.htm").read()
re.findall('\d+\.\d+\.\d+\.\d+', page)
```

I don't understand why it says:

  File "C:\Python33\lib\re.py", line 201, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object

```python
import urllib
import re

page = urllib.urlopen("http://www.samair.ru/proxy/ip-address-01.htm").read()
print re.findall('\d+\.\d+\.\d+\.\d+', page)
```

Worked and gave me the result:

['056.249.66.50', '100.44.124.8', '103.31.250.115', ...

Edit: This works for
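The second snippet only works on Python 2, where read() returns str. Under Python 3 the fix is either to decode the bytes before matching or to use a bytes pattern; an offline sketch with made-up page content:

```python
import re

# stand-in for the bytes that urlopen(...).read() returns in Python 3
page = b'proxies: 056.249.66.50 and 100.44.124.8'

# option 1: decode to str, then use a str pattern
ips = re.findall(r'\d+\.\d+\.\d+\.\d+', page.decode('utf-8'))

# option 2: keep the bytes and use a bytes pattern
ips_b = re.findall(rb'\d+\.\d+\.\d+\.\d+', page)

print(ips)    # ['056.249.66.50', '100.44.124.8']
```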