urllib

python urllib module

泪湿孤枕 submitted on 2019-12-05 05:36:02

In Python, the urllib module provides a high-level interface that you can use to download and read remote data. As an example, let's fetch the Sina homepage HTML and print it. There are two ways to do this.

1. urlopen(url, data=None, proxies=None)

urlopen(url [, data]) -> open file-like object

Creates a file-like object representing the remote url, which you can then read like a local file to fetch the remote data. The url parameter is the path to the remote data, usually a URL; data is data submitted to the url with a POST request; proxies sets a proxy. urlopen returns a file-like object.

```python
#!/usr/bin/python2.5
import urllib

url = "http://www.sina.com"
data = urllib.urlopen(url).read()
print data
```

Running it:

root@10.1.6.200:~# python gethtml.py
<!Doctype html>
<!--[30,131,1] published at 2013-04-11 23:15:33 from #150 by system-->
<html>
<head>
<meta http-equiv="Content-type" content="text/html; charset=gb2312" /
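The snippet above is Python 2. On Python 3, urlopen moved to urllib.request; a minimal sketch of the equivalent (the helper name `fetch` and the gb2312 hint are my own additions, not from the original post):

```python
from urllib.request import urlopen

def fetch(url, encoding='utf-8'):
    """Read a URL (http://, or file:// for local tests) and return decoded text."""
    # urlopen still returns a file-like object, but read() yields bytes
    # in Python 3, so we decode before returning
    with urlopen(url) as sock:
        return sock.read().decode(encoding, errors='replace')

# e.g. fetch("http://www.sina.com", encoding='gb2312') mirrors the
# Python 2 example above (the page declares charset=gb2312)
```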

Build a query string using urlencode in Python

点点圈 submitted on 2019-12-05 03:56:52

I am trying to build a URL so that I can send a GET request to it using the urllib module. Let's suppose my final_url should be:

url = "www.example.com/find.php?data=http%3A%2F%2Fwww.stackoverflow.com&search=Generate+value"

To achieve this I tried the following:

```python
>>> initial_url = "http://www.stackoverflow.com"
>>> search = "Generate+value"
>>> params = {"data": initial_url, "search": search}
>>> query_string = urllib.urlencode(params)
>>> query_string
'search=Generate%2Bvalue&data=http%3A%2F%2Fwww.stackoverflow.com'
```

Now if you compare my query_string with the format of final_url you can
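The `+` got double-encoded to `%2B` because the value was pre-encoded. Letting urlencode do all the quoting itself avoids this: pass the raw string "Generate value" and its default quote_plus turns the space into `+`. A sketch using Python 3's urllib.parse (on Python 2 the same function lives at urllib.urlencode):

```python
from urllib.parse import urlencode

# pass the *unencoded* value; urlencode percent-encodes the URL in
# "data" and turns the space in "search" into '+'
params = {"data": "http://www.stackoverflow.com", "search": "Generate value"}
query_string = urlencode(params)
print(query_string)
# data=http%3A%2F%2Fwww.stackoverflow.com&search=Generate+value

final_url = "http://www.example.com/find.php?" + query_string
```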

Basic usage of urllib (overview)

烂漫一生 submitted on 2019-12-05 03:15:53

1. urllib.request

1) urlopen

```python
from urllib import request

r = request.urlopen('http://www.baidu.com/')
# status code
print(r.status)
# response headers
print(r.getheaders())
print('=' * 30)
# page source
print(r.read().decode('utf-8'))
```

Note: passing data (bytes) to urlopen() makes it a POST request; timeout sets the timeout.

2) Request

```python
from urllib import request

# build a request object
req = request.Request('https://www.cnblogs.com/')
# open the page
r = request.urlopen(req)
print(r.read().decode('utf-8'))
```

Note: data (bytes; a dict must go dict -> str -> bytes), headers={}, method=. Use a Handler to implement authentication, cookies, proxies, and so on.

2. urllib.error, for handling exceptions

from urllib.error import URLError, HTTPError

Handle them with try...except. Note
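The try...except pattern mentioned above might look like the sketch below. The `.invalid` hostname is a deliberately unresolvable placeholder, and HTTPError has to be caught before URLError because it is a subclass of it:

```python
from urllib import request
from urllib.error import URLError, HTTPError

def fetch(url):
    """Return the page text, or None if the request fails."""
    try:
        with request.urlopen(url, timeout=5) as r:
            return r.read().decode('utf-8')
    except HTTPError as e:   # the server answered, but with an error status
        print('HTTP error:', e.code)
    except URLError as e:    # no answer at all: DNS failure, refused connection, ...
        print('URL error:', e.reason)
    return None

print(fetch('http://nonexistent.invalid/'))  # prints a URL error, then None
```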

Check for `urllib.urlretrieve(url, file_name)` Completion Status

三世轮回 submitted on 2019-12-05 03:05:08

Question: How do I check whether urllib.urlretrieve(url, file_name) has completed before allowing my program to advance to the next statement? Take for example the following code snippet:

```python
import traceback
import sys
import Image
from urllib import urlretrieve

try:
    print "Downloading gif....."
    urlretrieve(imgUrl, "tides.gif")
    # Allow time for image to download/save:
    time.sleep(5)
    print "Gif Downloaded."
except:
    print "Failed to Download new GIF"
    raw_input('Press Enter to exit...')
    sys.exit()

try:
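In fact urlretrieve is synchronous: it does not return until the file is fully written (or an exception is raised), so the time.sleep(5) above is unnecessary. A sketch against a local file:// URL so it runs without a network (the .gif name and contents are just stand-ins):

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlretrieve  # 'from urllib import urlretrieve' in Python 2

# create a local source file to stand in for the remote image
src = tempfile.NamedTemporaryFile(delete=False, suffix='.gif')
src.write(b'GIF89a demo bytes')
src.close()

dest = src.name + '.copy'
# urlretrieve blocks until the transfer finishes, so the very next
# statement can safely assume the file is complete
path, headers = urlretrieve(Path(src.name).as_uri(), dest)
print(os.path.getsize(path) == os.path.getsize(src.name))  # True
```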

Urllib's urlopen breaking on some sites (e.g. StackApps api): returns garbage results

好久不见. submitted on 2019-12-05 02:31:08

I'm using urllib2's urlopen function to try to get a JSON result from the StackOverflow API. The code I'm using:

```python
>>> import urllib2
>>> conn = urllib2.urlopen("http://api.stackoverflow.com/0.8/users/")
>>> conn.readline()
```

The result I'm getting:

'\x1f\x8b\x08\x00\x00\x00\x00\x00\x04\x00\xed\xbd\x07`\x1cI\x96%&/m\xca{\x7fJ\...

I'm fairly new to urllib, but this doesn't seem like the result I should be getting. I've tried it in other places and I get what I expect (the same as visiting the address with a browser gives me: a JSON object). Using urlopen on other sites (e.g. "http://google.com"
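Those leading bytes \x1f\x8b are the gzip magic number: this particular API serves gzip-compressed bodies, so the response has to be decompressed before parsing. An offline sketch of the check and the fix (the JSON payload is made up for illustration):

```python
import gzip

# stand-in for the compressed bytes the API returns
body = gzip.compress(b'{"users": []}')

# gzip streams always start with the magic bytes 1f 8b
if body[:2] == b'\x1f\x8b':
    body = gzip.decompress(body)

text = body.decode('utf-8')
print(text)  # {"users": []}
```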

urllib.parse

我的未来我决定 submitted on 2019-12-05 02:10:22

1. urlparse: parse a URL

```python
from urllib import parse

url = "https://book.qidian.com/info/1004608738"
result = parse.urlparse(url=url)
print(result)
```

Result:

ParseResult(scheme='https', netloc='book.qidian.com', path='/info/1004608738', params='', query='', fragment='')

scheme: the protocol
netloc: the domain
path: the path
params: parameters
query: the query string, typically the parameters of a GET request
fragment: the anchor, used to jump straight to a given position within the page

2. urlunparse: build a URL back from its parts

```python
from urllib import parse

url_params = ('https', 'book.qidian.com', '/info/1004608738', '', '', '')
_url = parse.urlunparse(url_params)
print(_url)  # https://book.qidian.com/info/1004608738
```

3. urljoin: join URLs

from
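The urljoin example is cut off above; a sketch of its typical behaviour, reusing the same base URL (the joined paths are illustrative):

```python
from urllib.parse import urljoin

base = 'https://book.qidian.com/info/1004608738'

# an absolute path replaces the whole path of the base URL
print(urljoin(base, '/catalog'))    # https://book.qidian.com/catalog

# a relative path replaces only the last segment
print(urljoin(base, '2005310812'))  # https://book.qidian.com/info/2005310812
```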

Python urllib urlopen not working

不打扰是莪最后的温柔 submitted on 2019-12-05 01:57:44

I am just trying to fetch data from a live web page using the urllib module, so I wrote a simple example. Here is my code:

```python
import urllib

sock = urllib.request.urlopen("http://diveintopython.org/")
htmlSource = sock.read()
sock.close()
print (htmlSource)
```

But I got an error like:

Traceback (most recent call last):
  File "D:\test.py", line 3, in <module>
    sock = urllib.request.urlopen("http://diveintopython.org/")
AttributeError: 'module' object has no attribute 'request'

You are reading the wrong documentation or running the wrong Python interpreter version: you tried to use the Python 3 library in Python 2.
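One way to make such a script run under either interpreter is to try the Python 3 location first and fall back, a common compatibility sketch:

```python
try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib2 import urlopen          # Python 2

# now urlopen("http://diveintopython.org/") works under both versions
```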

AttributeError: module 'urllib' has no attribute 'parse'

我是研究僧i submitted on 2019-12-04 22:17:31

python 3.5.2

Code 1:

```python
import urllib

s = urllib.parse.quote('"')
print(s)
```

It gave this error:

AttributeError: module 'urllib' has no attribute 'parse'

Code 2:

```python
from urllib.parse import quote
# import urllib
# s = urllib.parse.quote('"')
s = quote('"')
print(s)
```

It works...

Code 3:

```python
from flask import Flask
# from urllib.parse import quote
# s = quote('"')
import urllib

s = urllib.parse.quote('"')
print(s)
```

It works, too. Is it because of flask? Why don't I have the error anymore? Is it a bug?

The urllib package serves as a namespace only. There are other modules under urllib, like request and parse. For
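An offline way to see the namespace behaviour: the parse attribute appears on the package only once some module has imported the submodule, and presumably flask (or something it pulls in) does exactly that, which is why code 3 works:

```python
import urllib
import urllib.parse   # without this line, urllib.parse raises AttributeError

# once any module has executed "import urllib.parse", the submodule is
# bound as an attribute of the urllib package for every importer
s = urllib.parse.quote('"')
print(s)   # %22
```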

NameError: name 'urllib' is not defined

佐手、 submitted on 2019-12-04 22:09:00

Question:

CODE:

```python
import networkx as net
from urllib.request import urlopen

def read_lj_friends(g, name):
    # fetch the friend-list from LiveJournal
    response = urllib.urlopen('http://www.livejournal.com/misc/fdata.bml?user=' + name)
```

ERROR:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'urllib' is not defined

Answer 1: You've imported urlopen directly, so you should refer to it like that rather than via urllib:

response = urlopen('...')

Answer 2: You can also try in Python 3:
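Concretely, the function above just needs to call the name it imported. A sketch; the helper name and the defensive quote() of the user name are my additions, not from the original answer:

```python
from urllib.request import urlopen
from urllib.parse import quote

def lj_friends_url(name):
    # build the LiveJournal fdata URL; quote() guards odd characters
    return 'http://www.livejournal.com/misc/fdata.bml?user=' + quote(name)

# inside read_lj_friends, the call then becomes:
# response = urlopen(lj_friends_url(name))
print(lj_friends_url('test_user'))
```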

I'm trying to get proxies out of a web page using regex in Python

北城以北 submitted on 2019-12-04 21:23:39

```python
import urllib.request
import re

page = urllib.request.urlopen("http://www.samair.ru/proxy/ip-address-01.htm").read()
re.findall('\d+\.\d+\.\d+\.\d+', page)
```

I don't understand why it says:

  File "C:\Python33\lib\re.py", line 201, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object

```python
import urllib
import re

page = urllib.urlopen("http://www.samair.ru/proxy/ip-address-01.htm").read()
print re.findall('\d+\.\d+\.\d+\.\d+', page)
```

Worked and gave me the result:

['056.249.66.50', '100.44.124.8', '103.31.250.115', ...

Edit: This works for
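The second snippet only works on Python 2, where read() returns str. Under Python 3 the fix is either to decode the bytes before matching or to use a bytes pattern; an offline sketch with made-up page content:

```python
import re

# stand-in for the bytes that urlopen(...).read() returns in Python 3
page = b'proxies: 056.249.66.50 and 100.44.124.8'

# option 1: decode to str, then use a str pattern
ips = re.findall(r'\d+\.\d+\.\d+\.\d+', page.decode('utf-8'))

# option 2: keep the bytes and use a bytes pattern
ips_b = re.findall(rb'\d+\.\d+\.\d+\.\d+', page)

print(ips)    # ['056.249.66.50', '100.44.124.8']
```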