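1. CSV
Introduction:
CSV (Comma-Separated Values) is a format in which fields are separated by commas. It is not truly structured data: a CSV file is just raw strings delimited by commas. You could extract the fields with str.split(','), but that breaks as soon as a field value itself contains a comma, so Python provides the csv module in the standard library for parsing and generating CSV.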
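To make the embedded-comma problem concrete, here is a minimal sketch comparing str.split(',') with csv.reader. The sample line is a hypothetical record written for this illustration, not data from a real file.

```python
import csv

# Hypothetical record whose last field contains an embedded comma.
line = '9,Web Client and Server,"base64,urllib"'

# Naive splitting cuts the quoted field into two pieces.
naive = line.split(',')

# csv.reader accepts any iterable of lines and honors the quotes.
parsed = next(csv.reader([line]))

print(naive)   # ['9', 'Web Client and Server', '"base64', 'urllib"']
print(parsed)  # ['9', 'Web Client and Server', 'base64,urllib']
```

The naive split yields four pieces and leaves stray quote characters in the data, while csv.reader returns the three intended fields.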
eg: This script writes data out in CSV format and then reads it back in.
input:
import csv

# print() replaces the original distutils.log.warn alias; distutils was removed in Python 3.12
DATA = (
    (9, 'Web Client and Server', 'base64,urllib'),
    (10, 'Web Programming:CGI & WSGI', 'cgi,time,wsgiref'),
    (11, 'Web Services', 'urllib, twython'),
)

print('***WRITING CSV DATA')
# newline='' lets the csv module control line endings itself
with open('bookdata.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for record in DATA:
        writer.writerow(record)

print('***REVIEW OF SAVED DATA')
with open('bookdata.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    for chap, title, modpkgs in reader:
        print('Chapter %s: %r (featuring %s)' % (chap, title, modpkgs))
output:
***WRITING CSV DATA
***REVIEW OF SAVED DATA
Chapter 9: 'Web Client and Server' (featuring base64,urllib)
Chapter 10: 'Web Programming:CGI & WSGI' (featuring cgi,time,wsgiref)
Chapter 11: 'Web Services' (featuring urllib, twython)
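The round trip above works because csv.writer quotes any field that contains a comma. The sketch below is an illustration only: it uses an in-memory io.StringIO buffer as a stand-in for bookdata.csv so that the generated text can be inspected directly.

```python
import csv
import io

DATA = (
    (9, 'Web Client and Server', 'base64,urllib'),
    (11, 'Web Services', 'urllib, twython'),
)

# In-memory stand-in for bookdata.csv (an assumption made for this sketch).
buf = io.StringIO(newline='')
csv.writer(buf).writerows(DATA)
text = buf.getvalue()
print(text)  # fields that contain commas are wrapped in double quotes

# Reading the text back yields lists of strings: CSV carries no type information.
rows = list(csv.reader(io.StringIO(text, newline='')))
print(rows)
```

Note that the chapter numbers come back as the strings '9' and '11'; converting them to int is left to the caller.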
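2. JSON
Introduction:
JSON stands for JavaScript Object Notation. As the name suggests, it comes from the JavaScript world: JSON is a subset of JavaScript syntax designed specifically for describing structured data, and it transmits that data in a form that is easy for humans to read. More information is available at http://json.org
Python has supported JSON through the standard library json module since version 2.6. The module provides dump() and load() for writing to and reading from file objects, along with dumps() and loads() for working with strings.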
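As a minimal sketch of how these interfaces relate, the following compares the string-based dumps()/loads() pair with the file-based dump()/load() pair. The record is hypothetical sample data, and an io.StringIO buffer stands in for a real file.

```python
import io
import json

record = {'title': 'core python', 'year': 2007}  # hypothetical sample data

# dumps()/loads() convert to and from a str.
text = json.dumps(record)
print(text)

# dump()/load() do the same against a file-like object.
buf = io.StringIO()      # stand-in for a file opened in text mode
json.dump(record, buf)
buf.seek(0)              # rewind before reading back
restored = json.load(buf)
print(restored)
```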
eg1: JSON objects look a lot like Python dictionaries. The following example converts between the two.
input:
import json
dictionary = dict(zip('abcde', range(5)))  # a Python dict
print(dictionary)
dict2json = json.dumps(dictionary)  # serialize the dict to a JSON string (type str)
print(dict2json)
json2dict = json.loads(dict2json)  # the inverse of dumps(): parse the JSON string back into a dict
print(json2dict)
output:
{'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}
{"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}
{'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}
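The round trip in eg1 is lossless only because the dictionary uses string keys and integer values. The sketch below, using a hypothetical dictionary chosen for illustration, shows where JSON objects and Python dictionaries differ.

```python
import json

# Hypothetical dict with a non-string key and a tuple value.
original = {1: 'one', 'pair': (1, 2), 'flag': True, 'nothing': None}

text = json.dumps(original)
print(text)      # {"1": "one", "pair": [1, 2], "flag": true, "nothing": null}

restored = json.loads(text)
print(restored)  # {'1': 'one', 'pair': [1, 2], 'flag': True, 'nothing': None}

# The int key became a str and the tuple became a list.
print(restored == original)  # False
```

JSON object keys are always strings and JSON has a single array type, so these conversions happen silently during serialization.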
eg2: Convert a Python dictionary to JSON and display it in several formats.
input:
from json import dumps
from pprint import pprint

# print() replaces the original distutils.log.warn alias; distutils was removed in Python 3.12
BOOKS = {
    '001': {
        'title': 'core python',
        'edition': '2',
        'year': '7',
    },
    '002': {
        'title': 'python web',
        'authors': ['jeff', 'paul', 'wesley'],
        'year': '2009',
    },
    '003': {
        'title': 'python fundamentals',
        'year': '2009',
    },
}

print('*** raw dict ***')
print(BOOKS)
print('\n*** pretty_printed dict ***')
pprint(BOOKS)
print('\n*** raw json ***')
print(dumps(BOOKS))
print('\n*** pretty_printed json ***')
print(dumps(BOOKS, indent=4))
output:
*** raw dict ***
{'001': {'title': 'core python', 'edition': '2', 'year': '7'}, '002': {'title': 'python web', 'authors': ['jeff', 'paul', 'wesley'], 'year': '2009'}, '003': {'title': 'python fundamentals', 'year': '2009'}}
*** pretty_printed dict ***
{'001': {'edition': '2', 'title': 'core python', 'year': '7'},
 '002': {'authors': ['jeff', 'paul', 'wesley'],
         'title': 'python web',
         'year': '2009'},
 '003': {'title': 'python fundamentals', 'year': '2009'}}
*** raw json ***
{"001": {"title": "core python", "edition": "2", "year": "7"}, "002": {"title": "python web", "authors": ["jeff", "paul", "wesley"], "year": "2009"}, "003": {"title": "python fundamentals", "year": "2009"}}
*** pretty_printed json ***
{
    "001": {
        "title": "core python",
        "edition": "2",
        "year": "7"
    },
    "002": {
        "title": "python web",
        "authors": [
            "jeff",
            "paul",
            "wesley"
        ],
        "year": "2009"
    },
    "003": {
        "title": "python fundamentals",
        "year": "2009"
    }
}
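3. XML
Introduction:
XML is also used to represent structured data. Although XML is plain text, it is hard to call it human-readable; it really only becomes readable with the help of a parser. XML has been around for a long time and has been used more widely than JSON.
Python first supported XML in v1.5 through the xmllib module, which later evolved into the xml package. Version 2.5 added ElementTree to the standard library, giving more mature XML support: it is a widely used, fast, and Pythonic XML document parser and generator.
eg1: Convert a Python dictionary to XML and display it in several formats.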
input:
# xml.etree.cElementTree was removed in Python 3.9; ElementTree uses the C accelerator automatically
from xml.etree.ElementTree import Element, SubElement, tostring
from xml.dom.minidom import parseString

BOOKS = {
    '001': {
        'title': 'core python',
        'edition': '2',
        'year': '7',
    },
    '002': {
        'title': 'python web',
        'authors': 'jeff:paul:wesley',
        'year': '2009',
    },
    '003': {
        'title': 'python fundamentals',
        'year': '2009',
    },
}

books = Element('books')
for isbn, info in BOOKS.items():
    book = SubElement(books, 'book')
    info.setdefault('authors', 'wesley chun')  # default author when none is given
    info.setdefault('edition', 1)              # default edition
    for key, val in info.items():
        # element text must be a string; ':'-separated authors become ','-separated
        SubElement(book, key).text = ','.join(str(val).split(':'))

xml = tostring(books)  # returns bytes
print('*** raw xml ***')
print(xml)

print('\n*** pretty-printed xml ***')
dom = parseString(xml)
print(dom.toprettyxml(' '))

print('*** flat structure ***')
for elmt in books.iter():
    print(elmt.tag, '-', elmt.text)

print('\n*** titles only ***')
for book in books.findall('.//title'):
    print(book.text)
output:
*** raw xml ***
b'<books><book><title>core python</title><edition>2</edition><year>7</year><authors>wesley chun</authors></book><book><title>python web</title><authors>jeff,paul,wesley</authors><year>2009</year><edition>1</edition></book><book><title>python fundamentals</title><year>2009</year><authors>wesley chun</authors><edition>1</edition></book></books>'
*** pretty-printed xml ***
<?xml version="1.0" ?>
<books>
 <book>
  <title>core python</title>
  <edition>2</edition>
  <year>7</year>
  <authors>wesley chun</authors>
 </book>
 <book>
  <title>python web</title>
  <authors>jeff,paul,wesley</authors>
  <year>2009</year>
  <edition>1</edition>
 </book>
 <book>
  <title>python fundamentals</title>
  <year>2009</year>
  <authors>wesley chun</authors>
  <edition>1</edition>
 </book>
</books>
*** flat structure ***
books - None
book - None
title - core python
edition - 2
year - 7
authors - wesley chun
book - None
title - python web
authors - jeff,paul,wesley
year - 2009
edition - 1
book - None
title - python fundamentals
year - 2009
authors - wesley chun
edition - 1
*** titles only ***
core python
python web
python fundamentals
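The example above goes from a dictionary to XML. As a rough sketch of the reverse direction, the following parses an XML string back into a list of dictionaries with ElementTree's fromstring(). The XML text is a shortened, hand-written sample modeled on the output above, not the exact output.

```python
from xml.etree.ElementTree import fromstring

# Shortened sample modeled on the output above (an assumption for this sketch).
xml_text = (
    '<books>'
    '<book><title>core python</title><year>7</year></book>'
    '<book><title>python web</title><year>2009</year></book>'
    '</books>'
)

root = fromstring(xml_text)

# Each <book> becomes a dict mapping child tag names to their text.
records = [{child.tag: child.text for child in book}
           for book in root.findall('book')]
print(records)
```

As with CSV, every value comes back as a string, so any numeric conversion has to be done explicitly.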
eg2: Display the current top headlines (five by default) and their corresponding links from the Google News service.
input:
from io import BytesIO
from urllib.request import urlopen
from pprint import pprint
from xml.etree import ElementTree

# 'topic=h' selects the top headlines
g = urlopen('https://news.google.com/news?topic=h&output=rss')
f = BytesIO(g.read())
g.close()
tree = ElementTree.parse(f)  # parse the XML with ElementTree
f.close()

def topnews(count=5):  # yield 5 headlines by default
    pair = [None, None]
    skip = False
    for elmt in tree.iter():  # getiterator() was removed in Python 3.9
        if elmt.tag == 'title':
            # The feed starts with a section title, so check whether this is
            # that section title or a real headline
            skip = elmt.text.startswith('Top Stories')
            if skip:
                continue
            pair[0] = elmt.text
        if elmt.tag == 'link':
            if skip:
                continue
            pair[1] = elmt.text
            if pair[0] and pair[1]:  # yield only once both title and link are present
                count -= 1
                yield tuple(pair)
                if not count:
                    return
                pair = [None, None]

for news in topnews():
    pprint(news)
output:
('This RSS feed URL is deprecated', 'https://news.google.com/news')
('Keeping Summit Hopes Alive Suggests Kim Jong-un May Need a Deal - New York '
'Times',
'http://news.google.com/news/url?sa=t&fd=R&ct2=us&usg=AFQjCNEp8ER18RwtH8pZKJBxzAXKyAUMHA&clid=c3a7d30bb8a4878e06b80cf16b898331&cid=52779907582906&ei=RSAJW4i4JIK94gKYwp24Dg&url=https://www.nytimes.com/2018/05/26/world/asia/kim-summit-trump.html')
('Science teacher who tackled student gunman among 2 wounded at Indiana middle '
'school - Chicago Tribune',
'http://news.google.com/news/url?sa=t&fd=R&ct2=us&usg=AFQjCNF_qIL2IA4IJRhgVGx6jDWacZIeOg&clid=c3a7d30bb8a4878e06b80cf16b898331&cid=52779910889936&ei=RSAJW4i4JIK94gKYwp24Dg&url=http://www.chicagotribune.com/news/nationworld/midwest/ct-noblesville-west-middle-school-20180525-story.html')
("Trump says he'll spare Chinese telecom firm ZTE from collapse, defying "
'lawmakers - Washington Post',
'http://news.google.com/news/url?sa=t&fd=R&ct2=us&usg=AFQjCNEF9XmwZpRtbcM7yq9Fw_ieMmYv6g&clid=c3a7d30bb8a4878e06b80cf16b898331&cid=52779911079543&ei=RSAJW4i4JIK94gKYwp24Dg&url=https://www.washingtonpost.com/business/economy/congress-threatens-to-block-deal-between-white-house-china-to-save-telecom-giant-zte/2018/05/25/1db326ba-604a-11e8-9ee3-49d6d4814c4c_story.html')
('USC President CL Max Nikias to step down - Los Angeles Times',
'http://news.google.com/news/url?sa=t&fd=R&ct2=us&usg=AFQjCNEFRNpZDmQoOJGsesa5yUSgga0fbA&clid=c3a7d30bb8a4878e06b80cf16b898331&cid=52779910766807&ei=RSAJW4i4JIK94gKYwp24Dg&url=http://www.latimes.com/local/lanow/la-me-max-nikias-usc-20180525-story.html')
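Because the first result above reports that the feed URL is deprecated, it can help to exercise the parsing logic without a network connection. The sketch below is an offline variant under stated assumptions: SAMPLE_RSS is a made-up feed with invented titles and example.com links, and the generator walks the item elements directly instead of scanning every title and link tag with a skip flag as the original does.

```python
from io import BytesIO
from xml.etree import ElementTree

# Hypothetical stand-in for the downloaded feed; titles and links are made up.
SAMPLE_RSS = b"""<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Top Stories - Example News</title>
<link>https://example.com/news</link>
<item><title>First headline</title><link>https://example.com/1</link></item>
<item><title>Second headline</title><link>https://example.com/2</link></item>
<item><title>Third headline</title><link>https://example.com/3</link></item>
</channel></rss>"""

def topnews(tree, count=5):
    """Yield up to count (title, link) pairs taken from <item> elements."""
    for item in tree.iter('item'):
        if count <= 0:
            return
        title = item.findtext('title')
        link = item.findtext('link')
        if title and link:  # skip items missing either part
            count -= 1
            yield title, link

tree = ElementTree.parse(BytesIO(SAMPLE_RSS))
headlines = list(topnews(tree, count=2))
for pair in headlines:
    print(pair)
```

Restricting the search to item elements skips the channel-level title and link without needing to know the section title text in advance.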
4. References
Wesley Chun. Python核心编程: 第3版 (Core Python Applications Programming, 3rd ed.) [M]. Posts & Telecom Press, 2016.
Source: CSDN
Author: Monokaix
Link: https://blog.csdn.net/cxzzxc123456/article/details/80462665