Module definition, import, and essence
Definition: a module logically organizes Python code (variables, functions, classes, logic) to implement a feature
The essence of import: path search and the search path
(1) import x (a module name)
x = all the code in the module
x.name, x.logger() -- access its variables or functions through the module name
(2) from x import name (a specific name)
name = 'uson' -- use the variable or function directly
A module is essentially a Python file ending in .py; importing it means interpreting that file once.
(import the module) import module_name --> (find the .py file) module.py --> (find its path) the path of module.py [path search] --> sys.path (the list of search paths; PyCharm turns relative paths into absolute ones) [search path]. If the imported module sits under one of those directories the import succeeds; otherwise it fails. What then?
Add the directory containing the module to the search path:
    sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
Better: insert it at the front of the list so it is searched first:
    sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))  # ['D:\\python\\oldboy\\5\\module', ...]
Ways to import:
(1) import module1, module2, ...
(2) from module import *  (every name in the module) --> not recommended
(3) from module import name as alias
(4) from module import m1, m2, m3, ...  # note: the module (and likewise a package) must sit directly under one of the configured search-path directories
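The forms above can be tried with a module that is always on the search path; the standard-library math module is used here purely as a stand-in for any importable module:

```python
import math                 # form (1): plain import; names are reached as math.xxx
import math as m            # form (3): import under an alias
from math import sqrt, pi   # form (4): pull several names in directly

print(math.floor(3.7))  # 3
print(m.ceil(3.2))      # 4
print(sqrt(16))         # 4.0
```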
Package definition, import, and essence
Definition: a directory (it must contain an __init__.py file) used to logically organize modules
The essence of importing a package: executing that package's __init__.py file
(1) Importing the package runs __init__.py, so the modules you want exposed must be imported inside __init__.py (i.e. the import statements go into __init__.py)
(2) For example, to import a module that sits in the same directory as __init__.py: from . import module_name  # module_name = all the code in module_name.py
Example of importing a package across directories:
Importing the package pychamr from the file im: 1. in im, configure the search path (at least up to the package's parent directory, module); 2. import pychamr; 3. call things as pychamr.<module imported in __init__>.<variable or function>
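The three steps can be sketched end to end by building a throwaway package on disk; the names mypkg and mod below are invented for illustration:

```python
import os
import sys
import tempfile

# Lay out <tmp>/mypkg/{__init__.py, mod.py} (hypothetical names)
base = tempfile.mkdtemp()
pkg = os.path.join(base, 'mypkg')
os.makedirs(pkg)
with open(os.path.join(pkg, '__init__.py'), 'w') as f:
    f.write('from . import mod\n')     # __init__.py imports the sibling module
with open(os.path.join(pkg, 'mod.py'), 'w') as f:
    f.write("name = 'uson'\n")

sys.path.insert(0, base)   # step 1: put the package's parent directory on the search path
import mypkg               # step 2: this executes mypkg/__init__.py
print(mypkg.mod.name)      # step 3: package.module.attribute -> uson
```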
Module notes:
While importing packages I hit what looked like strange behavior; the reason is that relative imports only work when a file runs as part of a package:
(1) When a script that is run directly imports a sibling module, from . import <sibling> fails (a directly-run script has no package context); only import <sibling> lets the file execute on its own
(2) Conversely, inside a package's __init__.py you cannot use a plain import <sibling>; only from . import <sibling> works for the caller, because a plain import is resolved against sys.path rather than against the package
The names after from and import (siblings of the module directory) must sit directly under a configured path (directory 5): ['D:\\python\\5', 'D:\\python\\5\\module', ...]
Module categories:
1. Standard library (time, os, ...)
time module (wraps the underlying C library)
Ways to represent time:
1) timestamp: seconds since the epoch
2) formatted time string
3) struct_time (a tuple of 9 fields): time.localtime() --> local time. Note: tm_wday counts from Monday = 0 (so Friday is 4)
time.struct_time(tm_year=2019, tm_mon=9, tm_mday=13, tm_hour=13, tm_min=29, tm_sec=52, tm_wday=4, tm_yday=256, tm_isdst=0)  # tm_isdst: whether daylight-saving time is in effect
UTC is the world standard time; China is on UTC+8 (the east-eight zone).
Time zones: 360° / 15° = 24 zones. World time is measured from the prime meridian (0°); China lies 8 zones east of it, so Chinese local time runs 8 hours ahead of UTC (the sun is seen 8 hours earlier, so the local clock reads later than the standard one).
word_time = time.timezone
china_time = int(word_time) / 3600
print("Offset between China and UTC: %s hours" % china_time)  # -8.0 hours
help(time) lists everything the time module provides
(1) time
Example code:
import time

current = time.time()   # timestamp in seconds, e.g. 1568350840.6369312
hour = current / 3600
day = hour / 24
year = day / 365
start_date = 2019 - int(year)
print("Year the Unix epoch started:", start_date)   # 1970

word_time = time.timezone
china_time = int(word_time) / 3600
print("Offset between China and UTC: %s hours" % china_time)  # -8.0 hours
print(time.altzone)   # -32400: offset of the DST zone from UTC
print(time.daylight)  # 0: whether a DST zone is defined
# time.sleep(1)

# (I) timestamp -> struct_time
print(time.gmtime())   # no argument -> current time in the UTC zone (local clock here: 2019-9-13 14:30)
# time.struct_time(tm_year=2019, tm_mon=9, tm_mday=13, tm_hour=6, tm_min=30, tm_sec=12, tm_wday=4, tm_yday=256, tm_isdst=0)
print(time.gmtime(23422544738))   # UTC time for that stamp, far in the future
print(time.gmtime(2))             # UTC time just after the epoch, 1970...

# struct_time has 9 fields; tm_isdst flags daylight-saving time
print(time.localtime())           # no argument -> current local time (UTC+8)
local = time.localtime(123232)
print(local)
# time.struct_time(tm_year=1970, tm_mon=1, tm_mday=2, tm_hour=18, tm_min=13, tm_sec=52, tm_wday=4, tm_yday=2, tm_isdst=0)
print(local.tm_year, local.tm_mday)   # 1970 2

# (II) struct_time -> timestamp
local = time.localtime()
print(time.mktime(local))   # converts local time back to a stamp: 1568357166.0

# (III) struct_time -> formatted string: time.strftime(format[, tuple])
t1 = time.strftime("%Y-%m-%d %H:%M:%S")
t2 = time.strftime("%Y-%m-%d %X")
t3 = time.strftime("%y-%m-%d %X")
print(t1, t2, t3)   # 2019-09-13 14:54:30  2019-09-13 14:54:30  19-09-13 14:54:30
local = time.localtime(1232422)
f_local = time.strftime("%Y-%m-%d %H:%M:%S", local)
print(f_local)      # 1970-01-15 14:20:22

# (IV) formatted string -> struct_time: time.strptime(string, format)
tup = time.strptime("1970-01-15 14:20:22", "%Y-%m-%d %H:%M:%S")
print(tup)
# time.struct_time(tm_year=1970, tm_mon=1, tm_mday=15, tm_hour=14, tm_min=20, tm_sec=22, tm_wday=3, tm_yday=15, tm_isdst=-1)

# struct_time -> fixed-format string (%a %b %d %H:%M:%S %Y)
local = time.localtime(1232422)
asc1 = time.asctime()        # no argument: current local time, e.g. Fri Sep 13 15:41:13 2019
asc2 = time.asctime(local)   # given struct_time: Thu Jan 15 14:20:22 1970
print(asc1, asc2)

# timestamp -> fixed-format string (%a %b %d %H:%M:%S %Y)
local = time.ctime()         # current local time, e.g. Fri Sep 13 15:56:47 2019
stamp = time.ctime(123322)   # given stamp: Fri Jan  2 18:15:22 1970
print(local, stamp)
Conversions among the three representations (timestamp, struct_time, formatted string):
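A minimal round trip through all three forms (the sample string comes from the strftime example above):

```python
import time

tup = time.strptime("1970-01-15 14:20:22", "%Y-%m-%d %H:%M:%S")   # string -> struct_time
stamp = time.mktime(tup)                                          # struct_time -> timestamp (local time)
back = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(stamp))  # timestamp -> struct_time -> string
print(back)  # 1970-01-15 14:20:22
```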
(2) datetime
Example code:
import datetime, time

print(datetime.datetime.now())     # current local (UTC+8) time: 2019-09-13 16:08:23.626636
print(datetime.datetime.utcnow())  # current UTC time:           2019-09-13 08:08:23.626636

# timedelta: a time span; it is not used on its own, add it to a datetime
print(datetime.datetime.now() + datetime.timedelta(3))           # now + 3 days
print(datetime.datetime.now() + datetime.timedelta(-3))          # now - 3 days
print(datetime.datetime.now() + datetime.timedelta(hours=3))     # now + 3 hours
print(datetime.datetime.now() + datetime.timedelta(minutes=30))  # now + 30 minutes
print(datetime.datetime.now() + datetime.timedelta(seconds=30))  # now + 30 seconds

# Extras
# Build a formatted date-and-time string from explicit fields
print(datetime.datetime(year=2019, month=9, day=13, hour=16, minute=16, second=16))  # 2019-09-13 16:16:16
print(datetime.datetime(2019, 9, 13, 16, 16, 16))         # 2019-09-13 16:16:16
print(datetime.datetime(2019, 9, 13, 16, 16, 16, 42212))  # 2019-09-13 16:16:16.042212
# Date only (year, month, day)
print(datetime.date(year=2019, month=9, day=13))          # 2019-09-13
# Time only (hour, minute, second)
print(datetime.time(hour=16, minute=16, second=16))       # 16:16:16
# Timestamp straight to a date
print(datetime.date.fromtimestamp(time.time()))           # 2019-09-13
# Replace a field of the current time
print(datetime.datetime.now().replace(hour=9))
# before: 2019-09-13 16:32:30.843760  ->  after: 2019-09-13 09:32:30.843760
random module
Example code walking through its methods:
#!/usr/bin/env python
# Author: USON
import random

print(random.random())        # float in [0, 1), end excluded, e.g. 0.06706695529343099
print(random.uniform(0, 10))  # float in [0, 10), end excluded, e.g. 7.836148583000067
print(random.randint(1, 3))   # int in [1, 3], 3 included
print(random.randrange(1, 3)) # int in [1, 2]; like range(), the end is excluded
print(random.choice([1, 2, 4, 5, 6, 8]))     # one random element (works on str/list/tuple...)
print(random.sample([1, 2, 4, 5, 6, 8], 2))  # n random elements, e.g. [6, 8]

# Shuffle: fully randomize the order in place
items = [1, 2, 4, 6, 7, 8, 10]
random.shuffle(items)
print(items)  # e.g. [10, 6, 7, 8, 2, 4, 1]
Captcha (verification code) example:
import random

# Mixed letters + digits captcha
checkcode = ''
for i in range(4):
    rand_int = random.randint(0, 3)
    if i == rand_int:                      # letter position
        tmp = chr(random.randint(65, 90))  # random uppercase A-Z
    else:                                  # digit position
        tmp = random.randint(0, 9)
    checkcode += str(tmp)
print(checkcode)

# All-letter captcha
checkcode = ''   # reset, so each captcha starts fresh
for i in range(4):
    checkcode += chr(random.randint(65, 90))
print(checkcode)

# All-digit captcha
checkcode = ''   # reset again
for i in range(4):
    checkcode += str(random.randint(0, 9))
print(checkcode)
os module: provides an interface for making operating-system calls
os_path = os.path.join('run.BASE_DIR', 'database')  # run.BASE_DIR\database
import os   # interface to the operating system

os.getcwd()                   # current working directory, i.e. where this Python script runs
os.chdir("D:\\python\\uson")  # change the working directory, like shell cd
os.chdir(r"D:\python\uson")   # raw string avoids backslash escaping
os.curdir                     # '.'  the current directory
os.pardir                     # '..' the parent directory
os.makedirs('dirname1/dirname2')  # create nested directories recursively
os.removedirs('dirname1')     # remove the directory if empty, then recurse upward removing empty parents
os.mkdir('dirname')           # create a single directory, like mkdir dirname
os.rmdir('dirname')           # remove a single empty directory (error if not empty), like rmdir dirname
os.listdir('dirname')         # list all files and subdirectories (including hidden ones) as a list
os.remove()                   # delete a file
os.rename("oldname", "newname")   # rename a file/directory
os.stat('path/filename')      # file/directory information
os.sep                        # OS path separator: '\\' on Windows, '/' on Linux
os.linesep                    # line terminator: '\r\n' on Windows, '\n' on Linux
os.pathsep                    # search-path separator: ';' on Windows, ':' on Linux
os.name                       # platform name: Windows -> 'nt', Linux -> 'posix'
os.system("bash command")     # run a shell command; output goes straight to the screen
os.system('dir')
os.system('ipconfig/all')
os.environ                    # the system environment variables
os.path.abspath(path)         # normalized absolute path of path
os.path.split(path)           # (head, tail) pair: 'C:\\Users\\uson' -> ('C:\\Users', 'uson'); the path need not exist
os.path.dirname(path)         # directory part, i.e. os.path.split(path)[0]
os.path.basename(path)        # final component, i.e. os.path.split(path)[1]; empty if path ends with / or \
os.path.exists(path)          # True if path exists, False otherwise
os.path.isabs(path)           # True if path is absolute
os.path.isfile(path)          # True if path is an existing file
os.path.isdir(path)           # True if path is an existing directory
os.path.join(path1[, path2[, ...]])   # join paths; components before the last absolute one are discarded
os.path.getatime(path)        # last access time of path (timestamp)
os.path.getmtime(path)        # last modification time of path (timestamp)
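A few of the os.path helpers above, exercised on a portable relative path:

```python
import os

p = os.path.join('pkg', 'sub', 'mod.py')  # join components with the OS separator
head, tail = os.path.split(p)             # split off the last component
print(tail)                        # mod.py
print(os.path.basename(p))         # mod.py, same as split(p)[1]
print(os.path.dirname(p) == head)  # True, dirname is split(p)[0]
print(os.path.isabs(p))            # False: the path is relative
```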
sys module
sys.argv       # command-line arguments as a list; the first element is the script's own path
sys.exit(n)    # exit the program; exit(0) is a normal exit
sys.version    # version information of the Python interpreter
sys.maxsize    # largest native integer size (Python 2's sys.maxint no longer exists in Python 3)
sys.path       # the module search path, seeded from the PYTHONPATH environment variable
sys.platform   # name of the OS platform
sys.stdout.write('please:')
val = sys.stdin.readline()[:-1]   # read a line and strip the trailing newline
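A small sketch of the read-only attributes above (the printed values depend on the interpreter and platform running it):

```python
import sys

print(sys.argv[0])              # path of the running script itself
print(sys.platform)             # e.g. 'linux', 'win32', 'darwin'
print(sys.version_info.major)   # 3 on any modern interpreter
print(len(sys.path) > 0)        # True: the search path is never empty
```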
shutil module
High-level handling of files, folders, and archives
shutil.copyfileobj(fsrc, fdst[, length])
Copies the contents of one file object into another; length allows copying in chunks (so partial copies are possible): shutil.copyfileobj(f1, f2)
import shutil

f1 = open('os.py', 'r', encoding='utf-8')   # source file
f2 = open('os2.py', 'w', encoding='utf-8')  # destination file
shutil.copyfileobj(f1, f2)
def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
shutil.copyfile(src, dst)
Copies a file: shutil.copyfile('os.py', 'os2.py')
def copyfile(src, dst):
    """Copy data from src to dst"""
    if _samefile(src, dst):
        raise Error("`%s` and `%s` are the same file" % (src, dst))
    for fn in [src, dst]:
        try:
            st = os.stat(fn)
        except OSError:
            # File most likely does not exist
            pass
        else:
            # XXX What about other special files? (sockets, devices...)
            if stat.S_ISFIFO(st.st_mode):
                raise SpecialFileError("`%s` is a named pipe" % fn)
    with open(src, 'rb') as fsrc:
        with open(dst, 'wb') as fdst:
            copyfileobj(fsrc, fdst)
shutil.copymode(src, dst)
Copies only the permission bits; contents, group, and owner are unchanged
def copymode(src, dst):
    """Copy mode bits from src to dst"""
    if hasattr(os, 'chmod'):
        st = os.stat(src)
        mode = stat.S_IMODE(st.st_mode)
        os.chmod(dst, mode)
shutil.copystat(src, dst)
Copies the stat information: mode bits, atime, mtime, flags
def copystat(src, dst):
    """Copy all stat info (mode bits, atime, mtime, flags) from src to dst"""
    st = os.stat(src)
    mode = stat.S_IMODE(st.st_mode)
    if hasattr(os, 'utime'):
        os.utime(dst, (st.st_atime, st.st_mtime))
    if hasattr(os, 'chmod'):
        os.chmod(dst, mode)
    if hasattr(os, 'chflags') and hasattr(st, 'st_flags'):
        try:
            os.chflags(dst, st.st_flags)
        except OSError as why:
            for err in 'EOPNOTSUPP', 'ENOTSUP':
                if hasattr(errno, err) and why.errno == getattr(errno, err):
                    break
            else:
                raise
shutil.copy(src, dst)
Copies the file data and permission bits
def copy(src, dst):
    """Copy data and mode bits ("cp src dst").

    The destination may be a directory.
    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copymode(src, dst)
shutil.copy2(src, dst)
Copies the file data and all stat information
def copy2(src, dst):
    """Copy data and all stat info ("cp -p src dst").

    The destination may be a directory.
    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copystat(src, dst)
shutil.ignore_patterns(*patterns)
shutil.copytree(src, dst, symlinks=False, ignore=None)
Recursively copies a directory tree of files
e.g.: copytree(source, destination, ignore=ignore_patterns('*.pyc', 'tmp*'))
def ignore_patterns(*patterns):
    """Function that can be used as copytree() ignore parameter.

    Patterns is a sequence of glob-style patterns
    that are used to exclude files"""
    def _ignore_patterns(path, names):
        ignored_names = []
        for pattern in patterns:
            ignored_names.extend(fnmatch.filter(names, pattern))
        return set(ignored_names)
    return _ignore_patterns

def copytree(src, dst, symlinks=False, ignore=None):
    """Recursively copy a directory tree using copy2().

    The destination directory must not already exist.
    If exception(s) occur, an Error is raised with a list of reasons.

    If the optional symlinks flag is true, symbolic links in the
    source tree result in symbolic links in the destination tree; if
    it is false, the contents of the files pointed to by symbolic
    links are copied.

    The optional ignore argument is a callable. If given, it
    is called with the `src` parameter, which is the directory
    being visited by copytree(), and `names` which is the list of
    `src` contents, as returned by os.listdir():

        callable(src, names) -> ignored_names

    Since copytree() is called recursively, the callable will be
    called once for each directory that is copied. It returns a
    list of names relative to the `src` directory that should
    not be copied.

    XXX Consider this example code rather than the ultimate tool.
    """
    names = os.listdir(src)
    if ignore is not None:
        ignored_names = ignore(src, names)
    else:
        ignored_names = set()

    os.makedirs(dst)
    errors = []
    for name in names:
        if name in ignored_names:
            continue
        srcname = os.path.join(src, name)
        dstname = os.path.join(dst, name)
        try:
            if symlinks and os.path.islink(srcname):
                linkto = os.readlink(srcname)
                os.symlink(linkto, dstname)
            elif os.path.isdir(srcname):
                copytree(srcname, dstname, symlinks, ignore)
            else:
                # Will raise a SpecialFileError for unsupported file types
                copy2(srcname, dstname)
        # catch the Error from the recursive copytree so that we can
        # continue with other files
        except Error as err:
            errors.extend(err.args[0])
        except EnvironmentError as why:
            errors.append((srcname, dstname, str(why)))
    try:
        copystat(src, dst)
    except OSError as why:
        if WindowsError is not None and isinstance(why, WindowsError):
            # Copying file access times may fail on Windows
            pass
        else:
            errors.append((src, dst, str(why)))
    if errors:
        raise Error(errors)
shutil.rmtree(path[, ignore_errors[, onerror]])
Recursively deletes a directory tree of files
def rmtree(path, ignore_errors=False, onerror=None):
    """Recursively delete a directory tree.

    If ignore_errors is set, errors are ignored; otherwise, if onerror
    is set, it is called to handle the error with arguments (func,
    path, exc_info) where func is os.listdir, os.remove, or os.rmdir;
    path is the argument to that function that caused it to fail; and
    exc_info is a tuple returned by sys.exc_info().  If ignore_errors
    is false and onerror is None, an exception is raised.
    """
    if ignore_errors:
        def onerror(*args):
            pass
    elif onerror is None:
        def onerror(*args):
            raise
    try:
        if os.path.islink(path):
            # symlinks to directories are forbidden, see bug #1669
            raise OSError("Cannot call rmtree on a symbolic link")
    except OSError:
        onerror(os.path.islink, path, sys.exc_info())
        # can't continue even if onerror hook returns
        return
    names = []
    try:
        names = os.listdir(path)
    except os.error:
        onerror(os.listdir, path, sys.exc_info())
    for name in names:
        fullname = os.path.join(path, name)
        try:
            mode = os.lstat(fullname).st_mode
        except os.error:
            mode = 0
        if stat.S_ISDIR(mode):
            rmtree(fullname, ignore_errors, onerror)
        else:
            try:
                os.remove(fullname)
            except os.error:
                onerror(os.remove, fullname, sys.exc_info())
    try:
        os.rmdir(path)
    except os.error:
        onerror(os.rmdir, path, sys.exc_info())
shutil.move(src, dst)
Recursively moves a file or directory
def move(src, dst):
    """Recursively move a file or directory to another location. This is
    similar to the Unix "mv" command.

    If the destination is a directory or a symlink to a directory, the source
    is moved inside the directory. The destination path must not already
    exist.

    If the destination already exists but is not a directory, it may be
    overwritten depending on os.rename() semantics.

    If the destination is on our current filesystem, then rename() is used.
    Otherwise, src is copied to the destination and then removed.
    A lot more could be done here...  A look at a mv.c shows a lot of
    the issues this implementation glosses over.
    """
    real_dst = dst
    if os.path.isdir(dst):
        if _samefile(src, dst):
            # We might be on a case insensitive filesystem,
            # perform the rename anyway.
            os.rename(src, dst)
            return
        real_dst = os.path.join(dst, _basename(src))
        if os.path.exists(real_dst):
            raise Error("Destination path '%s' already exists" % real_dst)
    try:
        os.rename(src, real_dst)
    except OSError:
        if os.path.isdir(src):
            if _destinsrc(src, dst):
                raise Error("Cannot move a directory '%s' into itself '%s'." % (src, dst))
            copytree(src, real_dst, symlinks=True)
            rmtree(src)
        else:
            copy2(src, real_dst)
            os.unlink(src)
shutil.make_archive(base_name, format,...)
Creates an archive and returns its path, e.g. zip or tar ("archive" = to bundle files together)

- base_name: the archive's file name, or a path to it. A bare name saves into the current directory; a path saves to that location.
    e.g. www => saved in the current directory
    e.g. /Users/wupeiqi/www => saved under /Users/wupeiqi/
- format: archive type: "zip", "tar", "bztar", "gztar" (zip packs and compresses; tar only packs, no compression)
- root_dir: the folder to archive (defaults to the current directory)
- owner: user, defaults to the current user
- group: group, defaults to the current group
- logger: used for logging, usually a logging.Logger object
shutil.make_archive('modulezip', 'zip', 'D:\\python\\uson\\5\\复习\\module')
# Archive the files under /Users/wupeiqi/Downloads/test into the current program directory
import shutil
ret = shutil.make_archive("wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')

# Archive the files under /Users/wupeiqi/Downloads/test into /Users/wupeiqi/
import shutil
ret = shutil.make_archive("/Users/wupeiqi/wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')
shutil handles archives by delegating to the ZipFile and TarFile classes; in detail:
import zipfile

z = zipfile.ZipFile('os.zip', 'w')
z.write('os.py')
print("the archive stays open, so other work can happen in between")
z.write('os2.py')
z.close()
import zipfile

# compress
z = zipfile.ZipFile('laxi.zip', 'w')
z.write('a.log')
z.write('data.data')
z.close()

# extract
z = zipfile.ZipFile('laxi.zip', 'r')
z.extractall()
z.close()
import tarfile

# pack
tar = tarfile.open('your.tar', 'w')
tar.add('/Users/wupeiqi/PycharmProjects/bbs2.zip', arcname='bbs2.zip')
tar.add('/Users/wupeiqi/PycharmProjects/cmdb.zip', arcname='cmdb.zip')
tar.close()

# unpack
tar = tarfile.open('your.tar', 'r')
tar.extractall()  # an extraction path can be given
tar.close()
class ZipFile(object):
    """
    Class with methods to open, read, write, close, list zip files.

    z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=False)

    file: Either the path to the file, or a file-like object.
          If it is a path, the file will be opened and closed by ZipFile.
    mode: The mode can be either read "r", write "w" or append "a".
    compression: ZIP_STORED (no compression) or ZIP_DEFLATED (requires zlib).
    allowZip64: if True ZipFile will create files with ZIP64 extensions when
                needed, otherwise it will raise an exception when this would
                be necessary.
    """
    # Key methods (see the standard library's zipfile source for the full bodies):
    #   namelist()            - return a list of file names in the archive
    #   infolist()            - return a list of ZipInfo instances
    #   printdir()            - print a table of contents for the zip file
    #   testzip()             - read all the files and check their CRCs
    #   getinfo(name)         - return the ZipInfo instance for 'name'
    #   setpassword(pwd)      - set the default password for encrypted files
    #   read(name)            - return a member's bytes
    #   open(name)            - return a file-like object for a member
    #   extract(member) / extractall() - unpack members to disk
    #   write(filename) / writestr()   - add files or strings to the archive
    #   close()               - write the ending records and close the file
"Files count" elif centDirOffset > ZIP64_LIMIT: requires_zip64 = "Central directory offset" elif centDirSize > ZIP64_LIMIT: requires_zip64 = "Central directory size" if requires_zip64: # Need to write the ZIP64 end-of-archive records if not self._allowZip64: raise LargeZipFile(requires_zip64 + " would require ZIP64 extensions") zip64endrec = struct.pack( structEndArchive64, stringEndArchive64, 44, 45, 45, 0, 0, centDirCount, centDirCount, centDirSize, centDirOffset) self.fp.write(zip64endrec) zip64locrec = struct.pack( structEndArchive64Locator, stringEndArchive64Locator, 0, pos2, 1) self.fp.write(zip64locrec) centDirCount = min(centDirCount, 0xFFFF) centDirSize = min(centDirSize, 0xFFFFFFFF) centDirOffset = min(centDirOffset, 0xFFFFFFFF) endrec = struct.pack(structEndArchive, stringEndArchive, 0, 0, centDirCount, centDirCount, centDirSize, centDirOffset, len(self._comment)) self.fp.write(endrec) self.fp.write(self._comment) self.fp.flush() finally: fp = self.fp self.fp = None if not self._filePassed: fp.close()
class TarFile(object): """The TarFile Class provides an interface to tar archives. """ debug = 0 # May be set from 0 (no msgs) to 3 (all msgs) dereference = False # If true, add content of linked file to the # tar file, else the link. ignore_zeros = False # If true, skips empty or invalid blocks and # continues processing. errorlevel = 1 # If 0, fatal errors only appear in debug # messages (if debug >= 0). If > 0, errors # are passed to the caller as exceptions. format = DEFAULT_FORMAT # The format to use when creating an archive. encoding = ENCODING # Encoding for 8-bit character strings. errors = None # Error handler for unicode conversion. tarinfo = TarInfo # The default TarInfo class to use. fileobject = ExFileObject # The default ExFileObject class to use. def __init__(self, name=None, mode="r", fileobj=None, format=None, tarinfo=None, dereference=None, ignore_zeros=None, encoding=None, errors=None, pax_headers=None, debug=None, errorlevel=None): """Open an (uncompressed) tar archive `name'. `mode' is either 'r' to read from an existing archive, 'a' to append data to an existing file or 'w' to create a new file overwriting an existing one. `mode' defaults to 'r'. If `fileobj' is given, it is used for reading or writing data. If it can be determined, `mode' is overridden by `fileobj's mode. `fileobj' is not closed, when TarFile is closed. """ modes = {"r": "rb", "a": "r+b", "w": "wb"} if mode not in modes: raise ValueError("mode must be 'r', 'a' or 'w'") self.mode = mode self._mode = modes[mode] if not fileobj: if self.mode == "a" and not os.path.exists(name): # Create nonexistent files in append mode. self.mode = "w" self._mode = "wb" fileobj = bltn_open(name, self._mode) self._extfileobj = False else: if name is None and hasattr(fileobj, "name"): name = fileobj.name if hasattr(fileobj, "mode"): self._mode = fileobj.mode self._extfileobj = True self.name = os.path.abspath(name) if name else None self.fileobj = fileobj # Init attributes. 
if format is not None: self.format = format if tarinfo is not None: self.tarinfo = tarinfo if dereference is not None: self.dereference = dereference if ignore_zeros is not None: self.ignore_zeros = ignore_zeros if encoding is not None: self.encoding = encoding if errors is not None: self.errors = errors elif mode == "r": self.errors = "utf-8" else: self.errors = "strict" if pax_headers is not None and self.format == PAX_FORMAT: self.pax_headers = pax_headers else: self.pax_headers = {} if debug is not None: self.debug = debug if errorlevel is not None: self.errorlevel = errorlevel # Init datastructures. self.closed = False self.members = [] # list of members as TarInfo objects self._loaded = False # flag if all members have been read self.offset = self.fileobj.tell() # current position in the archive file self.inodes = {} # dictionary caching the inodes of # archive members already added try: if self.mode == "r": self.firstmember = None self.firstmember = self.next() if self.mode == "a": # Move to the end of the archive, # before the first empty block. while True: self.fileobj.seek(self.offset) try: tarinfo = self.tarinfo.fromtarfile(self) self.members.append(tarinfo) except EOFHeaderError: self.fileobj.seek(self.offset) break except HeaderError, e: raise ReadError(str(e)) if self.mode in "aw": self._loaded = True if self.pax_headers: buf = self.tarinfo.create_pax_global_header(self.pax_headers.copy()) self.fileobj.write(buf) self.offset += len(buf) except: if not self._extfileobj: self.fileobj.close() self.closed = True raise def _getposix(self): return self.format == USTAR_FORMAT def _setposix(self, value): import warnings warnings.warn("use the format attribute instead", DeprecationWarning, 2) if value: self.format = USTAR_FORMAT else: self.format = GNU_FORMAT posix = property(_getposix, _setposix) #-------------------------------------------------------------------------- # Below are the classmethods which act as alternate constructors to the # TarFile class. 
The open() method is the only one that is needed for # public use; it is the "super"-constructor and is able to select an # adequate "sub"-constructor for a particular compression using the mapping # from OPEN_METH. # # This concept allows one to subclass TarFile without losing the comfort of # the super-constructor. A sub-constructor is registered and made available # by adding it to the mapping in OPEN_METH. @classmethod def open(cls, name=None, mode="r", fileobj=None, bufsize=RECORDSIZE, **kwargs): """Open a tar archive for reading, writing or appending. Return an appropriate TarFile class. mode: 'r' or 'r:*' open for reading with transparent compression 'r:' open for reading exclusively uncompressed 'r:gz' open for reading with gzip compression 'r:bz2' open for reading with bzip2 compression 'a' or 'a:' open for appending, creating the file if necessary 'w' or 'w:' open for writing without compression 'w:gz' open for writing with gzip compression 'w:bz2' open for writing with bzip2 compression 'r|*' open a stream of tar blocks with transparent compression 'r|' open an uncompressed stream of tar blocks for reading 'r|gz' open a gzip compressed stream of tar blocks 'r|bz2' open a bzip2 compressed stream of tar blocks 'w|' open an uncompressed stream for writing 'w|gz' open a gzip compressed stream for writing 'w|bz2' open a bzip2 compressed stream for writing """ if not name and not fileobj: raise ValueError("nothing to open") if mode in ("r", "r:*"): # Find out which *open() is appropriate for opening the file. 
for comptype in cls.OPEN_METH: func = getattr(cls, cls.OPEN_METH[comptype]) if fileobj is not None: saved_pos = fileobj.tell() try: return func(name, "r", fileobj, **kwargs) except (ReadError, CompressionError), e: if fileobj is not None: fileobj.seek(saved_pos) continue raise ReadError("file could not be opened successfully") elif ":" in mode: filemode, comptype = mode.split(":", 1) filemode = filemode or "r" comptype = comptype or "tar" # Select the *open() function according to # given compression. if comptype in cls.OPEN_METH: func = getattr(cls, cls.OPEN_METH[comptype]) else: raise CompressionError("unknown compression type %r" % comptype) return func(name, filemode, fileobj, **kwargs) elif "|" in mode: filemode, comptype = mode.split("|", 1) filemode = filemode or "r" comptype = comptype or "tar" if filemode not in ("r", "w"): raise ValueError("mode must be 'r' or 'w'") stream = _Stream(name, filemode, comptype, fileobj, bufsize) try: t = cls(name, filemode, stream, **kwargs) except: stream.close() raise t._extfileobj = False return t elif mode in ("a", "w"): return cls.taropen(name, mode, fileobj, **kwargs) raise ValueError("undiscernible mode") @classmethod def taropen(cls, name, mode="r", fileobj=None, **kwargs): """Open uncompressed tar archive name for reading or writing. """ if mode not in ("r", "a", "w"): raise ValueError("mode must be 'r', 'a' or 'w'") return cls(name, mode, fileobj, **kwargs) @classmethod def gzopen(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs): """Open gzip compressed tar archive name for reading or writing. Appending is not allowed. 
""" if mode not in ("r", "w"): raise ValueError("mode must be 'r' or 'w'") try: import gzip gzip.GzipFile except (ImportError, AttributeError): raise CompressionError("gzip module is not available") try: fileobj = gzip.GzipFile(name, mode, compresslevel, fileobj) except OSError: if fileobj is not None and mode == 'r': raise ReadError("not a gzip file") raise try: t = cls.taropen(name, mode, fileobj, **kwargs) except IOError: fileobj.close() if mode == 'r': raise ReadError("not a gzip file") raise except: fileobj.close() raise t._extfileobj = False return t @classmethod def bz2open(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs): """Open bzip2 compressed tar archive name for reading or writing. Appending is not allowed. """ if mode not in ("r", "w"): raise ValueError("mode must be 'r' or 'w'.") try: import bz2 except ImportError: raise CompressionError("bz2 module is not available") if fileobj is not None: fileobj = _BZ2Proxy(fileobj, mode) else: fileobj = bz2.BZ2File(name, mode, compresslevel=compresslevel) try: t = cls.taropen(name, mode, fileobj, **kwargs) except (IOError, EOFError): fileobj.close() if mode == 'r': raise ReadError("not a bzip2 file") raise except: fileobj.close() raise t._extfileobj = False return t # All *open() methods are registered here. OPEN_METH = { "tar": "taropen", # uncompressed tar "gz": "gzopen", # gzip compressed tar "bz2": "bz2open" # bzip2 compressed tar } #-------------------------------------------------------------------------- # The public methods which TarFile provides: def close(self): """Close the TarFile. In write-mode, two finishing zero blocks are appended to the archive. 
""" if self.closed: return if self.mode in "aw": self.fileobj.write(NUL * (BLOCKSIZE * 2)) self.offset += (BLOCKSIZE * 2) # fill up the end with zero-blocks # (like option -b20 for tar does) blocks, remainder = divmod(self.offset, RECORDSIZE) if remainder > 0: self.fileobj.write(NUL * (RECORDSIZE - remainder)) if not self._extfileobj: self.fileobj.close() self.closed = True def getmember(self, name): """Return a TarInfo object for member `name'. If `name' can not be found in the archive, KeyError is raised. If a member occurs more than once in the archive, its last occurrence is assumed to be the most up-to-date version. """ tarinfo = self._getmember(name) if tarinfo is None: raise KeyError("filename %r not found" % name) return tarinfo def getmembers(self): """Return the members of the archive as a list of TarInfo objects. The list has the same order as the members in the archive. """ self._check() if not self._loaded: # if we want to obtain a list of self._load() # all members, we first have to # scan the whole archive. return self.members def getnames(self): """Return the members of the archive as a list of their names. It has the same order as the list returned by getmembers(). """ return [tarinfo.name for tarinfo in self.getmembers()] def gettarinfo(self, name=None, arcname=None, fileobj=None): """Create a TarInfo object for either the file `name' or the file object `fileobj' (using os.fstat on its file descriptor). You can modify some of the TarInfo's attributes before you add it using addfile(). If given, `arcname' specifies an alternative name for the file in the archive. """ self._check("aw") # When fileobj is given, replace name by # fileobj's real name. if fileobj is not None: name = fileobj.name # Building the name of the member in the archive. # Backward slashes are converted to forward slashes, # Absolute paths are turned to relative paths. 
if arcname is None: arcname = name drv, arcname = os.path.splitdrive(arcname) arcname = arcname.replace(os.sep, "/") arcname = arcname.lstrip("/") # Now, fill the TarInfo object with # information specific for the file. tarinfo = self.tarinfo() tarinfo.tarfile = self # Use os.stat or os.lstat, depending on platform # and if symlinks shall be resolved. if fileobj is None: if hasattr(os, "lstat") and not self.dereference: statres = os.lstat(name) else: statres = os.stat(name) else: statres = os.fstat(fileobj.fileno()) linkname = "" stmd = statres.st_mode if stat.S_ISREG(stmd): inode = (statres.st_ino, statres.st_dev) if not self.dereference and statres.st_nlink > 1 and \ inode in self.inodes and arcname != self.inodes[inode]: # Is it a hardlink to an already # archived file? type = LNKTYPE linkname = self.inodes[inode] else: # The inode is added only if its valid. # For win32 it is always 0. type = REGTYPE if inode[0]: self.inodes[inode] = arcname elif stat.S_ISDIR(stmd): type = DIRTYPE elif stat.S_ISFIFO(stmd): type = FIFOTYPE elif stat.S_ISLNK(stmd): type = SYMTYPE linkname = os.readlink(name) elif stat.S_ISCHR(stmd): type = CHRTYPE elif stat.S_ISBLK(stmd): type = BLKTYPE else: return None # Fill the TarInfo object with all # information we can get. tarinfo.name = arcname tarinfo.mode = stmd tarinfo.uid = statres.st_uid tarinfo.gid = statres.st_gid if type == REGTYPE: tarinfo.size = statres.st_size else: tarinfo.size = 0L tarinfo.mtime = statres.st_mtime tarinfo.type = type tarinfo.linkname = linkname if pwd: try: tarinfo.uname = pwd.getpwuid(tarinfo.uid)[0] except KeyError: pass if grp: try: tarinfo.gname = grp.getgrgid(tarinfo.gid)[0] except KeyError: pass if type in (CHRTYPE, BLKTYPE): if hasattr(os, "major") and hasattr(os, "minor"): tarinfo.devmajor = os.major(statres.st_rdev) tarinfo.devminor = os.minor(statres.st_rdev) return tarinfo def list(self, verbose=True): """Print a table of contents to sys.stdout. 
If `verbose' is False, only the names of the members are printed. If it is True, an `ls -l'-like output is produced. """ self._check() for tarinfo in self: if verbose: print filemode(tarinfo.mode), print "%s/%s" % (tarinfo.uname or tarinfo.uid, tarinfo.gname or tarinfo.gid), if tarinfo.ischr() or tarinfo.isblk(): print "%10s" % ("%d,%d" \ % (tarinfo.devmajor, tarinfo.devminor)), else: print "%10d" % tarinfo.size, print "%d-%02d-%02d %02d:%02d:%02d" \ % time.localtime(tarinfo.mtime)[:6], print tarinfo.name + ("/" if tarinfo.isdir() else ""), if verbose: if tarinfo.issym(): print "->", tarinfo.linkname, if tarinfo.islnk(): print "link to", tarinfo.linkname, print def add(self, name, arcname=None, recursive=True, exclude=None, filter=None): """Add the file `name' to the archive. `name' may be any type of file (directory, fifo, symbolic link, etc.). If given, `arcname' specifies an alternative name for the file in the archive. Directories are added recursively by default. This can be avoided by setting `recursive' to False. `exclude' is a function that should return True for each filename to be excluded. `filter' is a function that expects a TarInfo object argument and returns the changed TarInfo object, if it returns None the TarInfo object will be excluded from the archive. """ self._check("aw") if arcname is None: arcname = name # Exclude pathnames. if exclude is not None: import warnings warnings.warn("use the filter argument instead", DeprecationWarning, 2) if exclude(name): self._dbg(2, "tarfile: Excluded %r" % name) return # Skip if somebody tries to archive the archive... if self.name is not None and os.path.abspath(name) == self.name: self._dbg(2, "tarfile: Skipped %r" % name) return self._dbg(1, name) # Create a TarInfo object from the file. tarinfo = self.gettarinfo(name, arcname) if tarinfo is None: self._dbg(1, "tarfile: Unsupported type %r" % name) return # Change or exclude the TarInfo object. 
if filter is not None: tarinfo = filter(tarinfo) if tarinfo is None: self._dbg(2, "tarfile: Excluded %r" % name) return # Append the tar header and data to the archive. if tarinfo.isreg(): with bltn_open(name, "rb") as f: self.addfile(tarinfo, f) elif tarinfo.isdir(): self.addfile(tarinfo) if recursive: for f in os.listdir(name): self.add(os.path.join(name, f), os.path.join(arcname, f), recursive, exclude, filter) else: self.addfile(tarinfo) def addfile(self, tarinfo, fileobj=None): """Add the TarInfo object `tarinfo' to the archive. If `fileobj' is given, tarinfo.size bytes are read from it and added to the archive. You can create TarInfo objects using gettarinfo(). On Windows platforms, `fileobj' should always be opened with mode 'rb' to avoid irritation about the file size. """ self._check("aw") tarinfo = copy.copy(tarinfo) buf = tarinfo.tobuf(self.format, self.encoding, self.errors) self.fileobj.write(buf) self.offset += len(buf) # If there's data to follow, append it. if fileobj is not None: copyfileobj(fileobj, self.fileobj, tarinfo.size) blocks, remainder = divmod(tarinfo.size, BLOCKSIZE) if remainder > 0: self.fileobj.write(NUL * (BLOCKSIZE - remainder)) blocks += 1 self.offset += blocks * BLOCKSIZE self.members.append(tarinfo) def extractall(self, path=".", members=None): """Extract all members from the archive to the current working directory and set owner, modification time and permissions on directories afterwards. `path' specifies a different directory to extract to. `members' is optional and must be a subset of the list returned by getmembers(). """ directories = [] if members is None: members = self for tarinfo in members: if tarinfo.isdir(): # Extract directories with a safe mode. directories.append(tarinfo) tarinfo = copy.copy(tarinfo) tarinfo.mode = 0700 self.extract(tarinfo, path) # Reverse sort directories. directories.sort(key=operator.attrgetter('name')) directories.reverse() # Set correct owner, mtime and filemode on directories. 
for tarinfo in directories: dirpath = os.path.join(path, tarinfo.name) try: self.chown(tarinfo, dirpath) self.utime(tarinfo, dirpath) self.chmod(tarinfo, dirpath) except ExtractError, e: if self.errorlevel > 1: raise else: self._dbg(1, "tarfile: %s" % e) def extract(self, member, path=""): """Extract a member from the archive to the current working directory, using its full name. Its file information is extracted as accurately as possible. `member' may be a filename or a TarInfo object. You can specify a different directory using `path'. """ self._check("r") if isinstance(member, basestring): tarinfo = self.getmember(member) else: tarinfo = member # Prepare the link target for makelink(). if tarinfo.islnk(): tarinfo._link_target = os.path.join(path, tarinfo.linkname) try: self._extract_member(tarinfo, os.path.join(path, tarinfo.name)) except EnvironmentError, e: if self.errorlevel > 0: raise else: if e.filename is None: self._dbg(1, "tarfile: %s" % e.strerror) else: self._dbg(1, "tarfile: %s %r" % (e.strerror, e.filename)) except ExtractError, e: if self.errorlevel > 1: raise else: self._dbg(1, "tarfile: %s" % e) def extractfile(self, member): """Extract a member from the archive as a file object. `member' may be a filename or a TarInfo object. If `member' is a regular file, a file-like object is returned. If `member' is a link, a file-like object is constructed from the link's target. If `member' is none of the above, None is returned. The file-like object is read-only and provides the following methods: read(), readline(), readlines(), seek() and tell() """ self._check("r") if isinstance(member, basestring): tarinfo = self.getmember(member) else: tarinfo = member if tarinfo.isreg(): return self.fileobject(self, tarinfo) elif tarinfo.type not in SUPPORTED_TYPES: # If a member's type is unknown, it is treated as a # regular file. 
return self.fileobject(self, tarinfo) elif tarinfo.islnk() or tarinfo.issym(): if isinstance(self.fileobj, _Stream): # A small but ugly workaround for the case that someone tries # to extract a (sym)link as a file-object from a non-seekable # stream of tar blocks. raise StreamError("cannot extract (sym)link as file object") else: # A (sym)link's file object is its target's file object. return self.extractfile(self._find_link_target(tarinfo)) else: # If there's no data associated with the member (directory, chrdev, # blkdev, etc.), return None instead of a file object. return None def _extract_member(self, tarinfo, targetpath): """Extract the TarInfo object tarinfo to a physical file called targetpath. """ # Fetch the TarInfo object for the given name # and build the destination pathname, replacing # forward slashes to platform specific separators. targetpath = targetpath.rstrip("/") targetpath = targetpath.replace("/", os.sep) # Create all upper directories. upperdirs = os.path.dirname(targetpath) if upperdirs and not os.path.exists(upperdirs): # Create directories that are not part of the archive with # default permissions. os.makedirs(upperdirs) if tarinfo.islnk() or tarinfo.issym(): self._dbg(1, "%s -> %s" % (tarinfo.name, tarinfo.linkname)) else: self._dbg(1, tarinfo.name) if tarinfo.isreg(): self.makefile(tarinfo, targetpath) elif tarinfo.isdir(): self.makedir(tarinfo, targetpath) elif tarinfo.isfifo(): self.makefifo(tarinfo, targetpath) elif tarinfo.ischr() or tarinfo.isblk(): self.makedev(tarinfo, targetpath) elif tarinfo.islnk() or tarinfo.issym(): self.makelink(tarinfo, targetpath) elif tarinfo.type not in SUPPORTED_TYPES: self.makeunknown(tarinfo, targetpath) else: self.makefile(tarinfo, targetpath) self.chown(tarinfo, targetpath) if not tarinfo.issym(): self.chmod(tarinfo, targetpath) self.utime(tarinfo, targetpath) #-------------------------------------------------------------------------- # Below are the different file methods. 
They are called via # _extract_member() when extract() is called. They can be replaced in a # subclass to implement other functionality. def makedir(self, tarinfo, targetpath): """Make a directory called targetpath. """ try: # Use a safe mode for the directory, the real mode is set # later in _extract_member(). os.mkdir(targetpath, 0700) except EnvironmentError, e: if e.errno != errno.EEXIST: raise def makefile(self, tarinfo, targetpath): """Make a file called targetpath. """ source = self.extractfile(tarinfo) try: with bltn_open(targetpath, "wb") as target: copyfileobj(source, target) finally: source.close() def makeunknown(self, tarinfo, targetpath): """Make a file from a TarInfo object with an unknown type at targetpath. """ self.makefile(tarinfo, targetpath) self._dbg(1, "tarfile: Unknown file type %r, " \ "extracted as regular file." % tarinfo.type) def makefifo(self, tarinfo, targetpath): """Make a fifo called targetpath. """ if hasattr(os, "mkfifo"): os.mkfifo(targetpath) else: raise ExtractError("fifo not supported by system") def makedev(self, tarinfo, targetpath): """Make a character or block device called targetpath. """ if not hasattr(os, "mknod") or not hasattr(os, "makedev"): raise ExtractError("special devices not supported by system") mode = tarinfo.mode if tarinfo.isblk(): mode |= stat.S_IFBLK else: mode |= stat.S_IFCHR os.mknod(targetpath, mode, os.makedev(tarinfo.devmajor, tarinfo.devminor)) def makelink(self, tarinfo, targetpath): """Make a (symbolic) link called targetpath. If it cannot be created (platform limitation), we try to make a copy of the referenced file instead of a link. """ if hasattr(os, "symlink") and hasattr(os, "link"): # For systems that support symbolic and hard links. if tarinfo.issym(): if os.path.lexists(targetpath): os.unlink(targetpath) os.symlink(tarinfo.linkname, targetpath) else: # See extract(). 
if os.path.exists(tarinfo._link_target): if os.path.lexists(targetpath): os.unlink(targetpath) os.link(tarinfo._link_target, targetpath) else: self._extract_member(self._find_link_target(tarinfo), targetpath) else: try: self._extract_member(self._find_link_target(tarinfo), targetpath) except KeyError: raise ExtractError("unable to resolve link inside archive") def chown(self, tarinfo, targetpath): """Set owner of targetpath according to tarinfo. """ if pwd and hasattr(os, "geteuid") and os.geteuid() == 0: # We have to be root to do so. try: g = grp.getgrnam(tarinfo.gname)[2] except KeyError: g = tarinfo.gid try: u = pwd.getpwnam(tarinfo.uname)[2] except KeyError: u = tarinfo.uid try: if tarinfo.issym() and hasattr(os, "lchown"): os.lchown(targetpath, u, g) else: if sys.platform != "os2emx": os.chown(targetpath, u, g) except EnvironmentError, e: raise ExtractError("could not change owner") def chmod(self, tarinfo, targetpath): """Set file permissions of targetpath according to tarinfo. """ if hasattr(os, 'chmod'): try: os.chmod(targetpath, tarinfo.mode) except EnvironmentError, e: raise ExtractError("could not change mode") def utime(self, tarinfo, targetpath): """Set modification time of targetpath according to tarinfo. """ if not hasattr(os, 'utime'): return try: os.utime(targetpath, (tarinfo.mtime, tarinfo.mtime)) except EnvironmentError, e: raise ExtractError("could not change modification time") #-------------------------------------------------------------------------- def next(self): """Return the next member of the archive as a TarInfo object, when TarFile is opened for reading. Return None if there is no more available. """ self._check("ra") if self.firstmember is not None: m = self.firstmember self.firstmember = None return m # Read the next block. 
self.fileobj.seek(self.offset) tarinfo = None while True: try: tarinfo = self.tarinfo.fromtarfile(self) except EOFHeaderError, e: if self.ignore_zeros: self._dbg(2, "0x%X: %s" % (self.offset, e)) self.offset += BLOCKSIZE continue except InvalidHeaderError, e: if self.ignore_zeros: self._dbg(2, "0x%X: %s" % (self.offset, e)) self.offset += BLOCKSIZE continue elif self.offset == 0: raise ReadError(str(e)) except EmptyHeaderError: if self.offset == 0: raise ReadError("empty file") except TruncatedHeaderError, e: if self.offset == 0: raise ReadError(str(e)) except SubsequentHeaderError, e: raise ReadError(str(e)) break if tarinfo is not None: self.members.append(tarinfo) else: self._loaded = True return tarinfo #-------------------------------------------------------------------------- # Little helper methods: def _getmember(self, name, tarinfo=None, normalize=False): """Find an archive member by name from bottom to top. If tarinfo is given, it is used as the starting point. """ # Ensure that all members have been loaded. members = self.getmembers() # Limit the member search list up to tarinfo. if tarinfo is not None: members = members[:members.index(tarinfo)] if normalize: name = os.path.normpath(name) for member in reversed(members): if normalize: member_name = os.path.normpath(member.name) else: member_name = member.name if name == member_name: return member def _load(self): """Read through the entire archive file and look for readable members. """ while True: tarinfo = self.next() if tarinfo is None: break self._loaded = True def _check(self, mode=None): """Check if TarFile is still open, and if the operation's mode corresponds to TarFile's mode. """ if self.closed: raise IOError("%s is closed" % self.__class__.__name__) if mode is not None and self.mode not in mode: raise IOError("bad operation for mode %r" % self.mode) def _find_link_target(self, tarinfo): """Find the target member of a symlink or hardlink member in the archive. 
""" if tarinfo.issym(): # Always search the entire archive. linkname = "/".join(filter(None, (os.path.dirname(tarinfo.name), tarinfo.linkname))) limit = None else: # Search the archive before the link, because a hard link is # just a reference to an already archived file. linkname = tarinfo.linkname limit = tarinfo member = self._getmember(linkname, tarinfo=limit, normalize=True) if member is None: raise KeyError("linkname %r not found" % linkname) return member def __iter__(self): """Provide an iterator object. """ if self._loaded: return iter(self.members) else: return TarIter(self) def _dbg(self, level, msg): """Write debugging output to sys.stderr. """ if level <= self.debug: print >> sys.stderr, msg def __enter__(self): self._check() return self def __exit__(self, type, value, traceback): if type is None: self.close() else: # An exception occurred. We must not call close() because # it would try to write end-of-archive blocks and padding. if not self._extfileobj: self.fileobj.close() self.closed = True # class TarFile
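The zipfile / tarfile source above shows how archives are implemented internally; for everyday use you only need their public interfaces. Below is a minimal sketch — all file and directory names are hypothetical demo values:

```python
import zipfile
import tarfile

# Create a test file first (file names here are made up for the demo)
with open('demo.txt', 'w') as f:
    f.write('hello module\n')

# zip: compress and extract
with zipfile.ZipFile('demo.zip', 'w', zipfile.ZIP_DEFLATED) as z:
    z.write('demo.txt')        # add the file to the archive
with zipfile.ZipFile('demo.zip') as z:
    print(z.namelist())        # ['demo.txt']
    z.extractall('unzip_dir')  # extract into a directory

# tar: pack and unpack ('w:gz' also gzip-compresses the archive)
with tarfile.open('demo.tar.gz', 'w:gz') as t:
    t.add('demo.txt')
with tarfile.open('demo.tar.gz') as t:
    print(t.getnames())        # ['demo.txt']
    t.extractall('untar_dir')
```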
The shelve module
shelve is a simple key-value module that persists in-memory data to a file. It can persist any Python object that pickle supports, and only what pickle supports, since it is a thin wrapper on top of pickle (and it saves you from the accumulation problem of calling dumps repeatedly yourself).
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author:Uson
import shelve, datetime  # only pickle-able objects are supported

info = {
    'name': 'uson',
    'age': 27,
    'gender': 'M',
    'job': 'IT',
}
addr = ['SH', 'BJ', 'HF']

# d = shelve.open('review', 'w')  # dbm.error: need 'c' or 'n' flag to open new db
d = shelve.open('review')  # open a shelf; this may create three files: .bak .dat .dir
d['personInfo'] = info     # persist a dict
d['addr'] = addr           # persist a list
d['date'] = datetime.datetime.now()
d.close()

d = shelve.open('review')
print(d.get('personInfo'))
print(d.get('date'))
print(d.get('addr'))
'''
{'age': 27, 'job': 'IT', 'gender': 'M', 'name': 'uson'}
2019-09-13 19:57:17.561936
['SH', 'BJ', 'HF']
'''
import shelve

d = shelve.open('shelve_test')  # open a shelf file

class Test(object):
    def __init__(self, n):
        self.n = n

t = Test(123)
t2 = Test(123334)

name = ["alex", "rain", "test"]
d["test"] = name  # persist a list
d["t1"] = t       # persist a class instance
d["t2"] = t2
d.close()
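One gotcha worth knowing: by default shelve does not notice in-place mutation of a stored object; you must reassign the key, or open the shelf with writeback=True. A minimal sketch (the file name here is arbitrary, not from the notes above):

```python
import os
import shelve
import tempfile

# writeback=True keeps a cache of loaded objects and writes them back on
# close, so in-place mutation like d['list'].append(...) is persisted.
path = os.path.join(tempfile.mkdtemp(), 'wb_demo')

d = shelve.open(path, writeback=True)
d['list'] = [1, 2]
d['list'].append(3)   # without writeback=True this append would be lost
d.close()

d = shelve.open(path)
result = d['list']
d.close()
print(result)  # [1, 2, 3]
```

Without writeback, the idiomatic fix is: tmp = d['list']; tmp.append(3); d['list'] = tmp.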
The xml module
XML is a protocol for exchanging data between different languages or programs, much like JSON, though JSON is simpler to use. Back in the dark ages before JSON was born, XML was the only choice, and to this day many traditional companies, such as those in finance, still expose system interfaces mainly in XML.
The XML format looks like the following; data structure is expressed through <> tag nodes:
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
XML is supported in every major language; in Python it can be handled with the following module:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author:Uson
import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")  # parse xmltest.xml into a tree Python can work with
root = tree.getroot()           # getroot() returns the root element of the document
print(root.tag)  # root.tag is the tag name, here "data" (like the <html> tag in HTML)

# Traverse the XML document
for child in root:  # child: direct children only, not grandchildren and below
    # print(child.tag, child.attrib)
    '''
    country {'name': 'Liechtenstein'}
    country {'name': 'Singapore'}
    country {'name': 'Panama'}
    '''
    for i in child:  # i: the children of each child
        # print(i.tag, i.text)  # tag and text only
        print(i.tag, i.text, i.attrib)  # tag: name, text: content, attrib: attributes

# Iterate over only the year nodes
for node in root.iter('year'):  # iterate over every <year> tag in the subtree
    print(node.tag, node.text)
import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")  # parse xmltest.xml into a tree Python can work with
root = tree.getroot()           # getroot() returns the root element of the document

# Modify
for node in root.iter('year'):  # iterate over every <year> tag
    new_year = int(node.text) + 1
    node.text = str(new_year)
    node.set("python", "uson")  # set an attribute on the tag with set()
tree.write("xmltest.xml")  # write back to the original file

# Delete nodes
for country in root.findall('country'):  # find and process every <country> tag
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)
tree.write('output.xml')  # write to a new file
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author:Uson

# Create an XML document from scratch
import xml.etree.ElementTree as ET

new_xml = ET.Element("personlist")  # create the root tag
person = ET.SubElement(new_xml, "person", attrib={"enrolled": "yes"})  # child <person> under <personlist>
name = ET.SubElement(person, "name", attrib={"create": "uson"})
age = ET.SubElement(person, "age", attrib={"checked": "no"})
sex = ET.SubElement(person, "sex")
age.text = '27'
name.text = 'Uson'

# person2 = ET.SubElement(new_xml, "person", attrib={"enrolled": "no"})  # person2 is a variable name, not a tag name
person = ET.SubElement(new_xml, "person", attrib={"enrolled": "no"})  # the same variable name can be reused
name = ET.SubElement(person, "name")
age = ET.SubElement(person, "age")
age.text = '19'

et = ET.ElementTree(new_xml)  # build the document object
# xml_declaration=True emits the <?xml version='1.0' encoding='utf-8'?> declaration
et.write("test.xml", encoding="utf-8", xml_declaration=True)

ET.dump(new_xml)  # print the generated document
<?xml version='1.0' encoding='utf-8'?>
<personlist>
    <person enrolled="yes">
        <name create="uson">Uson</name>
        <age checked="no">27</age>
        <sex />
    </person>
    <person enrolled="no">
        <name />
        <age>19</age>
    </person>
</personlist>
The yaml module: mainly used for configuration files
Similar to JSON; loading it yields a dict.
Reference: https://pyyaml.org/wiki/PyYAMLDocumentation
The ConfigParser module:
Used to generate and modify common configuration files. In Python 3.x the module was renamed to configparser (commonly used for things like MySQL and nginx configs).
Here is a configuration format common to a lot of software:
[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes

[bitbucket.org]
User = hg

[topsecret.server.com]
Port = 50022
ForwardX11 = no
How do you generate a file like this with Python?
import configparser

config = configparser.ConfigParser()
config["DEFAULT"] = {'ServerAliveInterval': '45',
                     'Compression': 'yes',
                     'CompressionLevel': '9'}

config['bitbucket.org'] = {}
config['bitbucket.org']['User'] = 'hg'

config['topsecret.server.com'] = {}
topsecret = config['topsecret.server.com']
topsecret['Host Port'] = '50022'  # mutates the parser
topsecret['ForwardX11'] = 'no'    # same here
config['DEFAULT']['ForwardX11'] = 'yes'

with open('example.ini', 'w') as configfile:
    config.write(configfile)
Once written, it can of course be read back out again.
>>> import configparser
>>> config = configparser.ConfigParser()
>>> config.sections()
[]
>>> config.read('example.ini')
['example.ini']
>>> config.sections()
['bitbucket.org', 'topsecret.server.com']
>>> 'bitbucket.org' in config
True
>>> 'bytebong.com' in config
False
>>> config['bitbucket.org']['User']
'hg'
>>> config['DEFAULT']['Compression']
'yes'
>>> topsecret = config['topsecret.server.com']
>>> topsecret['ForwardX11']
'no'
>>> topsecret['Port']
'50022'
>>> for key in config['bitbucket.org']: print(key)
...
user
compressionlevel
serveraliveinterval
compression
forwardx11
>>> config['bitbucket.org']['ForwardX11']
'yes'
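Beyond dict-style access, configparser also has typed getters and a fallback keyword that save the manual int()/bool conversions. A small sketch (the section and option names here are made up for illustration, not from example.ini):

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[server]
port = 50022
forwardx11 = no
""")

port = config.getint('server', 'port')            # an int, not the string '50022'
fx11 = config.getboolean('server', 'forwardx11')  # understands yes/no, on/off, 1/0
host = config.get('server', 'host', fallback='localhost')  # missing option, no exception
print(port, fx11, host)  # 50022 False localhost
```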
configparser create/read/update/delete syntax
Configuration file i.cfg:

[section1]
k1 = v1
k2:v2

[section2]
k1 = v1

# Python 3 (in Python 2 the module is called ConfigParser)
import configparser

config = configparser.ConfigParser()
config.read('i.cfg')

# ########## read ##########
# secs = config.sections()
# print(secs)

# options = config.options('group2')
# print(options)

# item_list = config.items('group2')
# print(item_list)

# val = config.get('group1', 'key')
# val = config.getint('group1', 'key')

# ########## modify ##########
# sec = config.remove_section('group1')
# config.write(open('i.cfg', "w"))

# sec = config.has_section('wupeiqi')
# sec = config.add_section('wupeiqi')
# config.write(open('i.cfg', "w"))

# config.set('group2', 'k1', '11111')  # in Python 3 the value must be a string
# config.write(open('i.cfg', "w"))

# config.remove_option('group2', 'age')
# config.write(open('i.cfg', "w"))
The hashlib module (dicts are implemented with hashing)
Used for cryptographic hashing operations. In 3.x it replaces the md5 and sha modules, providing the SHA1, SHA224, SHA256, SHA384, SHA512 and MD5 algorithms.
Hashing is one-way; it cannot be reversed.
# Python 2 only (the sha module was removed in Python 3)
import sha

hash = sha.new()
hash.update('admin')
print hash.hexdigest()
# Python 2 only (the md5 module was removed in Python 3)
import md5

hash = md5.new()
hash.update('admin')
print hash.hexdigest()
import hashlib

m = hashlib.md5()
m.update(b"Hello")
m.update(b"It's me")
print(m.digest())
m.update(b"It's been a long time since last time we ...")
print(m.hexdigest())  # hex digest: a0e9894503cb9f1a14aa073f3caefaa5

# Feeding the same bytes in a single update gives the same digest as the
# incremental updates above:
m2 = hashlib.md5()
m2.update(b"HelloIt's meIt's been a long time since last time we ...")
print(m2.hexdigest())  # a0e9894503cb9f1a14aa073f3caefaa5

print(m.digest())          # binary digest
print(len(m.hexdigest()))  # length of the hex digest (32 for md5)
'''
def digest(self, *args, **kwargs): # real signature unknown
    """ Return the digest value as a string of binary data. """
    pass

def hexdigest(self, *args, **kwargs): # real signature unknown
    """ Return the digest value as a string of hexadecimal digits. """
    pass
'''

import hashlib  # the more complex the algorithm, the safer but the slower

# ######## md5 ########
hash = hashlib.md5()
hash.update(b'admin')  # Python 3 requires bytes here
print(hash.hexdigest())

# ######## sha1 (being phased out) ########
hash = hashlib.sha1()
hash.update(b'admin')
print(hash.hexdigest())
# or in one step: hashlib.sha1(b'admin').hexdigest()

# ######## sha256 (newer) ########
hash = hashlib.sha256()
hash.update(b'admin')
print(hash.hexdigest())

# ######## sha384 ########
hash = hashlib.sha384()
hash.update(b'admin')
print(hash.hexdigest())

# ######## sha512 (newer) ########
hash = hashlib.sha512()
hash.update(b'admin')
print(hash.hexdigest())
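Because the same input always produces the same digest, a bare md5/sha of a password can be reversed with precomputed rainbow tables. A common remedy, sketched below (my addition, not from the original notes), is a random per-user salt:

```python
import hashlib
import os

def hash_password(password, salt=None):
    """Return (salt, hex digest); a fresh random salt is drawn if none is given."""
    if salt is None:
        salt = os.urandom(16)  # random per-user salt
    digest = hashlib.sha256(salt + password.encode('utf-8')).hexdigest()
    return salt, digest

salt, stored = hash_password('admin')
# To verify a login attempt, re-hash the candidate with the stored salt:
assert hash_password('admin', salt)[1] == stored
assert hash_password('guess', salt)[1] != stored
```

Identical passwords now hash to different digests for different users, since each user gets a different salt.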
The hmac module (keyed, two-layer hashing)
Internally it processes the key and the message together and then hashes them: hmac.new(key, value)
HMAC (Hash-based Message Authentication Code) is an authentication mechanism based on MACs (Message Authentication Codes). With HMAC, the two communicating parties verify the authenticity of a message via a shared authentication key K mixed into the message.
It is typically used to authenticate messages in network communication. The two sides first agree on a key, like a secret handshake. The sender computes the MAC of the message with the key; the receiver recomputes the MAC from the key plus the message plaintext and compares it with the value the sender sent. If they match, the message is authentic and the sender is legitimate.
import hmac

# Note: since Python 3.8 the digestmod argument is required;
# older versions defaulted to MD5, as in these examples.
h = hmac.new(b'name', b'Uson')
print(h.hexdigest())

h = hmac.new(b'name', b'Uson')
h.update(b'hellowo')  # appends to the message
print(h.hexdigest())

h = hmac.new(b'name', b'Usonhellowo')
print(h.hexdigest())
# 3e7fbc4012a9454baa43f07a3c5010cf
# f34b270493c17b4d6247546b645e411b
# f34b270493c17b4d6247546b645e411b

h = hmac.new('天王盖地虎'.encode(encoding='utf-8'), '宝塔镇河妖'.encode(encoding='utf-8'))
print(h.hexdigest())  # 5f90dcd2211cd11601ce05195e3c5232
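The receive-side check described above can be sketched as follows; hmac.compare_digest compares in constant time, which a plain == does not guarantee (digestmod is spelled out because Python 3.8+ requires it):

```python
import hashlib
import hmac

key, msg = b'name', b'Usonhellowo'
sent_mac = hmac.new(key, msg, digestmod=hashlib.md5).hexdigest()  # sender side

def verify(key, msg, mac):
    # receiver recomputes the MAC from the shared key + plaintext
    expected = hmac.new(key, msg, digestmod=hashlib.md5).hexdigest()
    return hmac.compare_digest(expected, mac)  # constant-time comparison

print(verify(key, msg, sent_mac))          # True
print(verify(key, b'tampered', sent_mac))  # False
```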
The re module (regular expressions)
If a match object is returned, the pattern matched; otherwise (None) it did not.
Common regular expression symbols:
'.'     matches any single character except \n; with the DOTALL flag it matches any character, newlines included
'^'     matches at the start of the string; with flags=re.MULTILINE, re.search(r"^a", "\nabc\neee", flags=re.MULTILINE) also matches
'$'     matches at the end of the string; re.search("foo$", "bfoo\nsdfsf", flags=re.MULTILINE).group() also works
'*'     matches the preceding character 0 or more times; re.findall("ab*", "cabb3abcbbac") gives ['abb', 'ab', 'a']
'+'     matches the preceding character 1 or more times; re.findall("ab+", "ab+cd+abb+bba") gives ['ab', 'abb']
'?'     matches the preceding character 0 or 1 time
'{m}'   matches the preceding character exactly m times
'{n,m}' matches the preceding character n to m times; re.findall("ab{1,3}", "abb abc abbcbbb") gives ['abb', 'ab', 'abb']
'|'     matches the pattern on the left or on the right; re.search("abc|ABC", "ABCBabcCD").group() gives 'ABC'
'(...)' group matching; re.search("(abc){2}a(123|456)c", "abcabca456c").group() gives 'abcabca456c'
'\A'    matches only at the start of the string; re.search("\Aabc", "alexabc") does not match
'\Z'    matches at the end of the string, same as $
'\d'    matches a digit 0-9
'\D'    matches a non-digit
'\w'    matches [A-Za-z0-9]
'\W'    matches anything not in [A-Za-z0-9]
'\s'    matches whitespace: space, \t, \n, \r; re.search("\s+", "ab\tc1\n3").group() gives '\t'
'(?P<name>...)' named group matching; re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})", "371481199306143242").groupdict() gives {'province': '3714', 'city': '81', 'birthday': '1993'}
The most commonly used matching functions
re.match    matches from the start of the string (a leading ^ is redundant); get the result with .group()
re.search   matches anywhere in the string; get the result with .group()
re.findall  returns all matched substrings as a list
re.split    splits the string into a list, using the matches as separators
re.sub      matches and replaces
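A one-line demo of each of the functions above (the sample strings are made up for illustration):

```python
import re

assert re.match(r'\d+', 'abc123') is None             # match is anchored at the start
assert re.match(r'[a-z]+', 'abc123').group() == 'abc'
assert re.search(r'\d+', 'abc123').group() == '123'   # search scans the whole string
assert re.findall(r'\d+', 'a1b22c333') == ['1', '22', '333']
assert re.split(r'\d+', 'a1b22c') == ['a', 'b', 'c']
assert re.sub(r'\d+', '#', 'a1b22c') == 'a#b#c'
assert re.sub(r'\d+', '#', 'a1b22c', count=1) == 'a#b22c'
```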
The trouble with backslashes
As in most programming languages, regular expressions use "\" as the escape character, which can cause backslash trouble. If you need to match a literal "\" in the text, the regular expression written as an ordinary string literal needs four backslashes "\\\\": the first pair and the second pair are each collapsed by the language into one backslash, and the resulting two backslashes are then collapsed by the regex engine into one literal backslash. Python's raw strings solve this nicely: the same regex can be written r"\\". Likewise, "\\d" for matching a digit can be written r"\d". With raw strings you no longer worry about missing a backslash, and the expressions read much more naturally.
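The four-backslash rule can be checked directly; both spellings below match the single literal backslash in uson\cohui:

```python
import re

s = 'uson\\cohui'  # the text uson\cohui, containing one literal backslash
assert re.search('\\\\', s).group() == '\\'         # ordinary string: 4 backslashes
assert re.search(r'\\', s).group() == '\\'          # raw string: 2 backslashes
assert re.search(r'\d', 'a7').group() == '7'        # r"\d" matches a digit
assert re.search(r'\\d', 'a\\db').group() == '\\d'  # r"\\d" matches the two chars \d
```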
A few matching flags worth knowing:
re.I (re.IGNORECASE): ignore case (the full name is in parentheses, same below)
re.M (re.MULTILINE):  multi-line mode, changes the behaviour of '^' and '$'
re.S (re.DOTALL):     dot-matches-all mode, changes the behaviour of '.'
Summary:
^: start of the string; $: end of the whole string. When a .+ already consumes to the end of the string, adding $ changes nothing.
Note: inside [ ], ^ means the complement set (per the source docs); it matches characters excluding the elements listed in [ ], truncating at the first excluded character, though the match may start mid-string.
re.search("[^()]+", "20.3+((2.9-20.2)*(5.1/2))")  # 20.3+  — the complement of "()" covers everything else
re.search("\([^(]+", "20.3+((2.9-20.2)*(5.1/2))")  # (2.9-20.2)*
.: matches any character except \n;  re.search("(\d./)+", "(10.15t/5+3)*4")  # 5t/  — a digit + any char + / combination
[a-z]: matches a single lowercase letter
[a-zA-Z]: matches a single letter of either case
[0-9]{1,3}: matches 1 to 3 digits
Spaces are significant: a space in the pattern is matched as a literal character
match: get the value with group()
search: get the value with group(), or as a dict with groupdict();  re.search("\(([^()]+)\)", "20.3+((2.9-20.2)*(5.1/2))").groups()  # ('2.9-20.2',) — a tuple, thanks to the ( ) group
findall: returns a list; no group() method
split: splits into a list on the matches; no group() method
sub: sub(pattern, replacement, string, count=N) replaces matches; no group() method
Operators:
+: must be escaped inside ( ), but not inside [ ];  re.search("(\d\+)+", "(1+3+6)*4")  # 1+3+    re.search("[\d+]+", "(1+3+6)*4")  # 1+3+6
-: needs no escaping in either;
*: needs no escaping only inside [ ];  re.search("[\d*]+", "(1*3.3)*4")  # 1*3
/: needs no escaping anywhere;  re.search("[\d/]+", "(10/5+3)")  # 10/5 — an OR relation: runs of digits or slashes;  re.search("(\d/)+", "(10/5+3)")  # 0/ — one digit followed by one slash, repeated
.: as a literal dot, needs no escaping only inside [ ];  re.search("[\d./]+", "(10.15/5+3)*4")  # 10.15/5 — OR relation
Note: when + and - are used together inside [ ], the - must be escaped (or reordered);  re.search("[+\-*/]+", "10+2-20*5/2")  # +    or reorder: re.search("[+*-/]+", "10+2-20*5/2")  # +
Matching a whole arithmetic expression:  re.search("([\d.]+|\+|\*|-|/)+", "10.345+2.829-20.23*5.1/2")  # 10.345+2.829-20.23*5.1/2
( ): after all this long foreplay the climax is bound to arrive — the bracket problem solves itself
re.search("\(([^()]+)\)", "20.3+((2.9-20.2)*(5.1/2))")  # grouped: extracts the innermost bracketed expression
[ ]: OR relation, matches one of the set;  res = re.search("[\d.]", "(1.2+3.3)*4")  # 1    res = re.search("[\d\+]", "(1+3.3)*4")  # 1
[ ]+: OR relation, matches a run;  res = re.search("[\d\+]+", "(1+3.3)*4")  # 1+3    res = re.search("[\d+]+", "(1+3+6)*4")  # 1+3+6
( ): grouping, matches the combination;  res = re.search("(\d\.)", "(1+3.3)*4")  # 3.    res = re.search("(\d\.)", "(1+3)*4")  # None
{ }: repetition count
\A: matches only at the start of the string, like ^; the first character may be a digit too
\Z: same as $; I honestly don't know why it exists — probably just there to show off and confuse everyone!
\d: matches a digit 0-9
\D: matches a non-digit, i.e. any character other than a digit, including special characters and \n
\w: matches only letters and digits, i.e. everything except special characters
\W: matches only special characters, i.e. everything except letters and digits
\s: (the original blog's table had a typo, s instead of \s) matches whitespace: space, \t, \n, \r, ...
\S: matches anything that is not whitespace (\t, \n, \r, ...)
re.I: ignore case;  res = re.search("[a-z]+", "usOn42d88cohui", flags=re.I)  # usOn
re.M: multi-line matching, changes ^ and $;  res = re.search("8$", "usOn\n42d88\ncohui", flags=re.M)  # 8
re.S: makes . match any character including \n;  res = re.search(".+", "\nabc\neee", flags=re.S)  # match='\nabc\neee'
+: matches one or more times
*: matches the preceding character zero or more times;  re.search("[a-z]+\d*", "12uson")  # uson
?: matches the preceding character 0 or 1 time (it becomes optional); the search still only matches once, scanning left to right, and how many characters that one match consumes depends on whether the optional character is present.
(?P<name>[...]{...}): named-group matching, dict style; three ways to read the result: 1) .groupdict(); 2) .groupdict()["key"]; 3) .group("key").
Finally, the most complete example code:
#!/usr/bin/env python
# Author:Uson
import re

# In a script, calling .group() on a failed match raises an error;
# at the interactive prompt a failed match just returns None silently.
res = re.match("^yu", "yuxuesong")  # match a string starting with yu
print(res.group())  # yu

# res = re.match("^yu\d+", "yuxuesong123")  # yu followed by digits: \d is a digit, + means one or more
# print(res.group())  # no match (the digits do not come right after yu)

res = re.match("^yu\d+", "yu123xuesong123")
print(res.group())  # yu123

# res = re.search("^x.+g$", "yu123xueSong123")  # ^ start of string, $ end; .+ any characters
# print(res)  # None
res = re.search("x.+2", "yu123xueSong123")
print(res.group())  # xueSong12
res = re.search("x.+b$", "yu123xueSong123b")
print(res.group())  # xueSong123b
res = re.search("x[a-zA-Z]+", "yu123xueSong123")
print(res.group())  # xueSong
res = re.search("x[a-zA-Z]+\d+$", "yu123xueSong123")  # ends with one or more digits
print(res.group())  # xueSong123
res = re.search("x[a-zA-Z]+\d+$", "yu123xueSong3")    # + also covers a single digit
print(res.group())  # xueSong3

# ? : the preceding character becomes optional (matched 0 or 1 time);
# search still returns only the first match, scanning left to right.
res = re.search("u?", "usonuu")       # the first u
res = re.search("uu?", "usonuu")      # u at the start (the second u is optional)
res = re.search("uu?", "uusonuu")     # uu at the start
res = re.search("u?", "son")          # '' — a zero-width match still counts
res = re.search("uuu?", "uuson")      # uu (the third u is optional)
res = re.search("uuu?", "uuuuuuson")  # uuu at the start
res = re.search("uuu?", "uusonuuu")   # uu — the earlier position wins
res = re.search("uuu?", "sonuuu")     # uuu at the end
res = re.search("uso?", "usson")      # us (the o is optional)

res = re.search("[0-9]{3}", "us2so123n")       # 123: exactly 3 digits
res = re.search("[0-9]{1,3}", "us2465so183n")  # 246: 1 to 3 digits
res = re.search("[0-9]{4,5}", "us2465so183n")  # 2465: 4 to 5 digits
res = re.search("abc|ABC", "ABCwewabc")        # ABC: | is OR, the leftmost match wins
res = re.search("abc{2}", "ABCwewabcc")        # abcc: {2} applies to the c only
# res = re.search("(abc){2}", "ABCabcdabc")    # None
res = re.search("(abc){2}", "ABCabcabc")       # abcabc: {2} applies to the whole group

# An unescaped | is the OR operator:
# res = re.search("(abc){2}|", "ABCabcabc|")   # matches '' at position 0 (the empty right alternative)
# To match a literal | it must be escaped:
res = re.search("(abc){2}\|", "ABCabcabc|")    # abcabc|
# matches abcabc|| or abcabc== or abcabc|=
res = re.search("(abc){2}(\||=){2}", "ABCabcabc||")  # abcabc||
res = re.search("(abc){2}(\||=){2}", "ABCabcabc==")  # abcabc==
res = re.search("(abc){2}(\||=){2}", "ABCabcabc|=")  # abcabc|=
res = re.search("(abc){2}(\||=){2}", "ABCabcabc|")   # None

res = re.search("(abc){2}", "(abcabc)")
res1 = res.groups()  # ('abc',)
res2 = res.group()   # abcabc

# = may be escaped or not
res = re.search("(abc){2}(\||\=){2}", "ABCabcabc|=")      # abcabc|=
res = re.search("(abc){2}(\|\|=){2}", "ABCabcabc||=||=")  # abcabc||=||=

# \A matches only at the start of the string
res = re.search("\A[0-9]+[a-z]\Z", "34byu")        # None: does not end with a single letter
res = re.search("\A[a-zA-Z]+", "ABCabcabc||=||=")  # ABCabcabc
res = re.search("\A[a-z]+", "ABCabcabc||=||=")     # None
res = re.search("\A[A-Z]+", "ABCabcabc||=||=")     # ABC
res = re.search("\A[A-Z]+.+", "ABCabcabc||=||=")   # ABCabcabc||=||=

# \Z is the same as $
res = re.search("[A-Z]\Z", "ABCabcabc||=||=")       # None
res = re.search("=\Z", "ABCabcabc||=||=")           # =
res = re.search("\A[0-9]+[a-z]+\Z", "68ABCabcabc")  # None
res = re.search("\A[0-9]+[a-z]+\Z", "68abcabc")     # 68abcabc

# \D matches anything that is not a digit
res = re.search("\D+", "68abc")          # abc
res = re.search("\D+", "68abc$- &#\n")   # abc$- &# plus the newline
res = re.search("\D+", "68abc$- &#\\n")  # abc$- &#\n (literal backslash then n)

# \w matches letters and digits only
res = re.search("\w+", "68bAc$- &#\\n")  # 68bAc
# \W matches special characters only
res = re.search("\W+", "68bAc$- &#\\n")  # $- &#\
res = re.search("\W+", "68bAc$- &#\n")   # $- &# plus the newline

# \s matches whitespace: space, \r, \n, \t
res = re.search("\s", "68bAc$- &#\n")    # ' ' (the space)
res = re.search("\s+", "aa\na\r67")      # span=(2, 3), match='\n' — the \r is a separate run
res = re.search("\s+", "68bAc$- \t&#")   # ' \t' (space and tab)
res = re.search("\s+", "\n \t&#")        # '\n \t'
print(res.group())
res = re.search("\s+", " \t\n")
print(res)  # <_sre.SRE_Match object; span=(0, 3), match=' \t\n'>

# \S matches non-whitespace
res = re.search("\S+", "68bAc$- &#\t")   # 68bAc$-
res = re.search("\S+", "# 68bAc$- &")    # #
print(res.group())

res = re.findall("[0-9]{1,4}", "u6s2465so183n")  # ['6', '2465', '183']: every run of 1-4 digits
res = re.findall("[0-9]{1}", "u6s24so183n")      # ['6', '2', '4', '1', '8', '3']
res = re.findall("[0-9]{2}", "u6s24so183n")      # ['24', '18']
res = re.findall("abc|ABC", "ABCwewabc")         # ['ABC', 'abc']: all matches
print(res)

# Advanced trick: named-group matching
# (commonly used in Django URL patterns)
res = re.search("(?P<name>[0-9]+)", "#qwq123")
print("named groups:", res)              # <_sre.SRE_Match object; span=(4, 7), match='123'>
print("named groups:", res.group())      # 123
print("named groups:", res.groupdict())  # {'name': '123'}
res = re.search("(?P<name>[0-9]{2})", "#qwq123")
print("named groups:", res.groupdict())  # {'name': '12'}
res = re.search("(?P<id>[0-9]{2})(?P<name>[a-zA-Z]+)", "#qwq123uson#akaedu")  # note how the groups split
print("named groups:", res.groupdict())  # {'name': 'uson', 'id': '23'}
print(res.groupdict()['name'])  # uson
print(res.group('id'))          # 23

# Example: parse personal information with named groups
res = re.search("(?P<addr>[a-zA-Z]{8})(?P<job>[A-Z]{2})(?P<born>[0-9]{4})", "ShanghaiIT1130")
print(res.groupdict())  # {'born': '1130', 'addr': 'Shanghai', 'job': 'IT'}
res = re.search("(?P<Province>[0-9]{2})(?P<LuAn>[0-9]{4})(?P<Born>[0-9]{4})", "3415001130")
print(res.groupdict())  # {'LuAn': '1500', 'Born': '1130', 'Province': '34'}

# split: split into a list on the matches
res = re.split("[0-9]", "uson6shang88hai99job6IT")   # ['uson', 'shang', '', 'hai', '', 'job', 'IT']
res = re.split("[0-9]+", "uson6shang88hai99job6IT")  # ['uson', 'shang', 'hai', 'job', 'IT']
print(res)

# sub: sub(pattern, replacement, string, count=N)
res = re.sub("[0-9]+", "|", "uson88Job66IT9Shanghai")           # uson|Job|IT|Shanghai
res = re.sub("[0-9]+", "|", "uson88Job66IT9Shanghai", count=2)  # uson|Job|IT9Shanghai
print(res)

# Matching backslashes: 4 backslashes in an ordinary string match 1 literal \,
# i.e. 2 backslashes in a raw string match 1 literal \
res = re.search("\\\\", "uson\cohui")        # \
res = re.search(r"\\", "uson\cohui")         # \
res = re.search(r"\\d", "uson\\dcohui")      # \d
res = re.search(r"\\d", r"uson\dcohui")      # \d
res = re.search(r"\\\\d", "uson\\\\dcohui")  # \\d
res = re.search(r"\\\\d", r"uson\\dcohui")   # \\d
res = re.search("\\\\\\\\d", "uson\\\\\\\\dcohui")  # \\d
res = re.search("\\\\\\\\d", r"uson\\dcohui")       # \\d

# flags: re.I ignores case; re.M is multi-line matching (changes ^ and $)
res = re.search("[a-z]+", "usOn42d88cohui")              # us
res = re.search("[a-z]+", "usOn42d88cohui", flags=re.I)  # usOn
res = re.search("8$", "usOn\n42d88\ncohui", flags=re.M)  # 8
res = re.search(r"^a", "\nabc\neee", flags=re.M)         # a
res = re.search(r"b$", "\nabc\neee", flags=re.M)         # None
res = re.search("c$", "\nabc\neee", flags=re.M)          # c
res = re.search(".+", "\nabc\neee")                      # abc
print(res.group())
res = re.search(".+", "\nabc\neee", flags=re.S)  # re.S: . also matches \n
print(res)  # <_sre.SRE_Match object; span=(0, 8), match='\nabc\neee'>

# Literal parentheses must be escaped; [] is an OR over single characters,
# []+ matches a run; () groups a combination
res = re.search("\(", "(1+3)*4")         # (
res = re.search("(\d\.)", "(1.0+3)*4")   # 1.
res = re.search("(\d\.)", "(1+3.3)*4")   # 3.
res = re.search("(\d\.)", "(1+3)*4")     # None
res = re.search("[\d\.]", "(1+3)*4")     # 1
res = re.search("[\d.]", "(1.2+3.3)*4")  # 1
res = re.search("[\d\+]+", "(1+3.3)*4")  # 1+3
res = re.search("(\d\+)+", "(1+3+6)*4")  # 1+3+
res = re.search("[\d+]+", "(1+3+6)*4")   # 1+3+6
res = re.search("(\d+)+", "(1+3+6)*4")   # 1
res = re.search("(\d-)+", "(1-3-6)*4")   # 1-3-
res = re.search("[\d-]+", "(1-3-6)*4")   # 1-3-6

# * matches the preceding character 0 or more times; it only goes unescaped inside []
res = re.search("[\d*]+", "(1*3-6)*4")       # 1*3
res = re.search("(\d*)+", "88*(11*63-6)*4")  # 88, same as "\d*"
res = re.search("\(\d*", "(11*63-6*4")       # (11
res = re.search("\(\d*", "()")               # (
res = re.search("[a-z]+\d*", "12uson")       # uson
res = re.search("[\d*]+", "(1.2*3.3)*4")     # 1 — the OR relation stops at the dot

# / needs no escaping
res = re.search("[\d/]+", "(10/5+3)*4")  # 10/5: runs of digits or slashes
res = re.search("(\d/)+", "(10/5+3)*4")  # 0/: one digit + one slash, repeated

# A literal . goes unescaped only inside []
res = re.search("(\d./)+", "(10.15t/5+3)*4")  # 5t/: digit + any char + /
res = re.search("[\d./]+", "(10.15/5+3)*4")   # 10.15/5

# The four arithmetic operators
res = re.search("(\+|-|\*|/)+", "10+2-20*5/2")  # +
res = re.search("[+\-*/]+", "10+2-20*5/2")      # +
# Matching integers or floats
res = re.search("[\d.]+", "10.345+2.829-20.23*5.1/2")  # 10.345
# Matching operators together with numbers
res = re.search("([\d.]+[+*-/])+", "10.345+2.829-20.23*5.1/2")  # 10.345+2.829-20.23*5.1/
# Group alternatives: useful for validating a whole input expression
res = re.search("([\d.]+|\+|\*|-|/)+", "10.345+2.829-20.23*5.1/2")  # the whole string
# What about brackets?
res = re.search("([\d.]+|\+|\*|-|/|\(|\))+", "10.3+(2.9-20.2)*5.1/2")  # the whole string
res = re.search("[(]+", "20.3+((2.9-20.2)*(5.1/2))")  # ((

# Inside [], ^ is the complement set (it excludes the listed characters,
# truncating at the first excluded one, though matching may start mid-string)
res = re.search("[^uson]+", "20.3+((2.9-20.2)*(5.1/2))")   # the whole string
res = re.search("(^\d+\()", "20(.3+((2.9-20.2)*(5.1/2))")  # 20(

# After the long build-up, the climax: the bracket problem, solved
res = re.search("[^()]+", "20.3+((2.9-20.2)*(5.1/2))")      # 20.3+
res = re.search("\([^(]+", "20.3+((2.9-20.2)*(5.1/2))")     # (2.9-20.2)*
res = re.search("\([^()]+", "20.3+((2.9-20.2)*(5.1/2))")    # (2.9-20.2
res = re.search("\([^()]+\)", "20.3+((2.9-20.2)*(5.1/2))")  # (2.9-20.2)
print(res.group())
res = re.search("\(([^()]+)\)", "20.3+((2.9-20.2)*(5.1/2))")
print(res.group())   # (2.9-20.2)
print(res.groups())  # ('2.9-20.2',) — a tuple
res = re.findall("\(([^()]+)\)", "20.3+((2.9-20.2)*(5.1/2))")
print(res)  # ['2.9-20.2', '5.1/2']
res = re.findall("\([^()]+\)", "20.3+((2.9-20.2)*(5.1/2))")
print(res)  # ['(2.9-20.2)', '(5.1/2)']
The logging module
Many programs need to record logs, and the logged information includes normal access records as well as errors, warnings and the like. Python's logging module provides a standard logging interface through which you can store logs in various formats. logging has five levels: debug(), info(), warning(), error() and critical(). Let's see how to use it.
Simplest usage
import logging

logging.warning("user [alex] attempted wrong password more than 3 times")
logging.critical("server is down")

# output
WARNING:root:user [alex] attempted wrong password more than 3 times
CRITICAL:root:server is down
What the log levels mean:
Level | When it's used
---|---
DEBUG | Detailed information, typically of interest only when diagnosing problems.
INFO | Confirmation that things are working as expected.
WARNING | An indication that something unexpected happened, or indicative of some problem in the near future (e.g. 'disk space low'). The software is still working as expected.
ERROR | Due to a more serious problem, the software has not been able to perform some function.
CRITICAL | A serious error, indicating that the program itself may be unable to continue running.
Writing the log to a file is also simple
import logging

logging.basicConfig(filename='example.log', level=logging.INFO)
logging.debug('This message should go to the log file')
logging.info('So should this')
logging.warning('And this, too')
In the line below, level=logging.INFO sets the logging threshold to INFO: only messages at INFO level or higher are recorded in the file, so in this example the first (debug) message is not recorded. If you want debug messages recorded too, just change the level to DEBUG.
logging.basicConfig(filename='example.log',level=logging.INFO)
The format above forgot the timestamp, and what use is a log without times? Let's add it!
import logging

logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p')
logging.warning('is when this event was logged.')

# output
12/12/2010 11:46:36 AM is when this event was logged.
Log format fields
Field | Meaning
---|---
%(name)s | name of the Logger
%(levelno)s | log level as a number
%(levelname)s | log level as text
%(pathname)s | full path of the module that issued the log call (may be absent)
%(filename)s | file name of the module that issued the log call
%(module)s | name of the module that issued the log call
%(funcName)s | name of the function that issued the log call
%(lineno)d | line number of the statement that issued the log call
%(created)f | current time as a standard UNIX float timestamp
%(relativeCreated)d | milliseconds since the Logger was created, at output time
%(asctime)s | current time as a string; default format is "2003-07-08 16:49:45,896" (milliseconds after the comma)
%(thread)d | thread ID (may be absent)
%(threadName)s | thread name (may be absent)
%(process)d | process ID (may be absent)
%(message)s | the user-supplied message
If you want the log printed to the screen and a log file at the same time, you need to learn something slightly more complex.
Recording logs with Python's logging module involves four main classes; the summary from the official documentation puts it best:
Logger provides the interface that application code uses directly;
Handler sends the log records (created by Loggers) to the appropriate destination;
Filter provides finer-grained control over which log records to output;
Formatter determines the final output format of log records.
logger
Every program obtains a Logger before emitting any output. The Logger usually corresponds to the program's module name; for example, the GUI module of a chat tool can obtain its Logger like this:
LOG = logging.getLogger("chat.gui")
while the core module can do:
LOG = logging.getLogger("chat.kernel")
Logger.setLevel(lvl): sets the minimum log level; records below lvl are ignored. DEBUG is the lowest built-in level and CRITICAL the highest
Logger.addFilter(filt), Logger.removeFilter(filt): add or remove the given filter
Logger.addHandler(hdlr), Logger.removeHandler(hdlr): add or remove the given handler
Logger.debug(), Logger.info(), Logger.warning(), Logger.error(), Logger.critical(): emit a record at the corresponding level
handler
A Handler object is responsible for sending the log information to its destination. Python's logging system offers many kinds of Handler: some output to the console, some write to files, and some send information over the network. If none fits, you can also write your own Handler. Multiple handlers can be attached via the addHandler() method.
Handler.setLevel(lvl): sets the level this handler processes; records below lvl are ignored
Handler.setFormatter(): chooses a Formatter for this handler
Handler.addFilter(filt), Handler.removeFilter(filt): add or remove a filter object
Each Logger can have multiple Handlers attached. Here are some commonly used Handlers:
1) logging.StreamHandler
This Handler writes to any file-like object resembling sys.stdout or sys.stderr. Its constructor is:
StreamHandler([strm])
where the strm argument is a file object; the default is sys.stderr.
2) logging.FileHandler
Like StreamHandler, this writes log information to a file, except that FileHandler opens the file for you. Its constructor is:
FileHandler(filename[,mode])
filename is the file name and must be given.
mode is the file-open mode (see the built-in open()); the default is 'a', append to the end.
3) logging.handlers.RotatingFileHandler
This Handler is similar to FileHandler above, but it manages file size. When the file reaches a given size, it automatically renames the current log file and creates a new one with the same name to keep writing. For example, with log file chat.log: when chat.log reaches the specified size, RotatingFileHandler renames it to chat.log.1; if chat.log.1 already exists, chat.log.1 is first renamed to chat.log.2, and so on; finally chat.log is recreated and logging continues. Its constructor is:
RotatingFileHandler( filename[, mode[, maxBytes[, backupCount]]])
where filename and mode mean the same as in FileHandler.
maxBytes is the maximum log file size. If maxBytes is 0 the log file may grow without limit, and the renaming described above never happens.
backupCount is the number of backup files to keep. For example, if it is 2, then when the renaming described above occurs, the old chat.log.2 is not renamed but deleted.
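A runnable sketch of the rotation behaviour just described (the 50-byte maxBytes is deliberately tiny to force rotation; the directory and logger names are made up):

```python
import logging
import os
import tempfile
from logging import handlers

log_dir = tempfile.mkdtemp()
log_file = os.path.join(log_dir, 'chat.log')

# rotate whenever the file would exceed 50 bytes; keep at most 2 backups,
# so only chat.log, chat.log.1 and chat.log.2 can exist at any time
fh = handlers.RotatingFileHandler(log_file, maxBytes=50, backupCount=2)
logger = logging.getLogger('rotate-demo')
logger.setLevel(logging.INFO)
logger.addHandler(fh)

for i in range(20):
    logger.info('message number %d', i)

created = sorted(os.listdir(log_dir))
print(created)  # ['chat.log', 'chat.log.1', 'chat.log.2']
```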
4) logging.handlers.TimedRotatingFileHandler
This Handler is like RotatingFileHandler, except that it decides when to create a new log file by elapsed time rather than by file size. The renaming works similarly, but the new files are suffixed with the current time instead of a number. Its constructor is:
TimedRotatingFileHandler( filename [,when [,interval [,backupCount]]])
where filename and backupCount mean the same as in RotatingFileHandler.
interval is the time interval.
when is a string giving the unit of the interval (case-insensitive), with these values:
S  seconds
M  minutes
H  hours
D  days
W  weekly (interval==0 means Monday)
midnight  every day at midnight
import logging

# create logger
logger = logging.getLogger('TEST-LOG')
logger.setLevel(logging.DEBUG)

# create console handler and set level to debug
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)

# create file handler and set level to warning
fh = logging.FileHandler("access.log")
fh.setLevel(logging.WARNING)

# create formatter
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

# add formatter to ch and fh
ch.setFormatter(formatter)
fh.setFormatter(formatter)

# add ch and fh to logger
logger.addHandler(ch)
logger.addHandler(fh)

# 'application' code
logger.debug('debug message')
logger.info('info message')
logger.warning('warning message')  # warn() is deprecated in favour of warning()
logger.error('error message')
logger.critical('critical message')
Automatic log-file rotation example
import logging
from logging import handlers

logger = logging.getLogger(__name__)

log_file = "timelog.log"
# fh = handlers.RotatingFileHandler(filename=log_file, maxBytes=10, backupCount=3)
fh = handlers.TimedRotatingFileHandler(filename=log_file, when="S", interval=5, backupCount=3)

formatter = logging.Formatter('%(asctime)s %(module)s:%(lineno)d %(message)s')
fh.setFormatter(formatter)
logger.addHandler(fh)

logger.warning("test1")
logger.warning("test12")
logger.warning("test13")
logger.warning("test14")
2. Open-source modules (e.g. paramiko, ...)
3. Custom modules (.py files you write yourself)
Postscript
The math module:
import math
math.ceil(10 / 3)  # 4 — round up to the nearest integer
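A few of the module's other common functions, for reference (a quick sketch I've added, beyond the single ceil example in the original notes):

```python
import math

assert math.ceil(10 / 3) == 4    # round up
assert math.floor(10 / 3) == 3   # round down
assert math.sqrt(16) == 4.0      # square root (returns a float)
assert math.pow(2, 10) == 1024.0
assert round(math.pi, 2) == 3.14
```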