模块 | 易学教程

模块定义、导入、本质

定义:用来从逻辑上组织python代码（变量，函数，类，逻辑，实现一个功能）

import本质：路径搜索和搜索路径

（1）import x(模块名)

x = all code(所有代码)

x.name，x.logger()　　调用变量或函数

（2）from x import name(方法)

name = 'uson'　　　　直接使用变量或函数

导入模块本质上就是.py结尾的python文件，将python文件解释一遍

(导入模块)import module_name -->(找到.py文件)module.py -->(找到路径)module.py路径【路径搜索】 -->sys.path(pycharm将相对路径转换成绝对路径的一个路径列表)【搜索路径】，如果导入的模块在当前目前下，导入成功，否则导入失败，怎么办？

将需要导入的模块添加到环境变量中， sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

优化： sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) # ['D:\\python\\oldboy\\5\\module', ...]

导入方法：

（1）import 模块名1，模块名2，……

（2）from 模块名 import *(所有函数和方法) -->不建议

（3）from 模块名 import 函数名(方法名) as 重命名

（4）from 模块名 import m1, m2, m3, …… # 值得注意：模块名和m1,m2,m3必须在配置的环境路径下一级（导入包：同理）

包定义、导入、本质

定义：即一个目录（必须带有一个__init__.py文件），用来从逻辑上组织模块的

导入包的本质：执行该包下的__init__.py文件

（1）run __init__.py，需要在__init__.py文件中导入某个模块（即把导入语句放入了__init__.py中）

（2）譬如：导入相对于__init__.py当前目录的模块，from . import module_name # module_name = ‘module_name.py all code’

跨目录导入包示例如下：

在im文件里导入pychamr包：
1、在im文件里配置环境变量(至少到包的上级目录module)
2、import pychamr
3、调用方法：pychamr.模块名(__init__导入的模块名).变量或函数

模块小结：

在导入包时，发现奇怪问题，不知道为什么？

（1）导入当前目录的模块时，不能用 from . import 同级目录下的.py文件名，只能用 import 同级目录下的.py文件名，本文件才能正常执行

（2）反之，在导入包时，包目录下的__init__文件，不能用 import 同级目录下的.py文件名，只能用 from . import 同级目录下的.py文件名，调用文件才能正常执行

from 和 import后面的模块名（module目录同级）必须在配置的路径（目录5）下一级 ['D:\\python\\5', 'D:\\python\\5\\module'，...]

模块分类：

1、标准库（time, os, ...）

time模块（调用C库）

表示方式：

1）时间戳：单位s

2）格式化的时间字符串

3）元组（9个元素）：time.localtime() -->本地时间　　备注：周日：一周的第一天0

time.struct_time(tm_year=2019, tm_mon=9, tm_mday=13, tm_hour=13, tm_min=29, tm_sec=52, tm_wday=4, tm_yday=256, tm_isdst=0) #isdst:是否是夏令时

世界标准时间：UTC，在中国：UTC+8（东八区）

时区：360° / 15° = 24个时区，世界时间是以子午线（0°）为准的，中国与中间的经度（子午线）相差8格，即中国比标准时间要早8个小时（早8小时见到太阳，时间上就大于世界标准时间）

word_time = time.timezone
china_time = int(word_time) / 3600
print("中国与世界时间相差：%s小时" %china_time)
# 中国与世界时间相差：-8.0小时

help(time)可以查看所有time的方法

（1）time

示例代码：

import time

current = time.time()   # 时间戳（单位s） 1568350840.6369312
hour = current / 3600
day = hour / 24
year = day / 365
start_date = 2019 - int(year)
print("Unix系统诞生的时间：", start_date) # 起源时间： 1970

word_time = time.timezone
china_time = int(word_time) / 3600
print("中国与世界时间相差：%s小时" %china_time)
# 中国与世界时间相差：-8.0小时

print(time.altzone) # -32400 夏令时与UTC的时差
print(time.daylight) # 0 是否使用了夏令时
# time.sleep(1)

# （一）时间戳转元组格式
# (1)
print(time.gmtime()) # 默认为空，则为当前的UTC时区的时间 实际时间：2019-9-13 14:30
# time.struct_time(tm_year=2019, tm_mon=9, tm_mday=13, tm_hour=6, tm_min=30, tm_sec=12, tm_wday=4, tm_yday=256, tm_isdst=0)
print(time.gmtime(23422544738)) # UTC时区的时间 # 2027年。。。
print(time.gmtime(2)) # UTC时区的时间  1970年。。。
# (2)
# 元组形式（9个元素） isdst:是否是夏令时
print(time.localtime())  # 默认为空，即当前的本地时间（UTC+8）
local = time.localtime(123232)
print(local)
# time.struct_time(tm_year=1970, tm_mon=1, tm_mday=2, tm_hour=18, tm_min=13, tm_sec=52, tm_wday=4, tm_yday=2, tm_isdst=0)
print(local.tm_year, local.tm_mday) # 1970 2

# （二）元组格式为时间戳
local = time.localtime()  # 转本地时间为时间戳
print(time.mktime(local))   # 1568357166.0

# （三）元组格式转格式化的字符串
# time.strftime(format[, tuple])
t1 = time.strftime("%Y-%m-%d %H:%M:%S")
t2 = time.strftime("%Y-%m-%d %X")
t3 = time.strftime("%y-%m-%d %X")
print(t1, t2, t3) # 2019-09-13 14:54:30 2019-09-13 14:54:30 19-09-13 14:54:30

local = time.localtime(1232422)
f_local = time.strftime("%Y-%m-%d %H:%M:%S", local)
print(f_local) # 1970-01-15 14:20:22

# （四）格式化的字符串转元组格式
# time.strptime(string, format)
tup = time.strptime("1970-01-15 14:20:22", "%Y-%m-%d %H:%M:%S")
print(tup)
# time.struct_time(tm_year=1970, tm_mon=1, tm_mday=15, tm_hour=14, tm_min=20, tm_sec=22, tm_wday=3, tm_yday=15, tm_isdst=-1)

# 元组格式转字符串（周月日时分秒年：%a %b %d %H:%M:%S %Y）
local = time.localtime(1232422)
asc1 = time.asctime()       # 默认(本地时间)元组格式转字符串 Fri Sep 13 15:41:13 2019
asc2 = time.asctime(local) # 指定元组格式转字符串 Thu Jan 15 14:20:22 1970
print(asc1, asc2)

# 时间戳转字符串（周月日时分秒年：%a %b %d %H:%M:%S %Y）
local = time.ctime()        # (本地时间) Fri Sep 13 15:56:47 2019
stamp = time.ctime(123322)  # (指定时间戳) Fri Jan  2 18:15:22 1970
print(local, stamp)

时间三种方式之间的转换：

参考(H)：

（2）datetime

示例代码：

import  datetime, time
print(datetime.datetime.now()) #获取当前东八区时间 2019-09-13 16:08:23.626636
print(datetime.datetime.utcnow()) #获取当前utc时间 2019-09-13 08:08:23.626636
# timedelta: 时间增量  不可以单独使用
print(datetime.datetime.now() + datetime.timedelta(3)) #当前时间+3天
print(datetime.datetime.now() + datetime.timedelta(-3)) #当前时间-3天

print(datetime.datetime.now() + datetime.timedelta(hours=3)) #当前时间+3小时
print(datetime.datetime.now() + datetime.timedelta(minutes=30)) #当前时间+30分
print(datetime.datetime.now() + datetime.timedelta(seconds=30)) #当前时间+30秒

# 拓展
# 自定义日期和时间（时分秒）传参得到格式化的日期和时间字符串
print(datetime.datetime(year=2019, month=9, day=13, hour=16, minute=16, second=16)) # 2019-09-13 16:16:16
print(datetime.datetime(2019, 9, 13, 16, 16, 16)) # 2019-09-13 16:16:16
print(datetime.datetime(2019, 9, 13, 16, 16, 16, 42212)) # 2019-09-13 16:16:16.042212
# 自定义日期（年月日）传参得到格式化的日期字符串
print(datetime.date(year=2019, month=9, day=13)) # 2019-09-13
# 自定义时间（时分秒）传参得到格式化的时间字符串
print(datetime.time(hour=16, minute=16, second=16)) # 16:16:16
# 时间戳直接转成日期格式
print(datetime.date.fromtimestamp(time.time()) )  # 2019-09-13

# 修改当前获取的时间（时间替换）
print(datetime.datetime.now().replace(hour=9)) 
# 修改前时间：2019-09-13 16:32:30.843760
# 修改后时间：2019-09-13 09:32:30.843760

random模块

方法详解示例代码：

#!/usr/bin/env python
# Author：USON

import random

# 随机[0-1)浮点数 顾头不顾尾
print(random.random()) # 0.06706695529343099
# 随机[0-10)浮点数 顾头不顾尾
print(random.uniform(0, 10)) # 7.836148583000067
# 随机[1-3]整数 包含3
print(random.randint(1,3))
# 随机[1-2]整数 range顾头不顾尾
print(random.randrange(1,3))
# 随机取值：字符串、列表、元组...随机取1位
print(random.choice([1,2,4,5,6,8])) # 1
# 随机取值：字符串、列表、元组...随机取n个位
print(random.sample([1,2,4,5,6,8], 2)) # [6, 8]

# 洗牌功能: 完全打乱顺序
items = [1,2,4,6,7,8,10]
random.shuffle(items)
print(items) # [10, 6, 7, 8, 2, 4, 1]

验证码功能示例：

import random

checkcode = ''
#英文+数字随机组合验证码：
for i in range(4):
    rand_int = random.randint(0,3)
    if i == rand_int:
        #字母
        #checkcode += chr(random.randint(65, 90))
        tmp = chr(random.randint(65, 90))
    else:
        #数字
        #checkcode += str(random.randint(0, 9))
        tmp = random.randint(0, 9)
    checkcode += str(tmp)
print(checkcode)

#全英文验证码：
for i in range(4):
    checkcode += chr(random.randint(65, 90))
print(checkcode)

#全数字验证码：
for i in range(4):
    checkcode += str(random.randint(0, 9))
print(checkcode)

os模块：提供对操作系统进行调用的接口

os_path = os.path.join('run.BASE_DIR', 'database') # run.BASE_DIR\database

import os
# 提供对操作系统进行调用的接口

# print(os.getcwd()) # 获取当前工作目录，即当前python脚本工作的目录路径
# os.chdir("D:\\python\\uson") #  改变当前脚本工作目录；相当于shell下cd
# os.chdir(r"D:\python\uson") # 防止转义
# os.curdir  返回当前目录: ('.')
# os.pardir  返回上一级目录，获取当前目录的父目录字符串名：('..')
# os.makedirs('dirname1/dirname2')    可生成多层递归目录
# os.removedirs('dirname1')    若目录为空，则删除，并递归到上一级目录，如若也为空，则删除，依此类推
# os.mkdir('dirname')    生成单级目录；相当于shell中mkdir dirname
# os.rmdir('dirname')    删除单级空目录，若目录不为空则无法删除，报错；相当于shell中rmdir dirname
# os.listdir('dirname')    列出指定目录下的所有文件和子目录，包括隐藏文件，并以列表方式打印
# os.remove()  删除一个文件
# os.rename("oldname","newname")  重命名文件/目录
# os.stat('path/filename')  获取文件/目录信息
# os.sep    输出操作系统特定的路径分隔符，win下为"\\",Linux下为"/"
# os.linesep    输出当前平台使用的行终止符，win下为"\t\n",Linux下为"\n"
# os.pathsep    输出用于分割文件路径的字符串，win下为;
# os.name    输出字符串指示当前使用平台。win->'nt'; Linux->'posix'
# os.system("bash command")  运行shell命令，直接显示
# os.system('dir')
# os.system('ipconfig/all')
# os.environ  获取系统环境变量
# os.path.abspath(path)  返回path规范化的绝对路径
# os.path.split(path)  将path分割成目录和文件名二元组返回
# 'C:\\Users\\uson'  ---> ('C:\\Users', 'uson')

# 路径可以不存在
# os.path.dirname(path)  返回path的目录。其实就是os.path.split(path)的第一个元素
# os.path.basename(path)  返回path最后的文件名，文件为空，返回结尾目录名。如果path以／或\结尾，那么就会返回空值。即os.path.split(path)的第二个元素

# os.path.exists(path)  如果path存在，返回True；如果path不存在，返回False
# os.path.isabs(path)  如果path是绝对路径，返回True
# os.path.isfile(path)  如果path是一个存在的文件，返回True。否则返回False
# os.path.isdir(path)  如果path是一个存在的目录，则返回True。否则返回False

# os.path.join(path1[, path2[, ...]])  将多个路径组合后返回，第一个绝对路径之前的参数将被忽略 

# os.path.getatime(path)  返回path所指向的文件或者目录的最后存取时间（时间戳）
# os.path.getmtime(path)  返回path所指向的文件或者目录的最后修改时间（时间戳）

sys模块

sys.argv           命令行参数List，第一个元素是程序本身路径
sys.exit(n)        退出程序，正常退出时exit(0)
sys.version        获取Python解释程序的版本信息
sys.maxint         最大的Int值
sys.path           返回模块的搜索路径，初始化时使用PYTHONPATH环境变量的值
sys.platform       返回操作系统平台名称
sys.stdout.write('please:')
val = sys.stdin.readline()[:-1]

shutil模块

高级的文件、文件夹、压缩包处理模块

shutil.copyfileobj(fsrc, fdst[, length])
将文件内容拷贝到另一个文件中，可以部分内容 shutil.copyfileobj(f1, f2)

import shutil
f1 = open('os.py', 'r', encoding='utf-8') # 原文件
f2 = open('os2.py', 'w', encoding='utf-8') # 新文件
shutil.copyfileobj(f1, f2)

def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)

源码

shutil.copyfile(src, dst)
拷贝文件 shutil.copyfile('os.py', 'os2.py')

def copyfile(src, dst):
    """Copy data from src to dst"""
    if _samefile(src, dst):
        raise Error("`%s` and `%s` are the same file" % (src, dst))

    for fn in [src, dst]:
        try:
            st = os.stat(fn)
        except OSError:
            # File most likely does not exist
            pass
        else:
            # XXX What about other special files? (sockets, devices...)
            if stat.S_ISFIFO(st.st_mode):
                raise SpecialFileError("`%s` is a named pipe" % fn)

    with open(src, 'rb') as fsrc:
        with open(dst, 'wb') as fdst:
            copyfileobj(fsrc, fdst)

源码

shutil.copymode(src, dst)
仅拷贝权限。内容、组、用户均不变

def copymode(src, dst):
    """Copy mode bits from src to dst"""
    if hasattr(os, 'chmod'):
        st = os.stat(src)
        mode = stat.S_IMODE(st.st_mode)
        os.chmod(dst, mode)

源码

shutil.copystat(src, dst)
拷贝状态的信息，包括：mode bits, atime, mtime, flags

def copystat(src, dst):
    """Copy all stat info (mode bits, atime, mtime, flags) from src to dst"""
    st = os.stat(src)
    mode = stat.S_IMODE(st.st_mode)
    if hasattr(os, 'utime'):
        os.utime(dst, (st.st_atime, st.st_mtime))
    if hasattr(os, 'chmod'):
        os.chmod(dst, mode)
    if hasattr(os, 'chflags') and hasattr(st, 'st_flags'):
        try:
            os.chflags(dst, st.st_flags)
        except OSError, why:
            for err in 'EOPNOTSUPP', 'ENOTSUP':
                if hasattr(errno, err) and why.errno == getattr(errno, err):
                    break
            else:
                raise

源码

shutil.copy(src, dst)
拷贝文件和权限

def copy(src, dst):
    """Copy data and mode bits ("cp src dst").

    The destination may be a directory.

    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copymode(src, dst)

源码

shutil.copy2(src, dst)
拷贝文件和状态信息

def copy2(src, dst):
    """Copy data and all stat info ("cp -p src dst").

    The destination may be a directory.

    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copystat(src, dst)

源码

shutil.ignore_patterns(*patterns)

shutil.copytree(src, dst, symlinks=False, ignore=None)
递归的去拷贝目录和文件

例如：copytree(source, destination, ignore=ignore_patterns('*.pyc', 'tmp*'))

def ignore_patterns(*patterns):
    """Function that can be used as copytree() ignore parameter.

    Patterns is a sequence of glob-style patterns
    that are used to exclude files"""
    def _ignore_patterns(path, names):
        ignored_names = []
        for pattern in patterns:
            ignored_names.extend(fnmatch.filter(names, pattern))
        return set(ignored_names)
    return _ignore_patterns

def copytree(src, dst, symlinks=False, ignore=None):
    """Recursively copy a directory tree using copy2().

    The destination directory must not already exist.
    If exception(s) occur, an Error is raised with a list of reasons.

    If the optional symlinks flag is true, symbolic links in the
    source tree result in symbolic links in the destination tree; if
    it is false, the contents of the files pointed to by symbolic
    links are copied.

    The optional ignore argument is a callable. If given, it
    is called with the `src` parameter, which is the directory
    being visited by copytree(), and `names` which is the list of
    `src` contents, as returned by os.listdir():

        callable(src, names) -> ignored_names

    Since copytree() is called recursively, the callable will be
    called once for each directory that is copied. It returns a
    list of names relative to the `src` directory that should
    not be copied.

    XXX Consider this example code rather than the ultimate tool.

    """
    names = os.listdir(src)
    if ignore is not None:
        ignored_names = ignore(src, names)
    else:
        ignored_names = set()

    os.makedirs(dst)
    errors = []
    for name in names:
        if name in ignored_names:
            continue
        srcname = os.path.join(src, name)
        dstname = os.path.join(dst, name)
        try:
            if symlinks and os.path.islink(srcname):
                linkto = os.readlink(srcname)
                os.symlink(linkto, dstname)
            elif os.path.isdir(srcname):
                copytree(srcname, dstname, symlinks, ignore)
            else:
                # Will raise a SpecialFileError for unsupported file types
                copy2(srcname, dstname)
        # catch the Error from the recursive copytree so that we can
        # continue with other files
        except Error, err:
            errors.extend(err.args[0])
        except EnvironmentError, why:
            errors.append((srcname, dstname, str(why)))
    try:
        copystat(src, dst)
    except OSError, why:
        if WindowsError is not None and isinstance(why, WindowsError):
            # Copying file access times may fail on Windows
            pass
        else:
            errors.append((src, dst, str(why)))
    if errors:
        raise Error, errors

源码

shutil.rmtree(path[, ignore_errors[, onerror]])
递归的去删除目录和文件

def rmtree(path, ignore_errors=False, onerror=None):
    """Recursively delete a directory tree.

    If ignore_errors is set, errors are ignored; otherwise, if onerror
    is set, it is called to handle the error with arguments (func,
    path, exc_info) where func is os.listdir, os.remove, or os.rmdir;
    path is the argument to that function that caused it to fail; and
    exc_info is a tuple returned by sys.exc_info().  If ignore_errors
    is false and onerror is None, an exception is raised.

    """
    if ignore_errors:
        def onerror(*args):
            pass
    elif onerror is None:
        def onerror(*args):
            raise
    try:
        if os.path.islink(path):
            # symlinks to directories are forbidden, see bug #1669
            raise OSError("Cannot call rmtree on a symbolic link")
    except OSError:
        onerror(os.path.islink, path, sys.exc_info())
        # can't continue even if onerror hook returns
        return
    names = []
    try:
        names = os.listdir(path)
    except os.error, err:
        onerror(os.listdir, path, sys.exc_info())
    for name in names:
        fullname = os.path.join(path, name)
        try:
            mode = os.lstat(fullname).st_mode
        except os.error:
            mode = 0
        if stat.S_ISDIR(mode):
            rmtree(fullname, ignore_errors, onerror)
        else:
            try:
                os.remove(fullname)
            except os.error, err:
                onerror(os.remove, fullname, sys.exc_info())
    try:
        os.rmdir(path)
    except os.error:
        onerror(os.rmdir, path, sys.exc_info())

源码

shutil.move(src, dst)
递归的去移动文件

def move(src, dst):
    """Recursively move a file or directory to another location. This is
    similar to the Unix "mv" command.

    If the destination is a directory or a symlink to a directory, the source
    is moved inside the directory. The destination path must not already
    exist.

    If the destination already exists but is not a directory, it may be
    overwritten depending on os.rename() semantics.

    If the destination is on our current filesystem, then rename() is used.
    Otherwise, src is copied to the destination and then removed.
    A lot more could be done here...  A look at a mv.c shows a lot of
    the issues this implementation glosses over.

    """
    real_dst = dst
    if os.path.isdir(dst):
        if _samefile(src, dst):
            # We might be on a case insensitive filesystem,
            # perform the rename anyway.
            os.rename(src, dst)
            return

        real_dst = os.path.join(dst, _basename(src))
        if os.path.exists(real_dst):
            raise Error, "Destination path '%s' already exists" % real_dst
    try:
        os.rename(src, real_dst)
    except OSError:
        if os.path.isdir(src):
            if _destinsrc(src, dst):
                raise Error, "Cannot move a directory '%s' into itself '%s'." % (src, dst)
            copytree(src, real_dst, symlinks=True)
            rmtree(src)
        else:
            copy2(src, real_dst)
            os.unlink(src)

源码

shutil.make_archive(base_name, format,...)

创建压缩包并返回文件路径，例如：zip、tar （archive：归档之意）

- base_name：压缩包的文件名，也可以是压缩包的路径。只是文件名时，则保存至当前目录，否则保存至指定路径，
  如：www =>保存至当前路径
  如：/Users/wupeiqi/www =>保存至/Users/wupeiqi/
- format：压缩包种类，“zip”, “tar”, “bztar”，“gztar” （ zip:打包和压缩，tar:只打包不压缩）
- root_dir：要压缩的文件夹路径（默认当前目录）
- owner：用户，默认当前用户
- group：组，默认当前组
- logger：用于记录日志，通常是logging.Logger对象

shutil.make_archive('modulezip', 'zip', 'D:\\python\\uson\\5\\复习\\module')

#将 /Users/wupeiqi/Downloads/test 下的文件打包放置当前程序目录
 
import shutil
ret = shutil.make_archive("wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')
 
#将 /Users/wupeiqi/Downloads/test 下的文件打包放置 /Users/wupeiqi/目录
import shutil
ret = shutil.make_archive("/Users/wupeiqi/wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')

#将 /Users/wupeiqi/Downloads/test 下的文件打包放置当前程序目录
 
import shutil
ret = shutil.make_archive("wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')
 
#将 /Users/wupeiqi/Downloads/test 下的文件打包放置 /Users/wupeiqi/目录
import shutil
ret = shutil.make_archive("/Users/wupeiqi/wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')

shutil 对压缩包的处理是调用 ZipFile 和 TarFile 两个模块来进行的，详细：

import zipfile
z = zipfile.ZipFile('os.zip', 'w')
z.write('os.py')
print("我还可以玩会")
z.write('os2.py')
z.close()

import zipfile

# 压缩
z = zipfile.ZipFile('laxi.zip', 'w')
z.write('a.log')
z.write('data.data')
z.close()

# 解压
z = zipfile.ZipFile('laxi.zip', 'r')
z.extractall()
z.close()

zipfile 压缩解压

import tarfile

# 压缩
tar = tarfile.open('your.tar','w')
tar.add('/Users/wupeiqi/PycharmProjects/bbs2.zip', arcname='bbs2.zip')
tar.add('/Users/wupeiqi/PycharmProjects/cmdb.zip', arcname='cmdb.zip')
tar.close()

# 解压
tar = tarfile.open('your.tar','r')
tar.extractall()  # 可设置解压地址
tar.close()

tarfile 压缩解压

class ZipFile(object):
    """ Class with methods to open, read, write, close, list zip files.

    z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=False)

    file: Either the path to the file, or a file-like object.
          If it is a path, the file will be opened and closed by ZipFile.
    mode: The mode can be either read "r", write "w" or append "a".
    compression: ZIP_STORED (no compression) or ZIP_DEFLATED (requires zlib).
    allowZip64: if True ZipFile will create files with ZIP64 extensions when
                needed, otherwise it will raise an exception when this would
                be necessary.

    """

    fp = None                   # Set here since __del__ checks it

    def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=False):
        """Open the ZIP file with mode read "r", write "w" or append "a"."""
        if mode not in ("r", "w", "a"):
            raise RuntimeError('ZipFile() requires mode "r", "w", or "a"')

        if compression == ZIP_STORED:
            pass
        elif compression == ZIP_DEFLATED:
            if not zlib:
                raise RuntimeError,\
                      "Compression requires the (missing) zlib module"
        else:
            raise RuntimeError, "That compression method is not supported"

        self._allowZip64 = allowZip64
        self._didModify = False
        self.debug = 0  # Level of printing: 0 through 3
        self.NameToInfo = {}    # Find file info given name
        self.filelist = []      # List of ZipInfo instances for archive
        self.compression = compression  # Method of compression
        self.mode = key = mode.replace('b', '')[0]
        self.pwd = None
        self._comment = ''

        # Check if we were passed a file-like object
        if isinstance(file, basestring):
            self._filePassed = 0
            self.filename = file
            modeDict = {'r' : 'rb', 'w': 'wb', 'a' : 'r+b'}
            try:
                self.fp = open(file, modeDict[mode])
            except IOError:
                if mode == 'a':
                    mode = key = 'w'
                    self.fp = open(file, modeDict[mode])
                else:
                    raise
        else:
            self._filePassed = 1
            self.fp = file
            self.filename = getattr(file, 'name', None)

        try:
            if key == 'r':
                self._RealGetContents()
            elif key == 'w':
                # set the modified flag so central directory gets written
                # even if no files are added to the archive
                self._didModify = True
            elif key == 'a':
                try:
                    # See if file is a zip file
                    self._RealGetContents()
                    # seek to start of directory and overwrite
                    self.fp.seek(self.start_dir, 0)
                except BadZipfile:
                    # file is not a zip file, just append
                    self.fp.seek(0, 2)

                    # set the modified flag so central directory gets written
                    # even if no files are added to the archive
                    self._didModify = True
            else:
                raise RuntimeError('Mode must be "r", "w" or "a"')
        except:
            fp = self.fp
            self.fp = None
            if not self._filePassed:
                fp.close()
            raise

    def __enter__(self):
        return self

    def __exit__(self, type, value, traceback):
        self.close()

    def _RealGetContents(self):
        """Read in the table of contents for the ZIP file."""
        fp = self.fp
        try:
            endrec = _EndRecData(fp)
        except IOError:
            raise BadZipfile("File is not a zip file")
        if not endrec:
            raise BadZipfile, "File is not a zip file"
        if self.debug > 1:
            print endrec
        size_cd = endrec[_ECD_SIZE]             # bytes in central directory
        offset_cd = endrec[_ECD_OFFSET]         # offset of central directory
        self._comment = endrec[_ECD_COMMENT]    # archive comment

        # "concat" is zero, unless zip was concatenated to another file
        concat = endrec[_ECD_LOCATION] - size_cd - offset_cd
        if endrec[_ECD_SIGNATURE] == stringEndArchive64:
            # If Zip64 extension structures are present, account for them
            concat -= (sizeEndCentDir64 + sizeEndCentDir64Locator)

        if self.debug > 2:
            inferred = concat + offset_cd
            print "given, inferred, offset", offset_cd, inferred, concat
        # self.start_dir:  Position of start of central directory
        self.start_dir = offset_cd + concat
        fp.seek(self.start_dir, 0)
        data = fp.read(size_cd)
        fp = cStringIO.StringIO(data)
        total = 0
        while total < size_cd:
            centdir = fp.read(sizeCentralDir)
            if len(centdir) != sizeCentralDir:
                raise BadZipfile("Truncated central directory")
            centdir = struct.unpack(structCentralDir, centdir)
            if centdir[_CD_SIGNATURE] != stringCentralDir:
                raise BadZipfile("Bad magic number for central directory")
            if self.debug > 2:
                print centdir
            filename = fp.read(centdir[_CD_FILENAME_LENGTH])
            # Create ZipInfo instance to store file information
            x = ZipInfo(filename)
            x.extra = fp.read(centdir[_CD_EXTRA_FIELD_LENGTH])
            x.comment = fp.read(centdir[_CD_COMMENT_LENGTH])
            x.header_offset = centdir[_CD_LOCAL_HEADER_OFFSET]
            (x.create_version, x.create_system, x.extract_version, x.reserved,
                x.flag_bits, x.compress_type, t, d,
                x.CRC, x.compress_size, x.file_size) = centdir[1:12]
            x.volume, x.internal_attr, x.external_attr = centdir[15:18]
            # Convert date/time code to (year, month, day, hour, min, sec)
            x._raw_time = t
            x.date_time = ( (d>>9)+1980, (d>>5)&0xF, d&0x1F,
                                     t>>11, (t>>5)&0x3F, (t&0x1F) * 2 )

            x._decodeExtra()
            x.header_offset = x.header_offset + concat
            x.filename = x._decodeFilename()
            self.filelist.append(x)
            self.NameToInfo[x.filename] = x

            # update total bytes read from central directory
            total = (total + sizeCentralDir + centdir[_CD_FILENAME_LENGTH]
                     + centdir[_CD_EXTRA_FIELD_LENGTH]
                     + centdir[_CD_COMMENT_LENGTH])

            if self.debug > 2:
                print "total", total


    def namelist(self):
        """Return a list of file names in the archive."""
        l = []
        for data in self.filelist:
            l.append(data.filename)
        return l

    def infolist(self):
        """Return a list of class ZipInfo instances for files in the
        archive."""
        return self.filelist

    def printdir(self):
        """Print a table of contents for the zip file."""
        print "%-46s %19s %12s" % ("File Name", "Modified    ", "Size")
        for zinfo in self.filelist:
            date = "%d-%02d-%02d %02d:%02d:%02d" % zinfo.date_time[:6]
            print "%-46s %s %12d" % (zinfo.filename, date, zinfo.file_size)

    def testzip(self):
        """Read all the files and check the CRC."""
        chunk_size = 2 ** 20
        for zinfo in self.filelist:
            try:
                # Read by chunks, to avoid an OverflowError or a
                # MemoryError with very large embedded files.
                with self.open(zinfo.filename, "r") as f:
                    while f.read(chunk_size):     # Check CRC-32
                        pass
            except BadZipfile:
                return zinfo.filename

    def getinfo(self, name):
        """Return the instance of ZipInfo given 'name'."""
        info = self.NameToInfo.get(name)
        if info is None:
            raise KeyError(
                'There is no item named %r in the archive' % name)

        return info

    def setpassword(self, pwd):
        """Set default password for encrypted files."""
        self.pwd = pwd

    @property
    def comment(self):
        """The comment text associated with the ZIP file."""
        return self._comment

    @comment.setter
    def comment(self, comment):
        # check for valid comment length
        if len(comment) > ZIP_MAX_COMMENT:
            import warnings
            warnings.warn('Archive comment is too long; truncating to %d bytes'
                          % ZIP_MAX_COMMENT, stacklevel=2)
            comment = comment[:ZIP_MAX_COMMENT]
        self._comment = comment
        self._didModify = True

    def read(self, name, pwd=None):
        """Return file bytes (as a string) for name."""
        return self.open(name, "r", pwd).read()

    def open(self, name, mode="r", pwd=None):
        """Return file-like object for 'name'."""
        if mode not in ("r", "U", "rU"):
            raise RuntimeError, 'open() requires mode "r", "U", or "rU"'
        if not self.fp:
            raise RuntimeError, \
                  "Attempt to read ZIP archive that was already closed"

        # Only open a new file for instances where we were not
        # given a file object in the constructor
        if self._filePassed:
            zef_file = self.fp
            should_close = False
        else:
            zef_file = open(self.filename, 'rb')
            should_close = True

        try:
            # Make sure we have an info object
            if isinstance(name, ZipInfo):
                # 'name' is already an info object
                zinfo = name
            else:
                # Get info object for name
                zinfo = self.getinfo(name)

            zef_file.seek(zinfo.header_offset, 0)

            # Skip the file header:
            fheader = zef_file.read(sizeFileHeader)
            if len(fheader) != sizeFileHeader:
                raise BadZipfile("Truncated file header")
            fheader = struct.unpack(structFileHeader, fheader)
            if fheader[_FH_SIGNATURE] != stringFileHeader:
                raise BadZipfile("Bad magic number for file header")

            fname = zef_file.read(fheader[_FH_FILENAME_LENGTH])
            if fheader[_FH_EXTRA_FIELD_LENGTH]:
                zef_file.read(fheader[_FH_EXTRA_FIELD_LENGTH])

            if fname != zinfo.orig_filename:
                raise BadZipfile, \
                        'File name in directory "%s" and header "%s" differ.' % (
                            zinfo.orig_filename, fname)

            # check for encrypted flag & handle password
            is_encrypted = zinfo.flag_bits & 0x1
            zd = None
            if is_encrypted:
                if not pwd:
                    pwd = self.pwd
                if not pwd:
                    raise RuntimeError, "File %s is encrypted, " \
                        "password required for extraction" % name

                zd = _ZipDecrypter(pwd)
                # The first 12 bytes in the cypher stream is an encryption header
                #  used to strengthen the algorithm. The first 11 bytes are
                #  completely random, while the 12th contains the MSB of the CRC,
                #  or the MSB of the file time depending on the header type
                #  and is used to check the correctness of the password.
                bytes = zef_file.read(12)
                h = map(zd, bytes[0:12])
                if zinfo.flag_bits & 0x8:
                    # compare against the file type from extended local headers
                    check_byte = (zinfo._raw_time >> 8) & 0xff
                else:
                    # compare against the CRC otherwise
                    check_byte = (zinfo.CRC >> 24) & 0xff
                if ord(h[11]) != check_byte:
                    raise RuntimeError("Bad password for file", name)

            return ZipExtFile(zef_file, mode, zinfo, zd,
                    close_fileobj=should_close)
        except:
            if should_close:
                zef_file.close()
            raise

    def extract(self, member, path=None, pwd=None):
        """Extract a member from the archive to the current working directory,
           using its full name. Its file information is extracted as accurately
           as possible. `member' may be a filename or a ZipInfo object. You can
           specify a different directory using `path'.
        """
        if not isinstance(member, ZipInfo):
            member = self.getinfo(member)

        if path is None:
            path = os.getcwd()

        return self._extract_member(member, path, pwd)

    def extractall(self, path=None, members=None, pwd=None):
        """Extract all members from the archive to the current working
           directory. `path' specifies a different directory to extract to.
           `members' is optional and must be a subset of the list returned
           by namelist().
        """
        if members is None:
            members = self.namelist()

        for zipinfo in members:
            self.extract(zipinfo, path, pwd)

    def _extract_member(self, member, targetpath, pwd):
        """Extract the ZipInfo object 'member' to a physical
           file on the path targetpath.
        """
        # build the destination pathname, replacing
        # forward slashes to platform specific separators.
        arcname = member.filename.replace('/', os.path.sep)

        if os.path.altsep:
            arcname = arcname.replace(os.path.altsep, os.path.sep)
        # interpret absolute pathname as relative, remove drive letter or
        # UNC path, redundant separators, "." and ".." components.
        arcname = os.path.splitdrive(arcname)[1]
        arcname = os.path.sep.join(x for x in arcname.split(os.path.sep)
                    if x not in ('', os.path.curdir, os.path.pardir))
        if os.path.sep == '\\':
            # filter illegal characters on Windows
            illegal = ':<>|"?*'
            if isinstance(arcname, unicode):
                table = {ord(c): ord('_') for c in illegal}
            else:
                table = string.maketrans(illegal, '_' * len(illegal))
            arcname = arcname.translate(table)
            # remove trailing dots
            arcname = (x.rstrip('.') for x in arcname.split(os.path.sep))
            arcname = os.path.sep.join(x for x in arcname if x)

        targetpath = os.path.join(targetpath, arcname)
        targetpath = os.path.normpath(targetpath)

        # Create all upper directories if necessary.
        upperdirs = os.path.dirname(targetpath)
        if upperdirs and not os.path.exists(upperdirs):
            os.makedirs(upperdirs)

        if member.filename[-1] == '/':
            if not os.path.isdir(targetpath):
                os.mkdir(targetpath)
            return targetpath

        with self.open(member, pwd=pwd) as source, \
             file(targetpath, "wb") as target:
            shutil.copyfileobj(source, target)

        return targetpath

    def _writecheck(self, zinfo):
        """Check for errors before writing a file to the archive."""
        if zinfo.filename in self.NameToInfo:
            import warnings
            warnings.warn('Duplicate name: %r' % zinfo.filename, stacklevel=3)
        if self.mode not in ("w", "a"):
            raise RuntimeError, 'write() requires mode "w" or "a"'
        if not self.fp:
            raise RuntimeError, \
                  "Attempt to write ZIP archive that was already closed"
        if zinfo.compress_type == ZIP_DEFLATED and not zlib:
            raise RuntimeError, \
                  "Compression requires the (missing) zlib module"
        if zinfo.compress_type not in (ZIP_STORED, ZIP_DEFLATED):
            raise RuntimeError, \
                  "That compression method is not supported"
        if not self._allowZip64:
            requires_zip64 = None
            if len(self.filelist) >= ZIP_FILECOUNT_LIMIT:
                requires_zip64 = "Files count"
            elif zinfo.file_size > ZIP64_LIMIT:
                requires_zip64 = "Filesize"
            elif zinfo.header_offset > ZIP64_LIMIT:
                requires_zip64 = "Zipfile size"
            if requires_zip64:
                raise LargeZipFile(requires_zip64 +
                                   " would require ZIP64 extensions")

    def write(self, filename, arcname=None, compress_type=None):
        """Put the bytes from filename into the archive under the name
        arcname."""
        if not self.fp:
            raise RuntimeError(
                  "Attempt to write to ZIP archive that was already closed")

        st = os.stat(filename)
        isdir = stat.S_ISDIR(st.st_mode)
        mtime = time.localtime(st.st_mtime)
        date_time = mtime[0:6]
        # Create ZipInfo instance to store file information
        if arcname is None:
            arcname = filename
        arcname = os.path.normpath(os.path.splitdrive(arcname)[1])
        while arcname[0] in (os.sep, os.altsep):
            arcname = arcname[1:]
        if isdir:
            arcname += '/'
        zinfo = ZipInfo(arcname, date_time)
        zinfo.external_attr = (st[0] & 0xFFFF) << 16L      # Unix attributes
        if compress_type is None:
            zinfo.compress_type = self.compression
        else:
            zinfo.compress_type = compress_type

        zinfo.file_size = st.st_size
        zinfo.flag_bits = 0x00
        zinfo.header_offset = self.fp.tell()    # Start of header bytes

        self._writecheck(zinfo)
        self._didModify = True

        if isdir:
            zinfo.file_size = 0
            zinfo.compress_size = 0
            zinfo.CRC = 0
            zinfo.external_attr |= 0x10  # MS-DOS directory flag
            self.filelist.append(zinfo)
            self.NameToInfo[zinfo.filename] = zinfo
            self.fp.write(zinfo.FileHeader(False))
            return

        with open(filename, "rb") as fp:
            # Must overwrite CRC and sizes with correct data later
            zinfo.CRC = CRC = 0
            zinfo.compress_size = compress_size = 0
            # Compressed size can be larger than uncompressed size
            zip64 = self._allowZip64 and \
                    zinfo.file_size * 1.05 > ZIP64_LIMIT
            self.fp.write(zinfo.FileHeader(zip64))
            if zinfo.compress_type == ZIP_DEFLATED:
                cmpr = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
                     zlib.DEFLATED, -15)
            else:
                cmpr = None
            file_size = 0
            while 1:
                buf = fp.read(1024 * 8)
                if not buf:
                    break
                file_size = file_size + len(buf)
                CRC = crc32(buf, CRC) & 0xffffffff
                if cmpr:
                    buf = cmpr.compress(buf)
                    compress_size = compress_size + len(buf)
                self.fp.write(buf)
        if cmpr:
            buf = cmpr.flush()
            compress_size = compress_size + len(buf)
            self.fp.write(buf)
            zinfo.compress_size = compress_size
        else:
            zinfo.compress_size = file_size
        zinfo.CRC = CRC
        zinfo.file_size = file_size
        if not zip64 and self._allowZip64:
            if file_size > ZIP64_LIMIT:
                raise RuntimeError('File size has increased during compressing')
            if compress_size > ZIP64_LIMIT:
                raise RuntimeError('Compressed size larger than uncompressed size')
        # Seek backwards and write file header (which will now include
        # correct CRC and file sizes)
        position = self.fp.tell()       # Preserve current position in file
        self.fp.seek(zinfo.header_offset, 0)
        self.fp.write(zinfo.FileHeader(zip64))
        self.fp.seek(position, 0)
        self.filelist.append(zinfo)
        self.NameToInfo[zinfo.filename] = zinfo

    def writestr(self, zinfo_or_arcname, bytes, compress_type=None):
        """Write a file into the archive.  The contents is the string
        'bytes'.  'zinfo_or_arcname' is either a ZipInfo instance or
        the name of the file in the archive."""
        if not isinstance(zinfo_or_arcname, ZipInfo):
            zinfo = ZipInfo(filename=zinfo_or_arcname,
                            date_time=time.localtime(time.time())[:6])

            zinfo.compress_type = self.compression
            if zinfo.filename[-1] == '/':
                zinfo.external_attr = 0o40775 << 16   # drwxrwxr-x
                zinfo.external_attr |= 0x10           # MS-DOS directory flag
            else:
                zinfo.external_attr = 0o600 << 16     # ?rw-------
        else:
            zinfo = zinfo_or_arcname

        if not self.fp:
            raise RuntimeError(
                  "Attempt to write to ZIP archive that was already closed")

        if compress_type is not None:
            zinfo.compress_type = compress_type

        zinfo.file_size = len(bytes)            # Uncompressed size
        zinfo.header_offset = self.fp.tell()    # Start of header bytes
        self._writecheck(zinfo)
        self._didModify = True
        zinfo.CRC = crc32(bytes) & 0xffffffff       # CRC-32 checksum
        if zinfo.compress_type == ZIP_DEFLATED:
            co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
                 zlib.DEFLATED, -15)
            bytes = co.compress(bytes) + co.flush()
            zinfo.compress_size = len(bytes)    # Compressed size
        else:
            zinfo.compress_size = zinfo.file_size
        zip64 = zinfo.file_size > ZIP64_LIMIT or \
                zinfo.compress_size > ZIP64_LIMIT
        if zip64 and not self._allowZip64:
            raise LargeZipFile("Filesize would require ZIP64 extensions")
        self.fp.write(zinfo.FileHeader(zip64))
        self.fp.write(bytes)
        if zinfo.flag_bits & 0x08:
            # Write CRC and file sizes after the file data
            fmt = '<LQQ' if zip64 else '<LLL'
            self.fp.write(struct.pack(fmt, zinfo.CRC, zinfo.compress_size,
                  zinfo.file_size))
        self.fp.flush()
        self.filelist.append(zinfo)
        self.NameToInfo[zinfo.filename] = zinfo

    def __del__(self):
        """Call the "close()" method in case the user forgot."""
        self.close()

    def close(self):
        """Close the file, and for mode "w" and "a" write the ending
        records."""
        if self.fp is None:
            return

        try:
            if self.mode in ("w", "a") and self._didModify: # write ending records
                pos1 = self.fp.tell()
                for zinfo in self.filelist:         # write central directory
                    dt = zinfo.date_time
                    dosdate = (dt[0] - 1980) << 9 | dt[1] << 5 | dt[2]
                    dostime = dt[3] << 11 | dt[4] << 5 | (dt[5] // 2)
                    extra = []
                    if zinfo.file_size > ZIP64_LIMIT \
                            or zinfo.compress_size > ZIP64_LIMIT:
                        extra.append(zinfo.file_size)
                        extra.append(zinfo.compress_size)
                        file_size = 0xffffffff
                        compress_size = 0xffffffff
                    else:
                        file_size = zinfo.file_size
                        compress_size = zinfo.compress_size

                    if zinfo.header_offset > ZIP64_LIMIT:
                        extra.append(zinfo.header_offset)
                        header_offset = 0xffffffffL
                    else:
                        header_offset = zinfo.header_offset

                    extra_data = zinfo.extra
                    if extra:
                        # Append a ZIP64 field to the extra's
                        extra_data = struct.pack(
                                '<HH' + 'Q'*len(extra),
                                1, 8*len(extra), *extra) + extra_data

                        extract_version = max(45, zinfo.extract_version)
                        create_version = max(45, zinfo.create_version)
                    else:
                        extract_version = zinfo.extract_version
                        create_version = zinfo.create_version

                    try:
                        filename, flag_bits = zinfo._encodeFilenameFlags()
                        centdir = struct.pack(structCentralDir,
                        stringCentralDir, create_version,
                        zinfo.create_system, extract_version, zinfo.reserved,
                        flag_bits, zinfo.compress_type, dostime, dosdate,
                        zinfo.CRC, compress_size, file_size,
                        len(filename), len(extra_data), len(zinfo.comment),
                        0, zinfo.internal_attr, zinfo.external_attr,
                        header_offset)
                    except DeprecationWarning:
                        print >>sys.stderr, (structCentralDir,
                        stringCentralDir, create_version,
                        zinfo.create_system, extract_version, zinfo.reserved,
                        zinfo.flag_bits, zinfo.compress_type, dostime, dosdate,
                        zinfo.CRC, compress_size, file_size,
                        len(zinfo.filename), len(extra_data), len(zinfo.comment),
                        0, zinfo.internal_attr, zinfo.external_attr,
                        header_offset)
                        raise
                    self.fp.write(centdir)
                    self.fp.write(filename)
                    self.fp.write(extra_data)
                    self.fp.write(zinfo.comment)

                pos2 = self.fp.tell()
                # Write end-of-zip-archive record
                centDirCount = len(self.filelist)
                centDirSize = pos2 - pos1
                centDirOffset = pos1
                requires_zip64 = None
                if centDirCount > ZIP_FILECOUNT_LIMIT:
                    requires_zip64 = "Files count"
                elif centDirOffset > ZIP64_LIMIT:
                    requires_zip64 = "Central directory offset"
                elif centDirSize > ZIP64_LIMIT:
                    requires_zip64 = "Central directory size"
                if requires_zip64:
                    # Need to write the ZIP64 end-of-archive records
                    if not self._allowZip64:
                        raise LargeZipFile(requires_zip64 +
                                           " would require ZIP64 extensions")
                    zip64endrec = struct.pack(
                            structEndArchive64, stringEndArchive64,
                            44, 45, 45, 0, 0, centDirCount, centDirCount,
                            centDirSize, centDirOffset)
                    self.fp.write(zip64endrec)

                    zip64locrec = struct.pack(
                            structEndArchive64Locator,
                            stringEndArchive64Locator, 0, pos2, 1)
                    self.fp.write(zip64locrec)
                    centDirCount = min(centDirCount, 0xFFFF)
                    centDirSize = min(centDirSize, 0xFFFFFFFF)
                    centDirOffset = min(centDirOffset, 0xFFFFFFFF)

                endrec = struct.pack(structEndArchive, stringEndArchive,
                                    0, 0, centDirCount, centDirCount,
                                    centDirSize, centDirOffset, len(self._comment))
                self.fp.write(endrec)
                self.fp.write(self._comment)
                self.fp.flush()
        finally:
            fp = self.fp
            self.fp = None
            if not self._filePassed:
                fp.close()

ZipFile 源码

class TarFile(object):
    """The TarFile Class provides an interface to tar archives.
    """

    debug = 0                   # May be set from 0 (no msgs) to 3 (all msgs)

    dereference = False         # If true, add content of linked file to the
                                # tar file, else the link.

    ignore_zeros = False        # If true, skips empty or invalid blocks and
                                # continues processing.

    errorlevel = 1              # If 0, fatal errors only appear in debug
                                # messages (if debug >= 0). If > 0, errors
                                # are passed to the caller as exceptions.

    format = DEFAULT_FORMAT     # The format to use when creating an archive.

    encoding = ENCODING         # Encoding for 8-bit character strings.

    errors = None               # Error handler for unicode conversion.

    tarinfo = TarInfo           # The default TarInfo class to use.

    fileobject = ExFileObject   # The default ExFileObject class to use.

    def __init__(self, name=None, mode="r", fileobj=None, format=None,
            tarinfo=None, dereference=None, ignore_zeros=None, encoding=None,
            errors=None, pax_headers=None, debug=None, errorlevel=None):
        """Open an (uncompressed) tar archive `name'. `mode' is either 'r' to
           read from an existing archive, 'a' to append data to an existing
           file or 'w' to create a new file overwriting an existing one. `mode'
           defaults to 'r'.
           If `fileobj' is given, it is used for reading or writing data. If it
           can be determined, `mode' is overridden by `fileobj's mode.
           `fileobj' is not closed, when TarFile is closed.
        """
        modes = {"r": "rb", "a": "r+b", "w": "wb"}
        if mode not in modes:
            raise ValueError("mode must be 'r', 'a' or 'w'")
        self.mode = mode
        self._mode = modes[mode]

        if not fileobj:
            if self.mode == "a" and not os.path.exists(name):
                # Create nonexistent files in append mode.
                self.mode = "w"
                self._mode = "wb"
            fileobj = bltn_open(name, self._mode)
            self._extfileobj = False
        else:
            if name is None and hasattr(fileobj, "name"):
                name = fileobj.name
            if hasattr(fileobj, "mode"):
                self._mode = fileobj.mode
            self._extfileobj = True
        self.name = os.path.abspath(name) if name else None
        self.fileobj = fileobj

        # Init attributes.
        if format is not None:
            self.format = format
        if tarinfo is not None:
            self.tarinfo = tarinfo
        if dereference is not None:
            self.dereference = dereference
        if ignore_zeros is not None:
            self.ignore_zeros = ignore_zeros
        if encoding is not None:
            self.encoding = encoding

        if errors is not None:
            self.errors = errors
        elif mode == "r":
            self.errors = "utf-8"
        else:
            self.errors = "strict"

        if pax_headers is not None and self.format == PAX_FORMAT:
            self.pax_headers = pax_headers
        else:
            self.pax_headers = {}

        if debug is not None:
            self.debug = debug
        if errorlevel is not None:
            self.errorlevel = errorlevel

        # Init datastructures.
        self.closed = False
        self.members = []       # list of members as TarInfo objects
        self._loaded = False    # flag if all members have been read
        self.offset = self.fileobj.tell()
                                # current position in the archive file
        self.inodes = {}        # dictionary caching the inodes of
                                # archive members already added

        try:
            if self.mode == "r":
                self.firstmember = None
                self.firstmember = self.next()

            if self.mode == "a":
                # Move to the end of the archive,
                # before the first empty block.
                while True:
                    self.fileobj.seek(self.offset)
                    try:
                        tarinfo = self.tarinfo.fromtarfile(self)
                        self.members.append(tarinfo)
                    except EOFHeaderError:
                        self.fileobj.seek(self.offset)
                        break
                    except HeaderError, e:
                        raise ReadError(str(e))

            if self.mode in "aw":
                self._loaded = True

                if self.pax_headers:
                    buf = self.tarinfo.create_pax_global_header(self.pax_headers.copy())
                    self.fileobj.write(buf)
                    self.offset += len(buf)
        except:
            if not self._extfileobj:
                self.fileobj.close()
            self.closed = True
            raise

    def _getposix(self):
        return self.format == USTAR_FORMAT
    def _setposix(self, value):
        import warnings
        warnings.warn("use the format attribute instead", DeprecationWarning,
                      2)
        if value:
            self.format = USTAR_FORMAT
        else:
            self.format = GNU_FORMAT
    posix = property(_getposix, _setposix)

    #--------------------------------------------------------------------------
    # Below are the classmethods which act as alternate constructors to the
    # TarFile class. The open() method is the only one that is needed for
    # public use; it is the "super"-constructor and is able to select an
    # adequate "sub"-constructor for a particular compression using the mapping
    # from OPEN_METH.
    #
    # This concept allows one to subclass TarFile without losing the comfort of
    # the super-constructor. A sub-constructor is registered and made available
    # by adding it to the mapping in OPEN_METH.

    @classmethod
    def open(cls, name=None, mode="r", fileobj=None, bufsize=RECORDSIZE, **kwargs):
        """Open a tar archive for reading, writing or appending. Return
           an appropriate TarFile class.

           mode:
           'r' or 'r:*' open for reading with transparent compression
           'r:'         open for reading exclusively uncompressed
           'r:gz'       open for reading with gzip compression
           'r:bz2'      open for reading with bzip2 compression
           'a' or 'a:'  open for appending, creating the file if necessary
           'w' or 'w:'  open for writing without compression
           'w:gz'       open for writing with gzip compression
           'w:bz2'      open for writing with bzip2 compression

           'r|*'        open a stream of tar blocks with transparent compression
           'r|'         open an uncompressed stream of tar blocks for reading
           'r|gz'       open a gzip compressed stream of tar blocks
           'r|bz2'      open a bzip2 compressed stream of tar blocks
           'w|'         open an uncompressed stream for writing
           'w|gz'       open a gzip compressed stream for writing
           'w|bz2'      open a bzip2 compressed stream for writing
        """

        if not name and not fileobj:
            raise ValueError("nothing to open")

        if mode in ("r", "r:*"):
            # Find out which *open() is appropriate for opening the file.
            for comptype in cls.OPEN_METH:
                func = getattr(cls, cls.OPEN_METH[comptype])
                if fileobj is not None:
                    saved_pos = fileobj.tell()
                try:
                    return func(name, "r", fileobj, **kwargs)
                except (ReadError, CompressionError), e:
                    if fileobj is not None:
                        fileobj.seek(saved_pos)
                    continue
            raise ReadError("file could not be opened successfully")

        elif ":" in mode:
            filemode, comptype = mode.split(":", 1)
            filemode = filemode or "r"
            comptype = comptype or "tar"

            # Select the *open() function according to
            # given compression.
            if comptype in cls.OPEN_METH:
                func = getattr(cls, cls.OPEN_METH[comptype])
            else:
                raise CompressionError("unknown compression type %r" % comptype)
            return func(name, filemode, fileobj, **kwargs)

        elif "|" in mode:
            filemode, comptype = mode.split("|", 1)
            filemode = filemode or "r"
            comptype = comptype or "tar"

            if filemode not in ("r", "w"):
                raise ValueError("mode must be 'r' or 'w'")

            stream = _Stream(name, filemode, comptype, fileobj, bufsize)
            try:
                t = cls(name, filemode, stream, **kwargs)
            except:
                stream.close()
                raise
            t._extfileobj = False
            return t

        elif mode in ("a", "w"):
            return cls.taropen(name, mode, fileobj, **kwargs)

        raise ValueError("undiscernible mode")

    @classmethod
    def taropen(cls, name, mode="r", fileobj=None, **kwargs):
        """Open uncompressed tar archive name for reading or writing.
        """
        if mode not in ("r", "a", "w"):
            raise ValueError("mode must be 'r', 'a' or 'w'")
        return cls(name, mode, fileobj, **kwargs)

    @classmethod
    def gzopen(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs):
        """Open gzip compressed tar archive name for reading or writing.
           Appending is not allowed.
        """
        if mode not in ("r", "w"):
            raise ValueError("mode must be 'r' or 'w'")

        try:
            import gzip
            gzip.GzipFile
        except (ImportError, AttributeError):
            raise CompressionError("gzip module is not available")

        try:
            fileobj = gzip.GzipFile(name, mode, compresslevel, fileobj)
        except OSError:
            if fileobj is not None and mode == 'r':
                raise ReadError("not a gzip file")
            raise

        try:
            t = cls.taropen(name, mode, fileobj, **kwargs)
        except IOError:
            fileobj.close()
            if mode == 'r':
                raise ReadError("not a gzip file")
            raise
        except:
            fileobj.close()
            raise
        t._extfileobj = False
        return t

    @classmethod
    def bz2open(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs):
        """Open bzip2 compressed tar archive name for reading or writing.
           Appending is not allowed.
        """
        if mode not in ("r", "w"):
            raise ValueError("mode must be 'r' or 'w'.")

        try:
            import bz2
        except ImportError:
            raise CompressionError("bz2 module is not available")

        if fileobj is not None:
            fileobj = _BZ2Proxy(fileobj, mode)
        else:
            fileobj = bz2.BZ2File(name, mode, compresslevel=compresslevel)

        try:
            t = cls.taropen(name, mode, fileobj, **kwargs)
        except (IOError, EOFError):
            fileobj.close()
            if mode == 'r':
                raise ReadError("not a bzip2 file")
            raise
        except:
            fileobj.close()
            raise
        t._extfileobj = False
        return t

    # All *open() methods are registered here.
    OPEN_METH = {
        "tar": "taropen",   # uncompressed tar
        "gz":  "gzopen",    # gzip compressed tar
        "bz2": "bz2open"    # bzip2 compressed tar
    }

    #--------------------------------------------------------------------------
    # The public methods which TarFile provides:

    def close(self):
        """Close the TarFile. In write-mode, two finishing zero blocks are
           appended to the archive.
        """
        if self.closed:
            return

        if self.mode in "aw":
            self.fileobj.write(NUL * (BLOCKSIZE * 2))
            self.offset += (BLOCKSIZE * 2)
            # fill up the end with zero-blocks
            # (like option -b20 for tar does)
            blocks, remainder = divmod(self.offset, RECORDSIZE)
            if remainder > 0:
                self.fileobj.write(NUL * (RECORDSIZE - remainder))

        if not self._extfileobj:
            self.fileobj.close()
        self.closed = True

    def getmember(self, name):
        """Return a TarInfo object for member `name'. If `name' can not be
           found in the archive, KeyError is raised. If a member occurs more
           than once in the archive, its last occurrence is assumed to be the
           most up-to-date version.
        """
        tarinfo = self._getmember(name)
        if tarinfo is None:
            raise KeyError("filename %r not found" % name)
        return tarinfo

    def getmembers(self):
        """Return the members of the archive as a list of TarInfo objects. The
           list has the same order as the members in the archive.
        """
        self._check()
        if not self._loaded:    # if we want to obtain a list of
            self._load()        # all members, we first have to
                                # scan the whole archive.
        return self.members

    def getnames(self):
        """Return the members of the archive as a list of their names. It has
           the same order as the list returned by getmembers().
        """
        return [tarinfo.name for tarinfo in self.getmembers()]

    def gettarinfo(self, name=None, arcname=None, fileobj=None):
        """Create a TarInfo object for either the file `name' or the file
           object `fileobj' (using os.fstat on its file descriptor). You can
           modify some of the TarInfo's attributes before you add it using
           addfile(). If given, `arcname' specifies an alternative name for the
           file in the archive.
        """
        self._check("aw")

        # When fileobj is given, replace name by
        # fileobj's real name.
        if fileobj is not None:
            name = fileobj.name

        # Building the name of the member in the archive.
        # Backward slashes are converted to forward slashes,
        # Absolute paths are turned to relative paths.
        if arcname is None:
            arcname = name
        drv, arcname = os.path.splitdrive(arcname)
        arcname = arcname.replace(os.sep, "/")
        arcname = arcname.lstrip("/")

        # Now, fill the TarInfo object with
        # information specific for the file.
        tarinfo = self.tarinfo()
        tarinfo.tarfile = self

        # Use os.stat or os.lstat, depending on platform
        # and if symlinks shall be resolved.
        if fileobj is None:
            if hasattr(os, "lstat") and not self.dereference:
                statres = os.lstat(name)
            else:
                statres = os.stat(name)
        else:
            statres = os.fstat(fileobj.fileno())
        linkname = ""

        stmd = statres.st_mode
        if stat.S_ISREG(stmd):
            inode = (statres.st_ino, statres.st_dev)
            if not self.dereference and statres.st_nlink > 1 and \
                    inode in self.inodes and arcname != self.inodes[inode]:
                # Is it a hardlink to an already
                # archived file?
                type = LNKTYPE
                linkname = self.inodes[inode]
            else:
                # The inode is added only if its valid.
                # For win32 it is always 0.
                type = REGTYPE
                if inode[0]:
                    self.inodes[inode] = arcname
        elif stat.S_ISDIR(stmd):
            type = DIRTYPE
        elif stat.S_ISFIFO(stmd):
            type = FIFOTYPE
        elif stat.S_ISLNK(stmd):
            type = SYMTYPE
            linkname = os.readlink(name)
        elif stat.S_ISCHR(stmd):
            type = CHRTYPE
        elif stat.S_ISBLK(stmd):
            type = BLKTYPE
        else:
            return None

        # Fill the TarInfo object with all
        # information we can get.
        tarinfo.name = arcname
        tarinfo.mode = stmd
        tarinfo.uid = statres.st_uid
        tarinfo.gid = statres.st_gid
        if type == REGTYPE:
            tarinfo.size = statres.st_size
        else:
            tarinfo.size = 0L
        tarinfo.mtime = statres.st_mtime
        tarinfo.type = type
        tarinfo.linkname = linkname
        if pwd:
            try:
                tarinfo.uname = pwd.getpwuid(tarinfo.uid)[0]
            except KeyError:
                pass
        if grp:
            try:
                tarinfo.gname = grp.getgrgid(tarinfo.gid)[0]
            except KeyError:
                pass

        if type in (CHRTYPE, BLKTYPE):
            if hasattr(os, "major") and hasattr(os, "minor"):
                tarinfo.devmajor = os.major(statres.st_rdev)
                tarinfo.devminor = os.minor(statres.st_rdev)
        return tarinfo

    def list(self, verbose=True):
        """Print a table of contents to sys.stdout. If `verbose' is False, only
           the names of the members are printed. If it is True, an `ls -l'-like
           output is produced.
        """
        self._check()

        for tarinfo in self:
            if verbose:
                print filemode(tarinfo.mode),
                print "%s/%s" % (tarinfo.uname or tarinfo.uid,
                                 tarinfo.gname or tarinfo.gid),
                if tarinfo.ischr() or tarinfo.isblk():
                    print "%10s" % ("%d,%d" \
                                    % (tarinfo.devmajor, tarinfo.devminor)),
                else:
                    print "%10d" % tarinfo.size,
                print "%d-%02d-%02d %02d:%02d:%02d" \
                      % time.localtime(tarinfo.mtime)[:6],

            print tarinfo.name + ("/" if tarinfo.isdir() else ""),

            if verbose:
                if tarinfo.issym():
                    print "->", tarinfo.linkname,
                if tarinfo.islnk():
                    print "link to", tarinfo.linkname,
            print

    def add(self, name, arcname=None, recursive=True, exclude=None, filter=None):
        """Add the file `name' to the archive. `name' may be any type of file
           (directory, fifo, symbolic link, etc.). If given, `arcname'
           specifies an alternative name for the file in the archive.
           Directories are added recursively by default. This can be avoided by
           setting `recursive' to False. `exclude' is a function that should
           return True for each filename to be excluded. `filter' is a function
           that expects a TarInfo object argument and returns the changed
           TarInfo object, if it returns None the TarInfo object will be
           excluded from the archive.
        """
        self._check("aw")

        if arcname is None:
            arcname = name

        # Exclude pathnames.
        if exclude is not None:
            import warnings
            warnings.warn("use the filter argument instead",
                    DeprecationWarning, 2)
            if exclude(name):
                self._dbg(2, "tarfile: Excluded %r" % name)
                return

        # Skip if somebody tries to archive the archive...
        if self.name is not None and os.path.abspath(name) == self.name:
            self._dbg(2, "tarfile: Skipped %r" % name)
            return

        self._dbg(1, name)

        # Create a TarInfo object from the file.
        tarinfo = self.gettarinfo(name, arcname)

        if tarinfo is None:
            self._dbg(1, "tarfile: Unsupported type %r" % name)
            return

        # Change or exclude the TarInfo object.
        if filter is not None:
            tarinfo = filter(tarinfo)
            if tarinfo is None:
                self._dbg(2, "tarfile: Excluded %r" % name)
                return

        # Append the tar header and data to the archive.
        if tarinfo.isreg():
            with bltn_open(name, "rb") as f:
                self.addfile(tarinfo, f)

        elif tarinfo.isdir():
            self.addfile(tarinfo)
            if recursive:
                for f in os.listdir(name):
                    self.add(os.path.join(name, f), os.path.join(arcname, f),
                            recursive, exclude, filter)

        else:
            self.addfile(tarinfo)

    def addfile(self, tarinfo, fileobj=None):
        """Add the TarInfo object `tarinfo' to the archive. If `fileobj' is
           given, tarinfo.size bytes are read from it and added to the archive.
           You can create TarInfo objects using gettarinfo().
           On Windows platforms, `fileobj' should always be opened with mode
           'rb' to avoid irritation about the file size.
        """
        self._check("aw")

        tarinfo = copy.copy(tarinfo)

        buf = tarinfo.tobuf(self.format, self.encoding, self.errors)
        self.fileobj.write(buf)
        self.offset += len(buf)

        # If there's data to follow, append it.
        if fileobj is not None:
            copyfileobj(fileobj, self.fileobj, tarinfo.size)
            blocks, remainder = divmod(tarinfo.size, BLOCKSIZE)
            if remainder > 0:
                self.fileobj.write(NUL * (BLOCKSIZE - remainder))
                blocks += 1
            self.offset += blocks * BLOCKSIZE

        self.members.append(tarinfo)

    def extractall(self, path=".", members=None):
        """Extract all members from the archive to the current working
           directory and set owner, modification time and permissions on
           directories afterwards. `path' specifies a different directory
           to extract to. `members' is optional and must be a subset of the
           list returned by getmembers().
        """
        directories = []

        if members is None:
            members = self

        for tarinfo in members:
            if tarinfo.isdir():
                # Extract directories with a safe mode.
                directories.append(tarinfo)
                tarinfo = copy.copy(tarinfo)
                tarinfo.mode = 0700
            self.extract(tarinfo, path)

        # Reverse sort directories.
        directories.sort(key=operator.attrgetter('name'))
        directories.reverse()

        # Set correct owner, mtime and filemode on directories.
        for tarinfo in directories:
            dirpath = os.path.join(path, tarinfo.name)
            try:
                self.chown(tarinfo, dirpath)
                self.utime(tarinfo, dirpath)
                self.chmod(tarinfo, dirpath)
            except ExtractError, e:
                if self.errorlevel > 1:
                    raise
                else:
                    self._dbg(1, "tarfile: %s" % e)

    def extract(self, member, path=""):
        """Extract a member from the archive to the current working directory,
           using its full name. Its file information is extracted as accurately
           as possible. `member' may be a filename or a TarInfo object. You can
           specify a different directory using `path'.
        """
        self._check("r")

        if isinstance(member, basestring):
            tarinfo = self.getmember(member)
        else:
            tarinfo = member

        # Prepare the link target for makelink().
        if tarinfo.islnk():
            tarinfo._link_target = os.path.join(path, tarinfo.linkname)

        try:
            self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
        except EnvironmentError, e:
            if self.errorlevel > 0:
                raise
            else:
                if e.filename is None:
                    self._dbg(1, "tarfile: %s" % e.strerror)
                else:
                    self._dbg(1, "tarfile: %s %r" % (e.strerror, e.filename))
        except ExtractError, e:
            if self.errorlevel > 1:
                raise
            else:
                self._dbg(1, "tarfile: %s" % e)

    def extractfile(self, member):
        """Extract a member from the archive as a file object. `member' may be
           a filename or a TarInfo object. If `member' is a regular file, a
           file-like object is returned. If `member' is a link, a file-like
           object is constructed from the link's target. If `member' is none of
           the above, None is returned.
           The file-like object is read-only and provides the following
           methods: read(), readline(), readlines(), seek() and tell()
        """
        self._check("r")

        if isinstance(member, basestring):
            tarinfo = self.getmember(member)
        else:
            tarinfo = member

        if tarinfo.isreg():
            return self.fileobject(self, tarinfo)

        elif tarinfo.type not in SUPPORTED_TYPES:
            # If a member's type is unknown, it is treated as a
            # regular file.
            return self.fileobject(self, tarinfo)

        elif tarinfo.islnk() or tarinfo.issym():
            if isinstance(self.fileobj, _Stream):
                # A small but ugly workaround for the case that someone tries
                # to extract a (sym)link as a file-object from a non-seekable
                # stream of tar blocks.
                raise StreamError("cannot extract (sym)link as file object")
            else:
                # A (sym)link's file object is its target's file object.
                return self.extractfile(self._find_link_target(tarinfo))
        else:
            # If there's no data associated with the member (directory, chrdev,
            # blkdev, etc.), return None instead of a file object.
            return None

    def _extract_member(self, tarinfo, targetpath):
        """Extract the TarInfo object tarinfo to a physical
           file called targetpath.
        """
        # Fetch the TarInfo object for the given name
        # and build the destination pathname, replacing
        # forward slashes to platform specific separators.
        targetpath = targetpath.rstrip("/")
        targetpath = targetpath.replace("/", os.sep)

        # Create all upper directories.
        upperdirs = os.path.dirname(targetpath)
        if upperdirs and not os.path.exists(upperdirs):
            # Create directories that are not part of the archive with
            # default permissions.
            os.makedirs(upperdirs)

        if tarinfo.islnk() or tarinfo.issym():
            self._dbg(1, "%s -> %s" % (tarinfo.name, tarinfo.linkname))
        else:
            self._dbg(1, tarinfo.name)

        if tarinfo.isreg():
            self.makefile(tarinfo, targetpath)
        elif tarinfo.isdir():
            self.makedir(tarinfo, targetpath)
        elif tarinfo.isfifo():
            self.makefifo(tarinfo, targetpath)
        elif tarinfo.ischr() or tarinfo.isblk():
            self.makedev(tarinfo, targetpath)
        elif tarinfo.islnk() or tarinfo.issym():
            self.makelink(tarinfo, targetpath)
        elif tarinfo.type not in SUPPORTED_TYPES:
            self.makeunknown(tarinfo, targetpath)
        else:
            self.makefile(tarinfo, targetpath)

        self.chown(tarinfo, targetpath)
        if not tarinfo.issym():
            self.chmod(tarinfo, targetpath)
            self.utime(tarinfo, targetpath)

    #--------------------------------------------------------------------------
    # Below are the different file methods. They are called via
    # _extract_member() when extract() is called. They can be replaced in a
    # subclass to implement other functionality.

    def makedir(self, tarinfo, targetpath):
        """Make a directory called targetpath.
        """
        try:
            # Use a safe mode for the directory, the real mode is set
            # later in _extract_member().
            os.mkdir(targetpath, 0700)
        except EnvironmentError, e:
            if e.errno != errno.EEXIST:
                raise

    def makefile(self, tarinfo, targetpath):
        """Make a file called targetpath.
        """
        source = self.extractfile(tarinfo)
        try:
            with bltn_open(targetpath, "wb") as target:
                copyfileobj(source, target)
        finally:
            source.close()

    def makeunknown(self, tarinfo, targetpath):
        """Make a file from a TarInfo object with an unknown type
           at targetpath.
        """
        self.makefile(tarinfo, targetpath)
        self._dbg(1, "tarfile: Unknown file type %r, " \
                     "extracted as regular file." % tarinfo.type)

    def makefifo(self, tarinfo, targetpath):
        """Make a fifo called targetpath.
        """
        if hasattr(os, "mkfifo"):
            os.mkfifo(targetpath)
        else:
            raise ExtractError("fifo not supported by system")

    def makedev(self, tarinfo, targetpath):
        """Make a character or block device called targetpath.
        """
        if not hasattr(os, "mknod") or not hasattr(os, "makedev"):
            raise ExtractError("special devices not supported by system")

        mode = tarinfo.mode
        if tarinfo.isblk():
            mode |= stat.S_IFBLK
        else:
            mode |= stat.S_IFCHR

        os.mknod(targetpath, mode,
                 os.makedev(tarinfo.devmajor, tarinfo.devminor))

    def makelink(self, tarinfo, targetpath):
        """Make a (symbolic) link called targetpath. If it cannot be created
          (platform limitation), we try to make a copy of the referenced file
          instead of a link.
        """
        if hasattr(os, "symlink") and hasattr(os, "link"):
            # For systems that support symbolic and hard links.
            if tarinfo.issym():
                if os.path.lexists(targetpath):
                    os.unlink(targetpath)
                os.symlink(tarinfo.linkname, targetpath)
            else:
                # See extract().
                if os.path.exists(tarinfo._link_target):
                    if os.path.lexists(targetpath):
                        os.unlink(targetpath)
                    os.link(tarinfo._link_target, targetpath)
                else:
                    self._extract_member(self._find_link_target(tarinfo), targetpath)
        else:
            try:
                self._extract_member(self._find_link_target(tarinfo), targetpath)
            except KeyError:
                raise ExtractError("unable to resolve link inside archive")

    def chown(self, tarinfo, targetpath):
        """Set owner of targetpath according to tarinfo.
        """
        if pwd and hasattr(os, "geteuid") and os.geteuid() == 0:
            # We have to be root to do so.
            try:
                g = grp.getgrnam(tarinfo.gname)[2]
            except KeyError:
                g = tarinfo.gid
            try:
                u = pwd.getpwnam(tarinfo.uname)[2]
            except KeyError:
                u = tarinfo.uid
            try:
                if tarinfo.issym() and hasattr(os, "lchown"):
                    os.lchown(targetpath, u, g)
                else:
                    if sys.platform != "os2emx":
                        os.chown(targetpath, u, g)
            except EnvironmentError, e:
                raise ExtractError("could not change owner")

    def chmod(self, tarinfo, targetpath):
        """Set file permissions of targetpath according to tarinfo.
        """
        if hasattr(os, 'chmod'):
            try:
                os.chmod(targetpath, tarinfo.mode)
            except EnvironmentError, e:
                raise ExtractError("could not change mode")

    def utime(self, tarinfo, targetpath):
        """Set modification time of targetpath according to tarinfo.
        """
        if not hasattr(os, 'utime'):
            return
        try:
            os.utime(targetpath, (tarinfo.mtime, tarinfo.mtime))
        except EnvironmentError, e:
            raise ExtractError("could not change modification time")

    #--------------------------------------------------------------------------
    def next(self):
        """Return the next member of the archive as a TarInfo object, when
           TarFile is opened for reading. Return None if there is no more
           available.
        """
        self._check("ra")
        if self.firstmember is not None:
            m = self.firstmember
            self.firstmember = None
            return m

        # Read the next block.
        self.fileobj.seek(self.offset)
        tarinfo = None
        while True:
            try:
                tarinfo = self.tarinfo.fromtarfile(self)
            except EOFHeaderError, e:
                if self.ignore_zeros:
                    self._dbg(2, "0x%X: %s" % (self.offset, e))
                    self.offset += BLOCKSIZE
                    continue
            except InvalidHeaderError, e:
                if self.ignore_zeros:
                    self._dbg(2, "0x%X: %s" % (self.offset, e))
                    self.offset += BLOCKSIZE
                    continue
                elif self.offset == 0:
                    raise ReadError(str(e))
            except EmptyHeaderError:
                if self.offset == 0:
                    raise ReadError("empty file")
            except TruncatedHeaderError, e:
                if self.offset == 0:
                    raise ReadError(str(e))
            except SubsequentHeaderError, e:
                raise ReadError(str(e))
            break

        if tarinfo is not None:
            self.members.append(tarinfo)
        else:
            self._loaded = True

        return tarinfo

    #--------------------------------------------------------------------------
    # Little helper methods:

    def _getmember(self, name, tarinfo=None, normalize=False):
        """Find an archive member by name from bottom to top.
           If tarinfo is given, it is used as the starting point.
        """
        # Ensure that all members have been loaded.
        members = self.getmembers()

        # Limit the member search list up to tarinfo.
        if tarinfo is not None:
            members = members[:members.index(tarinfo)]

        if normalize:
            name = os.path.normpath(name)

        for member in reversed(members):
            if normalize:
                member_name = os.path.normpath(member.name)
            else:
                member_name = member.name

            if name == member_name:
                return member

    def _load(self):
        """Read through the entire archive file and look for readable
           members.
        """
        while True:
            tarinfo = self.next()
            if tarinfo is None:
                break
        self._loaded = True

    def _check(self, mode=None):
        """Check if TarFile is still open, and if the operation's mode
           corresponds to TarFile's mode.
        """
        if self.closed:
            raise IOError("%s is closed" % self.__class__.__name__)
        if mode is not None and self.mode not in mode:
            raise IOError("bad operation for mode %r" % self.mode)

    def _find_link_target(self, tarinfo):
        """Find the target member of a symlink or hardlink member in the
           archive.
        """
        if tarinfo.issym():
            # Always search the entire archive.
            linkname = "/".join(filter(None, (os.path.dirname(tarinfo.name), tarinfo.linkname)))
            limit = None
        else:
            # Search the archive before the link, because a hard link is
            # just a reference to an already archived file.
            linkname = tarinfo.linkname
            limit = tarinfo

        member = self._getmember(linkname, tarinfo=limit, normalize=True)
        if member is None:
            raise KeyError("linkname %r not found" % linkname)
        return member

    def __iter__(self):
        """Provide an iterator object.
        """
        if self._loaded:
            return iter(self.members)
        else:
            return TarIter(self)

    def _dbg(self, level, msg):
        """Write debugging output to sys.stderr.
        """
        if level <= self.debug:
            print >> sys.stderr, msg

    def __enter__(self):
        self._check()
        return self

    def __exit__(self, type, value, traceback):
        if type is None:
            self.close()
        else:
            # An exception occurred. We must not call close() because
            # it would try to write end-of-archive blocks and padding.
            if not self._extfileobj:
                self.fileobj.close()
            self.closed = True
# class TarFile

TarFile 源码

shelve模块

shelve模块是一个简单的k,v将内存数据通过文件持久化的模块，可以持久化任何pickle可支持的python数据格式，但仅支持pickle,对pickle的上一层的封装（不用担心dumps多次造成累积的问题）

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author:Uson

import shelve, datetime
# 仅支持pickle
info = {
    'name': 'uson',
    'age': 27,
    'gender': 'M',
    'job': 'IT',
}
addr = ['SH', 'BJ', 'HF']
# d = shelve.open('review', 'w') # 打开一个文件 # dbm.error: need 'c' or 'n' flag to open new db
d = shelve.open('review') # 打开一个文件，最后会创建三个文件.bak .dat .dir
d['personInfo'] = info # 持久化字典
d['addr'] = addr        # 持久化列表
d['date'] = datetime.datetime.now()
d.close()

d = shelve.open('review')
print(d.get('personInfo'))
print(d.get('date'))
print(d.get('addr'))
'''
{'age': 27, 'job': 'IT', 'gender': 'M', 'name': 'uson'}
2019-09-13 19:57:17.561936
['SH', 'BJ', 'HF']
'''

import shelve
 
d = shelve.open('shelve_test') #打开一个文件
 
class Test(object):
    def __init__(self,n):
        self.n = n
 
 
t = Test(123) 
t2 = Test(123334)
 
name = ["alex","rain","test"]
d["test"] = name #持久化列表
d["t1"] = t      #持久化类
d["t2"] = t2
 
d.close()

示例2

Xml模块

xml是实现不同语言或程序之间进行数据交换的协议，跟json差不多，但json使用起来更简单，不过，古时候，在json还没诞生的黑暗年代，大家只能选择用xml呀，至今很多传统公司如金融行业的很多系统的接口还主要是xml。

xml的格式如下，就是通过<>节点来区别数据结构的:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

xml模板文件.xml

xml协议在各个语言里的都是支持的，在python中可以用以下模块操作xml：

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author:Uson

import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")  # 将xmltest.xml解析成python可以识别的语言
root = tree.getroot()           # 再通过getroot方法得到整个文件对象
print(root.tag)                 # obj.tag 类似于前端语法  root.tag: data标签，相当于html标签

# 遍历xml文档
for child in root:              # child: 所有孩子的标签，不包含孙子以及以下的标签
    # print(child.tag, child.attrib)
    '''
    country {'name': 'Liechtenstein'}
    country {'name': 'Singapore'}
    country {'name': 'Panama'}
    '''
    for i in child:             # i：孩子的孩子的标签
        # print(i.tag, i.text)    # 仅能获取到标签和标签内容
        print(i.tag, i.text, i.attrib)    # tag：标签，text：标签内容，attrib：标签属性

# 只遍历year 节点
for node in root.iter('year'): # 迭代所有孩子的year标签
    print(node.tag, node.text)

python处理-遍历xml内容

import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")      # 将xmltest.xml解析成python可以识别的语言
root = tree.getroot()               # 再通过getroot方法得到整个文件对象

# 修改
for node in root.iter('year'):      # 迭代所有孩子的year标签
    new_year = int(node.text) + 1
    node.text = str(new_year)
    node.set("python", "uson")      # 给标签设置属性set()

tree.write("xmltest.xml")           # 写回原文件

# 删除node
for country in root.findall('country'):     # 找到所有的country标签循环处理每一个标签
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)

tree.write('output.xml')            # 写进了新文件

xml的修改和删除

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author:Uson

'''
import xml.etree.ElementTree as ET
tree = ET.parse("xmltest.xml")  # 将xmltest.xml解析成python可以识别的语言
root = tree.getroot()           # 再通过getroot方法得到整个文件对象
print(root.tag)                 # obj.tag 类似于前端语法  root.tag: data标签，相当于html标签

# 遍历xml文档
for child in root:              # child: 所有孩子的标签，不包含孙子以及以下的标签
    # print(child.tag, child.attrib)
    # country {'name': 'Liechtenstein'}
    # country {'name': 'Singapore'}
    # country {'name': 'Panama'}
    for i in child:             # i：孩子的孩子的标签
        # print(i.tag, i.text)    # 仅能获取到标签和标签内容
        print(i.tag, i.text, i.attrib)    # tag：标签，text：标签内容，attrib：标签属性

# 只遍历year 节点
for node in root.iter('year'): # 迭代所有孩子的year标签
    print(node.tag, node.text)
'''

# 修改和删除
'''
import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")      # 将xmltest.xml解析成python可以识别的语言
root = tree.getroot()               # 再通过getroot方法得到整个文件对象

# 修改
for node in root.iter('year'):      # 迭代所有孩子的year标签
    new_year = int(node.text) + 1
    node.text = str(new_year)
    node.set("python", "uson")      # 给标签设置属性set()

tree.write("xmltest.xml")           # 写回原文件

# 删除node
for country in root.findall('country'):     # 找到所有的country标签循环处理每一个标签
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)

tree.write('output.xml')            # 写进了新文件
'''

# 创建
import xml.etree.ElementTree as ET

new_xml = ET.Element("personlist")           # 创建根标签
person = ET.SubElement(new_xml, "person", attrib={"enrolled": "yes"})   # 根标签personlist下创建孩子标签person
name = ET.SubElement(person, "name", attrib={"create": "uson"})
age = ET.SubElement(person, "age", attrib={"checked": "no"})
sex = ET.SubElement(person, "sex")
age.text = '27'
name.text = 'Uson'
# person2 = ET.SubElement(new_xml, "person", attrib={"enrolled": "no"}) # person2是变量，不是标签名
person = ET.SubElement(new_xml, "person", attrib={"enrolled": "no"})    # 可以全部使用同一变量名person
name = ET.SubElement(person, "name")
age = ET.SubElement(person, "age")
age.text = '19'

et = ET.ElementTree(new_xml)  # 生成文档对象
et.write("test.xml", encoding="utf-8", xml_declaration=True)      
# xml_declaration=True：声明是xml文件类型<?xml version='1.0' encoding='utf-8'?>

ET.dump(new_xml)  # 打印生成的格式

创建xml.xml

<?xml version='1.0' encoding='utf-8'?>
<personlist>
    <person enrolled="yes">
        <name create="uson">Uson</name>
        <age checked="no">27</age>
        <sex /></person>
    <person enrolled="no">
        <name />
        <age>19</age>
    </person>
</personlist>

xml文件创建的结果内容

yaml模块：主要用来做配置文件的

类似于json，load出来后是个字典

参考：https://pyyaml.org/wiki/PyYAMLDocumentation

ConfigParser模块：

用于生成和修改常见配置文档，当前模块的名称在 python 3.x 版本中变更为 configparser（常用于mysql,ngix）。

来看一个好多软件的常见文档格式如下

[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes
 
[bitbucket.org]
User = hg
 
[topsecret.server.com]
Port = 50022
ForwardX11 = no

如果想用python生成一个这样的文档怎么做呢？

import configparser
 
config = configparser.ConfigParser()
config["DEFAULT"] = {'ServerAliveInterval': '45',
                      'Compression': 'yes',
                     'CompressionLevel': '9'}
 
config['bitbucket.org'] = {}
config['bitbucket.org']['User'] = 'hg'
config['topsecret.server.com'] = {}
topsecret = config['topsecret.server.com']
topsecret['Host Port'] = '50022'     # mutates the parser
topsecret['ForwardX11'] = 'no'  # same here
config['DEFAULT']['ForwardX11'] = 'yes'
with open('example.ini', 'w') as configfile:
   config.write(configfile)

创建

写完了还可以再读出来哈。

>>> import configparser
>>> config = configparser.ConfigParser()
>>> config.sections()
[]
>>> config.read('example.ini')
['example.ini']
>>> config.sections()
['bitbucket.org', 'topsecret.server.com']
>>> 'bitbucket.org' in config
True
>>> 'bytebong.com' in config
False
>>> config['bitbucket.org']['User']
'hg'
>>> config['DEFAULT']['Compression']
'yes'
>>> topsecret = config['topsecret.server.com']
>>> topsecret['ForwardX11']
'no'
>>> topsecret['Port']
'50022'
>>> for key in config['bitbucket.org']: print(key)
...
user
compressionlevel
serveraliveinterval
compression
forwardx11
>>> config['bitbucket.org']['ForwardX11']
'yes'

命令行读的内容

configparser增删改查语法

[section1]
k1 = v1
k2:v2
  
[section2]
k1 = v1
 
import ConfigParser
  
config = ConfigParser.ConfigParser()
config.read('i.cfg')
  
# ########## 读 ##########
#secs = config.sections()
#print secs
#options = config.options('group2')
#print options
  
#item_list = config.items('group2')
#print item_list
  
#val = config.get('group1','key')
#val = config.getint('group1','key')
  
# ########## 改写 ##########
#sec = config.remove_section('group1')
#config.write(open('i.cfg', "w"))
  
#sec = config.has_section('wupeiqi')
#sec = config.add_section('wupeiqi')
#config.write(open('i.cfg', "w"))
  
  
#config.set('group2','k1',11111)
#config.write(open('i.cfg', "w"))
  
#config.remove_option('group2','age')
#config.write(open('i.cfg', "w"))

增删改查

hashlib模块（字典是哈希做的）

用于加密相关的操作，3.x里代替了md5模块和sha模块，主要提供 SHA1, SHA224, SHA256, SHA384, SHA512 ，MD5 算法

加密，不能反解

import sha

hash = sha.new()
hash.update('admin')
print hash.hexdigest()

sha - 废弃

import md5
hash = md5.new()
hash.update('admin')
print hash.hexdigest()

md5 - 废弃

import hashlib
 
m = hashlib.md5()
m.update(b"Hello")
m.update(b"It's me")
print(m.digest())
m.update(b"It's been a long time since last time we ...")
print(m.hexdigest())  # 16进制格式hash
# a0e9894503cb9f1a14aa073f3caefaa5

m2 = hashlib.md5()
m2.update(b"HelloIt's meIt's been a long time since last time we ...")
print(m.hexdigest())  # 16进制格式hash
# a0e9894503cb9f1a14aa073f3caefaa5
 
print(m.digest()) #2进制格式hash
print(len(m.hexdigest())) #16进制格式hash

'''
def digest(self, *args, **kwargs): # real signature unknown
    """ Return the digest value as a string of binary data. """
    pass
 
def hexdigest(self, *args, **kwargs): # real signature unknown
    """ Return the digest value as a string of hexadecimal digits. """
    pass
 
'''
import hashlib
 
#越复杂越安全，但效率越低
# ######## md5 ########
 
hash = hashlib.md5()
hash.update('admin')
print(hash.hexdigest())
 
# ######## sha1 即将淘汰########
 
hash = hashlib.sha1()
hash.update('admin')
print(hash.hexdigest())　　可以：hashlib.sha1('admin').hexdigest()
 
# ######## sha256 (新)########
 
hash = hashlib.sha256()
hash.update('admin')
print(hash.hexdigest())
 
 
# ######## sha384 ########
 
hash = hashlib.sha384()
hash.update('admin')
print(hash.hexdigest())
 
# ######## sha512 (新)########
 
hash = hashlib.sha512()
hash.update('admin')
print(hash.hexdigest())

hmac模块（双层加密）

它内部对我们创建 key 和内容再进行处理然后再加密：hmac.new(key, value)

散列消息鉴别码，简称HMAC，是一种基于消息鉴别码MAC（Message Authentication Code）的鉴别机制。使用HMAC时,消息通讯的双方，通过验证消息中加入的鉴别密钥K来鉴别消息的真伪；

一般用于网络通信中消息加密，前提是双方先要约定好key,就像接头暗号一样，然后消息发送把用key把消息加密，接收方用key ＋消息明文再加密，拿加密后的值跟发送者的相对比是否相等，这样就能验证消息的真实性，及发送者的合法性了。

import hmac
h = hmac.new(b'name', b'Uson')
print(h.hexdigest())

h = hmac.new(b'name', b'Uson')
h.update(b'hellowo')  # 追加到value里面
print(h.hexdigest())

h = hmac.new(b'name', b'Usonhellowo')
print(h.hexdigest())
# 3e7fbc4012a9454baa43f07a3c5010cf
# f34b270493c17b4d6247546b645e411b
# f34b270493c17b4d6247546b645e411b

h = hmac.new('天王盖地虎'.encode(encoding='utf-8'), '宝塔镇河妖'.encode(encoding='utf-8'))
print(h.hexdigest())
# 5f90dcd2211cd11601ce05195e3c5232

re模块（正则表达式）

只要有返回，即匹配到，否则，未匹配到。

常用正则表达式符号:

'.'     默认匹配除\n之外的任意一个字符，若指定flag DOTALL,则匹配任意字符，包括换行
'^'     匹配字符开头，若指定flags MULTILINE,这种也可以匹配上(r"^a","\nabc\neee",flags=re.MULTILINE)
'$'     匹配字符结尾，或e.search("foo$","bfoo\nsdfsf",flags=re.MULTILINE).group()也可以
'*'     匹配*号前的字符0次或多次，re.findall("ab*","cabb3abcbbac")  结果为['abb', 'ab', 'a']
'+'     匹配前一个字符1次或多次，re.findall("ab+","ab+cd+abb+bba") 结果['ab', 'abb']
'?'     匹配前一个字符1次或0次
'{m}'   匹配前一个字符m次
'{n,m}' 匹配前一个字符n到m次，re.findall("ab{1,3}","abb abc abbcbbb") 结果'abb', 'ab', 'abb']
'|'     匹配|左或|右的字符，re.search("abc|ABC","ABCBabcCD").group() 结果'ABC'
'(...)' 分组匹配，re.search("(abc){2}a(123|456)c", "abcabca456c").group() 结果 abcabca456c
 
 
'\A'    只从字符开头匹配，re.search("\Aabc","alexabc") 是匹配不到的
'\Z'    匹配字符结尾，同$
'\d'    匹配数字0-9
'\D'    匹配非数字
'\w'    匹配[A-Za-z0-9]
'\W'    匹配非[A-Za-z0-9]
's'     匹配空白字符、\t、\n、\r , re.search("\s+","ab\tc1\n3").group() 结果 '\t'
 
'(?P<name>...)' 分组匹配 re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})","371481199306143242").groupdict("city") 结果{'province': '3714', 'city': '81', 'birthday': '1993'}

View Code

最常用的匹配语法

re.match 从头开始匹配，^可有可无  获取结果：.group()
re.search 匹配包含  获取结果：.group()  任意位置匹配
re.findall 把所有匹配到的字符放到以列表中的元素返回
re.splitall 以匹配到的字符当做列表分隔符
re.sub      匹配字符并替换

反斜杠的困扰
与大多数编程语言相同，正则表达式里使用"\"作为转义字符，这就可能造成反斜杠困扰。假如你需要匹配文本中的字符"\"，那么使用编程语言表示的正则表达式里将需要4个反斜杠"\\\\"：前两个和后两个分别用于在编程语言里转义成反斜杠，转换成两个反斜杠后再在正则表达式里转义成一个反斜杠。Python里的原生字符串很好地解决了这个问题，这个例子中的正则表达式可以使用r"\\"表示。同样，匹配一个数字的"\\d"可以写成r"\d"。有了原生字符串，你再也不用担心是不是漏写了反斜杠，写出来的表达式也更直观。

仅需轻轻知道的几个匹配模式：

re.I(re.IGNORECASE): 忽略大小写（括号内是完整写法，下同）
M(MULTILINE): 多行模式，改变'^'和'$'的行为（参见上图）
S(DOTALL): 点任意匹配模式，改变'.'的行为

小结：

^：字符串的开头，$：整个字符串的结尾，只要有.在，$就发挥不了作用

备注:【^：用在[ ]内时，表示的是补集（查源码得到的），也可理解为：匹配字符排除[ ]内部元素，遇到就截断，但可从中间开始匹配】

re.search("[^()]+", "20.3+((2.9-20.2)*(5.1/2))") # 20.3+ 分析：^+()=字符串

re.search("\([^(]+", "20.3+((2.9-20.2)*(5.1/2))") # (2.9-20.2)*

.：匹配任意字符，除了\n re.search("(\d./)+", "(10.15t/5+3)*4") # 5t/ 单数字+任意字符+/的组合

[a-z]:只匹配一个字符

[a-zA-Z]:只匹配一个大小写字符

[0-9]{1,3}：匹配1到3个0-9的数字

空格也不可以随便打，它会当做字符匹配的

match:取值group()

search:取值group()，字典groupdict() re.search("$([^()]+)$", "20.3+((2.9-20.2)*(5.1/2))").groups() #('2.9-20.2',) 元组 ()

findall:列表，没有group()方法

split：按条件分割成列表，没有group()方法

sub： sub("原值", "新值", "字符串", count=替换次数) 字符串替换，没有group()方法

运算符：

+：在()里需要转义，在[ ]无需转义； re.search("(\d\+)+", "(1+3+6)*4") # 1+3+ re.search("[\d+]+", "(1+3+6)*4") # 1+3+6

-：均无需转义；

*：只在[ ]中，无需转义； re.search("[\d*]+", "(1*3.3)*4") # 1*3

/：均无需转义；re.search("[\d/]+","(10/5+3)") # 10/5 或关系:多数字或多除号 re.search("(\d/)+","(10/5+3)") #0/ 多个单数字和单除号的组合

.：小数点，只在[ ]，无需转义； re.search("[\d./]+", "(10.15/5+3)*4") # 10.15/5 或关系

备注：[ ]内+-同用时，需转义； re.search("[+\-*/]+", "10+2-20*5/2") # + 或关系或调序 re.search("[+*-/]+", "10+2-20*5/2")# +

运算符的整体匹配 re.search("([\d.]+|\+|\*|-|/)+", "10.345+2.829-20.23*5.1/2") # 10.345+2.829-20.23*5.1/2

()：经历了这么久的前戏，高潮必定会来，括号问题，迎刃而解

re.search("$([^()]+)$", "20.3+((2.9-20.2)*(5.1/2))") # 做了分组处理

[ ]：或关系，匹配其中的一个 res = re.search("[\d.]", "(1.2+3.3)*4") # 1 res = re.search("[\d\+]", "(1+3.3)*4") # 1

[ ]：或关系，匹配其中的多个 res = re.search("[\d\+]+", "(1+3.3)*4") # 1+3 res = re.search("[\d+]+", "(1+3+6)*4") # 1+3+6

()：分组，组合匹配 res = re.search("(\d\.)", "(1+3.3)*4") # 3. res = re.search("(\d\.)", "(1+3)*4") # None

{}：匹配次数

\A：只从字符串开头匹配，类似于^，字符可以是数字

\Z：同$，我也不知道为什么要它？可能是装逼字符，迷糊大家的吧！

\d：匹配数字0-9

\D：匹配非数字，即除数字外的任意字符，包括特殊字符和\n

\w：只匹配大小写字母和数字，即除了特殊字符外

\W：只匹配特殊字符，即除了大小写字母和数字

\s：博客代码有误，不是s，而是\s，匹配空白字符、\t、\n、\r...

\S：匹配非空白字符、\t、\n、\r...

re.I：忽略大小写 res = re.search("[a-z]+", "usOn42d88cohui", flags=re.I) # usOn

re.M：多行匹配，改变^、$的行为 res = re.search("8$", "usOn\n42d88\ncohui", flags=re.M) # 8

re.S：使得.匹配任意字符包括\n res = re.search(".+", "\nabc\neee", flags=re.S) # match='\nabc\neee'

+：匹配一次或多次

*：匹配前一个字符0次或多次 re.search("[a-z]+\d*", "12uson") # uson

?符号解析: 匹配?前一个字符1次或0次（？前一个字符可有可无），从前往后仅匹配一次，每次匹配的是几个条件字符串，需要根据匹配几次的结果来确定。

(?P<name>[]{})：分组匹配，字典形式，取值方式有3种：1）.groupdict()；2）.groupdict()["key"]；3）.group("key")。

最后献上最全的示例代码：

#!/usr/bin/env python
# Author：Uson

import re
# python中未匹配到，会报错，命令行匹配不到无返回，但不报错

res = re.match("^yu", "yuxuesong")   # 匹配以yu开头的字符串
print(res.group()) # 取值：只匹配到yu

# res = re.match("^yu\d+", "yuxuesong123")   # 匹配以yu开头后面是数字的字符串 \是正则的语法格式，d表示数字，+表示多个
# print(res.group()) # 未匹配到
res = re.match("^yu\d+", "yu123xuesong123")
print(res.group()) # 匹配到yu123

# res = re.search("^x.+g$]", "yu123xueSong123") # ^字符串的开头，$字符串的结尾 .+匹配任意字符
# print(res) # None
res = re.search("x.+2", "yu123xueSong123") # ^字符串的开头，$字符串的结尾
print(res.group()) # xueSong12
res = re.search("x.+b$", "yu123xueSong123b") # ^字符串的开头，$字符串的结尾
print(res.group()) # xueSong123b
res = re.search("x[a-zA-Z]+", "yu123xueSong123")
print(res.group()) # xueSong
res = re.search("x[a-zA-Z]+\d+$", "yu123xueSong123") # 多个数字结尾，而不是一个数字结尾
print(res.group()) # xueSong123
res = re.search("x[a-zA-Z]+\d+$", "yu123xueSong3") # 一个数字结尾  +：一个或者多个
print(res.group()) # xueSong3

# ?符号解析: 匹配?前一个字符1次或多次，从前往后仅匹配一次，每次匹配的是几个条件字符串，需要根据匹配几次的结果来确定
res = re.search("u?", "usonuu")     # 匹配1次-u，得到第一个u
res = re.search("uu?", "usonuu")    # 匹配1次-uu，得到末尾uu
res = re.search("uu?", "uusonuu")   # 匹配1次-uu，得到开头uu
res = re.search("u?", "son")        # 匹配0次-''，得到'',也算匹配到了
res = re.search("uuu?", "uuson")    # 匹配1次-uuu,未匹配到，但匹配0次-uu，得到开头uu
res = re.search("uuu?", "uuuuuuson")# 匹配1次-uuu，得到开头uuu
res = re.search("uuu?", "uusonuuu")# 匹配0次-uu，得到开头uu   开头优先匹配(即?两个u必须有，第三个可有可无)
res = re.search("uuu?", "sonuuu")# 匹配1次-uuu，得到末尾uuu
res = re.search("uuu?", "usonuuu")# 匹配1次-uuu，得到末尾uuu
res = re.search("uso?", "usson")# 匹配0次-us，得到开头us

res = re.search("[0-9]{3}", "us2so123n")# 123   # 匹配3个数
res = re.search("[0-9]{1,3}", "us2465so183n")# 246 匹配1到3个数
res = re.search("[0-9]{4,5}", "us2465so183n")# 2465 匹配4到5个数

res = re.search("abc|ABC", "ABCwewabc")# ABC 先匹配abc,第一个没有匹配到，匹配ABC，返回

res = re.search("abc{2}", "ABCwewabcc")# abcc c匹配2次
# res = re.search("(abc){2}", "ABCabcdabc")# None
res = re.search("(abc){2}", "ABCabcabc")# abcabc abc两次连续，即abcabc匹配

# 管道符是转义符（或）
# res = re.search("(abc){2}|", "ABCabcabc|")# None    匹配abcabc| 或'', 开头满足或的第二个条件，返回None
# 管道符不转义（需要当做字符显示出来，非或）
res = re.search("(abc){2}\|", "ABCabcabc|")#  abcabc|

# 匹配abcabc|| 或 abcabc== 或 abcabc|=
res = re.search("(abc){2}(\||=){2}", "ABCabcabc||")#  abcabc||
res = re.search("(abc){2}(\||=){2}", "ABCabcabc==")#  abcabc==
res = re.search("(abc){2}(\||=){2}", "ABCabcabc|=")#  abcabc|=
res = re.search("(abc){2}(\||=){2}", "ABCabcabc|")#  None

res = re.search("(abc){2}", "(abcabc)") 
res1 = res.groups() # ('abc',)
res2 = res.group()  # abcabc

# = 可以不转义
res = re.search("(abc){2}(\||\=){2}", "ABCabcabc|=")#  abcabc|=
res = re.search("(abc){2}(\||=){2}", "ABCabcabc|=")#  abcabc|=
res = re.search("(abc){2}(\|\|=){2}", "ABCabcabc||=||=")#  abcabc||=||=

# \A 只从字符串开头匹配
res = re.search("\A", "ABCabcabc||=||=")#  None
res = re.search("\A[0-9]+[a-z]\Z", "34byu") # None 不是以单个字母结尾
res = re.search("\A[a-zA-Z]+", "ABCabcabc||=||=")#  ABCabcabc
res = re.search("\A[a-z]+", "ABCabcabc||=||=")#  None
res = re.search("\A[A-Z]+", "ABCabcabc||=||=")#  ABC
res = re.search("\A[A-Z]+.+", "ABCabcabc||=||=")#  ABCabcabc||=||=

# \Z 同$
res = re.search("[A-Z]\Z", "ABCabcabc||=||=")# None
res = re.search("=\Z", "ABCabcabc||=||=")# =

res = re.search("\A[0-9]+[a-z]+\Z", "68ABCabcabc")# None
res = re.search("\A[0-9]+[a-z]+\Z", "68abcabc")# 68abcabc
res = re.search("\A[0-9]+[a-z]+\Z", "68abc")# 68abc

# 匹配除数字外的任意字符
res = re.search("\D+", "68abc")# abc
res = re.search("\D+", "68abc$- &#\n")# abc$- &#\n（换一行）
res = re.search("\D+", "68abc$- &#\\n")# abc$- &#\n

# 只匹配字母和数字，即除了特殊字符外
res = re.search("\w+", "68bAc$- &#\\n")# 68bAc
# 只匹配特殊字符
res = re.search("\W+", "68bAc$- &#\\n") # $- &#\
res = re.search("\W+", "68bAc$- &#\n")  # $- &#\n（换一行）

# \s 匹配空白字符、\r、\n、\t
res = re.search("\s", "68bAc$- &#\n")  # ''（空格）
res = re.search("\s+", "aa\na\r67") # <_sre.SRE_Match object; span=(2, 3), match='\n'> \r被截断
res = re.search("\s+", "68bAc$- \t&#")  # ''（空格）和\t
res = re.search("\s+", "\n \t&#")  # \n（换行）和''（空格）和\t
print(res.group())
res = re.search("\s+", " \t\n")  # ''（空格）
print(res)  # <_sre.SRE_Match object; span=(0, 3), match=' \t\n'>

# \S 匹配非空白字符、\r、\n、\t
res = re.search("\S+", "68bAc$- &#\t")  # 68bAc$-
res = re.search("\S+", "# 68bAc$- &")  # #
print(res.group())

res = re.findall("[0-9]{1,4}", "u6s2465so183n")# ['6', '2465', '183'] 匹配所有1到4个的数字
res = re.findall("[0-9]{1}", "u6s24so183n")# ['6', '2', '4', '1', '8', '3'] 匹配所有1个的数字
res = re.findall("[0-9]{2}", "u6s24so183n")# ['24', '18'] 匹配所有2个的数字

res = re.findall("abc|ABC", "ABCwewabc")# ['ABC', 'abc'] 全匹配
print(res)

# 高级装逼技巧（分组匹配） # 常用于django前端的url匹配模式
res = re.search("(?P<name>[0-9]+)", "#qwq123")
print("高级装逼技巧：", res) # <_sre.SRE_Match object; span=(4, 7), match='123'>
print("高级装逼技巧：", res.group()) # 123
print("高级装逼技巧：", res.groupdict()) # {'name': '123'}

res = re.search("(?P<name>[0-9]{2})", "#qwq123")
print("高级装逼技巧：", res.groupdict()) # {'name': '12'}

res = re.search("(?P<id>[0-9]{2})(?P<name>[a-zA-Z]+)", "#qwq123uson#akaedu") # 注意分组的方式
print("高级装逼技巧：", res.groupdict()) # {'name': 'uson', 'id': '23'}
print(res.groupdict()['name'])          # uson
print(res.group('id'))                  # 23
# 示例：分组匹配个人信息
res = re.search("(?P<addr>[a-zA-Z]{8})(?P<job>[A-Z]{2})(?P<born>[0-9]{4})", "ShanghaiIT1130") # addr:上海，job:IT，born:生日
print(res.groupdict())      # {'born': '1130', 'addr': 'Shanghai', 'job': 'IT'}
res = re.search("(?P<Province>[0-9]{2})(?P<LuAn>[0-9]{4})(?P<Born>[0-9]{4})", "3415001130")
print(res.groupdict())      # {'LuAn': '1500', 'Born': '1130', 'Province': '34'}

# split 按数字分割成列表
res = re.split("[0-9]", "uson6shang88hai99job6IT")  # ['uson', 'shang', '', 'hai', '', 'job', 'IT']
res = re.split("[0-9]+", "uson6shang88hai99job6IT")  # ['uson', 'shang', 'hai', 'job', 'IT']
print(res)

# sub 替换    sub("原值", "新值", "字符串", count=替换次数)
res = re.sub("[0-9]+", "|", "uson88Job66IT9Shanghai") # uson|Job|IT|Shanghai
res = re.sub("[0-9]+", "|", "uson88Job66IT9Shanghai", count=2) # uson|Job|IT9Shanghai
print(res)

# 反斜杠的匹配：4个\匹配字符串中的1个\,即r 2个反斜杠匹配1个\
#            8个\匹配字符串中的2个\,即r 4个反斜杠匹配2个\
res = re.search("\\\\", "uson\cohui")   # \
res = re.search(r"\\", "uson\cohui")   # \

res = re.search(r"\\d", "uson\\dcohui")   # \d
res = re.search(r"\\d", r"uson\dcohui")   # \d

res = re.search(r"\\\\d", "uson\\\\dcohui")   # \\d
res = re.search(r"\\\\d", r"uson\\dcohui")   # \\d

res = re.search("\\\\\\\\d", "uson\\\\\\\\dcohui")   # \\d
res = re.search("\\\\\\\\d", r"uson\\dcohui")   # \\d

# flags补充：1)re.I：忽略大小写；2)re.M：多行匹配，改变^、$的行为
res = re.search("[a-z]+", "usOn42d88cohui")   # us
res = re.search("[a-z]+", "usOn42d88cohui", flags=re.I)   # usOn

res = re.search("8$", "usOn\n42d88\ncohui", flags=re.M)   # 8
res = re.search(r"^a", "\nabc\neee", flags=re.M)   # a
res = re.search(r"b$", "\nabc\neee", flags=re.M)   # None
res = re.search("c$", "\nabc\neee", flags=re.M)   # c

res = re.search(".+", "\nabc\neee")   # abc
print(res.group())
res = re.search(".+", "\nabc\neee", flags=re.S)
print(res)  # <_sre.SRE_Match object; span=(0, 8), match='\nabc\neee'>

# 括号必须用\匹配 []：或关系，匹配其中一个, []+匹配多个; ()组合匹配
res = re.search("\(", "(1+3)*4")   # (
res = re.search("(\d\.)", "(1.0+3)*4")   # 1.
res = re.search("(\d\.)", "(1+3.3)*4")   # 3.
res = re.search("(\d\.)", "(1+3)*4")   # None

res = re.search("[\d\.]", "(1+3)*4")   # 1
res = re.search("[\d\.]", "(1.2+3.3)*4")   # 1
res = re.search("[\d.]", "(1.2+3.3)*4")   # 1
res = re.search("[\d\+]", "(1+3.3)*4")   # 1

res = re.search("[\d\+]+", "(1+3.3)*4")   # 1+3
res = re.search("(\d\+)+", "(1+3+6)*4")   # 1+3+

res = re.search("[\d+]+", "(1+3+6)*4")   # 1+3+6
res = re.search("[\d\+]+", "(1+3+6)*4")   # 1+3+6

res = re.search("(\d+)+", "(1+3+6)*4")   # 1
res = re.search("(\d\+)+", "(1+3+6)*4")   # 1+3+

res = re.search("(\d-)+", "(1-3-6)*4")   # 1-3-
res = re.search("[\d-]+", "(1-3-6)*4")   # 1-3-6

# * 匹配前一个字符0次或多次 * 只在[]可以不转义
res = re.search("(\d*)+", "(1*3-6)*4")   # None
res = re.search("[\d*]+", "(1*3-6)*4")   # 1*3
res = re.search("(\d*)+", "88*(11*63-6)*4")   # 88  = "\d*"
res = re.search("(\d*)+", "uson(11*63-6)*4")   # None
res = re.search("\d*", "uson11*63-6*4")   # None
res = re.search("\(\d*", "(11*63-6*4")   # (11
res = re.search("\(\d*", "()")   # (
res = re.search("[a-z]+\d*", "12uson")   # uson

# * 只在[]可以不转义
res = re.search("[\d\*]+", "(1*3.3)*4")   # 1*3
res = re.search("[\d*]+", "(1*3.3)*4")   # 1*3

res = re.search("[\d*]+", "(1.2*3.3)*4")   # 1 或关系

# / 除号 无需转义
res = re.search("[\d/]+", "(10/5+3)*4")   # 10/5 或关系:多个数字或多个除号
res = re.search("(\d/)+", "(10/5+3)*4")   # 0/ 多个，单个数字和单个除号的组合
res = re.search("(\d\/)+", "(10/5+3)*4")   # 0/ 多个，单个数字和单个除号的组合

# . 小数点 只在[]无需转义
res = re.search("(\d./)+", "(10.15t/5+3)*4")   # 5t/ 单数字+任意字符+/的组合
res = re.search("[\d./]+", "(10.15/5+3)*4")   # 10.15/5 或关系

# 加减乘除
res = re.search("(\+|-|\*|/)+", "10+2-20*5/2")   # + 或关系
res = re.search("[+\-*/]+", "10+2-20*5/2")   # + 或关系
res = re.search("[+*-/]+", "10+2-20*5/2")   # + 或关系

# 匹配整数或浮点数
res = re.search("[\d.]+", "10.345+2.829-20.23*5.1/2")   # 10.345  或关系

# 运算符的整体匹配
res = re.search("([\d.]+[+*-/])+", "10.345+2.829-20.23*5.1/2") # 10.345+2.829-20.23*5.1/
# ()作为一组，若匹配到（数字或+-*/）中的任意一个，就将其作为一组，添加到指定列表中，用于判断是否非法输入。
res = re.search("([\d.]+|\+|\*|-|/)+", "10.345+2.829-20.23*5.1/2") # 10.345+2.829-20.23*5.1/2

# 括号问题怎么解决
res = re.search("([\d.]+|\+|\*|-|/|\(|\))+", "10.3+(2.9-20.2)*5.1/2") # 10.3+(2.9-20.2)*5.1/2
res = re.search("[(]+", "20.3+((2.9-20.2)*(5.1/2))") # ((

# 【匹配字符排除[]内部元素，遇到就截断，但可以从中间开始匹配】,^在[]表示一个补集（查源码得到）：^+uson=字符串
res = re.search("[^uson]+", "20.3+((2.9-20.2)*(5.1/2))") # 20.3+((2.9-20.2)*(5.1/2))
res = re.search("(^\d+\()", "20(.3+((2.9-20.2)*(5.1/2))") # 20(

# 经过了这么久的前戏，高潮必然到来，括号问题迎刃而解
res = re.search("[^()]+", "20.3+((2.9-20.2)*(5.1/2))") # 20.3+  ^+()=字符串
res = re.search("\([^(]+", "20.3+((2.9-20.2)*(5.1/2))") # (2.9-20.2)*
res = re.search("\([^()]+", "20.3+((2.9-20.2)*(5.1/2))") # (2.9-20.2

res = re.search("\([^()]+\)", "20.3+((2.9-20.2)*(5.1/2))") # (2.9-20.2)
print(res.group())
res = re.search("\(([^()]+)\)", "20.3+((2.9-20.2)*(5.1/2))")
print(res.group())  # (2.9-20.2)
print(res.groups()) # ('2.9-20.2',) 元组

res = re.findall("\(([^()]+)\)", "20.3+((2.9-20.2)*(5.1/2))")
print(res) # 列表 ['2.9-20.2', '5.1/2']
res = re.findall("\([^()]+\)", "20.3+((2.9-20.2)*(5.1/2))")
print(res) # 列表 ['(2.9-20.2)', '(5.1/2)']

日志模块(logging)

很多程序都有记录日志的需求，并且日志中包含的信息即有正常的程序访问日志，还可能有错误、警告等信息输出，python的logging模块提供了标准的日志接口，你可以通过它存储各种格式的日志，logging的日志可以分为 debug(), info(), warning(), error() and critical() 5个级别，下面我们看一下怎么用。

最简单用法

import logging
 
logging.warning("user [alex] attempted wrong password more than 3 times")
logging.critical("server is down")
 
#输出
WARNING:root:user [alex] attempted wrong password more than 3 times
CRITICAL:root:server is down

看一下这几个日志级别分别代表什么意思

Level	When it’s used
`DEBUG`	Detailed information, typically of interest only when diagnosing problems.
`INFO`	Confirmation that things are working as expected.
`WARNING`	An indication that something unexpected happened, or indicative of some problem in the near future (e.g. ‘disk space low’). The software is still working as expected.
`ERROR`	Due to a more serious problem, the software has not been able to perform some function.
`CRITICAL`	A serious error, indicating that the program itself may be unable to continue running.

如果想把日志写到文件里，也很简单

import logging
 
logging.basicConfig(filename='example.log',level=logging.INFO)
logging.debug('This message should go to the log file')
logging.info('So should this')
logging.warning('And this, too')

其中下面这句中的level=loggin.INFO意思是，把日志纪录级别设置为INFO，也就是说，只有比日志是INFO或比INFO级别更高的日志才会被纪录到文件里，在这个例子，第一条日志是不会被纪录的，如果希望纪录debug的日志，那把日志级别改成DEBUG就行了。

logging.basicConfig(filename='example.log',level=logging.INFO)

感觉上面的日志格式忘记加上时间啦，日志不知道时间怎么行呢，下面就来加上!

import logging
logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p')
logging.warning('is when this event was logged.')
 
#输出
12/12/2010 11:46:36 AM is when this event was logged.

日志格式

%(name)s	Logger的名字
%(levelno)s	数字形式的日志级别
%(levelname)s	文本形式的日志级别
%(pathname)s	调用日志输出函数的模块的完整路径名，可能没有
%(filename)s	调用日志输出函数的模块的文件名
%(module)s	调用日志输出函数的模块名
%(funcName)s	调用日志输出函数的函数名
%(lineno)d	调用日志输出函数的语句所在的代码行
%(created)f	当前时间，用UNIX标准的表示时间的浮点数表示
%(relativeCreated)d	输出日志信息时的，自Logger创建以来的毫秒数
%(asctime)s	字符串形式的当前时间。默认格式是 “2003-07-08 16:49:45,896”。逗号后面的是毫秒
%(thread)d	线程ID。可能没有
%(threadName)s	线程名。可能没有
%(process)d	进程ID。可能没有
%(message)s	用户输出的消息

如果想同时把log打印在屏幕和文件日志里，就需要了解一点复杂的知识了

Python 使用logging模块记录日志涉及四个主要类，使用官方文档中的概括最为合适：

logger提供了应用程序可以直接使用的接口；

handler将(logger创建的)日志记录发送到合适的目的输出；

filter提供了细度设备来决定输出哪条日志记录；

formatter决定日志记录的最终输出格式。

logger
每个程序在输出信息之前都要获得一个Logger。Logger通常对应了程序的模块名，比如聊天工具的图形界面模块可以这样获得它的Logger：
LOG=logging.getLogger(”chat.gui”)
而核心模块可以这样：
LOG=logging.getLogger(”chat.kernel”)

Logger.setLevel(lel):指定最低的日志级别，低于lel的级别将被忽略。debug是最低的内置级别，critical为最高
Logger.addFilter(filt)、Logger.removeFilter(filt):添加或删除指定的filter
Logger.addHandler(hdlr)、Logger.removeHandler(hdlr)：增加或删除指定的handler
Logger.debug()、Logger.info()、Logger.warning()、Logger.error()、Logger.critical()：可以设置的日志级别

handler

handler对象负责发送相关的信息到指定目的地。Python的日志系统有多种Handler可以使用。有些Handler可以把信息输出到控制台，有些Logger可以把信息输出到文件，还有些 Handler可以把信息发送到网络上。如果觉得不够用，还可以编写自己的Handler。可以通过addHandler()方法添加多个多handler
Handler.setLevel(lel):指定被处理的信息级别，低于lel级别的信息将被忽略
Handler.setFormatter()：给这个handler选择一个格式
Handler.addFilter(filt)、Handler.removeFilter(filt)：新增或删除一个filter对象
每个Logger可以附加多个Handler。接下来我们就来介绍一些常用的Handler：
1) logging.StreamHandler
使用这个Handler可以向类似与sys.stdout或者sys.stderr的任何文件对象(file object)输出信息。它的构造函数是：
StreamHandler([strm])
其中strm参数是一个文件对象。默认是sys.stderr
2) logging.FileHandler
和StreamHandler类似，用于向一个文件输出日志信息。不过FileHandler会帮你打开这个文件。它的构造函数是：
FileHandler(filename[,mode])
filename是文件名，必须指定一个文件名。
mode是文件的打开方式。参见Python内置函数open()的用法。默认是’a'，即添加到文件末尾。

3) logging.handlers.RotatingFileHandler
这个Handler类似于上面的FileHandler，但是它可以管理文件大小。当文件达到一定大小之后，它会自动将当前日志文件改名，然后创建一个新的同名日志文件继续输出。比如日志文件是chat.log。当chat.log达到指定的大小之后，RotatingFileHandler自动把文件改名为chat.log.1。不过，如果chat.log.1已经存在，会先把chat.log.1重命名为chat.log.2。。。最后重新创建 chat.log，继续输出日志信息。它的构造函数是：
RotatingFileHandler( filename[, mode[, maxBytes[, backupCount]]])
其中filename和mode两个参数和FileHandler一样。
maxBytes用于指定日志文件的最大文件大小。如果maxBytes为0，意味着日志文件可以无限大，这时上面描述的重命名过程就不会发生。
backupCount用于指定保留的备份文件的个数。比如，如果指定为2，当上面描述的重命名过程发生时，原有的chat.log.2并不会被更名，而是被删除。
4) logging.handlers.TimedRotatingFileHandler
这个Handler和RotatingFileHandler类似，不过，它没有通过判断文件大小来决定何时重新创建日志文件，而是间隔一定时间就自动创建新的日志文件。重命名的过程与RotatingFileHandler类似，不过新的文件不是附加数字，而是当前时间。它的构造函数是：
TimedRotatingFileHandler( filename [,when [,interval [,backupCount]]])
其中filename参数和backupCount参数和RotatingFileHandler具有相同的意义。
interval是时间间隔。
when参数是一个字符串。表示时间间隔的单位，不区分大小写。它有以下取值：
S 秒
M 分
H 小时
D 天
W 每星期（interval==0时代表星期一）
midnight 每天凌晨

import logging
 
#create logger
logger = logging.getLogger('TEST-LOG')
logger.setLevel(logging.DEBUG)
 
 
# create console handler and set level to debug
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
 
# create file handler and set level to warning
fh = logging.FileHandler("access.log")
fh.setLevel(logging.WARNING)
# create formatter
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
 
# add formatter to ch and fh
ch.setFormatter(formatter)
fh.setFormatter(formatter)
 
# add ch and fh to logger
logger.addHandler(ch)
logger.addHandler(fh)
 
# 'application' code
logger.debug('debug message')
logger.info('info message')
logger.warn('warn message')
logger.error('error message')
logger.critical('critical message')

文件自动截断例子

import logging

from logging import handlers

logger = logging.getLogger(__name__)

log_file = "timelog.log"
#fh = handlers.RotatingFileHandler(filename=log_file,maxBytes=10,backupCount=3)
fh = handlers.TimedRotatingFileHandler(filename=log_file,when="S",interval=5,backupCount=3)


formatter = logging.Formatter('%(asctime)s %(module)s:%(lineno)d %(message)s')

fh.setFormatter(formatter)

logger.addHandler(fh)


logger.warning("test1")
logger.warning("test12")
logger.warning("test13")
logger.warning("test14")

2、开源模块（如：paramiko...）

3、自定义模块（自己写的.py文件）

后记

math模块：