How can I create a zip archive of a directory structure in Python?
How can I create a zip archive of a directory structure in Python?
In Python 2.7+, shutil
has a make_archive
function.
from shutil import make_archive
make_archive(
'zipfile_name',
'zip', # the archive format - or tar, bztar, gztar
root_dir=None, # root for archive - current working dir if None
base_dir=None) # start archiving from here - cwd if None too
Here the zipped archive will be named zipfile_name.zip
. If base_dir
is farther down from root_dir
it will exclude files not in the base_dir
, but still archive the files in the parent dirs up to the root_dir
.
I did have an issue testing this on Cygwin with 2.7 - it wants a root_dir argument, for cwd:
make_archive('zipfile_name', 'zip', root_dir='.')
You can do this with Python from the shell also using the zipfile
module:
$ python -m zipfile -c zipname sourcedir
Where zipname
is the name of the destination file you want (add .zip
if you want it, it won't do it automatically) and sourcedir is the path to the directory.
If you're trying to zip up a python package with a __init__.py
and __main__.py
, and you don't want the parent dir, it's
$ python -m zipfile -c zipname sourcedir/*
And
$ python zipname
would run the package. (Note that you can't run subpackages as the entry point from a zipped archive.)
If you have python3.5+, and specifically want to zip up a Python package, use zipapp:
$ python -m zipapp myapp
$ python myapp.pyz
This function will recursively zip up a directory tree, compressing the files, and recording the correct relative filenames in the archive. The archive entries are the same as those generated by zip -r output.zip source_dir
.
import os
import zipfile
def make_zipfile(output_filename, source_dir):
relroot = os.path.abspath(os.path.join(source_dir, os.pardir))
with zipfile.ZipFile(output_filename, "w", zipfile.ZIP_DEFLATED) as zip:
for root, dirs, files in os.walk(source_dir):
# add directory (needed for empty dirs)
zip.write(root, os.path.relpath(root, relroot))
for file in files:
filename = os.path.join(root, file)
if os.path.isfile(filename): # regular files only
arcname = os.path.join(os.path.relpath(root, relroot), file)
zip.write(filename, arcname)
So many answers here, and I hope I might contribute with my own version, which is based on the original answer (by the way), but with a more graphical perspective, also using context for each zipfile
setup and sorting os.walk()
, in order to have a ordered output.
Having these folders and them files (among other folders), I wanted to create a .zip
for each cap_
folder:
$ tree -d
.
├── cap_01
| ├── 0101000001.json
| ├── 0101000002.json
| ├── 0101000003.json
|
├── cap_02
| ├── 0201000001.json
| ├── 0201000002.json
| ├── 0201001003.json
|
├── cap_03
| ├── 0301000001.json
| ├── 0301000002.json
| ├── 0301000003.json
|
├── docs
| ├── map.txt
| ├── main_data.xml
|
├── core_files
├── core_master
├── core_slave
Here's what I applied, with comments for better understanding of the process.
$ cat zip_cap_dirs.py
""" Zip 'cap_*' directories. """
import os
import zipfile as zf
for root, dirs, files in sorted(os.walk('.')):
if 'cap_' in root:
print(f"Compressing: {root}")
# Defining .zip name, according to Capítulo.
cap_dir_zip = '{}.zip'.format(root)
# Opening zipfile context for current root dir.
with zf.ZipFile(cap_dir_zip, 'w', zf.ZIP_DEFLATED) as new_zip:
# Iterating over os.walk list of files for the current root dir.
for f in files:
# Defining relative path to files from current root dir.
f_path = os.path.join(root, f)
# Writing the file on the .zip file of the context
new_zip.write(f_path)
Basically, for each iteration over os.walk(path)
, I'm opening a context for zipfile
setup and afterwards, iterating iterating over files
, which is a list
of files from root
directory, forming the relative path for each file based on the current root
directory, appending to the zipfile
context which is running.
And the output is presented like this:
$ python3 zip_cap_dirs.py
Compressing: ./cap_01
Compressing: ./cap_02
Compressing: ./cap_03
To see the contents of each .zip
directory, you can use less
command:
$ less cap_01.zip
Archive: cap_01.zip
Length Method Size Cmpr Date Time CRC-32 Name
-------- ------ ------- ---- ---------- ----- -------- ----
22017 Defl:N 2471 89% 2019-09-05 08:05 7a3b5ec6 cap_01/0101000001.json
21998 Defl:N 2471 89% 2019-09-05 08:05 155bece7 cap_01/0101000002.json
23236 Defl:N 2573 89% 2019-09-05 08:05 55fced20 cap_01/0101000003.json
-------- ------- --- -------
67251 7515 89% 3 files
You probably want to look at the zipfile
module; there's documentation at http://docs.python.org/library/zipfile.html.
You may also want os.walk()
to index the directory structure.
I prepared a function by consolidating Mark Byers' solution with Reimund and Morten Zilmer's comments (relative path and including empty directories). As a best practice, with
is used in ZipFile's file construction.
The function also prepares a default zip file name with the zipped directory name and '.zip' extension. Therefore, it works with only one argument: the source directory to be zipped.
import os
import zipfile
def zip_dir(path_dir, path_file_zip=''):
if not path_file_zip:
path_file_zip = os.path.join(
os.path.dirname(path_dir), os.path.basename(path_dir)+'.zip')
with zipfile.ZipFile(path_file_zip, 'wb', zipfile.ZIP_DEFLATED) as zip_file:
for root, dirs, files in os.walk(path_dir):
for file_or_dir in files + dirs:
zip_file.write(
os.path.join(root, file_or_dir),
os.path.relpath(os.path.join(root, file_or_dir),
os.path.join(path_dir, os.path.pardir)))
Modern Python (3.6+) using the pathlib module for concise OOP-like handling of paths, and pathlib.Path.rglob() for recursive globbing. As far as I can tell, this is equivalent to George V. Reilly's answer: zips with compression, the topmost element is a directory, keeps empty dirs, uses relative paths.
from pathlib import Path
from zipfile import ZIP_DEFLATED, ZipFile
from os import PathLike
from typing import Union
def zip_dir(zip_name: str, source_dir: Union[str, PathLike]):
src_path = Path(source_dir).expanduser().resolve(strict=True)
with ZipFile(zip_name, 'w', ZIP_DEFLATED) as zf:
for file in src_path.rglob('*'):
zf.write(file, file.relative_to(src_path.parent))
Note: as optional type hints indicate, zip_name
can't be a Path object (would be fixed in 3.6.2+).