Uncompress OpenOffice files for better storage in version control

Submitted by 时光怂恿深爱的人放手 on 2019-11-28 23:22:37

You may consider storing documents in FODT format, a flat XML format.
This is a relatively new alternative.

The document is simply stored unzipped, as a single XML file.

More info is available at https://wiki.documentfoundation.org/Libreoffice_and_subversion.

First, the version control system you want to use should support hooks that transform a file between the version stored in the repository and the one in the working area, for example the clean / smudge filters that Git configures through gitattributes.

Second, instead of writing such a filter yourself, you can look for an existing one, for example rezip from the "Management of opendocument (openoffice.org) files in git" thread on the git mailing list (but see the warning in "Followup: management of OO files - warning about "rezip" approach").
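To make the clean-filter idea concrete, here is a minimal Python 3 sketch. It is not the rezip script from the mailing list, only an illustration of the same approach. The names rezip_stored, main, rezip_filter.py and the filter name "rezip" are hypothetical. It assumes Git pipes the working-tree file to the filter on stdin and stores whatever the filter writes to stdout. Passing non-zip input through unchanged is a choice made for this sketch.

```python
"""Hypothetical Git clean filter: re-pack a zipped document without compression.

Assumed setup (all names are placeholders):
    .gitattributes:   *.odt filter=rezip
    git config filter.rezip.clean "python3 rezip_filter.py"
"""
import io
import sys
import zipfile


def rezip_stored(data):
    """Return a copy of the zip archive in `data` with every member stored."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        # Not a zip archive (for example an empty file): leave it untouched.
        return data
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_STORED) as dst:
        # Keep the original member order; only the compression method changes.
        for info in src.infolist():
            payload = src.read(info)
            info.compress_type = zipfile.ZIP_STORED
            dst.writestr(info, payload)
    return out.getvalue()


def main():
    """Read the file from stdin and write the re-packed archive to stdout."""
    sys.stdout.buffer.write(rezip_stored(sys.stdin.buffer.read()))

# When saved as a filter script, the file would end with a call to main().
```

The document still opens normally afterwards, because only the compression method of each member changes, not its content.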

You can also browse answers in "Tracking OpenOffice files/other compressed files with Git" thread, or try to find the answer inside "[PATCH 2/2] Add keyword unexpansion support to convert.c" thread.

Hope That Helps

Jason Grout

I've modified the python program in Craig McQueen's answer just a bit. Changes include:

  • Actually checking the return of testZip (according to the docs, it appears that the original program will happily proceed with a corrupt zip file past the checkzip step).

  • Rewriting the for-loop that checks for already-uncompressed files as a single if-statement.

Here is the new program:

#!/usr/bin/python
# Note, written for Python 2.6

import sys
import shutil
import zipfile

# Get a single command-line argument containing filename
commandlineFileName = sys.argv[1]

backupFileName = commandlineFileName + ".bak"
inFileName = backupFileName
outFileName = commandlineFileName
checkFilename = commandlineFileName

# Check input file
# First, check it is valid (not corrupted)
checkZipFile = zipfile.ZipFile(checkFilename)

if checkZipFile.testzip() is not None:
    raise Exception("Zip file is corrupted")

# Second, check that it's not already uncompressed
if all(f.compress_type==zipfile.ZIP_STORED for f in checkZipFile.infolist()):
    raise Exception("File is already uncompressed")

checkZipFile.close()

# Copy to "backup" file and use that as the input
shutil.copy(commandlineFileName, backupFileName)
inputZipFile = zipfile.ZipFile(inFileName)

outputZipFile = zipfile.ZipFile(outFileName, "w", zipfile.ZIP_STORED)

# Copy each input file's data to output, making sure it's uncompressed
for fileObject in inputZipFile.infolist():
    fileData = inputZipFile.read(fileObject)
    outFileObject = fileObject
    outFileObject.compress_type = zipfile.ZIP_STORED
    outputZipFile.writestr(outFileObject, fileData)

outputZipFile.close()
inputZipFile.close()
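The two checks in the program above can be tried in isolation. The following is a small self-contained sketch for Python 3 (not 2.6). It relies on the documented behaviour that testzip() returns the name of the first member that fails its CRC check, or None if every member reads cleanly. The helper names first_bad_member and all_stored are made up for this example.

```python
import io
import zipfile


def first_bad_member(data):
    """Return the name of the first corrupt member, or None if all are fine."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.testzip()


def all_stored(data):
    """Return True if every member of the archive is stored uncompressed."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return all(info.compress_type == zipfile.ZIP_STORED
                   for info in archive.infolist())


# Build a small archive in memory whose single member is stored uncompressed.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
    archive.writestr("content.xml", "<doc>hello hello hello</doc>")
good = buffer.getvalue()

# Flip one byte of the member's data so its CRC no longer matches.
damaged = bytearray(good)
damaged[good.index(b"<doc>")] ^= 0xFF
damaged = bytes(damaged)

print(first_bad_member(good))     # no problem found
print(first_bad_member(damaged))  # name of the damaged member
print(all_stored(good))
```

Calling testzip() without looking at its result, as the original script does, would let the damaged archive through.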

Here's another program I stumbled across: store_zippies_uncompressed by Mirko Friedenhagen.

The wiki also shows how to integrate it with Mercurial.

Craig McQueen

Here is a Python script that I've put together. It has had only basic testing so far, in Python 2.6. I prefer Python for this because it should abort with an exception if any error occurs, whereas a bash script may not.

This first checks that the input file is valid and not already uncompressed. Then it copies the input file to a "backup" file with ".bak" extension. Then it uncompresses the original file, overwriting it.

I'm sure there are things I've overlooked. Please feel free to give feedback.


#!/usr/bin/python
# Note, written for Python 2.6

import sys
import shutil
import zipfile

# Get a single command-line argument containing filename
commandlineFileName = sys.argv[1]

backupFileName = commandlineFileName + ".bak"
inFileName = backupFileName
outFileName = commandlineFileName
checkFilename = commandlineFileName

# Check input file
# First, check it is valid (not corrupted)
checkZipFile = zipfile.ZipFile(checkFilename)
checkZipFile.testzip()

# Second, check that it's not already uncompressed
isCompressed = False
for fileObject in checkZipFile.infolist():
    if fileObject.compress_type != zipfile.ZIP_STORED:
        isCompressed = True
if isCompressed == False:
    raise Exception("File is already uncompressed")

checkZipFile.close()

# Copy to "backup" file and use that as the input
shutil.copy(commandlineFileName, backupFileName)
inputZipFile = zipfile.ZipFile(inFileName)

outputZipFile = zipfile.ZipFile(outFileName, "w", zipfile.ZIP_STORED)

# Copy each input file's data to output, making sure it's uncompressed
for fileObject in inputZipFile.infolist():
    fileData = inputZipFile.read(fileObject)
    outFileObject = fileObject
    outFileObject.compress_type = zipfile.ZIP_STORED
    outputZipFile.writestr(outFileObject, fileData)

outputZipFile.close()
inputZipFile.close()

This is in a Mercurial repository in BitBucket.

If you don't need the storage savings but just want to be able to diff OpenOffice.org files stored in your version control system, you can follow the instructions on the oodiff page, which explain how to make oodiff the default diff for OpenDocument formats under git and mercurial. (It also mentions SVN, but it's been so long since I used SVN regularly that I'm not sure whether those are instructions or limitations.)

(I found this using Mirko Friedenhagen's page (cited by Craig McQueen above))
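For a rough idea of what a text-level diff of OpenDocument files involves, here is a minimal Python 3 sketch. It is not the oodiff tool and makes no claim about how oodiff works internally. It assumes that the document body lives in content.xml inside the zip and that headings and paragraphs are text:h and text:p elements in the OpenDocument text namespace. The function names and the file names in the final comment are hypothetical.

```python
import difflib
import zipfile
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
BLOCK_TAGS = {"{%s}p" % TEXT_NS, "{%s}h" % TEXT_NS}


def odf_paragraphs(source):
    """Return the text of every heading and paragraph in an ODF file.

    `source` is a path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    # itertext() also collects text nested in spans inside each block.
    return ["".join(element.itertext())
            for element in root.iter() if element.tag in BLOCK_TAGS]


def diff_documents(old, new):
    """Return a unified diff of the extracted text, one paragraph per line."""
    return list(difflib.unified_diff(odf_paragraphs(old), odf_paragraphs(new),
                                     fromfile="old", tofile="new", lineterm=""))

# Example call with placeholder file names:
#   print("\n".join(diff_documents("draft_v1.odt", "draft_v2.odt")))
```

This ignores formatting, tables and embedded objects, so it only shows changes to the running text.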
