I am in charge of several Excel files and SQL schema files. How should I perform better document version control on these files?
I need to know the part modified (di
This Excel utility works very well for me:
Version Control for Excel
It is a fairly straightforward versioning tool for workbooks and VBA macros. Once you commit a version, it is saved to a Git repository on your PC. I have never tried it with SQL schema files, but I'm sure there is a way to handle them.
Tante recommended a very simple approach in Managing ZIP-based file formats in Git:
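Open your ~/.gitconfig file (create it if it does not exist yet) and add the following stanza:
[diff "zip"]
    textconv = unzip -c -a
Git only uses this driver for paths that are assigned to it, so also add a line such as *.xlsx diff=zip to a .gitattributes file in your repository.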
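If unzip is not available on your machine (common on Windows), the same idea can be sketched in Python: dump every member of the archive as text so that two versions can be compared. This is only an illustration, not something from Tante's article; the function name zip_to_text and the "--- name ---" separator are made up for this example.

```python
import zipfile


def zip_to_text(path):
    """Return all members of a ZIP archive as one text blob, in a stable order."""
    chunks = []
    with zipfile.ZipFile(path) as archive:
        # Sort the names so the output does not depend on archive ordering.
        for name in sorted(archive.namelist()):
            if name.endswith("/"):
                continue  # skip directory entries
            data = archive.read(name)
            # Decode leniently: binary members (e.g. images) turn into noisy
            # text instead of raising an error.
            text = data.decode("utf-8", errors="replace")
            chunks.append(f"--- {name} ---\n{text}\n")
    return "".join(chunks)
```

A textconv program receives the file name as its single argument and writes the converted text to stdout, so a small script that prints zip_to_text(sys.argv[1]) could probably stand in for the unzip command in the stanza above.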
As mentioned in another answer's comment, .xlsx files are just zipped XML.
To get to the XML directory (which Git can diff), you have to unzip the .xlsx file to a directory. A quick way to see this on Windows is to rename the file <filename>.xlsx to <filename>.zip and open it; you'll see the inner contents. I'd store this directory alongside the binary so that when you check out, you do not need any extra steps to open the document in Excel.
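A minimal sketch of that unzip step in Python, in case you want to script it before each commit. The function name unpack_workbook and the ".unpacked" directory suffix are my own choices for illustration, not an established convention.

```python
import shutil
import zipfile
from pathlib import Path


def unpack_workbook(xlsx_path):
    """Extract an .xlsx file into a sibling '<name>.xlsx.unpacked' directory."""
    source = Path(xlsx_path)
    target = source.with_name(source.name + ".unpacked")  # hypothetical naming
    # Start from an empty directory so parts that were removed from the
    # workbook do not linger from an earlier extraction.
    if target.exists():
        shutil.rmtree(target)
    with zipfile.ZipFile(source) as archive:
        archive.extractall(target)
    return target
```

Committing both the .xlsx and the extracted directory keeps the workbook directly openable while giving Git text files it can diff.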
I've been struggling with this exact problem for the last few days and have written a small .NET utility to extract and normalise Excel files in such a way that they're much easier to store in source control. I've published the executable here:
https://bitbucket.org/htilabs/ooxmlunpack/downloads/OoXmlUnpack.exe
...and the source here:
https://bitbucket.org/htilabs/ooxmlunpack
If there's any interest I'm happy to make this more configurable, but at the moment, you should put the executable in a folder (e.g. the root of your source repository) and when you run it, it will:
Clearly not all of these things are necessary, but the end result is a spreadsheet file that will still open in Excel, but which is much more amenable to diffing and incremental compression. Also, storing the extracted files as well makes it much more obvious in the version history what changes have been applied in each version.
If there's any appetite out there, I'm happy to make the tool more configurable since I guess not everyone will want the contents extracted, or possibly the values removed from formula cells, but these are both very useful to me at the moment.
In tests, a 2 MB spreadsheet 'unpacks' to 21 MB, but then I was able to store five versions of it with small changes between each, in a 1.9 MB Mercurial data file, and visualise the differences between versions effectively using Beyond Compare in text mode.
NB: although I'm using Mercurial, I read this question while researching my solution, and there's nothing Mercurial-specific about it, so it should work fine for Git or any other VCS.
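To give a feel for why normalising helps: the XML parts inside a workbook are typically written on a single long line, which line-based diff tools handle badly. The sketch below is not taken from OoXmlUnpack and I'm not claiming it matches what the tool does; it only shows one plausible normalisation (pretty-printing), with pretty_xml being a made-up helper name.

```python
from xml.dom import minidom


def pretty_xml(data):
    """Re-indent an XML document so that each element starts on its own line."""
    document = minidom.parseString(data)
    # Two-space indentation; elements that contain only text stay on one line.
    return document.toprettyxml(indent="  ")
```

This is meant for viewing differences only. I have not checked that re-zipping pretty-printed parts always produces a workbook Excel will accept, since the added whitespace could matter in some parts.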
We've built an open-source Git command line extension for Excel workbooks: https://www.xltrail.com/git-xltrail.
In a nutshell, the main feature is that it makes git diff work on any workbook file format, so that it shows the diff of the workbook's VBA content (at some point, we'll make this work for the worksheets' content, too).
It's still early days but it might help.
Use the open document extension .fods. It's a plain, uncompressed XML markup format that both Excel and LibreOffice can open, and the diffs will look good.