Will Subversion efficiently store OpenXML Office documents?

Submitted by 被刻印的时光 ゝ on 2019-11-27 21:14:07
Wim Coenen

From the OpenXML article on Wikipedia:

An Office Open XML file is a ZIP-compatible OPC package containing XML documents and other resources.

In other words, OpenXML files are actually zip files with XML files in them. Compression or encryption "scrambles" the data, sabotaging Subversion's ability to generate small deltas between revisions. This is not related to the svn:mime-type property. Subversion treats all files as binary when generating deltas.
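The "scrambling" effect can be illustrated with a small sketch. It is not taken from the experiment described here: the generated paragraphs are a hypothetical stand-in for the XML inside a docx, and `shared_suffix_len` is a helper name made up for this example. It uses Python's zlib, which implements the same DEFLATE method that zip files normally use, and compares how much of the file's tail stays byte-identical after one paragraph is inserted in the middle.

```python
import zlib


def shared_suffix_len(a: bytes, b: bytes) -> int:
    """Count the trailing bytes that are identical in a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


# Hypothetical stand-in for the XML part of a docx: 2000 short paragraphs.
paragraphs = [
    f"<p>Paragraph {i}: the quick brown fox jumps over the lazy dog.</p>\n"
    for i in range(2000)
]
original = "".join(paragraphs).encode("utf-8")

# "Revision 2": one extra paragraph inserted in the middle of the document.
edited_paragraphs = (
    paragraphs[:1000] + ["<p>One newly added paragraph.</p>\n"] + paragraphs[1000:]
)
edited = "".join(edited_paragraphs).encode("utf-8")

# Everything after the insertion point is unchanged in the raw data ...
raw_shared = shared_suffix_len(original, edited)
# ... but the compressed streams need not share any of that tail.
zip_shared = shared_suffix_len(zlib.compress(original), zlib.compress(edited))

print(f"uncompressed: {raw_shared} of {len(edited)} trailing bytes unchanged")
print(f"compressed:   {zip_shared} trailing bytes unchanged")
```

A delta algorithm can reuse the long unchanged tail of the uncompressed data, while the compressed streams typically diverge from the edit point onward, so far less can be reused.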

In Dutch we have a saying: "measuring is knowing". The graph below shows the results of an experiment where I imported a 500K OpenXML document into an SVN 1.6 repository (revision 1). I then added a paragraph from another document, saved and committed. This was repeated 5 times (revisions 2 to 6).

As you can see, committing a new docx revision that just adds a paragraph will cost you about 150K disk space. This is still much more efficient than just storing a copy of each revision without the help of a version control system.

I also repeated the experiment with a separate test repository, uncompressing each revision of the docx before committing it. As you can see, the document revisions would be stored much more efficiently if the docx weren't compressed. It's also interesting to see that Subversion's own data compression is about as efficient as zip: storing the first revision of an uncompressed docx in Subversion takes about the same space as the original docx.
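The answer does not say how each revision was uncompressed. One possible approach, sketched here as an assumption rather than as the method actually used, is to rewrite the package with every entry stored uncompressed, which the ZIP format permits. The function name `store_uncompressed` and its parameters are made up for this example; only Python's standard zipfile module is used.

```python
import zipfile


def store_uncompressed(src_path: str, dst_path: str) -> None:
    """Copy every member of a zip package into a new package without compression."""
    with zipfile.ZipFile(src_path, "r") as src, \
            zipfile.ZipFile(dst_path, "w", compression=zipfile.ZIP_STORED) as dst:
        for info in src.infolist():
            # Read the decompressed bytes and write them back as a stored
            # (uncompressed) entry, keeping member names and their order.
            dst.writestr(
                info.filename,
                src.read(info.filename),
                compress_type=zipfile.ZIP_STORED,
            )
```

A call such as store_uncompressed("report.docx", "report-stored.docx") (both file names hypothetical) would produce the copy to commit. Whether a given Office version opens such a package without complaint is something to verify before relying on it.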

YMMV.

Stefan

Subversion handles binary files quite well. It does not store a full copy for every commit but only an efficient binary diff.

See the FAQ about this.

Sadly, you can't currently do this with Subversion, but there has been some discussion around this:

http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=651443

Have you ever tried to open an OpenXML file in a text editor?

In short: it is not text, it is still binary. So no, you can't make Subversion handle it any differently.
