I\'ve been playing around with git
and hg
lately and then suddenly it occurred to me that this kind of thing will be great for documents.
I\'ve
I used SVN (yes, in 2020 :-)) with TortoiseSVN on Windows. It has a built-in function to compare DOCX files (it opens Microsoft Word in a mode where your screen is divided into four parts: the file after the changes, before the changes, with changes highlighted and a list of changes). Screenshot below (sorry for the Polish version of MS Word). I also checked TortoiseGIT and it also has this functionality. I've read that TortoiseHG has it as well.
http://tortoisehg.bitbucket.io/ includes a plugin called docdiff that integrates Word and Excel diff'ing.
This article outlines the solution for Docx using Pandoc While this post outlines solution for PDF using pdf2html.
Only for docx, I compiled instructions for multiple places here: https://gist.github.com/nachocab/6429893
# download docx2txt by Sandeep Kumar
wget -O docx2txt.pl http://www.cs.indiana.edu/~kinzler/home/binp/docx2txt
# make a wrapper
echo '#!/bin/bash
docx2txt.pl $1 -' > docx2txt
chmod +x docx2txt
# make sure docx2txt.pl and docx2txt are your current PATH. Here's a guide
http://shapeshed.com/using_custom_shell_scripts_on_osx_or_linux/
mv docx2txt docx2txt.pl ~/bin/
# set .gitattributes (unfortunately I don't this can't be set by default, you have to create it for every project)
echo "*.docx diff=word" > .git/info/attributes
# add the following to ~/.gitconfig
[diff "word"]
binary = true
textconv = docx2txt
# add a new alias
[alias]
wdiff = diff --color-words
# try it
git init
# create my_file.docx, add some content
git add my_file.docx
git commit -m "Initial commit"
# change something in my_file.docx
git wdiff my_file.docx
# awesome!
It works great on OSX
If you happen to use a Mac, I wrote a git merge driver that can use Microsoft Word and tracked changes to merge and show conflicts between any file types Word can read & write.
http://github.com/jasmas/wordMerge
I say 'if you happen to use a Mac' because the driver I wrote uses AppleScript, primarily to accomplish this task.
It'd be nice to add a vbscript version to the project, but at the moment I don't have a Windows environment for testing. Anyone with some basic scripting knowledge should be able to take a look at what I'm doing and duplicate it in vbscript, powershell or whatever on Windows.
You can use Beyond Compare as external diff tool for hg. Add to/change your user mercurial.ini as:
[extdiff]
cmd.vdiff = c:/path/to/BCompare.exe
Then get Beyond Compare file viewer rule for docx.
Now you should be able to compare two versions of docx in Beyond Compare.