Preamble
I'm using git as a version control system for a paper that my lab is writing in LaTeX. There are several people collaborating.
I\
You could try this:
Instead of swapping out the merge engine (hard), you can do some kind of 'normalization' (canonicalization, if you will). I don't speak LaTeX, but let me illustrate as follows:
Say you have input like test.raw
curve ball well received {misfit} whatever
proprietary format extinction {benefit}.
You want it to diff/merge word-by-word. Add the following .gitattributes file
*.raw filter=wordbyword
Then
git config --global filter.wordbyword.clean /home/username/bin/wordbyword.clean
git config --global filter.wordbyword.smudge /home/username/bin/wordbyword.smudge
A minimalist implementation of the filters would be the following two Perl scripts, wordbyword.clean first and wordbyword.smudge second:
#!/usr/bin/perl
use strict;
use warnings;
while (<>)
{
print "$_\n" foreach (m/(.*?\s+)/go);
print '#@#DELIM#@#' . "\n";
}
#!/usr/bin/perl
use strict;
use warnings;
while (<>)
{
chomp; '#@#DELIM#@#' eq $_ and print "\n" or print;
}
After committing the file, inspect the raw contents of the committed blob with `git show HEAD:test.raw`:
curve
ball
well
received
{misfit}
whatever
#@#DELIM#@#
proprietary
format
extinction
{benefit}.
#@#DELIM#@#
After changing the contents of test.raw to
curve ball welled repreived {misfit} whatever
proprietary extinction format {benefit}.
The output of git diff --patch-with-stat will probably be what you wanted:
test.raw | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/test.raw b/test.raw
index b0b0b88..ed8c393 100644
--- a/test.raw
+++ b/test.raw
@@ -1,14 +1,14 @@
curve
ball
-well
-received
+welled
+repreived
{misfit}
whatever
#@#DELIM#@#
proprietary
-format
extinction
+format
{benefit}.
#@#DELIM#@#
You can see how this would work magically for merges, resulting in word-by-word diffing and merging. Q.E.D.
(I hope you like my creative use of .gitattributes. If not, I enjoyed making this little exercise.)
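A filter pair like this is only safe if smudge undoes clean. The sketch below checks that round trip on the sample text. Note that `wordbyword_clean` and `wordbyword_smudge` are hypothetical awk stand-ins written for this illustration, not the Perl scripts above, and they assume words are separated by single spaces.

```shell
#!/bin/sh
# Hypothetical awk stand-ins for the clean/smudge pair (not the Perl scripts above).
DELIM='#@#DELIM#@#'

# clean: one word per line, then a delimiter line marking the original line end.
wordbyword_clean() {
    awk -v d="$DELIM" '{ for (i = 1; i <= NF; i++) print $i; print d }'
}

# smudge: join words with single spaces until the delimiter line, then end the line.
wordbyword_smudge() {
    awk -v d="$DELIM" '
        $0 == d { print line; line = ""; n = 0; next }
        { line = (n++ ? line " " $0 : $0) }
    '
}

original='curve ball well received {misfit} whatever
proprietary format extinction {benefit}.'

# Push the sample through clean and back through smudge.
roundtrip=$(printf '%s\n' "$original" | wordbyword_clean | wordbyword_smudge)

if [ "$roundtrip" = "$original" ]; then
    echo "round trip OK"
else
    echo "round trip FAILED"
fi
```

Unlike the Perl version, these stand-ins collapse runs of whitespace, so they would not round-trip text that relies on exact spacing.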
Here's a solution in the same vein as sehe's, with a few changes which hopefully will address your comments:
As in sehe's solution, make (or append to) .gitattributes.
*.tex filter=sentencebreak
Now to implement the clean and smudge filters:
git config filter.sentencebreak.clean "perl -pe \"s/[.]*?(\\?|\\!|\\.|'') /$&%NL%\\n/g unless m/%/||m/^[\\ *\\\\\\]/\""
git config filter.sentencebreak.smudge "perl -pe \"s/%NL%\n//gm\""
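Before committing, it can help to confirm that git really associates the filter with .tex files. A minimal sketch, assuming git is installed: it uses a throwaway repository created with mktemp so nothing real is touched, and test.tex is only a placeholder path (the file does not need to exist for `git check-attr`).

```shell
#!/bin/sh
# Throwaway repository so the check does not touch a real project.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Same attribute line as in the answer.
printf '*.tex filter=sentencebreak\n' > .gitattributes

# Ask git which filter it would apply to a (placeholder) test.tex.
attr=$(git check-attr filter -- test.tex)
echo "$attr"
```

In a real repository only the `git check-attr filter -- <path>` line is needed. If it reports `unspecified`, the pattern in .gitattributes does not match the file.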
I've created a test file with the following contents. Notice the one-line paragraph.
\chapter{Tumbling Tumbleweeds. Intro}
A way out west there was a fella, fella I want to tell you about, fella by the name of Jeff Lebowski. At least, that was the handle his lovin' parents gave him, but he never had much use for it himself. This Lebowski, he called himself the Dude. Now, Dude, that's a name no one would self-apply where I come from. But then, there was a lot about the Dude that didn't make a whole lot of sense to me. And a lot about where he lived, like- wise. But then again, maybe that's why I found the place s'durned innarestin'.
This line has two sentences. But it also ends with a comment. % here
After we commit it to the local repo, we can see the raw contents.
$ git show HEAD:test.tex
\chapter{Tumbling Tumbleweeds. Intro}
A way out west there was a fella, fella I want to tell you about, fella by the name of Jeff Lebowski. %NL%
At least, that was the handle his lovin' parents gave him, but he never had much use for it himself. %NL%
This Lebowski, he called himself the Dude. %NL%
Now, Dude, that's a name no one would self-apply where I come from. %NL%
But then, there was a lot about the Dude that didn't make a whole lot of sense to me. %NL%
And a lot about where he lived, like- wise. %NL%
But then again, maybe that's why I found the place s'durned innarestin'.
This line has two sentences. But it also ends with a comment. % here
So the rules of the clean filter are: whenever it finds a string of text that ends with . or ? or ! or '' (that's the LaTeX way to do double quotes) and then a space, it will add %NL% and a newline character. But it ignores lines that start with \ (LaTeX commands) or contain a comment anywhere (so that comments cannot become part of the main text).
The smudge filter removes %NL% and the newline.
Diffing and merging is done on the 'clean' files so changes to paragraphs are merged sentence by sentence. This is the desired behavior.
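To make the rules concrete, here is a rough sketch of the same idea using awk instead of the Perl one-liners. `sentence_clean` and `sentence_smudge` are hypothetical names for this illustration, and the stand-in deliberately leaves out the '' rule, so it is a simplification rather than a drop-in replacement.

```shell
#!/bin/sh
# Hypothetical awk stand-ins for the sentencebreak filters (the '' rule is omitted).

# clean: append %NL% and a newline after ". ", "? " or "! ",
# skipping lines that contain a comment (%) or start with a backslash.
sentence_clean() {
    awk '
        /%/ || /^\\/ { print; next }
        { gsub(/[.?!] /, "&%NL%\n"); print }
    '
}

# smudge: a line ending in %NL% is glued back onto the line that follows it.
sentence_smudge() {
    awk '{ if (sub(/%NL%$/, "")) printf "%s", $0; else print }'
}

text='One sentence. Another one? Yes! Done.
This has a comment. So it stays. % here'

cleaned=$(printf '%s\n' "$text" | sentence_clean)
restored=$(printf '%s\n' "$cleaned" | sentence_smudge)

# Show the clean form, then confirm that smudge restores the original.
printf '%s\n' "$cleaned"
[ "$restored" = "$text" ] && echo "round trip OK"
```

The first line is split into four lines in the clean form, while the line carrying a comment passes through untouched.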
The nice thing is that the LaTeX file should compile in either the clean or smudged state, so there is some hope for collaborators to not need to do anything. Finally, you could put the git config commands in a shell script that is part of the repo so a collaborator would just have to run it in the root of the repo to get configured.
#!/bin/bash
git config filter.sentencebreak.clean "perl -pe \"s/[.]*?(\\?|\\!|\\.|'') /$&%NL%\\n/g unless m/%/||m/^[\\ *\\\\\\]/\""
git config filter.sentencebreak.smudge "perl -pe \"s/%NL%\n//gm\""
# Smudge every .tex file already in the working tree.
# NUL-separated names keep paths with spaces intact.
find . -iname "*.tex" -print0 | while IFS= read -r -d '' file
do
    perl -pe "s/%NL%\n//gm" < "$file" > "$file.tmp" && mv "$file.tmp" "$file"
done
That last little bit is a hack because when this script is first run, the branch is already checked out (in the clean form) and it doesn't get smudged automatically.
You can add this script and the .gitattributes file to the repo, then new users just need to clone, then run the script in the root of the repo.
I think this script even runs on Git for Windows if run from Git Bash.
Drawbacks:
I believe the git merge algorithm is quite simple (even though you can make it work harder with the "patience" option of the merge strategy).
Its unit of work will remain the line.
But the general idea is to delegate any fine-grained detection/resolution mechanism to a third-party tool you can set up with git config mergetool.
If some words within a long line differ, that external tool (KDiff3, DiffMerge, ...) will be able to pick up that change and present it to you.
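As a rough sketch of that setup, assuming KDiff3 is the tool you choose and that it is installed: the throwaway repository below only keeps the example self-contained, and in a real project you would run the `git config` line inside your own repository.

```shell
#!/bin/sh
# Throwaway repository, so the setting stays local to this sketch.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Choose KDiff3 as the merge tool for this repository only (assumes it is installed).
git config merge.tool kdiff3

# Read the setting back to confirm it took effect.
configured=$(git config --get merge.tool)
echo "merge.tool = $configured"

# After a merge stops with conflicts, you would then run: git mergetool
```

Setting the value without `--global` keeps the choice per repository, which may suit a shared paper where collaborators prefer different tools.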