In Git, how do I diff Microsoft Word documents?

∥☆過路亽.° posted on 2019-12-08 00:06:57

Question


I've been following this guide here on how to diff Microsoft Word documents, but I ran into this error:

Usage:  /usr/bin/docx2txt.pl [infile.docx|-|-h] [outfile.txt|-]
        /usr/bin/docx2txt.pl < infile.docx
        /usr/bin/docx2txt.pl < infile.docx > outfile.txt

        In second usage, output is dumped on STDOUT.

        Use '-h' as the first argument to get this usage information.

        Use '-' as the infile name to read the docx file from STDIN.

        Use '-' as the outfile name to dump the text on STDOUT.
        Output is saved in infile.txt if second argument is omitted.

Note:   infile.docx can also be a directory name holding the unzipped content
        of concerned .docx file.

fatal: unable to read files to diff

To explain how I came to that error: I created a .gitattributes in the repository I want to diff from. .gitattributes looks like this:

*.docx diff=word
*.docx difftool=word

I've installed docx2txt. I'm on Linux. I've created a file called docx2txt which contains this:

#!/bin/bash
docx2txt.pl $1 -

I ran $ chmod a+x docx2txt and put docx2txt in /usr/bin/.

I did:

$ git config diff.word.textconv docx2txt

Then I tried to diff two Microsoft Word documents. That's when I got the error I mentioned above.

What am I missing? How do I resolve this error?

PS: I don't know if my shell can find docx2txt because when I do this:

$ docx2txt

my terminal freezes, processing something, but doesn't output anything, and when I do these commands this happens:

$ man docx2txt
No manual entry for docx2txt
$ docx2txt --help
Can't read docx file <--help>!

UPDATE on progress: I changed docx2txt to

#!/bin/bash
docx2txt.pl "$1" -

as pmod suggested, and now git diff <commit> works from the command line! Yay! However, when I try

$ git difftool <commit>

git launches kdiff3, and I get this pop-up error:

Some input characters could not be converted to valid unicode.
You might be using the wrong codec. (e.g. UTF-8 for non UTF-8 files).
Don't save the result if unsure. Continue at your own risk.
Affected input files are in A, B.

...and all of the characters in the files are mumbo jumbo. The command line displays the diff text correctly, but kdiff3 does not display the text from the diff correctly for some reason.

How do I display the text for the diff correctly in kdiff3 or another gui tool? Should I change kdiff3 to another tool?

Extra: My shell doesn't seem to be able to find docx2txt, because of these commands:

$ which doctxt
which: no doctxt in (/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl)

$ which docx2txt
/usr/bin/docx2txt

Answer 1:


docx2txt.pl accepts at most two arguments according to its usage message. In your case, each argument is either a filename or "-". So your wrapper script looks correct, except when the filename passed as the first argument contains at least one space. In that case, the unquoted $1 is split on whitespace after expansion and the parts of the filename are passed as separate arguments, so the tool prints its usage information because it receives more than two arguments.

Try using quotes to avoid filename splitting:

#!/bin/bash
docx2txt.pl "$1" -
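The effect of the quotes can be checked without docx2txt at all. The following is a minimal sketch; count_args and the filename "my report.docx" are made up for illustration and are not part of the original setup:

```shell
#!/bin/bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

# Hypothetical filename containing a space.
file="my report.docx"

# Unquoted: the shell splits the value into "my" and "report.docx",
# so together with "-" the command receives three arguments.
unquoted=$(count_args $file -)

# Quoted: the filename stays one argument, so the command receives two.
quoted=$(count_args "$file" -)

echo "unquoted: $unquoted arguments"
echo "quoted:   $quoted arguments"
```

With the unquoted form, docx2txt.pl would see one argument more than its usage allows, which matches the usage message shown in the question.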

PS: I don't know if my shell can find docx2txt

You can check this with

$ which docx2txt

If it prints a path, then the tool (a binary or an executable script) can be found via the PATH environment variable.

because when I do this:

$ docx2txt

my terminal freezes, processing something, but doesn't output anything

Without arguments, your script executes docx2txt.pl -, which according to the tool's usage expects the input file on STDIN, i.e. whatever you type. So it looks as if it is hanging and processing something, but it is actually just waiting for your input.
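If waiting on STDIN is not what you want when the wrapper is run by hand, one option is to make it refuse to run without a filename. This is only a sketch: the function name docx_to_text, the message text and the exit status 2 are arbitrary choices, not something docx2txt or git requires.

```shell
#!/bin/bash
# Hypothetical stricter wrapper around docx2txt.pl.
# It prints a short usage line and fails instead of reading from STDIN
# when it is not given exactly one non-empty filename.
docx_to_text() {
    if [ "$#" -ne 1 ] || [ -z "$1" ]; then
        echo "usage: docx_to_text infile.docx" >&2
        return 2
    fi
    # Quoted so that a filename with spaces stays a single argument;
    # "-" asks docx2txt.pl to write the text to STDOUT.
    docx2txt.pl "$1" -
}
```

To use it as the textconv script, the file would end with a call such as docx_to_text "$@", so that git's filename argument is forwarded unchanged.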




Answer 2:


You can use pandoc to convert the document to Markdown:

pandoc -f docx -t markdown -o outfile.md infile.docx

and then use meld, which is a great GUI, to compare the documents:

https://askubuntu.com/questions/515900/how-to-compare-two-files
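The same pandoc conversion can probably also be plugged into git as a textconv driver, in the same way the question does with docx2txt. The sketch below assumes pandoc is installed and on PATH. The function name setup_pandoc_textconv and the driver name "pandoc" are made up for illustration. git appends the file to convert as the last argument of the textconv command, and pandoc writes to STDOUT when no -o option is given.

```shell
#!/bin/bash
# Hypothetical helper: configure a repository so that "git diff" shows
# .docx files as Markdown produced by pandoc.
# Argument: path to the repository (defaults to the current directory).
setup_pandoc_textconv() {
    local repo="${1:-.}"
    # Route *.docx files to a diff driver named "pandoc" (the name is an assumption).
    printf '%s\n' '*.docx diff=pandoc' >> "$repo/.gitattributes"
    # git runs this command with the path of the file to convert appended.
    git -C "$repo" config diff.pandoc.textconv 'pandoc -f docx -t markdown'
}
```

After running setup_pandoc_textconv inside the repository, git diff <commit> should print a Markdown diff of the Word documents on the command line.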




Answer 3:


Install TortoiseGit or bcompare. They can do the diff.



Source: https://stackoverflow.com/questions/34023396/in-git-how-to-diff-microsoft-word-documents
