Is it possible to diff PowerPoint version-controlled with git?

Submitted by 旧街凉风 on 2020-02-18 05:19:31

Question


I have some PowerPoint documents that I keep version-controlled with git. I want to see the differences between versions of a file. Text matters most; images and formatting not so much (at least not at this point).


Answer 1:


I wrote this for use with git on the command-line (requires Python and the python-pptx library):

"""
Setup -- Add these lines to the following files:
--- .gitattributes
*.pptx diff=pptx

--- .gitconfig (or repo\.git\config or your_user_home\.gitconfig; change the path to point to your local copy of the script)
[diff "pptx"]
    binary = true
    textconv = python C:/Python27/Scripts/git-pptx-textconv.py

usage:
git diff your_powerpoint.pptx


Thanks to the python-pptx docs and this snippet:
http://python-pptx.readthedocs.org/en/latest/user/quickstart.html#extract-all-text-from-slides-in-presentation
"""

import sys
from pptx import Presentation

# Map left/right curly quotes (Unicode) to their ASCII equivalents.
# found http://stackoverflow.com/questions/816285/where-is-pythons-best-ascii-for-this-unicode-database
# go here if more power is needed  http://code.activestate.com/recipes/251871/
# or here                          https://pypi.python.org/pypi/Unidecode/0.04.1
PUNCTUATION = {0x2018: 0x27, 0x2019: 0x27, 0x201C: 0x22, 0x201D: 0x22}


if __name__ == '__main__':
    if len(sys.argv) != 2:
        print("Usage: git-pptx-textconv file.pptx", file=sys.stderr)
        sys.exit(1)

    # Emit UTF-8 regardless of the console/pipe encoding (Python 3.7+)
    sys.stdout.reconfigure(encoding='utf-8')

    path_to_presentation = sys.argv[1]

    prs = Presentation(path_to_presentation)

    for slide in prs.slides:
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue
            # One output line per paragraph, so git diffs paragraph by paragraph
            for paragraph in shape.text_frame.paragraphs:
                par_text = ''
                for run in paragraph.runs:
                    s = run.text
                    # Collapse embedded line breaks and tabs into spaces
                    for ch in ('\r', '\n', '\t'):
                        s = s.replace(ch, ' ')
                    par_text += s.translate(PUNCTUATION)
                print(par_text)
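With a textconv driver, git runs the configured command on each version of the file and diffs the resulting text instead of the binary. The sketch below roughly imitates that idea outside git (it is not git's actual code path), which can be handy for checking what a converter produces before wiring it in. `textconv_diff` and `pptx_text` are hypothetical helper names, and `pptx_text` assumes python-pptx is installed.

```python
import difflib
import sys
from typing import Callable, List


def textconv_diff(old_path: str, new_path: str,
                  textconv: Callable[[str], str]) -> List[str]:
    """Roughly imitate a textconv diff: convert both files to text,
    then return a unified diff of that text."""
    old_lines = textconv(old_path).splitlines(keepends=True)
    new_lines = textconv(new_path).splitlines(keepends=True)
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=old_path, tofile=new_path))


def pptx_text(path: str) -> str:
    """Hypothetical converter: one line per paragraph, via python-pptx."""
    from pptx import Presentation  # imported lazily; requires python-pptx
    lines = []
    for slide in Presentation(path).slides:
        for shape in slide.shapes:
            if shape.has_text_frame:
                for paragraph in shape.text_frame.paragraphs:
                    lines.append(''.join(run.text for run in paragraph.runs))
    return '\n'.join(lines) + '\n'


if __name__ == '__main__':
    # Usage (hypothetical script name): python pptx_textdiff.py old.pptx new.pptx
    sys.stdout.writelines(textconv_diff(sys.argv[1], sys.argv[2], pptx_text))
```

Passing the converter as a parameter keeps the diff logic independent of python-pptx, so any other text extractor can be swapped in.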



Answer 2:


I was unable to install python-pptx, as suggested by the accepted answer, so I looked for a Node.js solution instead (this approach may also work for the several other file formats the tool can handle).

Install https://github.com/dbashford/textract (npm install --global textract).

Define how git should diff "textract" files in your git config. On my Windows machine:

[diff "textract"]
    binary = true
    textconv=textract.cmd

In your .gitattributes, specify that *.pptx files should use the "textract" diff driver:

*.pptx diff=textract

git diff happily.
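The two setup steps can also be scripted. This is one possible sketch, assuming git is on PATH; `ensure_gitattributes_line` and `configure_textract_driver` are hypothetical helper names. The two `git config` calls write the same `[diff "textract"]` section shown above into the repository's .git/config. `textract.cmd` is the Windows command from this answer; on other systems the command is presumably just `textract` (an assumption).

```python
import subprocess
from pathlib import Path


def ensure_gitattributes_line(repo: Path, line: str = "*.pptx diff=textract") -> bool:
    """Append `line` to <repo>/.gitattributes unless it is already there.
    Returns True if the file was changed."""
    attrs = Path(repo) / ".gitattributes"
    content = attrs.read_text(encoding="utf-8") if attrs.exists() else ""
    if line in content.splitlines():
        return False
    # Make sure the new entry starts on its own line
    if content and not content.endswith("\n"):
        content += "\n"
    attrs.write_text(content + line + "\n", encoding="utf-8")
    return True


def configure_textract_driver(repo: Path, command: str = "textract.cmd") -> None:
    """Create the [diff "textract"] section in <repo>/.git/config via `git config`."""
    for key, value in (("diff.textract.binary", "true"),
                       ("diff.textract.textconv", command)):
        subprocess.run(["git", "-C", str(repo), "config", key, value], check=True)


if __name__ == "__main__":
    repo = Path(".")  # run from the repository root
    ensure_gitattributes_line(repo)
    configure_textract_driver(repo)
```

The .gitattributes helper is idempotent, so running the script twice does not duplicate the entry; `git config` simply overwrites the existing values.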




Answer 3:


Not really. A PowerPoint (.pptx) file is essentially a zip archive of a folder full of (mostly XML) files. Git will treat it as a binary file (because it is).

There may be a third-party extension that does this, but I've never heard of one.
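The zip-of-XML structure described above can be inspected with nothing but the Python standard library. The minimal sketch below reads the slide parts straight from the archive and prints the text of each DrawingML paragraph (`<a:p>`/`<a:t>`), one line per paragraph. It assumes slides live at `ppt/slides/slideN.xml` and orders them by N, which usually, but not always, matches presentation order (the real order is recorded in `ppt/presentation.xml`). `pptx_slide_text` is a hypothetical name; the script could serve as a textconv command in the same way as the python-pptx script in Answer 1.

```python
import re
import sys
import xml.etree.ElementTree as ET
import zipfile

# DrawingML namespace used for text paragraphs (<a:p>) and runs (<a:t>)
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
# Slide parts only; excludes e.g. ppt/slides/_rels/slide1.xml.rels
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml$")


def pptx_slide_text(path_or_file) -> str:
    """Return the text of every paragraph, slide by slide, without python-pptx."""
    lines = []
    with zipfile.ZipFile(path_or_file) as zf:
        slides = []
        for name in zf.namelist():
            m = SLIDE_RE.match(name)
            if m:
                slides.append((int(m.group(1)), name))
        # Assumption: slide file numbers reflect presentation order
        for number, name in sorted(slides):
            lines.append(f"--- slide {number} ---")
            root = ET.fromstring(zf.read(name))
            for para in root.iter(A_NS + "p"):
                text = "".join(t.text or "" for t in para.iter(A_NS + "t"))
                if text:
                    lines.append(text)
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    # As a textconv command, git passes the path of a temporary copy of the file
    sys.stdout.reconfigure(encoding="utf-8")
    sys.stdout.write(pptx_slide_text(sys.argv[1]))
```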




Answer 4:


I can't speak directly to git, as we use Visual Studio + TFS at work. However, a bit of research suggests this should work. In Visual Studio, I integrate WinMerge with its xdocdiff plugin, which supports text comparison of MS Office and PDF files. This lets me diff pptx, docx, pdf, etc. files published to version control.

For git, the way it should work is:

1) Get WinMerge with the xdocdiff plugin: http://freemind.s57.xrea.com/xdocdiffPlugin/en/index.html
2) Integrate WinMerge with git: https://coderwall.com/p/76wmzq/winmerge-as-git-difftool-on-windows

Hopefully this will allow you to see the text-based diffs for your PowerPoint.



Source: https://stackoverflow.com/questions/32259943/is-it-possible-to-diff-powerpoint-version-controlled-with-git
