Blurring the image under a specific heading using python-docx

前提是你 submitted on 2020-01-04 04:00:47

Question


I want to find a specific heading in a document and then strike out all the content under that heading. The document has many headings, and each heading may contain paragraphs, tables and images, in any combination.

I have installed python-docx and was able to find the specific heading and strike out its paragraphs and tables.

However, I am not able to access the images under that heading. To indicate that an image is struck out, we are trying to blur it.

Problem 1: I am able to get the image ID (resource ID) and image name for all the images in the document. But I don't know how to get the resource ID for only the images that are under a specific heading, so that I can blur them.

Problem 2: I have enabled the Track Changes option using a VB macro called from Python code. But the changes I make using python-docx (the strikeout) are not recorded as tracked changes.


Answer 1:


These are two separate questions (or three, depending on how you count). I'll address the first one here; you can post the other as a separate new question (maybe: "How to use python-docx to track changes in a Word document").

Regarding blurring the image, you have two challenges:

  1. Identify images associated with a particular area in the document.

  2. Blur the image.

There is no direct API support for either of these operations in python-docx. However, you can use python-docx to access the underlying XML and make the changes using lxml calls (which python-docx uses internally). Such efforts are commonly called "workaround functions", so if you search Google on 'python-docx OR python-pptx workaround function' you will find examples.

An in-line image is stored at the Run level. So you can iterate over all the runs in the section of interest and see if any of them have images. This analysis page from the python-docx project has some of the details you'll need: http://python-docx.readthedocs.io/en/latest/dev/analysis/features/shapes/shapes-inline.html
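
One way to get "the runs in the section of interest" is sketched below. It is only a sketch under some assumptions: the headings use the built-in English style names ("Heading 1", "Heading 2", ...), the heading text matches exactly, and the helper names `heading_level` and `iter_runs_under_heading` are made up for illustration. Note that `document.paragraphs` only covers body-level paragraphs, so runs inside table cells would need separate handling.

```python
import re

# Assumption: built-in English heading style names such as "Heading 2".
HEADING_STYLE = re.compile(r"^Heading (\d+)$")


def heading_level(paragraph):
    """Return the heading level (1, 2, ...) or None for a non-heading paragraph."""
    match = HEADING_STYLE.match(paragraph.style.name or "")
    return int(match.group(1)) if match else None


def iter_runs_under_heading(document, heading_text):
    """Yield the runs of every body paragraph between the named heading and
    the next heading of the same or a higher level (sub-headings included)."""
    section_level = None
    for paragraph in document.paragraphs:
        level = heading_level(paragraph)
        if section_level is None:
            # Still looking for the heading that opens the section.
            if level is not None and paragraph.text.strip() == heading_text:
                section_level = level
            continue
        if level is not None and level <= section_level:
            break  # the next sibling (or parent) heading ends the section
        yield from paragraph.runs
```

With a document opened via `docx.Document(path)`, something like `runs = list(iter_runs_under_heading(document, "My Heading"))` would give you the runs to inspect for pictures.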

To find the runs that contain a picture, you'd basically do something like this:

for run in runs:  # however you decide to get the runs
    r = run._element  # this is the `<w:r>` XML element for the run
    pics = r.xpath('.//w:drawing/wp:inline/a:graphic/a:graphicData/pic:pic')
    if not pics:
        continue  # no inline picture in this run, move on to the next one
    print(r.xml)  # if you want to see the XML for this run

This will print the XML for run elements containing a picture.
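
To connect such a run back to the resource ID mentioned in the question, you can read the `r:embed` attribute of the `<a:blip>` element inside the picture. The following is a minimal sketch, not a python-docx API: `embedded_image_rids` is a hypothetical helper, and it uses plain element iteration with fully qualified tag names, so it will also pick up blips in floating (anchored) pictures, not only inline ones.

```python
# Standard DrawingML and relationships namespace URIs.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def embedded_image_rids(run_element):
    """Return the r:embed relationship IDs of every <a:blip> inside a <w:r>."""
    rids = []
    for blip in run_element.iter(f"{{{A_NS}}}blip"):
        rid = blip.get(f"{{{R_NS}}}embed")
        if rid is not None:  # linked (not embedded) pictures have no r:embed
            rids.append(rid)
    return rids
```

You would call it as `embedded_image_rids(run._element)`. In python-docx, `document.part.related_parts[rid]` should then give you the image part for that ID, and its `blob` attribute holds the image bytes.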

Regarding the actual blurring, there are two approaches I can think of:

  1. Replace the current picture with a "blurred" version.
  2. Change the transparency of the image in Word to make it look much lighter. This does not remove detail from the image and the actual image is still "behind", unchanged, if for example the user wanted to right click and pick "Save image...".

The second approach is much easier. You'll have to decide whether it meets your requirements.
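
A rough sketch of the second approach: DrawingML lets an `<a:blip>` carry an `<a:alphaModFix amt="..."/>` child, where `amt` is the opacity in thousandths of a percent (100000 is fully opaque). The helper names `fade_blip` and `fade_pictures_in_run` are hypothetical, and how faithfully a given Word version renders the effect is something you would need to check yourself.

```python
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
ALPHA_MOD_FIX = f"{{{A_NS}}}alphaModFix"


def fade_blip(blip, opacity_percent=30):
    """Set the opacity of one <a:blip> element (100 = unchanged, 0 = invisible)."""
    if not 0 <= opacity_percent <= 100:
        raise ValueError("opacity_percent must be between 0 and 100")
    # Drop any earlier setting so repeated calls do not stack up.
    for old in blip.findall(ALPHA_MOD_FIX):
        blip.remove(old)
    fade = blip.makeelement(ALPHA_MOD_FIX, {"amt": str(int(opacity_percent * 1000))})
    # Effect children must come before <a:extLst>, so insert at the front.
    blip.insert(0, fade)


def fade_pictures_in_run(run_element, opacity_percent=30):
    """Fade every picture found inside a <w:r> element; return how many."""
    blips = list(run_element.iter(f"{{{A_NS}}}blip"))
    for blip in blips:
        fade_blip(blip, opacity_percent)
    return len(blips)
```

Calling `fade_pictures_in_run(run._element)` on each run that has a picture and then saving the document would leave the original image data untouched, as noted above.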

Once you decide which way you want to go you can search for solutions to that problem or submit a new question focused on that topic.
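
If you go with the first approach instead, the blurring itself can be done outside Word. Here is a minimal sketch that assumes the Pillow library is installed and that the picture is in a raster format Pillow can read and filter (RGB, RGBA or grayscale PNG/JPEG; palette images would need converting first, and WMF/EMF pictures will not work). `blurred_image_bytes` is a hypothetical helper name.

```python
import io

from PIL import Image, ImageFilter


def blurred_image_bytes(image_bytes, radius=8):
    """Return the image re-encoded in its original format after a Gaussian blur."""
    with Image.open(io.BytesIO(image_bytes)) as image:
        image_format = image.format  # e.g. "PNG" or "JPEG"
        blurred = image.filter(ImageFilter.GaussianBlur(radius))
    buffer = io.BytesIO()
    blurred.save(buffer, format=image_format)
    return buffer.getvalue()


# Possible use with python-docx. This is an assumption about its internals:
# `_blob` is a private attribute, so it may change between versions.
#
#   image_part = document.part.related_parts[rid]
#   image_part._blob = blurred_image_bytes(image_part.blob)
```

Be aware that if the same image part is used in several places in the document, replacing its bytes would blur every occurrence, not only the one under your heading.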



Source: https://stackoverflow.com/questions/47347505/bluring-the-image-which-is-under-specific-heading-using-python-docx
