Question
I want to find a specific heading in a Word document and then strike out all of the content under that heading. The document has many headings, and each heading may contain paragraphs, tables and images, in any combination.
I have installed docx, and I was able to find the specific heading and strike out the paragraphs and tables under it.
However, I am not able to access the images under that heading. Since an image cannot be struck out, we are trying to blur it to indicate the same thing.
Problem 1: I am able to get the image ID (resource ID) and image name for all the images in the document. But I don't know how to get the resource ID for only the images under a specific heading, which I then have to blur.
Problem 2: I have enabled the Track Changes option using a VB macro from Python code. But the changes I made using docx (the strikeout) are not highlighted as tracked changes.
Answer 1:
These are two separate questions (or three, depending on how you count). I'll address the first one here; you can post the other as a separate new question (maybe: "How to use python-docx to track changes in a Word document").
Regarding blurring the image, you have two challenges:
- Identify the images associated with a particular area of the document.
- Blur each image.
There is no direct API support for either of these operations in python-docx. However, you can use python-docx to access the underlying XML and make the changes using lxml calls (which python-docx uses internally). Such efforts are commonly called "workaround functions", so if you search Google on 'python-docx OR python-pptx workaround function' you will find examples.
An in-line image is stored at the Run level. So you can iterate over all the runs in the section of interest and see if any of them have images. This analysis page from the python-docx project has some of the details you'll need: http://python-docx.readthedocs.io/en/latest/dev/analysis/features/shapes/shapes-inline.html
Basically you'd do something like this:
    for run in runs:  # however you decide to get the runs
        r = run._element  # this is the `<w:r>` XML element for the run
        pics = r.xpath('.//w:drawing/wp:inline/a:graphic/a:graphicData/pic:pic')
        if not pics:
            continue  # no picture in this run, move on to the next one
        print(r.xml)  # if you want to see the XML for this run
This will print the XML for run elements containing a picture.
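One possible way to get the runs under a given heading, and from them the relationship IDs of the pictures, is sketched below. It is not part of python-docx; the helper names (`heading_level`, `paragraphs_under_heading`, `image_rids`, `image_rids_under_heading`) are hypothetical. It assumes the headings use the built-in "Heading N" styles and that each picture is embedded (an `r:embed` attribute on its `<a:blip>` element). Note that `document.paragraphs` does not include paragraphs inside table cells, so pictures placed in tables would need separate handling.

```python
import re

# Namespace URIs from the Office Open XML / DrawingML specifications.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

_HEADING_STYLE = re.compile(r"^Heading (\d+)$")  # assumes built-in heading styles


def heading_level(paragraph):
    """Return the heading level (1, 2, ...) or None for a non-heading paragraph."""
    match = _HEADING_STYLE.match(paragraph.style.name or "")
    return int(match.group(1)) if match else None


def paragraphs_under_heading(paragraphs, heading_text):
    """Yield the paragraphs after the heading `heading_text`, stopping at the
    next heading of the same or a higher level (sub-headings are included)."""
    current_level = None
    for paragraph in paragraphs:
        level = heading_level(paragraph)
        if current_level is None:
            # Still looking for the heading we were asked about.
            if level is not None and paragraph.text.strip() == heading_text:
                current_level = level
            continue
        if level is not None and level <= current_level:
            break  # reached the next section
        yield paragraph


def image_rids(run_element):
    """Return the r:embed relationship IDs of pictures inside a <w:r> element."""
    rids = []
    for blip in run_element.iter("{%s}blip" % A_NS):
        rid = blip.get("{%s}embed" % R_NS)
        if rid:
            rids.append(rid)
    return rids


def image_rids_under_heading(document, heading_text):
    """Collect the relationship IDs of the pictures below a heading."""
    rids = []
    for paragraph in paragraphs_under_heading(document.paragraphs, heading_text):
        for run in paragraph.runs:
            rids.extend(image_rids(run._element))
    return rids


# Usage with python-docx (file name and heading text are placeholders):
#   from docx import Document
#   document = Document("report.docx")
#   for rid in image_rids_under_heading(document, "Results"):
#       image_part = document.part.related_parts[rid]
#       print(rid, image_part.partname)
```

Matching on the heading text and style name is a design choice made for brevity; if the same heading text occurs more than once, only the first occurrence is used.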
Regarding the actual blurring, there are two approaches I can think of:
- Replace the current picture with a "blurred" version.
- Change the transparency of the image in Word to make it look much lighter. This does not remove detail from the image and the actual image is still "behind", unchanged, if for example the user wanted to right click and pick "Save image...".
The second approach is much easier. You'll have to decide whether it meets your requirements.
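As a rough sketch of the second approach, the function below adds an `<a:alphaModFix>` element to each `<a:blip>` found in a run. This is the DrawingML element for a fixed opacity, with `amt` expressed in thousandths of a percent. The function name `fade_pictures` and its `opacity_percent` parameter are hypothetical, and whether the transparency is actually rendered depends on the Word version, so check the result in the version you target.

```python
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"


def fade_pictures(run_element, opacity_percent=30):
    """Set a fixed opacity on every <a:blip> below `run_element`.

    Returns the number of pictures changed. The picture data itself is
    left untouched; only the way it is displayed changes.
    """
    if not 0 <= opacity_percent <= 100:
        raise ValueError("opacity_percent must be between 0 and 100")
    amt = str(int(opacity_percent * 1000))  # 100% opacity == 100000
    tag = "{%s}alphaModFix" % A_NS
    changed = 0
    # Take a list first so the tree is not modified while it is being walked.
    for blip in list(run_element.iter("{%s}blip" % A_NS)):
        effect = blip.find(tag)
        if effect is None:
            effect = blip.makeelement(tag, {})
            blip.insert(0, effect)  # effect elements come before <a:extLst>
        effect.set("amt", amt)
        changed += 1
    return changed


# Usage with python-docx (collect `runs` however you decide to):
#   for run in runs:
#       fade_pictures(run._element, opacity_percent=30)
#   document.save("faded.docx")  # placeholder file name
```

The function only uses methods shared by lxml and the standard library's ElementTree elements, so it can be tried on a small XML fragment without opening a document.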
Once you decide which way you want to go you can search for solutions to that problem or submit a new question focused on that topic.
Source: https://stackoverflow.com/questions/47347505/bluring-the-image-which-is-under-specific-heading-using-python-docx