How can I get the text by color from a word document with win32com?

Submitted by 冷眼眸甩不掉的悲伤 on 2019-12-20 03:45:08

Question


I have a Word document with several tables. The text in each table uses two colors, black and red.

I'd like to extract the text from the table cells according to its color. I found a way, but I think it's very inefficient.

The following code reads the text of one Word table cell and prints each word with its color.

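import os
import win32com.client

# Give Word an absolute path; it does not share Python's working directory
path = os.path.abspath("../files/tests2.docx")
word = win32com.client.Dispatch("Word.Application")
word.Visible = True
doc = word.Documents.Open(path)

for table in doc.Tables:
    row, col = 2, 2                      # inspect the cell at row 2, column 2
    words = table.Cell(row, col).Range.Words
    # Words is 1-based; stopping before Count skips the last item,
    # which is the end-of-cell marker
    for i in range(1, words.Count):
        w = words.Item(i)
        print(w.Text, w.Font.Color)      # Font.Color is a wdColor value (255 = red)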
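Instead of printing each word, the same loop can collect the words into one string per color. A minimal sketch: `text_by_color` is a hypothetical helper name, and it only assumes each item has `.Text` and `.Font.Color`, as Word `Range` objects do. The constants are the documented values of Word's `wdColorRed` and `wdColorAutomatic`; a word whose characters have mixed colors reports `wdUndefined` (9999999) and lands in its own group.

```python
from collections import defaultdict

WD_COLOR_RED = 255              # Word's wdColorRed
WD_COLOR_AUTOMATIC = -16777216  # Word's wdColorAutomatic (black by default)

def text_by_color(words):
    """Group the text of Word Range objects by their Font.Color value.

    `words` is any iterable of objects with .Text and .Font.Color, such as
    a cell's Range.Words collection. Returns {color: joined_text}.
    """
    groups = defaultdict(list)
    for w in words:
        # Drop the end-of-cell marker (CR + BEL) that Word appends to a cell
        text = w.Text.replace("\r\x07", "")
        if text:
            groups[w.Font.Color].append(text)
    # Word keeps a word's trailing space in .Text, so plain concatenation
    # reproduces the original spacing
    return {color: "".join(parts).strip() for color, parts in groups.items()}
```

With the document opened above, `text_by_color(table.Cell(2, 2).Range.Words)` can be called per table, since win32com lets you iterate a COM collection directly; the red text is under key `WD_COLOR_RED`. It still visits every word, so it tidies the output rather than speeding anything up.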

Do you know any other (better) way to achieve this?

Thank you.
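A sketch of another win32com option, assuming Word's `Find` object with formatting criteria: with an empty search text and `Format = True`, `Find.Execute()` jumps straight from one red run to the next instead of visiting every word. The constant values are Word's `wdColorRed`, `wdFindStop` and `wdCollapseEnd`; `find_colored_runs` is a hypothetical name.

```python
WD_COLOR_RED = 255   # Word's wdColorRed
WD_FIND_STOP = 0     # wdFindStop: do not wrap around to the document start
WD_COLLAPSE_END = 0  # wdCollapseEnd

def find_colored_runs(rng, color=WD_COLOR_RED):
    """Return the text of each contiguous run of `color` inside Word Range `rng`.

    Find.Execute() redefines `rng` to each match, so pass a fresh Range,
    e.g. table.Cell(2, 2).Range.
    """
    end = rng.End                  # remember where the original range stopped
    find = rng.Find
    find.ClearFormatting()
    find.Text = ""                 # empty text plus Format=True: match formatting only
    find.Format = True
    find.Font.Color = color
    find.Forward = True
    find.Wrap = WD_FIND_STOP
    runs = []
    while find.Execute():
        if rng.Start >= end:       # the match lies beyond the original range
            break
        if rng.End > end:          # clip a match that spills past it
            rng.End = end
        text = rng.Text.replace("\r\x07", "")  # drop an end-of-cell marker
        if text:
            runs.append(text)
        rng.Collapse(WD_COLLAPSE_END)  # resume the search after this match
    return runs
```

Usage with the question's objects: `for table in doc.Tables: print(find_colored_runs(table.Cell(2, 2).Range))`. The `rng.Start >= end` check matters because, after the range is collapsed, a subsequent search can continue toward the end of the document rather than stopping at the cell.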


Answer 1:


Here is a way to extract highlighted words from a Word document using python-docx:

#!/usr/bin/env python3
from docx import Document
from docx.oxml.ns import qn

document = Document(r'test.docx')
# Every <w:r> run in the document body, including runs inside table cells
runs = document.element.xpath('//w:r')
for run in runs:
    # <w:highlight w:val="yellow"/> sits inside the run properties <w:rPr>
    for hi in run.findall(qn('w:rPr') + '/' + qn('w:highlight')):
        if hi.get(qn('w:val')) == 'yellow':
            t = run.find(qn('w:t'))
            if t is not None and t.text:  # some runs hold only tabs or breaks
                print(t.text.lower())
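The answer above matches highlighting (`w:highlight`), while the red text in the question is font color, which WordprocessingML stores as `w:color` with a hex `w:val` such as `FF0000`. A minimal sketch using only the standard library (`zipfile` and `xml.etree`) that applies the same run-walking idea per table cell; it assumes red is stored exactly as `FF0000` (other shades would not match), and `cell_text_by_color` is a hypothetical name.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace in ElementTree's {uri}tag notation
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def cell_text_by_color(docx_file, color="FF0000"):
    """Return one string per table cell with the text of runs in `color`.

    `docx_file` is a path or binary file object; `color` is the RRGGBB hex
    value Word writes to w:color/@w:val. Runs without a w:color element use
    the automatic color and never match.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    cells = []
    for cell in root.iter(W + "tc"):          # every table cell, in document order
        parts = []
        for run in cell.iter(W + "r"):         # every run inside the cell
            rgb = run.find(W + "rPr/" + W + "color")
            if rgb is not None and rgb.get(W + "val", "").upper() == color.upper():
                parts.extend(t.text or "" for t in run.iter(W + "t"))
        cells.append("".join(parts))
    return cells
```

Because it reads the XML directly, this needs neither Word nor Windows. A cell containing a nested table would also pick up the nested cells' runs, so nested tables would be counted twice.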


Source: https://stackoverflow.com/questions/14625732/how-can-i-get-the-text-by-color-from-a-word-document-with-win32com
