python-docx

How to get cell background color in python-docx?

…衆ロ難τιáo~ submitted on 2019-12-24 21:53:57
Question: I'm trying to read data from an MS Word table using python-docx. There is a way to set the background color of a table cell:

    tcPr = cell._tc.get_or_add_tcPr()
    shd = OxmlElement("w:shd")
    shd.set(qn("w:fill"), rgb2hex(*color))
    tcPr.append(shd)

My task is the opposite: I need to get the existing color. I'm not skilled in XML, and I tried this:

    cell = table.cell(row, col)
    tcPr = cell._tc.get_or_add_tcPr().get(qn('w:shd'))

However, it returns None for every cell I read, regardless of its color.

Answer 1: As scanny
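The attempt in the question calls `.get(...)` on `w:tcPr`, which looks up an XML attribute, whereas `w:shd` is a child element of `w:tcPr`. A minimal sketch of reading the fill back follows. The helper name `cell_fill` and the hand-written demo cell are hypothetical, and the code works on the raw `<w:tc>` element so that it runs without python-docx installed.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace in Clark notation (what qn("w:...") expands to).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def cell_fill(tc):
    """Return the w:fill value of a <w:tc> element, or None if it has no w:shd."""
    shd = tc.find(f"{W}tcPr/{W}shd")  # w:shd is a child element, not an attribute
    return None if shd is None else shd.get(f"{W}fill")


# Stand-alone demonstration on a hand-written cell (hypothetical data).
demo_tc = ET.fromstring(
    '<w:tc xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:tcPr><w:shd w:val="clear" w:fill="FFFF00"/></w:tcPr><w:p/></w:tc>'
)
print(cell_fill(demo_tc))  # prints FFFF00
```

With python-docx, passing `table.cell(row, col)._tc` should work the same way, since `_tc` is an lxml element that supports the same `find` call. The sketch only reads shading set directly on the cell; it does not resolve shading inherited from a table style.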

Can't change “heading 1” font name using docx

♀尐吖头ヾ submitted on 2019-12-23 17:53:05
Question: I am using the following script:

    header = self.document.add_paragraph(style='Heading 1')
    header.style.font.name = 'Arial'
    header.style.font.size = Pt(16)
    header.add_run('Header One')

The result is that "Header One" is rendered in 'Calibri'.

Answer 1: This is a legitimate bug, even with python-docx version 0.8.5. Changing the font name of the 'Normal' style works (as shown in the examples in the python-docx documentation), but the same change does not work for the 'Heading 1' style. One workaround is to
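One possible explanation, offered as an assumption and not as the workaround the truncated answer goes on to describe: the 'Heading 1' style may carry theme-font attributes such as `w:asciiTheme` on its `w:rFonts` element, and those take precedence over the explicit font name. The sketch below removes them from a `<w:rPr>` element. The helper name `drop_theme_fonts` and the demo element are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
THEME_ATTRS = ("asciiTheme", "hAnsiTheme", "eastAsiaTheme", "cstheme")


def drop_theme_fonts(rpr):
    """Remove theme-font attributes from the <w:rFonts> child of a <w:rPr> element.

    Returns the number of attributes removed (0 if there was nothing to remove).
    """
    if rpr is None:
        return 0
    rfonts = rpr.find(f"{W}rFonts")
    if rfonts is None:
        return 0
    removed = 0
    for attr in THEME_ATTRS:
        if rfonts.attrib.pop(f"{W}{attr}", None) is not None:
            removed += 1
    return removed


# Hand-written run properties resembling a themed heading style (hypothetical data).
demo_rpr = ET.fromstring(
    '<w:rPr xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:rFonts w:ascii="Arial" w:hAnsi="Arial" '
    'w:asciiTheme="majorHAnsi" w:hAnsiTheme="majorHAnsi"/></w:rPr>'
)
```

With python-docx, something like `drop_theme_fonts(document.styles['Heading 1'].element.rPr)` after setting `font.name` might be enough, but that has not been verified against the asker's document.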

Parse .docx in python 3

前提是你 submitted on 2019-12-23 10:08:29
Question: I am currently writing a Python 3 program that parses certain docx files and extracts the text and images from them. I have been trying to use docx, but it will not import into my program. I have installed lxml, Pillow, and python-docx, yet it still does not import. When I try to use python-docx from the terminal, I cannot run example-extracttext.py or example-makedocument.py, which leads me to believe that the installation didn't run properly. Is there a way I can check if this installed
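One way to check an installation from inside Python is to ask the interpreter whether it can locate the module. The sketch below uses only the standard library; the helper name `is_importable` is hypothetical. Note that python-docx is imported as `docx`, and that the check is only meaningful when it runs under the same interpreter that the program uses.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if the current interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


# The python-docx distribution is imported under the name "docx".
print("interpreter:", sys.executable)
print("docx importable:", is_importable("docx"))
```

If this prints `False`, the package was probably installed for a different interpreter; running `python -m pip install python-docx` with the interpreter path shown above is one way to rule that out.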

How to read tables in multiple docx files in a same folder by python

百般思念 submitted on 2019-12-23 03:42:00
Question: I have a folder called "Test_Plan". It contains multiple docx files, and each docx file has multiple tables. How can I read all of the docx files and produce a summary? For example, for one docx file the output should look like this:

    Total Number of Tables: 52
    Total Number of YES Automations: 6
    Total Number of NO Automations: 5

I need to automate this for every file in the "Test_Plan" folder. Hope you understand my question
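A rough sketch of the folder walk follows. It reads each file's `word/document.xml` with the standard library so that it runs without python-docx installed. The function names are hypothetical, and because the question does not say which column holds the automation flag, the sketch assumes that any table cell whose whole text is YES or NO (ignoring case) counts.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def summarize_docx(path):
    """Count tables and YES/NO cells in one .docx file."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    # iter() walks the whole tree, so nested tables are counted too.
    counts = {"tables": sum(1 for _ in root.iter(f"{W}tbl")), "yes": 0, "no": 0}
    for cell in root.iter(f"{W}tc"):
        text = "".join(t.text or "" for t in cell.iter(f"{W}t")).strip().upper()
        if text == "YES":
            counts["yes"] += 1
        elif text == "NO":
            counts["no"] += 1
    return counts


def summarize_folder(folder):
    """Map each .docx file name in the folder to its counts."""
    return {
        p.name: summarize_docx(p)
        for p in sorted(Path(folder).glob("*.docx"))
        if not p.name.startswith("~$")  # skip Word's temporary lock files
    }
```

Calling `summarize_folder("Test_Plan")` returns one dictionary per file, which can then be printed in the "Total Number of ..." format. With python-docx, `Document(path).tables` gives the top-level tables of a document and could replace the XML walk.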

How to extract the url in hyperlinks from a docx file using python

拥有回忆 submitted on 2019-12-23 02:59:04
Question: I've been trying to find out how to get URLs from a docx file using Python, but failed to find anything. I've tried python-docx and python-docx2txt, but python-docx only seems to extract the text, while python-docx2txt is able to extract the text of the hyperlink but not the URLs themselves.

Answer 1: I am a beginner in Python and have an assignment to use Python to change each hyperlink in a .docx document. Thanks to Kiran's code, which gave me hints for a few rounds of guessing and trial and error, and
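The URLs of external hyperlinks are stored outside the document text, in the relationships part `word/_rels/document.xml.rels`, which would explain why text extraction does not return them. The sketch below reads that part directly with the standard library. It is not Kiran's code or the answer's method, and the function name `hyperlink_urls` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
HYPERLINK_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
)


def hyperlink_urls(path):
    """Return the targets of all hyperlink relationships of the main document part."""
    with zipfile.ZipFile(path) as archive:
        rels = ET.fromstring(archive.read("word/_rels/document.xml.rels"))
    return [
        rel.get("Target")
        for rel in rels.iter(f"{REL_NS}Relationship")
        if rel.get("Type") == HYPERLINK_TYPE
    ]
```

The list follows the order of the relationships part, not the order of the links in the text. Links in headers, footers, or HYPERLINK field codes are not covered by this sketch.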

Attempting to install python-docx (error: Unable to find vcvarsall.bat)

橙三吉。 submitted on 2019-12-22 10:08:21
Question: I have tried everything and I have no idea where to go from here. When I run the command pip install python-docx I get this:

    running build_ext
    building 'lxml.etree' extension
    error: Unable to find vcvarsall.bat
    ----------------------------------------
    Command "c:\users\alex\appdata\local\programs\python\python35-32\python.exe -c "import setuptools, tokenize;__file__='C:\\Users\\Alex\\AppData\\Local\\Temp\\pip-build-u2i_l872\\lxml\\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file

Removing personal information from the comments in a word file using python

爷,独闯天下 submitted on 2019-12-21 18:04:04
Question: I want to remove all personal information from the comments inside a Word file. Removing the author's name is fine; I did that using the following:

    document = Document('sampleFile.docx')
    core_properties = document.core_properties
    core_properties.author = ""
    document.save('new-filename.docx')

But this is not what I need. I want to remove the name of every person who commented inside that Word file. The way we do it manually is by going into Preferences->security->remove personal information
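Comment authors are stored as `w:author` and `w:initials` attributes on the `w:comment` elements in `word/comments.xml`, separately from the core properties. A minimal sketch that blanks them while copying the file follows. It uses only the standard library, the function name `strip_comment_authors` is hypothetical, and it assumes the namespace prefix is `w`, as Word writes it.

```python
import re
import zipfile

# Matches the author and initials attributes on <w:comment> elements.
# Assumption: the WordprocessingML namespace uses the "w" prefix.
_PERSONAL_ATTR = re.compile(rb'\b(w:author|w:initials)="[^"]*"')


def strip_comment_authors(src_path, dst_path):
    """Copy a .docx file, blanking author names and initials in word/comments.xml."""
    with zipfile.ZipFile(src_path) as zin, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/comments.xml":
                data = _PERSONAL_ATTR.sub(rb'\1=""', data)
            # Every other part is copied through unchanged.
            zout.writestr(item, data)
```

Editing the bytes in place leaves the rest of the XML, including its namespace declarations, untouched. The sketch does not change comment dates, and it does not touch author names recorded on tracked changes in `word/document.xml`.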

How to know when a new paragraph in python-docx causes a new page

馋奶兔 submitted on 2019-12-20 06:38:24
Question: I have to create Word documents dynamically using python-docx. I do it by adding table rows dynamically, and there is no way to know in advance how many records fit on a page because that depends on the specific data. I need to know when a new element added to the document (a table row or a paragraph) starts a new page, so that I can record in the database which information each page contains. This is the code for generating the Word document with python-docx:

    def get_invoice_word

How to identify page breaks using python-docx from docx

╄→尐↘猪︶ㄣ submitted on 2019-12-19 03:26:10
Question: I have several .docx files that each contain a number of similar blocks of text: 300+ press releases of 1-2 pages each, which need to be separated into individual text files. The only consistent way to tell the articles apart is that there is always a page break, and only a page break, between two articles. However, I don't know how to find page breaks when converting the enclosing Word documents to text, and the page break information is lost after the conversion using my
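An explicit page break is stored in the document XML as a `w:br` element with `w:type="page"`, so the text can be split before that information is lost in conversion. The sketch below does this with the standard library. The function name `split_on_page_breaks` is hypothetical, and it assumes the articles are separated by explicit breaks and not by "page break before" paragraph settings or section breaks.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def split_on_page_breaks(path):
    """Return the document text as a list of chunks separated by explicit page breaks."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    chunks, lines = [], []
    for para in root.iter(f"{W}p"):
        text = []
        for node in para.iter():
            if node.tag == f"{W}t":
                text.append(node.text or "")
            elif node.tag == f"{W}br" and node.get(f"{W}type") == "page":
                # Text before the break closes the current chunk.
                lines.append("".join(text))
                chunks.append("\n".join(lines).strip())
                text, lines = [], []
        lines.append("".join(text))
    chunks.append("\n".join(lines).strip())
    return [chunk for chunk in chunks if chunk]
```

Each returned chunk can then be written to its own text file. Empty chunks, such as those produced by two consecutive breaks, are dropped.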