python-docx

Extracting headings' text from word doc

非 Y 不嫁゛ posted on 2019-12-01 06:16:17
Question: I am trying to extract the text of headings (of any level) from an MS Word document (.docx file). I am currently trying to do this with python-docx, but after reading its documentation I still cannot tell whether it is even feasible (maybe I am mistaken). I looked for solutions online but found nothing specific to my task. It would be great if someone could guide me here.
Answer 1: The fundamental challenge is identifying heading paragraphs. There's nothing stopping an author from
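
One common way to identify heading paragraphs is by style name. The sketch below is not taken from the truncated answer; it assumes the document uses Word's built-in styles named "Heading 1" through "Heading 9" (the English template names). A paragraph that an author merely formatted to look like a heading would not be found. The helper names `extract_headings` and `headings_in_file` are made up for illustration.

```python
import re

# Assumption: headings use the built-in styles "Heading 1" ... "Heading 9".
# Templates in other languages, or custom styles, need a different pattern.
HEADING_STYLE = re.compile(r"^Heading (\d+)$")


def extract_headings(paragraphs):
    """Return (level, text) pairs for paragraphs styled 'Heading N'."""
    headings = []
    for paragraph in paragraphs:
        match = HEADING_STYLE.match(paragraph.style.name)
        if match:
            headings.append((int(match.group(1)), paragraph.text))
    return headings


def headings_in_file(path):
    """Open a .docx with python-docx and list its headings."""
    from docx import Document  # third-party: pip install python-docx

    return extract_headings(Document(path).paragraphs)
```

Calling `headings_in_file("report.docx")` (a hypothetical file name) would return a list such as level/text tuples in document order. Only body paragraphs are scanned; headings inside table cells are not visited.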

docx center text in table cells

落花浮王杯 posted on 2019-12-01 05:57:31
Question: I am starting to use Python's docx library. I create a table with multiple rows and only 2 columns. Now I would like the text in those cells to be centered horizontally. How can I do this? I searched the docx API documentation but only saw information about aligning paragraphs.
Answer 1: You can do this by setting the alignment as you create the cells. doc = Document() table = doc.add_table(rows=0, cols=2) row = table.add_row().cells p = row[0].add
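
Because alignment in python-docx is a paragraph property, centering cell text comes down to setting `alignment` on each paragraph inside the cell. The following is a minimal sketch under that assumption, not a reconstruction of the truncated answer; `center_cells` and `build_centered_table` are illustrative names.

```python
def center_cells(cells, alignment):
    """Apply a paragraph alignment to every paragraph in the given cells."""
    for cell in cells:
        for paragraph in cell.paragraphs:
            paragraph.alignment = alignment


def build_centered_table(path):
    """Create a one-row, two-column table with centered text and save it."""
    from docx import Document  # third-party: pip install python-docx
    from docx.enum.text import WD_ALIGN_PARAGRAPH

    doc = Document()
    table = doc.add_table(rows=0, cols=2)  # the keyword is cols, not columns
    cells = table.add_row().cells
    cells[0].text = "left column"
    cells[1].text = "right column"
    center_cells(cells, WD_ALIGN_PARAGRAPH.CENTER)
    doc.save(path)
```

This centers text horizontally only. Vertical centering is a separate cell-level setting and is not covered here.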

Adding an hyperlink in MSWord by using python-docx

♀尐吖头ヾ posted on 2019-12-01 05:30:39
Question: I am trying to add a hyperlink to an MS Word document using the docx module for Python. I searched everywhere (official docs, StackOverflow, Google) but found nothing. I would like to do something like: from docx import Document document = Document() p = document.add_paragraph('A plain paragraph having some ') p.add_hyperlink('Link to my site', target="http://supersitedelamortquitue.fr") Does anyone have an idea how to do that?
Answer 1: Yes, we can do it. Reference: import docx from docx.enum.dml import

Update the TOC (table of content) of MS Word .docx documents with Python

半腔热情 posted on 2019-12-01 00:05:31
I use the Python package "python-docx" to modify the structure and content of MS Word .docx documents. The package lacks the ability to update the TOC (table of contents) [Python: Create a "Table Of Contents" with python-docx/lxml]. Are there workarounds to update the TOC of a document? I thought about using "win32com.client" from the Python package "pywin32" [https://pypi.python.org/pypi/pypiwin32] or a comparable PyPI package offering "cli control" capabilities for MS Office. I tried the following: I changed document.docx to document.docm and implemented the following macro [http:/

How to identify page breaks using python-docx from docx

大憨熊 posted on 2019-11-30 22:57:53
I have several .docx files that contain a number of similar blocks of text: each file holds 300+ press releases of 1-2 pages each, which need to be separated into individual text files. The only consistent way to tell articles apart is that there is always, and only, a page break between two articles. However, I don't know how to find page breaks when converting the enclosing Word documents to text, and the page break information is lost after conversion with my current script. I want to know how to preserve HARD page breaks when converting a .docx file to .txt.
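
A hard page break is stored in the document XML as a `w:br` element with `w:type="page"`. The sketch below sidesteps python-docx and reads `word/document.xml` directly with the standard library, splitting the text wherever such an element appears. It assumes the breaks were inserted as explicit page breaks; a "page break before" paragraph setting or a section break would not be detected. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def split_on_page_breaks(document_xml):
    """Split the body text of a word/document.xml payload at hard page breaks."""
    body = ET.fromstring(document_xml).find(f"{{{W}}}body")
    sections, current = [], []
    for para in body.iter(f"{{{W}}}p"):
        line = []
        for node in para.iter():
            if node.tag == f"{{{W}}}t":
                line.append(node.text or "")
            elif node.tag == f"{{{W}}}br" and node.get(f"{{{W}}}type") == "page":
                # Close the current section at the break and start a new one.
                current.append("".join(line))
                sections.append("\n".join(current).strip())
                current, line = [], []
        current.append("".join(line))
    sections.append("\n".join(current).strip())
    return [section for section in sections if section]


def split_docx(path):
    """Read a .docx archive and return one text block per page-break section."""
    with zipfile.ZipFile(path) as archive:
        return split_on_page_breaks(archive.read("word/document.xml"))
```

Each returned string could then be written to its own .txt file. Paragraphs nested inside text boxes may be visited twice by this simple traversal, so treat it as a starting point only.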

Downloadable docx file in Django

孤街醉人 posted on 2019-11-30 20:24:35
Question: My Django web app creates and saves a docx file, and I need to make it downloadable. I use a simple render_to_response as below. return render_to_response("test.docx", mimetype='application/vnd.ms-word') However, it raises an error like: 'utf8' codec can't decode byte 0xeb in position 15: invalid continuation byte. I couldn't serve this file as a static file, so I need to find a way to serve it like this. I would really appreciate any help.
Answer 1: Yep, a cleaner option, as stated by wardk, would be using https://python-docx
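
The decode error is consistent with a binary .docx being run through the template renderer, which expects text. One way around it is to write the document to an in-memory buffer and return the bytes in an `HttpResponse`. This is a minimal sketch under that assumption, not the truncated answer's code; the view name `download_docx`, the helper `docx_to_bytes`, and the file name are illustrative.

```python
import io

# Registered media type for .docx files.
DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def docx_to_bytes(document):
    """Serialize a python-docx Document (anything with save(stream)) to bytes."""
    buffer = io.BytesIO()
    document.save(buffer)
    return buffer.getvalue()


def download_docx(request):
    """Hypothetical Django view returning a generated .docx as an attachment."""
    from django.http import HttpResponse  # third-party: Django
    from docx import Document  # third-party: python-docx

    document = Document()
    document.add_paragraph("Hello from Django")
    response = HttpResponse(docx_to_bytes(document), content_type=DOCX_MIME)
    response["Content-Disposition"] = 'attachment; filename="test.docx"'
    return response
```

Recent Django versions take `content_type` rather than the `mimetype` keyword used in the question.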

Update the TOC (table of content) of MS Word .docx documents with Python

馋奶兔 posted on 2019-11-30 18:06:11
Question: I use the Python package "python-docx" to modify the structure and content of MS Word .docx documents. The package lacks the ability to update the TOC (table of contents) [Python: Create a "Table Of Contents" with python-docx/lxml]. Are there workarounds to update the TOC of a document? I thought about using "win32com.client" from the Python package "pywin32" [https://pypi.python.org/pypi/pypiwin32] or a comparable PyPI package offering "cli control" capabilities for MS Office. I tried the

How to iterate over everything in a python-docx document?

大憨熊 posted on 2019-11-30 08:36:10
Question: I am using python-docx to convert a Word docx to a custom HTML equivalent. The document that I need to convert has images and tables, but I haven't been able to figure out how to access the images and the tables within a given run. Here is what I am thinking... for para in doc.paragraphs: for run in para.runs: # How to tell if this run has images or tables? ...but I don't see anything on the Run that has info on the InlineShape or Table. Do I have to fall back to the XML directly or is there

How can I insert a checkbox form into a .docx file using python-docx?

ⅰ亾dé卋堺 posted on 2019-11-29 17:30:40
I've been using Python to implement a custom parser and using the parsed data to format a Word document to be distributed internally. All of the formatting has been straightforward so far, but I'm completely stumped on how to insert a checkbox into individual table cells. I've tried using the python object functions within python-docx (using get_or_add_tcPr(), etc.), which causes MS Word to throw the following error when I try to open the file: "The file xxxx cannot be opened because there are problems with the contents. Details: The file is corrupt and cannot be opened". After

Python-docx, how to set cell width in tables?

主宰稳场 posted on 2019-11-29 13:14:07
How do I set cell width in tables? So far I have: from docx import Document from docx.shared import Cm, Inches document = Document() table = document.add_table(rows=2, cols=2) table.style = 'TableGrid' #single lines in all cells table.autofit = False col = table.columns[0] col.width=Inches(0.5) #col.width=Cm(1.0) #col.width=360000 #=1cm document.save('test.docx') No matter what number or units I set in col.width, its width does not change.
Short answer: set cell width individually. for cell in table.columns[0].cells: cell.width = Inches(0.5) python-docx does what you tell it to do when you set
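
The short answer above can be sketched as a small helper that walks one column and sets the width on each cell. The names `set_column_width` and `build_table` are illustrative, and the style name "Table Grid" assumes the default python-docx template, where styles are looked up by their display name.

```python
def set_column_width(table, index, width):
    """Set the same width on every cell of one table column."""
    for cell in table.columns[index].cells:
        cell.width = width


def build_table(path):
    """Create a 2x2 table whose first column is half an inch wide."""
    from docx import Document  # third-party: pip install python-docx
    from docx.shared import Inches

    document = Document()
    table = document.add_table(rows=2, cols=2)
    table.style = "Table Grid"  # single lines around all cells
    table.autofit = False
    set_column_width(table, 0, Inches(0.5))
    document.save(path)
```

`Inches(0.5)` is a length in English Metric Units (914400 per inch), so a plain integer such as 457200 would be equivalent.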