python-docx add_style with CTL (Complex text layout) language

天涯浪人 2020-12-20 00:37

What I’m trying to accomplish:

  • Create a paragraph style in python-docx with a user-defined Persian font and size (Persian being a CTL language)

2 Answers
  • 2020-12-20 01:17

    After many hours of poking around in the docx file I realized, much to my horror, that the answer lay in the document’s styles.xml file. Here is one way to fix it, for people with similar problems:

    Problems with Text Direction:

    • If you have ever typed in Arabic or Persian, you may have seen that right-aligning the text does not fix everything. Unless you also change the text direction, the cursor and punctuation marks stay at the far right of the line (instead of following the last letter), and right-justification is not available when you need it. I could not change the text direction from python-docx, even by changing the “textDirection” value in document.xml from ‘lrTb’ (Left-Right/Top-Bottom) to ‘rlTb’, so I created a document in LibreOffice and changed its default paragraph style (‘Normal’) to what I had in mind (RTL text direction, etc.). This also saves time later, because none of it has to be done in Python.

    Xml explanation of the font changing problem:

    • The document with the altered default style shows a few differences in its styles.xml file. In the Normal paragraph style, under "w:rPr", there is an additional "w:szCs" element that sets the size of the complex-script font (which you can’t change through style.font.size), and in "w:rFonts" the value of "cs" is now my chosen Persian font. The “bidi” attribute of "w:lang" is also now “fa-IR” (for Persian). Here is the relevant XML:

      <w:rPr>
        <w:rFonts w:ascii="FreeMono" w:hAnsi="FreeMono" w:cs="FreeFarsi"/>
        <w:sz w:val="40"/>
        <w:rtl/>
        <w:cs/>
        <w:szCs w:val="40"/>
        <w:lang w:val="en-US" w:bidi="fa-IR"/>
      </w:rPr>
      
    • Changing style.font.size only changes the "sz" value (Western font size) and leaves "szCs" (complex-script font size) untouched. Similarly, style.font.name only changes the "ascii" and "hAnsi" attributes of "w:rFonts" and leaves "cs" untouched. So to change these values I had to edit the style’s XML elements from Python.
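
    To check these values without opening a word processor, the fragment quoted above can be read with Python's standard library. The sketch below is only an illustration: describe_rpr and w_attr are names made up for this example, and the XML is the quoted snippet with the WordprocessingML namespace declared so that it parses on its own. Sizes are stored in half-points, so a value of 40 means 20 pt.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (what the "w:" prefix stands for).
W_URI = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_URI}

# The <w:rPr> fragment from the answer, with the namespace declared
# so that it can be parsed as a standalone document.
RPR_XML = f"""
<w:rPr xmlns:w="{W_URI}">
  <w:rFonts w:ascii="FreeMono" w:hAnsi="FreeMono" w:cs="FreeFarsi"/>
  <w:sz w:val="40"/>
  <w:rtl/>
  <w:cs/>
  <w:szCs w:val="40"/>
  <w:lang w:val="en-US" w:bidi="fa-IR"/>
</w:rPr>
""".strip()


def w_attr(name):
    """Return the namespace-qualified attribute name for w:<name>."""
    return f"{{{W_URI}}}{name}"


def describe_rpr(xml_text):
    """Hypothetical helper: summarize Western vs. complex-script settings."""
    rpr = ET.fromstring(xml_text)
    rfonts = rpr.find("w:rFonts", NS)
    return {
        "western_font": rfonts.get(w_attr("ascii")),
        "cs_font": rfonts.get(w_attr("cs")),
        # w:sz and w:szCs are in half-points, so divide by 2 to get points.
        "western_size_pt": int(rpr.find("w:sz", NS).get(w_attr("val"))) / 2,
        "cs_size_pt": int(rpr.find("w:szCs", NS).get(w_attr("val"))) / 2,
        "bidi_lang": rpr.find("w:lang", NS).get(w_attr("bidi")),
    }


info = describe_rpr(RPR_XML)
print(info)
```

    The Western and complex-script values are read from separate elements and attributes, which is why setting one through style.font does not affect the other.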

    Solution:

    from docx import Document
    from docx.shared import Pt
    
    #path to doc with altered style:
    base_doc_location = 'base.docx'
    doc = Document(base_doc_location)
    my_style = doc.styles['Normal']
    
    # define your desired fonts
    user_cs_font_size = 16
    user_cs_font_name = 'FreeFarsi'
    user_en_font_size = 12
    user_en_font_name = 'FreeMono'
    
    # get <w:rPr> element of this style
    rpr = my_style.element.rPr
    
    #==================================================
    '''This probably isn't necessary if you already
    have a document with altered style, but just to be
    safe I'm going to add this here'''
    
    if rpr.rFonts is None:
        rpr._add_rFonts()
    if rpr.sz is None:
        rpr._add_sz()
    #==================================================
    
    '''Get the nsmap string for rpr. This is that "w:"
    at the start of elements and element values in xml.
    Like these:
        <w:rPr>
        <w:rFonts>
        w:val
    
    The nsmap is like a url:
    http://schemas.openxmlformats.org/...
    
    Now w:rPr translates to:
    {nsmap url string}rPr
    
    So I made the w_nsmap string like this:'''
    
    w_nsmap = '{'+rpr.nsmap['w']+'}'
    #==================================================
    
    '''Because I didn't find any better ways to get an
    element based on its tag here's a not so great way
    of getting it:
    '''
    szCs = None
    lang = None
    
    for element in rpr:
        if element.tag == w_nsmap + 'szCs':
            szCs = element
        elif element.tag == w_nsmap + 'lang':
            lang = element
    
    '''if there is a szCs and lang element in your style
    those variables will be assigned to it, and if not
    we make those elements and add them to rpr'''
    
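    if szCs is None:
        szCs = rpr.makeelement(w_nsmap+'szCs',nsmap=rpr.nsmap)
        rpr.append(szCs)
    if lang is None:
        lang = rpr.makeelement(w_nsmap+'lang',nsmap=rpr.nsmap)
        rpr.append(lang)
    # Only newly created elements are appended: appending an element
    # that is already a child would just move it to the end of rpr.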
    #==================================================
    
    '''Now to set our desired values to these elements
    we have to get attrib dictionary of these elements
    and set the name of value as key and our value as
    value for that dict'''
    
    szCs_attrib = szCs.attrib
    lang_attrib = lang.attrib
    rFonts_atr = rpr.rFonts.attrib
    
    '''sz and szCs values are string values and 2 times
    the font size so if you want font size to be 11 you
    have to set sz (for western fonts) or szCs (for CTL
    fonts) to "22" '''
    szCs_attrib[w_nsmap+'val'] =str(int(user_cs_font_size*2))
    
    '''Now to change cs font and bidi lang values'''
    rFonts_atr[w_nsmap+'cs'] = user_cs_font_name
    lang_attrib[w_nsmap+'bidi'] = 'fa-IR' # For Persian
    #==================================================
    
    '''Because we changed default style we don't even
    need to set style every time we add a new paragraph
    And if you change font name or size the normal way
    it won't change these cs values so you can have a
    font for CTL language and a different font for
    western language
    '''
    persian_p = doc.add_paragraph('نوشته')
    en_font = my_style.font
    en_font.name = user_en_font_name
    en_font.size = Pt(user_en_font_size)
    english_p = doc.add_paragraph('some text')
    
    doc.save('ex.docx')
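
    The tag-matching loop in the script can usually be replaced by a direct lookup, because lxml (which python-docx builds on) and the standard library's ElementTree both accept the {namespace}tag form in find(). The sketch below uses only the standard library so that it runs without a .docx file; clark and get_or_add are names invented for this example, not python-docx functions.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map, playing the role of rpr.nsmap in the script above.
NSMAP = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}


def clark(prefixed_name):
    """Turn 'w:szCs' into '{namespace-uri}szCs' (hypothetical helper)."""
    prefix, local = prefixed_name.split(":")
    return "{%s}%s" % (NSMAP[prefix], local)


def get_or_add(parent, prefixed_name):
    """Return the first matching child, creating and appending it if missing."""
    child = parent.find(clark(prefixed_name))
    if child is None:
        child = ET.SubElement(parent, clark(prefixed_name))
    return child


# A style's <w:rPr> that has a Western size but no complex-script size yet.
rpr = ET.fromstring(
    '<w:rPr xmlns:w="%s"><w:sz w:val="24"/></w:rPr>' % NSMAP["w"]
)

szcs = get_or_add(rpr, "w:szCs")
szcs.set(clark("w:val"), str(16 * 2))    # 16 pt, stored as half-points

# A second call finds the existing element instead of adding a duplicate.
szcs_again = get_or_add(rpr, "w:szCs")
print(len(rpr.findall(clark("w:szCs"))), szcs_again.get(clark("w:val")))
```

    In the script above, the equivalent lookup would be rpr.find(w_nsmap + 'szCs'), which returns None when the element is missing.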
    

    Edit (code improvement):
    I commented out the lines that could be improved and put the better versions beneath them.

    #rpr = my_style.element.rPr # If None it'll throw errors later
    rpr = my_style.element.get_or_add_rPr() # this avoids potential errors
    #if rpr.rFonts is None:
    #    rpr._add_rFonts()
    rFonts = rpr.get_or_add_rFonts()
    #if rpr.sz is None:
    #    rpr._add_sz()
    rpr.get_or_add_sz()
    
    # OxmlElement and qn make it quicker to create elements and set attributes
    from docx.oxml import OxmlElement
    from docx.oxml.ns import qn
    #szCs = rpr.makeelement(w_nsmap+'szCs',nsmap=rpr.nsmap)
    szCs = OxmlElement('w:szCs')
    #lang = rpr.makeelement(w_nsmap+'lang',nsmap =rpr.nsmap)
    lang = OxmlElement('w:lang')
    
    #szCs_attrib = szCs.attrib
    #lang_attrib = lang.attrib
    #rFonts_atr = rpr.rFonts.attrib
    #szCs_attrib[w_nsmap+'val'] =str(int(user_cs_font_size*2))
    #rFonts_atr[w_nsmap+'cs'] = user_cs_font_name
    #lang_attrib[w_nsmap+'bidi'] = 'fa-IR'
    
    szCs.set(qn('w:val'),str(int(user_cs_font_size*2)))
    lang.set(qn('w:bidi'),'fa-IR')
    rFonts.set(qn('w:cs'),user_cs_font_name)
    
  • 2020-12-20 01:21

    I had a similar problem and added support for it to the docx library. The forked code is at https://github.com/Oritk/python-docx. Usage:

    run = p.add_run(line)
    #run.font.size = Pt(8) ### This line is redundant, but you can leave it
    run.font.cs_size = Pt(8)
    run.font.rtl = True
    