Export Pandas DataFrame into a PDF file using Python

既然无缘 2020-12-01 04:42

What is an efficient way to generate a PDF from a pandas DataFrame?

4 Answers
  • 2020-12-01 05:16

    This is a solution with an intermediate HTML file.

    The table is pretty-printed with some minimal CSS.

    The PDF conversion is done with weasyprint, which you need to install first (pip install weasyprint).

    # Create a pandas dataframe with demo data:
    import pandas as pd
    demodata_csv = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv'
    df = pd.read_csv(demodata_csv)
    
    # Pretty print the dataframe as an html table to a file
    intermediate_html = '/tmp/intermediate.html'
    to_html_pretty(df,intermediate_html,'Iris Data')
    # if you do not want pretty printing, just use pandas:
    # df.to_html(intermediate_html)
    
    # Convert the html file to a pdf file using weasyprint
    import weasyprint
    out_pdf= '/tmp/demo.pdf'
    weasyprint.HTML(intermediate_html).write_pdf(out_pdf)
    
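    # This is the table pretty printer used above. When running everything as one
    # script, put this function and the two HTML_TEMPLATE strings before the call
    # to to_html_pretty(), otherwise Python raises a NameError.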
    
    def to_html_pretty(df, filename='/tmp/out.html', title=''):
        '''
        Write an entire dataframe to an HTML file
        with nice formatting.
        Thanks to @stackoverflowuser2010 for the
        pretty printer see https://stackoverflow.com/a/47723330/362951
        '''
        ht = ''
        if title != '':
            ht += '<h2> %s </h2>\n' % title
        ht += df.to_html(classes='wide', escape=False)
    
        with open(filename, 'w') as f:
            f.write(HTML_TEMPLATE1 + ht + HTML_TEMPLATE2)
    
    HTML_TEMPLATE1 = '''
    <html>
    <head>
    <style>
      h2 {
        text-align: center;
        font-family: Helvetica, Arial, sans-serif;
      }
      table { 
        margin-left: auto;
        margin-right: auto;
      }
      table, th, td {
        border: 1px solid black;
        border-collapse: collapse;
      }
      th, td {
        padding: 5px;
        text-align: center;
        font-family: Helvetica, Arial, sans-serif;
        font-size: 90%;
      }
      table tbody tr:hover {
        background-color: #dddddd;
      }
      .wide {
        width: 90%; 
      }
    </style>
    </head>
    <body>
    '''
    
    HTML_TEMPLATE2 = '''
    </body>
    </html>
    '''
    

    Thanks to @stackoverflowuser2010 for the pretty printer; see their answer at https://stackoverflow.com/a/47723330/362951

    I did not use pdfkit, because I had some problems with it on a headless machine. But weasyprint is great.
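If you do not need to keep the HTML file, the intermediate file can probably be skipped, since weasyprint's HTML class also accepts markup through its string argument. The following is a minimal sketch under that assumption, not part of the original answer: wrap_table_html and dataframe_to_pdf are hypothetical helper names, and the CSS is a trimmed-down stand-in for the template above.

```python
import html


def wrap_table_html(table_html, title=""):
    """Wrap an HTML table fragment in a minimal styled page (hypothetical helper)."""
    # Escape the title so characters such as < or & cannot break the markup.
    heading = f"<h2>{html.escape(title)}</h2>\n" if title else ""
    style = (
        "table, th, td { border: 1px solid black; border-collapse: collapse; }\n"
        "th, td { padding: 5px; text-align: center; }\n"
    )
    return (
        f"<html>\n<head>\n<style>\n{style}</style>\n</head>\n<body>\n"
        f"{heading}{table_html}\n</body>\n</html>\n"
    )


def dataframe_to_pdf(df, out_pdf, title=""):
    """Render a pandas DataFrame to a PDF without an intermediate HTML file."""
    # Imported here so wrap_table_html stays usable without weasyprint installed.
    import weasyprint

    page = wrap_table_html(df.to_html(), title)
    # HTML(string=...) parses markup held in memory instead of reading a file.
    weasyprint.HTML(string=page).write_pdf(out_pdf)
```

With the iris DataFrame from the answer, a call such as dataframe_to_pdf(df, '/tmp/demo.pdf', 'Iris Data') should produce the same kind of output as the two-step version.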

  • 2020-12-01 05:25

    First draw the table with matplotlib, then save the figure as a PDF:

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib.backends.backend_pdf import PdfPages
    
    df = pd.DataFrame(np.random.random((10, 3)), columns=("col 1", "col 2", "col 3"))
    
    # Draw only the table: hide the axes and place the table in the center.
    # https://stackoverflow.com/questions/32137396/how-do-i-plot-only-a-table-in-matplotlib
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.axis('tight')
    ax.axis('off')
    the_table = ax.table(cellText=df.values, colLabels=df.columns, loc='center')
    
    # Save the figure to a PDF, trimming the surrounding margins.
    # https://stackoverflow.com/questions/4042192/reduce-left-and-right-margins-in-matplotlib-plot
    pp = PdfPages("foo.pdf")
    pp.savefig(fig, bbox_inches='tight')
    pp.close()
    

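    References:

    How do I plot only a table in Matplotlib? (https://stackoverflow.com/questions/32137396/how-do-i-plot-only-a-table-in-matplotlib)

    Reduce left and right margins in matplotlib plot (https://stackoverflow.com/questions/4042192/reduce-left-and-right-margins-in-matplotlib-plot)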
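A single figure gets crowded once the DataFrame has more than a few dozen rows. One possible extension, sketched here as an assumption rather than as part of the original answer, is to split the rows into chunks and save one figure per PDF page. page_slices, dataframe_to_multipage_pdf and the default of 40 rows per page are hypothetical choices.

```python
def page_slices(n_rows, rows_per_page=40):
    """Return one slice of row positions per PDF page (hypothetical helper)."""
    return [
        slice(start, min(start + rows_per_page, n_rows))
        for start in range(0, n_rows, rows_per_page)
    ]


def dataframe_to_multipage_pdf(df, out_pdf, rows_per_page=40):
    """Write a DataFrame to a PDF, one matplotlib table per page."""
    # Imported here so page_slices stays usable without matplotlib installed.
    import matplotlib.pyplot as plt
    from matplotlib.backends.backend_pdf import PdfPages

    # The context manager closes the PDF file even if drawing a page fails.
    with PdfPages(out_pdf) as pdf:
        for rows in page_slices(len(df), rows_per_page):
            chunk = df.iloc[rows]
            fig, ax = plt.subplots(figsize=(12, 4))
            ax.axis('off')
            ax.table(cellText=chunk.values, colLabels=chunk.columns, loc='center')
            pdf.savefig(fig, bbox_inches='tight')
            # Free the figure so long tables do not accumulate open figures.
            plt.close(fig)
```

For the 10-row demo DataFrame above, dataframe_to_multipage_pdf(df, "foo.pdf", rows_per_page=4) would give three pages.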

  • 2020-12-01 05:25

    Here is how I do it from an SQLite database, using sqlite3, pandas and pdfkit:

    import pandas as pd
    import pdfkit as pdf
    import sqlite3
    
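    # Read the query result into a DataFrame, then close the connection
    con = sqlite3.connect("baza.db")
    df = pd.read_sql_query("select * from dobit", con)
    con.close()
    
    # Write the table to an HTML file and convert that file to PDF
    # (pdfkit requires the wkhtmltopdf binary to be installed)
    df.to_html('/home/linux/izvestaj.html')
    nazivFajla = '/home/linux/pdfPrintOut.pdf'
    pdf.from_file('/home/linux/izvestaj.html', nazivFajla)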
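The same flow can be wrapped in a function that skips the HTML file on disk by handing the markup to pdfkit.from_string. This is only a sketch under assumptions: sqlite_query_to_pdf is a hypothetical name, and it still needs pandas, pdfkit and the wkhtmltopdf binary at call time.

```python
import sqlite3
from contextlib import closing


def sqlite_query_to_pdf(db_path, query, out_pdf):
    """Run a query against an SQLite file and write the result table to a PDF."""
    # Imported here so the module loads even where these packages are missing.
    import pandas as pd
    import pdfkit

    # closing() makes sure the connection is closed once the rows are read.
    with closing(sqlite3.connect(db_path)) as con:
        df = pd.read_sql_query(query, con)

    # from_string converts the HTML held in memory, so no temporary file is needed.
    pdfkit.from_string(df.to_html(), out_pdf)
```

With the names from the answer, the call would look like sqlite_query_to_pdf("baza.db", "select * from dobit", "/home/linux/pdfPrintOut.pdf").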
    
  • 2020-12-01 05:38

    One way is to go through Markdown. df.to_html() converts the DataFrame into an HTML table, which you can put into a Markdown file (.md) (see http://daringfireball.net/projects/markdown/basics). From there, there are utilities that convert Markdown into a PDF (https://www.npmjs.com/package/markdown-pdf).

    One all-in-one tool for this method is the Atom text editor (https://atom.io/). Search its extensions for "markdown to pdf" and the extension will do the conversion for you.

    Note: When I used to_html() recently, I had to remove the extra '\n' characters for some reason. I did this in Atom with Find -> '\n' -> Replace with "" (nothing).

    Overall this should do the trick!
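The newline clean-up from the note can also be done in Python before the file ever reaches an editor. A minimal sketch, assuming the table HTML comes from df.to_html(); write_table_markdown and its title parameter are hypothetical and not part of the original answer.

```python
from pathlib import Path


def write_table_markdown(table_html, md_path, title=""):
    """Write an HTML table into a Markdown file with its newlines removed."""
    # Same effect as the editor's find '\n' and replace with nothing.
    one_line = table_html.replace("\n", "")
    heading = f"# {title}\n\n" if title else ""
    Path(md_path).write_text(heading + one_line + "\n", encoding="utf-8")
    return one_line
```

For example, write_table_markdown(df.to_html(), "table.md", "Iris Data") writes a file that a Markdown-to-PDF tool can then pick up.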
