What is an efficient way to generate a PDF from a pandas DataFrame?
One way is to go through Markdown. df.to_html() converts the DataFrame into an HTML table. Because Markdown accepts raw HTML, you can put the generated HTML into a Markdown file (.md) (see http://daringfireball.net/projects/markdown/basics) and then convert that file to a PDF with a utility such as markdown-pdf (https://www.npmjs.com/package/markdown-pdf).
An all-in-one tool for this method is the Atom text editor (https://atom.io/). Search its extensions for "markdown to pdf" and install one; it will do the conversion for you.
Note: When I used to_html() recently, I had to remove extra '\n' characters for some reason. I did it in Atom: Find -> '\n' -> Replace with "".
Overall this should do the trick!
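The same steps can also be scripted instead of done by hand in the editor. The following is a minimal sketch, not the answerer's own code: the function names table_html_to_markdown and write_markdown are made up for illustration, and the input is assumed to be the string returned by df.to_html().

```python
from pathlib import Path


def table_html_to_markdown(table_html: str, title: str = "") -> str:
    """Wrap an HTML table (e.g. the string from df.to_html()) in Markdown text."""
    # Same effect as the manual Find '\n' -> Replace "" step in the editor.
    one_line_table = table_html.replace("\n", "")
    parts = []
    if title:
        # Optional top-level Markdown heading above the table.
        parts.append(f"# {title}")
    parts.append(one_line_table)
    # A blank line separates the heading from the raw HTML block.
    return "\n\n".join(parts) + "\n"


def write_markdown(table_html: str, md_path: str, title: str = "") -> None:
    """Write the Markdown text to md_path (hypothetical helper)."""
    Path(md_path).write_text(
        table_html_to_markdown(table_html, title), encoding="utf-8"
    )
```

With a DataFrame df, a call such as write_markdown(df.to_html(), 'report.md', 'Report') would produce a file that a Markdown-to-PDF tool can pick up; the file name here is only an example.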
Here is how I do it from an SQLite database, using sqlite3, pandas and pdfkit:
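import sqlite3

import pandas as pd
import pdfkit as pdf  # pdfkit needs the wkhtmltopdf binary installed

# Read the whole "dobit" table from the SQLite database into a DataFrame
con = sqlite3.connect("baza.db")
df = pd.read_sql_query("select * from dobit", con)
con.close()

# Write the table to an intermediate HTML file, then convert that file to PDF
df.to_html('/home/linux/izvestaj.html')
nazivFajla = '/home/linux/pdfPrintOut.pdf'
pdf.from_file('/home/linux/izvestaj.html', nazivFajla)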
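If the intermediate HTML file is not needed, the table can be passed around as a string. This is a rough sketch under that assumption, not the answerer's code: query_to_html and html_to_pdf are hypothetical helper names, and pdfkit.from_string is used in place of pdfkit.from_file.

```python
import sqlite3

import pandas as pd


def query_to_html(con: sqlite3.Connection, query: str) -> str:
    """Run query on an open SQLite connection and return an HTML table."""
    df = pd.read_sql_query(query, con)
    # index=False leaves out the 0..n-1 row labels in the rendered table.
    return df.to_html(index=False)


def html_to_pdf(html: str, pdf_path: str) -> None:
    """Render an HTML string to a PDF file with pdfkit."""
    # Imported here so the HTML step also works without pdfkit installed.
    # pdfkit itself relies on the wkhtmltopdf binary being available.
    import pdfkit

    pdfkit.from_string(html, pdf_path)
```

A call would then look like html_to_pdf(query_to_html(con, "select * from dobit"), nazivFajla), with con and nazivFajla as in the code above.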
This is a solution with an intermediate HTML file.
The table is pretty printed with some minimal CSS.
The PDF conversion is done with weasyprint, which you need to install first: pip install weasyprint
import pandas as pd
import weasyprint

# HTML placed before the table; the CSS gives it some minimal styling.
HTML_TEMPLATE1 = '''
<html>
<head>
<style>
  h2 {
    text-align: center;
    font-family: Helvetica, Arial, sans-serif;
  }
  table {
    margin-left: auto;
    margin-right: auto;
  }
  table, th, td {
    border: 1px solid black;
    border-collapse: collapse;
  }
  th, td {
    padding: 5px;
    text-align: center;
    font-family: Helvetica, Arial, sans-serif;
    font-size: 90%;
  }
  table tbody tr:hover {
    background-color: #dddddd;
  }
  .wide {
    width: 90%;
  }
</style>
</head>
<body>
'''

# HTML placed after the table.
HTML_TEMPLATE2 = '''
</body>
</html>
'''

# Table pretty printer; it has to be defined before it is called below.
def to_html_pretty(df, filename='/tmp/out.html', title=''):
    '''
    Write an entire dataframe to an HTML file
    with nice formatting.
    Thanks to @stackoverflowuser2010 for the
    pretty printer see https://stackoverflow.com/a/47723330/362951
    '''
    ht = ''
    if title != '':
        ht += '<h2> %s </h2>\n' % title
    ht += df.to_html(classes='wide', escape=False)
    with open(filename, 'w') as f:
        f.write(HTML_TEMPLATE1 + ht + HTML_TEMPLATE2)

# Create a pandas dataframe with demo data:
demodata_csv = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv'
df = pd.read_csv(demodata_csv)

# Pretty print the dataframe as an html table to a file
intermediate_html = '/tmp/intermediate.html'
to_html_pretty(df, intermediate_html, 'Iris Data')
# if you do not want pretty printing, just use pandas:
# df.to_html(intermediate_html)

# Convert the html file to a pdf file using weasyprint
out_pdf = '/tmp/demo.pdf'
weasyprint.HTML(intermediate_html).write_pdf(out_pdf)
Thanks to @stackoverflowuser2010 for the pretty printer; see their answer at https://stackoverflow.com/a/47723330/362951
I did not use pdfkit, because I had some problems with it on a headless machine. But weasyprint is great.
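The intermediate file can probably be skipped as well, since weasyprint also accepts HTML as a string. The sketch below is only an illustration of that variant: build_html_document and html_to_pdf are hypothetical names, and the css argument stands in for the stylesheet from the template above.

```python
from html import escape


def build_html_document(table_html: str, title: str = "", css: str = "") -> str:
    """Wrap an HTML table in a complete HTML document with optional CSS."""
    # Escape the title so characters like '<' or '&' cannot break the markup.
    heading = f"<h2>{escape(title)}</h2>\n" if title else ""
    return (
        "<html>\n<head>\n"
        f"<style>\n{css}\n</style>\n"
        "</head>\n<body>\n"
        f"{heading}{table_html}\n"
        "</body>\n</html>\n"
    )


def html_to_pdf(html: str, out_pdf: str) -> None:
    """Render an HTML string straight to a PDF file with weasyprint."""
    # Imported here so the HTML step also works without weasyprint installed.
    import weasyprint

    weasyprint.HTML(string=html).write_pdf(out_pdf)
```

For the demo data this would be html_to_pdf(build_html_document(df.to_html(classes='wide'), 'Iris Data'), '/tmp/demo.pdf').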
Source: https://stackoverflow.com/questions/33155776/export-pandas-dataframe-into-a-pdf-file-using-python