Save the “Out[]” table of a pandas dataframe as a figure


This may seem to be a useless feature but it would be very helpful for me. I would like to save the output I get inside Canopy IDE. I would not think this is specific to Can


3 Answers

被撕碎了的回忆

2020-12-10 12:08

Here is a somewhat hackish solution but it gets the job done. You wanted a .pdf but you get a bonus .png. :)

import numpy as np
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt

# PySide (Qt 4) with QtWebKit is required for the HTML rendering below
from PySide.QtGui import QApplication, QImage, QPainter
from PySide.QtCore import QSize
from PySide.QtWebKit import QWebPage

# QWebPage needs a QApplication; reuse the IDE's instance if there is one
app = QApplication.instance() or QApplication([])

arrays = [np.hstack([['one']*3, ['two']*3]), ['Dog', 'Bird', 'Cat']*2]
columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
df = pd.DataFrame(np.zeros((3,6)), columns=columns, index=pd.date_range('20000103', periods=3))

# Render the DataFrame's HTML table in an off-screen web page
h = "<!DOCTYPE html> <html> <body> <p> " + df.to_html() + " </p> </body> </html>"
page = QWebPage()
page.setViewportSize(QSize(5000,5000))

frame = page.mainFrame()
frame.setHtml(h)

# Paint the page into an image and save it as a .png
img = QImage(1000, 700, QImage.Format_ARGB32)  # same value as QImage.Format(5)
painter = QPainter(img)
frame.render(painter)
painter.end()
a = img.save("html.png")

# Place the .png on a matplotlib figure and write it to a .pdf
pp = PdfPages('html.pdf')
fig = plt.figure(figsize=(8,6), dpi=1080)
ax = fig.add_subplot(1, 1, 1)
img2 = plt.imread("html.png")
plt.axis('off')
ax.imshow(img2)
pp.savefig()
pp.close()

Edits welcome.
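If PySide and QtWebKit are not available, a Qt-free variant can be sketched with matplotlib's own table support. Note that this swaps in a different technique from the answer above: the table is drawn by `Axes.table` rather than rendered from HTML, so it will not look like the notebook output. The helper name `save_table`, the smaller example frame, and the output file names are made up for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend, no GUI toolkit needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def save_table(df, basename):
    """Draw df with Axes.table and save it as <basename>.png and <basename>.pdf."""
    fig, ax = plt.subplots(figsize=(8, 2))
    ax.axis("off")
    ax.table(
        cellText=df.values.round(2).astype(str).tolist(),
        colLabels=[str(c) for c in df.columns],
        rowLabels=list(df.index.strftime("%Y-%m-%d")),
        loc="center",
    )
    paths = [basename + ext for ext in (".png", ".pdf")]
    for path in paths:
        # the output format is chosen from the file extension
        fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
    return paths


# Hypothetical example data, smaller than the frame used in the answer
df = pd.DataFrame(np.zeros((3, 2)), columns=["Dog", "Cat"],
                  index=pd.date_range("20000103", periods=3))
out_dir = tempfile.mkdtemp()
paths = save_table(df, os.path.join(out_dir, "table"))
```

In the PDF produced this way the table is drawn as text by matplotlib rather than embedded as a bitmap.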


猫巷女王i

2020-12-10 12:11

I think what is needed here is a consistent way of outputting a table to a pdf file amongst graphs output to pdf.

My first thought was not to use the matplotlib PDF backend, i.e.

from matplotlib.backends.backend_pdf import PdfPages

because it seemed somewhat limited in formatting options and leaned towards rendering the table as an image, which leaves the text of the table non-selectable.

If you want to mix dataframe output and matplotlib plots in a pdf without using the matplotlib pdf backend, I can think of two ways.

1. Generate your pdf of matplotlib figures as before and then insert pages containing the dataframe table afterwards. I view this as a difficult option.
2. Use a different library to generate the pdf. I illustrate one option to do this below.

First, install the xhtml2pdf library. It seems a little patchily supported, but it is active on GitHub and has some basic usage documentation. You can install it via pip, i.e. pip install xhtml2pdf

Once you've done that, here is a barebones example embedding a matplotlib figure, then the table (all text selectable), then another figure. You can play around with CSS etc to alter the formatting to your exact specifications, but I think this fulfils the brief:

from xhtml2pdf import pisa             # this is the module that will do the work
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Utility function
def convertHtmlToPdf(sourceHtml, outputFilename):
    # open output file for writing (truncated binary); closed automatically
    with open(outputFilename, "w+b") as resultFile:
        # convert HTML to PDF
        pisaStatus = pisa.CreatePDF(
                sourceHtml,                # the HTML to convert
                dest=resultFile,           # file handle to receive result
                path='.')                  # this path is needed so relative paths for
                                           # temporary image sources work

    # pisaStatus.err counts errors, so return True on success and False on errors
    return not pisaStatus.err

# Main program
if __name__=='__main__':   
 
    arrays = [np.hstack([ ['one']*3, ['two']*3]), ['Dog', 'Bird', 'Cat']*2]
    columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
    df = pd.DataFrame(np.zeros((3,6)),columns=columns,index=pd.date_range('20000103',periods=3))

    # Define your data
    sourceHtml = '<html><head>'         
    # add some table CSS in head
    sourceHtml += '''<style>
                     table, td, th {
                           border-style: double;
                           border-width: 3px;
                     }

                     td,th {
                           padding: 5px;
                     }
                     </style>'''
    sourceHtml += '</head><body>'
    # Add a matplotlib figure
    plt.plot(range(20))
    plt.savefig('tmp1.jpg')
    plt.close()                         # so the next plot starts on a fresh figure
    sourceHtml += '\n<p><img src="tmp1.jpg"></p>'
    
    # Add the dataframe
    sourceHtml += '\n<p>' + df.to_html() + '</p>'
    
    # Add another matplotlib figure
    plt.plot(range(70,100))
    plt.savefig('tmp2.jpg')
    plt.close()
    sourceHtml += '\n<p><img src="tmp2.jpg"></p>'
    
    sourceHtml += '</body></html>'
    outputFilename = 'test.pdf'
    
    convertHtmlToPdf(sourceHtml, outputFilename)

Note: There seems to be a bug in xhtml2pdf at the time of writing which means that some CSS is not respected. Particularly pertinent to this question is that it seems impossible to get double borders around the table.
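When experimenting with the CSS, it may help to give the table explicit hooks instead of styling every `table` element. A minimal sketch using the `classes` and `table_id` parameters of `DataFrame.to_html`; the class name `report-table` and the id `df-table` are made up for illustration, and whether xhtml2pdf honours a particular rule still has to be found out by trial.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((3, 2)), columns=["Dog", "Cat"])

# 'report-table' and 'df-table' are hypothetical names chosen for this example
table_html = df.to_html(classes="report-table", table_id="df-table")

# CSS that targets only the table carrying the extra class
css = """<style>
table.report-table td, table.report-table th {
    border: 1px solid black;
    padding: 5px;
}
</style>"""

source_html = "<html><head>" + css + "</head><body>" + table_html + "</body></html>"
```

The resulting `source_html` string can be handed to the conversion function in the same way as the HTML built above.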

EDIT

In response to comments, it became obvious that some users (well, at least @Keith, who both answered and awarded a bounty!) want the table selectable, but definitely on a matplotlib axis. This is somewhat more in keeping with the original method. Hence, here is a method using the pdf backend for matplotlib and matplotlib objects only. I do not think the table looks as good, in particular the display of hierarchical column headers, but that's a matter of choice, I guess. I'm indebted to this answer and comments for the way to format axes for table display.

import numpy as np
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt

# Main program
if __name__=='__main__':   
    pp = PdfPages('Output.pdf')
    arrays = [np.hstack([ ['one']*3, ['two']*3]), ['Dog', 'Bird', 'Cat']*2]
    columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
    df =pd.DataFrame(np.zeros((3,6)),columns=columns,index=pd.date_range('20000103',periods=3))

    plt.plot(range(20))
    pp.savefig()
    plt.close()

    # Calculate some sizes for formatting - constants are arbitrary - play around
    nrows, ncols = len(df)+1, len(df.columns) + 10
    hcell, wcell = 0.3, 1.
    hpad, wpad = 0, 0   
    
    #put the table on a correctly sized figure    
    fig=plt.figure(figsize=(ncols*wcell+wpad, nrows*hcell+hpad))
    plt.gca().axis('off')
    matplotlib_tab = pd.plotting.table(plt.gca(), df, loc='center')  # pd.tools.plotting is gone in current pandas
    pp.savefig()
    plt.close()

    #Add another matplotlib figure(s)
    plt.plot(range(70,100))
    pp.savefig()
    plt.close()
  
    pp.close()
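One possible way to tidy the hierarchical column headers, which is my own assumption and not something taken from the linked answer, is to flatten the column MultiIndex into single-line labels before handing the frame to the table function. The helper name `flatten_columns` and the separator are hypothetical.

```python
import numpy as np
import pandas as pd


def flatten_columns(df, sep=" / "):
    """Return a copy of df whose MultiIndex columns are joined into plain strings."""
    flat = df.copy()
    # iterating a MultiIndex yields one tuple of level values per column
    flat.columns = [sep.join(str(level) for level in col) for col in df.columns]
    return flat


arrays = [np.hstack([["one"] * 3, ["two"] * 3]), ["Dog", "Bird", "Cat"] * 2]
columns = pd.MultiIndex.from_arrays(arrays, names=["foo", "bar"])
df = pd.DataFrame(np.zeros((3, 6)), columns=columns,
                  index=pd.date_range("20000103", periods=3))

flat_df = flatten_columns(df)
print(list(flat_df.columns))
```

The flattened frame can then be passed to the table call in place of `df`.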


傲寒

2020-12-10 12:21

It is, I believe, an HTML table that your IDE is rendering. This is what the IPython notebook does.

You can get a handle to it like this:

from IPython.display import HTML
import pandas as pd
data = pd.DataFrame({'spam':['ham','green','five',0,'kitties'],
                     'eggs':[0,1,2,3,4]})
h = HTML(data.to_html())
h

and save to an HTML file:

with open('some_file.html', 'w') as my_file:
    my_file.write(h.data)
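If IPython is not needed for anything else, the same file can probably be written by pandas alone, since `DataFrame.to_html` accepts a path as its first argument. A small sketch; a temporary directory is used here only so that the example cleans up after itself.

```python
import os
import tempfile

import pandas as pd

data = pd.DataFrame({"spam": ["ham", "green", "five", 0, "kitties"],
                     "eggs": [0, 1, 2, 3, 4]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "some_file.html")
    data.to_html(path)  # writes the <table> markup straight to the file
    with open(path) as f:
        saved = f.read()  # read it back to check what was written
```

The file contains only the `<table>` element, so it may need to be wrapped in `<html>` and `<body>` tags if a complete page is wanted.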
