Save the “Out[]” table of a pandas dataframe as a figure

南笙 2020-12-10 11:53

This may seem to be a useless feature but it would be very helpful for me. I would like to save the output I get inside Canopy IDE. I would not think this is specific to Can

3 Answers
  • 2020-12-10 12:08

    Here is a somewhat hackish solution but it gets the job done. You wanted a .pdf but you get a bonus .png. :)

    import numpy as np
    import pandas as pd
    from matplotlib.backends.backend_pdf import PdfPages
    import matplotlib.pyplot as plt
    
    from PySide.QtGui import QApplication, QImage, QPainter
    from PySide.QtCore import QSize
    from PySide.QtWebKit import QWebPage
    
    # QWebPage needs a QApplication; reuse the IDE's instance if one already exists
    app = QApplication.instance() or QApplication([])
    
    # Example frame with hierarchical column headers
    arrays = [np.hstack([['one']*3, ['two']*3]), ['Dog', 'Bird', 'Cat']*2]
    columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
    df = pd.DataFrame(np.zeros((3, 6)), columns=columns,
                      index=pd.date_range('20000103', periods=3))
    
    # Wrap the table's HTML in a minimal page and load it into an off-screen web page
    h = "<!DOCTYPE html> <html> <body> <p> " + df.to_html() + " </p> </body> </html>"
    page = QWebPage()
    page.setViewportSize(QSize(5000, 5000))
    
    frame = page.mainFrame()
    frame.setHtml(h, "text/html")
    
    # Render the page into an image and save it as a .png
    img = QImage(1000, 700, QImage.Format_ARGB32)  # same as QImage.Format(5)
    painter = QPainter(img)
    frame.render(painter)
    painter.end()
    a = img.save("html.png")
    
    # Place the .png on a matplotlib figure and write that figure to a .pdf
    pp = PdfPages('html.pdf')
    fig = plt.figure(figsize=(8, 6), dpi=1080)
    ax = fig.add_subplot(1, 1, 1)
    img2 = plt.imread("html.png")
    plt.axis('off')
    ax.imshow(img2)
    pp.savefig(fig)
    pp.close()
    

    Edits welcome.

  • 2020-12-10 12:11

    I think what is needed here is a consistent way of outputting a table to a PDF file alongside graphs output to PDF.

    My first thought was not to use the matplotlib backend, i.e.

    from matplotlib.backends.backend_pdf import PdfPages
    

    because it seemed somewhat limited in formatting options and leaned towards formatting the table as an image (thus rendering the text of the table in a non-selectable format).

    If you want to mix dataframe output and matplotlib plots in a pdf without using the matplotlib pdf backend, I can think of two ways.

    1. Generate your pdf of matplotlib figures as before and then insert pages containing the dataframe table afterwards. I view this as a difficult option.
    2. Use a different library to generate the pdf. I illustrate one option to do this below.

    First, install the xhtml2pdf library. This seems a little patchily supported, but it is active on GitHub and has some basic usage documentation here. You can install it with pip: pip install xhtml2pdf

    Once you've done that, here is a barebones example embedding a matplotlib figure, then the table (all text selectable), then another figure. You can play around with CSS etc to alter the formatting to your exact specifications, but I think this fulfils the brief:

    from xhtml2pdf import pisa             # this is the module that will do the work
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    
    # Utility function
    def convertHtmlToPdf(sourceHtml, outputFilename):
        # open output file for writing (truncated binary); closed automatically
        with open(outputFilename, "w+b") as resultFile:
            # convert HTML to PDF
            pisaStatus = pisa.CreatePDF(
                    sourceHtml,                # the HTML to convert
                    dest=resultFile,           # file handle to receive result
                    path='.')                  # this path is needed so relative paths for
                                               # temporary image sources work
    
        # return the error status: falsy (0) on success, truthy on errors
        return pisaStatus.err
    
    # Main program
    if __name__=='__main__':   
     
        arrays = [np.hstack([ ['one']*3, ['two']*3]), ['Dog', 'Bird', 'Cat']*2]
        columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
        df = pd.DataFrame(np.zeros((3,6)),columns=columns,index=pd.date_range('20000103',periods=3))
    
        # Define your data
        sourceHtml = '<html><head>'         
        # add some table CSS in head
        sourceHtml += '''<style>
                         table, td, th {
                               border-style: double;
                               border-width: 3px;
                         }
    
                         td,th {
                               padding: 5px;
                         }
                         </style>'''
        sourceHtml += '</head><body>'
        # Add a matplotlib figure; close it so the next plot starts on fresh axes
        plt.plot(range(20))
        plt.savefig('tmp1.jpg')
        plt.close()
        sourceHtml += '\n<p><img src="tmp1.jpg"></p>'
        
        # Add the dataframe
        sourceHtml += '\n<p>' + df.to_html() + '</p>'
        
        # Add another matplotlib figure
        plt.plot(range(70,100))
        plt.savefig('tmp2.jpg')
        plt.close()
        sourceHtml += '\n<p><img src="tmp2.jpg"></p>'
        
        sourceHtml += '</body></html>'
        outputFilename = 'test.pdf'
        
        convertHtmlToPdf(sourceHtml, outputFilename)
    

    Note: there seems to be a bug in xhtml2pdf at the time of writing which means that some CSS is not respected. Particularly pertinent to this question is that it seems impossible to get double borders around the table.


    EDIT

    In response to comments, it became obvious that some users (well, at least @Keith, who both answered and awarded a bounty!) want the table selectable, but definitely on a matplotlib axis. This is somewhat more in keeping with the original method. Hence, here is a method using the pdf backend for matplotlib and matplotlib objects only. I do not think the table looks as good, in particular the display of hierarchical column headers, but that's a matter of choice, I guess. I'm indebted to this answer and comments for the way to format axes for table display.

    import numpy as np
    import pandas as pd
    from matplotlib.backends.backend_pdf import PdfPages
    import matplotlib.pyplot as plt
    
    # Main program
    if __name__=='__main__':   
        pp = PdfPages('Output.pdf')
        arrays = [np.hstack([ ['one']*3, ['two']*3]), ['Dog', 'Bird', 'Cat']*2]
        columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
        df =pd.DataFrame(np.zeros((3,6)),columns=columns,index=pd.date_range('20000103',periods=3))
    
        plt.plot(range(20))
        pp.savefig()
        plt.close()
    
        # Calculate some sizes for formatting - constants are arbitrary - play around
        nrows, ncols = len(df)+1, len(df.columns) + 10
        hcell, wcell = 0.3, 1.
        hpad, wpad = 0, 0   
        
        #put the table on a correctly sized figure    
        fig=plt.figure(figsize=(ncols*wcell+wpad, nrows*hcell+hpad))
        plt.gca().axis('off')
        # pd.tools.plotting moved to pd.plotting in later pandas releases
        matplotlib_tab = pd.plotting.table(plt.gca(), df, loc='center')
        pp.savefig()
        plt.close()
    
        #Add another matplotlib figure(s)
        plt.plot(range(70,100))
        pp.savefig()
        plt.close()
      
        pp.close()
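
    Since the hierarchical column headers are the weak point of the matplotlib table, one possible workaround is to flatten them into single strings before drawing. The following is only a sketch under stated assumptions: flatten_columns and table_page are hypothetical helper names (not part of pandas or matplotlib), the " / " separator and the cell sizes are arbitrary, and the Agg backend is chosen only so the example runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages


def flatten_columns(df, sep=" / "):
    """Hypothetical helper: return a copy of df with MultiIndex columns joined into strings."""
    flat = df.copy()
    if isinstance(flat.columns, pd.MultiIndex):
        flat.columns = [sep.join(str(level) for level in col) for col in flat.columns]
    return flat


def table_page(pdf, df, cell_height=0.3, cell_width=1.5):
    """Hypothetical helper: draw df as a matplotlib table on its own page of an open PdfPages."""
    nrows, ncols = len(df) + 1, len(df.columns) + 1  # +1 for the header row / index column
    fig, ax = plt.subplots(figsize=(ncols * cell_width, nrows * cell_height + 0.5))
    ax.axis("off")
    ax.table(
        cellText=df.astype(str).values.tolist(),
        rowLabels=[str(label) for label in df.index],
        colLabels=[str(label) for label in df.columns],
        loc="center",
    )
    pdf.savefig(fig)
    plt.close(fig)


# Same shape of example data as above
columns = pd.MultiIndex.from_product(
    [["one", "two"], ["Dog", "Bird", "Cat"]], names=["foo", "bar"]
)
df = pd.DataFrame(
    np.zeros((3, 6)), columns=columns, index=pd.date_range("20000103", periods=3)
)

flat_df = flatten_columns(df)
out_path = os.path.join(tempfile.gettempdir(), "table_sketch.pdf")
with PdfPages(out_path) as pdf:
    table_page(pdf, flat_df)
```

    The table page can be mixed with ordinary plot pages by calling pdf.savefig() on other figures inside the same with block.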
    
  • 2020-12-10 12:21

    It is, I believe, an HTML table that your IDE is rendering. This is what the IPython notebook does.

    You can get a handle to it thusly:

    from IPython.display import HTML
    import pandas as pd
    data = pd.DataFrame({'spam':['ham','green','five',0,'kitties'],
                         'eggs':[0,1,2,3,4]})
    h = HTML(data.to_html())
    h
    

    and save to an HTML file:

    with open('some_file.html', 'w') as my_file:
        my_file.write(h.data)
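
    If the IPython display object is not needed, the same HTML can probably be written straight from the dataframe. This is a minimal sketch, assuming only pandas is available; save_table_html and the CSS string are hypothetical names made up for illustration, and the output goes to the system temp directory so nothing in the working directory is overwritten.

```python
import os
import tempfile

import pandas as pd

# Same example frame as in the answer above
data = pd.DataFrame({'spam': ['ham', 'green', 'five', 0, 'kitties'],
                     'eggs': [0, 1, 2, 3, 4]})

# Hypothetical minimal stylesheet; adjust to taste
CSS = "table { border-collapse: collapse; } td, th { border: 1px solid #999; padding: 4px; }"


def save_table_html(df, path, css=CSS):
    """Hypothetical helper: write df as a self-contained HTML page and return the path."""
    page = (
        "<!DOCTYPE html>\n<html><head><meta charset='utf-8'>"
        "<style>" + css + "</style></head><body>\n"
        + df.to_html()  # returns the table markup as a string
        + "\n</body></html>\n"
    )
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(page)
    return path


out_file = save_table_html(data, os.path.join(tempfile.gettempdir(), "some_file.html"))
```

    The resulting file opens in any browser, which can then print it to PDF if a figure-like output is wanted.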
    