How to copy/paste DataFrame from Stack Overflow into Python

前端未结

关注

 3  667

In questions and answers, users very often post an example DataFrame which their question/answer works with:

In []: x
Out[]: 
   bar  foo
0    4


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  死守一世寂寞        
                
              
                            
                2020-11-27 14:07
              
            
            
                                                                       
If you are copy-pasting from CSV file which has standard entries like this:

2016,10,M,0600,0610,13,1020,24
2016,3,F,0300,0330,21,6312,1
2015,4,M,0800,0830,8,7112,30
2015,10,M,0800,0810,19,0125,1
2016,8,M,1500,1510,21,0910,2
2015,10,F,0800,0810,3,8413,5

df =pd.read_clipboard(sep=",", header=None)
df.rename(columns={0: "Name0", 1: "Name1",2:"Name2",3:"Name3",4:"Name4",5:"Name5",6:"Name6",7:"Name7",8:"Name8"})


will give you properly defined pandas Dataframe.


                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  生来不讨喜        
                
              
                            
                2020-11-27 14:10
              
            
            
                                                                       
pd.read_clipboard() is nifty. However, if you're writing code in a script or a notebook (and you want your code to work in the future) it's not a great fit. Here's an alternative way to copy/paste the output of a dataframe into a new dataframe object that ensures that df will outlive the contents of your clipboard:

# py3 only, see below for py2
import pandas as pd
from io import StringIO

d = '''0   1   2   3   4
A   Y   N   N   Y
B   N   Y   N   N
C   N   N   N   N
D   Y   Y   N   Y
E   N   Y   Y   Y
F   Y   Y   N   Y
G   Y   N   N   Y'''

df = pd.read_csv(StringIO(d), sep='\s+')


A few notes:


The triple-quoted string preserves the newlines in the output.
StringIO wraps the output in a file-like object, which read_csv requires.
Setting sep to \s+ makes it so that each contiguous block of whitespace is treated as a single delimiter.


update

The above answer is Python 3 only. If you're stuck in Python 2, replace the import line:

from io import StringIO


with instead:

from StringIO import StringIO


If you have an old version of pandas (v0.24 or older) there's an easy way to write a Py2/Py3 compatible version of the above code:

import pandas as pd

d = ...
df = pd.read_csv(pd.compat.StringIO(d), sep='\s+')


The newest versions of pandas have dropped the compat module along with Python 2 support.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  无人及你        
                
              
                            
                2020-11-27 14:12
              
            
            
                                                                       
Pandas is written by people that really know what people want to do.

Since version 0.13 there's a function pd.read_clipboard which is absurdly effective at making this "just work".

Copy and paste the part of the code in the question that starts bar foo, (i.e. the DataFrame) and do this in a Python interpreter:

In [53]: import pandas as pd
In [54]: df = pd.read_clipboard()

In [55]: df
Out[55]: 
   bar  foo
0    4    1
1    5    2
2    6    3


Caveats


Don't include the iPython In or Out stuff or it won't work
If you have a named index, you currently need to add engine='python' (see this issue on GitHub). The 'c' engine is currently broken when the index is named.
It's not brilliant at MultiIndexes:


Try this:

                      0         1         2
level1 level2                              
foo    a       0.518444  0.239354  0.364764
       b       0.377863  0.912586  0.760612
bar    a       0.086825  0.118280  0.592211


which doesn't work at all, or this:

              0         1         2
foo a  0.859630  0.399901  0.052504
    b  0.231838  0.863228  0.017451
bar a  0.422231  0.307960  0.801993


Which works, but returns something totally incorrect!
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复