Pandas read_html results in TypeError

I'm using bs4 to parse an HTML page and extract a table (sample table given below), and I'm trying to load it into pandas, but when I call pddataframe = pd.read_html(LOT…
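A minimal reproduction of the error, assuming the table is pulled out of the page with BeautifulSoup's find(). The page markup and the names PAGE, lo_table and error are hypothetical stand-ins for the real page and the truncated LOT… variable; flavor="bs4" needs both beautifulsoup4 and html5lib installed.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical page; the table mirrors the one in the question
PAGE = """
<html><body>
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
  <tr><th colspan="2">Learning Outcomes</th></tr>
  <tr><td style="width:10%;">LO1</td>
      <td>Demonstrate an awareness of the important role of Financial Accounting information as an input into the decision making process.</td></tr>
</table>
</body></html>
"""

soup = BeautifulSoup(PAGE, "html.parser")
lo_table = soup.find("table", class_="borders")
print(type(lo_table))  # <class 'bs4.element.Tag'>: a parsed element, not a str

# Handing the Tag straight to read_html fails with a TypeError
try:
    pd.read_html(lo_table, flavor="bs4")
    error = None
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```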

3 answers

Answer by 爱一瞬间的悲伤 (2021-01-14 06:39):
This exact code works for me.

import pandas as pd

htm = """<table cellpadding="5" cellspacing="0" class="borders" width="100%">
    <tr>
     <th colspan="2">
      Learning Outcomes
     </th>
    </tr>
    <tr>
     <td class="info" colspan="2">
      On successful completion of this module the learner will be able to:
     </td>
    </tr>
    <tr>
     <td style="width:10%;">
      LO1
     </td>
     <td>
      Demonstrate an awareness of the important role of Financial Accounting information as an input into the decision making process.
     </td>
    </tr>
    <tr>
     <td style="width:10%;">
      LO2
     </td>
     <td>
      Display an understanding of the fundamental accounting concepts, principles and conventions that underpin the preparation of Financial statements.
     </td>
    </tr>
    <tr>
     <td style="width:10%;">
      LO3
     </td>
     <td>
      Understand the various formats in which  information in relation to transactions or events is recorded and classified.
     </td>
    </tr>
    <tr>
     <td style="width:10%;">
      LO4
     </td>
     <td>
      Apply a knowledge of accounting concepts,conventions and techniques such as double entry to the  posting of  recorded information to the T accounts in the Nominal Ledger.
     </td>
    </tr>
    <tr>
     <td style="width:10%;">
      LO5
     </td>
     <td>
      Prepare and present the financial statements of a Sole Trader  in prescribed format from a Trial Balance  accompanies by notes with additional information.
     </td>
    </tr>
   </table> 
"""

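from io import StringIO

# pandas 2.1+ deprecates passing literal HTML to read_html; wrapping it in StringIO avoids the FutureWarning
pd.read_html(StringIO(htm), skiprows=2, flavor='bs4')[0]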

Answer by 忘了有多久 (2021-01-14 06:40):
Pandas can infer the table layout on its own; pass the HTML, wrapped in StringIO, straight to read_html:

HTML = '''\
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
    <tr>
     <th colspan="2">
      Learning Outcomes
     </th>


... omitting most of what you had here


      Prepare and present the financial statements of a Sole Trader  in prescribed format from a Trial Balance  accompanies by notes with additional information.
     </td>
    </tr>
   </table>'''

from io import StringIO
import pandas as pd

# read_html returns a list of DataFrames, one per <table> found
df = pd.read_html(StringIO(HTML))
print(df)


Result:

[                                                   0  \
0                                  Learning Outcomes   
1  On successful completion of this module the le...   
2                                                LO1   
3                                                LO2   
4                                                LO3   
5                                                LO4   
6                                                LO5   

                                                   1  
0                                                NaN  
1                                                NaN  
2  Demonstrate an awareness of the important role...  
3  Display an understanding of the fundamental ac...  
4  Understand the various formats in which inform...  
5  Apply a knowledge of accounting concepts,conve...  
6  Prepare and present the financial statements o...  ]
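Since pandas can parse a whole page on its own, the bs4 extraction step is optional. read_html returns one DataFrame per table, and its match parameter (a string or regex) keeps only the tables whose text matches. A sketch on a hypothetical two-table page (PAGE and the table contents are made up; flavor="bs4" needs beautifulsoup4 and html5lib):

```python
from io import StringIO

import pandas as pd

# Hypothetical page: an unrelated table comes before the learning-outcomes table
PAGE = """
<html><body>
<table><tr><td>Credits</td><td>5</td></tr></table>
<table class="borders">
  <tr><th colspan="2">Learning Outcomes</th></tr>
  <tr><td>LO1</td><td>Demonstrate an awareness of the important role of Financial Accounting information as an input into the decision making process.</td></tr>
</table>
</body></html>
"""

# Without a filter, read_html returns one DataFrame per <table> on the page
all_tables = pd.read_html(StringIO(PAGE), flavor="bs4")
print(len(all_tables))  # 2

# match keeps only the tables containing text that matches the string/regex
lo_tables = pd.read_html(StringIO(PAGE), match="Learning Outcomes", flavor="bs4")
print(len(lo_tables))  # 1
print(lo_tables[0])
```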

Answer by 佛祖请我去吃肉 (2021-01-14 06:45):
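Thanks for the pointers from all the suggested answers and comments. My rookie mistake was that, after extracting the table with bs4, I had it in a variable as a bs4 Tag object rather than as HTML text.
I was running pd.read_html(LOTable, skiprows=2, flavor='bs4') when I needed to run pd.read_html(LOTable.prettify(), skiprows=2, flavor='bs4'). read_html accepts HTML text, a URL or file path, or a file-like object, so the Tag has to be turned back into markup first; prettify() does that by returning the table as a string.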
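The same fix in runnable form, as a sketch on a hypothetical page (PAGE, lo_table, from_prettify and from_str are illustrative names). prettify() returns the Tag's markup as a string, and so does str(), so either one works as input to read_html; the text is wrapped in StringIO because pandas 2.1 and later warn when literal HTML is passed directly.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical page with a table like the one in the question
PAGE = """
<html><body>
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
  <tr><th colspan="2">Learning Outcomes</th></tr>
  <tr><td style="width:10%;">LO1</td>
      <td>Demonstrate an awareness of the important role of Financial Accounting information as an input into the decision making process.</td></tr>
  <tr><td style="width:10%;">LO2</td>
      <td>Display an understanding of the fundamental accounting concepts, principles and conventions that underpin the preparation of Financial statements.</td></tr>
</table>
</body></html>
"""

soup = BeautifulSoup(PAGE, "html.parser")
lo_table = soup.find("table", class_="borders")  # a bs4 Tag

# Serialize the Tag back to HTML text first: prettify() and str() both return a str.
# StringIO keeps newer pandas from warning about literal HTML strings.
from_prettify = pd.read_html(StringIO(lo_table.prettify()), flavor="bs4")[0]
from_str = pd.read_html(StringIO(str(lo_table)), flavor="bs4")[0]

print(from_prettify)
```

str() keeps the original whitespace while prettify() re-indents the markup; read_html normalizes whitespace inside cells, so both should produce the same table here.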