Don't put html, head and body tags automatically, beautifulsoup

前端未结

关注

 8  1475

using beautifulsoup with html5lib, it puts the html, head and body tags automatically:

BeautifulSoup(\'FOO\', \'html5lib\') # => <


                      
              相关标签:


      
      
        
          8条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  我寻月下人不归        
                
              
                            
                2020-12-03 10:33
              
            
            
                                                                       
Yet another solution:

from bs4 import BeautifulSoup
soup = BeautifulSoup('<p>Hello <a href="http://google.com">Google</a></p><p>Hi!</p>', 'lxml')
# content handling example (just for example)
# replace Google with StackOverflow
for a in soup.findAll('a'):
  a['href'] = 'http://stackoverflow.com/'
  a.string = 'StackOverflow'
print ''.join([unicode(i) for i in soup.html.body.findChildren(recursive=False)])

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  栀梦        
                
              
                            
                2020-12-03 10:34
              
            
            
                                                                       
Since v4.0.1 there's a method decode_contents():
>>> BeautifulSoup('<h1>FOO</h1>', 'html5lib').decode_contents()
'<h1>FOO</h1>' 

More details in a solution to this question:
https://stackoverflow.com/a/18602241/237105
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     上一页
1
2
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复