BeautifulSoup `find_all` generator

情书的邮戳 2021-02-01 11:17

Is there any way to turn find_all into a more memory efficient generator? For example:

Given:

soup = BeautifulSoup(content, "html.parser")


      
      
        
3 Answers

猫巷女王i (OP) 2021-02-01 11:43

From the documentation:


  I gave the generators PEP 8-compliant names, and transformed them into
  properties:


childGenerator() -> children
nextGenerator() -> next_elements
nextSiblingGenerator() -> next_siblings
previousGenerator() -> previous_elements
previousSiblingGenerator() -> previous_siblings
recursiveChildGenerator() -> descendants
parentGenerator() -> parents
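
As a rough sketch of how these properties could stand in for find_all, the snippet below walks `descendants` lazily and yields matching tags one at a time. The helper name `iter_find_all` and the sample HTML are made up for illustration; they are not part of BeautifulSoup. Note the assumption here: the whole document is still parsed into a tree, so this only avoids building the full result list.

```python
from itertools import islice

from bs4 import BeautifulSoup, Tag


def iter_find_all(root, name):
    """Hypothetical helper: lazily yield tags called `name` under `root`."""
    # .descendants is a generator over every node in document order.
    for node in root.descendants:
        # Skip text nodes (NavigableString); only Tag objects have a tag name.
        if isinstance(node, Tag) and node.name == name:
            yield node


html = "<div><p>one</p><span><p>two</p></span><p>three</p></div>"
soup = BeautifulSoup(html, "html.parser")

# Take only the first two matches; the third <p> is never visited.
first_two = [p.get_text() for p in islice(iter_find_all(soup, "p"), 2)]
print(first_two)
```

Because the generator stops as soon as the consumer stops asking, this suits cases where you only need the first few matches or want to process matches one by one.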


There is a section in the documentation named Generators; it is worth reading.

SoupStrainer parses only part of the HTML, which can save memory, but it only excludes the irrelevant tags. If your HTML contains thousands of the tags you want, you will run into the same memory problem.
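
For reference, a minimal sketch of the SoupStrainer approach, assuming you only care about `<a>` tags. The sample HTML is invented for illustration; `parse_only` is the BeautifulSoup constructor argument that accepts a strainer.

```python
from bs4 import BeautifulSoup, SoupStrainer

html = (
    "<html><body><div><a href='/x'>x</a><p>skip</p></div>"
    "<a href='/y'>y</a></body></html>"
)

# Keep only <a> tags while parsing; everything else is dropped from the tree.
only_links = SoupStrainer("a")
soup = BeautifulSoup(html, "html.parser", parse_only=only_links)

hrefs = [a["href"] for a in soup.find_all("a")]
print(hrefs)
```

The resulting tree holds just the matching tags, so the saving depends on how much of the page is irrelevant.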
    
             
                                                        
            
            
              
                