BeautifulSoup - How to find a specific class name alone


How can I find the li tags that have one specific class name and no others? For example:

...
 no wanted 
 not his


                      
3 answers

感情败类
2021-02-10 06:53

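Possibly with a filter function, as in the documentation. A function passed to class_ is tested against each class value separately, so it would also accept class="a z"; filter on the whole tag instead:

def is_only_z(tag):
    # Match li tags whose class attribute is exactly "z"
    return tag.name == 'li' and tag.get('class') == ['z']

soup.find_all(is_only_z)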
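
To make the difference between the two kinds of filter function concrete, here is a minimal sketch. The markup and the names class_is_z, li_with_only_z, per_value and exact are illustrative assumptions, and the built-in html.parser is assumed as the parser. A function given to class_ receives one class value at a time, while a function given as the first argument receives the whole tag.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not the markup from the question
html = '<ul><li class="a z">both</li><li class="z">only z</li></ul>'
soup = BeautifulSoup(html, 'html.parser')

def class_is_z(css_class):
    # Receives each individual class value, e.g. 'a' and then 'z'
    return css_class == 'z'

def li_with_only_z(tag):
    # Receives the whole tag, so the full class list can be compared
    return tag.name == 'li' and tag.get('class') == ['z']

per_value = [li.get_text() for li in soup.find_all('li', class_=class_is_z)]
exact = [li.get_text() for li in soup.find_all(li_with_only_z)]

print(per_value)  # ['both', 'only z']
print(exact)      # ['only z']
```

The per-value filter accepts the first li because one of its classes is z; only the tag-level filter singles out the li whose class is z alone.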

                                                                        
                                                        
            
            
              
                
傲寒
2021-02-10 07:04

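You can use a CSS attribute selector to match the exact class value.

from bs4 import BeautifulSoup

html = '''<li> no wanted </li>
<li class="a"> not his one </li>
<li class="a z"> neither this one </li>
<li class="b z"> neither this one </li>
<li class="c z"> neither this one </li>
<li class="z"> I WANT THIS ONLY ONE</li>'''

# 'lxml' needs the lxml package; the built-in 'html.parser' also works here
soup = BeautifulSoup(html, 'lxml')

# [class="z"] matches only when the whole attribute value is exactly "z"
tags = soup.select('li[class="z"]')
print(tags)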


The same result can be achieved using a lambda:

tags = soup.find_all(lambda tag: tag.name == 'li' and tag.get('class') == ['z'])


Output:

[<li class="z"> I WANT THIS ONLY ONE</li>]
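
For contrast, a small sketch of how the class selector li.z differs from the attribute selector li[class="z"]. The markup and the names any_z and only_z are illustrative assumptions, and html.parser is assumed as the parser. The class selector matches any li that has z among its classes; the attribute selector matches only when the whole attribute value is exactly z.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not the markup from the answer above
html = '<li class="a z">both</li><li class="z">only z</li><li class="zz">other</li>'
soup = BeautifulSoup(html, 'html.parser')

# Class selector: z only has to be one of the classes
any_z = [li.get_text() for li in soup.select('li.z')]

# Attribute selector: the whole class attribute must equal "z"
only_z = [li.get_text() for li in soup.select('li[class="z"]')]

print(any_z)   # ['both', 'only z']
print(only_z)  # ['only z']
```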




Have a look at the "Multi-valued attributes" section of the documentation. It explains why class_='z' matches every tag that has z among its classes.


  HTML 4 defines a few attributes that can have multiple values. HTML 5 removes a couple of them, but defines a few more. The most common multi-valued attribute is class (that is, a tag can have more than one CSS class). Others include rel, rev, accept-charset, headers, and accesskey. Beautiful Soup presents the value(s) of a multi-valued attribute as a list:

css_soup = BeautifulSoup('<p class="body"></p>', 'html.parser')
css_soup.p['class']
# ['body']

css_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser')
css_soup.p['class']
# ['body', 'strikeout']
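
A short sketch of what the list representation means for searching, using illustrative markup and html.parser as assumptions. It also shows the multi_valued_attributes=None constructor argument, which the same documentation section describes as a way to keep every attribute as a plain string; the variable names are hypothetical.

```python
from bs4 import BeautifulSoup

# Illustrative markup
html = '<li class="a z">both</li><li class="z">only z</li>'

# Default: class is split into a list, and class_='z' matches any member
soup = BeautifulSoup(html, 'html.parser')
class_lists = [li['class'] for li in soup.find_all('li')]
loose = [li.get_text() for li in soup.find_all('li', class_='z')]
print(class_lists)  # [['a', 'z'], ['z']]
print(loose)        # ['both', 'only z']

# With multi_valued_attributes=None the attribute stays a single string
flat = BeautifulSoup(html, 'html.parser', multi_valued_attributes=None)
class_strings = [li['class'] for li in flat.find_all('li')]
print(class_strings)  # ['a z', 'z']
```

With the string form, a plain comparison such as li['class'] == 'z' is enough to pick out the tag whose only class is z.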


                                                                        
                                                        
            
            
              
                
温柔的废话
2021-02-10 07:06

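You can simply do:
# {'class': 'z'} matches any li whose classes include z, so keep only the exact matches
data = [li for li in soup.find_all('li', {'class': 'z'}) if li.get('class') == ['z']]
print(data)

If you only want the text:
for a in data:
    print(a.text)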

                                                                        
                                                        
            
            
              
                