Beautiful soup getting the first child

前端未结

关注

 3  1823

How can I get the first child?

  
        London 
        York 

        
                      
              相关标签:
       
      
      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  日久生厌        
                
              
                            
                2021-01-17 09:53
              
            
            
                                                                       
div.children returns an iterator.

for div in nsoup.find_all(class_='cities'):
    for childdiv in div.find_all('div'):
        print (childdiv.string) #london, york


AttributeError was raised, because of non-tags like '\n' are in .children. just use proper child selector to find the specific div.

(more edit) can't reproduce your exceptions - here's what I've done:

In [137]: print foo.prettify()
<div class="cities">
 <div id="3232">
  London
 </div>
 <div id="131">
  York
 </div>
</div>

In [138]: for div in foo.find_all(class_ = 'cities'):
   .....:     for childdiv in div.find_all('div'):
   .....:         print childdiv.string
   .....: 
 London 
 York 

In [139]: for div in foo.find_all(class_ = 'cities'):
   .....:     for childdiv in div.find_all('div'):
   .....:         print childdiv.string, childdiv['id']
   .....: 
 London  3232
 York  131

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  无人及你        
                
              
                            
                2021-01-17 09:56
              
            
            
                                                                       
With modern versions of bs4 (certainly bs4 4.7.1+) you have access to :first-child css pseudo selector. Nice and descriptive. Use soup.select_one if you only want to return the first match i.e. soup.select_one('.cities div:first-child').text. It is considered good practice to test is not None before using .text accessor.
from bs4 import BeautifulSoup as bs

html = '''
<div class="cities"> 
       <div id="3232"> London </div>
       <div id="131"> York </div>
  </div>
  '''
soup = bs(html, 'lxml') #or 'html.parser'
first_children = [i.text for i in soup.select('.cities div:first-child')]
print(first_children)

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  粉色の甜心        
                
              
                            
                2021-01-17 10:02
              
            
            
                                                                       
The current accepted answer gets all cities, when the question only wanted the first. 

If you only need the first child, you can take advantage of .children returning an iterator and not a list. Remember that an iterator generates list items on the fly, and because we only need the first element of the iterator, we don't ever need to generate all other city elements (thus saving time).

for div in nsoup.find_all(class_='cities'):
    first_child = next(div.children, None)
    if first_child is not None:
        print(first_child.string.strip())

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复