Beautiful Soup: getting the first child

深忆病人 2021-01-17 09:31

How can I get the first child?

    <div class="cities">
        <div id="3232"> London </div>
        <div id="131"> York </div>
    </div>
3 Answers
  • 2021-01-17 09:53

    div.children returns an iterator.

    for div in nsoup.find_all(class_='cities'):
        for childdiv in div.find_all('div'):
            print(childdiv.string)  # London, York
    

    The AttributeError is raised because .children also contains non-tag nodes such as '\n'. Use a proper child selector to find the specific div instead.
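
    To make the point about non-tag nodes concrete, here is a minimal sketch that lists what .children yields for markup like the question's. The html string is retyped here as an assumption about the asker's input, and 'html.parser' is used only to avoid extra dependencies.

```python
from bs4 import BeautifulSoup

# Assumed markup, modelled on the snippet in the question.
html = """
<div class="cities">
  <div id="3232"> London </div>
  <div id="131"> York </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
cities = soup.find(class_="cities")

# .children yields the whitespace text nodes between tags as well as the tags.
kinds = [type(child).__name__ for child in cities.children]
print(kinds)

# find() only matches tags, so the whitespace nodes are skipped.
first_div = cities.find("div")
first_city = first_div.get_text(strip=True)
print(first_city)
```

    Because the first entry in .children is a whitespace string rather than a tag, going through find() (or a CSS selector) is usually the simpler route.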

    (Edit) I can't reproduce your exception. Here's what I did:

    In [137]: print(foo.prettify())
    <div class="cities">
     <div id="3232">
      London
     </div>
     <div id="131">
      York
     </div>
    </div>
    
    In [138]: for div in foo.find_all(class_ = 'cities'):
       .....:     for childdiv in div.find_all('div'):
       .....:         print(childdiv.string)
       .....: 
     London 
     York 
    
    In [139]: for div in foo.find_all(class_ = 'cities'):
       .....:     for childdiv in div.find_all('div'):
       .....:         print(childdiv.string, childdiv['id'])
       .....: 
     London  3232
     York  131
    
  • 2021-01-17 09:56

    With modern versions of bs4 (certainly 4.7.1+) you have access to the :first-child CSS pseudo-class, which is nice and descriptive. Use soup.select_one if you only want the first match, i.e. soup.select_one('.cities div:first-child').text. It is good practice to check that the result is not None before accessing .text.

    from bs4 import BeautifulSoup as bs
    
    html = '''
    <div class="cities"> 
           <div id="3232"> London </div>
           <div id="131"> York </div>
      </div>
      '''
    soup = bs(html, 'lxml') #or 'html.parser'
    first_children = [i.text for i in soup.select('.cities div:first-child')]
    print(first_children)
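
    The None check mentioned above could look like the following sketch. The '.countries' selector is a hypothetical class that does not exist in the markup, included only to show the no-match case; 'html.parser' is an arbitrary parser choice.

```python
from bs4 import BeautifulSoup

html = """
<div class="cities">
  <div id="3232"> London </div>
  <div id="131"> York </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# select_one returns the first matching tag, or None when nothing matches.
first = soup.select_one(".cities div:first-child")
first_text = first.text.strip() if first is not None else None
print(first_text)

# Hypothetical selector with no match: the guard avoids an AttributeError.
missing = soup.select_one(".countries div:first-child")
missing_text = missing.text.strip() if missing is not None else None
print(missing_text)
```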
    
  • 2021-01-17 10:02

    The current accepted answer gets all cities, when the question only wanted the first.

    If you only need the first child, you can take advantage of .children returning an iterator rather than a list. An iterator produces its items on demand, and because we only need the first one, the remaining city elements are never visited (thus saving time).

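    from bs4 import Tag

    for div in nsoup.find_all(class_='cities'):
        # Skip whitespace text nodes such as '\n' and stop at the first child tag.
        first_child = next((c for c in div.children if isinstance(c, Tag)), None)
        if first_child is not None:
            print(first_child.string.strip())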
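
    As a small illustration of the iterator-versus-list distinction, the sketch below contrasts .children with .contents. The markup is an assumed copy of the question's snippet, and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup, Tag

# Assumed markup, modelled on the snippet in the question.
html = (
    '<div class="cities">\n'
    '<div id="3232"> London </div>\n'
    '<div id="131"> York </div>\n'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")
cities = soup.find(class_="cities")

# .children is an iterator, so it cannot be indexed like a list.
try:
    cities.children[0]
    indexable = True
except TypeError:
    indexable = False

# .contents holds the same child nodes as a plain list.
as_list = cities.contents

# next() consumes the iterator only until the first Tag is found.
first_tag = next((c for c in cities.children if isinstance(c, Tag)), None)
first_id = first_tag["id"] if first_tag is not None else None
print(indexable, first_id)
```

    If indexing is more convenient than iterating, .contents is available, at the cost of working with the full list of children.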
    