How can I get the first child?
London
York
div.children returns an iterator.
for div in nsoup.find_all(class_='cities'):
    for childdiv in div.find_all('div'):
        print(childdiv.string)  # London, York
The AttributeError was raised because .children also contains non-tag nodes such as '\n'.
Use a proper child selector to find the specific div instead.
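A minimal sketch of what .children yields, assuming markup like the sample in the question (the names html, cities, kinds and child_tags are illustrative, not from the original post). It shows that whitespace between tags turns up as text nodes, which is why tag-only operations on every child can fail:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Assumed sample markup, with newlines between the tags
html = '<div class="cities">\n<div id="3232">London</div>\n<div id="131">York</div>\n</div>'
soup = BeautifulSoup(html, 'html.parser')
cities = soup.find(class_='cities')

# .children yields every direct child node, including whitespace-only text nodes
kinds = [type(child).__name__ for child in cities.children]
print(kinds)

# Keeping only Tag instances leaves the two city divs
child_tags = [child for child in cities.children if isinstance(child, Tag)]
ids = [tag['id'] for tag in child_tags]
print(ids)  # ['3232', '131']
```

Filtering with isinstance is only one option; find_all('div'), as used above, skips text nodes as well.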
(Further edit) I can't reproduce your exception. Here's what I did:
In [137]: print foo.prettify()
<div class="cities">
 <div id="3232">
  London
 </div>
 <div id="131">
  York
 </div>
</div>
In [138]: for div in foo.find_all(class_ = 'cities'):
   .....:     for childdiv in div.find_all('div'):
   .....:         print childdiv.string
   .....:
London
York
In [139]: for div in foo.find_all(class_ = 'cities'):
   .....:     for childdiv in div.find_all('div'):
   .....:         print childdiv.string, childdiv['id']
   .....:
London 3232
York 131
With modern versions of bs4 (certainly 4.7.1+) you have access to the :first-child CSS pseudo-class, which is nice and descriptive. Use soup.select_one if you only want the first match, i.e. soup.select_one('.cities div:first-child').text. It is good practice to check that the result is not None before using the .text accessor.
from bs4 import BeautifulSoup as bs
html = '''
<div class="cities">
<div id="3232"> London </div>
<div id="131"> York </div>
</div>
'''
soup = bs(html, 'lxml') #or 'html.parser'
first_children = [i.text for i in soup.select('.cities div:first-child')]
print(first_children)
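The None check mentioned above could look something like the following sketch. It assumes the same sample markup and uses 'html.parser' so that lxml is not required; the names first_city, city_name and missing, and the '.countries' selector, are made up for illustration:

```python
from bs4 import BeautifulSoup

html = '''
<div class="cities">
<div id="3232"> London </div>
<div id="131"> York </div>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

# select_one returns the first matching element, or None when nothing matches
first_city = soup.select_one('.cities div:first-child')
city_name = first_city.get_text(strip=True) if first_city is not None else None
print(city_name)  # London

# A selector with no match (hypothetical class name) gives None, not an exception
missing = soup.select_one('.countries div:first-child')
print(missing)  # None
```

Calling .text directly on the result of a failed select_one would raise an AttributeError on None, which is what the check guards against.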
The current accepted answer gets all cities, when the question only wanted the first.
If you only need the first child, you can take advantage of .children returning an iterator rather than a list. An iterator produces its items on demand, and because we only need the first one, we never have to step through the remaining city elements.
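from bs4.element import Tag

for div in nsoup.find_all(class_='cities'):
    # .children also yields whitespace text nodes such as '\n', so skip anything that is not a Tag
    first_child = next((child for child in div.children if isinstance(child, Tag)), None)
    if first_child is not None:
        print(first_child.string.strip())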