This is my first time posting. I am using BeautifulSoup 4 and Python 2.7 (PyCharm). I have a webpage containing elements and I need to extract specific elements where the tags
If order is not important, just make some changes:
...
dl_data = soup.find_all("dd")
for dlitem in dl_data:
    print dlitem.string
Result:
13 September 2015
Starting at £40,130 per annum.
15 December 2015
Starting at £22,460 per annum.
10 January 2014
Starting at £18,160 per annum.
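One caveat with .string, shown as a minimal sketch: it returns None as soon as a <dd> contains nested tags, whereas get_text() still returns the full text. The markup below is hypothetical (it is not the asker's page), and the variable names are only for illustration.

```python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

# Hypothetical markup: the second <dd> wraps part of its text in <b>.
html = u"""
<dl>
<dt>Closing date</dt><dd>13 September 2015</dd>
<dt>Salary</dt><dd>Starting at <b>£40,130</b> per annum.</dd>
</dl>
"""

soup = BeautifulSoup(html, "html.parser")
dds = soup.find_all("dd")

plain = dds[0].string            # single text child, so the text is returned
nested = dds[1].string           # several children, so .string is None
nested_text = dds[1].get_text()  # concatenates all text inside the tag

print(plain)
print(nested)
print(nested_text)
```

If the <dd> elements on your page only ever hold plain text, .string is fine as is.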
For your latest request:
for item in list(zip(soup.find_all("dd")[0::3],soup.find_all("dd")[2::3])):
    date, salary = item
    print ', '.join([date.string, salary.string])
Output:
13 September 2015, 100
14 September 2015, 200
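To make the slicing easier to follow, here is a self-contained sketch. It assumes each record contributes exactly three <dd> elements, so [0::3] picks the first of each group and [2::3] the third. The markup and the "Ref" field are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three <dd> elements per record (date, reference, amount).
html = """
<dl>
<dt>Date</dt><dd>13 September 2015</dd>
<dt>Ref</dt><dd>A1</dd>
<dt>Amount</dt><dd>100</dd>
<dt>Date</dt><dd>14 September 2015</dd>
<dt>Ref</dt><dd>B2</dd>
<dt>Amount</dt><dd>200</dd>
</dl>
"""

soup = BeautifulSoup(html, "html.parser")
dds = soup.find_all("dd")  # search once and reuse the list

# Pair every 1st <dd> of a group with the 3rd <dd> of the same group.
rows = [", ".join([date.string, amount.string])
        for date, amount in zip(dds[0::3], dds[2::3])]

for row in rows:
    print(row)
```

This only works while every record has the same number of <dd> elements in the same order. If one is missing, the pairs shift.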
I guess it works if you just omit the .parent in your code. At least this worked for my problem, which is very similar to yours.
Here's my HTML, where the order of the <dt> elements is not guaranteed:
<dl>
<dt>Time</dt><dd>10:05:02</dd>
<dt>Temp</dt><dd>20.5°C</dd>
</dl>
I'm accessing the values successfully with the following code:
time = at_tl.find("dt",text="Time").findNext("dd").string
temp = at_tl.find("dt",text="Temp").findNext("dd").string
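If you need several values, a possible variation on the same idea is to walk every <dt> once and collect its following <dd> into a dict, so the order in the page no longer matters. This is only a sketch: the name fields is my own, and the sample reuses the HTML above with the two entries swapped.

```python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

# Same structure as above, with the <dt> entries deliberately swapped.
html = u"""
<dl>
<dt>Temp</dt><dd>20.5°C</dd>
<dt>Time</dt><dd>10:05:02</dd>
</dl>
"""

soup = BeautifulSoup(html, "html.parser")

fields = {}
for dt in soup.find_all("dt"):
    dd = dt.find_next("dd")  # first <dd> that follows this <dt>
    if dd is not None:
        fields[dt.get_text(strip=True)] = dd.get_text(strip=True)

print(fields["Time"])
print(fields["Temp"])
```

find_next is the newer spelling of findNext, and both work in BeautifulSoup 4.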