Using BeautifulSoup to extract specific dl and dd list elements

别跟我提以往 2021-01-20 01:56

This is my first time posting. I am using BeautifulSoup 4 and Python 2.7 (PyCharm). I have a webpage containing <dl> elements and I need to extract specific <dd> elements where the tags

2 answers
  • 2021-01-20 02:28

    If the order is not important, just make a few changes:

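    ...
    # Collect every <dd> element on the page, in document order
    dl_data = soup.find_all("dd")
    for dlitem in dl_data:
        print(dlitem.string)  # text of each <dd>; None if it has several child nodes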
    

    Result:

    13 September 2015
    Starting at £40,130 per annum.
    15 December 2015
    Starting at £22,460 per annum.
    10 January 2014
    Starting at £18,160 per annum.
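    A self-contained sketch of the find_all("dd") approach, runnable without the asker's page. The markup is hypothetical (the <dt> labels "Closing date" and "Salary" are invented; the values come from the result shown), and the helper name dd_texts is also hypothetical. It uses get_text(strip=True) instead of .string, because .string returns None when a <dd> contains more than one child node.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the asker's real page is not shown.
HTML = """
<dl>
  <dt>Closing date</dt><dd>13 September 2015</dd>
  <dt>Salary</dt><dd>Starting at £40,130 per annum.</dd>
</dl>
<dl>
  <dt>Closing date</dt><dd>15 December 2015</dd>
  <dt>Salary</dt><dd>Starting at £22,460 per annum.</dd>
</dl>
"""


def dd_texts(html):
    """Return the stripped text of every <dd> element, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    return [dd.get_text(strip=True) for dd in soup.find_all("dd")]


if __name__ == "__main__":
    for text in dd_texts(HTML):
        print(text)
```

    Because find_all walks the whole document, <dd> elements from every <dl> come back in one flat list; the pairing of dates with salaries is only implied by their position.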
    

    For your latest request:

    # Assumes each record is three consecutive <dd> elements: date first, salary third
    dd_items = soup.find_all("dd")
    for date, salary in zip(dd_items[0::3], dd_items[2::3]):
        print(', '.join([date.string, salary.string]))
    

    Output:

    13 September 2015, 100
    14 September 2015, 200
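    The [0::3] and [2::3] slices only make sense if each record is exactly three consecutive <dd> elements, with the date first and the salary third. The sketch below assumes that layout using hypothetical markup (the middle "Ref" values are invented) and reproduces the output shown; date_salary_pairs is a hypothetical helper. It calls find_all only once and unpacks each pair directly in the loop.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each record is three <dd> elements
# (date, an unrelated middle field, salary).
HTML = """
<dl>
  <dd>13 September 2015</dd><dd>Ref A1</dd><dd>100</dd>
  <dd>14 September 2015</dd><dd>Ref B2</dd><dd>200</dd>
</dl>
"""


def date_salary_pairs(html):
    """Pair the 1st and 3rd <dd> of every group of three."""
    dd_items = BeautifulSoup(html, "html.parser").find_all("dd")
    # [0::3] picks indices 0, 3, 6, ... (dates); [2::3] picks 2, 5, 8, ... (salaries)
    return [
        (date.get_text(strip=True), salary.get_text(strip=True))
        for date, salary in zip(dd_items[0::3], dd_items[2::3])
    ]


if __name__ == "__main__":
    for date, salary in date_salary_pairs(HTML):
        print(", ".join([date, salary]))
```

    Because zip stops at the shorter slice, an incomplete trailing record is dropped silently rather than raising an error.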
    
  • 2021-01-20 02:50

    I think it works if you simply omit the .parent in your code. At least, this worked for my problem, which is very similar to yours.

    Here is my HTML, where the order of the <dt> elements is not guaranteed:

    <dl>
     <dt>Time</dt><dd>10:05:02</dd>
     <dt>Temp</dt><dd>20.5°C</dd>
    </dl>
    

    I'm accessing the values successfully with the following code:

    # at_tl: the BeautifulSoup object or tag that contains the <dl>
    time = at_tl.find("dt", text="Time").find_next("dd").string
    temp = at_tl.find("dt", text="Temp").find_next("dd").string
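    A runnable sketch of the same label-based lookup, using the HTML above. value_for and dl_to_dict are hypothetical helpers. This version uses find_next_sibling("dd") rather than findNext, so the search stays inside the same <dl> instead of continuing into later parts of the page. It also uses string=, the newer name (BeautifulSoup 4.4 and later) for the text= argument in the answer.

```python
from bs4 import BeautifulSoup

# The <dl> from the answer; the <dt> order is not guaranteed.
HTML = """
<dl>
 <dt>Time</dt><dd>10:05:02</dd>
 <dt>Temp</dt><dd>20.5°C</dd>
</dl>
"""


def value_for(dl, label):
    """Return the text of the <dd> following the <dt> whose text is `label`."""
    dt = dl.find("dt", string=label)
    if dt is None:
        return None  # label not present in this <dl>
    dd = dt.find_next_sibling("dd")
    return dd.get_text(strip=True) if dd is not None else None


def dl_to_dict(dl):
    """Map every <dt> label to the text of the <dd> that follows it."""
    result = {}
    for dt in dl.find_all("dt"):
        dd = dt.find_next_sibling("dd")
        if dd is not None:  # skip a <dt> with no following <dd>
            result[dt.get_text(strip=True)] = dd.get_text(strip=True)
    return result


if __name__ == "__main__":
    dl = BeautifulSoup(HTML, "html.parser").find("dl")
    print(value_for(dl, "Time"))
    print(dl_to_dict(dl))
```

    Building a dictionary once is convenient when several fields are needed, because each lookup then no longer depends on where the <dt> appears.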
    