BeautifulSoup getting content behind multiple levels

旧巷少年郎 2021-01-27 14:03

How can I get the time data behind two "divs" with BeautifulSoup?

6:00.00

I've tried the f

3 answers
  • 2021-01-27 14:14

    To your second question:

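    # assumes item is a <label> from an enclosing loop and output is a list
    if "kW" in item.text:
        # the value sits in the element right after the label's parent
        itemval = item.find_parent().find_next_sibling().text.strip()
        output.append(itemval)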
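A self-contained sketch of that fragment, so it can be run on its own. It assumes item comes from a loop over label.new_font elements (the selector used in the last answer) and that output starts as an empty list; the HTML below is a hypothetical stand-in for the real page, and "100" is a placeholder value, not data from the site:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <label> sits in one <div> and its value in the
# next sibling <div>; "100" is a placeholder value.
html = """
<div class="row">
  <div><label class="new_font">Rated Power (kW)</label></div>
  <div>100</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
output = []
for item in soup.select("label.new_font"):
    if "kW" in item.text:
        # label -> its parent <div> -> the next sibling <div> holding the value
        itemval = item.find_parent().find_next_sibling().text.strip()
        output.append(itemval)

print(output)  # ['100']
```

find_next_sibling() with no arguments returns the next sibling tag and skips the whitespace text between the two divs, which is why the hop lands on the value div.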
    
  • 2021-01-27 14:19

    The div.div selector is too ambiguous, to say the least.

    Since it appears you are after the "Duration at Rated Power (HH:MM)" field value, I would first locate the corresponding label and then find the next text node matching the field's format:

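    # Locate the label by its exact text, then search forward in document
    # order for the first text node shaped like a time value
    label = soup.find("label", string="Duration at Rated Power (HH:MM)")
    value = label.find_next(string=re.compile(r"\d+:\d+")).strip()
    print(value)  # prints 6:00.00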
    

    (don't forget to import the re module)
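A runnable sketch of this label-then-regex approach against a hypothetical snippet. Only the label text and the 6:00.00 value come from the question; the surrounding div layout is an assumption:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: the <div> layout is assumed, only the label text
# and "6:00.00" come from the question.
html = """
<div class="row">
  <div><label>Duration at Rated Power (HH:MM)</label></div>
  <div>6:00.00</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Anchor on the label text rather than on the div nesting
label = soup.find("label", string="Duration at Rated Power (HH:MM)")

# Walk forward in document order to the first text node containing
# digits:digits, however deeply it is nested after the label
value = label.find_next(string=re.compile(r"\d+:\d+")).strip()
print(value)  # 6:00.00
```

Because find_next() follows document order rather than the tree shape, this keeps working if the value ends up one or two divs deeper, as long as no other digits:digits text appears between the label and the value.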

  • 2021-01-27 14:27

    Try this to get the time you wish to scrape:

    import requests
    from bs4 import BeautifulSoup
    
    page = requests.get("https://www.energystorageexchange.org/projects/2")
    soup = BeautifulSoup(page.content, 'lxml')
    # Each field label is a <label class="new_font">; the value is read from
    # the element that follows the label's parent
    for item in soup.select("label.new_font"):
        if "HH:MM" in item.text:
            itemval = item.find_parent().find_next_sibling().text.strip()
            print(itemval)
    

    Output:

    6:00.00
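The parent/next-sibling hop in this loop can also be expressed as a single CSS selector, which addresses the ambiguity of a bare div.div. This is a sketch under assumptions: the markup below is hypothetical, modelled on what the loop navigates, and :-soup-contains() requires soupsieve 2.1 or newer (soupsieve is the engine behind BeautifulSoup's select()):

```python
from bs4 import BeautifulSoup

# Hypothetical markup in the shape the loop navigates: the <label> is the
# only child of one <div>, and the value is in the <div> right after it.
# "100" is a placeholder value.
html = """
<div class="row">
  <div><label class="new_font">Rated Power (kW)</label></div>
  <div>100</div>
</div>
<div class="row">
  <div><label class="new_font">Duration at Rated Power (HH:MM)</label></div>
  <div>6:00.00</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# div:has(> label:-soup-contains("HH:MM")) -> the <div> whose direct child
#                                             is a label mentioning HH:MM
# + div                                    -> the <div> immediately after it
value_div = soup.select_one('div:has(> label:-soup-contains("HH:MM")) + div')
print(value_div.get_text(strip=True))  # 6:00.00
```

The selector only matches the div directly after the container of the HH:MM label, so it is much less likely than div.div to pick up an unrelated nested div; if the live page's markup differs from this assumed layout, the selector has to be adjusted.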
    