Parsing a script tag with dicts in BeautifulSoup

野的像风 2021-01-22 04:09

Working on a partial answer to this question, I came across a bs4.element.Tag that is a mess of nested dicts and lists (s, below).

Is there a

2 answers
  • 2021-01-22 05:02

    A simpler way:

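    import json
    
    from bs4 import BeautifulSoup
    import requests
    
    link = 'https://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab&sort=p'
    r = requests.get(link)
    soup = BeautifulSoup(r.text, 'html.parser')
    
    s = soup.find('script', type='application/ld+json')
    
    # The tag's content is a JSON string; json.loads turns it into dicts and lists.
    # Bind the result to a name other than `json` so the module is not shadowed.
    data = json.loads(s.string)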
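
The same lookup can be tried without a network request. The following is only a sketch: the HTML fragment and the helper name `load_ld_json` are made up for illustration, and the guard assumes that a page may lack the script tag, in which case `soup.find` returns `None`.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical page fragment standing in for r.text.
html = """
<html><head>
<script type="application/ld+json">
{"@type": "ItemList", "itemListElement": [{"url": "https://example.com/a"}]}
</script>
</head><body><p>no structured data here</p></body></html>
"""


def load_ld_json(soup):
    """Return the parsed JSON-LD payload, or None if no such script exists."""
    s = soup.find('script', type='application/ld+json')
    # find() returns None when nothing matches; .string is None for an empty tag.
    if s is None or not s.string:
        return None
    return json.loads(s.string)


data = load_ld_json(BeautifulSoup(html, 'html.parser'))
missing = load_ld_json(BeautifulSoup('<p>nothing</p>', 'html.parser'))
```

Here `data` is an ordinary dict, and `missing` is `None` because the second fragment contains no matching script tag.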
    
  • 2021-01-22 05:12

    You can use s.text to get the content of the script. It's JSON, so you can then just parse it with json.loads. From there, it's simple dictionary access:

    import json
    
    from bs4 import BeautifulSoup
    import requests
    
    link = 'https://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab&sort=p'
    r = requests.get(link)
    
    soup = BeautifulSoup(r.text, 'html.parser')
    
    s = soup.find('script', type='application/ld+json')
    
    urls = [el['url'] for el in json.loads(s.text)['itemListElement']]
    
    print(urls)
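
The dictionary-access step can also be explored on its own. This is a minimal sketch using only the standard library; the payload is invented, and only the `itemListElement` and `url` keys are taken from the answer above. It assumes some entries might lack a `url`, so it filters them out instead of raising a `KeyError`.

```python
import json

# Made-up payload; only 'itemListElement' and 'url' come from the answer above.
raw = '''
{
  "@context": "https://schema.org",
  "@type": "ItemList",
  "itemListElement": [
    {"@type": "ListItem", "position": 1, "url": "https://example.com/jobs/1"},
    {"@type": "ListItem", "position": 2},
    {"@type": "ListItem", "position": 3, "url": "https://example.com/jobs/3"}
  ]
}
'''

data = json.loads(raw)

# .get() with a default avoids a KeyError if the top-level key is absent;
# the 'url' in el check skips entries that carry no URL.
urls = [el['url'] for el in data.get('itemListElement', []) if 'url' in el]
print(urls)
```

Once `json.loads` has run, the result is plain Python dicts and lists, so the usual indexing, `.get()`, and comprehensions all apply.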
    