Scraping data from tag names in Python

伪装坚强ぢ 2021-01-25 06:40

Hi, I am trying to scrape user data from a website. I need the user IDs, which are available in the tag names themselves. I am trying to scrape the UID using python selenium and beautiful s

2 answers
  • 2021-01-25 07:35

    You can use the .get method to read a tag's attribute values easily. For the case in your question:

    soup.get('id')

    Of course, if many tags on the page have an id attribute, first narrow the search to the specific tag with find or find_all, then call .get on the result.
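
    As a minimal sketch of that "narrow down first, then call .get" idea: the markup, the class name "user", and the variable names below are hypothetical, since the real page structure is not shown in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page, which the question does not show.
SAMPLE_HTML = """
<div id="UID_abc123-SRC_1" class="user">Alice</div>
<div id="UID_xyz789-SRC_2" class="user">Bob</div>
<div class="user">No id here</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Narrow down to one tag first, then read its attribute with .get.
first_user = soup.find("div", class_="user")
first_id = first_user.get("id")

# .get returns None instead of raising KeyError when the attribute is missing.
all_ids = [tag.get("id") for tag in soup.find_all("div", class_="user")]

print(first_id)
print(all_ids)
```

    Using .get rather than tag["id"] is a reasonable default here because tags without an id yield None, which can be filtered out afterwards.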

  • 2021-01-25 07:40

    Assuming the id attribute value is always in the format UID_ followed by one or more alphanumeric characters followed by -SRC_ followed by one or more digits:

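    import re
    from bs4 import BeautifulSoup
    
    # `html` is assumed to hold the page source as a string
    soup = BeautifulSoup(html, "html.parser")
    
    # id values look like UID_<alphanumerics>-SRC_<digits>
    pattern = re.compile(r"UID_(\w+)-SRC_\d+")
    tag_id = soup.find("div", id=pattern)["id"]
    
    # group(1) is the captured UID
    uid = pattern.match(tag_id).group(1)
    print(uid)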
    

    Here we use BeautifulSoup to search for an id attribute value that matches a specific regular expression. The pattern contains a capturing group (\w+) that lets us extract the UID value.
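
    If the page contains several such elements, the same idea can be extended with find_all. This is only a sketch under the same assumption about the id format; the helper name extract_uids and the sample markup are hypothetical and not part of the original answer.

```python
import re
from bs4 import BeautifulSoup

# Same assumed format as above: UID_<word characters>-SRC_<digits>.
UID_PATTERN = re.compile(r"UID_(\w+)-SRC_\d+")


def extract_uids(html):
    """Return the UID part of every matching id attribute, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    uids = []
    # No tag name is given, so any element whose id matches the pattern is considered.
    for tag in soup.find_all(id=UID_PATTERN):
        # Beautiful Soup tests the pattern with search(), so re-check with match()
        # to keep only ids that start with the UID_ prefix.
        match = UID_PATTERN.match(tag["id"])
        if match:
            uids.append(match.group(1))
    return uids


sample = (
    '<div id="UID_abc123-SRC_1"></div>'
    '<span id="UID_x9-SRC_22"></span>'
    '<div id="other"></div>'
)
print(extract_uids(sample))
```

    Note that \w also matches underscores, so a stricter character class such as [A-Za-z0-9]+ may be preferable if the UIDs are strictly alphanumeric.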
