How to scrape a phone number with Python when it is only shown after a click

眼角桃花 2020-12-22 13:02

I want to scrape a phone number, but it is only displayed after a button is clicked. Is it possible to scrape it directly with Python? My code retrieves the number, but part of it is masked with asterisks (**).

2 Answers
  • 2020-12-22 13:36
    import requests
    from bs4 import BeautifulSoup
    import re
    
    
    def Main():
        r = requests.get(
            "https://hipages.com.au/connect/abcelectricservicespl/service/126298")
        soup = BeautifulSoup(r.text, 'html.parser')
        name = soup.find("h1", {'class': 'sc-AykKI'}).text
        print(name)
        person = soup.find(
            "span", {'class': 'Contact__Item-sc-1giw2l4-2 kBpGee'}).text.strip()
        print(person)
        addr = soup.findAll(
            "span", {'class': 'Contact__Item-sc-1giw2l4-2 kBpGee'})[1].text
        print(addr)
        # The values sit in an escaped JSON string inside a <script> tag,
        # so every quote in the page source is preceded by a backslash.
        print(re.search(r'phone\\":\\"(.*?)\\"', r.text).group(1))
        print(re.search(r'mobile\\":\\"(.*?)\\"', r.text).group(1))
        print(re.search(r'abn\\":\\"(.*?)\\"', r.text).group(1))
        print(re.search(r'website\\":\\"(.*?)\\"', r.text).group(1))
    
    
    Main()
    

    Output:

    ABC Electric Services p/l
    Mal
    222 Henry Lawson DRV, Georges Hall NSW 2198
    1800 801 828
    0408 600 950
    37137808989
    www.abcelectricservices.com.au
    

    Or if you would like to parse the full script:

    import requests
    from bs4 import BeautifulSoup
    import pyjsparser
    import json
    import re
    
    
    def Main():
        r = requests.get(
            "https://hipages.com.au/connect/abcelectricservicespl/service/126298")
        soup = BeautifulSoup(r.text, 'html.parser')
        phone = soup.findAll("script")[5]
        tree = pyjsparser.parse(phone.text)
        print(json.loads(tree["body"][0]["expression"]["right"]["value"]))
    
    
    Main()
    

    Another version:

    import requests
    from bs4 import BeautifulSoup
    import re
    import json
    
    
    def Main():
        r = requests.get(
            "https://hipages.com.au/connect/abcelectricservicespl/service/126298")
        soup = BeautifulSoup(r.text, 'html.parser')
        data = soup.findAll("script")[5].text
        source = re.search(r'__INITIAL_STATE__\s*=\s*"({.*})', data).group(1)
        # Unescape \" into " (unless the backslash is itself escaped), then parse.
        kuku = json.loads(re.sub(r'(?<!\\)\\"', '"', source))
        print(json.dumps(kuku, indent=4))
    
    
    Main()
    
  • 2020-12-22 13:39

    The phone number is already present in the page source. One of the scripts in the source begins with window.__INITIAL_STATE__ and contains an object holding data for multiple providers. You can read the phone numbers for all of them from that object, or load it as JSON and look up the phone number for a particular provider using its store as the key.
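    As a rough sketch of that idea, the snippet below decodes a window.__INITIAL_STATE__ string and collects every phone number found in it. The sample HTML, the "sites" key, and the helper names load_initial_state and find_values are assumptions made for illustration; the real page's object is likely laid out differently, and the double json.loads only works if the JavaScript string literal is also a valid JSON string.

```python
import json
import re

# Hypothetical stand-in for the page source; the real object's layout may differ.
SAMPLE_HTML = r'''
<script>window.__INITIAL_STATE__ = "{\"sites\":{\"126298\":{\"name\":\"ABC Electric Services p/l\",\"phone\":\"1800 801 828\",\"mobile\":\"0408 600 950\"}}}";</script>
'''


def load_initial_state(html):
    """Find the string literal assigned to __INITIAL_STATE__ and decode it."""
    match = re.search(r'__INITIAL_STATE__\s*=\s*("(?:[^"\\]|\\.)*")', html)
    if match is None:
        raise ValueError("__INITIAL_STATE__ not found")
    # The literal is a quoted string whose content is JSON, so decode twice:
    # first the string literal, then the JSON document it contains.
    return json.loads(json.loads(match.group(1)))


def find_values(node, key):
    """Yield every value stored under `key` anywhere in nested dicts/lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


state = load_initial_state(SAMPLE_HTML)
phones = list(find_values(state, "phone"))
mobiles = list(find_values(state, "mobile"))
print(phones, mobiles)
```

    Searching recursively avoids hard-coding the path to each provider, which helps when the exact nesting is unknown. Once the structure is known, a direct lookup by key is simpler.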
