How to parse a website using BeautifulSoup

梦如初夏 2021-01-07 06:50

I am new to web scraping and I want to get the HTML of a page. But when I run the program, the HTML comes back empty and the console shows JavaScript instead.

from bs4 import B         


        
1 answer
  • 2021-01-07 07:24

    The problem is not BeautifulSoup but the server, which needs more information in the request before it gives you access to this page. Right now it sends JavaScript code that redirects you to the login page.

    You need to send a User-Agent header to get this page.

    You can open http://httpbin.org/get in your browser to see the User-Agent it sends.
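
To check the same thing from Python rather than the browser, a minimal sketch like the one below shows the User-Agent that requests would send by default and the one it sends once the header is overridden. It only prepares the request and does not contact any server; the variable names are illustrative.

```python
import requests

# User-Agent that requests sends when no header is supplied,
# of the form "python-requests/<version>".
default_ua = requests.utils.default_user_agent()
print("default:", default_ua)

# Prepare (but do not send) a request with a browser-like User-Agent.
session = requests.Session()
request = requests.Request(
    "GET",
    "http://httpbin.org/get",
    headers={"User-Agent": "Mozilla/5.0"},
)
prepared = session.prepare_request(request)

# The header passed on the request overrides the session default.
custom_ua = prepared.headers["User-Agent"]
print("custom:", custom_ua)
```

Sending the prepared request with `session.send(prepared)` should make httpbin echo the received headers back as JSON, which is the same information the browser shows at that URL.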

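    import requests
    from bs4 import BeautifulSoup

    # Send a browser-like User-Agent; without it the server
    # returns the JavaScript redirect instead of the page.
    headers = {'User-Agent': 'Mozilla/5.0'}

    url = "https://linkedin.com/company/1005"

    r = requests.get(url, headers=headers)
    print(r.text)  # raw HTML as returned by the server

    # Parse the response with Python's built-in HTML parser.
    soup = BeautifulSoup(r.text, 'html.parser')
    print(soup.prettify())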
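
To tell whether the server returned a real page or only a JavaScript redirect like the one described in this answer, a rough heuristic is to check whether the document contains script tags but no visible text. This is a sketch under that assumption, not a reliable detector: the function name `looks_like_script_only` and both sample HTML strings are hypothetical and are not what any real site sends.

```python
from bs4 import BeautifulSoup


def looks_like_script_only(html):
    """Return True if the page has <script> tags but no visible text."""
    soup = BeautifulSoup(html, "html.parser")
    has_scripts = soup.find("script") is not None
    # Drop script/style/noscript content so only readable text remains.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    visible_text = soup.get_text(strip=True)
    return has_scripts and not visible_text


# Hypothetical responses, for illustration only.
redirect_page = (
    '<html><head><script>window.location.href = "/login";</script>'
    "</head><body></body></html>"
)
real_page = (
    "<html><body><h1>Company</h1><p>About us</p>"
    "<script>var x = 1;</script></body></html>"
)

print(looks_like_script_only(redirect_page))  # True
print(looks_like_script_only(real_page))      # False
```

Calling `looks_like_script_only(r.text)` before parsing gives an early hint that the request still lacks something the server expects.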
    