Question
I am new to web scraping and I want to get the HTML of a page. But when I run the program, the HTML I get back is empty and the console shows JavaScript instead.
from bs4 import BeautifulSoup
import requests
import urllib
url = "https://linkedin.com/company/1005"
r = requests.get(url)
html_content = r.text
soup = BeautifulSoup(html_content,'html.parser')
print (soup.prettify())
Answer 1:
The problem is not BeautifulSoup but the server, which needs more information in the request before it gives you access to this page. Right now it sends JavaScript code that redirects you to the login page.
You need a User-Agent header to get this page.
You can open http://httpbin.org/get in your browser to see the User-Agent it sends.
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0'}
url = "https://linkedin.com/company/1005"
r = requests.get(url, headers=headers)
print(r.text)
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.prettify())
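To see why the header makes a difference, it may help to compare the User-Agent that requests sends by default with the overridden one. The sketch below only prepares the requests and never contacts the server; the helper name effective_user_agent is made up for this illustration and is not part of requests.

```python
import requests


def effective_user_agent(url, headers=None):
    """Return the User-Agent that requests would send for a GET to url.

    Hypothetical helper for illustration: it prepares the request through
    a Session (which merges in the default headers) but does not send it.
    """
    session = requests.Session()
    request = requests.Request("GET", url, headers=headers)
    prepared = session.prepare_request(request)
    return prepared.headers["User-Agent"]


url = "https://linkedin.com/company/1005"

# Without a custom header, requests identifies itself as python-requests/<version>,
# which a server can easily tell apart from a browser.
default_agent = effective_user_agent(url)

# With the header from the answer, the default value is replaced.
custom_agent = effective_user_agent(url, {"User-Agent": "Mozilla/5.0"})

print("default:", default_agent)
print("custom: ", custom_agent)
```

If the page still comes back as a redirect script after the header is set, the server is probably checking something beyond the User-Agent (cookies or a login, for example), which this sketch does not cover.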
Source: https://stackoverflow.com/questions/40255128/how-to-parse-the-website-using-beautifulsoup