Question
I'm using Python 3.6.6 for this.
I'm trying to get the current version number of PyCharm from the PyCharm download page (https://www.jetbrains.com/pycharm/download/#section=windows). The version number is displayed prominently, yet I can't extract it because I don't know how to process the JavaScript properly.
I tried parsing it out with requests_html from this element:
<li>Version: <span data-code="PCP" data-release-version=""></span></li>
After JavaScript has done its job, this part should look like this:
<li>Version: <span data-code="PCP" data-release-version="">2018.1.4</span></li>
Here is my script, which doesn't work:
from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://www.jetbrains.com/pycharm/download/#section=windows')
r.html.render()
item = r.html.find('<span data-code="PCP" data-release-version=""></span>')
print(item)
I don't mind if some extra markup is left over; I would simply filter it out with a regex. Still, the only thing I get from this is:
[<Element 'span' data-code='PCP' data-release-version=''>]
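The two snippets above already show the core problem: the version string exists only in the markup after JavaScript has run. As a stdlib-only illustration (it uses html.parser rather than requests_html, and the names VersionSpanParser and extract_version are hypothetical), the following sketch pulls the text out of that span in both versions of the markup:

```python
from html.parser import HTMLParser


class VersionSpanParser(HTMLParser):
    """Hypothetical helper: collects the text inside
    <span data-code="PCP" data-release-version=...>.

    It only parses the markup it is given; it does not run any JavaScript.
    """

    def __init__(self):
        super().__init__()
        self._inside = False  # True while inside the target span
        self.version = None   # None means the span was never seen

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (tag == "span" and attributes.get("data-code") == "PCP"
                and "data-release-version" in attributes):
            self._inside = True
            self.version = ""  # span found; its text may still be empty

    def handle_endtag(self, tag):
        if tag == "span":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.version += data


def extract_version(markup):
    parser = VersionSpanParser()
    parser.feed(markup)
    parser.close()
    return parser.version


# Markup before JavaScript runs (taken from the question).
raw = '<li>Version: <span data-code="PCP" data-release-version=""></span></li>'
# Markup after JavaScript has filled in the span (also from the question).
rendered = '<li>Version: <span data-code="PCP" data-release-version="">2018.1.4</span></li>'

print(repr(extract_version(raw)))       # ''
print(repr(extract_version(rendered)))  # '2018.1.4'
```

Note that in requests_html itself, find() expects a CSS selector such as 'span[data-code="PCP"]' rather than a literal HTML tag, and a matched Element's text is available through its .text attribute.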
Answer 1:
Update:
I found a solution myself. It seems render() needs a short sleep, presumably so the page's JavaScript has time to fill in the value. I also used XPath instead of find().
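from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://www.jetbrains.com/pycharm/download/#section=windows')
# Render the page in headless Chromium; sleep=0.1 waits 0.1 s after the initial render so the JavaScript can fill in the version
r.html.render(sleep=0.1)
# Absolute XPath to the version <span>; the trailing text() makes xpath() return the span's text as a string
item = r.html.xpath('/html/body/div[1]/div[2]/div/div[2]/div[1]/div[2]/ul[1]/li[1]/span/text()')
print('------------------------------------------------')
print(item)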
My result:
['2018.1.4']
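The absolute XPath above depends on the exact nesting of every div on the page, so it will likely break as soon as the layout changes. A path keyed on the span's own attributes is usually sturdier. The sketch below illustrates the idea with the standard library's xml.etree.ElementTree on a small, hypothetical, well-formed page fragment (RENDERED_PAGE and find_version are made-up names; ElementTree cannot parse arbitrary real-world HTML, and its limited XPath has no text() step, so the element's .text is read instead). With requests_html, the analogous call would presumably be r.html.xpath('//span[@data-code="PCP"][@data-release-version]/text()'), but whether that matches exactly one element on the live page is untested.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the rendered download page.
# The real page is much larger and is not guaranteed to be well-formed XML.
RENDERED_PAGE = """
<html>
  <body>
    <div>
      <div>
        <ul>
          <li>Version: <span data-code="PCP" data-release-version="">2018.1.4</span></li>
        </ul>
      </div>
    </div>
  </body>
</html>
"""


def find_version(page):
    root = ET.fromstring(page.strip())
    # Relative path: any <span> whose data-code is "PCP" and that carries a
    # data-release-version attribute, wherever it sits in the tree.
    span = root.find(".//span[@data-code='PCP'][@data-release-version]")
    # span.text is None when the span is empty (i.e. not yet rendered).
    return None if span is None else span.text


print(find_version(RENDERED_PAGE))  # 2018.1.4
```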
Source: https://stackoverflow.com/questions/51403755/get-renderd-javascript-lines-from-website-in-python