HTML data is hidden from urllib

Submitted by ∥☆過路亽.° on 2019-12-11 05:14:51

Question


How do I get the real content from this page: http://kursuskatalog.au.dk/da/course/74960/105E17-Demokrati-og-diktatur-i-komparativt-perspektiv

All I get from the code below is some links to javascript and CSS files. Is there a way out of this?

from urllib.request import urlopen
html = urlopen("http://kursuskatalog.au.dk/da/course/74960/105E17-Demokrati-og-diktatur-i-komparativt-perspektiv")
print(html.read())

Best regards, Kresten


Answer 1:


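The content at this URL is created by JavaScript after the page has loaded. urllib only downloads the initial HTML and does not run any scripts, so all you see are the links to the JavaScript and CSS files.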
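
One way to confirm this for a given page is to check how much visible text the downloaded HTML actually contains compared with the number of script tags. The following is only a minimal sketch using the standard library; the names ShellPageProbe and probe and the sample markup are made up for illustration and are not taken from the page in the question.

```python
from html.parser import HTMLParser


class ShellPageProbe(HTMLParser):
    """Counts <script> tags and the visible text outside <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.script_count = 0
        self.visible_chars = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only count text that a reader would see, ignoring whitespace.
        if self._skip_depth == 0:
            self.visible_chars += len(data.strip())


def probe(html_text):
    """Return (number of script tags, number of visible text characters)."""
    parser = ShellPageProbe()
    parser.feed(html_text)
    parser.close()
    return parser.script_count, parser.visible_chars


# Hypothetical markup resembling a page that is filled in by JavaScript.
sample = """<html><head>
<link rel="stylesheet" href="site.css">
<script src="app.js"></script>
</head><body><div id="root"></div></body></html>"""

print(probe(sample))
```

If the second number is close to zero while the first is not, the page is most likely rendered by JavaScript in the browser. To try it on a real download, pass urlopen(url).read().decode("utf-8") to probe, assuming the page is UTF-8 encoded.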




Answer 2:


What is printed is the 'real' content: it is exactly what the server sends. If you want to see the output of the JavaScript code, you would need to fetch all of the JavaScript, both the inline <script></script> blocks and the external script files, and then run it with a JavaScript interpreter. You would not need the CSS files just to read the content, as they are only used to style the page.

Unfortunately I can think of no alternative.

I hope I was helpful.
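
The first step described in this answer, collecting the inline scripts and the external script URLs, could be sketched roughly as follows. This uses only the standard library; ScriptCollector, collect_scripts, the example.org base URL and the script paths are hypothetical names chosen for illustration. Running the collected scripts is not covered here, since the Python standard library does not include a JavaScript interpreter.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptCollector(HTMLParser):
    """Collects external script URLs and the source of inline scripts."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.external = []  # absolute URLs of <script src="..."> tags
        self.inline = []    # source text of <script>...</script> blocks
        self._in_inline = False

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page URL.
            self.external.append(urljoin(self.base_url, src))
        else:
            self._in_inline = True
            self.inline.append("")

    def handle_data(self, data):
        if self._in_inline:
            self.inline[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_inline = False


def collect_scripts(html_text, base_url):
    """Return (external script URLs, inline script sources)."""
    parser = ScriptCollector(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.external, parser.inline


# Hypothetical page and base URL, for illustration only.
page = '<script src="/js/app.js"></script><script>var x = 1;</script>'
external, inline = collect_scripts(page, "http://example.org/da/course/")
print(external)
print(inline)
```

Each URL in external could then be downloaded with urlopen in the same way as the page itself.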



Source: https://stackoverflow.com/questions/47351045/html-data-is-hidden-from-urllib
