Getting the final destination of a JavaScript redirect on a website

悲哀的现实 2020-12-20 03:40

I parse a website with Python. The site uses a lot of redirects, and it performs them by calling JavaScript functions.

So when I just use urllib to parse the site, it doesn't

2 Answers
  • 2020-12-20 04:03

    It doesn't sound like fun to me, but every JavaScript function is also an object, so you can read the function's source rather than call it, and perhaps the URL is in it. Otherwise, that function may call another one, which you would then have to recurse into. Again, it doesn't sound like fun, but it might be doable.
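
The idea of reading the function instead of calling it can be approximated from Python by searching the page source for redirect statements. The following is only a minimal sketch: the helper name `find_js_redirects`, the regular expression, and the sample page are assumptions made for illustration, and the pattern only catches URLs written as string literals in `location` assignments or `location.replace()` / `location.assign()` calls. URLs that are built dynamically, or that sit in external script files, would still need the recursion described above or a real browser.

```python
import re
from urllib.parse import urljoin

# Matches assignments such as  window.location = "..."  or  location.href = '...'
# and calls such as  location.replace("...")  or  location.assign("...").
REDIRECT_PATTERN = re.compile(
    r"""(?:window\.|document\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']"""
    r"""|location\.(?:replace|assign)\(\s*["']([^"']+)["']\s*\)"""
)


def find_js_redirects(html, base_url):
    """Return absolute URLs that literal JavaScript redirects in html point to."""
    targets = []
    for assigned, called in REDIRECT_PATTERN.findall(html):
        # Exactly one of the two capture groups is non-empty per match.
        target = assigned or called
        targets.append(urljoin(base_url, target))
    return targets


# Hypothetical page source, standing in for what urllib would return.
sample_html = """
<script>
function goNext() { window.location.href = "/landing?id=7"; }
function goAway() { location.replace('http://example.com/other'); }
</script>
"""

redirects = find_js_redirects(sample_html, "http://yourlink.com/start")
print(redirects)
```

Relative targets are resolved against the page URL with `urljoin`, so the result can be passed straight back to urllib for the next request.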

  • 2020-12-20 04:11

    I looked into Selenium. As long as you are not running as a pure script (that is, on a machine with no display, where a "normal" browser can't be started), the solution is actually quite simple:

    import time

    from selenium import webdriver

    driver = webdriver.Firefox()
    link = "http://yourlink.com"
    driver.get(link)

    # Poll once per second until the JavaScript redirect changes the URL
    while link == driver.current_url:
        time.sleep(1)

    redirected_url = driver.current_url


    For my use case this is more than enough. Selenium can also interact with forms and send keystrokes to the website.
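
One caveat with the polling loop in this answer is that it never ends if the page does not redirect. A possible variation, sketched here under assumptions, factors the wait into a helper with a timeout. The name `wait_for_url_change` and its parameters are hypothetical, not part of Selenium; the helper takes any callable that returns the current URL, so the demonstration below uses a stand-in instead of a real browser.

```python
import time


def wait_for_url_change(get_url, start_url, timeout=30.0, poll=1.0):
    """Poll get_url() until it differs from start_url or timeout seconds pass.

    Returns the new URL, or None if the URL did not change in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = get_url()
        if current != start_url:
            return current
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


# Stand-in for a browser: reports the old URL twice, then the new one.
fake_urls = iter([
    "http://yourlink.com",
    "http://yourlink.com",
    "http://yourlink.com/final",
])
result = wait_for_url_change(
    lambda: next(fake_urls), "http://yourlink.com", timeout=5.0, poll=0.01
)
print(result)
```

With the driver from the answer, the call would look like `wait_for_url_change(lambda: driver.current_url, link)`. Selenium's own `WebDriverWait` combined with `expected_conditions.url_changes` covers the same need without a hand-written loop.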
