How to follow a redirect with urllib?

Submitted by 柔情痞子 on 2019-12-11 09:02:43

Question


I'm writing a script in Python 3 that accesses a page like:

example.com/daora/zz.asp?x=qqrzzt

using urllib.request.urlopen("example.com/daora/zz.asp?x=qqrzzt"), but this code just gives me the same page (example.com/daora/zz.asp?x=qqrzzt), whereas in the browser I get redirected to a page like:

example.com/egg.aspx

What can I do to retrieve

example.com/egg.aspx

and not the

example.com/daora/zz.asp?x=qqrzzt

I think this is the relevant code. It is the source of "example.com/daora/zz.asp?x=qqrzzt":

<head>

<script language="JavaScript">

<!--
    function Submit()

    {
        document.formzz.submit();
    }
-->
</script>

</head>

<body bgcolor="#FFFFFF" leftmargin="0" topmargin="0" marginwidth="0" marginheight="0" onLoad="javascript:Submit();">

<form name="formZZ" method="post" action="http://example.com/egg.aspx">

<input type="hidden" name="token" value="UFASGFJKASGDJFGAJS">

</form>

Answer 1:


urllib.request follows HTTP redirects automatically; you don't need to do anything.
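To see that automatic behaviour in isolation, here is a minimal local sketch. The handler class, the /old and /new paths, and the helper names are all made up for the demo; it only illustrates that the default opener follows a 302 response and that geturl() reports the final address.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectDemoHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /old answers with a 302 pointing at /new."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.end_headers()
        else:
            body = b"final page"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_final_url(url):
    # An empty ProxyHandler skips any proxy configured in the environment,
    # which matters only because the demo server is local. The default
    # redirect handler is still installed by build_opener().
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as response:
        return response.geturl(), response.read()


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), RedirectDemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        base = "http://127.0.0.1:%d" % server.server_port
        return base, fetch_final_url(base + "/old")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    base, (final_url, body) = run_demo()
    print(final_url, body)
```

If geturl() returns the same address you requested, as it does for the page in the question, the server never sent an HTTP redirect.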

The problem here is that there is no redirect to follow. The web page uses JavaScript to submit a form as soon as it has loaded. urllib just fetches the page; it doesn't implement a browser DOM or run JavaScript code.

Depending on how general you need your script to be, the simplest solution may be something hacky. For example, if you're just trying to spider 500 pages that all have a similar structure but different details, just find the action of the first form and request that URL, sending along the form's hidden fields if the server expects them.
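A rough sketch of that hacky approach, using only the standard library. The names FirstFormParser, build_form_request and follow_auto_submit are invented for this example, and it assumes the page is UTF-8 and that the first form is the one the script submits; treat it as a starting point rather than a general solution.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin, urlsplit


class FirstFormParser(HTMLParser):
    """Collect the action, method and named <input> values of the first form."""

    def __init__(self):
        super().__init__()
        self.action = None      # stays None if the page has no <form>
        self.method = "get"
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done and not self._in_form:
            self._in_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_form_request(page_url, html):
    """Build the request a browser would send when submitting the first form."""
    parser = FirstFormParser()
    parser.feed(html)
    if parser.action is None:
        raise ValueError("no <form> found in page")
    target = urljoin(page_url, parser.action)  # resolves relative actions
    data = urlencode(parser.fields)
    if parser.method == "post":
        return urllib.request.Request(
            target, data=data.encode("ascii"), method="POST")
    # For GET forms the fields replace the query string of the action URL.
    return urllib.request.Request(urlsplit(target)._replace(query=data).geturl())


def follow_auto_submit(page_url):
    """Fetch a page, submit its first form, and return the resulting page."""
    with urllib.request.urlopen(page_url) as response:
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        html = response.read().decode("utf-8", errors="replace")
    with urllib.request.urlopen(build_form_request(page_url, html)) as response:
        return response.geturl(), response.read()
```

For the page in the question, build_form_request would produce a POST to http://example.com/egg.aspx carrying the hidden token field. Note that urlopen needs a full URL including the scheme, for example "http://example.com/daora/zz.asp?x=qqrzzt".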

Also, if fetching the pages and processing them are two distinct steps, you may want to write the fetcher as a very simple JavaScript/Greasemonkey script (running in the browser, so it already has a working DOM implementation, etc.) and a separate, fancier processing script in Python (which just operates on the finally fetched or generated HTML pages).

If you need to be fully general, the simplest solution is probably to use the Selenium browser automation framework. (Or, maybe, PyWin32 or PyObjC to automate IE or WebKit directly.)

If you want the best possible solution, and have infinite resources… write your own implementation of the DOM and hook up your favorite JavaScript interpreter (probably SpiderMonkey or V8). That's only about two-thirds as much work as writing a new browser. (And you may be able to find pieces that get you 80% of the way there. For example, if you're willing to use Jython instead of CPython as your Python interpreter, HtmlUnit is pretty slick.)



Source: https://stackoverflow.com/questions/16157719/how-to-follow-a-redirect-with-urllib
