Get webpage contents with Python?

情话喂你 2020-12-04 15:22

I'm using Python 3.1, if that helps.

Anyways, I'm trying to get the contents of this webpage. I Googled for a little bit and tried different things, but they didn't

8 answers
  • 2020-12-04 15:48

    Suppose you want to GET a webpage's content. The following code does it in Python 2 (it will not run on Python 3, where urlopen lives in urllib.request):

    # -*- coding: utf-8 -*-
    # python
    
    # example of getting a web page
    
    from urllib import urlopen
    print urlopen("http://xahlee.info/python/python_index.html").read()
    
  • 2020-12-04 15:49

    Because you're using Python 3.1, you need to use the new Python 3.1 APIs.

    Try:

    import urllib.request
    html = urllib.request.urlopen('http://www.python.org/').read()
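
    In Python 3, read() on the response returns bytes rather than str. A minimal sketch of fetching and decoding in one step, assuming a reasonably recent Python 3; fetch_text, default_charset and the 10-second timeout are illustrative names and values, not part of the original answer:

```python
from urllib.request import urlopen


def fetch_text(url, default_charset="utf-8", timeout=10):
    """Return the body at `url` decoded to str (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes in Python 3
        # Prefer the charset declared in the Content-Type header, if any;
        # otherwise fall back to the assumed default.
        charset = response.headers.get_content_charset() or default_charset
    # Undecodable bytes are replaced instead of raising UnicodeDecodeError.
    return raw.decode(charset, errors="replace")
```

    For example, fetch_text('http://www.python.org/') would return the page as a str.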
    

    Alternately, it looks like you're working from Python 2 examples. Write it in Python 2, then use the 2to3 tool to convert it. On Windows, 2to3.py is in \python31\tools\scripts. Can someone else point out where to find 2to3.py on other platforms?

    Edit

    These days, I write Python 2 and 3 compatible code by using six.

    from six.moves import urllib
    urllib.request.urlopen('http://www.python.org')
    

    Assuming you have six installed, that runs on both Python 2 and Python 3.

  • 2020-12-04 15:51

    A solution that works with both Python 2.X and Python 3.X:

    try:
        # For Python 3.0 and later
        from urllib.request import urlopen
    except ImportError:
        # Fall back to Python 2's urllib2
        from urllib2 import urlopen
    
    url = 'http://hiscore.runescape.com/index_lite.ws?player=zezima'
    response = urlopen(url)
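    # read() returns bytes in Python 3, so decode it; str() would give "b'...'" (assumes UTF-8)
    data = response.read().decode('utf-8')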
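
    The code above assumes the request succeeds. A small sketch of catching the two usual urllib failures, written for Python 3 only; try_fetch and its (body, error) return shape are hypothetical choices made for illustration:

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def try_fetch(url, timeout=10):
    """Return (body_bytes, None) on success or (None, message) on failure."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read(), None
    except HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        # HTTPError is a subclass of URLError, so it must be caught first.
        return None, "HTTP error %d" % err.code
    except URLError as err:
        # No usable response at all: DNS failure, refused connection,
        # unsupported URL scheme, and similar.
        return None, "URL error: %s" % err.reason
```

    A caller can then check the second element before using the body, instead of letting the exception propagate.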
    
  • 2020-12-04 16:03

    The best way to do this these days is to use the 'requests' library:

    import requests
    response = requests.get('http://hiscore.runescape.com/index_lite.ws?player=zezima')
    print(response.status_code)
    print(response.content)
    
  • 2020-12-04 16:03

    Mechanize is a great package for "acting like a browser", if you want to handle cookie state, etc.

    http://wwwsearch.sourceforge.net/mechanize/
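
    If mechanize is not available, the standard library can cover the cookie part on its own. This is a different technique from mechanize, sketched here with http.cookiejar and urllib.request in Python 3; make_cookie_opener is a hypothetical helper name:

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener


def make_cookie_opener():
    """Return (opener, jar); the opener stores and resends cookies via jar."""
    jar = CookieJar()
    # HTTPCookieProcessor saves cookies from HTTP responses into the jar
    # and attaches matching ones to later HTTP requests.
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar
```

    opener.open(url) then behaves like urlopen(url), except that cookies set by one response are sent with later requests made through the same opener.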

  • 2020-12-04 16:03

    You can use urllib2 (urllib.request in Python 3) and parse the HTML yourself.

    Or try Beautiful Soup to do some of the parsing for you.
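
    If you do want to parse the HTML yourself, the standard library's html.parser module is one starting point. A rough sketch that collects link targets in Python 3; LinkCollector and extract_links are hypothetical names, and a real page may need more careful handling:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute
        # names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return the href values found in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

    The input would typically be the decoded text of a page you have already downloaded.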
