How to build a Python crawler for websites using oauth2

Submitted by ε祈祈猫儿з on 2019-12-04 11:05:34

You should check out the python-oauth2 module. It seems to be the most stable thing out there.

In particular, this blog post has a really good rundown of how to do OAuth easily with Python. The example code uses the Foursquare API, so I would check that out first.

I recently had to get OAuth working with Dropbox, and wrote this module containing the necessary steps to perform the OAuth exchange.

For my system, the simplest thing I could think of was to pickle the OAuth client. My blog package just deserialized the pickled client and requested endpoints with the following function:

get = lambda x: client.request(x, 'GET')[1]

Just make sure your workers have this client object and you should be good to go :-)
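A minimal sketch of the pickle trick described above. `FakeOAuthClient` is a hypothetical stand-in for whatever client object your OAuth library gives you (the class name and the `request()` signature are assumptions for illustration, not from the original answer):

```python
import pickle

# Stand-in for a real OAuth client (e.g. one built with python-oauth2).
# A real client would carry the consumer key and access token and make
# signed HTTP requests.
class FakeOAuthClient:
    def __init__(self, token):
        self.token = token

    def request(self, url, method):
        # A real client performs a signed HTTP request here and
        # returns (response_headers, body).
        return ({"status": "200"}, "body of %s" % url)

# One-time setup: authenticate, then freeze the client to disk.
client = FakeOAuthClient(token="secret-token")
with open("client.pkl", "wb") as f:
    pickle.dump(client, f)

# Later, in a worker process: thaw the client and hit endpoints.
with open("client.pkl", "rb") as f:
    client = pickle.load(f)

get = lambda x: client.request(x, 'GET')[1]
print(get("https://api.example.com/feed"))
```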

Get your app authenticated with OAuth first. Here is an example of how to use OAuth and Python to connect to Twitter: http://popdevelop.com/2010/07/an-example-on-how-to-use-oauth-and-python-to-connect-to-twitter/

Similarly, you can find more examples at https://code.google.com

Then you can use BeautifulSoup or lxml for HTML parsing, extracting the relevant data from the page source returned once your request completes.

BeautifulSoup Documentation - http://www.crummy.com/software/BeautifulSoup/
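As a minimal sketch of that parsing step (the HTML snippet, heading, and link paths below are made up purely for illustration):

```python
from bs4 import BeautifulSoup

html = """
<html><body>
  <h1>Photo gallery</h1>
  <a href="/img/cat.jpg">cat</a>
  <a href="/img/dog.jpg">dog</a>
</body></html>
"""

# Parse the page source with the stdlib parser backend.
soup = BeautifulSoup(html, "html.parser")

# Extract whatever data is relevant to your crawler.
title = soup.h1.get_text()
links = [a["href"] for a in soup.find_all("a")]

print(title)   # Photo gallery
print(links)   # ['/img/cat.jpg', '/img/dog.jpg']
```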

To download images, videos, etc., you can use openers. Read more about openers at http://docs.python.org/library/urllib2.html
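The linked docs cover Python 2's urllib2; in Python 3 the same opener machinery lives in urllib.request. A hedged sketch of downloading a binary file with a custom opener (the URL, filename, and User-Agent string are placeholders):

```python
import urllib.request

# Build an opener with a custom User-Agent -- some sites reject the
# default Python one.
opener = urllib.request.build_opener()
opener.addheaders = [("User-Agent", "my-crawler/0.1")]

def download(url, path):
    """Fetch url and write the raw bytes (image, video, ...) to path."""
    with opener.open(url) as resp, open(path, "wb") as out:
        out.write(resp.read())

# Usage (requires network access):
# download("https://example.com/img/cat.jpg", "cat.jpg")
```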

You don't have to authenticate every time. The provider will give you a token that is good for X hours/day. Eventually you'll get an HTTP 403 status code and will need to re-authenticate.
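That retry-on-403 behaviour can be sketched like this; `fetch` and `reauthenticate` are hypothetical hooks you would wire up to your actual OAuth client (they are not part of any specific library):

```python
def fetch_with_reauth(fetch, reauthenticate, url):
    """Call fetch(url); if the token has expired (HTTP 403),
    re-authenticate once and retry."""
    status, body = fetch(url)
    if status == 403:
        reauthenticate()
        status, body = fetch(url)
    return status, body

# Example with stubbed-out callbacks simulating an expired token:
state = {"authed": False}

def fake_fetch(url):
    return (200, "ok") if state["authed"] else (403, "expired")

def fake_reauth():
    state["authed"] = True

print(fetch_with_reauth(fake_fetch, fake_reauth, "https://api.example.com/feed"))
# (200, 'ok')
```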
