Getting a steady flow of messages from twitter

Submitted by 我的未来我决定 on 2019-12-10 20:38:04

Question


I'd like to try to make a simple Twitter client that learns my tastes and automatically finds friends and interesting tweets to provide me with relevant information.

To get started, I need a good stream of random Twitter messages so that I can test a few machine learning algorithms on them.

Which API methods should I use for this? Do I have to poll regularly to get messages, or is there a way to get Twitter to push messages as they are published?

I'd also be interested in learning about any similar projects.


Answer 1:


I use tweepy to access the Twitter API and listen to the public sample stream, which should be roughly a one-percent sample of all tweets. Here is the code I use myself. You can still use the basic auth mechanism for streaming, though Twitter may change that soon. Set the USERNAME and PASSWORD variables accordingly, and make sure you respect the error codes that Twitter returns (this code might not follow the exponential backoff that Twitter requires in some cases).

import sys
import time

import tweepy

# Replace with your own Twitter credentials.
USERNAME = 'your_username'
PASSWORD = 'your_password'


def log_error(msg):
    timestamp = time.strftime('%Y%m%d:%H%M:%S')
    sys.stderr.write("%s: %s\n" % (timestamp, msg))


# Uses the tweepy API of the time; newer tweepy releases have removed
# BasicAuthHandler and StreamListener.
class StreamWatcherListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text)

    def on_error(self, status_code):
        log_error("Status code: %s." % status_code)
        time.sleep(3)
        return True  # keep stream alive

    def on_timeout(self):
        log_error("Timeout.")


def main():
    auth = tweepy.BasicAuthHandler(USERNAME, PASSWORD)
    listener = StreamWatcherListener()
    stream = tweepy.Stream(auth, listener)
    stream.sample()  # connect to the public sample stream


if __name__ == '__main__':
    # Reconnect after errors until interrupted with Ctrl+C.
    while True:
        try:
            main()
        except KeyboardInterrupt:
            break
        except Exception as e:
            log_error("Exception: %s" % str(e))
            time.sleep(3)

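I also set the default timeout of the socket module. I believe I had some problems with Python's default behavior (no timeout at all, so a stalled connection can block forever), so be careful here.

import socket

timeout = 60  # seconds; an example value, adjust as needed
socket.setdefaulttimeout(timeout)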
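
The listener above sleeps a fixed 3 seconds after every error, which may not satisfy the exponential backoff Twitter asks for. Here is a minimal, library-independent sketch of an exponential reconnect schedule. `connect` is a hypothetical callable that opens the stream (for example, a wrapper around `main()` above) and raises on error. The 5-second starting delay, the doubling factor and the 320-second cap are illustrative assumptions, not values taken from Twitter's documentation, so check what Twitter actually requires.

```python
import sys
import time


def backoff_delays(initial=5.0, factor=2.0, maximum=320.0):
    """Yield reconnect delays that grow exponentially up to a cap.

    The default values are illustrative assumptions, not official numbers.
    """
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, maximum)


def run_with_backoff(connect, max_attempts=10, sleep=time.sleep):
    """Call connect() until it returns, backing off after each failure.

    connect is a hypothetical callable that opens the stream and raises on
    errors. After max_attempts failed tries the last error is re-raised.
    """
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except Exception as e:
            if attempt == max_attempts:
                raise  # give up and let the caller see the error
            delay = next(delays)
            sys.stderr.write("Attempt %d failed (%s); retrying in %.0f s\n"
                             % (attempt, e, delay))
            sleep(delay)
```

Passing `sleep` as a parameter keeps the schedule testable without actually waiting; in real use you would leave it at `time.sleep`.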



Answer 2:


I don't think you can get access to the global Twitter timeline. But you can certainly look at your friends' tweets and set up lists to play with. I would recommend using the Twitter4J library: http://twitter4j.org/en/index.html

I may have been mistaken, though: getPublicTimeline() might be what you want.
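
A method like getPublicTimeline() has to be polled rather than pushed, and successive calls can return overlapping tweets. Here is a minimal Python sketch of such a polling loop that drops duplicates by tweet id. `fetch` is a hypothetical callable standing in for whatever client call you use (Twitter4J's getPublicTimeline() in Java, for instance); it is assumed to return a list of dicts with 'id' and 'text' keys, newest first. The 60-second interval is an arbitrary example value.

```python
import time


def poll_timeline(fetch, handle, rounds, interval=60.0, sleep=time.sleep):
    """Poll a timeline `rounds` times and pass each unseen tweet to handle().

    fetch is a hypothetical callable returning a list of dicts with at least
    'id' and 'text' keys, assumed to be ordered newest first.
    Returns the number of distinct tweets handled.
    """
    seen = set()
    for i in range(rounds):
        # Walk each batch oldest first so tweets are handled in order.
        for tweet in reversed(fetch()):
            if tweet['id'] not in seen:
                seen.add(tweet['id'])
                handle(tweet)
        if i < rounds - 1:
            sleep(interval)  # wait before the next request
    return len(seen)
```

The `seen` set grows without bound; a long-running poller would keep only recent ids or remember the highest id handled so far.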




Answer 3:


Twitter has a streaming API for exactly this purpose. It provides a small random sample of all messages posted to Twitter, continually delivered in a 'push' manner as you describe. If you are doing this for some kind of noble purpose, you can request access from Twitter to a larger sample.

From the API docs, you want statuses/sample:

statuses/sample

Returns a random sample of all public statuses. The default access level, ‘Spritzer’ provides a small proportion of the Firehose, very roughly, 1% of all public statuses. The “Gardenhose” access level provides a proportion more suitable for data mining and research applications that desire a larger proportion to be statistically significant sample. Currently Gardenhose returns, very roughly, 10% of all public statuses. Note that these proportions are subject to unannounced adjustment as traffic volume varies.

URL: http://stream.twitter.com/1/statuses/sample.json

Method(s): GET

Parameters: count, delimited

Returns: stream of status element
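
The `delimited` parameter controls how messages are framed on the wire. My understanding of the old streaming docs is that with `delimited=length` each status is preceded by its length in bytes and a newline, and that blank keep-alive lines may appear between messages. Below is a minimal sketch of a reader for that framing, under exactly those assumptions; `iter_delimited` and `frame` are hypothetical helper names, and the reader accepts any binary file-like object (an `io.BytesIO` here, for testing).

```python
import io
import json


def iter_delimited(stream):
    """Yield decoded JSON messages from a length-delimited binary stream.

    Assumed framing: an ASCII byte count and a newline, followed by exactly
    that many bytes of JSON; blank lines are keep-alives.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # connection closed / end of data
        line = line.strip()
        if not line:
            continue  # keep-alive newline between messages
        length = int(line.decode('ascii'))
        payload = stream.read(length)
        if len(payload) < length:
            return  # truncated final message
        yield json.loads(payload.decode('utf-8'))


def frame(obj):
    """Encode obj in the same assumed framing (handy for testing)."""
    body = json.dumps(obj).encode('utf-8') + b'\r\n'
    return str(len(body)).encode('ascii') + b'\r\n' + body


# Example: two tweets and a deletion notice, with a keep-alive in between.
sample = io.BytesIO(frame({'text': 'hello'}) + b'\r\n'
                    + frame({'delete': {'status': {'id': 1}}})
                    + frame({'text': 'world'}))
texts = [m['text'] for m in iter_delimited(sample) if 'text' in m]
print(texts)  # ['hello', 'world']
```

Reading an explicit byte count means the client never has to guess where one JSON object ends, which matters when a status itself contains newlines.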

Personally, I've had some success using the python library tweepy to use the streaming API.




Answer 4:


import json
import sys
import time

import tweepy

# Fill in the credentials of your Twitter application.
ckey = ''
csecret = ''
atoken = ''
asecret = ''


def log_error(msg):
    timestamp = time.strftime('%Y%m%d:%H%M:%S')
    sys.stderr.write("%s: %s\n" % (timestamp, msg))


# Written for tweepy 3.x; StreamListener was removed in tweepy 4.0.
class StreamWatcherListener(tweepy.StreamListener):
    def on_data(self, data):
        # on_data receives the raw JSON string of each message.
        message = json.loads(data)
        # Some messages, e.g. tweet deletions, have no 'text' key.
        if 'text' in message:
            print(message['text'])
        return True  # keep stream alive

    def on_error(self, status_code):
        log_error("Status code: %s." % status_code)
        time.sleep(3)
        return True  # keep stream alive

    def on_timeout(self):
        log_error("Timeout.")


def main():
    auth = tweepy.OAuthHandler(ckey, csecret)
    auth.set_access_token(atoken, asecret)
    listener = StreamWatcherListener()
    stream = tweepy.Stream(auth, listener)
    stream.sample()


if __name__ == '__main__':
    try:
        main()
    except Exception as e:
        log_error("Exception: %s" % str(e))
        time.sleep(3)

Tweepy's BasicAuthHandler is deprecated, so the code above authenticates with OAuth (OAuthHandler) instead. Have fun!
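
Not every message on the stream is a tweet; as the comment in the code above notes, deletion notices carry no 'text' key. A slightly more explicit alternative is to classify each raw message before handling it. This is a minimal sketch: the 'delete' and 'limit' keys are my assumptions about the shapes of the stream's non-tweet notices, and `classify_message` is a hypothetical helper.

```python
import json


def classify_message(raw):
    """Return a (kind, payload) pair for one raw JSON stream message.

    Assumed message shapes: tweets have a 'text' key, deletion notices a
    'delete' key and rate-limit notices a 'limit' key.
    """
    message = json.loads(raw)
    if 'text' in message:
        return 'tweet', message['text']
    if 'delete' in message:
        return 'delete', message['delete']
    if 'limit' in message:
        return 'limit', message['limit']
    return 'other', message  # anything unrecognised, passed through whole
```

Inside `on_data` you would call `classify_message(data)` and print the payload only when the kind is 'tweet', logging or counting the other kinds instead of silently ignoring them.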



Source: https://stackoverflow.com/questions/6445875/getting-a-steady-flow-of-messages-from-twitter
