Classify type of tweet (tweet/retweet/mention) based on tweet text in Python

Submitted by 别等时光非礼了梦想 on 2019-12-25 03:15:55

Question


Pulling from a couple of different examples, I've been able to create a simple Python script that parses the JSON output from the Twitter Streaming API, and prints out the screen_name and text for each tweet. I would like to modify my code to also classify each tweet as one of the following:

(1) Retweet --> There is an "RT @anyusername" somewhere in the tweet text column

(2) Mention --> There is an "@anyusername" but no "RT @anyusername" in the tweet column

(3) Tweet --> There is no "RT @anyusername" nor any "@anyusername" in the tweet column

I can do this in Excel with the following formula, but I can't figure out how to do it in Python yet.

=IF(IFERROR(FIND("RT @",B2)>0,"False"),"Retweet",IF(IFERROR(FIND("@",B2)>0,"False"),"Mention","Tweet"))

Existing Code

import json
import sys
from csv import writer

# Usage: python script.py input.json output.csv
with open(sys.argv[1], encoding='utf8') as in_file, \
     open(sys.argv[2], 'w', encoding='utf8', newline='') as out_file:
    csv = writer(out_file)
    # Header row; tweet_type is the column that still needs to be filled in
    csv.writerow(['tweet_author', 'tweet_text', 'tweet_type'])

    for line in in_file:
        try:
            tweet = json.loads(line)
        except ValueError:
            # Skip lines that are not valid JSON instead of reusing the previous tweet
            continue

        tweet_text = tweet['text']

        row = (
            tweet['user']['screen_name'],
            tweet_text,
        )
        csv.writerow(row)

Answer 1:


I don't have a Python interpreter at hand, but it should be something similar to this:

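import re


def url_match(tweet):
    # re.search scans the whole text; re.match would only test the start of the tweet
    if re.search(r'RT\s@\w+', tweet):
        # "RT @username" appears somewhere in the text
        return "RT"
    elif re.search(r'@\w+', tweet):
        # "@username" appears, but not as part of a retweet marker
        return "mention"
    else:
        return "tweet"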
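
For comparison, the Excel formula in the question checks for plain substrings rather than patterns, and that translates almost directly to Python's `in` operator. The following is only a sketch: the function name `classify_tweet` and the capitalised labels are assumptions taken from the question's wording.

```python
def classify_tweet(text):
    """Return 'Retweet', 'Mention' or 'Tweet' using plain substring checks."""
    # Test for "RT @" first, because every retweet also contains "@"
    if "RT @" in text:
        return "Retweet"
    if "@" in text:
        return "Mention"
    return "Tweet"


for sample in ("RT @alice: hello", "thanks @bob!", "just a plain tweet"):
    print(sample, "->", classify_tweet(sample))
```

Like the Excel formula, this treats any "@" as a mention, including one inside an e-mail address.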

Note: this will work for this classification, but if you want to retrieve usernames i.e. @USERNAME you will have to tweak this a little more.
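
One possible tweak for retrieving the usernames is `re.findall` with a capturing group. This is a sketch under the assumption that a username is a run of letters, digits, and underscores (`\w+`); the helper name `extract_usernames` is hypothetical.

```python
import re

# Assumption: a username is "@" followed by word characters (letters, digits, underscore)
USERNAME_RE = re.compile(r'@(\w+)')


def extract_usernames(text):
    """Return every username mentioned in text, without the leading '@'."""
    return USERNAME_RE.findall(text)


print(extract_usernames("RT @alice: thanks @bob_42!"))  # ['alice', 'bob_42']
```

A tweet with no "@" gives an empty list, so the length of the result can also be used to tell a plain tweet from a mention.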



Source: https://stackoverflow.com/questions/23526410/classify-type-of-tweet-tweet-retweet-mention-based-on-tweet-text-in-python
