Question
Pulling from a couple of different examples, I've been able to create a simple Python script that parses the JSON output from the Twitter Streaming API and prints out the screen_name and text for each tweet. I would like to modify my code to also classify each tweet as one of the following:
(1) Retweet --> There is an "RT @anyusername" somewhere in the tweet text
(2) Mention --> There is an "@anyusername" but no "RT @anyusername" in the tweet text
(3) Tweet --> There is neither an "RT @anyusername" nor an "@anyusername" in the tweet text
I can do this in Excel with the following formula, but I can't figure out how to do it in Python yet.
=IF(IFERROR(FIND("RT @",B2)>0,"False"),"Retweet",IF(IFERROR(FIND("@",B2)>0,"False"),"Mention","Tweet"))
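For comparison, the nested IF/FIND formula translates almost directly into Python's `in` operator, which, like FIND, is a case-sensitive substring test. A minimal sketch; the function name `classify_like_excel` is hypothetical:

```python
def classify_like_excel(text):
    """Mirror the nested IF/FIND formula: check "RT @" first, then "@"."""
    # FIND("RT @", B2) succeeds -> "Retweet"
    if "RT @" in text:
        return "Retweet"
    # FIND("@", B2) succeeds (and "RT @" did not) -> "Mention"
    if "@" in text:
        return "Mention"
    # neither substring found -> "Tweet"
    return "Tweet"


print(classify_like_excel("RT @alice: great post"))  # Retweet
print(classify_like_excel("thanks @bob"))            # Mention
print(classify_like_excel("just a plain tweet"))     # Tweet
```

Like the spreadsheet formula, this treats any "@" character as a mention, so text such as "meet @ 5pm" is classified as Mention; requiring a username after the "@" (for example with a regular expression) avoids that.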
Existing Code
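import json
import sys
from csv import writer

# Python 3: open the output with newline='' (as the csv module expects) and
# utf-8 encoding, so no manual .encode('utf8') step is needed per value
with open(sys.argv[1], encoding='utf-8') as in_file, \
        open(sys.argv[2], 'w', newline='', encoding='utf-8') as out_file:
    csv_writer = writer(out_file)
    csv_writer.writerow(['tweet_author', 'tweet_text', 'tweet_type'])
    for line in in_file:
        try:
            tweet = json.loads(line)
        except ValueError:
            # skip unparseable lines; a bare "pass" would fall through and reuse
            # the previous tweet (or raise NameError on the very first line)
            continue
        if 'text' not in tweet:
            # stream messages such as deletion notices carry no tweet text
            continue
        tweet_text = tweet['text']
        row = (
            tweet['user']['screen_name'],
            tweet_text
        )
        csv_writer.writerow(row)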
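A sketch of how the loop could fill the tweet_type column using the three rules listed above. It assumes one JSON object per input line and uses re.search so each pattern may appear anywhere in the text; the names `classify` and `write_tweets` are hypothetical, and the function takes open file objects so it can be tried out with io.StringIO:

```python
import csv
import io
import json
import re


def classify(text):
    """Return "Retweet", "Mention" or "Tweet" following the rules in the question."""
    if re.search(r"RT\s@\w+", text):   # "RT @username" anywhere in the text
        return "Retweet"
    if re.search(r"@\w+", text):       # some other "@username"
        return "Mention"
    return "Tweet"                     # no "@username" at all


def write_tweets(in_file, out_file):
    """Read one JSON tweet per line and write author, text and type as CSV rows."""
    writer = csv.writer(out_file)
    writer.writerow(["tweet_author", "tweet_text", "tweet_type"])
    for line in in_file:
        try:
            tweet = json.loads(line)
        except ValueError:
            continue  # skip blank or malformed lines
        if "text" not in tweet:
            continue  # skip messages that carry no tweet text
        text = tweet["text"]
        writer.writerow([tweet["user"]["screen_name"], text, classify(text)])


if __name__ == "__main__":
    sample = io.StringIO(
        '{"user": {"screen_name": "alice"}, "text": "RT @bob: hello"}\n'
        '{"user": {"screen_name": "carol"}, "text": "hi @dave"}\n'
        "not json\n"
        '{"user": {"screen_name": "erin"}, "text": "plain tweet"}\n'
    )
    out = io.StringIO()
    write_tweets(sample, out)
    print(out.getvalue())
```

To use it on real files, open the input with encoding="utf-8" and the output with newline="" and encoding="utf-8", then pass both file objects to write_tweets.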
Answer 1:
I don't have a Python interpreter at hand, but it should be something similar to this:
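import re

def classify_tweet(tweet_text):
    # "RT @username" anywhere in the text -> retweet
    # (re.search scans the whole string; re.match only checks the start)
    if re.search(r'RT\s@\w+', tweet_text):
        return "Retweet"
    # any other "@username" -> mention
    if re.search(r'@\w+', tweet_text):
        return "Mention"
    # no "@username" at all -> ordinary tweet
    return "Tweet"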
Note: this works for the classification, but if you also want to extract the usernames (e.g. @USERNAME), you will need to tweak it a little more.
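One possible tweak for pulling out the usernames: re.findall with a capturing group returns only the captured text, without the "@". A sketch assuming screen names consist only of ASCII letters, digits and underscores; `extract_usernames` and `retweeted_user` are hypothetical helper names:

```python
import re

# Assumption: screen names use only ASCII letters, digits and underscores.
MENTION_RE = re.compile(r"@([A-Za-z0-9_]+)")
RETWEET_RE = re.compile(r"RT\s@([A-Za-z0-9_]+)")


def extract_usernames(text):
    """Return every @username in the text, without the '@', in order of appearance."""
    return MENTION_RE.findall(text)


def retweeted_user(text):
    """Return the username following "RT @", or None if there is none."""
    match = RETWEET_RE.search(text)
    return match.group(1) if match else None


print(extract_usernames("RT @alice: thanks @bob and @carol_99!"))  # ['alice', 'bob', 'carol_99']
print(retweeted_user("RT @alice: thanks @bob"))                   # alice
```

Text matching is still heuristic here: an email address such as a@b.com would produce a false match ("b").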
Source: https://stackoverflow.com/questions/23526410/classify-type-of-tweet-tweet-retweet-mention-based-on-tweet-text-in-python