Question
I am trying to extract the hashtags from tweets. All of the tweets are in one column of a CSV file. Although there are resources on parsing strings and putting the extracted hashtags into a list, I haven't come across a solution for parsing tweets that are already stored in a list or dictionary. Here is my code:
with open('hash.csv', 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    for line in reader:
        tweet = line[1:2] #This is the column that contains the tweets
        for x in tweet:
            match = re.findall(r"#(\w+)", x)
            if match: print x
I predictably get 'TypeError: expected string or buffer', because 'tweet' in this case is not a string; it is a list.
Here is where my research has taken me thus far:
Parsing a tweet to extract hashtags into an array in Python
http://www.tutorialspoint.com/python/python_reg_expressions.htm
So I'm iterating through the match list and I'm still getting the whole tweet and not the hashtagged item. I was able to strip the hashtag away but I want to strip everything but the hashtag.
with open('hash.csv', 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    for line in reader:
        tweet = line[1:2]
        print tweet
        for x in tweet:
            match = re.split(r"#(\w+)", x)
            hashtags = [i for i in tweet if match]
Answer 1:
Actually, your problem is probably just an indexing problem. You are calling tweet = line[1:2]. In Python, this says 'take a slice from index 1 up to, but not including, index 2', which is logically what you want. Unfortunately, a slice always returns a list, so you end up with [tweet] instead of tweet! Try changing that line to tweet = line[1] and see if that fixes your problem.
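To make the difference concrete, here is a minimal sketch written for Python 3 (the question's code uses Python 2's print statement). The row contents are made up for illustration; only the slicing-versus-indexing behaviour and the regex come from the discussion above.

```python
import re

# Hypothetical CSV row: an ID in column 0, the tweet text in column 1.
row = ["1001", "Loving the #weather in #SanDiego today"]

sliced = row[1:2]   # a one-element list that contains the tweet
indexed = row[1]    # the tweet itself, as a string

# re.findall expects a string, so pass it the indexed value.
# The capture group (\w+) means only the text after '#' is returned.
hashtags = re.findall(r"#(\w+)", indexed)
print(hashtags)  # ['weather', 'SanDiego']
```

Printing hashtags rather than the tweet is what yields only the tagged words; printing the loop variable would show the whole tweet again.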
On a separate note, this is probably just a typo on your part, but you might want to check your indentation. I think it should look like this:
for line in reader:
    tweet = line[1:2] #This is the column that contains the tweets
    for x in tweet:
        match = re.findall(r"#(\w+)", x)
        if match: print x
unless I'm misunderstanding your logic.
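Putting the pieces together, one possible end-to-end version might look like the sketch below. It is written for Python 3 and rests on a few assumptions that are not in the original post: the helper name extract_hashtags, its column parameter, and the in-memory sample data are all hypothetical, and io.StringIO stands in for the real hash.csv so the example runs on its own.

```python
import csv
import io
import re

# Same pattern as in the question: capture the word after each '#'.
HASHTAG = re.compile(r"#(\w+)")

def extract_hashtags(csv_file, column=1):
    """Return one list of hashtags per CSV row (hypothetical helper)."""
    results = []
    for row in csv.reader(csv_file, delimiter=","):
        if len(row) <= column:
            # Row is too short to have a tweet column; record no hashtags.
            results.append([])
            continue
        # row[column] is a string, so findall can search it directly.
        results.append(HASHTAG.findall(row[column]))
    return results

# Made-up sample data standing in for hash.csv.
sample = io.StringIO(
    '1,"Trying out #python and #regex"\n'
    '2,"No tags here"\n'
)
for tags in extract_hashtags(sample):
    print(tags)
```

With a real file in Python 3, open('hash.csv', newline='') would take the place of the StringIO object; the 'rb' mode in the question is the Python 2 convention for the csv module.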
Source: https://stackoverflow.com/questions/24354067/parsing-a-tweet-inside-a-csv-column-in-python