Parsing a tweet inside a csv column in Python

浪子不回头ぞ submitted on 2019-12-25 16:56:29

Question


I am trying to extract the hashtags from tweets. All of the tweets are in one column of a CSV file. Although there are resources on parsing strings and putting the extracted hashtags into a list, I haven't come across a solution for parsing tweets that are already stored in a list or dictionary. Here is my code:

import csv
import re

with open('hash.csv', 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    for line in reader:
        tweet = line[1:2] #This is the column that contains the tweets
    for x in tweet:
        match = re.findall(r"#(\w+)", x)
        if match: print x

As expected, I get 'TypeError: expected string or buffer', because 'tweet' in this case is not a string; it is a list.

Here is where my research has taken me thus far:

Parsing a tweet to extract hashtags into an array in Python

http://www.tutorialspoint.com/python/python_reg_expressions.htm


So I'm iterating through the match list and I'm still getting the whole tweet and not the hashtagged item. I was able to strip the hashtag away but I want to strip everything but the hashtag.

with open('hash.csv', 'rb') as f:
        reader = csv.reader(f, delimiter=',')
        for line in reader:
            tweet = line[1:2]
            print tweet
            for x in tweet:
                match = re.split(r"#(\w+)", x)
                hashtags = [i for i in tweet if match]

Answer 1:


Actually, your problem is probably just a small mistake in how you read the column. You are calling tweet = line[1:2]. In Python, this means 'take a slice from index 1 up to, but not including, index 2', which is logically what you want. Unfortunately, a slice always returns a list, so you end up with [tweet] instead of tweet!

Try changing that line to tweet = line[1] and see if that fixes your problem.
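To see the difference between slicing and indexing, here is a minimal sketch on a made-up row. The row contents are hypothetical and not taken from the asker's file, and the snippet uses Python 3 print syntax.

```python
import re

# Hypothetical CSV row: an id in column 0, the tweet text in column 1.
line = ["1001", "Loving the #weather in #Lisbon today"]

sliced = line[1:2]   # slice: a one-element list containing the tweet
indexed = line[1]    # index: the tweet string itself

# re.findall needs a string. With one capturing group in the pattern,
# it returns only the captured text, i.e. each tag without the '#'.
hashtags = re.findall(r"#(\w+)", indexed)

print(sliced)
print(indexed)
print(hashtags)
```

Note that the list returned by re.findall holds the hashtags themselves, so printing it (rather than the tweet variable) is what yields only the tagged words.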


On a separate note, this is probably just a typo on your part, but I think you might want to check your indentation -- I think it should look like

for line in reader:
  tweet = line[1:2] #This is the column that contains the tweets
  for x in tweet:
    match = re.findall(r"#(\w+)", x)
    if match: print x

unless I'm misunderstanding your logic.
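Putting both points together, one possible shape for the whole loop is sketched below. It assumes Python 3 and that the tweet text sits in column index 1. The helper name hashtags_per_row and the sample rows are made up for illustration, and an in-memory file stands in for hash.csv.

```python
import csv
import io
import re

HASHTAG_RE = re.compile(r"#(\w+)")


def hashtags_per_row(csv_file, column=1):
    """Return one list of hashtags per CSV row, read from the given column."""
    results = []
    for row in csv.reader(csv_file, delimiter=","):
        if len(row) <= column:
            results.append([])  # short or empty row: no tweet text to scan
            continue
        # row[column] is a string, so findall can scan it directly.
        results.append(HASHTAG_RE.findall(row[column]))
    return results


# Hypothetical stand-in for hash.csv, kept in memory so the sketch is self-contained.
sample = io.StringIO(
    '1,"Off to #PyCon, see you there #python"\n'
    '2,no tags in this one\n'
)
print(hashtags_per_row(sample))
```

With a real file in Python 3, the same function could be called as hashtags_per_row(open('hash.csv', newline='')), ideally inside a with block. The 'rb' mode in the question is the Python 2 convention for the csv module.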



Source: https://stackoverflow.com/questions/24354067/parsing-a-tweet-inside-a-csv-column-in-python
