Remove usernames from Twitter data using Python

轻奢々 2021-01-27 06:42

I have fetched some data from Twitter using Python. Now I want to preprocess it. How can I remove usernames if the tweet has a username between two words and there is no space am

3 answers
  • 2021-01-27 07:05

    Code

    import re
    import numpy as np

    def remove_pattern(input_txt, pattern):
        # Delete every substring of input_txt that matches pattern.
        return re.sub(pattern, '', input_txt)

    # combi is assumed to be a pandas DataFrame with a 'tweet' column.
    combi['tidy_tweet'] = np.vectorize(remove_pattern)(combi['tweet'], r"@[\w]*")
    

    Thanks :)

  • 2021-01-27 07:05
    import re
    Tweet = "Hello@username"
    Tweet = re.sub(r'@[\w]+', '', Tweet)
    

    Building on @NegiBabu's solution: Twitter handles can contain only letters, digits, and underscores, so [\w] is a better regex for this task. For example, with my proposed regex @app#le is not matched as a whole; the match stops at @app.
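
To make the difference concrete, here is a minimal sketch that runs both patterns from this thread over a few made-up tweets. The names `WORD_HANDLE`, `NON_SPACE`, and `samples` are hypothetical, chosen only for this illustration. Note that in Python 3 `\w` also matches non-ASCII letters by default, so it is slightly broader than what Twitter accepts.

```python
import re

# Handle made of word characters only (letters, digits, underscore).
WORD_HANDLE = re.compile(r'@\w+')
# Everything from '@' up to the next whitespace character.
NON_SPACE = re.compile(r'@[^\s]+')

# Made-up sample tweets, not real data.
samples = [
    "Hello@username",
    "ping @app#le now",
    "thanks @user, see you",
]

for text in samples:
    print(repr(text))
    print("  word chars only:", repr(WORD_HANDLE.sub('', text)))
    print("  up to whitespace:", repr(NON_SPACE.sub('', text)))
```

On the second sample the word-character pattern leaves `#le` behind, while the non-space pattern removes the whole `@app#le` token. On the third, the non-space pattern also swallows the comma after the handle. Which behavior is preferable depends on how the rest of the preprocessing treats punctuation.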

  • 2021-01-27 07:11
    import re
    Tweet = "Hello@username"
    Tweet = re.sub(r'@[^\s]+', '', Tweet)
    

    This code removes @username and leaves Hello in place.
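
Removing a handle from the middle of a tweet usually leaves a double space behind. As a rough sketch of one way to tidy that up during preprocessing, the helper below strips handles and then collapses runs of whitespace. The function name `strip_handles` and the sample tweets are hypothetical, and the `@\w+` pattern is just one of the options discussed in this thread.

```python
import re

# Assumed handle pattern: '@' followed by word characters.
HANDLE = re.compile(r'@\w+')

def strip_handles(tweet):
    """Remove @handles, then collapse leftover runs of whitespace."""
    without_handles = HANDLE.sub('', tweet)
    # split() with no arguments splits on any whitespace run and
    # drops leading/trailing whitespace, so join() rebuilds clean text.
    return ' '.join(without_handles.split())

# Made-up sample tweets, not real data.
tweets = ["Hello@username", "good morning @user_1 have a nice day"]
cleaned = [strip_handles(t) for t in tweets]
print(cleaned)
```

This keeps the handle removal and the whitespace cleanup in one place, so the same function can be applied to every tweet in a list or a DataFrame column.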
