I am having some trouble filtering a pandas dataframe on a column (let's call it column_1) whose data type is a list. Specifically, I want to return only rows such that co
Hi, for long-term use you can wrap the whole workflow in functions and apply them wherever you need. Since you did not include an example dataset, I am using one of my own. Suppose I have a table of text posts: first I extract the hashtags from each post into a list, then I check that list against the hashtags I am interested in and filter the rows accordingly.
import re

import pandas as pd

# find all the hashtags in the message
def find_hashtags(post_msg):
    combo = r'#\w+'
    rx = re.compile(combo)
    hash_tags = rx.findall(post_msg)
    return hash_tags

# return True if the two tag lists share at least one tag, otherwise False
def match_tags(tag_list, htag_list):
    matched_items = bool(set(tag_list).intersection(htag_list))
    return matched_items
test_data = [{'text': 'Head nipid mõnusateks sõitudeks kitsastel tänavatel. #TipStop'},
{'text': 'Homses Rooli Võimus uus #Peugeot208!\nVaata kindlasti.'},
{'text': 'Soovitame ennast tulevikuks ette valmistada, electric car sest uus #PeugeotE208 on peagi kohal! ⚡️⚡️\n#UnboringTheFuture'},
{'text': "Aeg on täiesti uueks roadtrip'i kogemuseks! \nLase ennast üllatada - #Peugeot5008!"},
{'text': 'Tõeline ikoon, mille stiil avaldab muljet läbi eco car, electric cars generatsioonide #Peugeot504!'}
]
test_df = pd.DataFrame(test_data)
# find all the hashtags in each post
test_df["hashtags"] = test_df["text"].apply(find_hashtags)
# the only hashtags we are interested in
tag_search = ["#TipStop", "#Peugeot208"]
# flag the rows whose hashtags include at least one tag from our list
test_df["tag_exist"] = test_df["hashtags"].apply(lambda x: match_tags(x, tag_search))
# keep only the flagged rows
main_df = test_df[test_df.tag_exist]
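If the column already holds lists, as in the question, the same idea works without the regex step or a helper column. Below is a minimal sketch under that assumption; the dataframe `df`, its `id` column, and the `wanted` set are made up for illustration, and `column_1` is the name used in the question.

```python
import pandas as pd

# Hypothetical data: "column_1" holds a list in every row.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "column_1": [["#TipStop"], ["#Peugeot208", "#Other"], ["#PeugeotE208"], []],
})
wanted = {"#TipStop", "#Peugeot208"}

# Option 1: build a boolean mask row by row.
# A row is kept when its list shares at least one element with `wanted`.
mask = df["column_1"].apply(lambda tags: not wanted.isdisjoint(tags))
filtered = df[mask]

# Option 2: explode the lists to one element per row, test membership,
# then collapse back to one boolean per original row using the index.
exploded = df["column_1"].explode()
mask_exploded = exploded.isin(wanted).groupby(level=0).any()
filtered_exploded = df[mask_exploded]
```

Both options should select the same rows. The first is usually easier to read, while the second avoids a Python-level lambda. To require that a row contains every wanted tag rather than any of them, the first mask could use `wanted.issubset(tags)` instead.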