How to filter on pandas dataframe when column data type is a list

面向向阳花 2021-01-20 18:31

I am having some trouble filtering a pandas dataframe on a column (let's call it column_1) whose data type is a list. Specifically, I want to return only rows such that co

2 answers
  •  野趣味 (original poster)
    2021-01-20 19:35

    Hi. For long-term use, you can wrap the whole workflow in functions and apply them wherever you need. Since you did not post an example dataset, I am using one of my own. Suppose I have a text dataset: first I extract the hashtags from each message into a list, then I check each list against the hashtags I am interested in and filter the rows.

    import re
    
    import pandas as pd
    
    
    # find all the hashtags in the message and return them as a list
    def find_hashtags(post_msg):
        return re.findall(r'#\w+', post_msg)
    
    
    # return True if the message's tags share at least one tag with the search list
    def match_tags(tag_list, htag_list):
        return bool(set(tag_list).intersection(htag_list))
    
    
    test_data = [{'text': 'Head nipid mõnusateks sõitudeks kitsastel tänavatel. #TipStop'},
     {'text': 'Homses Rooli Võimus uus #Peugeot208!\nVaata kindlasti.'},
     {'text': 'Soovitame ennast tulevikuks ette valmistada, electric car sest uus #PeugeotE208 on peagi kohal!  ⚡️⚡️\n#UnboringTheFuture'},
     {'text': "Aeg on täiesti uueks roadtrip'i kogemuseks! \nLase ennast üllatada - #Peugeot5008!"},
     {'text': 'Tõeline ikoon, mille stiil avaldab muljet läbi eco car, electric cars generatsioonide #Peugeot504!'}
    ]
    
    test_df = pd.DataFrame(test_data)
    
    # extract the hashtags from each message
    test_df["hashtags"] = test_df["text"].apply(find_hashtags)
    
    # the only hashtags we are interested in
    tag_search = ["#TipStop", "#Peugeot208"]
    
    # flag rows whose hashtags overlap with the search list
    test_df["tag_exist"] = test_df["hashtags"].apply(lambda x: match_tags(x, tag_search))
    
    # keep only the flagged rows
    main_df = test_df[test_df.tag_exist]
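
If the column already holds lists, as in the question, the same idea can be applied without the extraction step. The question text is cut off, so this is only a sketch that assumes the goal is to keep rows whose list contains certain values. Only the column name column_1 comes from the question; the data, the id column, and the names wanted, any_rows and has_b are made up for illustration.

```python
import pandas as pd

# Hypothetical data: only the column name column_1 comes from the question.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "column_1": [["a", "b"], ["c"], ["b", "d"]],
})

# Hypothetical search values.
wanted = {"a", "c"}

# Keep rows whose list shares at least one value with the search set.
any_mask = df["column_1"].apply(lambda items: not wanted.isdisjoint(items))
any_rows = df[any_mask]

# Keep rows whose list contains one specific value.
has_b = df[df["column_1"].apply(lambda items: "b" in items)]

print(any_rows)
print(has_b)
```

The mask is computed row by row with apply, because element-wise comparisons such as == or isin do not look inside a list stored in a cell.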
    
