Question
I have the following operation, which adds a status column showing whether each string in a column of one DataFrame is present in a specified column of another DataFrame. It looks like this:
df_one['Status'] = np.where(df_one.A.isin(df_two.A), 'Matched','Unmatched')
This won't match if the string case is different. Is it possible to perform this operation while being case insensitive?
Also, is it possible to return 'Matched' when a value in df_one.A ends with the full string from df_two.A? For example, df_one.A = abcdefghijkl and df_two.A = ijkl should give 'Matched'.
Answer 1:
You can do the first test by converting both columns to lowercase (or uppercase; either works) inside the expression. Because you aren't assigning either column back to your DataFrames, the case conversion is only temporary:
df_one['Status'] = np.where(df_one.A.str.lower().isin(df_two.A.str.lower()),
                            'Matched', 'Unmatched')
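As a quick check, here is a minimal, self-contained sketch of that expression run on two small frames. The sample values are made up for illustration and are not from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only.
df_one = pd.DataFrame({'A': ['Apple', 'banana', 'Cherry']})
df_two = pd.DataFrame({'A': ['APPLE', 'cherry', 'date']})

# Lowercase both sides only inside the comparison;
# the values stored in the DataFrames keep their original case.
mask = df_one.A.str.lower().isin(df_two.A.str.lower())
df_one['Status'] = np.where(mask, 'Matched', 'Unmatched')
```

'Apple' and 'Cherry' match despite the case difference, 'banana' does not, and column A itself is left unchanged.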
You can perform your second test by checking whether each string in df_one.A ends with any of the strings in df_two.A, like so (assuming you still want a case-insensitive match):
df_one['Endswith_Status'] = np.where(
    df_one.A.str.lower().apply(
        lambda x: any(x.endswith(i) for i in df_two.A.str.lower())),
    'Matched', 'Unmatched')
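The lambda above lowercases df_two.A again for every row of df_one. One possible variation, sketched here under the assumption that column A has no missing values (a NaN would raise an AttributeError inside the lambda), builds the lowercase suffixes once and relies on Python's built-in str.endswith accepting a tuple. The sample data is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only.
df_one = pd.DataFrame({'A': ['abcdefghIJKL', 'xyz', 'Ijkl']})
df_two = pd.DataFrame({'A': ['ijkl', 'QRS']})

# Build the lowercase suffixes once. Python's str.endswith accepts a tuple
# and returns True if the string ends with any of its elements.
suffixes = tuple(df_two.A.str.lower())
ends = df_one.A.str.lower().apply(lambda s: s.endswith(suffixes))
df_one['Endswith_Status'] = np.where(ends, 'Matched', 'Unmatched')
```

This should give the same result as the any(...) version while avoiding the repeated lowercasing, which may matter when df_two is large.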
Source: https://stackoverflow.com/questions/44979927/pandas-series-case-insensitive-matching-and-partial-matching-between-values