How to Add a New Column With Selected Values from Another Column In Python

Posted by 放肆的年华 on 2019-12-06 14:00:51

Question


I have been trying to figure this out all day. I am new to Python.

I have a table with about 50,000 records; the small table below illustrates what I am trying to do.

I would like to add a third column called Category. This column will contain values based on the results of conditions applied to the Movies column.

-----------------------------------------
N        | Movies
-----------------------------------------
1        | Save the Last Dance
-----------------------------------------
2        | Love and Other Drugs
-----------------------------------------
3        | Dance with Me
-----------------------------------------
4        | Love Actually
-----------------------------------------
5        | High School Musical
-----------------------------------------

The condition is this: search through the Movies column for these words (Dance, Love, and Musical). If a word is found in the string, return that word in the Category column.

This will produce a new dataframe like this at the end:

-----------------------------------------
N        | Movies               | Category
-----------------------------------------
1        | Save the Last Dance  | Dance
-----------------------------------------
2        | Love and Other Drugs | Love
-----------------------------------------
3        | Dance with Me        | Dance
-----------------------------------------
4        | Love Actually        | Love
-----------------------------------------
5        | High School Musical  | Musical
-----------------------------------------

Thanks in advance!!

Answer 1:


A faster way would be to create a mask for all your categories, assuming you have a smallish number:

In [22]:

dance_mask = df['Movies'].str.contains('Dance')
love_mask = df['Movies'].str.contains('Love')
musical_mask = df['Movies'].str.contains('Musical')
df[dance_mask]
Out[22]:
   N               Movies
0  1  Save the Last Dance
2  3        Dance with Me

[2 rows x 2 columns]

In [26]:
# now set category (.loc replaces .ix, which was removed in pandas 1.0)
df.loc[dance_mask,'Category'] = 'Dance'
df
Out[26]:
   N                Movies Category
0  1   Save the Last Dance    Dance
1  2  Love and Other Drugs      NaN
2  3         Dance with Me    Dance
3  4         Love Actually      NaN
4  5   High School Musical      NaN

[5 rows x 3 columns]

In [28]:
# repeat for remaining masks (.loc replaces .ix, which was removed in pandas 1.0)
df.loc[love_mask,'Category'] = 'Love'
df.loc[musical_mask,'Category'] = 'Musical'
df
Out[28]:
   N                Movies Category
0  1   Save the Last Dance    Dance
1  2  Love and Other Drugs     Love
2  3         Dance with Me    Dance
3  4         Love Actually     Love
4  5   High School Musical  Musical

[5 rows x 3 columns]
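
With more than a handful of keywords, writing one mask per keyword gets repetitive. One possible alternative, shown here only as a sketch, is to build a single regular expression from the keyword list and let `Series.str.extract` pull out the match. The `keywords` list and the sample frame are assumptions reconstructed from the question's table, not part of the original answer. Note that the behaviour differs slightly from the masks: when a title contains several keywords, this keeps the one that appears first in the title, whereas the masks keep whichever was assigned last.

```python
import re

import pandas as pd

# Sample data reconstructed from the table in the question (an assumption).
df = pd.DataFrame({
    "N": [1, 2, 3, 4, 5],
    "Movies": [
        "Save the Last Dance",
        "Love and Other Drugs",
        "Dance with Me",
        "Love Actually",
        "High School Musical",
    ],
})

keywords = ["Dance", "Love", "Musical"]

# Build one alternation pattern with a single capture group: "(Dance|Love|Musical)".
# re.escape guards against keywords that contain regex metacharacters.
pattern = "(" + "|".join(re.escape(word) for word in keywords) + ")"

# expand=False returns a Series holding the first match per row (NaN if no match).
df["Category"] = df["Movies"].str.extract(pattern, expand=False)

print(df)
```

Rows that match none of the keywords are left as NaN, the same as with the mask approach.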



Answer 2:


If you have a 2D list then just do this:

def add_category(record):
    movie = record[1]
    categories = []
    for category in ['Dance', 'Love', 'Musical']:
        if category in movie:
            categories.append(category)
    # list.append() returns None, so append first and then return the record
    record.append(', '.join(categories))
    return record

database = [add_category(record) for record in database]

You can change how the values for the category column are calculated by changing the add_category() function.
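
As a rough end-to-end illustration of the plain-list approach, the sketch below assumes `database` is a list of `[N, title]` lists matching the question's table. The helper name `with_category` and the `KEYWORDS` constant are hypothetical, introduced only for this example. Unlike `add_category()`, it returns a new list rather than modifying each record in place.

```python
# Hypothetical keyword list taken from the question.
KEYWORDS = ["Dance", "Love", "Musical"]


def with_category(record, keywords=KEYWORDS):
    """Return a copy of record with a comma-joined category string appended."""
    movie = record[1]
    # Plain substring test, so it is case-sensitive.
    matched = [word for word in keywords if word in movie]
    return record + [", ".join(matched)]


# Sample rows reconstructed from the table in the question (an assumption).
database = [
    [1, "Save the Last Dance"],
    [2, "Love and Other Drugs"],
    [3, "Dance with Me"],
    [4, "Love Actually"],
    [5, "High School Musical"],
]

database = [with_category(record) for record in database]

for row in database:
    print(row)
```

A title that contains more than one keyword gets all of them, joined with commas, and a title with none gets an empty string.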



Source: https://stackoverflow.com/questions/22902885/how-to-add-a-new-column-with-selected-values-from-another-column-in-python
