How to split a column into three columns in pandas

I have a data frame as shown below

ID  Name     Address
1   Kohli    Country: India; State: Delhi; Sector: SE25
2   Sachin   Country: India; State: Mumbai; Secto


                      
3 Answers

予麋鹿, 2021-01-21 17:37

Use a list comprehension with a nested dict comprehension to build a list of dictionaries (one per row), then pass it to the DataFrame constructor:

# One dict per row: each 'key: value' part becomes a dict entry
L = [dict(y.split(': ') for y in x.split('; '))
     for x in df.pop('Address')]

df = df.join(pd.DataFrame(L, index=df.index))
print (df)
   ID     Name    Country     State Sector
0   1    Kohli      India     Delhi   SE25
1   2   Sachin      India    Mumbai   SE39
2   3  Ponting  Australia  Tasmania    NaN
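
For a version that can be run on its own, here is a minimal sketch of the same idea. The helper name `parse_pairs` is hypothetical, the sample rows are the ones used elsewhere in this thread, and the code assumes every part of the string has the form `key: value`.

```python
import pandas as pd


def parse_pairs(text):
    """Turn 'Country: India; State: Delhi' into {'Country': 'India', 'State': 'Delhi'}.

    Hypothetical helper: assumes parts are separated by ';' and that each
    part contains a ':' between its key and its value.
    """
    pairs = (part.split(':', 1) for part in text.split(';'))
    return {key.strip(): value.strip() for key, value in pairs}


df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Kohli', 'Sachin', 'Ponting'],
    'Address': [
        'Country: India; State: Delhi; Sector: SE25',
        'Country: India; State: Mumbai; Sector: SE39',
        'Country: Australia; State: Tasmania',
    ],
})

# Build one dict per row; a key missing from a row becomes NaN in that column
parsed = pd.DataFrame([parse_pairs(x) for x in df.pop('Address')], index=df.index)
result = df.join(parsed)
print(result)
```

Stripping whitespace around keys and values makes the sketch tolerant of `;` with or without a following space.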


Alternatively, split with expand=True, reshape with stack, split again into key and value, and pivot back with unstack:

df1 = (df.pop('Address')
         .str.split('; ', expand=True)
         .stack()
         .reset_index(level=1, drop=True)
         .str.split(': ', expand=True)
         .set_index(0, append=True)[1]
         .unstack()
         )
print (df1)
0    Country Sector     State
0      India   SE25     Delhi
1      India   SE39    Mumbai
2  Australia    NaN  Tasmania

df = df.join(df1)
print (df)
   ID     Name    Country Sector     State
0   1    Kohli      India   SE25     Delhi
1   2   Sachin      India   SE39    Mumbai
2   3  Ponting  Australia    NaN  Tasmania
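
A regex-based variant of the same reshape is also possible. The sketch below is not from the answer above; it uses `Series.str.extractall` with a pattern that assumes keys are single words and that values contain no `;`. It also assumes no key is repeated within a row, since `unstack` would raise on duplicates.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Kohli', 'Sachin', 'Ponting'],
    'Address': [
        'Country: India; State: Delhi; Sector: SE25',
        'Country: India; State: Mumbai; Sector: SE39',
        'Country: Australia; State: Tasmania',
    ],
})

# Each regex match is one 'key: value' pair; extractall returns one row per
# match, indexed by (original row label, match number)
pairs = df['Address'].str.extractall(r'(?P<key>\w+):\s*(?P<value>[^;]+)')

# Drop the match number, move 'key' into the index, then pivot keys to columns
wide = (pairs.reset_index(level='match', drop=True)
             .set_index('key', append=True)['value']
             .unstack())

out = df.drop(columns='Address').join(wide)
print(out)
```

As with the stack/unstack approach, the new columns come out in alphabetical order; select them explicitly if a specific order is needed.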

                                                                        
                                                        
            
            
              
                
爱一瞬间的悲伤, 2021-01-21 17:40

You are almost there

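cols = ['ZONE', 'State', 'Sector']
# Split each address into at most three 'key: value' parts (n is passed by keyword)
df[cols] = pd.DataFrame(df['Address'].str.split('; ', n=2).tolist(),
                        columns=cols)

for col in cols:
    # Keep the text after ': '; .str[1] leaves missing parts as NaN instead of raising
    df[col] = df[col].str.split(': ').str[1]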

                                                                        
                                                        
            
            
              
                
情书的邮戳, 2021-01-21 17:58

Original answer

This can also do the job:

import pandas as pd

df = pd.DataFrame(
 [
     {'ID': 1, 'Name': 'Kohli', 'Address': 'Country: India; State: Delhi; Sector: SE25'},
     {'ID': 2, 'Name': 'Sachin','Address': 'Country: India; State: Mumbai; Sector: SE39'},
     {'ID': 3,'Name': 'Ponting','Address': 'Country: Australia; State: Tasmania'}
 ]
)

cols_to_extract = ['ZONE', 'State', 'Sector']
list_of_rows = df['Address'].str.split(';', n=2).tolist()
df[cols_to_extract] = pd.DataFrame(
    [[item.split(': ')[1] for item in row] for row in list_of_rows], 
    columns=cols_to_extract)


Output would be the following:

>> df[['ID', 'Name', 'ZONE', 'State', 'Sector']]

ID  Name    ZONE       State     Sector
1   Kohli   India      Delhi     SE25
2   Sachin  India      Mumbai    SE39
3   Ponting Australia  Tasmania  None


Edited answer

As @jezrael pointed out in a comment on the question, my original answer was wrong: it aligned values by position, which can produce wrong key-value pairs when some of the values are missing. The following code should work on the edited data set.

import pandas as pd

df = pd.DataFrame(
 [
     {'ID': 1, 'Name': 'Kohli', 'Address': 'Country: India; State: Delhi; Sector: SE25'},
     {'ID': 2, 'Name': 'Sachin','Address': 'Country: India; State: Mumbai; Sector: SE39'},
     {'ID': 3,'Name': 'Ponting','Address': 'Country: Australia; State: Tasmania'},
     {'ID': 4, 'Name': 'Ponting','Address': 'State: Tasmania; Sector: SE27'}
 ]
)

cols_to_extract = ['Country', 'State', 'Sector']
list_of_rows = df['Address'].str.split(';', n=2).tolist()
df[cols_to_extract] = pd.DataFrame(
    [{item.split(': ')[0].strip(): item.split(': ')[1] for item in row} for row in list_of_rows], 
    columns=cols_to_extract)
df = df.rename(columns={'Country': 'ZONE'})


Output would be:

>> df[['ID', 'Name', 'ZONE', 'State', 'Sector']]

ID  Name    ZONE       State     Sector
1   Kohli   India      Delhi     SE25
2   Sachin  India      Mumbai    SE39
3   Ponting Australia  Tasmania  NaN
4   Ponting NaN        Tasmania  SE27
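
To see concretely why positional alignment goes wrong, the following minimal sketch compares the two strategies on a row whose first key is missing. The variable names `positional` and `by_key` are illustrative only.

```python
import pandas as pd

address = pd.Series([
    'Country: India; State: Delhi; Sector: SE25',
    'State: Tasmania; Sector: SE27',          # no Country key
])
cols = ['Country', 'State', 'Sector']

# Positional: the i-th part always lands in the i-th column, whatever its key says
positional = pd.DataFrame(
    [[part.split(': ')[1] for part in row.split('; ')] for row in address],
    columns=cols)

# Key-based: each value lands in the column named by its own key
by_key = pd.DataFrame(
    [dict(part.split(': ') for part in row.split('; ')) for row in address],
    columns=cols)

print(positional)  # second row: 'Tasmania' ends up under Country
print(by_key)      # second row: Country is NaN, State is 'Tasmania'
```

The positional version raises no error on the second row; it just shifts the values one column to the left, which is why the mistake is easy to miss.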

                                                                        
                                                        
            
            
              
                