I have a data frame as shown below
ID Name Address
1 Kohli Country: India; State: Delhi; Sector: SE25
2 Sachin Country: India; State: Mumbai; Secto
Use a list comprehension with a dict comprehension to build a list of dictionaries, then pass it to the DataFrame
constructor:
L = [{k:v for y in x.split('; ') for k, v in dict([y.split(': ')]).items()}
for x in df.pop('Address')]
df = df.join(pd.DataFrame(L, index=df.index))
print (df)
ID Name Country State Sector
0 1 Kohli India Delhi SE25
1 2 Sachin India Mumbai SE39
2 3 Ponting Australia Tasmania NaN
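For readers who find the nested comprehension hard to follow, here is a minimal, self-contained sketch of the same idea. The sample frame is rebuilt from the output above, and `parse_address`, `parsed` and `result` are hypothetical names chosen for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data assumed from the output shown above.
df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Kohli', 'Sachin', 'Ponting'],
    'Address': [
        'Country: India; State: Delhi; Sector: SE25',
        'Country: India; State: Mumbai; Sector: SE39',
        'Country: Australia; State: Tasmania',
    ],
})

def parse_address(text):
    # Hypothetical helper: 'k1: v1; k2: v2' -> {'k1': 'v1', 'k2': 'v2'}.
    return dict(pair.split(': ', 1) for pair in text.split('; '))

# One dict per row; keys that are absent in a row become NaN in the frame.
parsed = pd.DataFrame([parse_address(x) for x in df['Address']], index=df.index)
result = df.drop(columns='Address').join(parsed)
print(result)
```

Because each value is stored under its own key, a row that lacks `Sector` simply gets a missing value in that column instead of shifting the other values.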
Or use split and reshape with stack:
df1 = (df.pop('Address')
.str.split('; ', expand=True)
.stack()
.reset_index(level=1, drop=True)
.str.split(': ', expand=True)
.set_index(0, append=True)[1]
.unstack()
)
print (df1)
0 Country Sector State
0 India SE25 Delhi
1 India SE39 Mumbai
2 Australia NaN Tasmania
df = df.join(df1)
print (df)
ID Name Country Sector State
0 1 Kohli India SE25 Delhi
1 2 Sachin India SE39 Mumbai
2 3 Ponting Australia NaN Tasmania
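A related variant, offered only as a sketch, swaps the two `str.split` calls for a single `str.extractall` with a regular expression and then reshapes with `unstack` in the same way. The pattern assumes keys are plain word characters and values contain no semicolon; `address`, `pairs` and `wide` are hypothetical names.

```python
import pandas as pd

address = pd.Series([
    'Country: India; State: Delhi; Sector: SE25',
    'Country: India; State: Mumbai; Sector: SE39',
    'Country: Australia; State: Tasmania',
], name='Address')

# One row per "key: value" pair, indexed by (original row, match number).
pairs = address.str.extractall(r'(?P<key>\w+): (?P<value>[^;]+)')

# Drop the match counter, index by (row, key), then pivot keys into columns.
wide = (pairs.droplevel('match')
             .set_index('key', append=True)['value']
             .unstack())
print(wide)
```

As with the stack-based version, the resulting columns come out in sorted order (Country, Sector, State), and the frame can be joined back with `df.join(wide)`.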
You are almost there
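cols = ['ZONE', 'State', 'Sector']
# Column name follows the frame shown in the question; n must be passed by keyword in recent pandas.
df[cols] = pd.DataFrame(df['Address'].str.split('; ', n=2).tolist(),
                        columns=cols, index=df.index)
for col in cols:
    # .str[1] keeps missing entries as NaN instead of raising on them
    df[col] = df[col].str.split(': ').str[1]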
Original answer
This can also do the job:
import pandas as pd
df = pd.DataFrame(
[
{'ID': 1, 'Name': 'Kohli', 'Address': 'Country: India; State: Delhi; Sector: SE25'},
{'ID': 2, 'Name': 'Sachin','Address': 'Country: India; State: Mumbai; Sector: SE39'},
{'ID': 3,'Name': 'Ponting','Address': 'Country: Australia; State: Tasmania'}
]
)
cols_to_extract = ['ZONE', 'State', 'Sector']
# n is keyword-only in recent pandas versions
list_of_rows = df['Address'].str.split(';', n=2).tolist()
df[cols_to_extract] = pd.DataFrame(
    [[item.split(': ')[1] for item in row] for row in list_of_rows],
    columns=cols_to_extract)
Output would be the following:
>> df[['ID', 'Name', 'ZONE', 'State', 'Sector']]
ID Name ZONE State Sector
1 Kohli India Delhi SE25
2 Sachin India Mumbai SE39
3 Ponting Australia Tasmania None
Edited answer
As @jezrael rightly pointed out in a comment on the question, my original answer was wrong: it aligned values by position, so it could produce wrong key-value pairs when some of the values were NaNs. The following code should work on the edited data set.
import pandas as pd
df = pd.DataFrame(
[
{'ID': 1, 'Name': 'Kohli', 'Address': 'Country: India; State: Delhi; Sector: SE25'},
{'ID': 2, 'Name': 'Sachin','Address': 'Country: India; State: Mumbai; Sector: SE39'},
{'ID': 3,'Name': 'Ponting','Address': 'Country: Australia; State: Tasmania'},
{'ID': 4, 'Name': 'Ponting','Address': 'State: Tasmania; Sector: SE27'}
]
)
cols_to_extract = ['Country', 'State', 'Sector']
# n is keyword-only in recent pandas versions
list_of_rows = df['Address'].str.split(';', n=2).tolist()
df[cols_to_extract] = pd.DataFrame(
    # Key each value by its own label so a missing field cannot shift the columns.
    [{item.split(': ')[0].strip(): item.split(': ')[1] for item in row} for row in list_of_rows],
    columns=cols_to_extract)
df = df.rename(columns={'Country': 'ZONE'})
Output would be:
>> df[['ID', 'Name', 'ZONE', 'State', 'Sector']]
ID Name ZONE State Sector
1 Kohli India Delhi SE25
2 Sachin India Mumbai SE39
3 Ponting Australia Tasmania NaN
4 Ponting NaN Tasmania SE27
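To make the difference between the two versions concrete, here is a small sketch using only the address of the row that has no Country entry. The names `by_position` and `by_key` are hypothetical and exist only for this illustration.

```python
import pandas as pd

address = 'State: Tasmania; Sector: SE27'  # no Country entry
cols = ['Country', 'State', 'Sector']

parts = address.split('; ')

# Positional: the first value lands in the first column, whatever its key is.
by_position = dict(zip(cols, (p.split(': ')[1] for p in parts)))

# Key-based: each value is filed under the key that precedes it.
by_key = dict(p.split(': ', 1) for p in parts)

print(by_position)  # {'Country': 'Tasmania', 'State': 'SE27'}
print(by_key)       # {'State': 'Tasmania', 'Sector': 'SE27'}

# Building the frame from dicts leaves Country empty for this row.
print(pd.DataFrame([by_key], columns=cols))
```

The positional mapping silently mislabels both values, which is the failure the edited answer avoids by building dictionaries.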