I have a Pandas Series of lists of strings:
0    [slim, waist, man]
1     [slim, waistline]
2               [santa]
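Flattening and unflattening can be done with the following functions. Flattening:
import pandas as pd

def flatten(df, col):
    # Build one (index label, element) row per list element.
    col_flat = pd.DataFrame([[i, x] for i, y in df[col].apply(list).items() for x in y], columns=['I', col])
    col_flat = col_flat.set_index('I')
    # Swap the list column for the flattened one; index labels repeat once per element.
    df = df.drop(columns=col)
    df = df.merge(col_flat, left_index=True, right_index=True)
    return df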
Unflattening:
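def unflatten(flat_df, col):
    # Group rows that share an index label; collect `col` back into a list, keep the first value elsewhere.
    return flat_df.groupby(level=0).agg({**{c: 'first' for c in flat_df.columns}, col: list})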
After unflattening we get the same dataframe back, except for column order (here col is the name of the list column):
(df.sort_index(axis=1) == unflatten(flatten(df, col), col).sort_index(axis=1)).all().all()
>> True
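On pandas 0.25 or later, the built-in DataFrame.explode can stand in for the hand-written flattening step. A minimal sketch of the same round trip, assuming a hypothetical frame with an `id` column and a `words` list column (both names are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    'id': [10, 20, 30],
    'words': [['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']],
})

# Flatten: one row per list element, original index labels repeated.
flat = df.explode('words')

# Unflatten: group on the repeated index labels and rebuild the lists.
restored = flat.groupby(level=0).agg({'id': 'first', 'words': list})
```

The unflatten step relies on the repeated index labels, so it only works if the index is not reset in between.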
You are basically just trying to flatten a nested list here.
You should just be able to iterate over the elements of the series:
slist = []
for x in series:
    slist.extend(x)
or a slicker (but harder to understand) list comprehension:
slist = [st for row in series for st in row]
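Either way you end up with a plain Python list, so it still has to be wrapped in a Series if that is the desired result. A small end-to-end sketch, assuming the sample data from the question:

```python
import pandas as pd

# Sample data taken from the question.
series = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa'],
])

# Flatten the nested lists into one list, row by row.
slist = [st for row in series for st in row]

# Wrap the flat list in a new Series; it gets a fresh 0..n-1 index.
flat = pd.Series(slist)
```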
Here's a simple method using only pandas functions:
import pandas as pd
s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])
Then
s.apply(pd.Series).stack().reset_index(drop=True)
gives the desired output. In some cases you might want to save the original index and add a second level to index the nested elements, e.g.
0  0         slim
   1        waist
   2          man
1  0         slim
   1    waistline
2  0        santa
If this is what you want, just omit .reset_index(drop=True) from the chain.
If your pandas version is too old to use series_name.explode(), this should work too:
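from itertools import chain
import pandas as pd

# list(...) materializes the iterator so the Series is built from a plain list
pd.Series(
    list(chain.from_iterable(
        value
        for _, value
        in series_name.items()
    ))
)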
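For comparison, on pandas 0.25 or later the explode route mentioned above looks roughly like this. This is a sketch using the sample data from the question; `series_name` is just the placeholder name used above:

```python
import pandas as pd

series_name = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa'],
])

# explode() emits one row per list element and repeats the original index label.
exploded = series_name.explode()

# Drop the repeated labels to get a plain 0..n-1 index.
flat = exploded.reset_index(drop=True)
```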
You can try using itertools.chain to simply flatten the lists:
In [70]: from itertools import chain
In [71]: import pandas as pnd
In [72]: s = pnd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])
In [73]: s
Out[73]:
0    [slim, waist, man]
1     [slim, waistline]
2               [santa]
dtype: object
In [74]: new_s = pnd.Series(list(chain(*s.values)))
In [75]: new_s
Out[75]:
0         slim
1        waist
2          man
3         slim
4    waistline
5        santa
dtype: object
You can use the list concatenation operator, as shown below:
lst1 = ['hello','world']
lst2 = ['bye','world']
newlst = lst1 + lst2
print(newlst)
>> ['hello', 'world', 'bye', 'world']
Or you can use the list.extend() function, as below:
lst1 = ['hello','world']
lst2 = ['bye','world']
lst1.extend(lst2)
print(lst1)
>> ['hello', 'world', 'bye', 'world']
The benefit of extend is that it accepts any iterable (tuples, sets, generators and so on), whereas the concatenation operator only works if both the LHS and the RHS are lists.
Another example of extend, this time with a tuple and starting again from the original lst1:
lst1 = ['hello', 'world']
lst1.extend(('Bye', 'Bye'))
print(lst1)
>> ['hello', 'world', 'Bye', 'Bye']
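To make the difference concrete, here is a short sketch that feeds extend a few non-list iterables and then tries the same thing with the + operator. The variable names `combined` and `concat_error` are only for illustration:

```python
lst1 = ['hello', 'world']

lst1.extend(('Bye', 'Bye'))             # tuple
lst1.extend(w.upper() for w in ['ok'])  # generator expression
lst1.extend('hi')                       # a string is iterated character by character

# lst1 is now ['hello', 'world', 'Bye', 'Bye', 'OK', 'h', 'i']

# The + operator refuses a non-list right-hand side.
try:
    combined = ['hello', 'world'] + ('Bye', 'Bye')
except TypeError as exc:
    concat_error = str(exc)
```

Note the string case: extend splits it into characters, so use append if you want to add the string as a single element.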