Pandas Series of lists to one series

野性不改 2020-12-28 13:18

I have a Pandas Series of lists of strings:

0    [slim, waist, man]
1     [slim, waistline]
2               [santa]


        
9 Answers
  • 2020-12-28 13:33

    Flattening and unflattening can be done with the two functions below. Flattening:

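    import pandas as pd

    def flatten(df, col):
        # Build one (original index, element) row per list element.
        # .items() replaces .iteritems(), which was removed in pandas 2.0.
        col_flat = pd.DataFrame([[i, x] for i, y in df[col].apply(list).items() for x in y], columns=['I', col])
        col_flat = col_flat.set_index('I')
        # Swap the list column for its flattened version, joined on the index.
        df = df.drop(columns=col)
        df = df.merge(col_flat, left_index=True, right_index=True)

        return df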
    

    Unflattening:

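    def unflatten(flat_df, col):
        # Collect each group's elements back into a list; keep the first value of every other column.
        return flat_df.groupby(level=0).agg({**{c:'first' for c in flat_df.columns}, col: list})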
    

    After unflattening we get the same dataframe except column order:

    (df.sort_index(axis=1) == unflatten(flatten(df, col), col).sort_index(axis=1)).all().all()
    >> True
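
On pandas 0.25 or later, the same round trip can also be sketched with the built-in `DataFrame.explode`. The frame `df` and the column names `id` and `tags` below are made up for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical example frame; 'id' and 'tags' are illustrative names.
df = pd.DataFrame({
    'id': ['a', 'b', 'c'],
    'tags': [['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']],
})

# Flatten: one row per list element; the original index labels are repeated.
flat = df.explode('tags')

# Unflatten: group on the repeated index labels and collect elements back into lists.
restored = flat.groupby(level=0).agg({'id': 'first', 'tags': list})

print(flat)
print(restored['tags'].tolist() == df['tags'].tolist())
```

Grouping on `level=0` only works as an inverse here because the original index labels are unique; with duplicate labels, rows would be merged.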
    
  • 2020-12-28 13:34

    You are basically just trying to flatten a nested list here.

    You should just be able to iterate over the elements of the series:

    slist = []
    for x in s:  # s is the Series of lists
        slist.extend(x)
    

    or a slicker (but harder to understand) list comprehension:

    slist = [st for row in s for st in row]
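
The question asks for a Series rather than a plain list, so the flattened list can be wrapped back up afterwards. A minimal sketch, with the sample data assumed from the other answers:

```python
import pandas as pd

s = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])

# Flatten with a nested comprehension, then wrap the result in a new Series.
flat = pd.Series([st for row in s for st in row])
print(flat.tolist())
```

The new Series gets a fresh 0..n-1 index; the original row labels are not preserved.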
    
  • 2020-12-28 13:36

    Here's a simple method using only pandas functions:

    import pandas as pd
    
    s = pd.Series([
        ['slim', 'waist', 'man'],
        ['slim', 'waistline'],
        ['santa']])
    

    Then

    s.apply(pd.Series).stack().reset_index(drop=True)
    

    gives the desired output. In some cases you might want to save the original index and add a second level to index the nested elements, e.g.

    0  0         slim
       1        waist
       2          man
    1  0         slim
       1    waistline
    2  0        santa
    

    If this is what you want, just omit .reset_index(drop=True) from the chain.

  • 2020-12-28 13:37

    If your pandas version is too old to use series_name.explode(), this should work too:

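    from itertools import chain
    import pandas as pd

    # .items() exists in both old and new pandas; .iteritems() was removed in pandas 2.0
    pd.Series(
        chain.from_iterable(
            value
            for i, value
            in series_name.items()
        )
    )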
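
For comparison, on pandas 0.25 or later the `explode()` route mentioned above might look like the sketch below. Here `series_name` is filled with sample data assumed from the other answers.

```python
import pandas as pd

series_name = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])

# explode() emits one row per list element and repeats the original index label,
# so reset_index(drop=True) renumbers the result from 0.
flat = series_name.explode().reset_index(drop=True)
print(flat)
```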
    
  • 2020-12-28 13:38

    You can try using itertools.chain to simply flatten the lists:

    In [70]: from itertools import chain
    In [71]: import pandas as pnd
    In [72]: s = pnd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])
    In [73]: s
    Out[73]: 
    0    [slim, waist, man]
    1     [slim, waistline]
    2               [santa]
    dtype: object
    In [74]: new_s = pnd.Series(list(chain(*s.values)))
    In [75]: new_s
    Out[75]: 
    0         slim
    1        waist
    2          man
    3         slim
    4    waistline
    5        santa
    dtype: object
    
  • 2020-12-28 13:39

    You can use the list concatenation operator (+), as shown below:

    lst1 = ['hello', 'world']
    lst2 = ['bye', 'world']
    newlst = lst1 + lst2
    print(newlst)
    >> ['hello', 'world', 'bye', 'world']
    

    Or you can use the list.extend() method:

    lst1 = ['hello','world']
    lst2 = ['bye','world']
    lst1.extend(lst2)
    print(lst1)
    >> ['hello', 'world', 'bye', 'world']
    

    The benefit of extend() is that it accepts any iterable, whereas the + operator only works when both operands are lists.

    Another example of extend(), this time with a tuple:

    lst1 = ['hello', 'world']
    lst1.extend(('Bye', 'Bye'))
    print(lst1)
    >> ['hello', 'world', 'Bye', 'Bye']
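
Applied to the Series from the question, the `+` operator can be folded over all rows with the built-in `sum` and an empty list as the start value. This is only a sketch, with the sample data assumed from the other answers.

```python
import pandas as pd

s = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])

# sum() starts from [] and applies + to each row in turn: [] + row0 + row1 + ...
flat = pd.Series(sum(s.tolist(), []))
print(flat.tolist())
```

Each `+` copies the accumulated list, so on a long Series a loop with `extend()` is likely to scale better.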
    