How to combine multiple rows of strings into one using pandas?

情话喂你 2021-02-01 19:12

I have a DataFrame with multiple rows. Is there any way in which they can be combined to form one string?

For example:

     words
0    I, will, hereby
1         


        
4 Answers
  • 2021-02-01 19:32

    How about Python's built-in str.join? It is also faster.

    In [209]: ', '.join(df.words)
    Out[209]: 'I, will, hereby, am, gonna, going, far, to, do, this'
    

    Timings from December 2016 on pandas 0.18.1:

    In [214]: df.shape
    Out[214]: (6, 1)
    
    In [215]: %timeit df.words.str.cat(sep=', ')
    10000 loops, best of 3: 72.2 µs per loop
    
    In [216]: %timeit ', '.join(df.words)
    100000 loops, best of 3: 14 µs per loop
    
    In [217]: df = pd.concat([df]*10000, ignore_index=True)
    
    In [218]: df.shape
    Out[218]: (60000, 1)
    
    In [219]: %timeit df.words.str.cat(sep=', ')
    100 loops, best of 3: 5.2 ms per loop
    
    In [220]: %timeit ', '.join(df.words)
    100 loops, best of 3: 1.91 ms per loop
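
One practical difference between the two approaches is how they treat missing values. The sketch below uses made-up data (the `words` column name comes from the question, the rows and the `?` placeholder are assumptions) to show that a plain join fails on NaN while `str.cat` skips it by default:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value among the strings.
df = pd.DataFrame({"words": ["I, will, hereby", np.nan, "to, do, this"]})

# str.join requires every item to be a str, so a NaN (a float) raises TypeError.
try:
    joined = ", ".join(df["words"])
except TypeError:
    joined = None

# Dropping missing values first makes the plain join safe.
joined_clean = ", ".join(df["words"].dropna())

# Series.str.cat omits missing values by default when na_rep is not given.
cat_default = df["words"].str.cat(sep=", ")

# With na_rep, the missing value is replaced by a placeholder instead of skipped.
cat_filled = df["words"].str.cat(sep=", ", na_rep="?")
```

So if the column may contain NaN, either call `dropna()` before joining or use `str.cat`.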
    
  • 2021-02-01 19:37

    If you have a DataFrame rather than a Series and want to concatenate values from different rows (text values only, I think), using another column as the 'group by' key, you can use the .agg method of the DataFrameGroupBy class. See the pandas API reference for details.

    Sample code tested with Pandas v0.18.1:

    import pandas as pd
    
    df = pd.DataFrame({
        'category': ['A'] * 3 + ['B'] * 2,
        'name': ['A1', 'A2', 'A3', 'B1', 'B2'],
        'num': range(1, 6)
    })
    
    df.groupby('category').agg({
        'name': lambda x: ', '.join(x),
        'num': lambda x: x.max()
    })
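
On newer pandas versions (0.25 and later) the same aggregation can also be written with named aggregation, which lets you choose the output column names. This is only a sketch on the same sample data; the output names `names` and `max_num` are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["A"] * 3 + ["B"] * 2,
    "name": ["A1", "A2", "A3", "B1", "B2"],
    "num": range(1, 6),
})

# Each keyword is an output column: (input column, aggregation function).
result = df.groupby("category").agg(
    names=("name", ", ".join),  # join the strings of each group
    max_num=("num", "max"),     # largest number in each group
)
```

Here `result` is indexed by `category`, with one joined string and one maximum per group.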
    
  • 2021-02-01 19:41

    You can use str.cat to join the strings from every row into a single string. For a Series or column s, write:

    >>> s.str.cat(sep=', ')
    'I, will, hereby, am, gonna, going, far, to, do, this'
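
A self-contained sketch of the same call, in case it helps. The rows are made up (they are not the asker's data), and the numeric Series is an added assumption to show that non-string values need converting first:

```python
import pandas as pd

# Hypothetical rows of comma-separated words.
s = pd.Series(["I, will, hereby", "am, gonna", "going, far"])

# Join all rows into one string, separated by ', '.
combined = s.str.cat(sep=", ")

# Without sep, the strings are glued together directly.
glued = pd.Series(["a", "b", "c"]).str.cat()

# The .str accessor only works on string data, so convert numbers first.
nums_joined = pd.Series([1, 2, 3]).astype(str).str.cat(sep=", ")
```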
    
  • 2021-02-01 19:41

    For anyone who wants to combine strings from several rows of a DataFrame,
    here is a way to concatenate the strings within a 'window-like' range of neighbouring rows:

    # add a column joining each row's 'key' with the next (windows_size - 1) keys in its 'bycol' group
    df['windows_key_list'] = df['key'].str.cat(
        [df.groupby(['bycol']).shift(-i)['key'] for i in range(1, windows_size)],
        sep=' ')
    

    Note: This can't be done with a groupby aggregation alone, because the goal is to join neighbouring rows, not all rows that share the same id.
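
To make the window idea concrete, here is a small sketch on made-up data. The column names `bycol` and `key` follow the snippet above; the rows, the window size of 3, and the use of `na_rep` to pad incomplete windows are assumptions added for illustration:

```python
import pandas as pd

# Hypothetical data: two groups of consecutive rows.
df = pd.DataFrame({
    "bycol": ["x", "x", "x", "y", "y"],
    "key": ["a", "b", "c", "d", "e"],
})
windows_size = 3  # each window covers the current row plus the next two

# For each offset i, the key found i rows further down within the same group.
shifted = [df.groupby("bycol")["key"].shift(-i) for i in range(1, windows_size)]

# Default behaviour: a window that runs past the end of its group becomes NaN.
strict = df["key"].str.cat(shifted, sep=" ")

# Alternative: pad missing neighbours with '' and trim the leftover spaces,
# so incomplete windows keep whatever keys are available.
df["windows_key_list"] = df["key"].str.cat(shifted, sep=" ", na_rep="").str.strip()
```

With these rows, only the first row has a complete window in `strict`. The padded column holds `a b c`, `b c`, `c`, `d e`, `e`.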
