Splitting dataframe into multiple dataframes

南方客 2020-11-22 01:16

I have a very large dataframe (around 1 million rows) with data from an experiment (60 respondents).

I would like to split the dataframe into 60 dataframes (a dataframe for each participant).

11 Answers
  • 2020-11-22 01:17

    Firstly, your approach is inefficient: appending to a list row by row in a Python loop is slow over a million rows. Python lists over-allocate when they grow, so each individual append is cheap on average, but the per-row interpreter overhead adds up. List comprehensions are somewhat faster in this respect because they avoid looking up and calling the append method on every iteration.
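    A minimal sketch of the difference, assuming a hypothetical df with a 'name' column (the question's original loop is not shown here). All three versions collect the same values; the comprehension only trims the per-iteration overhead, while the vectorised pandas call avoids the Python-level loop entirely.

```python
import pandas as pd

# Hypothetical sample data standing in for the experiment dataframe
df = pd.DataFrame({"name": ["joe", "ann", "joe", "bob", "ann"],
                   "score": [1, 2, 3, 4, 5]})

# Row-by-row loop with append (the pattern being criticised)
names_loop = []
for i in range(len(df)):
    names_loop.append(df["name"].iloc[i])

# Same result with a list comprehension
names_comp = [df["name"].iloc[i] for i in range(len(df))]

# Vectorised alternative: no Python-level loop at all
names_vec = df["name"].tolist()

print(names_loop == names_comp == names_vec)  # True
```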

    However, I think your approach is fundamentally a little wasteful: you already have a dataframe, so why create a new one for each of these users?

    I would sort the dataframe by the 'name' column, set that column as the index and, if required, keep it as a column too (i.e. don't drop it).

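    Then generate a list of all the unique entries; you can then perform a lookup using these entries. Crucially, if you are only querying the data, select by label on the sorted index (e.g. df.loc['joe']) rather than with a boolean mask: a boolean mask such as df.loc[df.name == 'joe'] always returns a copy, whereas a label lookup on a sorted index lets pandas take a contiguous slice, which can avoid a costly data copy (whether you actually get a view is a pandas implementation detail).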

    Use pandas.DataFrame.sort_values and pandas.DataFrame.set_index:

    import pandas as pd

    # sort the rows by the 'name' column (axis=0, the default)
    df.sort_values(by='name', inplace=True)
    
    # set the index to be 'name', but don't drop the column
    df.set_index(keys=['name'], drop=False, inplace=True)
    
    # get a list of the unique names
    names = df['name'].unique().tolist()
    
    # now we can perform a label lookup on the sorted index
    # (returns a Series instead of a DataFrame if 'joe' has only one row)
    joe = df.loc['joe']
    
    # now you can query all 'joes'
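    A self-contained sketch of the same approach on made-up data (the column names 'name' and 'value' are assumptions). It uses df.loc[[n]] with a list so that a respondent with a single row still comes back as a DataFrame rather than a Series.

```python
import pandas as pd

# Hypothetical data: one row per observation, 'name' identifies the respondent
df = pd.DataFrame({"name": ["joe", "ann", "joe", "bob", "ann", "joe"],
                   "value": [10, 20, 30, 40, 50, 60]})

# Sort rows by name and index on it, keeping the column as well
df = df.sort_values(by="name").set_index("name", drop=False)

# Unique respondents, in sorted order
names = df["name"].unique().tolist()

# Label lookup per respondent; the list indexer always returns a DataFrame
for n in names:
    sub = df.loc[[n]]
    print(n, len(sub), sub["value"].sum())
```

    This prints one line per respondent: ann 2 70, bob 1 40, joe 3 100.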
    
  • 2020-11-22 01:24

    You can convert the groupby object to a tuple of (name, DataFrame) pairs and then to a dict:

    import pandas as pd
    
    df = pd.DataFrame({'Name':list('aabbef'),
                       'A':[4,5,4,5,5,4],
                       'B':[7,8,9,4,2,3],
                       'C':[1,3,5,7,1,0]}, columns = ['Name','A','B','C'])
    
    print (df)
      Name  A  B  C
    0    a  4  7  1
    1    a  5  8  3
    2    b  4  9  5
    3    b  5  4  7
    4    e  5  2  1
    5    f  4  3  0
    
    d = dict(tuple(df.groupby('Name')))
    print (d)
    {'b':   Name  A  B  C
    2    b  4  9  5
    3    b  5  4  7, 'e':   Name  A  B  C
    4    e  5  2  1, 'a':   Name  A  B  C
    0    a  4  7  1
    1    a  5  8  3, 'f':   Name  A  B  C
    5    f  4  3  0}
    
    print (d['a'])
      Name  A  B  C
    0    a  4  7  1
    1    a  5  8  3
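    The same dictionary can also be built with a dict comprehension over the groupby pairs, which some find more readable. A minimal sketch reusing the example data above (columns Name, A, B, C):

```python
import pandas as pd

df = pd.DataFrame({'Name': list('aabbef'),
                   'A': [4, 5, 4, 5, 5, 4],
                   'B': [7, 8, 9, 4, 2, 3],
                   'C': [1, 3, 5, 7, 1, 0]})

# Iterating a groupby yields (key, sub-DataFrame) pairs
d = dict(tuple(df.groupby('Name')))
d2 = {name: group for name, group in df.groupby('Name')}

print(sorted(d))               # ['a', 'b', 'e', 'f']
print(d['a'].equals(d2['a']))  # True
```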
    

    It is not recommended, but it is possible to create a separate DataFrame variable for each group:

    for i, g in df.groupby('Name'):
        globals()['df_' + str(i)] = g
    
    print (df_a)
      Name  A  B  C
    0    a  4  7  1
    1    a  5  8  3
    
  • 2020-11-22 01:24

    Groupby can help you:

    grouped = data.groupby('name')
    

    Then you can work with each group as if it were a separate dataframe for that participant. DataFrameGroupBy methods such as apply, transform, aggregate, head, first and last generally return a DataFrame object.

    Or you can make a list from grouped and get each group's DataFrame by position:

    l_grouped = list(grouped)
    

    l_grouped[0][1] is the DataFrame for the first group (the first name in sorted order), and l_grouped[0][0] is that name.
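    A minimal sketch of both access styles on hypothetical data (the 'name' and 'value' columns are assumptions). It groups by the column name as a plain string: in recent pandas versions, grouping by a one-element list such as ['name'] makes the group keys 1-tuples. GroupBy.get_group fetches a group by key without building a list first.

```python
import pandas as pd

# Hypothetical data with a 'name' column
data = pd.DataFrame({'name': ['joe', 'ann', 'joe', 'bob'],
                     'value': [1, 2, 3, 4]})

# Group by the column name as a string, so each key is a plain name
grouped = data.groupby('name')

# Positional access via a list of (name, DataFrame) pairs
l_grouped = list(grouped)
first_name, first_df = l_grouped[0]
print(first_name)                # ann  (groups are sorted by key)

# Direct access by name, without building a list
joe_df = grouped.get_group('joe')
print(joe_df['value'].tolist())  # [1, 3]
```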

  • 2020-11-22 01:28

    Easy:

    [v for k, v in df.groupby('name')]
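    A quick check of what the one-liner returns, on made-up data (the 'name' and 'value' columns are assumptions): a plain list ordered by the sorted group keys. The keys themselves are dropped, but each frame still carries its 'name' column, so the name can be recovered from the data.

```python
import pandas as pd

df = pd.DataFrame({'name': ['joe', 'ann', 'joe', 'bob'],
                   'value': [1, 2, 3, 4]})

# One DataFrame per name, in sorted key order; the keys are discarded
frames = [v for k, v in df.groupby('name')]

# Recover each frame's name from its own 'name' column
labels = [f['name'].iloc[0] for f in frames]
print(len(frames), labels)  # 3 ['ann', 'bob', 'joe']
```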
    
  • 2020-11-22 01:31

    Can I ask why you don't just do it by slicing the data frame? Something like:

    import numpy as np
    import pandas as pd
    
    #create some data with Names column
    data = pd.DataFrame({'Names': ['Joe', 'John', 'Jasper', 'Jez'] *4, 'Ob1' : np.random.rand(16), 'Ob2' : np.random.rand(16)})
    
    #create unique list of names
    UniqueNames = data.Names.unique()
    
    #create a data frame dictionary to store your data frames
    DataFrameDict = {elem : pd.DataFrame() for elem in UniqueNames}
    
    # fill each entry with the rows for that name
    for key in DataFrameDict.keys():
        DataFrameDict[key] = data[data.Names == key]
    

    Hey presto, you have a dictionary of data frames, just as (I think) you want them. Need to access one? Just enter:

    DataFrameDict['Joe']
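    A slightly more compact variant builds the dictionary in a single comprehension, so there is no need to pre-fill it. The concat at the end is only a sanity check that every row lands in exactly one frame. Sketch on the same random toy data:

```python
import numpy as np
import pandas as pd

# Same toy data as above
data = pd.DataFrame({'Names': ['Joe', 'John', 'Jasper', 'Jez'] * 4,
                     'Ob1': np.random.rand(16), 'Ob2': np.random.rand(16)})

# One boolean-mask slice per unique name
DataFrameDict = {name: data[data.Names == name] for name in data.Names.unique()}

# Sanity check: stitching the pieces back together restores the original
rebuilt = pd.concat(list(DataFrameDict.values())).sort_index()
print(rebuilt.equals(data))       # True
print(len(DataFrameDict['Joe']))  # 4
```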
    

    Hope that helps

  • 2020-11-22 01:33

    In addition to Gusev Slava's answer, you might want to use the groupby object's groups attribute:

    {key: df.loc[value] for key, value in df.groupby("name").groups.items()}
    

    This will yield a dictionary with the keys you have grouped by, pointing to the corresponding partitions. The advantage is that the keys are kept, rather than being lost as positions in a list.
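    A small sketch of what groups contains, on hypothetical data (the 'name' and 'value' columns are assumptions): it maps each key to the index labels of that group's rows, which df.loc then looks up. This relies on the index labels being unique; with a duplicated index, .loc would also pick up rows from other groups, and groupby(...).indices (positions) with .iloc is the positional counterpart.

```python
import pandas as pd

df = pd.DataFrame({'name': ['joe', 'ann', 'joe', 'bob'],
                   'value': [1, 2, 3, 4]})

# .groups maps each key to the index labels of its rows
groups = df.groupby("name").groups
print(groups['joe'].tolist())  # [0, 2]

# Look those labels up with .loc to get one DataFrame per key
parts = {key: df.loc[value] for key, value in groups.items()}
print(sorted(parts))  # ['ann', 'bob', 'joe']
```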
