Pandas Split DataFrame using row index

日久生厌 2021-01-18 15:08

I want to split a DataFrame into chunks with uneven numbers of rows, using row indices as the split points.

The code below:

groups = df.groupby((np.arange(len(df.index))/l[1]).astype(int))


        
4 Answers
  • 2021-01-18 15:40

    You could use a list comprehension after slightly modifying your list, l, first.

    print(df)
    
       a  b  c
    0  1  1  1
    1  2  2  2
    2  3  3  3
    3  4  4  4
    4  5  5  5
    5  6  6  6
    6  7  7  7
    7  8  8  8
    
    
    l = [2,5,7]
    # add 0 and len(df) as outer bounds so the first and last chunks are included
    l_mod = [0] + l + [len(df)]
    
    list_of_dfs = [df.iloc[l_mod[n]:l_mod[n+1]] for n in range(len(l_mod)-1)]
    

    Output:

    list_of_dfs[0]
    
       a  b  c
    0  1  1  1
    1  2  2  2
    
    list_of_dfs[1]
    
       a  b  c
    2  3  3  3
    3  4  4  4
    4  5  5  5
    
    list_of_dfs[2]
    
       a  b  c
    5  6  6  6
    6  7  7  7
    
    list_of_dfs[3]
    
       a  b  c
    7  8  8  8
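
    The same slicing can be wrapped in a small helper. This is only a minimal sketch, assuming the split points are sorted row positions; split_at is a hypothetical name, not a pandas function. It pairs consecutive bounds with zip instead of indexing into the list:

```python
import pandas as pd


def split_at(df, points):
    """Split df into consecutive chunks at the given row positions (hypothetical helper)."""
    # Bracket the split points with 0 and len(df) so the first and last chunks are included.
    bounds = [0] + list(points) + [len(df)]
    # Pair each bound with the next one and slice by position.
    return [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]


df = pd.DataFrame({'a': range(1, 9), 'b': range(1, 9), 'c': range(1, 9)})
chunks = split_at(df, [2, 5, 7])
print([len(chunk) for chunk in chunks])  # [2, 3, 2, 1]
```

    Concatenating the chunks with pd.concat(chunks) should give back the original frame, which is a quick way to check that no rows were dropped.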
    
  • 2021-01-18 15:48

    You can create an array to use for indexing via NumPy:

    import pandas as pd, numpy as np
    
    df = pd.DataFrame(np.arange(24).reshape((8, 3)), columns=list('abc'))
    
    L = [2, 5, 7]
    # mark each row position that appears in L, then count the marks cumulatively
    idx = np.cumsum(np.isin(np.arange(len(df.index)), L))
    
    for _, chunk in df.groupby(idx):
        print(chunk, '\n')
    
       a  b  c
    0  0  1  2
    1  3  4  5 
    
        a   b   c
    2   6   7   8
    3   9  10  11
    4  12  13  14 
    
        a   b   c
    5  15  16  17
    6  18  19  20 
    
        a   b   c
    7  21  22  23 
    

    Instead of defining a new variable for each dataframe, you can use a dictionary:

    d = dict(tuple(df.groupby(idx)))
    
    print(d[1])  # print second groupby value
    
        a   b   c
    2   6   7   8
    3   9  10  11
    4  12  13  14
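
    The group labels can also be built in one step. The following is a sketch, not part of the answer above, and it assumes L is sorted: np.searchsorted counts, for each row position, how many split points are less than or equal to it, which gives the same labels as the cumulative sum.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape((8, 3)), columns=list('abc'))
L = [2, 5, 7]

# For each row position, count how many split points are <= that position.
# L must be sorted for np.searchsorted to give meaningful results.
labels = np.searchsorted(L, np.arange(len(df)), side='right')
print(labels)  # [0 0 1 1 1 2 2 3]

# Collect the groups in a dictionary keyed by label.
chunks = {key: chunk for key, chunk in df.groupby(labels)}
```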
    
  • 2021-01-18 15:52

    I think this is what you need:

    import pandas as pd, numpy as np

    df = pd.DataFrame({'a': np.arange(1, 8),
                      'b': np.arange(1, 8),
                      'c': np.arange(1, 8)})
    print(df)
        a   b   c
    0   1   1   1
    1   2   2   2
    2   3   3   3
    3   4   4   4
    4   5   5   5
    5   6   6   6
    6   7   7   7
    
    last_check = 0
    dfs = []
    for ind in [2, 5, 7]:
        dfs.append(df.loc[last_check:ind-1])
        last_check = ind
    

    A list comprehension is usually more concise than a for loop, and often somewhat faster, but tracking last_check is what lets the loop handle split points that follow no regular pattern.

    dfs[0]
    
        a   b   c
    0   1   1   1
    1   2   2   2
    
    dfs[2]
    
        a   b   c
    5   6   6   6
    6   7   7   7
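
    Note that .loc slices by label and includes the end label, so the loop above relies on the default 0..n-1 index. A sketch of the same loop with .iloc, which slices by position and excludes the end point; the string index here is made up purely for illustration:

```python
import pandas as pd

# A frame whose index labels are not 0..n-1 (hypothetical data for illustration).
df = pd.DataFrame({'a': range(1, 8)}, index=list('tuvwxyz'))

dfs = []
last_check = 0
for ind in [2, 5, 7]:
    # iloc slices by position and excludes the end point, so no "- 1" is needed.
    dfs.append(df.iloc[last_check:ind])
    last_check = ind

print([chunk.index.tolist() for chunk in dfs])
# [['t', 'u'], ['v', 'w', 'x'], ['y', 'z']]
```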
    
  • 2021-01-18 15:57

    I think this is what you are looking for:

    l = [2, 5, 7]
    dfs = []
    for i, val in enumerate(l):
        # the first chunk starts at row 0; later chunks start at the previous split point
        start = 0 if i == 0 else l[i-1]
        dfs.append(df.iloc[start:val])

    for chunk in dfs:
        print(chunk)
    

    Output:

       a  b  c
    0  1  1  1
    1  2  2  2
       a  b  c
    2  3  3  3
    3  4  4  4
    4  5  5  5
       a  b  c
    5  6  6  6
    6  7  7  7
    

    Another Solution:

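    l = [2, 5, 7]
    t = np.arange(l[-1])    # one label slot per row; assumes len(df) == l[-1]
    l.reverse()             # go from the largest split point to the smallest (reverses l in place)
    for val in l:
        t[:val] = val       # rows before each split point get that point as their label
    temp = pd.DataFrame(t)  # single column named 0
    temp = pd.concat([df, temp], axis=1)
    for u, v in temp.groupby(0):
        print(v)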
    

    Output:

       a  b  c  0
    0  1  1  1  2
    1  2  2  2  2
       a  b  c  0
    2  3  3  3  5
    3  4  4  4  5
    4  5  5  5  5
       a  b  c  0
    5  6  6  6  7
    6  7  7  7  7
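
    The label array can also be built without the reverse-and-overwrite loop. A minimal sketch, assuming as above that the last split point equals the number of rows: np.diff gives the chunk sizes and np.repeat expands each split point into one label per row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(1, 8), 'b': np.arange(1, 8), 'c': np.arange(1, 8)})
l = [2, 5, 7]

# Chunk sizes are the gaps between consecutive split points: [2, 3, 2].
sizes = np.diff([0] + l)
# Repeat each split point once per row in its chunk: [2 2 5 5 5 7 7].
labels = np.repeat(l, sizes)

# Group directly on the label array; no helper column is added to df.
for key, chunk in df.groupby(labels):
    print(key)
    print(chunk)
```

    Grouping on the array directly leaves df unchanged, so the extra 0 column from the output above does not appear.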
    