Move column by name to front of table in pandas

你的背包 2020-12-04 07:40

Here is my df:

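                             Net   Upper   Lower  Mid  Zsore
Answer option                                                
More than once a day          0%   0.22%  -0.12%   2    65 
Once a day                    0%   0.32%  -0.19%   3    45
Several times a week          2%   2.45%   1.10%   4    78
Once a week                   1%   1.63%  -0.40%   6    65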


        
9 answers
  • 2020-12-04 08:08

    I prefer this solution:

    col = df.pop("Mid")
    df.insert(0, col.name, col)
    

    It's simpler to read and faster than other suggested answers.

    def move_column_inplace(df, col, pos):
        col = df.pop(col)
        df.insert(pos, col.name, col)
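
For anyone who wants to try the helper above, here is a minimal usage sketch. The DataFrame is rebuilt by hand from the table in the question (so the dtypes are an assumption), and the second call with pos=1 is only there to show that the position is arbitrary.

```python
import pandas as pd

# Hand-built copy of the question's table (dtypes are assumed).
df = pd.DataFrame(
    {
        "Net": ["0%", "0%", "2%", "1%"],
        "Upper": ["0.22%", "0.32%", "2.45%", "1.63%"],
        "Lower": ["-0.12%", "-0.19%", "1.10%", "-0.40%"],
        "Mid": [2, 3, 4, 6],
        "Zsore": [65, 45, 78, 65],
    },
    index=pd.Index(
        ["More than once a day", "Once a day",
         "Several times a week", "Once a week"],
        name="Answer option",
    ),
)

def move_column_inplace(df, col, pos):
    """Move column `col` to integer position `pos`, modifying df in place."""
    col = df.pop(col)              # removes the column and returns it as a Series
    df.insert(pos, col.name, col)  # re-inserts it at the requested position

move_column_inplace(df, "Mid", 0)
print(list(df.columns))  # ['Mid', 'Net', 'Upper', 'Lower', 'Zsore']

move_column_inplace(df, "Zsore", 1)
print(list(df.columns))  # ['Mid', 'Zsore', 'Net', 'Upper', 'Lower']
```

Note that nothing is returned; the caller's df itself is changed.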
    

    Performance assessment:

    For this test, the currently last column is moved to the front in each repetition. In-place methods generally perform better. While citynorman's solution can be made in-place, Ed Chum's method based on .loc and sachinnm's method based on reindex cannot.

    While other methods are generic, citynorman's solution is limited to pos=0. I didn't observe any performance difference between df.loc[:, cols] and df[cols], which is why I didn't include some other suggestions.
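
To make the in-place distinction concrete, here is a small sketch on a toy frame (the column names a to d and the variable alias are made up for illustration). Selecting with a reordered list builds a new DataFrame, while pop/insert changes the existing object, so every other reference to it sees the new order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("abcd"))
alias = df  # a second reference to the same object

# Not in place: the selection returns a new DataFrame and leaves df untouched.
cols = ["d"] + [c for c in df.columns if c != "d"]
reordered = df.loc[:, cols]
print(list(df.columns))         # ['a', 'b', 'c', 'd']
print(list(reordered.columns))  # ['d', 'a', 'b', 'c']

# In place: pop() removes the column from df itself, insert() puts it back.
col = df.pop("d")
df.insert(0, col.name, col)
print(list(alias.columns))      # ['d', 'a', 'b', 'c']
```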

    I tested with Python 3.6.8 and pandas 0.24.2 on a MacBook Pro (Mid 2015).

    import numpy as np
    import pandas as pd
    
    n_cols = 11
    df = pd.DataFrame(np.random.randn(200000, n_cols),
                      columns=range(n_cols))
    
    def move_column_inplace(df, col, pos):
        col = df.pop(col)
        df.insert(pos, col.name, col)
    
    def move_to_front_normanius_inplace(df, col):
        move_column_inplace(df, col, 0)
        return df
    
    def move_to_front_chum(df, col):
        cols = list(df)
        cols.insert(0, cols.pop(cols.index(col)))
        return df.loc[:, cols]
    
    def move_to_front_chum_inplace(df, col):
        col = df[col]
        df.drop(col.name, axis=1, inplace=True)
        df.insert(0, col.name, col)
        return df
    
    def move_to_front_elpastor(df, col):
        cols = [col] + [c for c in df.columns if c != col]
        return df[cols]  # or df.loc[:, cols]
    
    def move_to_front_sachinmm(df, col):
        cols = df.columns.tolist()
        cols.insert(0, cols.pop(cols.index(col)))
        df = df.reindex(columns=cols, copy=False)
        return df
    
    def move_to_front_citynorman_inplace(df, col):
        # This approach exploits that reset_index() moves the index
        # at the first position of the data frame.
        df.set_index(col, inplace=True)
        df.reset_index(inplace=True)
        return df
    
    def test(method, df):
        col = np.random.randint(0, n_cols)
        method(df, col)
    
    col = np.random.randint(0, n_cols)
    ret_mine = move_to_front_normanius_inplace(df.copy(), col)
    ret_chum1 = move_to_front_chum(df.copy(), col)
    ret_chum2 = move_to_front_chum_inplace(df.copy(), col)
    ret_elpas = move_to_front_elpastor(df.copy(), col)
    ret_sach = move_to_front_sachinmm(df.copy(), col)
    ret_city = move_to_front_citynorman_inplace(df.copy(), col)
    
    # Assert equivalence of solutions.
    assert(ret_mine.equals(ret_chum1))
    assert(ret_mine.equals(ret_chum2))
    assert(ret_mine.equals(ret_elpas))
    assert(ret_mine.equals(ret_sach))
    assert(ret_mine.equals(ret_city))
    

    Results:

    # For n_cols = 11:
    %timeit test(move_to_front_normanius_inplace, df)
    # 1.05 ms ± 42.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    %timeit test(move_to_front_citynorman_inplace, df)
    # 1.68 ms ± 46.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    %timeit test(move_to_front_sachinmm, df)
    # 3.24 ms ± 96.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    %timeit test(move_to_front_chum, df)
    # 3.84 ms ± 114 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    %timeit test(move_to_front_elpastor, df)
    # 3.85 ms ± 58.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    %timeit test(move_to_front_chum_inplace, df)
    # 9.67 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    
    
    # For n_cols = 31:
    %timeit test(move_to_front_normanius_inplace, df)
    # 1.26 ms ± 31.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    %timeit test(move_to_front_citynorman_inplace, df)
    # 1.95 ms ± 260 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    %timeit test(move_to_front_sachinmm, df)
    # 10.7 ms ± 348 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    %timeit test(move_to_front_chum, df)
    # 11.5 ms ± 869 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    %timeit test(move_to_front_elpastor, df)
    # 11.4 ms ± 598 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    %timeit test(move_to_front_chum_inplace, df)
    # 31.4 ms ± 1.89 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    
  • 2020-12-04 08:08

    Here is a very simple answer to this.

    Don't forget the double parentheses (()) around the column names. Otherwise, you'll get an error.

    
    # Select the columns in the order you want them to appear
    df = df[list(('Mid','Upper', 'Lower', 'Net','Zsore'))]
    df
    
                                 Mid   Upper   Lower  Net  Zsore
    Answer option                                                
    More than once a day          2   0.22%  -0.12%   0%    65 
    Once a day                    3   0.32%  -0.19%   0%    45
    Several times a week          4   2.45%   1.10%   2%    78
    Once a week                   6   1.63%  -0.40%   1%    65
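
A short sketch of why the double parentheses matter, using a two-row, hand-typed stand-in for the question's df (the variable names order, a and b are only for illustration). The inner pair builds a tuple that list() converts; with a single pair, list() receives five separate arguments and raises a TypeError. A plain list literal in double square brackets should give the same selection.

```python
import pandas as pd

# Two-row stand-in for the question's table.
df = pd.DataFrame(
    [["0%", "0.22%", "-0.12%", 2, 65], ["0%", "0.32%", "-0.19%", 3, 45]],
    columns=["Net", "Upper", "Lower", "Mid", "Zsore"],
)

order = ["Mid", "Upper", "Lower", "Net", "Zsore"]

# Equivalent spellings: a list built from a tuple, or a list literal.
a = df[list(("Mid", "Upper", "Lower", "Net", "Zsore"))]
b = df[["Mid", "Upper", "Lower", "Net", "Zsore"]]
print(list(a.columns) == list(b.columns) == order)  # True

# With a single pair of parentheses, list() gets five arguments and fails.
try:
    list("Mid", "Upper", "Lower", "Net", "Zsore")
except TypeError as exc:
    print("TypeError:", exc)
```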
    
  • 2020-12-04 08:20

    We can use ix to reorder by passing a list (on pandas versions without ix, use the loc form at the end of this answer):

    In [27]:
    # get a list of columns
    cols = list(df)
    # move the column to head of list using index, pop and insert
    cols.insert(0, cols.pop(cols.index('Mid')))
    cols
    Out[27]:
    ['Mid', 'Net', 'Upper', 'Lower', 'Zsore']
    In [28]:
    # use ix to reorder
    df = df.ix[:, cols]
    df
    Out[28]:
                          Mid Net  Upper   Lower  Zsore
    Answer_option                                      
    More_than_once_a_day    2  0%  0.22%  -0.12%     65
    Once_a_day              3  0%  0.32%  -0.19%     45
    Several_times_a_week    4  2%  2.45%   1.10%     78
    Once_a_week             6  1%  1.63%  -0.40%     65
    

    Another method is to take a reference to the column and reinsert it at the front:

    In [39]:
    mid = df['Mid']
    df.drop(labels=['Mid'], axis=1, inplace=True)
    df.insert(0, 'Mid', mid)
    df
    Out[39]:
                          Mid Net  Upper   Lower  Zsore
    Answer_option                                      
    More_than_once_a_day    2  0%  0.22%  -0.12%     65
    Once_a_day              3  0%  0.32%  -0.19%     45
    Several_times_a_week    4  2%  2.45%   1.10%     78
    Once_a_week             6  1%  1.63%  -0.40%     65
    

    You can also use loc to achieve the same result. This is the form to use on current pandas, since ix was deprecated in 0.20.0 and removed in 1.0.0:

    df = df.loc[:, cols]
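
Since the snippets in this answer rely on df and cols from an interactive session, here is a self-contained sketch of the loc variant. The two-row frame is typed in by hand from the output shown above, so treat the data as illustrative.

```python
import pandas as pd

# Hand-typed two-row excerpt of the frame printed above.
df = pd.DataFrame(
    {
        "Net": ["0%", "0%"],
        "Upper": ["0.22%", "0.32%"],
        "Lower": ["-0.12%", "-0.19%"],
        "Mid": [2, 3],
        "Zsore": [65, 45],
    },
    index=pd.Index(["More_than_once_a_day", "Once_a_day"], name="Answer_option"),
)

cols = list(df)                              # column labels in current order
cols.insert(0, cols.pop(cols.index("Mid")))  # move 'Mid' to the head of the list
df = df.loc[:, cols]                         # ':' keeps all rows, cols sets the order
print(cols)  # ['Mid', 'Net', 'Upper', 'Lower', 'Zsore']
```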
    