Pandas .loc or .iloc to select columns from a dataset

暗喜 2020-12-19 15:08

I have been trying to select a particular set of columns from a dataset, keeping all the rows. I tried something like the following:

train_features = train_df.loc[,[0,4,5         


        
2 Answers
  • 2020-12-19 15:14

    You can access the column values via the underlying NumPy array.

    Consider the dataframe df

    import numpy as np
    import pandas as pd
    df = pd.DataFrame(np.random.randint(10, size=(5, 20)))
    df
    

    You can slice the underlying array

    slc = [0,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]
    df.values[:, slc]
    
    array([[1, 3, 9, 8, 3, 2, 1, 6, 6, 0, 3, 9, 8, 5, 9, 9],
           [8, 0, 2, 3, 7, 8, 9, 2, 7, 2, 1, 3, 2, 5, 4, 9],
           [1, 1, 9, 3, 5, 8, 8, 8, 8, 4, 8, 0, 5, 4, 9, 0],
           [6, 3, 1, 8, 0, 3, 7, 9, 9, 0, 9, 7, 6, 1, 4, 8],
           [3, 2, 3, 3, 9, 8, 3, 8, 3, 4, 1, 6, 4, 1, 6, 4]])
    

    Or you can reconstruct a new dataframe from this slice

    pd.DataFrame(df.values[:, slc], df.index, df.columns[slc])
    

    This is not as clean and intuitive as

    df.iloc[:, slc]
    

    You could also use slc to slice the df.columns object and pass that to df.loc

    df.loc[:, df.columns[slc]]
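
As a quick check that the three approaches above select the same data, here is a minimal sketch. The seeded generator and the variable names `by_position`, `by_label`, and `rebuilt` are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: 5 rows, 20 integer columns labelled 0..19.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(10, size=(5, 20)))

slc = [0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

# Positional selection of columns.
by_position = df.iloc[:, slc]
# Label selection, with the labels looked up from the columns index.
by_label = df.loc[:, df.columns[slc]]
# Rebuild a frame from the sliced NumPy array, reusing index and labels.
rebuilt = pd.DataFrame(df.values[:, slc], df.index, df.columns[slc])

print(by_position.equals(by_label))
print(by_position.equals(rebuilt))
```

Going through `df.values` is only safe when all columns share one dtype; with mixed dtypes the array is upcast (often to `object`), which is one more reason to prefer `iloc` here.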
    

  • 2020-12-19 15:19

    If you need to select by position, use iloc:

    train_features = train_df.iloc[:, [0,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]]
    print(train_features)
       age  default  housing  loan  equities  contact  duration  campaign  pdays  \
    0   56        1        1     1         1        0       261         1    999   
    1   37        1        0     1         1        0       226         1    999   
    2   56        1        1     0         1        0       307         1    999   
    
       previous  poutcome  emp.var.rate  cons.price.idx  cons.conf.idx  euribor3m  \
    0         0         2           1.1          93.994          -36.4   3.299552   
    1         0         2           1.1          93.994          -36.4   0.743751   
    2         0         2           1.1          93.994          -36.4   1.282652   
    
       nr.employed  
    0         5191  
    1         5191  
    2         5191  
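
The long list of positions can also be written more compactly. The sketch below uses `np.r_`, which is a swapped-in convenience and not something the answer itself uses; the stand-in `train_df` with columns `c0`..`c19` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train_df: 3 rows, 20 columns named c0..c19.
train_df = pd.DataFrame(
    np.arange(60).reshape(3, 20),
    columns=[f"c{i}" for i in range(20)],
)

# np.r_ concatenates scalars and slices into one integer array:
# position 0, then positions 4 through 18 (the slice end is exclusive).
positions = np.r_[0, 4:19]

train_features = train_df.iloc[:, positions]
print(train_features.columns.tolist())
```

This keeps the intent ("column 0 plus columns 4 to 18") visible, which may be easier to maintain than a hand-typed list.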
    

    Another solution is to drop the unnecessary columns:

    cols = ['job', 'marital', 'education', 'y']
    train_features = train_df.drop(cols, axis=1)
    print(train_features)
       age  default  housing  loan  equities  contact  duration  campaign  pdays  \
    0   56        1        1     1         1        0       261         1    999   
    1   37        1        0     1         1        0       226         1    999   
    2   56        1        1     0         1        0       307         1    999   
    
       previous  poutcome  emp.var.rate  cons.price.idx  cons.conf.idx  euribor3m  \
    0         0         2           1.1          93.994          -36.4   3.299552   
    1         0         2           1.1          93.994          -36.4   0.743751   
    2         0         2           1.1          93.994          -36.4   1.282652   
    
       nr.employed  
    0         5191  
    1         5191  
    2         5191  
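
To illustrate dropping by name on something small enough to inspect, here is a minimal sketch. The miniature `train_df` and all of its values are hypothetical placeholders; only the column names come from the answer. The `columns=` keyword is an equivalent spelling of `drop(cols, axis=1)`, and the boolean-mask variant is included only as a cross-check.

```python
import pandas as pd

# Hypothetical miniature of train_df; the values are placeholders.
train_df = pd.DataFrame({
    "age": [56, 37, 56],
    "job": ["a", "b", "c"],
    "marital": ["x", "y", "x"],
    "education": ["p", "q", "p"],
    "default": [1, 1, 1],
    "y": [0, 0, 1],
})

cols = ["job", "marital", "education", "y"]

# Drop by column name; equivalent to train_df.drop(cols, axis=1).
dropped = train_df.drop(columns=cols)

# Cross-check: keep every column whose name is not in cols.
kept = train_df.loc[:, ~train_df.columns.isin(cols)]

print(dropped.columns.tolist())
print(dropped.equals(kept))
```

Dropping by name does not depend on column order, so it keeps working if the columns of the dataset are later rearranged, whereas positional selection with `iloc` would silently pick different columns.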
    