How to remove multiple headers from a DataFrame and keep just the first (Python)

梦谈多话 2021-01-28 18:45

I'm working with a CSV file that contains multiple header rows, all repeated as in this example:

1                     2     3   4
0            POSITION_T  P         


        
3 Answers
  • 2021-01-28 18:59
    import pandas as pd

    past_data = pd.read_csv("book.csv")

    # Keep only the rows whose LAT value is not the repeated header text 'LAT'
    past_data = past_data[~past_data.LAT.astype(str).str.contains('LAT')]

    print(past_data)
    
    1. Replace the CSV file name (here: book.csv) with your own.
    2. Replace the variable name (here: past_data) with your own.
    3. Replace every LAT with the name of one of your columns.
    4. That's all; the repeated header rows will be removed.
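
The steps above depend on picking one column by name. As a rough sketch of the same idea without hard-coding a column, the hypothetical helper `drop_repeated_headers` below (not part of pandas) drops every row whose cells all match the column labels. The sample data is made up and stands in for book.csv. Note that the remaining values are still strings after filtering.

```python
import io

import pandas as pd


def drop_repeated_headers(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without rows whose values repeat the column names."""
    header = [str(c) for c in df.columns]
    # A row counts as a repeated header when every cell, read as text,
    # equals the label of the column it sits in.
    is_header = df.astype(str).eq(header, axis="columns").all(axis="columns")
    return df.loc[~is_header].reset_index(drop=True)


# Hypothetical sample data standing in for book.csv.
raw = "LAT,LON\n1.5,2.5\nLAT,LON\n3.5,4.5\n"
cleaned = drop_repeated_headers(pd.read_csv(io.StringIO(raw)))
print(cleaned)
```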
  • 2021-01-28 19:25

    This is not ideal! The best way to deal with this would be to handle it in the file parsing.

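    # Flag the rows that repeat the header text in the first column
    mask = df.iloc[:, 0] == 'POSITION_T'
    d1 = df[~mask]
    # Use the first header row as the column names
    d1.columns = df.loc[mask.idxmax()].values
    
    # Convert columns to numeric where possible
    # (errors='ignore' is deprecated in recent pandas releases)
    d1 = d1.apply(pd.to_numeric, errors='ignore')
    d1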
    
       POSITION_T  PROB  ID
    1                      
    1       2.385   2.0   1
    3       3.074   6.0   3
    4       6.731   8.0   4
    7      12.508   2.0   1
    8      12.932   4.0   2
    9      12.985   4.0   2
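
Handling this at parse time, as suggested at the top of this answer, could look roughly like the sketch below. It assumes the repeated header lines are textually identical to the first line of the file, which may not hold for the asker's data. The helper name `read_csv_single_header` and the sample text are hypothetical. Because the header text never reaches the parser, the columns come out with numeric dtypes directly.

```python
import io

import pandas as pd


def read_csv_single_header(lines, **read_csv_kwargs) -> pd.DataFrame:
    """Parse text lines, dropping every later line identical to the first one."""
    lines = iter(lines)
    header = next(lines)
    # Keep the first line as the header and skip any exact repeats of it.
    kept = [header] + [ln for ln in lines if ln.strip() != header.strip()]
    return pd.read_csv(io.StringIO("".join(kept)), **read_csv_kwargs)


# Hypothetical whitespace-separated sample with a repeated header line.
raw = (
    "POSITION_T PROB ID\n"
    "2.385 2.0 1\n"
    "POSITION_T PROB ID\n"
    "3.074 6.0 3\n"
)
df = read_csv_single_header(io.StringIO(raw), sep=r"\s+")
print(df.dtypes)
```

With a real file, an open file handle can be passed in place of the `io.StringIO` object, since both yield one line per iteration.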
    
  • 2021-01-28 19:26

    Filtering out by field value:

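    import pandas as pd

    # Read whitespace-separated data without a header row, skipping the first line.
    # sep=r'\s+' replaces delim_whitespace=True, which is deprecated in recent pandas releases.
    df = pd.read_table('yourfile.csv', header=None, sep=r'\s+', skiprows=1)
    df.columns = ['0','POSITION_T','PROB','ID']
    del df['0']
    
    # filtering out the rows with `POSITION_T` value in corresponding column
    df = df[df.POSITION_T.str.contains('POSITION_T') == False]
    
    print(df)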
    

    The output:

      POSITION_T PROB ID
    1      2.385  2.0  1
    3      3.074  6.0  3
    4      6.731  8.0  4
    6     12.508  2.0  1
    7     12.932  4.0  2
    8     12.985  4.0  2
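
After this filter the columns still hold strings, because each one was parsed with header text mixed in. A minimal sketch of converting them to numbers follows. The small frame is hypothetical and only stands in for the filtered result above. `errors="coerce"` is one possible choice: it turns anything unparseable into NaN instead of raising.

```python
import pandas as pd

# Hypothetical frame standing in for the filtered result: all values are text.
df = pd.DataFrame(
    {"POSITION_T": ["2.385", "3.074"], "PROB": ["2.0", "6.0"], "ID": ["1", "3"]},
    index=[1, 3],
)

# Convert every column to a numeric dtype and renumber the rows from 0.
numeric = df.apply(pd.to_numeric, errors="coerce").reset_index(drop=True)
print(numeric.dtypes)
```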
    