How to select and delete columns with duplicate name in pandas DataFrame

难免孤独 2021-02-07 06:03

I have a huge DataFrame in which some columns share the same name. When I try to pick a column that exists twice (e.g. del df['col name'] or df2=

4 answers
  •  醉酒成梦
    2021-02-07 06:30

    This is not a good situation to be in. Best would be to create a hierarchical column labeling scheme (Pandas allows for multi-level column labeling or row index labels). Determine what it is that makes the two different columns that have the same name actually different from each other and leverage that to create a hierarchical column index.
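
    As a minimal sketch of that idea, the example below relabels a small made-up frame with a two-level column index. The frame, the level names "source" and "field", and the labels "meta", "a" and "b" are all assumptions for illustration; in practice the top level should encode whatever really distinguishes your duplicate columns.

```python
import pandas as pd

# Hypothetical frame with a repeated column label "value".
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["id", "value", "value"])

# Assumption: the two "value" columns come from different sources, "a" and "b".
# The top level records the source; the second level keeps the original name.
df.columns = pd.MultiIndex.from_arrays(
    [["meta", "a", "b"], ["id", "value", "value"]],
    names=["source", "field"],
)

# Each column is now uniquely addressable by a (source, field) tuple.
value_from_b = df[("b", "value")]
```

    With unique (source, field) pairs, both selection and del df[("b", "value")] act on exactly one column.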

    In the meantime, if you know the positional location of the columns in the ordered list of columns (e.g. from dataframe.columns), you can use positional indexing with .iloc[] to retrieve values from a column by position. (The older .ix[] indexer was deprecated and has been removed from pandas as of version 1.0.)
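
    A small sketch of positional selection with .iloc[], assuming a made-up frame in which the label "value" appears twice. The positions are looked up from the column index rather than hard-coded.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a repeated column label "value".
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["id", "value", "value"])

# Selecting by label returns every column with that name, as a DataFrame.
both = df["value"]

# Integer positions of the columns labelled "value".
positions = np.flatnonzero(df.columns == "value")

# Pick the second occurrence by position; this returns a single Series.
second = df.iloc[:, positions[1]]
```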

    You can also create copies of the columns with new names, such as:

    dataframe["new_name"] = dataframe.iloc[:, column_position].values

    where column_position references the positional location of the column you're trying to get (not the name).

    These approaches may not work for you if the data is too large, however, since copying columns costs extra memory. The best option is to modify the construction process so that it produces the hierarchical column index in the first place.
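
    The question also asks about deleting, which this answer does not cover directly. One possible sketch, again on a made-up frame: drop a single column by position with a boolean mask, or keep only the first occurrence of each repeated name using Index.duplicated(). The value of column_position here is a hypothetical example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a repeated column label "value".
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["id", "value", "value"])

# Drop one column by position: keep every position except column_position.
column_position = 2  # hypothetical position of the unwanted duplicate
keep = np.arange(df.shape[1]) != column_position
dropped_one = df.iloc[:, keep]

# Alternatively, keep only the first occurrence of each column name.
first_only = df.loc[:, ~df.columns.duplicated()]
```

    Both results are new frames; the original df is left unchanged.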
