How to select and delete columns with duplicate names in a pandas DataFrame

难免孤独 2021-02-07 06:03

I have a huge DataFrame in which some columns share the same name. When I try to pick a column that exists twice (e.g. del df['col name'] or df2=

4 answers
  •  清酒与你
    2021-02-07 06:12

    The following function removes columns with duplicate names and keeps only one. It is not exactly what you asked for, but you can use snippets of it to solve your problem. The idea is to get the integer positions of the columns and then address specific columns by position directly. The positions are unique, while the column names aren't.

    def remove_multiples(df, varname):
        """
        Keep only the first of all columns named `varname`.

        Copies the first such column, deletes every column with that name,
        and inserts the copy again (it ends up as the last column).
        """
        dfout = df.copy()  # work on a copy so the caller's frame is untouched
        if varname in dfout.columns:
            # integer positions of every column whose name equals varname
            positions = [i for i, x in enumerate(dfout.columns == varname) if x]
            tmp = dfout.iloc[:, positions[0]]
            del dfout[varname]  # removes all columns with this name
            dfout[varname] = tmp
        return dfout
    

    where

    [i for i,x in enumerate(dfout.columns == varname) if x]
    

    is the part you need: it lists the integer positions of every column named varname, which you can then pass to iloc.
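As a rough sketch of how those positions could be used to pick or drop one specific occurrence, here is a small example. The frame `df` and the names `second_a`, `df_one_a` and `df_unique` are made up for illustration. `np.flatnonzero` is swapped in for the list comprehension above and should return the same positions. The last line shows a different approach, `Index.duplicated`, for the case where you simply want to keep the first occurrence of every name.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame in which the column name "a" appears twice.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "a"])

# Integer positions of every column called "a"
# (same result as the list comprehension above).
positions = np.flatnonzero(df.columns == "a")

# Select one specific occurrence by position; this returns a Series.
second_a = df.iloc[:, positions[1]]

# Delete only the second "a" by keeping every other position.
keep = [i for i in range(df.shape[1]) if i != positions[1]]
df_one_a = df.iloc[:, keep]

# Alternative: keep the first occurrence of every duplicated name in one step.
df_unique = df.loc[:, ~df.columns.duplicated()]
```

Selecting by position with iloc sidesteps the ambiguity of the label, since `df["a"]` would return both columns as a DataFrame here.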
