Renaming columns in pandas

野性不改 2020-11-21 07:05

I have a pandas DataFrame whose column labels I need to edit, replacing the original labels.

I'd like to change the column names in a DataFrame.

27 answers
  • 2020-11-21 07:31

    I know this question and its answers have been chewed to death, but I referred to them for inspiration on a problem I was having. I solved it using bits and pieces from different answers, so I am posting my response in case anyone needs it.

    My method is generic: you can handle additional delimiters by adding them to the delimiters variable, which makes it easier to extend later.

    Working Code:

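    import re

    import pandas as pd

    df = pd.DataFrame({'$a': [1, 2], '$b': [3, 4], '$c': [5, 6], '$d': [7, 8], '$e': [9, 10]})

    # Delimiters to split on; add more entries to this list as needed.
    delimiters = ['$']
    # Escape each delimiter so that regex metacharacters such as '$' match literally.
    matchPattern = '|'.join(map(re.escape, delimiters))
    # Keep the text after the first delimiter; every column name must contain one.
    df.columns = [re.split(matchPattern, i)[1] for i in df.columns]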
    

    Output (before and after renaming):

    >>> df
       $a  $b  $c  $d  $e
    0   1   3   5   7   9
    1   2   4   6   8  10
    
    >>> df
       a  b  c  d   e
    0  1  3  5  7   9
    1  2  4  6  8  10
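
The split in the code above assumes every column name contains a delimiter; a name without one would raise an IndexError at the [1] lookup. A minimal sketch of a more tolerant variant, assuming it is enough to strip a single leading delimiter (the '#' delimiter and the sample columns are made up for illustration):

```python
import re

import pandas as pd

# Hypothetical sample: two prefixed columns and one without any delimiter.
df = pd.DataFrame({'$a': [1, 2], '#b': [3, 4], 'c': [5, 6]})

delimiters = ['$', '#']
# Anchor the alternation at the start so only a leading delimiter is removed.
pattern = '^(?:' + '|'.join(map(re.escape, delimiters)) + ')'

# rename returns a new DataFrame; the callable is applied to every column label.
df = df.rename(columns=lambda name: re.sub(pattern, '', name))
print(list(df.columns))
```

Names that do not start with a delimiter pass through unchanged, which may or may not be what you want for your data.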
    
  • 2020-11-21 07:33

    Column names vs Names of Series

    I would like to explain a bit of what happens behind the scenes.

    A DataFrame is a collection of Series that share an index.

    A Series in turn wraps an array of values (typically a numpy.ndarray) together with an index.

    Unlike a plain numpy array, a Series has a .name attribute.

    This is the name of the Series. pandas does not always make use of this attribute, but it lingers in places and can be used to hack some pandas behaviors.

    Naming the list of columns

    A lot of answers here talk about the df.columns attribute being a list when in fact it is an Index. This means it has a .name attribute (and .names, which holds one name per level).

    This is what happens if you decide to fill in the name of the columns Index:

    df.columns = ['column_one', 'column_two']
    df.columns.names = ['name of the list of columns']
    df.index.names = ['name of the index']
    
    name of the list of columns     column_one  column_two
    name of the index       
    0                                    4           1
    1                                    5           2
    2                                    6           3
    

    Note that the name of the index is always displayed one row lower than the name of the columns.
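
A small sketch of the same idea, assuming sample data that matches the values in the output above. It checks the type of df.columns and sets both names through rename_axis, which returns a new DataFrame instead of assigning to .names:

```python
import pandas as pd

# Assumed sample data matching the values shown in the output above.
df = pd.DataFrame({'column_one': [4, 5, 6], 'column_two': [1, 2, 3]})

# df.columns is a pandas Index, not a plain list.
print(type(df.columns).__name__)

# rename_axis names the row index and the column index in one call.
df = df.rename_axis(index='name of the index',
                    columns='name of the list of columns')
print(df.columns.name)
print(df.index.name)
```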

    Artifacts that linger

    The .name attribute lingers on sometimes. If you set df.columns = ['one', 'two'] then df.one.name will be 'one'.

    If you set df.one.name = 'three' then df.columns will still give you ['one', 'two'], and df.one.name will give you 'three'. (This relies on pandas caching the Series it returns for a column, so it may not hold in every pandas version.)

    BUT

    pd.DataFrame(df.one) will return

        three
    0       1
    1       2
    2       3
    

    Because pandas reuses the .name of the already defined Series.
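
A minimal sketch of how a Series name turns into a column label, using a standalone Series so that it does not depend on how df.one is cached. The variable names are illustrative only:

```python
import pandas as pd

s = pd.Series([1, 2, 3], name='three')

# The DataFrame constructor takes the column label from the Series name.
from_constructor = pd.DataFrame(s)

# to_frame lets you choose the column label explicitly instead.
explicit = s.to_frame(name='one')

# rename with a scalar returns a new Series carrying a different name.
renamed = s.rename('two')

print(list(from_constructor.columns), list(explicit.columns), renamed.name)
```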

    Multi level column names

    pandas supports multi-level column names through a MultiIndex. There is not much magic involved, but I wanted to cover it in my answer since I don't see anyone else mentioning it here.

        |one            |
        |one      |two  |
    0   |  4      |  1  |
    1   |  5      |  2  |
    2   |  6      |  3  |
    

    This is easily achievable by setting columns to a list of lists, one inner list per level, like this:

    df.columns = [['one', 'one'], ['one', 'two']]
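
Once the columns are multi-level, rename can target a single level. A hedged sketch, assuming the same two-level layout as above; the level names 'upper' and 'lower' are made up for the example:

```python
import pandas as pd

df = pd.DataFrame([[4, 1], [5, 2], [6, 3]])

# Build the two-level columns explicitly and give each level a name.
df.columns = pd.MultiIndex.from_arrays(
    [['one', 'one'], ['one', 'two']],
    names=['upper', 'lower'],
)

# Rename a label in the lower level only; the upper level is left alone.
renamed = df.rename(columns={'two': 'second'}, level='lower')
print(renamed.columns.tolist())
```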
    
  • 2020-11-21 07:33

    In addition to the solutions already provided, you can replace all the column names while reading the file, using the names and header=0 arguments of read_csv.

    First, we create a list of the names we would like to use as column names, then pass it to read_csv:

    import pandas as pd

    ufo_cols = ['city', 'color reported', 'shape reported', 'state', 'time']

    # header=0 marks the first row of the file as the existing header, so it is replaced by names
    ufo = pd.read_csv('link to the file you are using', names=ufo_cols, header=0)
    

    In this case, all the column names will be replaced with the names in your list.
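
To see what header=0 contributes, here is a self-contained sketch that reads from an in-memory string instead of a file. The CSV text and the column names are invented for the example:

```python
import io

import pandas as pd

# Hypothetical two-column file with its own header row.
csv_text = "old_a,old_b\n1,2\n3,4\n"
new_cols = ['a', 'b']

# header=0: the first row is treated as a header and replaced by new_cols.
replaced = pd.read_csv(io.StringIO(csv_text), names=new_cols, header=0)

# Without header=0, the old header row is read as an ordinary data row.
kept_as_data = pd.read_csv(io.StringIO(csv_text), names=new_cols)

print(replaced.shape, kept_as_data.shape)
```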

  • 2020-11-21 07:35

    One line or Pipeline solutions

    I'll focus on two things:

    1. OP clearly states

      I have the edited column names stored it in a list, but I don't know how to replace the column names.

      I do not want to solve the problem of how to replace '$' or strip the first character off of each column header. OP has already done this step. Instead I want to focus on replacing the existing columns object with a new one given a list of replacement column names.

    2. df.columns = new, where new is the list of new column names, is as simple as it gets. The drawback of this approach is that it modifies the existing dataframe's columns attribute and cannot be done inline. I'll show a few ways to perform this via pipelining without editing the existing dataframe.


    Setup 1
    To focus on the need to rename or replace column names with a pre-existing list, I'll create a new sample dataframe df with initial column names and unrelated new column names.

    import numpy as np  # used in Solution 4
    import pandas as pd

    df = pd.DataFrame({'Jack': [1, 2], 'Mahesh': [3, 4], 'Xin': [5, 6]})
    new = ['x098', 'y765', 'z432']
    
    df
    
       Jack  Mahesh  Xin
    0     1       3    5
    1     2       4    6
    

    Solution 1
    pd.DataFrame.rename

    It has been said already that if you had a dictionary mapping the old column names to new column names, you could use pd.DataFrame.rename.

    d = {'Jack': 'x098', 'Mahesh': 'y765', 'Xin': 'z432'}
    df.rename(columns=d)
    
       x098  y765  z432
    0     1     3     5
    1     2     4     6
    

    However, you can easily create that dictionary and include it in the call to rename. The following takes advantage of the fact that when iterating over df, we iterate over each column name.

    # given just a list of new column names
    df.rename(columns=dict(zip(df, new)))
    
       x098  y765  z432
    0     1     3     5
    1     2     4     6
    

    This works great if your original column names are unique. But if they are not, then this breaks down.


    Setup 2
    non-unique columns

    df = pd.DataFrame(
        [[1, 3, 5], [2, 4, 6]],
        columns=['Mahesh', 'Mahesh', 'Xin']
    )
    new = ['x098', 'y765', 'z432']
    
    df
    
       Mahesh  Mahesh  Xin
    0       1       3    5
    1       2       4    6
    

    Solution 2
    pd.concat using the keys argument

    First, notice what happens when we attempt to use solution 1:

    df.rename(columns=dict(zip(df, new)))
    
       y765  y765  z432
    0     1     3     5
    1     2     4     6
    

    We didn't map the new list as the column names. We ended up repeating y765. Instead, we can use the keys argument of the pd.concat function while iterating through the columns of df.

    pd.concat([c for _, c in df.items()], axis=1, keys=new) 
    
       x098  y765  z432
    0     1     3     5
    1     2     4     6
    

    Solution 3
    Reconstruct. This should only be used if you have a single dtype for all columns. Otherwise, you'll end up with dtype object for all columns and converting them back requires more dictionary work.

    Single dtype

    pd.DataFrame(df.values, df.index, new)
    
       x098  y765  z432
    0     1     3     5
    1     2     4     6
    

    Mixed dtype

    pd.DataFrame(df.values, df.index, new).astype(dict(zip(new, df.dtypes)))
    
       x098  y765  z432
    0     1     3     5
    1     2     4     6
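
A sketch of the dtype caveat, assuming one integer and one float column (the sample values are made up). Here df.values is upcast to a single float array; with strings in the mix the common dtype would be object instead:

```python
import pandas as pd

df = pd.DataFrame({'Jack': [1, 2], 'Mahesh': [3.5, 4.5]})
new = ['x098', 'y765']

# df.values is one 2-D array, so every column shares a single dtype.
rebuilt = pd.DataFrame(df.values, df.index, new)
print(rebuilt.dtypes.tolist())

# Map each new name to the dtype of the column in the same position.
restored = rebuilt.astype(dict(zip(new, df.dtypes)))
print(restored.dtypes.tolist())
```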
    

    Solution 4
    This is a gimmicky trick with transpose and set_index. pd.DataFrame.set_index allows us to set an index inline but there is no corresponding set_columns. So we can transpose, then set_index, and transpose back. However, the same single dtype versus mixed dtype caveat from solution 3 applies here.

    Single dtype

    df.T.set_index(np.asarray(new)).T
    
       x098  y765  z432
    0     1     3     5
    1     2     4     6
    

    Mixed dtype

    df.T.set_index(np.asarray(new)).T.astype(dict(zip(new, df.dtypes)))
    
       x098  y765  z432
    0     1     3     5
    1     2     4     6
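
Although there is no set_columns, DataFrame.set_axis with axis=1 appears to fill that role: it returns a new dataframe with the given labels and does not go through a transpose, so dtypes are untouched. A minimal sketch, assuming a reasonably recent pandas and made-up mixed-dtype data with a duplicated label:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 3.5, 'a'], [2, 4.5, 'b']],
    columns=['Mahesh', 'Mahesh', 'Xin'],
)
new = ['x098', 'y765', 'z432']

# Labels are assigned by position, so duplicate originals are not a problem.
result = df.set_axis(new, axis=1)

print(list(result.columns))
print(list(df.columns))  # the original dataframe keeps its labels
```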
    

    Solution 5
    Use a lambda in pd.DataFrame.rename that cycles through each element of new
    In this solution, we pass a lambda that takes x but then ignores it. It also declares a parameter y that the caller never supplies. Its default value is an iterator over new, which I can then use to step through the new names one at a time without regard to what the value of x is.

    df.rename(columns=lambda x, y=iter(new): next(y))
    
       x098  y765  z432
    0     1     3     5
    1     2     4     6
    

    And as pointed out to me by the folks in sopython chat, if I add a * in between x and y, y becomes keyword-only, which protects it from being overwritten by a stray positional argument. Though, in this context I don't believe it needs protecting. It is still worth mentioning.

    df.rename(columns=lambda x, *, y=iter(new): next(y))
    
       x098  y765  z432
    0     1     3     5
    1     2     4     6
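
One thing to keep in mind with the default-argument trick: the iterator is created once, when the lambda is defined, so reusing the same lambda object would find it already exhausted. A hedged sketch that wraps the idea in a helper which builds a fresh iterator per call (rename_in_order is a hypothetical name, not a pandas function):

```python
import pandas as pd


def rename_in_order(frame, labels):
    """Return a copy of frame whose columns are relabeled by position."""
    labels = list(labels)
    if len(labels) != frame.shape[1]:
        raise ValueError('expected one new label per column')
    fresh = iter(labels)  # new iterator on every call, so the helper is reusable
    return frame.rename(columns=lambda _old: next(fresh))


df = pd.DataFrame([[1, 3, 5], [2, 4, 6]], columns=['Mahesh', 'Mahesh', 'Xin'])
new = ['x098', 'y765', 'z432']

first = rename_in_order(df, new)
second = rename_in_order(df, new)  # works again because the iterator is rebuilt
print(list(first.columns), list(second.columns))
```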
    
  • 2020-11-21 07:35

    Renaming columns in pandas is an easy task.

    df.rename(columns={'$a': 'a', '$b': 'b', '$c': 'c', '$d': 'd', '$e': 'e'}, inplace=True)
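
By default rename silently skips dictionary keys that match no column, which can hide typos. A small sketch of that behavior and of the errors='raise' option, using a made-up key '$z' that is not in the dataframe:

```python
import pandas as pd

df = pd.DataFrame({'$a': [1], '$b': [2]})

# Unknown keys are ignored by default; only '$a' is renamed here.
partial = df.rename(columns={'$a': 'a', '$z': 'z'})
print(list(partial.columns))

# errors='raise' turns an unmatched key into a KeyError instead.
try:
    df.rename(columns={'$z': 'z'}, errors='raise')
    missing_detected = False
except KeyError:
    missing_detected = True
print(missing_detected)
```

Without inplace=True, rename returns a new dataframe and leaves df as it was.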
    
  • 2020-11-21 07:36
    df = pd.DataFrame({'$a': [1], '$b': [1], '$c': [1], '$d': [1], '$e': [1]})
    

    If your new list of columns is in the same order as the existing columns, the assignment is simple:

    new_cols = ['a', 'b', 'c', 'd', 'e']
    df.columns = new_cols
    >>> df
       a  b  c  d  e
    0  1  1  1  1  1
    

    If you had a dictionary mapping old column names to new column names, you could do the following:

    d = {'$a': 'a', '$b': 'b', '$c': 'c', '$d': 'd', '$e': 'e'}
    df.columns = df.columns.map(lambda col: d[col])  # Or `.map(d.get)` as pointed out by @PiRSquared.
    >>> df
       a  b  c  d  e
    0  1  1  1  1  1
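
Note that d[col] raises a KeyError for any column missing from the dictionary, and d.get on its own would not keep the original label for such a column. A minimal sketch, assuming you want unmapped columns to keep their names (the extra column 'c' is made up for the example):

```python
import pandas as pd

df = pd.DataFrame({'$a': [1], '$b': [1], 'c': [1]})
d = {'$a': 'a', '$b': 'b'}

# Fall back to the existing label when the dictionary has no entry for it.
df.columns = [d.get(col, col) for col in df.columns]
print(list(df.columns))
```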
    

    If you don't have a list or dictionary mapping, you could strip the leading $ symbol via a list comprehension:

    df.columns = [col[1:] if col.startswith('$') else col for col in df]
    