How can I strip the whitespace from Pandas DataFrame headers?

礼貌的吻别 2020-12-02 08:06

I am parsing data from an Excel file that has extra whitespace in some of the column headings.

When I check the columns of the resulting dataframe, with df.co

3 Answers
  • 2020-12-02 08:28

    You can pass a function to the rename method; it is applied to each column label. The str.strip() method should do what you want.

    In [5]: df
    Out[5]: 
       Year  Month   Value
    0     1       2      3
    
    [1 rows x 3 columns]
    
    In [6]: df.rename(columns=lambda x: x.strip())
    Out[6]: 
       Year  Month  Value
    0     1      2      3
    
    [1 rows x 3 columns]
    

    Note that this returns a new DataFrame object, which is why it is shown as output on screen, but the changes are not actually applied to your columns. To make the changes take effect, do one of the following:

    1. Use the inplace=True argument:
    df.rename(columns=lambda x: x.strip(), inplace=True)
    
    2. Assign the result back to your df variable:
    df = df.rename(columns=lambda x: x.strip())
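
    As a minimal sketch of the rename approach, the snippet below uses a small helper that only strips string labels, so a non-string label (here the integer 2020) does not raise an AttributeError. The helper name strip_label and the sample frame are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd


def strip_label(label):
    """Strip surrounding whitespace from string labels; pass others through."""
    return label.strip() if isinstance(label, str) else label


# Hypothetical frame with one padded header and one non-string header.
df = pd.DataFrame([[1, 2, 3]], columns=["Year", " Month ", 2020])

# rename returns a new DataFrame; df itself is left unchanged.
cleaned = df.rename(columns=strip_label)

print(cleaned.columns.tolist())
print(df.columns.tolist())
```

    Assigning the result to a new name keeps the original frame available for comparison; assigning it back to df has the same effect as option 2 above.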
    
  • 2020-12-02 08:28

    You can now just call .str.strip on the columns if you're using a recent version:

    In [5]:
    df = pd.DataFrame(columns=['Year', 'Month ', 'Value'])
    print(df.columns.tolist())
    df.columns = df.columns.str.strip()
    df.columns.tolist()
    
    ['Year', 'Month ', 'Value']
    Out[5]:
    ['Year', 'Month', 'Value']
    

    Timings

    In [26]:
    df = pd.DataFrame(columns=[' year', ' month ', ' day', ' asdas ', ' asdas', 'as ', '  sa', ' asdas '])
    df
    Out[26]: 
    Empty DataFrame
    Columns: [ year,  month ,  day,  asdas ,  asdas, as ,   sa,  asdas ]
    
    
    %timeit df.rename(columns=lambda x: x.strip())
    %timeit df.columns.str.strip()
    1000 loops, best of 3: 293 µs per loop
    10000 loops, best of 3: 143 µs per loop
    

    So str.strip is ~2x faster here, and I expect it to scale better for larger DataFrames.
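
    To check the scaling expectation on your own machine, a rough sketch like the one below may help. It uses the standard timeit module instead of the %timeit magic, and the 1,000-column frame is an assumption chosen for illustration. Keep in mind that rename builds a whole new DataFrame while .str.strip only returns a new Index, so the two calls are not doing identical work.

```python
import timeit

import pandas as pd

# Hypothetical wide frame: 1,000 padded column names (illustrative assumption).
padded = [f"  col{i} " for i in range(1000)]
df = pd.DataFrame(columns=padded)

via_rename = df.rename(columns=lambda x: x.strip()).columns
via_accessor = df.columns.str.strip()

# Both approaches should produce identical labels.
assert via_rename.equals(via_accessor)

# Time 100 runs of each approach; absolute numbers depend on machine and version.
t_rename = timeit.timeit(
    lambda: df.rename(columns=lambda x: x.strip()), number=100
)
t_accessor = timeit.timeit(lambda: df.columns.str.strip(), number=100)

print(f"rename:     {t_rename:.4f} s for 100 runs")
print(f".str.strip: {t_accessor:.4f} s for 100 runs")
```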

  • 2020-12-02 08:38

    If you export from Excel to CSV and read the file into a Pandas DataFrame, you can specify:

    skipinitialspace=True
    

    when calling pd.read_csv.

    From the documentation:

    skipinitialspace : bool, default False

    Skip spaces after delimiter.
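
    Because this option only skips spaces that follow the delimiter, trailing spaces in a header are left in place. The sketch below, which uses an in-memory string as a stand-in for an exported CSV file (an assumption for illustration), shows the effect and combines it with df.columns.str.strip() to remove what remains.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file exported from Excel.
csv_text = "Year, Month , Value\n1, 2, 3\n"

# skipinitialspace drops the spaces right after each comma.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
raw_columns = df.columns.tolist()  # the trailing space in "Month " survives

# Strip whatever whitespace is left on the headers.
df.columns = df.columns.str.strip()
clean_columns = df.columns.tolist()

print(raw_columns)
print(clean_columns)
```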
    