How to add an empty column to a dataframe?

自闭症患者 2020-11-28 01:06

What's the easiest way to add an empty column to a pandas DataFrame object? The best I've stumbled upon is something like

df['foo'] = df.ap


        
11 Answers
  • 2020-11-28 01:16

    @emunsing's answer is really cool for adding multiple columns, but I couldn't get it to work in Python 2.7. Instead, I found that this works:

    mydf = mydf.reindex(columns=np.append(mydf.columns.values, ['newcol1', 'newcol2']))
    
  • 2020-11-28 01:16

    Sorry, I did not explain my answer well at first. There is another way to add a new column to an existing DataFrame, in two steps. First, make a new empty DataFrame called df_temp, with all the columns of your DataFrame plus the new column (or columns) you want to add. Second, concatenate df_temp and your DataFrame.

    df_temp = pd.DataFrame(columns=(df.columns.tolist() + ['empty']))
    df = pd.concat([df_temp, df])
    

    It might not be the best solution, but it is another way to think about this question.
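
    As a rough, self-contained sketch of the two steps above (the sample data and the column name "empty" are assumptions for illustration, not from the original post). Note that, depending on the pandas version, concatenating with an empty DataFrame may emit a FutureWarning and may change the dtypes of the existing columns:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})

# Step 1: an empty template frame with the existing columns plus one new column.
df_temp = pd.DataFrame(columns=df.columns.tolist() + ["empty"])

# Step 2: concatenation aligns on column names, so the column that is
# missing from df is filled with missing values in the result.
combined = pd.concat([df_temp, df])

print(combined.columns.tolist())
print(combined["empty"].isna().all())
```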

    The reason I use this method is that I kept getting this warning:

    : SettingWithCopyWarning: 
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead
    
    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
      df["empty1"], df["empty2"] = [np.nan, ""]
    

    I also found a way to disable the warning. Note that this only silences the message; it does not change what the assignment does:

    pd.options.mode.chained_assignment = None 
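
    An alternative to silencing the warning, sketched under the assumption that df was produced by slicing or filtering another DataFrame (a common cause of this warning; the original post does not show where df came from). Taking an explicit copy first makes the assignment unambiguous. The names source and subset are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical parent frame; 'subset' plays the role of 'df' in the post.
source = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})

# .copy() makes subset an independent DataFrame rather than a possible
# view of source, so adding columns to it does not trigger the warning.
subset = source[source["A"] > 1].copy()

subset["empty1"] = np.nan  # missing-value column
subset["empty2"] = ""      # empty-string column

print(subset.columns.tolist())
```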
    
  • 2020-11-28 01:21

    You can use df.insert(index_to_insert_at, column_header, init_value) to insert a new column at a specific position. The DataFrame is modified in place.

    cost_tbl.insert(1, "col_name", "") 
    

    The above statement would insert an empty Column after the first column.
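
    A small runnable sketch of the same call (the contents of cost_tbl are made up for illustration). DataFrame.insert modifies the frame in place and, by default, raises a ValueError if the column name already exists:

```python
import pandas as pd

# Hypothetical table standing in for cost_tbl.
cost_tbl = pd.DataFrame({"item": ["a", "b"], "price": [1.0, 2.0]})

# Position 1 places the new column right after the first column.
cost_tbl.insert(1, "col_name", "")

print(cost_tbl.columns.tolist())
```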

  • 2020-11-28 01:23

    If I understand correctly, assignment should fill:

    >>> import numpy as np
    >>> import pandas as pd
    >>> df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
    >>> df
       A  B
    0  1  2
    1  2  3
    2  3  4
    >>> df["C"] = ""
    >>> df["D"] = np.nan
    >>> df
       A  B C   D
    0  1  2   NaN
    1  2  3   NaN
    2  3  4   NaN
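
    The two columns look similar when printed but behave differently. A brief sketch, using the same sample frame, of how they differ: the empty string is an ordinary value, while NaN is treated as missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})
df["C"] = ""       # every row holds an empty string
df["D"] = np.nan   # every row holds a missing value; the column is float64

# isna() only flags real missing values, so C is not considered empty.
print(df["C"].isna().any())
print(df["D"].isna().all())
print(df["D"].dtype)
```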
    
  • 2020-11-28 01:28

    To add to DSM's answer and building on this associated question, I'd split the approach into two cases:

    • Adding a single column: Just assign an empty value to the new column, e.g. df['C'] = np.nan

    • Adding multiple columns: I'd suggest using the .reindex(columns=[...]) method of pandas to add the new columns to the dataframe's column index. This also works for adding multiple new rows with .reindex(index=[...]). Note that newer versions of pandas (v>0.20) allow you to pass the labels together with an axis keyword rather than explicitly assigning to columns or index.

    Here is an example adding multiple columns:

    mydf = mydf.reindex(columns = mydf.columns.tolist() + ['newcol1','newcol2'])
    

    or

    mydf = mydf.reindex(mydf.columns.tolist() + ['newcol1','newcol2'], axis=1)  # version > 0.20.0
    

    You can also always concatenate a new (empty) dataframe to the existing dataframe, but that doesn't feel as pythonic to me :)
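
    For completeness, a self-contained sketch of the reindex approach on a small frame (the sample data is an assumption). New columns are filled with NaN by default; the fill_value parameter of reindex can supply a different placeholder such as an empty string:

```python
import pandas as pd

# Hypothetical sample frame.
mydf = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})
new_cols = ["newcol1", "newcol2"]

# Default behaviour: labels not present in mydf become all-NaN columns.
with_nan = mydf.reindex(columns=mydf.columns.tolist() + new_cols)

# fill_value replaces the default NaN for the newly introduced labels.
with_blank = mydf.reindex(columns=mydf.columns.tolist() + new_cols, fill_value="")

print(with_nan.columns.tolist())
```

    reindex returns a new DataFrame, so mydf itself is unchanged unless you assign the result back.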

  • 2020-11-28 01:32

    Starting with v0.16.0, DF.assign() can be used to assign new columns (single or multiple) to a DF. The new columns are appended at the end of the DF. In older versions they were inserted in alphabetical order; newer pandas versions running on Python 3.6+ keep the order of the keyword arguments.

    This is an advantage over simple assignment when you want to perform a series of chained operations directly on the returned dataframe.

    Consider the same DF sample demonstrated by @DSM:

    df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
    df
    Out[18]:
       A  B
    0  1  2
    1  2  3
    2  3  4
    
    df.assign(C="",D=np.nan)
    Out[21]:
       A  B C   D
    0  1  2   NaN
    1  2  3   NaN
    2  3  4   NaN
    

    Note that this returns a copy containing all the previous columns along with the newly created ones. To modify the original DF, assign the result back, e.g. df = df.assign(...), since assign does not currently support in-place operation.
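
    A short sketch of the chaining use case mentioned above, using the same sample frame. The filtering step is only an example of a follow-up operation and is not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})

# assign() returns a new DataFrame, so further methods can be chained on it.
result = (
    df.assign(C="", D=np.nan)          # add the two empty columns
      .loc[lambda d: d["A"] > 1]       # example follow-up step: filter rows
      .reset_index(drop=True)
)

print(result.columns.tolist())
print(df.columns.tolist())  # the original frame is left untouched
```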
