What\'s the easiest way to add an empty column to a pandas DataFrame
object? The best I\'ve stumbled upon is something like
df[\'foo\'] = df.ap
@emunsing's answer is really cool for adding multiple columns, but I couldn't get it to work for me in python 2.7. Instead, I found this works:
mydf = mydf.reindex(columns = np.append( mydf.columns.values, ['newcol1','newcol2'])
Sorry for I did not explain my answer really well at beginning. There is another way to add an new column to an existing dataframe. 1st step, make a new empty data frame (with all the columns in your data frame, plus a new or few columns you want to add) called df_temp 2nd step, combine the df_temp and your data frame.
df_temp = pd.DataFrame(columns=(df_null.columns.tolist() + ['empty']))
df = pd.concat([df_temp, df])
It might be the best solution, but it is another way to think about this question.
the reason of I am using this method is because I am get this warning all the time:
: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df["empty1"], df["empty2"] = [np.nan, ""]
great I found the way to disable the Warning
pd.options.mode.chained_assignment = None
One can use df.insert(index_to_insert_at, column_header, init_value)
to insert new column at a specific index.
cost_tbl.insert(1, "col_name", "")
The above statement would insert an empty Column after the first column.
If I understand correctly, assignment should fill:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
>>> df
A B
0 1 2
1 2 3
2 3 4
>>> df["C"] = ""
>>> df["D"] = np.nan
>>> df
A B C D
0 1 2 NaN
1 2 3 NaN
2 3 4 NaN
To add to DSM's answer and building on this associated question, I'd split the approach into two cases:
Adding a single column: Just assign empty values to the new columns, e.g. df['C'] = np.nan
Adding multiple columns: I'd suggest using the .reindex(columns=[...])
method of pandas to add the new columns to the dataframe's column index. This also works for adding multiple new rows with .reindex(rows=[...])
. Note that newer versions of Pandas (v>0.20) allow you to specify an axis
keyword rather than explicitly assigning to columns
or rows
.
Here is an example adding multiple columns:
mydf = mydf.reindex(columns = mydf.columns.tolist() + ['newcol1','newcol2'])
or
mydf = mydf.reindex(mydf.columns.tolist() + ['newcol1','newcol2'], axis=1) # version > 0.20.0
You can also always concatenate a new (empty) dataframe to the existing dataframe, but that doesn't feel as pythonic to me :)
Starting with v0.16.0
, DF.assign() could be used to assign new columns (single/multiple) to a DF
. These columns get inserted in alphabetical order at the end of the DF
.
This becomes advantageous compared to simple assignment in cases wherein you want to perform a series of chained operations directly on the returned dataframe.
Consider the same DF
sample demonstrated by @DSM:
df = pd.DataFrame({"A": [1,2,3], "B": [2,3,4]})
df
Out[18]:
A B
0 1 2
1 2 3
2 3 4
df.assign(C="",D=np.nan)
Out[21]:
A B C D
0 1 2 NaN
1 2 3 NaN
2 3 4 NaN
Note that this returns a copy with all the previous columns along with the newly created ones. In order for the original DF
to be modified accordingly, use it like : df = df.assign(...)
as it does not support inplace
operation currently.