I am trying to append an empty row at the end of a DataFrame but have been unable to do so. I have also tried to understand how the pandas append function works, and I still don't get it.
The code below worked for me.
df.append(pd.Series([np.nan]), ignore_index = True)
Add a new pandas.Series using pandas.DataFrame.append().
If you wish to specify the name (AKA the "index") of the new row, use:
df.append(pandas.Series(name='NameOfNewRow'))
If you don't wish to name the new row, use:
df.append(pandas.Series(), ignore_index=True)
where df is your pandas.DataFrame.
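Note that DataFrame.append was removed in pandas 2.0, so the calls above only run on older versions. A minimal sketch of the same two cases using reindex instead (a different technique from the answer above; the example frame and its labels are made up for illustration):

```python
import pandas as pd

# Illustrative frame; the labels "first"/"second" are assumptions
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]}, index=["first", "second"])

# Named empty row: ask for the existing labels plus one new label.
# reindex fills the row for the unknown label with missing values.
named = df.reindex(list(df.index) + ["NameOfNewRow"])

# Unnamed empty row: drop the old labels, then extend the 0..n-1 range by one.
unnamed = df.reset_index(drop=True).reindex(list(range(len(df) + 1)))

print(named)
print(unnamed)
```

Both calls return a new frame and leave df untouched, which matches how append behaved.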
Assuming your df.index is sorted, you can use:
df.loc[df.index.max() + 1] = None
It handles different index and column types well.
[EDIT] It works with pd.DatetimeIndex if there is a constant frequency; otherwise we must specify the new index label exactly, e.g.:
df.loc[df.index.max() + pd.Timedelta(milliseconds=1)] = None
Long example:
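# pd.date_range replaces the pd.DatetimeIndex(start=..., freq=..., periods=...) form,
# which newer pandas versions no longer accept
df = pd.DataFrame([[pd.Timestamp(12432423), 23, 'text_field']],
                  columns=["timestamp", "speed", "text"],
                  index=pd.date_range(start='2111-11-11', freq='ms', periods=1))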
df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1 entries, 2111-11-11 to 2111-11-11
Freq: L
Data columns (total 3 columns):
timestamp    1 non-null datetime64[ns]
speed        1 non-null int64
text         1 non-null object
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 32.0+ bytes
df.loc[df.index.max() + 1] = None
df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2 entries, 2111-11-11 00:00:00 to 2111-11-11 00:00:00.001000
Data columns (total 3 columns):
timestamp    1 non-null datetime64[ns]
speed        1 non-null float64
text         1 non-null object
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 64.0+ bytes
df.head()
                                            timestamp  speed        text
2111-11-11 00:00:00.000 1970-01-01 00:00:00.012432423   23.0  text_field
2111-11-11 00:00:00.001                           NaT    NaN         NaN
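For a DatetimeIndex, the key step is computing the next label. A small sketch of the two options mentioned above (step by the index frequency, or by an explicit Timedelta), assuming a millisecond-frequency index; the frame contents are made up, and the row is added with reindex rather than the loc assignment so that it does not depend on integer arithmetic with timestamps:

```python
import pandas as pd

# Illustrative frame with a constant-frequency (1 ms) DatetimeIndex
idx = pd.date_range(start="2111-11-11", freq="ms", periods=2)
df = pd.DataFrame({"speed": [23, 42], "text": ["a", "b"]}, index=idx)

# Constant frequency: step one period past the last label
next_by_freq = df.index.max() + df.index.freq

# No usable frequency: state the step explicitly
next_by_delta = df.index.max() + pd.Timedelta(milliseconds=1)

# Add the new label; reindex fills its row with missing values
extended = df.reindex(df.index.append(pd.DatetimeIndex([next_by_delta])))
print(extended)
```

Here both ways of computing the label agree because the index really does advance in 1 ms steps.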
Assuming df is your dataframe,
df_prime = pd.concat([df, pd.DataFrame([[np.nan] * df.shape[1]], columns=df.columns)], ignore_index=True)
where df_prime equals df with an additional last row of NaNs.
Note that pd.concat is slow, so if you need this functionality in a loop, it's best to avoid it.
In that case, assuming your index is incremental, you can use:
df.loc[df.iloc[-1].name + 1,:] = np.nan
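If several empty rows are needed, one way to sidestep the per-iteration cost is to add them all in a single operation. A rough sketch, assuming a non-empty frame with an incremental integer index; add_empty_rows is a hypothetical helper name, and it uses reindex rather than pd.concat or loc:

```python
import pandas as pd


def add_empty_rows(df, k):
    """Return a copy of df with k all-NaN rows added at the end.

    Hypothetical helper; assumes a non-empty frame with an
    incremental integer index.
    """
    start = df.index.max() + 1
    new_labels = pd.Index(range(start, start + k))
    # Existing rows are kept; rows for the new labels are filled with NaN
    return df.reindex(df.index.append(new_labels))


df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
df_prime = add_empty_rows(df, 3)
print(df_prime)
```

As with the single-row versions, integer columns come back as float because NaN cannot be stored in an int64 column.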
You can add a new series, and name it at the same time. The name will be the index of the new row, and all the values will automatically be NaN.
df.append(pd.Series(name='Afterthought'))
Append an "empty" row to a data frame and fill selected cells:
Generate an empty data frame (no rows, just columns a and b):
import pandas as pd
col_names = ["a","b"]
df = pd.DataFrame(columns = col_names)
Append an empty row at the end of the data frame:
df = df.append(pd.Series(), ignore_index = True)
Now fill the empty cell at the end (len(df)-1) of the data frame in column a:
df.loc[[len(df)-1],'a'] = 123
Result:
     a    b
0  123  NaN
And of course one can iterate over the rows and fill cells:
col_names = ["a","b"]
df = pd.DataFrame(columns = col_names)
for x in range(0,5):
    df = df.append(pd.Series(), ignore_index = True)
    df.loc[[len(df)-1],'a'] = 123
Result:
     a    b
0  123  NaN
1  123  NaN
2  123  NaN
3  123  NaN
4  123  NaN
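On pandas 2.0 and later, where DataFrame.append is no longer available, a similar result can be had by collecting the rows in a plain list and building the frame once at the end. This is a sketch of that alternative, not the method used above; the rows list is an illustrative name:

```python
import pandas as pd

col_names = ["a", "b"]

# Collect one dict per row; keys that are left out become NaN in the frame
rows = []
for _ in range(5):
    rows.append({"a": 123})  # "b" is not set, so it stays empty

# Build the frame in one step, with the columns in the desired order
df = pd.DataFrame(rows, columns=col_names)
print(df)
```

Building the frame once also avoids copying the whole data frame on every iteration.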