I received a DataFrame from somewhere and want to create another DataFrame with the same number and names of columns and rows (indexes). For example, suppose that the origin
I know this is an old question, but I thought I would add my two cents.
def df_cols_like(df):
"""
Returns an empty data frame with the same column names and types as df
"""
df2 = pd.DataFrame({i[0]: pd.Series(dtype=i[1])
for i in df.dtypes.iteritems()},
columns=df.dtypes.index)
return df2
This approach centers around the df.dtypes
attribute of the input data frame, df
, which is a pd.Series
. A pd.DataFrame
is constructed from a dictionary of empty pd.Series
objects named using the input column names with the column order being taken from the input df
.
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']],
...: columns=['num', 'char'])
In [3]: df
Out[3]:
num char
0 1 a
1 2 b
2 3 c
In [4]: df.dtypes
Out[4]:
num int64
char object
dtype: object
DataFrame
initialization using the columns of the original DataFrame
but providing no data:In [5]: empty_copy_1 = pd.DataFrame(data=None, columns=df.columns)
In [6]: empty_copy_1
Out[6]:
Empty DataFrame
Columns: [num, char]
Index: []
In [7]: empty_copy_1.dtypes
Out[7]:
num object
char object
dtype: object
As you can see, the column data types are not the same as in our original DataFrame
.
dtype
...If you want to preserve the column data types you need to construct the DataFrame
one Series
at a time
In [8]: empty_copy_2 = pd.DataFrame.from_items([
...: (name, pd.Series(data=None, dtype=series.dtype))
...: for name, series in df.iteritems()])
In [9]: empty_copy_2
Out[9]:
Empty DataFrame
Columns: [num, char]
Index: []
In [10]: empty_copy_2.dtypes
Out[10]:
num int64
char object
dtype: object
My case was creating a copy of the data frame without data and without index. One can achieve this by doing the following. This will maintain the dtypes of the columns.
empty_copy = df.drop(df.index)
A simple alternative -- first copy the basic structure or indexes and columns with datatype from the original dataframe (df1) into df2
df2 = df1.iloc[0:0]
Then fill your dataframe with empty rows -- pseudocode that will need to be adapted to better match your actual structure:
s = pd.Series([Nan,Nan,Nan], index=['Col1', 'Col2', 'Col3'])
loop through the rows in df1
df2 = df2.append(s)
In version 0.18 of pandas, the DataFrame constructor has no options for creating a dataframe like another dataframe with NaN instead of the values.
The code you use df2 = pd.DataFrame(columns=df1.columns, index=df1.index)
is the most logical way, the only way to improve on it is to spell out even more what you are doing is to add data=None
, so that other coders directly see that you intentionally leave out the data from this new DataFrame you are creating.
TLDR: So my suggestion is:
df2 = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)
Very much like yours, but more spelled out.
You can simply mask
by notna()
i.e
df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])
df2 = df1.mask(df1.notna())
c1 c2
i1 NaN NaN
i2 NaN NaN