Is there a way to copy only the structure (not the data) of a Pandas DataFrame?

后端未结


I received a DataFrame from somewhere and want to create another DataFrame with the same number and names of columns and rows (indexes). For example, suppose that the origin

Let's start with some sample data:

```
In [1]: import pandas as pd

In [2]: df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']],
   ...:                   columns=['num', 'char'])

In [3]: df
Out[3]:
   num char
0    1    a
1    2    b
2    3    c

In [4]: df.dtypes
Out[4]:
num      int64
char    object
dtype: object
```

Now let's use a simple `DataFrame` initialization using the columns of the original `DataFrame` but providing no data:

```
In [5]: empty_copy_1 = pd.DataFrame(data=None, columns=df.columns)

In [6]: empty_copy_1
Out[6]:
Empty DataFrame
Columns: [num, char]
Index: []

In [7]: empty_copy_1.dtypes
Out[7]:
num     object
char    object
dtype: object
```

As you can see, the column data types are not the same as in our original DataFrame.
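If all you need is the dtypes back, a shorter route (a different technique from the one-Series-at-a-time approach below, not part of the original answer) is to cast the empty frame with `DataFrame.astype`, which accepts a mapping from column name to dtype such as `df.dtypes`. A minimal sketch, rebuilding the sample `df` so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']], columns=['num', 'char'])

# Built from the columns alone, every column gets dtype object
empty_copy = pd.DataFrame(data=None, columns=df.columns)

# df.dtypes is a Series mapping column name -> dtype, and astype()
# accepts such a mapping, so this restores the original dtypes
empty_copy = empty_copy.astype(df.dtypes)

print(empty_copy.dtypes)
```

The cast is safe here because the columns are empty, so there are no values that could fail to convert.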

So, if you want to preserve the column `dtype`s, you can construct the DataFrame one Series at a time, giving each empty Series the dtype of the matching original column. (`DataFrame.from_items` and `DataFrame.iteritems`, which you may see in older code for this, have both been removed from pandas; a dict of Series and `df.items()` do the same job.)

```
In [8]: empty_copy_2 = pd.DataFrame({
   ...:     name: pd.Series(dtype=series.dtype)
   ...:     for name, series in df.items()})

In [9]: empty_copy_2
Out[9]:
Empty DataFrame
Columns: [num, char]
Index: []

In [10]: empty_copy_2.dtypes
Out[10]:
num      int64
char    object
dtype: object
```

终归单人心

2020-12-13 08:43
This doesn't exactly answer this question, but it answers a similar one, for people coming here via a search engine.

My case was creating a copy of the DataFrame without data and without an index. You can do that as follows; the dtypes of the columns are preserved:
```
empty_copy = df.drop(df.index)
```
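A quick check of this on the sample data from the first answer (rebuilt here so the snippet is self-contained) shows that the columns and their dtypes survive while all rows are gone:

```python
import pandas as pd

df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']], columns=['num', 'char'])

# Dropping every index label removes all rows but keeps the columns
# and their dtypes
empty_copy = df.drop(df.index)

print(empty_copy.shape)  # (0, 2)
print(empty_copy.dtypes)
```

`df.head(0)` and `df.iloc[0:0]` give the same kind of empty, dtype-preserving structure without naming the labels to drop.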
暖寄归人

2020-12-13 08:46
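A simple alternative: first copy the basic structure (the columns, with their dtypes) from the original DataFrame (df1) into df2. `iloc[0:0]` selects no rows, so the row labels are not copied:
```
df2 = df1.iloc[0:0].copy()
```
Then fill your DataFrame with empty rows. First build an all-NaN row labelled by the columns of df1:
```
import numpy as np
s = pd.Series(np.nan, index=df1.columns)  # one all-NaN row, indexed by df1's column names
```
Then loop through the rows in df1, adding one such row per label:
```
for label in df1.index:
    df2.loc[label] = s  # DataFrame.append was removed in pandas 2.0; .loc enlarges instead
```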
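If the loop feels heavy, one possible alternative (a swapped-in technique, `DataFrame.reindex`, rather than the approach above) is to reindex the empty structure against the original row labels; `reindex` inserts an all-NaN row for each label that is missing. A minimal sketch on a hypothetical `df1`:

```python
import pandas as pd

# Hypothetical example frame with mixed dtypes
df1 = pd.DataFrame({'Col1': [1, 2], 'Col2': [1.5, 2.5], 'Col3': ['x', 'y']},
                   index=['r1', 'r2'])

# iloc[0:0] keeps the columns but no rows; reindex() then inserts an
# all-NaN row for every label of df1.index that is missing
df2 = df1.iloc[0:0].reindex(df1.index)

print(df2)
```

Integer columns such as `Col1` come back as float64, because an int64 column cannot hold NaN.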
面向向阳花

2020-12-13 08:47
As of pandas 0.18, the DataFrame constructor has no option for creating a DataFrame shaped like another DataFrame, with NaN instead of the values.

The code you use, `df2 = pd.DataFrame(columns=df1.columns, index=df1.index)`, is the most logical way. The only way to improve on it is to spell out even more clearly what you are doing by adding `data=None`, so that other coders see directly that you intentionally leave the data out of the new DataFrame you are creating.

TL;DR: my suggestion is:

Explicit is better than implicit
```
df2 = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)
```
Very much like yours, but more spelled out.
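For reference, a small sketch of what this constructor call gives, using the `df1` from another answer on this page. Every cell is NaN, the labels match, and every column ends up with dtype `object` because no data was supplied:

```python
import pandas as pd

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

# Same row and column labels as df1, but no values: every cell is NaN
df2 = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)

print(df2)
print(df2.dtypes)  # both columns are object
```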

不知归路

2020-12-13 08:47

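You can simply mask with `notna()`, i.e.:
```
import pandas as pd

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

# mask() sets every cell where the condition is True to NaN, and
# notna() is True for every non-missing cell
df2 = df1.mask(df1.notna())
print(df2)
```
Output:
```
    c1  c2
i1 NaN NaN
i2 NaN NaN
```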
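One thing to keep in mind with `mask` is that the dtypes change: an integer column has to become float64 to hold NaN. A minimal sketch on a hypothetical frame with an integer and a string column:

```python
import pandas as pd

# Hypothetical frame with an integer and a string column
df1 = pd.DataFrame({'num': [1, 2], 'char': ['a', 'b']}, index=['i1', 'i2'])

# Every non-missing cell satisfies notna(), so mask() replaces all of them
df2 = df1.mask(df1.notna())

print(df2.dtypes)
```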
