Is there a way to copy only the structure (not the data) of a Pandas DataFrame?

感动是毒 2020-12-13 08:29

I received a DataFrame from somewhere and want to create another DataFrame with the same number and names of columns and rows (indexes). For example, suppose that the origin

8 answers
  • 2020-12-13 08:40

    I know this is an old question, but I thought I would add my two cents.

    import pandas as pd

    def df_cols_like(df):
        """
        Return an empty DataFrame with the same column names and dtypes as df.
        """
        # Series.items() replaces iteritems(), which was removed in pandas 2.0
        df2 = pd.DataFrame({name: pd.Series(dtype=dtype)
                            for name, dtype in df.dtypes.items()},
                           columns=df.dtypes.index)
        return df2
    

    This approach relies on the df.dtypes attribute of the input DataFrame, df, which is a pd.Series that maps each column name to its dtype. A pd.DataFrame is built from a dictionary of empty pd.Series objects keyed by the input column names, with the column order taken from the input df.
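
    A quick check of the helper described above, written with Series.items() because iteritems() was removed in pandas 2.0. The sample frame `source` and its column names are illustrative assumptions, not part of the original answer.

```python
import pandas as pd


def df_cols_like(df):
    """Return an empty DataFrame with the same column names and dtypes as df."""
    return pd.DataFrame(
        {name: pd.Series(dtype=dtype) for name, dtype in df.dtypes.items()},
        columns=df.dtypes.index,
    )


# Hypothetical sample data, chosen only to exercise several dtypes.
source = pd.DataFrame({"num": [1, 2], "ratio": [0.5, 1.5], "char": ["a", "b"]})
empty = df_cols_like(source)

print(empty.shape)                                # (0, 3)
print(list(empty.dtypes) == list(source.dtypes))  # True
```

    Note that the result has no rows, so it copies the columns and dtypes but not the index of the input.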

  • 2020-12-13 08:43

    Let's start with some sample data:

    In [1]: import pandas as pd
    
    In [2]: df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']],
       ...:                   columns=['num', 'char'])
    
    In [3]: df
    Out[3]: 
       num char
    0    1    a
    1    2    b
    2    3    c
    
    In [4]: df.dtypes
    Out[4]: 
    num      int64
    char    object
    dtype: object
    

    Now let's use a simple DataFrame initialization using the columns of the original DataFrame but providing no data:

    In [5]: empty_copy_1 = pd.DataFrame(data=None, columns=df.columns)
    
    In [6]: empty_copy_1
    Out[6]: 
    Empty DataFrame
    Columns: [num, char]
    Index: []
    
    In [7]: empty_copy_1.dtypes
    Out[7]: 
    num     object
    char    object
    dtype: object
    

    As you can see, the column data types are not the same as in our original DataFrame.

    To preserve the column data types, construct the DataFrame one Series at a time. (DataFrame.from_items was removed in pandas 1.0 and iteritems() in pandas 2.0, so a dict comprehension over df.items() is used here.)

    In [8]: empty_copy_2 = pd.DataFrame({
       ...:     name: pd.Series(data=None, dtype=series.dtype)
       ...:     for name, series in df.items()})
    
    In [9]: empty_copy_2
    Out[9]: 
    Empty DataFrame
    Columns: [num, char]
    Index: []
    
    In [10]: empty_copy_2.dtypes
    Out[10]: 
    num      int64
    char    object
    dtype: object
    
  • 2020-12-13 08:43

    This does not exactly answer the question, but it covers a similar one for people arriving here via a search engine.

    My case was creating a copy of the data frame with neither the data nor the index. Dropping every index label achieves this and keeps the dtypes of the columns:

    empty_copy = df.drop(df.index)
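
    A small self-contained check of this one-liner. The sample frame and its index labels are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data with a non-default index.
df = pd.DataFrame({"num": [1, 2, 3], "char": ["a", "b", "c"]},
                  index=["x", "y", "z"])

# Dropping every index label leaves the columns and their dtypes in place.
empty_copy = df.drop(df.index)

print(empty_copy.empty)                            # True
print(list(empty_copy.dtypes) == list(df.dtypes))  # True
print(len(df))                                     # 3, the original is untouched
```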
    
  • 2020-12-13 08:46

    A simple alternative: first copy the basic structure (the columns and their dtypes, with an empty index) from the original dataframe (df1) into df2:

    df2 = df1.iloc[0:0]
    

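    Then fill your dataframe with empty rows. This is a sketch that will need to be adapted to your actual column names:

    import numpy as np

    s = pd.Series([np.nan, np.nan, np.nan], index=['Col1', 'Col2', 'Col3'])
    

    Loop through the rows in df1, adding one empty row per iteration. DataFrame.append was removed in pandas 2.0, so pd.concat is used here:

    for _ in range(len(df1)):
        df2 = pd.concat([df2, s.to_frame().T], ignore_index=True)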
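
    As a possible shortcut that avoids the row-by-row loop, the empty slice can be re-expanded to the original index with DataFrame.reindex, which fills the new rows with missing values. This is a swapped-in technique, not the method of the answer above, and the sample frame is an assumption.

```python
import pandas as pd

# Hypothetical sample data using the column names from the answer.
df1 = pd.DataFrame({"Col1": [1, 2], "Col2": [3.0, 4.0], "Col3": ["a", "b"]},
                   index=["r1", "r2"])

# Keep the columns only, then re-expand to the original index;
# every cell in the new rows is missing.
df2 = df1.iloc[0:0].reindex(df1.index)

print(df2.shape)               # (2, 3)
print(df2.isna().all().all())  # True
```

    Unlike the loop, this keeps the original index labels. Integer columns can no longer stay integer once they hold NaN, so their dtype may change.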
    
  • 2020-12-13 08:47

    In version 0.18 of pandas, the DataFrame constructor has no options for creating a dataframe like another dataframe with NaN instead of the values.

    The code you use, df2 = pd.DataFrame(columns=df1.columns, index=df1.index), is the most logical way. The only improvement is to spell out what you are doing even more by adding data=None, so that other coders see immediately that you intentionally left the data out of the new DataFrame.

    TLDR: So my suggestion is:

    Explicit is better than implicit

    df2 = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)
    

    Very much like yours, but more spelled out.
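
    A minimal runnable version of this suggestion, with an assumed sample df1, shows what the result looks like: the shape and labels match, every cell is NaN, and the column dtypes are whatever the constructor picks rather than those of df1.

```python
import pandas as pd

# Hypothetical sample data.
df1 = pd.DataFrame([[11, 12], [21, 22]], columns=["c1", "c2"], index=["i1", "i2"])

df2 = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)

print(df2.shape)               # (2, 2)
print(df2.isna().all().all())  # True
print(df2.dtypes.tolist())     # dtypes are not copied from df1
```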

  • 2020-12-13 08:47

    You can simply mask every non-null value using notna(), i.e.:

    df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])
    
    df2 = df1.mask(df1.notna())
    
        c1  c2
    i1 NaN NaN
    i2 NaN NaN
    