Creating multiple dataframes with a loop

闹比i 2021-02-04 22:27

This undoubtedly reflects lack of knowledge on my part, but I can't find anything online to help. I am very new to programming. I want to load 6 csvs and do a few things to the

3 answers
  •  情歌与酒
    2021-02-04 23:05

    Use a dictionary to store your DataFrames and access them by name:

    import pandas as pd

    files = ('data1.csv', 'data2.csv', 'data3.csv', 'data4.csv', 'data5.csv', 'data6.csv')
    dfs_names = ('df1', 'df2', 'df3', 'df4', 'df5', 'df6')
    dfs = {}
    for dfn, file in zip(dfs_names, files):
        dfs[dfn] = pd.read_csv(file)  # one DataFrame per name
        print(dfs[dfn].shape)
        print(dfs[dfn].dtypes)
    print(dfs['df3'])
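A possible variation on the dictionary approach: instead of keeping a second tuple of names in sync with the file list, the keys can be derived from the file names. This is only a sketch; the helper name `load_frames` is hypothetical, and it assumes every file name (without its extension) is unique.

```python
from pathlib import Path

import pandas as pd


def load_frames(paths):
    """Read each CSV into a DataFrame, keyed by the file name without its extension."""
    frames = {}
    for path in map(Path, paths):
        # Path("data1.csv").stem is "data1", which becomes the dictionary key.
        frames[path.stem] = pd.read_csv(path)
    return frames
```

With the files from the answer, `load_frames(files)['data3']` would return the third DataFrame.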
    

    Use a list to store your DataFrames and access them by index:

    import pandas as pd

    files = ('data1.csv', 'data2.csv', 'data3.csv', 'data4.csv', 'data5.csv', 'data6.csv')
    dfs = []
    for file in files:
        dfs.append(pd.read_csv(file))
        print(dfs[-1].shape)  # dfs[-1] is the DataFrame just appended
        print(dfs[-1].dtypes)
    print(dfs[2])
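If the CSVs all live in one folder, the list could also be built by discovering the files rather than typing each name. The following is a rough sketch under that assumption; `load_folder` and the default pattern `data*.csv` are made up for illustration.

```python
from pathlib import Path

import pandas as pd


def load_folder(folder, pattern="data*.csv"):
    """Read every CSV in `folder` matching `pattern` into a list of DataFrames."""
    # sorted() gives a stable lexicographic order, so data10.csv sorts before data2.csv.
    paths = sorted(Path(folder).glob(pattern))
    return [pd.read_csv(path) for path in paths]
```

Because the order is lexicographic, index 2 is only guaranteed to be `data3.csv` while there are fewer than ten files or the numbers are zero-padded.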
    

    Do not keep separate names for the intermediate DataFrames; process each one and add it to the resulting DataFrame:

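    import pandas as pd

    files = ('data1.csv', 'data2.csv', 'data3.csv', 'data4.csv', 'data5.csv', 'data6.csv')
    parts = []
    for file in files:
        df_n = pd.read_csv(file)
        print(df_n.shape)
        print(df_n.dtypes)
        # do whatever you want with df_n
        parts.append(df_n)
    # DataFrame.append was removed in pandas 2.0, so concatenate once at the end
    df = pd.concat(parts)
    print(df)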
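When everything ends up in one DataFrame, it can still be useful to know which file each row came from. One way, sketched here with a hypothetical helper `combine_with_source`, is to pass a dictionary to `pd.concat`, which uses the dictionary keys as an outer index level.

```python
from pathlib import Path

import pandas as pd


def combine_with_source(paths):
    """Stack several CSVs into one DataFrame, labelling rows by source file."""
    # Keys are the file names without extension, e.g. "data1".
    frames = {Path(p).stem: pd.read_csv(p) for p in paths}
    # Passing a dict to pd.concat uses its keys as the outer level of the row index.
    return pd.concat(frames)
```

The rows of a single file can then be pulled back out with, for example, `combine_with_source(files).loc['data3']`.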
    

    If you will process them differently, you do not need a common structure to store them. Simply handle each one independently:

    import pandas as pd

    parts = []
    def do_general_stuff(d):  # here we do the common things with a DataFrame
        print(d.shape, d.dtypes)

    df1 = pd.read_csv("data1.csv")
    # do whatever you want with df1

    do_general_stuff(df1)
    parts.append(df1)
    del df1  # drop the name; the data stays in parts

    df2 = pd.read_csv("data2.csv")
    # do whatever you want with df2

    do_general_stuff(df2)
    parts.append(df2)
    del df2

    df3 = pd.read_csv("data3.csv")
    # do whatever you want with df3

    do_general_stuff(df3)
    parts.append(df3)
    del df3

    # ... and so on

    # DataFrame.append was removed in pandas 2.0, so concatenate once at the end
    df = pd.concat(parts)
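If the per-file steps can each be written as a function, the repetition above could be folded into a loop over a mapping from file to function. This is a sketch, not part of the original answer; `describe`, `process_all`, and the `steps` mapping are hypothetical names.

```python
import pandas as pd


def describe(frame):
    """Common step applied to every DataFrame."""
    print(frame.shape, frame.dtypes, sep="\n")


def process_all(steps):
    """Run a per-file function on each CSV, then the shared step.

    `steps` maps a CSV path to a function that takes and returns a DataFrame.
    """
    results = []
    for path, transform in steps.items():
        frame = transform(pd.read_csv(path))  # file-specific processing
        describe(frame)                       # processing shared by all files
        results.append(frame)
    return pd.concat(results, ignore_index=True)
```

For instance, `process_all({"data1.csv": lambda d: d.dropna(), "data2.csv": lambda d: d.head(10)})` would clean the first file, truncate the second, and return the combined rows.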
    

    And one geeky way, but don't ask how it works:)

    from collections import namedtuple
    import pandas as pd

    files = ['data1.csv', 'data2.csv', 'data3.csv', 'data4.csv', 'data5.csv', 'data6.csv']
    
    df = namedtuple('Cdfs',
                    ['df1', 'df2', 'df3', 'df4', 'df5', 'df6']
                   )(*[pd.read_csv(file) for file in files])
    
    for df_n in df._fields:
        print(getattr(df, df_n).shape,getattr(df, df_n).dtypes)
    
    print(df.df3)
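For readers who do want to ask how it works, here is the same idea unrolled into separate steps. It is a sketch with a hypothetical function name, `load_as_namedtuple`, and it generates the field names from the number of files instead of hard-coding six of them.

```python
from collections import namedtuple

import pandas as pd


def load_as_namedtuple(paths):
    """Unrolled version of the one-liner: one named field per CSV."""
    # Field names df1, df2, ... are generated to match the number of files.
    field_names = [f"df{i}" for i in range(1, len(paths) + 1)]
    # namedtuple() builds a new tuple subclass with those attribute names.
    Frames = namedtuple("Frames", field_names)
    # Calling the class with one DataFrame per field creates the instance.
    return Frames(*(pd.read_csv(p) for p in paths))
```

The result behaves like a tuple (`frames[2]`) and like an object with attributes (`frames.df3`), and `frames._fields` lists the names for looping.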
    
