Creating multiple dataframes with a loop

闹比i 2021-02-04 22:27

This undoubtedly reflects a lack of knowledge on my part, but I can't find anything online to help. I am very new to programming. I want to load 6 CSVs and do a few things to the

3 answers
  • 2021-02-04 23:05

    Use a dictionary to store your DataFrames and access them by name:

    import pandas as pd

    files = ('data1.csv', 'data2.csv', 'data3.csv', 'data4.csv', 'data5.csv', 'data6.csv')
    dfs_names = ('df1', 'df2', 'df3', 'df4', 'df5', 'df6')
    dfs = {}
    for dfn, file in zip(dfs_names, files):
        dfs[dfn] = pd.read_csv(file)  # the string 'df1' etc. is a dict key, not a variable name
        print(dfs[dfn].shape)
        print(dfs[dfn].dtypes)
    print(dfs['df3'])
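The same dictionary approach can also be written as a dict comprehension. This is a minimal sketch, assuming it's acceptable to key each DataFrame by its file name without the extension (`data1`, `data2`, …) instead of `df1` … `df6`; the temporary directory and the tiny CSV contents are hypothetical, there only to make the example self-contained.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmpdir:
    tmp = Path(tmpdir)
    # Hypothetical sample data so the sketch runs anywhere; with real files you
    # would skip this loop and glob your own directory instead
    for i in range(1, 4):
        (tmp / f"data{i}.csv").write_text(f"a,b\n{i},{i * 10}\n")

    # Key each DataFrame by the file name without ".csv": "data1", "data2", ...
    dfs = {path.stem: pd.read_csv(path) for path in sorted(tmp.glob("data*.csv"))}

# The DataFrames live in memory, so they are still usable after the files are gone
for name, frame in dfs.items():
    print(name, frame.shape)

print(dfs["data2"])
```

Note that `sorted()` compares the paths as text, so a `data10.csv` would sort before `data2.csv`; that only matters if you rely on the dictionary's order.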
    

    Use a list to store your DataFrames and access them by index:

    import pandas as pd

    files = ('data1.csv', 'data2.csv', 'data3.csv', 'data4.csv', 'data5.csv', 'data6.csv')
    dfs = []
    for file in files:
        dfs.append(pd.read_csv(file))
        print(dfs[-1].shape)  # dfs[-1] is the DataFrame that was just appended
        print(dfs[-1].dtypes)
    print(dfs[2])
    

    Don't store the intermediate DataFrames; just process each one and add it to a resulting DataFrame:

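    import pandas as pd

    files = ('data1.csv', 'data2.csv', 'data3.csv', 'data4.csv', 'data5.csv', 'data6.csv')
    df = None
    for file in files:
        df_n = pd.read_csv(file)
        print(df_n.shape)
        print(df_n.dtypes)
        # do whatever processing you want with df_n here
        # (DataFrame.append was removed in pandas 2.0, so use pd.concat instead)
        df = df_n if df is None else pd.concat([df, df_n])
    print(df)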
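If you want a single combined DataFrame, a variant of the same idea that avoids growing `df` inside the loop is to process each file in a small function and call `pd.concat` once at the end. The in-memory `io.StringIO` buffers standing in for the CSV files, the `load` helper, and the extra `source` column are all hypothetical, added only so the example runs on its own and shows which file each row came from.

```python
import io

import pandas as pd

# Hypothetical in-memory CSVs standing in for data1.csv ... data3.csv
sources = {
    "data1.csv": io.StringIO("a,b\n1,10\n2,20\n"),
    "data2.csv": io.StringIO("a,b\n3,30\n"),
    "data3.csv": io.StringIO("a,b\n4,40\n"),
}


def load(name, handle):
    """Read one CSV and apply the per-file processing (here: tag its source)."""
    df_n = pd.read_csv(handle)
    return df_n.assign(source=name)


# Build all the pieces, then concatenate once; ignore_index renumbers the rows 0..n-1
df = pd.concat([load(name, handle) for name, handle in sources.items()], ignore_index=True)
print(df)
```

Concatenating once avoids copying the growing result on every iteration, which is what calling `pd.concat` (or the old `df.append`) inside the loop does.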
    

    If you will process each file differently, you don't need a common structure to store them. Just handle each one independently:

    import pandas as pd

    def do_general_stuff(d):  # here we do the things common to every DataFrame
        print(d.shape, d.dtypes)

    df1 = pd.read_csv("data1.csv")
    # do whatever you want with df1 here

    do_general_stuff(df1)
    df = df1  # the first file starts the combined DataFrame
    del df1

    df2 = pd.read_csv("data2.csv")
    # do whatever you want with df2 here

    do_general_stuff(df2)
    df = pd.concat([df, df2])  # DataFrame.append was removed in pandas 2.0
    del df2

    df3 = pd.read_csv("data3.csv")
    # do whatever you want with df3 here

    do_general_stuff(df3)
    df = pd.concat([df, df3])
    del df3

    # ... and so on
    

    And one geeky way, but don't ask how it works:)

    from collections import namedtuple

    import pandas as pd

    files = ['data1.csv', 'data2.csv', 'data3.csv', 'data4.csv', 'data5.csv', 'data6.csv']

    df = namedtuple('Cdfs',
                    ['df1', 'df2', 'df3', 'df4', 'df5', 'df6']
                   )(*[pd.read_csv(file) for file in files])

    for df_n in df._fields:
        print(getattr(df, df_n).shape, getattr(df, df_n).dtypes)

    print(df.df3)
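For readers who do want to know how the one-liner works, here it is unpacked into steps. This sketch assumes three small hand-built DataFrames in place of `pd.read_csv(file)`, so it runs without any files.

```python
from collections import namedtuple

import pandas as pd

# Step 1: namedtuple() creates a new class whose fields are the given names
Cdfs = namedtuple("Cdfs", ["df1", "df2", "df3"])

# Step 2: hypothetical stand-ins for pd.read_csv(file), one per field
frames = [pd.DataFrame({"a": [i]}) for i in range(1, 4)]

# Step 3: *frames unpacks the list into positional arguments, so
# Cdfs(*frames) is the same as Cdfs(frames[0], frames[1], frames[2])
df = Cdfs(*frames)

# _fields holds the field names as strings; getattr looks each one up by name
for df_n in df._fields:
    print(df_n, getattr(df, df_n).shape)

print(df.df3)
```

The namedtuple only stores references, so `df.df2` is the very same object as `frames[1]`; no data is copied.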
    
  • 2021-02-04 23:09

    A dictionary can store them too

    import pandas as pd

    files = ('doms_stats201610051.csv', 'doms_stats201610052.csv')
    dfsdic = {}
    dfs = ('df1', 'df2')
    for df, file in zip(dfs, files):
        dfsdic[df] = pd.read_csv(file)
        print(dfsdic[df].shape)
        print(dfsdic[df].dtypes)
        print(list(dfsdic[df]))

    print(dfsdic['df1'].shape)
    print(dfsdic['df2'].shape)
    
  • 2021-02-04 23:20

    I think you think your code is doing something that it is not actually doing.

    Specifically, this line: df = pd.read_csv(file)

    You might think that on each iteration of the for loop this line is executed with df replaced by the next string in dfs and file replaced by the next filename in files. The latter is true; the former is not.

    Each iteration reads a CSV file and stores it in the variable df, overwriting the DataFrame that was read in during the previous iteration. In other words, df in your for loop is never replaced by the variable names you defined in dfs; it is simply rebound, first to a string from dfs and then to the new DataFrame.

    The key takeaway here is that strings (e.g., 'df1', 'df2', etc.) are not automatically turned into variable names: assigning to df rebinds a variable called df, never one called df1.
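The rebinding can be seen without any CSV files. This is a minimal sketch that uses plain strings as stand-ins for the DataFrames (an assumption made only so it runs anywhere), with the same loop shape as the code described above.

```python
names = ("df1", "df2")
values = ("first file", "second file")  # hypothetical stand-ins for pd.read_csv(file)

for df, value in zip(names, values):
    # The for statement binds df to "df1" (then "df2"); this line immediately
    # rebinds df to the value. No variable named df1 is ever created.
    df = value

print(df)                  # second file  (only the last assignment survives)
print("df1" in globals())  # False
```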

    One way to get the result you want is to store each CSV file read by pd.read_csv() in a dictionary, where the key is the name of the DataFrame (e.g., 'df1', 'df2', etc.) and the value is the DataFrame returned by pd.read_csv().

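    import pandas as pd

    # dfs and files are the tuples of names and filenames from your own code
    list_of_dfs = {}  # despite the name, this is a dict keyed by DataFrame name
    for df, file in zip(dfs, files):
        list_of_dfs[df] = pd.read_csv(file)
        print(list_of_dfs[df].shape)
        print(list_of_dfs[df].dtypes)
        print(list(list_of_dfs[df]))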
    

    You can then reference each of your dataframes like this:

    print(list_of_dfs['df1'])
    print(list_of_dfs['df2'])
    

    You can learn more about dictionaries here:

    https://docs.python.org/3.6/tutorial/datastructures.html#dictionaries
