This undoubtedly reflects lack of knowledge on my part, but I can't find anything online to help. I am very new to programming. I want to load 6 csvs and do a few things to the
I think your code is not doing what you expect it to do.
Specifically, this line: df = pd.read_csv(file)
You might think that in each iteration of the for loop this line is executed with df replaced by a string from dfs and file replaced by a filename from files. The latter is true, but the former is not.
Each iteration of the for loop reads a csv file and stores it in the variable df, overwriting the dataframe that was read in during the previous iteration. In other words, df in your for loop is not replaced with the variable names you defined in dfs.
The key takeaway here is that strings (e.g., 'df1', 'df2', etc.) cannot be substituted and used as variable names when executing code.
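To see the overwriting in isolation, here is a minimal sketch that leaves pandas out entirely. The lists names and values are made-up stand-ins for your dfs list and for the dataframes that pd.read_csv() would return:

```python
# Hypothetical stand-ins: names plays the role of dfs,
# values plays the role of the dataframes read from each file.
names = ["df1", "df2", "df3"]
values = [10, 20, 30]

for df, value in zip(names, values):
    # At this point df holds a string such as "df1".
    # The assignment below rebinds the name df itself;
    # it does not create variables called df1, df2, df3.
    df = value

# Only the value assigned in the last iteration is left.
print(df)
```

After the loop, the names list is unchanged and there is still only one variable, df, holding whatever was assigned last.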
One way to achieve the result you want is to store each csv file read by pd.read_csv() in a dictionary, where the key is the name of the dataframe (e.g., 'df1', 'df2', etc.) and the value is the dataframe returned by pd.read_csv().
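    import pandas as pd

    # dfs (names such as 'df1') and files (csv paths) are the lists from your question
    list_of_dfs = {}  # despite the name, this is a dictionary: name -> dataframe
    for df, file in zip(dfs, files):
        list_of_dfs[df] = pd.read_csv(file)
        print(list_of_dfs[df].shape)    # (rows, columns)
        print(list_of_dfs[df].dtypes)   # data type of each column
        print(list(list_of_dfs[df]))    # column names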
You can then reference each of your dataframes like this:
    print(list_of_dfs['df1'])
    print(list_of_dfs['df2'])
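Because the dataframes live in a dictionary, you can also loop over all of them at once instead of naming each key. The sketch below is self-contained so it can be run as is: csv_texts and frames are hypothetical names, and the small in-memory strings stand in for your real csv files (pd.read_csv() accepts file-like objects as well as paths).

```python
import io

import pandas as pd

# Hypothetical in-memory csv data standing in for files on disk.
csv_texts = {
    "df1": "a,b\n1,2\n3,4\n",
    "df2": "x,y,z\n5,6,7\n",
}

# Build the name -> dataframe dictionary in one step.
frames = {
    name: pd.read_csv(io.StringIO(text))
    for name, text in csv_texts.items()
}

# .items() yields each key together with its dataframe.
for name, frame in frames.items():
    print(name, frame.shape, list(frame.columns))
```

With real files you would pass the file path to pd.read_csv() instead of io.StringIO(text); the rest of the loop stays the same.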
You can learn more about dictionaries here:
https://docs.python.org/3.6/tutorial/datastructures.html#dictionaries