问题
inp_file=os.getcwd()
files_comp = pd.read_csv(inp_file,"B00234*.csv", na_values = missing_values, nrows=10)
for f in files_comp:
df_calculated = pd.read_csv(f, na_values = missing_values, nrows=10)
col_length=len(df.columns)-1
Hi folks, How can I read 4 csv files in a for a loop. I am getting an error while reading the CSV in above format. Kindly help me
回答1:
You basically need this:
- Get a list of all target files.
files=os.listdir(path)
and then keep only the filenames that start with your pattern and end with.csv
. You could also improve it using regular expression (by importingre
library for more sophistication, or useglob.glob
).
filesnames = os.listdir(path)
filesnames = [f for f in filesnames if (f.startswith("B00234") and f.lower().endswith(".csv"))]
- Read in files using a for loop:
dfs = list()
for filename in filesnames:
df = pd.read_csv(filename)
dfs.append(df)
Complete Example
We will first make some dummy data and then save that to some .csv
and .txt
files. Some of these .csv
files will begin with "B00234"
and some other would not. We will write the dumy data to these files. And then selectively only read in the .csv
files into a list of dataframes, dfs
.
import pandas as pd
from IPython.display import display
# Define Temporary Output Folder
path = './temp_output'
# Clean Temporary Output Folder
import shutil
reset = True
if os.path.exists(path) and reset:
shutil.rmtree(path, ignore_errors=True)
# Create Content
df0 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
columns=['a', 'b', 'c'])
display(df0)
# Make Path
import os
if not os.path.exists(path):
os.makedirs(path)
else:
print('Path Exists: {}'.format(path))
# Make Filenames
filenames = list()
for i in range(10):
if i<5:
# Create Files starting with "B00234"
filenames.append("B00234_{}.csv".format(i))
filenames.append("B00234_{}.txt".format(i))
else:
# Create Files starting with "B00678"
filenames.append("B00678_{}.csv".format(i))
filenames.append("B00678_{}.txt".format(i))
# Create files
# Make files with extensions: .csv and .txt
# and file names starting
# with and without: "B00234"
for filename in filenames:
fpath = path + '/' + filename
if filename.lower().endswith(".csv"):
df0.to_csv(fpath, index=False)
else:
with open(fpath, 'w') as f:
f.write(df0.to_string())
# Get list of target files
files = os.listdir(path)
files = [f for f in files if (f.startswith("B00234") and f.lower().endswith(".csv"))]
print('\nList of target files: \n\t{}\n'.format(files))
# Read each csv file into a dataframe
dfs = list() # a list of dataframes
for csvfile in files:
fpath = path + '/' + csvfile
print("Reading file: {}".format(csvfile))
df = pd.read_csv(fpath)
dfs.append(df)
The list dfs
should have five elements, where each is dataframe read from the files.
Ouptput:
a b c
0 1 2 3
1 4 5 6
2 7 8 9
List of target files:
['B00234_3.csv', 'B00234_4.csv', 'B00234_0.csv', 'B00234_2.csv', 'B00234_1.csv']
Reading file: B00234_3.csv
Reading file: B00234_4.csv
Reading file: B00234_0.csv
Reading file: B00234_2.csv
Reading file: B00234_1.csv
来源:https://stackoverflow.com/questions/58071982/read-csv-in-a-for-loop-using-pandas