问题
I have the following code:
dfs = []
for f in files_xlsx:
city_name = pd.read_excel(f, "1. City", nrows=1, parse_cols="C", header=None, skiprows=1)
country_code = pd.read_excel(f, "1. City", nrows=1, parse_cols="C", header=None, skiprows=2)
data = pd.read_excel(f, "1. City", parse_cols="B:J", header=None, skiprows=8)
data['City name'] = city_name.iat[0,0]
data['City code'] = country_code.iat[0,0]
dfs.append(data)
df = pd.concat(dfs, ignore_index=True)
I would like to run the loop if and only if each Excel file contains the values 91, 92, 93, 94, 95, 96, 97
in the following location (column and row combination) D8, E8, F8, G8, H8, I8, J8
. The loop should only run if this condition is met across all files.
All my Excel files have the same format in theory. In practice, they often don't so I want to run a check before appending them. It would be great if the code could tell me which file does not meet the above condition. Thank you.
Edit:
In[1]: data
Out[1]:
0 1 2 3 4 5 6 7 8 City name City code
0 x x x x x x x x x x
1 x x x x x x x x x x
2 x x x x x x x x x x
回答1:
Consider building a helper data frame that resembles Excel data but with values matched to specification (row 8 and columns D-J with those specific values). Then in loop iteratively merge against this helper data frame and if it returns a match, conditionally append to your list of data frames.
NOTE: Adjust columns to actual column names, replacing list('ABCDEFGHIJ')
with list of names, ['col1','col2','col3',...]
, in DataFrame()
and merge()
calls.
check_df = pd.concat([pd.DataFrame([[0]*9 for _ in range(7)],
columns=['Heading code','Heading name','91','92','93','94','95','96','97']),
pd.DataFrame([[0,0,91,92,93,94,95,96,97]],
columns=['Heading code','Heading name','91','92','93','94','95','96','97'])],
ignore_index=True).reset_index()
print(check_df)
# index Heading code Heading name 91 92 93 94 95 96 97
# 0 0 0 0 0 0 0 0 0 0 0
# 1 1 0 0 0 0 0 0 0 0 0
# 2 2 0 0 0 0 0 0 0 0 0
# 3 3 0 0 0 0 0 0 0 0 0
# 4 4 0 0 0 0 0 0 0 0 0
# 5 5 0 0 0 0 0 0 0 0 0
# 6 6 0 0 0 0 0 0 0 0 0
# 7 7 0 0 91 92 93 94 95 96 97
dfs = []
for f in files_xlsx:
city_name = pd.read_excel(f, "1. City", nrows=1, parse_cols="C", header=None, skiprows=1)
country_code = pd.read_excel(f, "1. City", nrows=1, parse_cols="C", header=None, skiprows=2)
data = pd.read_excel(f, "1. City", parse_cols="B:J", header=None, skiprows=8)\
.assign(city_name=city_name.iat[0,0], city_code=country_code.iat[0,0])
data.columns = ['Heading code','Heading name','91','92','93','94','95','96','97','City name','City code']
# INNER JOIN MERGE ON INDEX AND COLS, D-J
tmp = data.reset_index().merge(check_df, on=['index','91','92','93','94','95','96','97'])
# CONDITIONALLY APPEND
if len(tmp) > 0:
dfs.append(data)
df = pd.concat(dfs, ignore_index=True)
Below demonstrates with random data
np.random.seed(82118)
# LIST OF FIVE DATAFRAMES (TO RESEMBLE EXCEL DFs)
rand_dfs = [pd.DataFrame([np.random.randint(1, 100, 9) for _ in range(10)],
columns=['Heading code','Heading name','91','92','93','94','95','96','97'])
for _ in range(5)]
# UPDATE TWO DATAFRAMES EACH WITH 10 COLS TO INCLUDE MATCHING 8TH ROW
rand_dfs[2].loc[7] = [0, 0, 91, 92, 93, 94, 95, 96, 97]
rand_dfs[4].loc[7] = [0, 0, 91, 92, 93, 94, 95, 96, 97]
final_dfs = []
for d in rand_dfs:
tmp = d.reset_index().merge(check_df, on=['index','91','92','93','94','95','96','97'])
if len(tmp) > 0:
final_dfs.append(d)
final_df = pd.concat(final_dfs, ignore_index=True)
Output (see 8th and 17th rows with matching criteria)
print(final_df)
# Heading code Heading name 91 92 93 94 95 96 97
# 0 53 98 67 8 86 33 65 56 62
# 1 61 9 40 14 18 9 53 30 24
# 2 89 88 80 91 91 49 8 39 84
# 3 15 99 49 92 63 96 11 95 29
# 4 13 62 82 12 34 92 54 29 47
# 5 44 18 67 61 52 71 52 25 12
# 6 56 25 52 10 82 12 59 63 15
# 7 0 0 91 92 93 94 95 96 97
# 8 51 50 27 38 34 11 57 92 3
# 9 49 99 46 87 46 5 63 24 8
# 10 31 62 8 23 19 66 60 10 66
# 11 51 98 30 44 45 39 32 74 82
# 12 88 19 54 28 38 71 3 31 34
# 13 58 13 89 17 96 35 12 52 85
# 14 93 67 13 13 28 43 24 7 4
# 15 34 26 73 20 44 37 18 17 22
# 16 59 1 99 9 11 6 4 99 95
# 17 0 0 91 92 93 94 95 96 97
# 18 88 6 23 20 35 26 37 56 51
# 19 21 67 19 63 77 98 41 9 22
来源:https://stackoverflow.com/questions/51938430/python-pandas-loop-to-append-excel-files-only-if-each-excel-file-contains-certa