In my previous question I asked how to drop particular rows in Pandas.
With help, I was able to drop rows that are before 1980. The 'Season' column (that had the years) were i
Note: don't use list as a variable name, because it shadows the Python builtin.
You need to create a new list of filtered DataFrames or reassign the old one:
L = [df[df['Season'].str.split('-').str[0].astype(int) > 1980] for df in L]
Loop version:
output = []
for df in L:
    df = df[df['Season'].str.split('-').str[0].astype(int) > 1980]
    output.append(df)
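The reason a new list is needed is that assigning to df inside a loop only rebinds the loop variable; the DataFrames stored in the original list stay unfiltered. A minimal sketch of the split-based filter follows. The sample data and the names boston, chicago, frames and start_year are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the real DataFrames
boston = pd.DataFrame({'Season': ['2018-19', '1985-86', '1979-80']})
chicago = pd.DataFrame({'Season': ['2017-18', '1960-61']})
frames = [boston, chicago]

def start_year(season):
    # '2018-19' -> 2018; assumes every value has the form 'YYYY-YY'
    return season.str.split('-').str[0].astype(int)

# Build a new list; the DataFrames in frames are left untouched
filtered = [df[start_year(df['Season']) > 1980] for df in frames]

for df in filtered:
    print(df['Season'].tolist())
```

This variant assumes every row starts with a year. If some rows contain other text, astype(int) fails, which is where the extract-based version comes in.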
If you need to extract only the first 4-digit number from each value:
L = [df, df]
L = [df[df['Season'].str.extract(r'(\d{4})', expand=False).astype(float) > 1980]
     for df in L]
print (L)
[ Season
0 2018-19
1 2017-18, Season
0 2018-19
1 2017-18]
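To see why the extract version converts to float rather than int, it may help to look at the intermediate values. The sketch below uses a made-up Season column that mixes seasons with free-text rows; the names years and mask are only for illustration.

```python
import pandas as pd

# Hypothetical column mixing seasons with free-text rows
df = pd.DataFrame({'Season': ['2018-19', 'This', 'list would go', 'till 1960']})

# First run of four digits in each row; rows with no match become NaN
years = df['Season'].str.extract(r'(\d{4})', expand=False).astype(float)
print(years.tolist())

# Any comparison with NaN is False, so rows without a year are dropped
mask = years > 1980
print(df[mask])
```

Rows with no 4-digit match become NaN, which float can hold but int cannot, so astype(int) would raise an error here.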
EDIT:
If the files all have the same structure, I suggest creating one big DataFrame with a new column to distinguish the cities:
import glob
import os
import pandas as pd

files = glob.glob('files/*.csv')
# tag each row with the file name (without extension) as City
dfs = [pd.read_csv(fp).assign(City=os.path.basename(fp).split('.')[0]) for fp in files]
df = pd.concat(dfs, ignore_index=True)
print (df)
Season City
0 2018-19 Boston_Sheet
1 This Boston_Sheet
2 list would go Boston_Sheet
3 till 1960 Boston_Sheet
4 2018-19 Chicago_Sheet
5 2017-18 Chicago_Sheet
6 This Chicago_Sheet
df1 = df[df['Season'].str.extract(r'(\d{4})', expand=False).astype(float) > 1980]
print (df1)
Season City
0 2018-19 Boston_Sheet
4 2018-19 Chicago_Sheet
5 2017-18 Chicago_Sheet
df2 = df1[df1['City'] == 'Boston_Sheet']
print (df2)
Season City
0 2018-19 Boston_Sheet
df3 = df1[df1['City'] == 'Chicago_Sheet']
print (df3)
Season City
4 2018-19 Chicago_Sheet
5 2017-18 Chicago_Sheet
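With many cities, writing one boolean filter per city gets repetitive. One possible alternative, not part of the original answer, is to let groupby do the split. The sketch below rebuilds a small stand-in for df1 by hand; by_city is a hypothetical name.

```python
import pandas as pd

# Hypothetical stand-in for the filtered, combined DataFrame (df1 above)
df1 = pd.DataFrame(
    {'Season': ['2018-19', '2018-19', '2017-18'],
     'City': ['Boston_Sheet', 'Chicago_Sheet', 'Chicago_Sheet']},
    index=[0, 4, 5],
)

# One sub-DataFrame per distinct City value, keyed by the city name
by_city = {city: group for city, group in df1.groupby('City')}

print(by_city['Chicago_Sheet'])
```

Each group keeps the original index labels, so by_city['Chicago_Sheet'] holds the same rows as df3.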
If you need each DataFrame kept separate, you can use a dictionary of DataFrames:
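import glob
import os
import pandas as pd

files = glob.glob('files/*.csv')
# key: file name without extension, value: DataFrame read from that file
dfs_dict = {os.path.basename(fp).split('.')[0] : pd.read_csv(fp) for fp in files}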
print (dfs_dict)
print (dfs_dict['Boston_Sheet'])
Season
0 2018-19
1 This
2 list would go
3 till 1960
print (dfs_dict['Chicago_Sheet'])
Season
0 2018-19
1 2017-18
2 This
Then do the filtering in a dictionary comprehension:
dfs_dict = {k:v[v['Season'].str.extract(r'(\d{4})', expand=False).astype(float) > 1980]
            for k, v in dfs_dict.items()}
print (dfs_dict)
{'Boston_Sheet': Season
0 2018-19, 'Chicago_Sheet': Season
0 2018-19
1 2017-18}
print (dfs_dict['Boston_Sheet'])
Season
0 2018-19
print (dfs_dict['Chicago_Sheet'])
Season
0 2018-19
1 2017-18