In my previous question I asked how to drop particular rows in Pandas.
With help, I was able to drop the rows before 1980. The 'Season' column (which holds the years) stores values as strings like '2018-19'.
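For reference, the list of DataFrames can be reproduced with a minimal setup like this (a hypothetical reconstruction; the real DataFrames are read from city CSV files):
import pandas as pd

# two sample DataFrames with a string 'Season' column, as in the question
df = pd.DataFrame({'Season': ['2018-19', '2017-18']})
L = [df, df]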
You need to create a new list of filtered DataFrames or reassign the old one:
Note: don't use the variable name list, because list is a Python builtin.
L = [df[df['Season'].str.split('-').str[0].astype(int) > 1980] for df in L]
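As a quick illustration of why shadowing list is a problem (hypothetical snippet): once the name is rebound, the builtin is no longer reachable.
list = [df, df]   # shadows the builtin
list('abc')       # TypeError: 'list' object is not callable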
Loop version:
output = []
for df in L:
    df = df[df['Season'].str.split('-').str[0].astype(int) > 1980]
    output.append(df)
If you need to extract only the first 4-digit integer:
L = [df, df]
L = [df[df['Season'].str.extract(r'(\d{4})', expand=False).astype(float) > 1980]
     for df in L]
print (L)
[ Season
0 2018-19
1 2017-18, Season
0 2018-19
1 2017-18]
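astype(float) is used here because str.extract returns NaN for rows where no 4-digit number is found, and NaN cannot be cast to int; the comparison with NaN evaluates to False, so those rows are dropped. A small sketch using sample values from the data:
s = pd.Series(['2018-19', 'This', 'till 1960'])
print (s.str.extract(r'(\d{4})', expand=False).astype(float) > 1980)
0     True
1    False
2    False
dtype: bool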
EDIT:
If the data share the same structure, I suggest creating one big DataFrame with a new column to distinguish the cities:
import glob
import os
import pandas as pd

files = glob.glob('files/*.csv')
dfs = [pd.read_csv(fp).assign(City=os.path.basename(fp).split('.')[0]) for fp in files]
df = pd.concat(dfs, ignore_index=True)
print (df)
Season City
0 2018-19 Boston_Sheet
1 This Boston_Sheet
2 list would go Boston_Sheet
3 till 1960 Boston_Sheet
4 2018-19 Chicago_Sheet
5 2017-18 Chicago_Sheet
6 This Chicago_Sheet
df1 = df[df['Season'].str.extract(r'(\d{4})', expand=False).astype(float) > 1980]
print (df1)
Season City
0 2018-19 Boston_Sheet
4 2018-19 Chicago_Sheet
5 2017-18 Chicago_Sheet
df2 = df1[df1['City'] == 'Boston_Sheet']
print (df2)
Season City
0 2018-19 Boston_Sheet
df3 = df1[df1['City'] == 'Chicago_Sheet']
print (df3)
Season City
4 2018-19 Chicago_Sheet
5 2017-18 Chicago_Sheet
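If there are many cities, you can avoid filtering each one by hand and split df1 into a dictionary of per-city DataFrames with groupby (a sketch building on df1 from above):
dfs_by_city = {city: g for city, g in df1.groupby('City')}
print (dfs_by_city['Chicago_Sheet'])
    Season           City
4  2018-19  Chicago_Sheet
5  2017-18  Chicago_Sheet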
If you need each DataFrame separately, you can keep a dictionary of DataFrames:
import glob
import os
import pandas as pd

files = glob.glob('files/*.csv')
dfs_dict = {os.path.basename(fp).split('.')[0]: pd.read_csv(fp) for fp in files}
print (dfs_dict)
print (dfs_dict['Boston_Sheet'])
Season
0 2018-19
1 This
2 list would go
3 till 1960
print (dfs_dict['Chicago_Sheet'])
    Season
0  2018-19
1  2017-18
2     This
Then process it with a dictionary comprehension:
dfs_dict = {k: v[v['Season'].str.extract(r'(\d{4})', expand=False).astype(float) > 1980]
            for k, v in dfs_dict.items()}
print (dfs_dict)
{'Boston_Sheet': Season
0 2018-19, 'Chicago_Sheet': Season
0 2018-19
1 2017-18}
print (dfs_dict['Boston_Sheet'])
Season
0 2018-19
print (dfs_dict['Chicago_Sheet'])
Season
0 2018-19
1 2017-18
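If you then want to write each filtered DataFrame back to disk, one option is a short loop over the dictionary (the output folder name files_filtered is only an assumption):
import os
os.makedirs('files_filtered', exist_ok=True)
for city, v in dfs_dict.items():
    v.to_csv(os.path.join('files_filtered', city + '.csv'), index=False)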
If you want to modify the list in-place:
for index in range(len(df_list)):
    df_list[index] = df_list[index].loc[df_list[index]['Season'].str.split('-').str[0].astype(int) > 1980]
When you loop over the list object itself, the loop variable is just a name bound to each element; rebinding it inside the loop does not change the list and is lost on the next iteration. If you loop using the length of the list and assign through the index, you modify the list itself, and not the temporary name created by for some_copy_item in df_list.
Minimal example:
arr = [1, 2, 3, 4, 5]
print(arr)  # [1, 2, 3, 4, 5]

for number in arr:
    number += 1
print(arr)  # [1, 2, 3, 4, 5]

for idx in range(len(arr)):
    arr[idx] += 1
print(arr)  # [2, 3, 4, 5, 6]
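The same in-place approach reads a bit more cleanly with enumerate, which yields both the index and the current DataFrame (a sketch, assuming the df_list from above):
for idx, df in enumerate(df_list):
    df_list[idx] = df.loc[df['Season'].str.split('-').str[0].astype(int) > 1980]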