Iterate through a list of DataFrames to drop particular rows in Pandas

抹茶落季 2021-01-25 05:13

In my previous question, I asked how to drop particular rows in Pandas.

With help, I was able to drop the rows from before 1980. The 'Season' column (which holds the years) stores values like '2018-19'. How can I apply the same filter to every DataFrame in a list?

2 Answers
  • 2021-01-25 05:49

    You need to create a new list of filtered DataFrames or reassign the old one:

    Note: don't name the variable list, because it shadows the Python builtin (a small illustration follows the snippet below).

    L = [df[df['Season'].str.split('-').str[0].astype(int) > 1980] for df in L]
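
    To illustrate the note above, a tiny sketch (hypothetical) of what breaks once the builtin list is shadowed:

    # Shadowing the builtin: after this assignment, the name list refers to this object
    list = [df, df]
    # list('abc')   # would now raise TypeError: 'list' object is not callable
    del list        # removes the shadowing name so the builtin is visible again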
    

    Loop version:

    output = []
    for df in L:
        df = df[df['Season'].str.split('-').str[0].astype(int) > 1980]
        output.append(df)
    

    If you need to extract only the first 4-digit number:

    L = [df, df]
    L = [df[df['Season'].str.extract(r'(\d{4})', expand=False).astype(float) > 1980]
              for df in L]
    
    print (L)
    [    Season
    0  2018-19
    1  2017-18,     Season
    0  2018-19
    1  2017-18]
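
    For reference, a self-contained version of the snippet above; the sample Season values are assumptions pieced together from the outputs printed in this answer:

    import pandas as pd

    # Sample data assumed from the outputs shown above
    df = pd.DataFrame({'Season': ['2018-19', '2017-18', 'This', 'list would go', 'till 1960']})
    L = [df, df]

    # str.extract pulls the first 4-digit number; rows without one become NaN,
    # and NaN > 1980 is False, so those rows are dropped as well
    L = [d[d['Season'].str.extract(r'(\d{4})', expand=False).astype(float) > 1980] for d in L]
    print(L)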
    

    EDIT:

    If the data have the same structure, I suggest creating one big DataFrame with a new column to distinguish the cities:

    import glob
    import os
    import pandas as pd
    
    files = glob.glob('files/*.csv')
    dfs = [pd.read_csv(fp).assign(City=os.path.basename(fp).split('.')[0]) for fp in files]
    df = pd.concat(dfs, ignore_index=True)
    print (df)
              Season           City
    0        2018-19   Boston_Sheet
    1           This   Boston_Sheet
    2  list would go   Boston_Sheet
    3      till 1960   Boston_Sheet
    4        2018-19  Chicago_Sheet
    5        2017-18  Chicago_Sheet
    6           This  Chicago_Sheet
    
    df1 = df[df['Season'].str.extract(r'(\d{4})', expand=False).astype(float) > 1980]
    print (df1)
         Season           City
    0   2018-19   Boston_Sheet
    4   2018-19  Chicago_Sheet
    5   2017-18  Chicago_Sheet
    
    df2 = df1[df1['City'] == 'Boston_Sheet']
    print (df2)
        Season          City
    0  2018-19  Boston_Sheet
    
    df3 = df1[df1['City'] == 'Chicago_Sheet']
    print (df3)
         Season           City
    4   2018-19  Chicago_Sheet
    5   2017-18  Chicago_Sheet
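
    If you then need every per-city subset at once, a short sketch (assuming the df1 and City column from above) is a dictionary comprehension over groupby:

    # One DataFrame per city, keyed by the City value
    by_city = {city: sub for city, sub in df1.groupby('City')}
    print (by_city['Boston_Sheet'])
    print (by_city['Chicago_Sheet'])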
    

    If you need each DataFrame kept separate, you can use a dictionary of DataFrames:

    import glob
    import os
    import pandas as pd
    
    files = glob.glob('files/*.csv')
    dfs_dict = {os.path.basename(fp).split('.')[0] : pd.read_csv(fp) for fp in files}
    
    print (dfs_dict)
    
    print (dfs_dict['Boston_Sheet'])
              Season
    0        2018-19
    1           This
    2  list would go
    3      till 1960
    
    print (dfs_dict['Chicago_Sheet'])
         Season
    0   2018-19
    1   2017-18
    2      This
    

    Then process them in a dictionary comprehension:

    dfs_dict = {k: v[v['Season'].str.extract(r'(\d{4})', expand=False).astype(float) > 1980]
                     for k, v in dfs_dict.items()}
    print (dfs_dict)
    {'Boston_Sheet':     Season
    0  2018-19, 'Chicago_Sheet':      Season
    0   2018-19
    1   2017-18}
    
    print (dfs_dict['Boston_Sheet'])
        Season
    0  2018-19
    
    print (dfs_dict['Chicago_Sheet'])
         Season
    0   2018-19
    1   2017-18
    
  • 2021-01-25 06:01

    If you want to modify the list in place:

    for index in range(len(df_list)):
        df_list[index] = df_list[index].loc[df_list[index]['Season'].str.split('-').str[0].astype(int) > 1980]
    

    When you loop over the list object itself, the loop variable is just a reference to each element; rebinding it inside the loop does not touch the list, so the filtered DataFrame you build is discarded on the next iteration.

    If you instead loop over the indices (using the length of the list) and assign back through the index, you modify the list itself rather than the temporary reference you get with for some_copy_item in df_list.


    Minimal example:

        arr = [1, 2, 3, 4, 5]
        print(arr) # [1, 2, 3, 4, 5]
    
        for number in arr:
            number += 1
        print(arr) # [1, 2, 3, 4, 5]
    
        for idx in range(len(arr)):
            arr[idx] += 1
        print(arr) # [2, 3, 4, 5, 6]
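
    The same index-based idea applied back to the DataFrame list reads a bit more cleanly with enumerate; a sketch assuming the df_list and Season format used above:

        for i, df in enumerate(df_list):
            # Assigning through the index mutates df_list itself,
            # unlike rebinding the loop variable df
            df_list[i] = df[df['Season'].str.split('-').str[0].astype(int) > 1980]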
    