问题
How can I "concat" a specific column from many Python Pandas dataframes, WHERE another column in each of the many dataframes meets a certain condition (colloquially termed condition "X" here).
In SQL this would be simple using JOIN clause with WHERE df2.Col2 = "X" and df3.Col2 = "X" and df4.col2 = "X"... etc (which can be run dynamically).
In my case, I want to create a big dataframe with all the "Col1"s from each of the many dataframes, but only include the Col1 row values WHERE the corresponding Col2 row value is greater than "0.8". When this condition isn't met, the Col1 value should be "NaN".
Any ideas would be most helpful! Thanks in advance!
回答1:
consider the list
dfs
of pd.DataFrame
s
import pandas as pd
import numpy as np
np.random.seed([3,1415])
dfs = [pd.DataFrame(np.random.rand(10, 2),
columns=['Col1', 'Col2']) for _ in range(5)]
I'll use pd.concat
to join
raw concat
stack values without regard to where it came from
pd.concat([d.Col1.loc[d.Col2.gt(.8)] for d in dfs], ignore_index=True)
0 0.850445
1 0.934829
2 0.879891
3 0.085823
4 0.739635
5 0.700566
6 0.542329
7 0.882029
8 0.496250
9 0.585309
10 0.883372
Name: Col1, dtype: float64
join with source information
use the keys
parameter
pd.concat([d.Col1.loc[d.Col2.gt(.8)] for d in dfs], keys=range(len(dfs)))
0 3 0.850445
5 0.934829
6 0.879891
1 1 0.085823
2 0.739635
7 0.700566
2 4 0.542329
3 3 0.882029
4 0.496250
8 0.585309
4 0 0.883372
Name: Col1, dtype: float64
another approach
use query
pd.concat([d.query('Col2 > .8').Col1 for d in dfs], keys=range(len(dfs)))
0 3 0.850445
5 0.934829
6 0.879891
1 1 0.085823
2 0.739635
7 0.700566
2 4 0.542329
3 3 0.882029
4 0.496250
8 0.585309
4 0 0.883372
Name: Col1, dtype: float64
来源:https://stackoverflow.com/questions/40065641/python-pandas-concat-where-a-condition-is-met