Python Pandas Concat “WHERE” a Condition is met

问题

How can I "concat" a specific column from many Python Pandas dataframes, WHERE another column in each of the many dataframes meets a certain condition (colloquially termed condition "X" here).

In SQL this would be simple using JOIN clause with WHERE df2.Col2 = "X" and df3.Col2 = "X" and df4.col2 = "X"... etc (which can be run dynamically).

In my case, I want to create a big dataframe with all the "Col1"s from each of the many dataframes, but only include the Col1 row values WHERE the corresponding Col2 row value is greater than "0.8". When this condition isn't met, the Col1 value should be "NaN".

Any ideas would be most helpful! Thanks in advance!

回答1:

consider the list dfs of pd.DataFrames

import pandas as pd
import numpy as np


np.random.seed([3,1415])
dfs = [pd.DataFrame(np.random.rand(10, 2),
                    columns=['Col1', 'Col2']) for _ in range(5)]

I'll use pd.concat to join

raw concat
stack values without regard to where it came from

pd.concat([d.Col1.loc[d.Col2.gt(.8)] for d in dfs], ignore_index=True)

0     0.850445
1     0.934829
2     0.879891
3     0.085823
4     0.739635
5     0.700566
6     0.542329
7     0.882029
8     0.496250
9     0.585309
10    0.883372
Name: Col1, dtype: float64

join with source information
use the keys parameter

pd.concat([d.Col1.loc[d.Col2.gt(.8)] for d in dfs], keys=range(len(dfs)))

0  3    0.850445
   5    0.934829
   6    0.879891
1  1    0.085823
   2    0.739635
   7    0.700566
2  4    0.542329
3  3    0.882029
   4    0.496250
   8    0.585309
4  0    0.883372
Name: Col1, dtype: float64

another approach
use query

pd.concat([d.query('Col2 > .8').Col1 for d in dfs], keys=range(len(dfs)))

0  3    0.850445
   5    0.934829
   6    0.879891
1  1    0.085823
   2    0.739635
   7    0.700566
2  4    0.542329
3  3    0.882029
   4    0.496250
   8    0.585309
4  0    0.883372
Name: Col1, dtype: float64

来源：https://stackoverflow.com/questions/40065641/python-pandas-concat-where-a-condition-is-met

标签

python

pandas

join

where

concat