SQL-like window functions in PANDAS: Row Numbering in Python Pandas Dataframe

前端未结

关注

 5  877

我寻月下人不归

I come from a sql background and I use the following data processing step frequently:

Partition the table of data by one or more fields
For each parti

相关标签:

5条回答

傲寒

2020-11-27 04:07

pandas.lib.fast_zip() can create a tuple array from a list of array. You can use this function to create a tuple series, and then rank it:

values = {'key1' : ['a','a','a','b','a','b'],
          'data1' : [1,2,2,3,3,3],
          'data2' : [1,10,2,3,30,20]}

df = pd.DataFrame(values, index=list("abcdef"))

def rank_multi_columns(df, cols, **kw):
    data = []
    for col in cols:
        if col.startswith("-"):
            flag = -1
            col = col[1:]
        else:
            flag = 1
        data.append(flag*df[col])
    values = pd.lib.fast_zip(data)
    s = pd.Series(values, index=df.index)
    return s.rank(**kw)

rank = df.groupby("key1").apply(lambda df:rank_multi_columns(df, ["data1", "-data2"]))

print rank

the result:

a    1
b    2
c    3
d    2
e    4
f    1
dtype: float64

0 讨论(0)

孤独总比滥情好

2020-11-27 04:15
You can use transform and Rank together Here is an example
```
df = pd.DataFrame({'C1' : ['a','a','a','b','b'],
           'C2' : [1,2,3,4,5]})
df['Rank'] = df.groupby(by=['C1'])['C2'].transform(lambda x: x.rank())
df
```
Have a look at Pandas Rank method for more information
0 讨论(0)
发布评论:

提交评论
- 加载中...

生来不讨喜

2020-11-27 04:21

Use groupby.rank function. Here the working example.

df = pd.DataFrame({'C1':['a', 'a', 'a', 'b', 'b'], 'C2': [1, 2, 3, 4, 5]})
df

C1 C2
a  1
a  2
a  3
b  4
b  5

df["RANK"] = df.groupby("C1")["C2"].rank(method="first", ascending=True)
df

C1 C2 RANK
a  1  1
a  2  2
a  3  3
b  4  1
b  5  2

0 讨论(0)

情歌与酒

2020-11-27 04:24

you can also use sort_values(), groupby() and finally cumcount() + 1:

df['RN'] = df.sort_values(['data1','data2'], ascending=[True,False]) \
             .groupby(['key1']) \
             .cumcount() + 1
print(df)

yields:

   data1  data2 key1  RN
0      1      1    a   1
1      2     10    a   2
2      2      2    a   3
3      3      3    b   1
4      3     30    a   4

PS tested with pandas 0.18

0 讨论(0)

温柔的废话

2020-11-27 04:27

You can do this by using groupby twice along with the rank method:

In [11]: g = df.groupby('key1')

Use the min method argument to give values which share the same data1 the same RN:

In [12]: g['data1'].rank(method='min')
Out[12]:
0    1
1    2
2    2
3    1
4    4
dtype: float64

In [13]: df['RN'] = g['data1'].rank(method='min')

And then groupby these results and add the rank with respect to data2:

In [14]: g1 = df.groupby(['key1', 'RN'])

In [15]: g1['data2'].rank(ascending=False) - 1
Out[15]:
0    0
1    0
2    1
3    0
4    0
dtype: float64

In [16]: df['RN'] += g1['data2'].rank(ascending=False) - 1

In [17]: df
Out[17]:
   data1  data2 key1  RN
0      1      1    a   1
1      2     10    a   2
2      2      2    a   3
3      3      3    b   1
4      3     30    a   4

It feels like there ought to be a native way to do this (there may well be!...).

0 讨论(0)