pandas-groupby

Create new column based on condition on other categorical column

Submitted by 不问归期 on 2021-02-07 23:56:53
Question: I have a dataframe as shown below:

Category  Value
A            10
B            22
A             2
C            30
B            23
B             4
C             8
C            24
A             9

I need to create a Flag column based on the following conditions:

If the value for Category A is greater than or equal to 5, then Flag = 1, else 0.
If the value for Category B is greater than or equal to 20, then Flag = 1, else 0.
If the value for Category C is greater than or equal to 25, then Flag = 1, else 0.

Expected output:

Category  Value  Flag
A            10     1
B            22     1
A             2     0
C            30     1
B            23     1
B             4     0
C             8     0
C            24     0
A             9     1

I tried
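The question is cut off at "I tried", but one way to implement these per-category thresholds is to map each category to its cutoff and compare, a sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'C', 'B', 'B', 'C', 'C', 'A'],
    'Value':    [10, 22, 2, 30, 23, 4, 8, 24, 9],
})

# Per-category thresholds taken from the conditions above
thresholds = {'A': 5, 'B': 20, 'C': 25}

# Compare each Value against its own category's threshold
df['Flag'] = (df['Value'] >= df['Category'].map(thresholds)).astype(int)
```

This avoids writing one condition per category by hand; adding a new category only requires a new entry in the mapping.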

Pandas: Count time interval intersections over a group by

Submitted by 人盡茶涼 on 2021-02-07 19:00:45
Question: I have a dataframe of the following form:

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'group': ['A', 'A', 'A', 'B', 'B'],
                   'start': ['2012-08-19', '2012-08-22', '2013-08-19', '2012-08-19', '2013-08-19'],
                   'end': ['2012-08-28', '2013-09-13', '2013-08-21', '2012-12-19', '2014-08-19']})

   id group       start         end
0   1     A  2012-08-19  2012-08-28
1   2     A  2012-08-22  2013-09-13
2   3     A  2013-08-19  2013-08-21
3   4     B  2012-08-19  2012-12-19
4   5     B  2013-08-19  2014-08-19

For a given row in my dataframe I'd like to count

pandas groupby: can I select an agg function by one level of a column MultiIndex?

Submitted by 自闭症网瘾萝莉.ら on 2021-02-07 14:16:14
Question: I have a pandas DataFrame with a MultiIndex of columns:

columns = pd.MultiIndex.from_tuples(
    [(c, i) for c in ['a', 'b'] for i in range(3)])
df = pd.DataFrame(np.random.randn(4, 6), index=[0, 0, 1, 1], columns=columns)
print(df)

          a                             b
          0         1         2         0         1         2
0  0.582804  0.753118 -0.900950 -0.914657 -0.333091 -0.965912
0  0.498002 -0.842624  0.155783  0.559730 -0.300136 -1.211412
1  0.727019  1.522160  1.679025  1.738350  0.593361  0.411907
1  1.253759 -0.806279 -2.177582 -0.099210 -0.839822 -0.211349

I want to group by
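The question is cut off, but assuming the aim is to apply a different aggregation to each top level of the column MultiIndex when grouping on the row index, one sketch is to aggregate each top-level slice separately and reassemble the result:

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples([(c, i) for c in ['a', 'b'] for i in range(3)])
# Deterministic values instead of randn, so the result is checkable
df = pd.DataFrame(np.arange(24).reshape(4, 6), index=[0, 0, 1, 1], columns=columns)

# One aggregation per top level: sum the 'a' columns, average the 'b' columns
funcs = {'a': 'sum', 'b': 'mean'}
result = pd.concat(
    {lvl: df[lvl].groupby(level=0).agg(fn) for lvl, fn in funcs.items()},
    axis=1)
```

`df[lvl]` selects all columns under one top level, and `pd.concat` with a dict rebuilds the two-level column index on the aggregated pieces.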

Pandas GroupBy and select rows with the minimum value in a specific column

Submitted by 梦想的初衷 on 2021-02-07 11:24:26
Question: I am grouping my dataset by column A, and then I would like to take the minimum value in column B and the corresponding value in column C.

data = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2], 'B': [4, 5, 2, 7, 4, 6], 'C': [3, 4, 10, 2, 4, 6]})

   A  B   C
0  1  4   3
1  1  5   4
2  1  2  10
3  2  7   2
4  2  4   4
5  2  6   6

and I would like to get:

   A  B   C
0  1  2  10
1  2  4   4

For the moment I am grouping by A and creating a value that indicates the rows I will keep in my dataset:

a = data.groupby('A').min()
a['A'] = a.index
to_keep = [str(x[0]) + str(x[1]) for
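Rather than building a key to match rows back, the standard pattern for "row with the per-group minimum" is `groupby(...).idxmin()`, which returns the index labels of the minimal rows so the whole rows (including C) can be selected directly. A sketch:

```python
import pandas as pd

data = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2],
                     'B': [4, 5, 2, 7, 4, 6],
                     'C': [3, 4, 10, 2, 4, 6]})

# Index labels of the rows where B is smallest within each A group
rows = data.groupby('A')['B'].idxmin()

# Select those full rows, keeping the C that belongs to the minimal B
result = data.loc[rows].reset_index(drop=True)
```

Note this keeps one row per group; if several rows tie for the minimum, `idxmin` returns only the first.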

Pandas: Comparing rows within groups

Submitted by 余生颓废 on 2021-02-07 07:26:45
Question: I have a dataframe that is grouped by 'Key'. I need to compare rows within each group to decide whether to keep every row of the group or just one row per group. The condition for keeping all rows of a group: if one row has the color 'red', an area of '12', and the shape 'circle', AND another row within the same group has the color 'green', an area of '13', and the shape 'square', then I want to keep all rows in that group. Otherwise, if this scenario
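The question breaks off before the "otherwise" branch, but the "keep all rows when both signature rows exist" part maps naturally onto `groupby(...).filter`. The column names (`Key`, `Color`, `Area`, `Shape`) and the sample data are assumptions based on the description:

```python
import pandas as pd

df = pd.DataFrame({
    'Key':   ['k1', 'k1', 'k2', 'k2'],
    'Color': ['red', 'green', 'red', 'blue'],
    'Area':  [12, 13, 12, 9],
    'Shape': ['circle', 'square', 'circle', 'square'],
})

def has_both(g):
    # True if the group contains both a red/12/circle row and a green/13/square row
    red = ((g['Color'] == 'red') & (g['Area'] == 12) & (g['Shape'] == 'circle')).any()
    green = ((g['Color'] == 'green') & (g['Area'] == 13) & (g['Shape'] == 'square')).any()
    return red and green

kept = df.groupby('Key').filter(has_both)
```

Groups failing the predicate are dropped entirely here; the truncated "otherwise keep one row" rule would need a second step, e.g. appending one representative row per filtered-out group.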

How to perform a cumulative sum of distinct values in pandas dataframe

Submitted by China☆狼群 on 2021-02-07 06:01:50
Question: I have a dataframe like this:

 id        date company ......
123  2019-01-01       A
224  2019-01-01       B
345  2019-01-01       B
987  2019-01-03       C
334  2019-01-03       C
908  2019-01-04       C
765  2019-01-04       A
554  2019-01-05       A
482  2019-01-05       D

and I want to get the cumulative number of unique values over time for the 'company' column, so a company that appears again at a later date is not counted twice. My expected output is:

      date  cumulative_count
2019-01-01                 2
2019-01-03                 3
2019-01-04                 3
2019-01-05                 4

I've tried: df.groupby(['date'])
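The attempt is cut off, but one sketch of a solution: flag the first occurrence of each company, count new companies per date, and take a running total across dates:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [123, 224, 345, 987, 334, 908, 765, 554, 482],
    'date': ['2019-01-01'] * 3 + ['2019-01-03'] * 2 + ['2019-01-04'] * 2 + ['2019-01-05'] * 2,
    'company': ['A', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'D'],
})

# True only the first time each company appears (rows must be in date order)
first_seen = ~df.sort_values('date')['company'].duplicated()

# New companies per date, then a cumulative total over dates
out = (first_seen.groupby(df['date']).sum()
                 .cumsum()
                 .reset_index(name='cumulative_count'))
```

`duplicated()` marks repeats, so negating it counts each company exactly once, on its earliest date.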

Not able to group by one level in my dataframe with pandas

Submitted by |▌冷眼眸甩不掉的悲伤 on 2021-02-05 12:29:01
Question: I am importing an Excel document and creating a dataframe, df3. I want to group by only Name; the other duplicate data should be reflected as shown in the output.

df3 = pd.read_excel('stats')
print(df3)

Name  ID Month Shift
Jon    1   Feb     A
Jon    1   Jan     B
Jon    1   Mar     C
Mike   1   Jan     A
Mike   1   Jan     B
Jon    1   Feb     C
Jon    1   Jan     A

Output required: I want output as below, in the same format, and will save it to Excel. Please help me with this, as I'm stuck here. Note: Month must be in ascending order. Will be grateful
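The expected output is missing from the question, but the description (rows gathered per Name, with Month in calendar order) suggests sorting by Name and by month as an ordered categorical. A sketch, with the sample data inlined instead of read from Excel:

```python
import pandas as pd

df3 = pd.DataFrame({
    'Name':  ['Jon', 'Jon', 'Jon', 'Mike', 'Mike', 'Jon', 'Jon'],
    'ID':    [1, 1, 1, 1, 1, 1, 1],
    'Month': ['Feb', 'Jan', 'Mar', 'Jan', 'Jan', 'Feb', 'Jan'],
    'Shift': ['A', 'B', 'C', 'A', 'B', 'C', 'A'],
})

# Ordered categorical so months sort Jan < Feb < Mar ... rather than alphabetically
month_order = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
               'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df3['Month'] = pd.Categorical(df3['Month'], categories=month_order, ordered=True)

out = df3.sort_values(['Name', 'Month']).reset_index(drop=True)
# out.to_excel('stats_sorted.xlsx', index=False)  # write the result back to Excel
```

The to_excel call requires an Excel writer engine such as openpyxl; the output filename here is illustrative.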

Applying weighted average function to column in pandas groupby object, but weights sum to zero

Submitted by 喜夏-厌秋 on 2021-02-05 08:21:11
Question: I am applying different functions to each column in a pandas groupby object. One of these functions is a weighted average, where the weights are the associated values in another column of the DataFrame. However, for a number of my groups the weights sum to zero, so I get a "Weights sum to zero, can't be normalized" error when I run the code. Referring to the code below, for the group defined by col1 value x and col2 value y, the sum of the values in col3 in rows with
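The referenced code is cut off, but the usual fix is to guard the `numpy.average` call and fall back when a group's weights sum to zero. What the fallback should be (unweighted mean, NaN, ...) is a design choice; an unweighted mean is assumed here:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['x', 'x', 'x', 'x'],
    'col2': ['y', 'y', 'z', 'z'],
    'col3': [1.0, -1.0, 2.0, 2.0],   # weights; they sum to zero for group (x, y)
    'col4': [10.0, 20.0, 30.0, 40.0],
})

def safe_wavg(values, weights):
    # Fall back to an unweighted mean when the group's weights sum to zero
    w = weights.loc[values.index]
    if w.sum() == 0:
        return values.mean()
    return np.average(values, weights=w)

result = df.groupby(['col1', 'col2'])['col4'].apply(
    lambda v: safe_wavg(v, df['col3']))
```

Aligning the weights via `weights.loc[values.index]` picks out exactly the col3 entries belonging to the current group.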
