pandas-groupby

Create new column based on condition on other categorical column

Submitted by 不问归期 on 2021-02-07 23:56:53
Question: I have a dataframe as shown below:

Category  Value
A            10
B            22
A             2
C            30
B            23
B             4
C             8
C            24
A             9

I need to create a Flag column based on the following conditions:

If the value for Category A is greater than or equal to 5, then Flag = 1, else 0.
If the value for Category B is greater than or equal to 20, then Flag = 1, else 0.
If the value for Category C is greater than or equal to 25, then Flag = 1, else 0.

Expected output:

Category  Value  Flag
A            10     1
B            22     1
A             2     0
C            30     1
B            23     1
B             4     0
C             8     0
C            24     0
A             9     1

I tried
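The question is cut off at "I tried", but one way to implement these per-category thresholds is to map each category to its cutoff and compare, a sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'C', 'B', 'B', 'C', 'C', 'A'],
    'Value':    [10, 22, 2, 30, 23, 4, 8, 24, 9],
})

# Per-category thresholds taken from the conditions above
thresholds = {'A': 5, 'B': 20, 'C': 25}

# Compare each Value against its own category's threshold
df['Flag'] = (df['Value'] >= df['Category'].map(thresholds)).astype(int)
```

This avoids writing one condition per category by hand; adding a new category only requires a new entry in the mapping.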

Pandas: Count time interval intersections over a group by

Submitted by 人盡茶涼 on 2021-02-07 19:00:45
Question: I have a dataframe of the following form:

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'group': ['A', 'A', 'A', 'B', 'B'],
                   'start': ['2012-08-19', '2012-08-22', '2013-08-19', '2012-08-19', '2013-08-19'],
                   'end': ['2012-08-28', '2013-09-13', '2013-08-21', '2012-12-19', '2014-08-19']})

   id group       start         end
0   1     A  2012-08-19  2012-08-28
1   2     A  2012-08-22  2013-09-13
2   3     A  2013-08-19  2013-08-21
3   4     B  2012-08-19  2012-12-19
4   5     B  2013-08-19  2014-08-19

For a given row in my dataframe I'd like to count

pandas groupby: can I select an agg function by one level of a column MultiIndex?

Submitted by 自闭症网瘾萝莉.ら on 2021-02-07 14:16:14
Question: I have a pandas DataFrame with a MultiIndex of columns:

columns = pd.MultiIndex.from_tuples(
    [(c, i) for c in ['a', 'b'] for i in range(3)])
df = pd.DataFrame(np.random.randn(4, 6), index=[0, 0, 1, 1], columns=columns)
print(df)

          a                             b
          0         1         2         0         1         2
0  0.582804  0.753118 -0.900950 -0.914657 -0.333091 -0.965912
0  0.498002 -0.842624  0.155783  0.559730 -0.300136 -1.211412
1  0.727019  1.522160  1.679025  1.738350  0.593361  0.411907
1  1.253759 -0.806279 -2.177582 -0.099210 -0.839822 -0.211349

I want to group by
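The question is cut off, but assuming the aim is to apply a different aggregation to each top level of the column MultiIndex when grouping on the row index, one sketch is to aggregate each top-level slice separately and reassemble the result:

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples([(c, i) for c in ['a', 'b'] for i in range(3)])
# Deterministic values instead of randn, so the result is checkable
df = pd.DataFrame(np.arange(24).reshape(4, 6), index=[0, 0, 1, 1], columns=columns)

# One aggregation per top level: sum the 'a' columns, average the 'b' columns
funcs = {'a': 'sum', 'b': 'mean'}
result = pd.concat(
    {lvl: df[lvl].groupby(level=0).agg(fn) for lvl, fn in funcs.items()},
    axis=1)
```

`df[lvl]` selects all columns under one top level, and `pd.concat` with a dict rebuilds the two-level column index on the aggregated pieces.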

Pandas GroupBy and select rows with the minimum value in a specific column

Submitted by 梦想的初衷 on 2021-02-07 11:24:26
Question: I am grouping my dataset by column A, and then I would like to take the minimum value in column B and the corresponding value in column C.

data = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2], 'B': [4, 5, 2, 7, 4, 6], 'C': [3, 4, 10, 2, 4, 6]})

   A  B   C
0  1  4   3
1  1  5   4
2  1  2  10
3  2  7   2
4  2  4   4
5  2  6   6

and I would like to get:

   A  B   C
0  1  2  10
1  2  4   4

For the moment I am grouping by A and creating a value that indicates the rows I will keep in my dataset:

a = data.groupby('A').min()
a['A'] = a.index
to_keep = [str(x[0]) + str(x[1]) for
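Rather than building a key to match rows back, the standard pattern for "row with the per-group minimum" is `groupby(...).idxmin()`, which returns the index labels of the minimal rows so the whole rows (including C) can be selected directly. A sketch:

```python
import pandas as pd

data = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2],
                     'B': [4, 5, 2, 7, 4, 6],
                     'C': [3, 4, 10, 2, 4, 6]})

# Index labels of the rows where B is smallest within each A group
rows = data.groupby('A')['B'].idxmin()

# Select those full rows, keeping the C that belongs to the minimal B
result = data.loc[rows].reset_index(drop=True)
```

Note this keeps one row per group; if several rows tie for the minimum, `idxmin` returns only the first.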

Pandas: Comparing rows within groups

Submitted by 余生颓废 on 2021-02-07 07:26:45
Question: I have a dataframe that is grouped by 'Key'. I need to compare rows within each group to decide whether to keep every row of the group or just one row per group. The condition for keeping all rows of a group: if one row has the color 'red', an area of '12', and the shape 'circle', AND another row within the same group has the color 'green', an area of '13', and the shape 'square', then I want to keep all rows in that group. Otherwise, if this scenario
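The question breaks off before the "otherwise" branch, but the "keep all rows when both signature rows exist" part maps naturally onto `groupby(...).filter`. The column names (`Key`, `Color`, `Area`, `Shape`) and the sample data are assumptions based on the description:

```python
import pandas as pd

df = pd.DataFrame({
    'Key':   ['k1', 'k1', 'k2', 'k2'],
    'Color': ['red', 'green', 'red', 'blue'],
    'Area':  [12, 13, 12, 9],
    'Shape': ['circle', 'square', 'circle', 'square'],
})

def has_both(g):
    # True if the group contains both a red/12/circle row and a green/13/square row
    red = ((g['Color'] == 'red') & (g['Area'] == 12) & (g['Shape'] == 'circle')).any()
    green = ((g['Color'] == 'green') & (g['Area'] == 13) & (g['Shape'] == 'square')).any()
    return red and green

kept = df.groupby('Key').filter(has_both)
```

Groups failing the predicate are dropped entirely here; the truncated "otherwise keep one row" rule would need a second step, e.g. appending one representative row per filtered-out group.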

How to perform a cumulative sum of distinct values in pandas dataframe

Submitted by China☆狼群 on 2021-02-07 06:01:50
Question: I have a dataframe like this:

 id        date company ......
123  2019-01-01       A
224  2019-01-01       B
345  2019-01-01       B
987  2019-01-03       C
334  2019-01-03       C
908  2019-01-04       C
765  2019-01-04       A
554  2019-01-05       A
482  2019-01-05       D

and I want to get the cumulative number of unique values over time for the 'company' column, so a company that appears again at a later date is not counted twice. My expected output is:

      date  cumulative_count
2019-01-01                 2
2019-01-03                 3
2019-01-04                 3
2019-01-05                 4

I've tried: df.groupby(['date'])
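The attempt is cut off, but one sketch of a solution: flag the first occurrence of each company, count new companies per date, and take a running total across dates:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [123, 224, 345, 987, 334, 908, 765, 554, 482],
    'date': ['2019-01-01'] * 3 + ['2019-01-03'] * 2 + ['2019-01-04'] * 2 + ['2019-01-05'] * 2,
    'company': ['A', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'D'],
})

# True only the first time each company appears (rows must be in date order)
first_seen = ~df.sort_values('date')['company'].duplicated()

# New companies per date, then a cumulative total over dates
out = (first_seen.groupby(df['date']).sum()
                 .cumsum()
                 .reset_index(name='cumulative_count'))
```

`duplicated()` marks repeats, so negating it counts each company exactly once, on its earliest date.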

Not able to group by one level in my dataframe with pandas

Submitted by |▌冷眼眸甩不掉的悲伤 on 2021-02-05 12:29:01
Question: I am importing an Excel document and creating a dataframe, df3. I want to group by only Name; the other duplicate data should be reflected as shown in the output.

df3 = pd.read_excel('stats')
print(df3)

Name  ID Month Shift
Jon    1   Feb     A
Jon    1   Jan     B
Jon    1   Mar     C
Mike   1   Jan     A
Mike   1   Jan     B
Jon    1   Feb     C
Jon    1   Jan     A

Output required: I want output as below, in the same format, and will save it to Excel. Please help me with this, as I'm stuck here. Note: Month must be in ascending order. Will be grateful
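The expected output is missing from the question, but the description (rows gathered per Name, with Month in calendar order) suggests sorting by Name and by month as an ordered categorical. A sketch, with the sample data inlined instead of read from Excel:

```python
import pandas as pd

df3 = pd.DataFrame({
    'Name':  ['Jon', 'Jon', 'Jon', 'Mike', 'Mike', 'Jon', 'Jon'],
    'ID':    [1, 1, 1, 1, 1, 1, 1],
    'Month': ['Feb', 'Jan', 'Mar', 'Jan', 'Jan', 'Feb', 'Jan'],
    'Shift': ['A', 'B', 'C', 'A', 'B', 'C', 'A'],
})

# Ordered categorical so months sort Jan < Feb < Mar ... rather than alphabetically
month_order = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
               'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df3['Month'] = pd.Categorical(df3['Month'], categories=month_order, ordered=True)

out = df3.sort_values(['Name', 'Month']).reset_index(drop=True)
# out.to_excel('stats_sorted.xlsx', index=False)  # write the result back to Excel
```

The to_excel call requires an Excel writer engine such as openpyxl; the output filename here is illustrative.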

Applying weighted average function to column in pandas groupby object, but weights sum to zero

Submitted by 喜夏-厌秋 on 2021-02-05 08:21:11
Question: I am applying different functions to each column in a pandas groupby object. One of these functions is a weighted average, where the weights are the associated values in another column of the DataFrame. However, for a number of my groups the weights sum to zero, so I get a "Weights sum to zero, can't be normalized" error when I run the code. Referring to the code below, for the group defined by col1 value x and col2 value y, the sum of the values in col3 in rows with
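The referenced code is cut off, but the usual fix is to guard the `numpy.average` call and fall back when a group's weights sum to zero. What the fallback should be (unweighted mean, NaN, ...) is a design choice; an unweighted mean is assumed here:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['x', 'x', 'x', 'x'],
    'col2': ['y', 'y', 'z', 'z'],
    'col3': [1.0, -1.0, 2.0, 2.0],   # weights; they sum to zero for group (x, y)
    'col4': [10.0, 20.0, 30.0, 40.0],
})

def safe_wavg(values, weights):
    # Fall back to an unweighted mean when the group's weights sum to zero
    w = weights.loc[values.index]
    if w.sum() == 0:
        return values.mean()
    return np.average(values, weights=w)

result = df.groupby(['col1', 'col2'])['col4'].apply(
    lambda v: safe_wavg(v, df['col3']))
```

Aligning the weights via `weights.loc[values.index]` picks out exactly the col3 entries belonging to the current group.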
