pandas-groupby

Applying weighted average function to column in pandas groupby object, carrying over the weights to calculate uncertainties

我的未来我决定 posted on 2021-01-29 12:09:37
Question: I have tried to expand on this question to generalize to the case where one wants to carry over the sum of the weights in a weighted average, so that one can append to the resulting dataframe the uncertainties on the weighted averages, which are 1 / sqrt(sum_of_weights). Consider the sample dataframe: import pandas as pd import numpy as np df5 = pd.DataFrame.from_dict({'Lab': ['Lab1','Lab1','Lab1','Lab2','Lab2','Lab2','Lab3','Lab3','Lab3'], 'test_type': ['a','a','b','b','c','c','a','a','a'],
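The idea described in the question can be sketched as follows. This is only an illustration under stated assumptions: the `value` and `weight` columns and their numbers are hypothetical (the original sample dataframe is cut off), and the uncertainty uses the 1 / sqrt(sum_of_weights) formula stated in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'value' and 'weight' are assumed column names,
# not taken from the (truncated) dataframe in the question.
df = pd.DataFrame({
    "Lab": ["Lab1", "Lab1", "Lab2", "Lab2"],
    "value": [1.0, 3.0, 2.0, 4.0],
    "weight": [1.0, 3.0, 4.0, 12.0],
})

# Sum value*weight and weight per group in a single pass.
sums = (
    df.assign(wx=df["value"] * df["weight"])
      .groupby("Lab")[["wx", "weight"]]
      .sum()
)

# Weighted average, the carried-over sum of weights, and the
# uncertainty 1 / sqrt(sum of weights) as stated in the question.
result = pd.DataFrame({
    "weighted_avg": sums["wx"] / sums["weight"],
    "sum_of_weights": sums["weight"],
    "uncertainty": 1.0 / np.sqrt(sums["weight"]),
})
```

Keeping the sum of weights as its own column means the uncertainty can be derived after the aggregation without a second groupby.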

pandas groupby with agg not working on multiple columns

三世轮回 posted on 2021-01-29 10:46:31
Question: I'm trying to merge multiple columns, each into a list, based on a group by in pandas. Below is the code I'm using: grouped_df = df.groupby(['d_id', 'time']).agg({'d_name': lambda x: tuple(x)}, {'ver': lambda x: tuple(x)}, {'f_name': lambda x: tuple(x)}) This only gives me the first column (d_name) as a list, together with d_id and time, in grouped_df. The other two columns do not show up as lists. I tried using list earlier but found that list has an issue with the agg function, so I resorted to tuple. Let
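One likely cause, offered as a sketch rather than a confirmed diagnosis: `agg` takes a single aggregation spec as its first argument, so the second and third dicts in the call above are treated as extra positional arguments instead of being merged into the spec. Passing one dict with all three columns avoids this. The sample data below is hypothetical.

```python
import pandas as pd

# Hypothetical data using the column names from the question.
df = pd.DataFrame({
    "d_id": [1, 1, 2],
    "time": ["t1", "t1", "t2"],
    "d_name": ["a", "b", "c"],
    "ver": ["v1", "v2", "v3"],
    "f_name": ["f1", "f2", "f3"],
})

# One dict mapping every column to its aggregation function.
grouped_df = df.groupby(["d_id", "time"]).agg({
    "d_name": lambda x: tuple(x),
    "ver": lambda x: tuple(x),
    "f_name": lambda x: tuple(x),
})
```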

How to get count of words from DataFrame based on conditions

核能气质少年 posted on 2021-01-29 08:46:39
Question: I have the following two dataframes, badges and comments. I have created a list of 'gold users' from the badges dataframe, i.e. those whose Class=1. Here Name means the name of the badge and Class means the level of the badge (1=Gold, 2=Silver, 3=Bronze). I have already done the text preprocessing on comments['Text'] and now want to find the counts of the top 10 words for gold users from comments['Text']. I tried the given code but I am getting the error "KeyError: "None of [Index(['1532', '290', '1946', '1459', '6094',

Check whether all dates are present in a year in pandas python

无人久伴 posted on 2021-01-29 08:24:37
Question: I have a date column like the one below, in which some dates are missing. obstime 2012-01-01 2012-01-02 2012-01-03 2012-01-04 .... 2016-12-28 2016-12-29 2016-12-30 2016-12-31 I want to check, for each month of the available years, whether all dates are present, as in the following image. Answer 1: Use: #sample data df = pd.DataFrame({'obstime':pd.date_range('2012-01-01', '2016-12-31')}) removed = ['2013-09-01', '2013-09-02', '2013-09-03','2014-10-09','2016-12-30'] removed1 = pd.date_range('2016-12-16', '2016-12-22') removed2
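Since the answer above is cut off, here is a separate minimal sketch of one way to do the check; it is not necessarily the approach the answer takes. It counts observed days per month and compares the count with the month's length. The date range and removed dates are hypothetical, and the sketch assumes there are no duplicate dates.

```python
import pandas as pd

# Hypothetical sample: three months of 2012 with two February days removed.
full = pd.date_range("2012-01-01", "2012-03-31")
removed = pd.to_datetime(["2012-02-10", "2012-02-11"])
df = pd.DataFrame({"obstime": full.difference(removed)})

# Dates that are absent from the column.
missing = full.difference(pd.DatetimeIndex(df["obstime"]))

# Number of observed days per calendar month.
counts = df.groupby(df["obstime"].dt.to_period("M")).size()

# A month is complete when the observed count equals its number of days.
# Months with no observations at all do not appear in `counts`.
days_in_month = counts.index.days_in_month
complete = pd.Series(
    counts.to_numpy() == days_in_month.to_numpy(),
    index=counts.index,
)
```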

Faster way to group data than pandas groupby

风格不统一 posted on 2021-01-29 08:02:09
Question: I am implementing a genetic algorithm. The algorithm runs a number of iterations (between 100 and 500), and in each iteration all 100 individuals are evaluated for their 'fitness'. To this end, I have written an evaluate function. However, even for a single iteration, evaluating the fitness of the 100 individuals already takes 13 seconds. I have to speed this up massively in order to implement an efficient algorithm. The evaluate function takes two arguments, and then performs

How to pivot a dataframe?

左心房为你撑大大i posted on 2021-01-29 05:10:25
Question: What is pivot? How do I pivot? Is this a pivot? Long format to wide format? I've seen a lot of questions that ask about pivot tables. Even if the askers don't know that they are asking about pivot tables, they usually are. It is virtually impossible to write a canonical question and answer that encompasses all aspects of pivoting... ... But I'm going to give it a go. The problem with existing questions and answers is that often the question is focused on a nuance that the OP has trouble
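As a small illustration of the "long format to wide format" case mentioned in the question, the sketch below uses hypothetical column names (`row`, `col`, `val`). `DataFrame.pivot` reshapes when each (row, col) pair is unique; `DataFrame.pivot_table` aggregates when pairs repeat.

```python
import pandas as pd

# Hypothetical long-format data: one row per (row, col) pair.
long_df = pd.DataFrame({
    "row": ["r1", "r1", "r2", "r2"],
    "col": ["a", "b", "a", "b"],
    "val": [1, 2, 3, 4],
})

# Long to wide: one column per distinct value of 'col'.
wide = long_df.pivot(index="row", columns="col", values="val")

# With a duplicated (row, col) pair, pivot would raise, so aggregate instead.
dup_df = pd.concat(
    [long_df, pd.DataFrame({"row": ["r1"], "col": ["a"], "val": [10]})],
    ignore_index=True,
)
wide_sum = dup_df.pivot_table(index="row", columns="col", values="val", aggfunc="sum")
```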

Calculation within Pandas dataframe group

时间秒杀一切 posted on 2021-01-29 02:44:58
Question: I have a pandas DataFrame as shown below. What I'm trying to do is partition (or group by) BlockID, LineID, WordID, and then within each group use current WordStartX - previous (WordStartX + WordWidth) to derive another column, e.g. WordDistance, to indicate the distance between this word and the previous word. The post "Row operations within a group of a pandas dataframe" is very helpful, but in my case multiple columns are involved (WordStartX and WordWidth). *BlockID LineID WordID WordStartX
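A minimal sketch of the calculation described above, using a grouped `shift`. The numbers are hypothetical, and the sketch assumes that words are grouped by BlockID and LineID only (grouping by WordID as well would leave one word per group) and that rows are already ordered by WordID within each line.

```python
import pandas as pd

# Hypothetical data with the column names from the question.
df = pd.DataFrame({
    "BlockID": [1, 1, 1, 1],
    "LineID": [1, 1, 1, 2],
    "WordID": [1, 2, 3, 1],
    "WordStartX": [10, 50, 100, 12],
    "WordWidth": [30, 40, 20, 25],
})

# Right edge of each word, shifted down by one row within each line,
# gives the end position of the previous word (NaN for the first word).
prev_end = (
    (df["WordStartX"] + df["WordWidth"])
    .groupby([df["BlockID"], df["LineID"]])
    .shift()
)

# Distance between the current word's start and the previous word's end.
df["WordDistance"] = df["WordStartX"] - prev_end
```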

Calculation within Pandas dataframe group

我只是一个虾纸丫 posted on 2021-01-29 02:32:21
Question: I have a pandas DataFrame as shown below. What I'm trying to do is partition (or group by) BlockID, LineID, WordID, and then within each group use current WordStartX - previous (WordStartX + WordWidth) to derive another column, e.g. WordDistance, to indicate the distance between this word and the previous word. The post "Row operations within a group of a pandas dataframe" is very helpful, but in my case multiple columns are involved (WordStartX and WordWidth). *BlockID LineID WordID WordStartX

Pandas Groupby First - Extract Index from Original Dataframe

…衆ロ難τιáo~ posted on 2021-01-28 20:01:13
Question: I have a very simple problem. I'd like to take a data frame, perform a groupby on some columns, and extract the index (in the original data frame) of the first row in each group. How do I do this? I've tried playing with as_index, group_keys, and reset_index(), and nothing seems to work. Answer 1: You need the function first: x = pd.DataFrame([{'name': 'b1', 'group': 'a'}, {'name': 'b2', 'group': 'a'}, {'name': 'b3', 'group': 'a'}, {'name': 'b4', 'group': 'b'}, {'name': 'b5', 'group': 'b'}, {'name':
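Because the answer is truncated, the following is an independent sketch of two ways to recover the original index of the first row per group; it may differ from what the answer goes on to show. The data and the custom index labels are hypothetical.

```python
import pandas as pd

# Hypothetical data; the non-default index makes the result easy to verify.
x = pd.DataFrame(
    {"name": ["b1", "b2", "b3", "b4", "b5"],
     "group": ["a", "a", "a", "b", "b"]},
    index=[10, 11, 12, 13, 14],
)

# Option 1: head(1) keeps the first row of each group with its original index.
first_idx = x.groupby("group").head(1).index

# Option 2: move the index into a column, then take the first value per group.
# reset_index() names the new column 'index' because the index is unnamed.
first_idx_by_group = x.reset_index().groupby("group")["index"].first()
```

Option 2 keeps the mapping from group key to index label, which option 1 does not.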

add rows for all dates between two columns?

邮差的信 posted on 2021-01-28 19:01:43
Question: Add rows for all dates between two columns?
ID    Initiation_Date  Step  Start_Date  End_Date    Days
P-03  29-11-2018       3     2018-11-29  2018-12-10  11.0
P-04  29-11-2018       4     2018-12-03  2018-12-07  4.0
P-05  29-11-2018       5     2018-12-07  2018-12-07  0.0
Answer 1: Use: mydata = [{'ID' : '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016'}, {'ID' : '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016'}] df = pd.DataFrame(mydata) #convert columns to datetimes df[['Entry Date','Exit Date']] = df[['Entry Date','Exit
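The answer is cut off, so the sketch below shows one straightforward way to expand each row into one row per date, reusing the `mydata` sample from the answer. It is not necessarily the method the answer uses; the day-first date format and the output column name `Date` are assumptions.

```python
import pandas as pd

mydata = [
    {"ID": "10", "Entry Date": "10/10/2016", "Exit Date": "15/10/2016"},
    {"ID": "20", "Entry Date": "10/10/2016", "Exit Date": "18/10/2016"},
]
df = pd.DataFrame(mydata)

# Convert both columns to datetimes (assuming day/month/year strings).
for col in ["Entry Date", "Exit Date"]:
    df[col] = pd.to_datetime(df[col], format="%d/%m/%Y")

# Build one output row per calendar day between entry and exit, inclusive.
rows = [
    {"ID": rec["ID"], "Date": day}
    for rec in df.to_dict("records")
    for day in pd.date_range(rec["Entry Date"], rec["Exit Date"])
]
expanded = pd.DataFrame(rows)
```

A plain list comprehension is slower than a vectorized approach on large frames, but it is easy to read and does not depend on how pandas stores list-like cells.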