pandas-groupby

How do you fill NaN with mean of a subset of a group?

Submitted by 假如想象 on 2021-02-05 07:11:30

Question: I have a data frame with some values by year and type. I want to replace all NaN values in each year with the mean of the values in that year with a specific type. I would like to do this in the most elegant way possible; I'm dealing with a lot of data, so less computation would be good as well. Example: df = pd.DataFrame({'year': [1,1,1,2,2,2], 'type': [1,1,2,1,1,2], 'val': [np.nan,5,10,100,200,np.nan]}). I want ALL NaNs, regardless of their type, to be replaced with their respective year mean of all …
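One common approach (a sketch, not taken from any posted answer) combines groupby().transform('mean') with fillna: transform broadcasts each year's mean back to the original row positions, so the fill stays fully vectorized:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [1, 1, 1, 2, 2, 2],
                   'type': [1, 1, 2, 1, 1, 2],
                   'val': [np.nan, 5, 10, 100, 200, np.nan]})

# transform('mean') returns a Series aligned with df that holds each
# row's year mean (NaNs are ignored when computing the mean), so
# fillna can use it directly without any merging.
df['val'] = df['val'].fillna(df.groupby('year')['val'].transform('mean'))
```

For the sample frame, year 1's mean is (5 + 10) / 2 = 7.5 and year 2's is (100 + 200) / 2 = 150, so the two NaNs become 7.5 and 150. Restricting the mean to a specific type would mean computing the means on a filtered frame first and mapping them back by year.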

Python: Binning based on 2 columns in Pandas

Submitted by 僤鯓⒐⒋嵵緔 on 2021-02-05 06:11:11

Question: Looking for a quick and elegant way to bin based on 2 columns in Pandas. Here's my data frame:

    filename                             height  width
0   shopfronts_23092017_3_285.jpg        750.0   560.0
1   shopfronts_200.jpg                   4395.0  6020.0
2   shopfronts_25092017_eateries_98.jpg  414.0   621.0
3   shopfronts_101.jpg                   480.0   640.0
4   shopfronts_138.jpg                   3733.0  8498.0
5   shopfronts_25092017_eateries_95.jpg  187.0   250.0
6   shopfronts_25092017_neon_33.jpg      100.0   200.0
7   shopfronts_322.jpg                   682.0   1024.0
8   shopfronts_171.jpg                   800.0   600.0
9   shopfronts_23092017…
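One sketch of binning on two columns applies pd.cut to each column separately and then combines the labels; the bin edges and label names below are made up for illustration, since the question doesn't specify them:

```python
import pandas as pd

df = pd.DataFrame({'filename': ['shopfronts_23092017_3_285.jpg',
                                'shopfronts_200.jpg',
                                'shopfronts_25092017_eateries_98.jpg'],
                   'height': [750.0, 4395.0, 414.0],
                   'width': [560.0, 6020.0, 621.0]})

# Hypothetical edges and labels: pd.cut assigns each value to a
# half-open (lo, hi] interval and returns the matching label.
edges = [0, 500, 1000, 10_000]
labels = ['small', 'medium', 'large']
df['h_bin'] = pd.cut(df['height'], bins=edges, labels=labels)
df['w_bin'] = pd.cut(df['width'], bins=edges, labels=labels)
# Combined 2-D bin: one label per (height bin, width bin) pair.
df['size_bin'] = df['h_bin'].astype(str) + '_' + df['w_bin'].astype(str)
```

The same idea extends to pd.qcut if quantile-based bins are wanted instead of fixed edges.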

Group Value Count By Column with Pandas Dataframe

Submitted by 冷暖自知 on 2021-02-04 08:24:50

Question: I'm not really sure how to ask this, so I apologize if this is a repeat question. I have a data frame that looks something like this:

| ID | Attend_x | Attend_y | Attend_z |
| 1  | No       | No       | No       |
| 2  | No       | No       | Yes      |
| 3  | No       | Yes      | No       |
| 4  | No       | Yes      | Yes      |

I've been trying to figure out the right combination of groupby and count to get it to look like this:

|          | Yes | No |
| Attend_x | 0   | 4  |
| Attend_y | 2   | 2  |
| Attend_z | 2   | 2  |

I'm honestly stumped, so any advice is super appreciated.
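This shape doesn't actually need groupby: applying value_counts to each Attend_* column and transposing gives one row per column with Yes/No counts. A sketch under the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Attend_x': ['No', 'No', 'No', 'No'],
                   'Attend_y': ['No', 'No', 'Yes', 'Yes'],
                   'Attend_z': ['No', 'Yes', 'No', 'Yes']})

# value_counts runs per column; a column missing a category (e.g.
# Attend_x has no 'Yes') produces NaN there, hence the fillna(0).
# Transposing turns each Attend_* column into a row of the result.
counts = (df[['Attend_x', 'Attend_y', 'Attend_z']]
          .apply(pd.Series.value_counts)
          .fillna(0)
          .astype(int)
          .T)
```

The result has index Attend_x/Attend_y/Attend_z and columns 'No'/'Yes', matching the desired table up to column order.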

How do I improve the performance of pandas GroupBy filter operation?

Submitted by 拈花ヽ惹草 on 2021-02-02 08:54:20

Question: This is my first time asking a question. I'm working with a large CSV dataset (over 15 million rows, over 1.5 GB in size). I'm loading the extracts into Pandas dataframes running in Jupyter Notebooks to derive an algorithm based on the dataset. I group the data by MAC address, which results in 1+ million groups. Core to my algorithm development is running this operation: pandas.core.groupby.DataFrameGroupBy.filter. Running this operation takes 3 to 5 minutes, depending on the …
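The usual cure for a slow GroupBy.filter is to avoid the per-group Python callback: with 1+ million groups, filter invokes the lambda once per group, while transform computes the same per-group statistic in one vectorized pass and a boolean mask does the row selection. A small sketch with made-up MAC/byte columns and a hypothetical size threshold:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({'mac': rng.integers(0, 500, 10_000),
                   'bytes': rng.integers(0, 1000, 10_000)})

# Slow form: one Python-level lambda call per group.
slow = df.groupby('mac').filter(lambda g: len(g) > 15)

# Fast equivalent: transform('size') broadcasts each group's row
# count back to every row, so an ordinary boolean mask filters rows.
fast = df[df.groupby('mac')['bytes'].transform('size') > 15]
```

Both forms keep the original row order, so they produce identical frames; the transform version is typically orders of magnitude faster when the group count is large.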

Iteration over years to plot different group values as bar plot in pandas

Submitted by 依然范特西╮ on 2021-01-29 20:30:45

Question: I have a dataframe that records the number of observations at different locations for different years. I am trying to make a bar plot that shows the total number of observations at each location for each year. For each location, I want the totals for different years to be shown in different colors. My approach is to first make location groups and, for each location group, calculate the total observations. (I don't think I need to change the index to date, as I am grouping …
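The grouping step described above can be sketched as follows; the column names and sample values are hypothetical, since the question doesn't show its frame. Summing per (location, year) and unstacking the year level yields exactly the shape pandas' bar plotting expects (one bar group per location, one color per year):

```python
import pandas as pd

df = pd.DataFrame({'location': ['A', 'A', 'B', 'B'],
                   'year': [2019, 2020, 2019, 2020],
                   'obs': [10, 15, 7, 12]})

# Total observations per (location, year), then pivot years into
# columns: each row becomes one bar group, each column one color.
totals = df.groupby(['location', 'year'])['obs'].sum().unstack('year')
# totals.plot(kind='bar') then draws the grouped, color-coded bars.
```

Leaving the plot call as the last step keeps the aggregation testable on its own.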

Plotting with multiple y values with subplot and groupby

Submitted by 扶醉桌前 on 2021-01-29 17:28:44

Question: I have a df. See below for the head:

  Country  Date        suspected case  confirmed cases  suspected deaths  confirmed deaths
0 Guinea   2014-08-29  25.0            141.0            482.0             648.0
1 Nigeria  2014-08-29  3.0             1.0              15.0              19.0
2 Liberia  2014-08-29  382.0           674.0            322.0             1378.0

By using df.groupby('Country') I want to plot the suspected cases against the confirmed cases, using the Date column for the x-axis, plotting these as (5, 2) subplots. What I've done so far hasn't quite got it yet: fig, ax = plt.subplots(2, 5, sharey=True) …
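The missing piece in this kind of setup is usually pairing each groupby group with one axis. A sketch with fabricated sample data (a 1×3 grid for the three countries shown; the question's real data would use the 5×2 grid): zip walks the flattened axes array and the (country, group) pairs together, and each group plots its two series on its own axis:

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Country': ['Guinea', 'Nigeria', 'Liberia'] * 2,
    'Date': pd.to_datetime(['2014-08-29'] * 3 + ['2014-09-05'] * 3),
    'suspected case': [25.0, 3.0, 382.0, 30.0, 4.0, 400.0],
    'confirmed cases': [141.0, 1.0, 674.0, 150.0, 2.0, 700.0],
})

fig, axes = plt.subplots(1, 3, sharey=True, figsize=(12, 3))
# groupby yields (name, sub-frame) pairs in sorted country order;
# zip assigns each pair to the next axis in the grid.
for ax, (country, grp) in zip(axes.ravel(), df.groupby('Country')):
    grp.plot(x='Date', y=['suspected case', 'confirmed cases'],
             ax=ax, title=country)
```

Passing ax= to DataFrame.plot is what routes each country's lines into its own subplot instead of a fresh figure.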

Rolling average matching cases across multiple columns

Submitted by 江枫思渺然 on 2021-01-29 14:35:40

Question: I'm sorry if this has been asked, but I can't find another question like this. I have a data frame in Pandas like this:

Home  Away  Home_Score  Away_Score
MIL   NYC   1           2
ATL   NYC   1           3
NYC   PHX   2           1
HOU   NYC   1           6

I want to calculate the moving average for each team, but the catch is that I want to do it for all of their games, both home and away combined. So for a moving-average window of size 3 for 'NYC', the answer should be (2+3+2)/3 for row 1 and then (3+2+6)/3 for row 2, etc.

Answer 1: You can exploit stack …
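The stack idea the answer hints at can be sketched like this (a guess at the approach, not the answer's exact code): reshape so each home and away appearance becomes its own row, restore game order with a stable sort on the original index, then apply a rolling window within each team:

```python
import pandas as pd

df = pd.DataFrame({'Home': ['MIL', 'ATL', 'NYC', 'HOU'],
                   'Away': ['NYC', 'NYC', 'PHX', 'NYC'],
                   'Home_Score': [1, 1, 2, 1],
                   'Away_Score': [2, 3, 1, 6]})

# One row per team appearance; concat keeps the original game index,
# so a stable sort on it restores chronological game order.
long = pd.concat([
    df[['Home', 'Home_Score']].rename(columns={'Home': 'Team',
                                               'Home_Score': 'Score'}),
    df[['Away', 'Away_Score']].rename(columns={'Away': 'Team',
                                               'Away_Score': 'Score'}),
]).sort_index(kind='stable')

# Rolling window of 3 computed separately within each team's games.
long['avg3'] = long.groupby('Team')['Score'].transform(
    lambda s: s.rolling(3).mean())
```

For NYC the scores in game order are 2, 3, 2, 6, so the window produces (2+3+2)/3 and then (3+2+6)/3, matching the question's expected values; the first two entries are NaN until the window fills.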