summarize

What is the pandas equivalent of dplyr summarize/aggregate by multiple functions?

柔情痞子 提交于 2019-11-29 19:53:09
I'm having issues transitioning to pandas from R where dplyr package can easily group-by and perform multiple summarizations. Please help improve my existing Python pandas code for multiple aggregations: import pandas as pd data = pd.DataFrame( {'col1':[1,1,1,1,1,2,2,2,2,2], 'col2':[1,2,3,4,5,6,7,8,9,0], 'col3':[-1,-2,-3,-4,-5,-6,-7,-8,-9,0] } ) result = [] for k,v in data.groupby('col1'): result.append([k, max(v['col2']), min(v['col3'])]) print pd.DataFrame(result, columns=['col1', 'col2_agg', 'col3_agg']) Issues: too verbose probably can be optimized and efficient. (I rewrote a for-loop

dplyr summarise() with multiple return values from a single function

邮差的信 提交于 2019-11-28 16:48:38
问题 I am wondering if there is a way to use functions with summarise ( dplyr 0.1.2 ) that return multiple values (for instance the describe function from psych package). If not, is it just because it hasn't been implemented yet, or is there a reason that it wouldn't be a good idea? Example: require(psych) require(ggplot2) require(dplyr) dgrp <- group_by(diamonds, cut) describe(dgrp$price) summarise(dgrp, describe(price)) produces: Error: expecting a single value 回答1: With dplyr >= 0.2 we can use

Define and apply custom bins on a dataframe

社会主义新天地 提交于 2019-11-26 19:56:39
Using python I have created following data frame which contains similarity values: cosinFcolor cosinEdge cosinTexture histoFcolor histoEdge histoTexture jaccard 1 0.770 0.489 0.388 0.57500000 0.5845137 0.3920000 0.00000000 2 0.067 0.496 0.912 0.13865546 0.6147309 0.6984127 0.00000000 3 0.514 0.426 0.692 0.36440678 0.4787535 0.5198413 0.05882353 4 0.102 0.430 0.739 0.11297071 0.5288008 0.5436508 0.00000000 5 0.560 0.735 0.554 0.48148148 0.8168083 0.4603175 0.00000000 6 0.029 0.302 0.558 0.08547009 0.3928234 0.4603175 0.00000000 I am trying to write a R script to generate another data frame that

Define and apply custom bins on a dataframe

限于喜欢 提交于 2019-11-26 05:27:32
问题 Using python I have created following data frame which contains similarity values: cosinFcolor cosinEdge cosinTexture histoFcolor histoEdge histoTexture jaccard 1 0.770 0.489 0.388 0.57500000 0.5845137 0.3920000 0.00000000 2 0.067 0.496 0.912 0.13865546 0.6147309 0.6984127 0.00000000 3 0.514 0.426 0.692 0.36440678 0.4787535 0.5198413 0.05882353 4 0.102 0.430 0.739 0.11297071 0.5288008 0.5436508 0.00000000 5 0.560 0.735 0.554 0.48148148 0.8168083 0.4603175 0.00000000 6 0.029 0.302 0.558 0