summarize

dplyr: summarise each column and return list columns

Submitted by 强颜欢笑 on 2019-12-12 14:28:55
Question: I am looking to summarize each column in a tibble with a custom summary function that returns tibbles of different sizes depending on the data. Let's say my summary function is this: mysummary <- function(x) {quantile(x)[1:sample(1:5, 1)] %>% as_tibble} It can be applied to one column as such: cars %>% summarise(speed.summary = list(mysummary(speed))) But I can't figure out a way to achieve this using summarise_all (or something similar). Using the cars data, the desired output would be:
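One way to sketch this, assuming dplyr >= 1.0: across() replaces summarise_all() and, wrapped in list(), keeps each per-column tibble as a list-column cell. Note that enframe() is swapped in for as_tibble(), which no longer accepts bare named vectors.

```r
library(dplyr)
library(tibble)

# The question's summary function, with enframe() swapped in for
# as_tibble(): it returns a tibble with a random number of quantile rows.
mysummary <- function(x) {
  q <- quantile(x)[1:sample(1:5, 1)]
  enframe(q, name = "quantile")
}

# across() applies the function to every column; list() stores each
# variable-sized result in a list column.
out <- cars %>%
  summarise(across(everything(), ~ list(mysummary(.x))))
```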

Summarizing unknown number of column in R using dplyr

Submitted by 不羁岁月 on 2019-12-11 06:29:46
Question: I have the following data.frame (df) ID1 ID2 Col1 Col2 Col3 Grp A B 1 3 6 G1 C D 3 5 7 G1 E F 4 5 7 G2 G h 5 6 8 G2 What I would like to achieve is the following: - group by Grp, easy - and then summarize so that for each group I sum the columns and create columns with strings of all ID1s and ID2s. It would be something like this: df %>% group_by(Grp) %>% summarize(ID1s=toString(ID1), ID2s=toString(ID2), Col1=sum(Col1), Col2=sum(Col2), Col3=sum(Col3)) Everything is fine when I know the number
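A sketch of one answer to this, assuming dplyr >= 1.0: across() with a tidyselect helper handles however many columns match, so the number of Col* columns never has to be spelled out.

```r
library(dplyr)

df <- data.frame(ID1 = c("A", "C", "E", "G"),
                 ID2 = c("B", "D", "F", "h"),
                 Col1 = c(1, 3, 4, 5),
                 Col2 = c(3, 5, 5, 6),
                 Col3 = c(6, 7, 7, 8),
                 Grp  = c("G1", "G1", "G2", "G2"))

# toString() collapses the ID columns; sum() is applied to every
# column whose name starts with "Col", however many there are.
res <- df %>%
  group_by(Grp) %>%
  summarise(across(starts_with("ID"), toString),
            across(starts_with("Col"), sum),
            .groups = "drop")
```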

Summarize by column efficiently

Submitted by 醉酒当歌 on 2019-12-11 04:26:47
Question: I have a big table similar to datadf with several thousand columns and rows. I saw some methods to obtain my expected summary on Stack Overflow (Frequency of values per column in table), but even the fastest is very slow for my table. EDIT: thanks to the comments, several methods are now satisfactory. library(data.table) library(tidyverse) library(microbenchmark) datadf <- data.frame(var1 = rep(letters[1:3], each = 4), var2 = rep(letters[1:4], each = 3), var3 = rep('m', 12), stringsAsFactors = F
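One of the fast approaches for per-column value frequencies: melt every column to long form once, then let data.table count (column, value) pairs in a single grouped pass instead of tabulating column by column.

```r
library(data.table)

datadf <- data.frame(var1 = rep(letters[1:3], each = 4),
                     var2 = rep(letters[1:4], each = 3),
                     var3 = rep("m", 12),
                     stringsAsFactors = FALSE)

# Melt all columns into (column, value) pairs, then count each pair.
dt <- as.data.table(datadf)
freq <- melt(dt, measure.vars = names(dt),
             variable.name = "column", value.name = "value")[
               , .N, by = .(column, value)]
```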

Applying group_by and summarise(sum) but keep columns with non-relevant conflicting data?

Submitted by 微笑、不失礼 on 2019-12-07 01:34:22
Question: My question is very similar to Applying group_by and summarise on data while keeping all the columns' info, but I would like to keep columns that get excluded because they conflict after grouping. Label <- c("203c","203c","204a","204a","204a","204a","204a","204a","204a","204a") Type <- c("wholefish","flesh","flesh","fleshdelip","formula","formuladelip", "formula","formuladelip","wholefish", "wholefishdelip") Proportion <- c(1,1,0.67714,0.67714,0.32285,0.32285,0.32285, 0.32285, 0.67714,0.67714
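A minimal sketch of one workaround, on a cut-down version of the question's data: columns whose values conflict within a group would normally be dropped by summarise(), but collapsing them into a single string keeps them in the result.

```r
library(dplyr)

dat <- data.frame(
  Label = c("203c", "203c", "204a", "204a"),
  Type  = c("wholefish", "flesh", "flesh", "formula"),
  Proportion = c(1, 1, 0.67714, 0.32285)
)

# paste(unique(...)) preserves the conflicting Type values per group
# instead of letting summarise() discard the column.
res <- dat %>%
  group_by(Label) %>%
  summarise(Proportion = sum(Proportion),
            Type = paste(unique(Type), collapse = " / "),
            .groups = "drop")
```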

R - dplyr Summarize and Retain Other Columns [closed]

Submitted by 痴心易碎 on 2019-12-04 23:28:30
I am grouping data and then summarizing it, but would also like to retain another column. I do not need to do any evaluations of that column's content, as it will always be the same as the group_by column. I can add it to the group_by statement, but that does not seem "right". I want to retain State.Full.Name after grouping by State. Thanks. TDAAtest <- data.frame(State=sample(state.abb,1000,replace=TRUE)) TDAAtest$State.Full.Name <- state.name[match(TDAAtest$State,state.abb)] TDAA.states <- TDAAtest %>% filter(!is.na(State)) %>% group_by(State) %>% summarize(n=n()) %>% ungroup() %>% arrange
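One common answer, sketched on the question's own setup: because State.Full.Name is constant within each State, first() carries it through summarise() without adding it to group_by().

```r
library(dplyr)

set.seed(42)
TDAAtest <- data.frame(State = sample(state.abb, 1000, replace = TRUE))
TDAAtest$State.Full.Name <- state.name[match(TDAAtest$State, state.abb)]

# first() retains the constant-within-group column alongside the count.
TDAA.states <- TDAAtest %>%
  filter(!is.na(State)) %>%
  group_by(State) %>%
  summarise(State.Full.Name = first(State.Full.Name),
            n = n(), .groups = "drop") %>%
  arrange(desc(n))
```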

Why does `summarize` drop a group?

Submitted by 霸气de小男生 on 2019-12-01 09:10:25
I'm fooling around with the babynames package. A group_by command works, but after the summarize, one of the groups is dropped from the group list. library(babynames) babynames[1:10000, ] %>% group_by(year, name) %>% head(1) # A tibble: 1 x 5 # Groups: year, name [1] year sex name n prop <dbl> <chr> <chr> <int> <dbl> 1 1880 F Mary 7065 0.07238433 This is fine: two groups, year and name. But after a summarize (which respects the groups correctly), the name group is dropped. Am I missing an easy mistake? babynames[1:10000, ] %>% group_by(year, name) %>% summarise(n = sum(n)) %>% head(1) # A tibble: 1 x
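This is documented behaviour rather than a bug: each summarise() peels off the innermost grouping variable, on the reasoning that there is now only one row per group at that level. A minimal reproduction with toy data in place of babynames:

```r
library(dplyr)

d <- data.frame(year = c(1880, 1880, 1881),
                name = c("Mary", "Mary", "Anna"),
                n    = c(7065, 5, 7))

g <- d %>% group_by(year, name)

# summarise() drops the innermost group ("name"); the default is
# .groups = "drop_last", so the result is grouped by year only.
s <- g %>% summarise(n = sum(n), .groups = "drop_last")
```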

counting the occurrence of substrings in a column in R with group by

Submitted by 邮差的信 on 2019-12-01 08:01:08
I would like to count the occurrences of a string in a column, per group. In this case the string is often a substring in a character column. I have some data, e.g. ID String village 1 fd_sec, ht_rm, A 2 NA, ht_rm A 3 fd_sec, B 4 san, ht_rm, C The code that I began with is obviously incorrect, but my searches have not turned up how to use the grep function on a column while grouping by village: impacts <- se %>% group_by(village) %>% summarise(c_NA = round(sum(sub$en41_1 == "NA")), c_ht_rm = round(sum(sub$en41_1 == "ht_rm")), c_san = round(sum(sub$en41_1 == "san")), c_fd_sec = round(sum
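A sketch of the usual fix, on the question's example data: the attempt fails because it tests `sub$en41_1` (a whole other object) for exact equality; grepl() on the grouped column matches substrings and respects the grouping.

```r
library(dplyr)

se <- data.frame(
  ID      = 1:4,
  String  = c("fd_sec, ht_rm,", "NA, ht_rm", "fd_sec,", "san, ht_rm,"),
  village = c("A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE per row containing the substring; sum() counts
# those rows within each village.
impacts <- se %>%
  group_by(village) %>%
  summarise(c_ht_rm  = sum(grepl("ht_rm", String)),
            c_san    = sum(grepl("san", String)),
            c_fd_sec = sum(grepl("fd_sec", String)),
            .groups = "drop")
```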

What is the pandas equivalent of dplyr summarize/aggregate by multiple functions?

Submitted by 倾然丶 夕夏残阳落幕 on 2019-11-30 10:15:42
Question: I'm having issues transitioning to pandas from R, where the dplyr package can easily group by and perform multiple summarizations. Please help improve my existing Python pandas code for multiple aggregations: import pandas as pd data = pd.DataFrame( {'col1':[1,1,1,1,1,2,2,2,2,2], 'col2':[1,2,3,4,5,6,7,8,9,0], 'col3':[-1,-2,-3,-4,-5,-6,-7,-8,-9,0] } ) result = [] for k,v in data.groupby('col1'): result.append([k, max(v['col2']), min(v['col3'])]) print pd.DataFrame(result, columns=['col1', 'col2_agg
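The idiomatic pandas equivalent, sketched on the question's data and assuming pandas >= 0.25 for named aggregation: groupby().agg() applies a different function to each column in one pass, much like dplyr's summarise().

```python
import pandas as pd

data = pd.DataFrame({
    'col1': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'col2': [1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
    'col3': [-1, -2, -3, -4, -5, -6, -7, -8, -9, 0],
})

# Named aggregation: each output column names its source column and
# the function to apply, replacing the manual groupby loop.
result = (data.groupby('col1', as_index=False)
              .agg(col2_agg=('col2', 'max'), col3_agg=('col3', 'min')))
```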

dplyr summarise() with multiple return values from a single function

Submitted by 痞子三分冷 on 2019-11-29 20:17:54
I am wondering if there is a way to use functions with summarise (dplyr 0.1.2) that return multiple values (for instance, the describe function from the psych package). If not, is it just because it hasn't been implemented yet, or is there a reason it wouldn't be a good idea? Example: require(psych) require(ggplot2) require(dplyr) dgrp <- group_by(diamonds, cut) describe(dgrp$price) summarise(dgrp, describe(price)) produces: Error: expecting a single value With dplyr >= 0.2 we can use the do function for this: library(ggplot2) library(psych) library(dplyr) diamonds %>% group_by(cut) %>% do
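For current dplyr the do() idiom shown in the answer has itself been superseded. A sketch assuming dplyr >= 1.1, using the built-in mtcars data: reframe() accepts expressions that return several rows per group.

```r
library(dplyr)

# reframe() is the modern replacement for do(): a multi-row summary
# (here, five quantiles of mpg) is returned for each cyl group.
res <- mtcars %>%
  group_by(cyl) %>%
  reframe(prob  = names(quantile(mpg)),
          value = quantile(mpg))
```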