Question
In R, let's say I have a data frame with two categorical columns, Fam and Prop. Fam has repeated names such as Algea, Fungi, etc., and Prop holds categorical numbers and NAs. How can I get a table/output that tells me, for each value of Fam, how many of its Prop values are not NA? Example:
Fam Prop
-------------
Algea one
Fungi two
Algea NA
Algea three
Fungi one
Fungi NA
Expected output:
Algea 2
Fungi 2
I suspect the count function is the right direction, but I can't get it to work because the Fam column has repeated values.
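To make the question reproducible, here is the example data as an R data frame. This assumes Prop is stored as character with real NA values (a factor column behaves the same way). The count uses base R tapply(), a variant not taken from the answers below: it sums the logical vector !is.na(Prop) within each Fam group, and since TRUE counts as 1, the sum is the number of non-NA values.

```r
# Example data from the question (Prop is character; NA marks a missing value)
df <- data.frame(
  Fam  = c("Algea", "Fungi", "Algea", "Algea", "Fungi", "Fungi"),
  Prop = c("one", "two", NA, "three", "one", NA),
  stringsAsFactors = FALSE
)

# !is.na(Prop) is TRUE for present values; summing it per Fam counts them
counts <- tapply(!is.na(df$Prop), df$Fam, sum)
print(counts)
# Algea Fungi
#     2     2
```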
Answer 1:
Maybe something like this.
library(dplyr)
data %>% group_by(Fam) %>% summarise(sum(!is.na(Prop)))
Answer 2:
Four solutions:
Base R, returning a data frame:
aggregate(DF$Prop, by=list(Fam=DF$Fam), FUN=function(a) sum(!is.na(a)))
# Fam x
# 1 A 5
# 2 B 6
# 3 C 4
Base R, returning a "table" rather than a frame (use as.data.frame(xtabs(...)) for the frame variant, which is laid out a little differently):
xtabs(~ Fam + is.na(Prop), data=DF)
# is.na(Prop)
# Fam FALSE TRUE
# A 5 1
# B 6 1
# C 4 3
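As a sketch of the frame variant mentioned above, using the question's example data (rebuilt here so the block runs on its own): as.data.frame() turns the two-way table into one row per Fam/is.na(Prop) combination plus a Freq column, so the non-NA counts are the rows where the is.na(Prop) column is FALSE. The code picks columns by position because data.frame() may rewrite the name "is.na(Prop)" into a syntactic one.

```r
# Example data from the question
df <- data.frame(
  Fam  = c("Algea", "Fungi", "Algea", "Algea", "Fungi", "Fungi"),
  Prop = c("one", "two", NA, "three", "one", NA),
  stringsAsFactors = FALSE
)

# Two-way contingency table: rows are Fam, columns are is.na(Prop)
tab <- xtabs(~ Fam + is.na(Prop), data = df)

# Long format: one row per (Fam, is.na(Prop)) pair plus a Freq column
tab_df <- as.data.frame(tab)
print(tab_df)

# Keep only the non-NA rows; column 2 holds the FALSE/TRUE levels,
# columns 1 and 3 are Fam and Freq
non_na <- tab_df[tab_df[[2]] == "FALSE", c(1, 3)]
print(non_na)
```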
dplyr:
library(dplyr)
DF %>%
group_by(Fam) %>%
summarize(n = sum(!is.na(Prop)))
# # A tibble: 3 x 2
# Fam n
# <fct> <int>
# 1 A 5
# 2 B 6
# 3 C 4
data.table:
library(data.table)
# data.table 1.11.4 Latest news: http://r-datatable.com
# Attaching package: 'data.table'
# The following objects are masked from 'package:dplyr':
# between, first, last
DT <- as.data.table(DF)
DT[,sum(!is.na(Prop)),keyby=.(Fam)]
# Fam V1
# 1: A 5
# 2: B 6
# 3: C 4
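Data (drawn with sample() and no set.seed(), so rerunning it gives different counts; the outputs above come from the frame printed below):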
DF <- data.frame(Fam=sample(c('A','B','C'), size=20, replace=TRUE), Prop=sample(c('one','two','three'), size=20, replace=TRUE))
DF$Prop[sample(20,size=5)] <- NA
DF
# Fam Prop
# 1 B one
# 2 B three
# 3 C <NA>
# 4 A <NA>
# 5 C one
# 6 A two
# 7 B one
# 8 A three
# 9 B two
# 10 C one
# 11 C two
# 12 B three
# 13 C <NA>
# 14 C <NA>
# 15 A one
# 16 A one
# 17 B three
# 18 A two
# 19 C two
# 20 B <NA>
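Because the data above come from sample() without a seed, a sketch for reproducing the outputs exactly is to rebuild the printed frame literally (values transcribed from the printout). stringsAsFactors = TRUE matches the <fct> column type shown in the dplyr output. The aggregate() call from the base R solution should then give 5, 6 and 4.

```r
# The 20-row frame printed above, transcribed literally
DF <- data.frame(
  Fam  = c("B", "B", "C", "A", "C", "A", "B", "A", "B", "C",
           "C", "B", "C", "C", "A", "A", "B", "A", "C", "B"),
  Prop = c("one", "three", NA, NA, "one", "two", "one", "three", "two", "one",
           "two", "three", NA, NA, "one", "one", "three", "two", "two", NA),
  stringsAsFactors = TRUE
)

# Same base R call as above: count non-NA Prop values per Fam
res <- aggregate(DF$Prop, by = list(Fam = DF$Fam), FUN = function(a) sum(!is.na(a)))
print(res)
```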
Answer 3:
Some dplyr possibilities:
library(dplyr)
df %>%
add_count(Fam, miss = !is.na(Prop)) %>%   # n = rows in each (Fam, miss) combination
group_by(Fam) %>%
summarise(Non_miss = first(n[miss], default = 0L))   # n of the non-missing rows
df %>%
filter(!is.na(Prop)) %>%
group_by(Fam) %>%
tally()
df %>%
filter(!is.na(Prop)) %>%
group_by(Fam) %>%
summarise(Non_miss = n())
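Since the question mentions count: a sketch, assuming dplyr is installed, using count() with its wt argument. Weighting each row by !is.na(Prop) makes the sum count only non-NA values. Unlike the filter()-based variants above, this also keeps a family whose Prop values are all NA and reports 0 for it; the "Lichen" row below is hypothetical, added only to show that case.

```r
library(dplyr)

# Question's data plus a hypothetical "Lichen" family whose only Prop is NA
df <- data.frame(
  Fam  = c("Algea", "Fungi", "Algea", "Algea", "Fungi", "Fungi", "Lichen"),
  Prop = c("one", "two", NA, "three", "one", NA, NA),
  stringsAsFactors = FALSE
)

# wt = !is.na(Prop): each row contributes 1 if Prop is present, 0 otherwise
res <- df %>% count(Fam, wt = !is.na(Prop))
print(res)
```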
Answer 4:
Shortest base R solution (and possibly the fastest)
number.of.not.NAs <- table(df$Fam[!is.na(df$Prop)])
It takes df$Fam, keeps only the elements at positions where df$Prop is not NA, and then counts them with the table function.
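The steps of the table approach, unpacked on the question's example data (rebuilt so the block runs alone) to show each intermediate value:

```r
df <- data.frame(
  Fam  = c("Algea", "Fungi", "Algea", "Algea", "Fungi", "Fungi"),
  Prop = c("one", "two", NA, "three", "one", NA),
  stringsAsFactors = FALSE
)

# Step 1: logical mask, TRUE where Prop is present
keep <- !is.na(df$Prop)
print(keep)
# [1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE

# Step 2: Fam values at those positions only
fam_kept <- df$Fam[keep]
print(fam_kept)
# [1] "Algea" "Fungi" "Algea" "Fungi"

# Step 3: tabulate how often each Fam remains
number.of.not.NAs <- table(fam_kept)
print(number.of.not.NAs)
```

One consequence of this design: if Fam is a factor, table() also lists levels that have no non-NA values (with a count of 0); with a character column, such families simply do not appear.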
Typical base R solution
Alternatively, you can split the data frame by df$Fam into a list of data frames and then count, for each one, how many non-NA elements are in the second column (Prop). This is the usual split-apply-combine approach, though I would guess the table method above is faster.
dfsList <- split(df, df$Fam)
number.of.not.NAs <- sapply(dfsList, function(df) sum(!is.na(df$Prop)))
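The sapply() result is a named integer vector rather than a frame. A minimal sketch of converting it into a two-column data frame like the expected output, again on the question's example data; the column name n is an arbitrary choice.

```r
df <- data.frame(
  Fam  = c("Algea", "Fungi", "Algea", "Algea", "Fungi", "Fungi"),
  Prop = c("one", "two", NA, "three", "one", NA),
  stringsAsFactors = FALSE
)

# Split-apply-combine as above
dfsList <- split(df, df$Fam)
number.of.not.NAs <- sapply(dfsList, function(d) sum(!is.na(d$Prop)))

# Named integer vector -> data frame with one row per Fam
result <- data.frame(Fam = names(number.of.not.NAs),
                     n = unname(number.of.not.NAs),
                     stringsAsFactors = FALSE)
print(result)
```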
Source: https://stackoverflow.com/questions/52772838/counting-not-nas-for-values-of-some-column-for-each-value-of-another-row