na

Conditionally selecting columns in dplyr where certain proportion of values is NA

非 Y 不嫁゛ submitted on 2021-01-18 05:16:28
Question: Data

I'm working with a data set resembling the data.frame generated below:

set.seed(1)
dta <- data.frame(observation = 1:20,
                  valueA = runif(n = 20),
                  valueB = runif(n = 20),
                  valueC = runif(n = 20),
                  valueD = runif(n = 20))
dta[2:5,3] <- NA
dta[2:10,4] <- NA
dta[7:20,5] <- NA

The columns have NA values, with the last column having more than 60% of its observations NA.

> sapply(dta, function(x) {table(is.na(x))})
$observation
FALSE
   20
$valueA
FALSE
   20
$valueB
FALSE  TRUE
   16     4
$valueC
FALSE  TRUE
   11     9
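Here is a minimal base-R sketch of the selection the question asks for. It assumes the threshold is a per-column proportion of NAs (0.6, taken from the question) and that columns at or below that proportion should be kept. It rebuilds `dta` exactly as in the question so that it runs on its own.

```r
# Rebuild the example data exactly as in the question
set.seed(1)
dta <- data.frame(observation = 1:20,
                  valueA = runif(n = 20),
                  valueB = runif(n = 20),
                  valueC = runif(n = 20),
                  valueD = runif(n = 20))
dta[2:5, 3] <- NA
dta[2:10, 4] <- NA
dta[7:20, 5] <- NA

# Proportion of NA values in each column: is.na() gives a logical matrix,
# and colMeans() of TRUE/FALSE values is the share of TRUEs
na_share <- colMeans(is.na(dta))

# Keep only columns whose NA share does not exceed the assumed 60% threshold
threshold <- 0.6
kept <- dta[, na_share <= threshold, drop = FALSE]

print(na_share)
print(names(kept))
```

With dplyr 1.0 or later, the same filter can be written inside `select()` with the `where()` helper: `dta %>% select(where(~ mean(is.na(.x)) <= 0.6))`.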

Count total missing values by group?

自闭症网瘾萝莉.ら submitted on 2021-01-04 04:25:54
Question: EDIT: I'm very new to this. I have a similar problem to this one: group by and then count missing variables? Taking the input data from that question:

df1 <- data.frame(
  Z = sample(LETTERS[1:5], size = 10000, replace = T),
  X1 = sample(c(1:10,NA), 10000, replace = T),
  X2 = sample(c(1:25,NA), 10000, replace = T),
  X3 = sample(c(1:5,NA), 10000, replace = T))

As one user proposed, it's possible to use summarise_each:

df1 %>% group_by(Z) %>% summarise_each(funs(sum(is.na(.))))
#Source: local data
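`summarise_each()` and `funs()` are deprecated in current dplyr. As a dependency-free sketch, the per-group NA counts can also be computed in base R with `rowsum()`, which sums the rows of a numeric matrix within each group. The small hand-made `df1` below is a hypothetical stand-in for the sampled data, so the counts can be checked by eye.

```r
# Small, deterministic stand-in for df1 (hypothetical values)
df1 <- data.frame(
  Z  = c("A", "A", "B", "B", "B"),
  X1 = c(1, NA, 3, NA, NA),
  X2 = c(NA, NA, 2, 5, 7),
  X3 = c(4, 4, NA, NA, 2)
)

# is.na() on the value columns gives a logical matrix; adding 0L turns it
# into integers, because rowsum() only accepts numeric input
na_flags <- is.na(df1[, -1]) + 0L

# Per-group NA counts for each column (one row per level of Z)
na_by_group <- rowsum(na_flags, group = df1$Z)
print(na_by_group)

# Total NAs per group across all value columns
na_total <- rowSums(na_by_group)
print(na_total)
```

In dplyr 1.0 or later, the grouped equivalent is `df1 %>% group_by(Z) %>% summarise(across(everything(), ~ sum(is.na(.x))))`. Inside `summarise()`, `across()` skips the grouping column automatically.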

Filling NA using linear regression in R

会有一股神秘感。 submitted on 2020-12-30 04:12:58
Question: I have data with one time column and two variables (example below):

df <- structure(list(time = c(15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26),
                     var1 = c(20.4, 31.5, NA, 53.7, 64.8, NA, NA, NA, NA, 120.3, NA, 142.5),
                     var2 = c(30.6, 47.25, 63.9, 80.55, 97.2, 113.85, 130.5, 147.15, 163.8, 180.45, 197.1, 213.75)),
                .Names = c("time", "var1", "var2"),
                row.names = c(NA, -12L),
                class = c("tbl_df", "tbl", "data.frame"))

var1 has a few NAs, and I want to fill the NAs with linear regression between
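The excerpt is cut off before saying which variables the regression is between. The sketch below assumes `var1` is regressed on `time`. It fits `lm()` on the complete rows and uses `predict()` to fill only the missing entries. In this example `var1` happens to be exactly linear in `time`, so the filled values land on the same line.

```r
# Same values as in the question, as a plain data.frame
df <- data.frame(
  time = 15:26,
  var1 = c(20.4, 31.5, NA, 53.7, 64.8, NA, NA, NA, NA, 120.3, NA, 142.5),
  var2 = c(30.6, 47.25, 63.9, 80.55, 97.2, 113.85, 130.5, 147.15,
           163.8, 180.45, 197.1, 213.75)
)

# Fit var1 ~ time; lm() drops rows with NA by default (na.action = na.omit)
fit <- lm(var1 ~ time, data = df)

# Predict only where var1 is missing, then write the predictions back
na_rows <- is.na(df$var1)
df$var1[na_rows] <- predict(fit, newdata = df[na_rows, ])

print(df)
```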

Date columns with NAs in R - unexpected behaviour with mutate

醉酒当歌 submitted on 2020-08-27 19:55:04
Question: I'm trying to follow this process with a dataset. Here is a test data frame:

id <- c("Johnboy","Johnboy","Johnboy")
orderno <- c(2,2,1)
validorder <- c(0,1,1)
ordertype <- c(95,94,95)
orderdate <- as.Date(c("2019-06-17","2019-03-26","2018-08-23"))
df <- data.frame(id, orderno, validorder, ordertype, orderdate)

Then I do the following:

## compute order date for order types
df <- df %>%
  mutate(orderdate_dried = if_else(validorder == 1 & ordertype == 95, orderdate, as.Date(NA)),
         orderdate_fresh =
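The excerpt stops before describing the unexpected behaviour, so this is only a hedged illustration of one well-known pitfall with Date columns and NA. Base `ifelse()` drops the `Date` class and returns the underlying day counts, whereas assigning `NA` into a copy of the Date vector keeps the class. The sketch reproduces the `orderdate_dried` rule from the question in base R.

```r
# Test data from the question
df <- data.frame(
  id = c("Johnboy", "Johnboy", "Johnboy"),
  orderno = c(2, 2, 1),
  validorder = c(0, 1, 1),
  ordertype = c(95, 94, 95),
  orderdate = as.Date(c("2019-06-17", "2019-03-26", "2018-08-23"))
)

keep <- df$validorder == 1 & df$ordertype == 95

# Pitfall: base ifelse() strips attributes, so the result is plain numeric
stripped <- ifelse(keep, df$orderdate, NA)
print(class(stripped))

# Class-preserving alternative: copy the Date column, then blank out rows
df$orderdate_dried <- df$orderdate
df$orderdate_dried[!keep] <- NA
print(df)
```

dplyr's `if_else()`, which the question already uses, is type-stable. It keeps the Date class as long as both branches are Dates, as with `as.Date(NA)`.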

mean( ,na.rm=TRUE) still returns NA

两盒软妹~` submitted on 2020-08-22 19:15:51
Question: I'm very new to R (moving over from SPSS). I'm using RStudio on a Mac running Mavericks. Please answer my question in words of two syllables, as this is my first real attempt at anything like this. I've worked through some basic tutorials and can make things work on all the sample data. I have a data set with 64,000-ish rows and about 20 columns. I want to get the mean of the variable "hold_time", but whatever I try, I get either NA or NA and a warning message. I have tried all of the following:
>
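The question is cut off before the attempts are shown, so the actual cause is unknown. One common reason `mean(x, na.rm = TRUE)` still returns NA with a warning is that the column was imported as character (or factor) rather than numeric. In that case `mean()` warns that the argument is not numeric or logical and returns NA regardless of `na.rm`. Below is a sketch with a hypothetical `calls` data frame holding a `hold_time` column:

```r
# Hypothetical imported data: hold_time came in as text because of
# entries like "" and "n/a" in the source file
calls <- data.frame(hold_time = c("12", "15", "", NA, "n/a"),
                    stringsAsFactors = FALSE)

# mean() on a character vector warns and returns NA, even with na.rm = TRUE
bad <- suppressWarnings(mean(calls$hold_time, na.rm = TRUE))
print(bad)

# Convert to numeric first; non-numeric entries become NA (with a
# coercion warning, suppressed here) and are then dropped by na.rm
calls$hold_time <- suppressWarnings(as.numeric(calls$hold_time))
good <- mean(calls$hold_time, na.rm = TRUE)
print(good)  # mean of 12 and 15
```

Running `str()` on the data frame shows each column's type. If the column is a factor, convert it with `as.numeric(as.character(x))`, because `as.numeric()` on a factor returns the level codes rather than the values.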