na | 易学教程

Conditionally selecting columns in dplyr where certain proportion of values is NA

Question:

Data

I'm working with a data set resembling the data.frame generated below:

    set.seed(1)
    dta <- data.frame(observation = 1:20,
                      valueA = runif(n = 20),
                      valueB = runif(n = 20),
                      valueC = runif(n = 20),
                      valueD = runif(n = 20))
    dta[2:5,3] <- NA
    dta[2:10,4] <- NA
    dta[7:20,5] <- NA

The columns contain NA values, and in the last column more than 60% of the observations are NA.

    > sapply(dta, function(x) {table(is.na(x))})
    $observation
    FALSE
       20

    $valueA
    FALSE
       20

    $valueB
    FALSE  TRUE
       16     4

    $valueC
    FALSE  TRUE
       11     9
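One possible way to approach this, sketched in base R rather than dplyr: compute the share of NA per column and keep only the columns at or below a cutoff. The 60% cutoff and the names `na_share`, `threshold`, and `kept` are assumptions made for illustration; the excerpt does not say which rule the asker finally wanted.

```r
# Rebuild the example data from the question
set.seed(1)
dta <- data.frame(observation = 1:20,
                  valueA = runif(n = 20),
                  valueB = runif(n = 20),
                  valueC = runif(n = 20),
                  valueD = runif(n = 20))
dta[2:5, 3]  <- NA
dta[2:10, 4] <- NA
dta[7:20, 5] <- NA

# Share of missing values per column:
# the mean of a logical vector is the proportion of TRUE values
na_share <- colMeans(is.na(dta))

# Keep columns whose NA share does not exceed the assumed 60% cutoff
threshold <- 0.6
kept <- dta[, na_share <= threshold, drop = FALSE]
names(kept)
```

With dplyr 1.0.0 or later, `select(dta, where(~ mean(is.na(.x)) <= 0.6))` should select the same columns.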

Count total missing values by group?

Question:

EDIT: input. I'm very new to this. I have a similar problem to this one: "group by and then count missing variables?" Taking the input data from that question:

    df1 <- data.frame(
      Z = sample(LETTERS[1:5], size = 10000, replace = T),
      X1 = sample(c(1:10,NA), 10000, replace = T),
      X2 = sample(c(1:25,NA), 10000, replace = T),
      X3 = sample(c(1:5,NA), 10000, replace = T))

As one user proposed, it's possible to use summarise_each:

    df1 %>% group_by(Z) %>% summarise_each(funs(sum(is.na(.))))
    #Source: local data
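A minimal sketch of one reading of the title, assuming "total" means the number of NAs summed across X1, X2, and X3 within each group of Z. It uses base R only. The seed and the names `value_cols`, `na_per_row`, and `total_na` are assumptions added for illustration.

```r
# Seed is an assumption so the sketch is reproducible; the question sets none
set.seed(42)
df1 <- data.frame(
  Z  = sample(LETTERS[1:5], size = 10000, replace = TRUE),
  X1 = sample(c(1:10, NA), 10000, replace = TRUE),
  X2 = sample(c(1:25, NA), 10000, replace = TRUE),
  X3 = sample(c(1:5, NA), 10000, replace = TRUE))

# Number of NAs in each row, counted across the value columns
value_cols <- c("X1", "X2", "X3")
na_per_row <- rowSums(is.na(df1[value_cols]))

# Sum the per-row counts within each group of Z
total_na <- tapply(na_per_row, df1$Z, sum)
total_na
```

For per-column counts in current dplyr, where `summarise_each()` and `funs()` are deprecated, `df1 %>% group_by(Z) %>% summarise(across(everything(), ~ sum(is.na(.x))))` is the usual replacement.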

Filling NA using linear regression in R

Question:

I have a data set with one time column and 2 variables (example below):

    df <- structure(list(
        time = c(15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26),
        var1 = c(20.4, 31.5, NA, 53.7, 64.8, NA, NA, NA, NA, 120.3, NA, 142.5),
        var2 = c(30.6, 47.25, 63.9, 80.55, 97.2, 113.85, 130.5, 147.15, 163.8,
                 180.45, 197.1, 213.75)),
      .Names = c("time", "var1", "var2"),
      row.names = c(NA, -12L),
      class = c("tbl_df", "tbl", "data.frame"))

var1 has a few NAs, and I want to fill them with linear regression between
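The excerpt is cut off before it names the variables involved, so the following is only a sketch under the assumption that var1 is regressed on var2. It fits `lm()` on the complete rows and fills the gaps with `predict()`. The column name `var1_filled` and the use of a plain data.frame instead of a tibble are illustrative choices.

```r
df <- data.frame(
  time = c(15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26),
  var1 = c(20.4, 31.5, NA, 53.7, 64.8, NA, NA, NA, NA, 120.3, NA, 142.5),
  var2 = c(30.6, 47.25, 63.9, 80.55, 97.2, 113.85, 130.5, 147.15, 163.8,
           180.45, 197.1, 213.75))

# Assumed model: var1 as a linear function of var2.
# lm() drops rows with a missing response by default (na.action = na.omit).
fit <- lm(var1 ~ var2, data = df)

# Predict only where var1 is missing and keep the observed values elsewhere
missing <- is.na(df$var1)
df$var1_filled <- df$var1
df$var1_filled[missing] <- predict(fit, newdata = df[missing, ])
df
```

Writing to a new column keeps the observed and imputed values distinguishable; overwriting `var1` directly would work the same way.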

cut() a variable with missing values

Source: https://stackoverflow.com/questions/50882620/cut-a-variable-with-missing-values

Date columns with NAs in R - unexpected behaviour with mutate

Question:

I'm trying to follow this process with a dataset. Here is a test data frame:

    id <- c("Johnboy","Johnboy","Johnboy")
    orderno <- c(2,2,1)
    validorder <- c(0,1,1)
    ordertype <- c(95,94,95)
    orderdate <- as.Date(c("2019-06-17","2019-03-26","2018-08-23"))
    df <- data.frame(id, orderno, validorder, ordertype, orderdate)

Then I do the following:

    ## compute order date for order types
    df <- df %>%
      mutate(orderdate_dried = if_else(validorder == 1 & ordertype == 95, orderdate, as.Date(NA)),
             orderdate_fresh =

mean( ,na.rm=TRUE) still returns NA

Question:

I'm very new to R (moving over from SPSS). I'm using RStudio on a Mac running Mavericks. Please answer my question in words of 2 syllables, as this is my first real attempt at anything like this. I've worked through some basic tutorials and can make things work on all the sample data. I have a data set with 64,000-ish rows and about 20 columns. I want to get the mean of the variable "hold_time", but whatever I try I get either NA, or NA and a warning message. I have tried all of the following: >
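The excerpt ends before showing the attempts, so the cause here is not known. One common situation that produces NA plus a warning even with `na.rm = TRUE` is a column that was read in as text instead of numbers. The sketch below illustrates only that case; the data frame `dat` and its values are hypothetical stand-ins, not the asker's data.

```r
# Hypothetical stand-in: hold_time stored as character, not numeric
dat <- data.frame(hold_time = c("12.5", "7", NA, "30.25"),
                  stringsAsFactors = FALSE)

class(dat$hold_time)  # "character"

# mean() on a non-numeric vector warns and returns NA, regardless of na.rm;
# the warning is suppressed here only to keep the sketch quiet
res_text <- suppressWarnings(mean(dat$hold_time, na.rm = TRUE))

# After converting to numeric, na.rm = TRUE drops the missing value as expected
hold_num <- as.numeric(dat$hold_time)
res_num <- mean(hold_num, na.rm = TRUE)
res_num
```

Checking `class()` or `str()` on the column first is a cheap way to tell this case apart from genuinely missing values.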