missing-data | 易学教程

Partially merge two datasets and fill in NAs in R

Question: I have two datasets.

a = a raw dataset with thousands of observations of different weather events:

       STATE EVTYPE
    1  AL    WINTER STORM
    2  AL    TORNADO
    3  AL    TSTM WIND
    4  AL    TSTM WIND
    5  AL    TSTM WIND
    6  AL    HAIL
    7  AL    HIGH WIND
    8  AL    TSTM WIND
    9  AL    TSTM WIND
    10 AL    TSTM WIND

b = a dictionary table, which has a standard spelling for some weather events:

      EVTYPE             evmatch
    1 HIGH SURF ADVISORY <NA>
    2 COASTAL FLOOD      COASTAL FLOOD
    3 FLASH FLOOD        FLASH FLOOD
    4 LIGHTNING          LIGHTNING
    5 TSTM WIND          <NA>
    6 TSTM WIND (G45)    <NA>

both
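The excerpt is cut off before the desired result is stated, so the following is only a sketch under an assumption: that the goal is to attach `evmatch` to every row of `a` and fall back to the original `EVTYPE` where the dictionary has no standard spelling. It uses `match()` rather than `merge()` so that the row order of `a` is preserved. The `LIGHTNING` row in `a` is added for illustration and does not appear in the excerpt.

```r
# Small stand-ins for the two tables described in the question
a <- data.frame(
  STATE  = "AL",
  EVTYPE = c("WINTER STORM", "TORNADO", "TSTM WIND", "HAIL", "LIGHTNING"),
  stringsAsFactors = FALSE
)
b <- data.frame(
  EVTYPE  = c("HIGH SURF ADVISORY", "COASTAL FLOOD", "FLASH FLOOD",
              "LIGHTNING", "TSTM WIND", "TSTM WIND (G45)"),
  evmatch = c(NA, "COASTAL FLOOD", "FLASH FLOOD", "LIGHTNING", NA, NA),
  stringsAsFactors = FALSE
)

# Look up each event of `a` in the dictionary; unmatched rows get NA
a$evmatch <- b$evmatch[match(a$EVTYPE, b$EVTYPE)]

# ASSUMPTION: where no standard spelling is known, keep the raw spelling
no_match <- is.na(a$evmatch)
a$evmatch[no_match] <- a$EVTYPE[no_match]

print(a)
```

If the intent is instead to leave unmatched rows as `NA`, the last step can simply be dropped.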

Fill missing date values in column by adding delivery interval to another date column

Question: Data:

    DB1 <- data.frame(orderItemID = 1:10,
                      orderDate = c("2013-01-21", "2013-03-31", "2013-04-12", "2013-06-01", "2014-01-01",
                                    "2014-02-19", "2014-02-27", "2014-10-02", "2014-10-31", "2014-11-21"),
                      deliveryDate = c("2013-01-23", "2013-03-01", "NA", "2013-06-04", "2014-01-03",
                                       "NA", "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"))

Expected outcome:

    DB1 <- data.frame(orderItemID = 1:10,
                      orderDate = c("2013-01-21", "2013-03-31", "2013-04-12", "2013-06-01", "2014-01-01",
                                    "2014-02-19", "2014-02-27",
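The expected outcome is truncated, so the delivery interval the asker has in mind is not known. As a hedged sketch, the code below assumes the interval is the median of the observed delays between `orderDate` and `deliveryDate`; that choice is an assumption made for illustration, not something stated in the question.

```r
DB1 <- data.frame(
  orderItemID  = 1:10,
  orderDate    = c("2013-01-21", "2013-03-31", "2013-04-12", "2013-06-01", "2014-01-01",
                   "2014-02-19", "2014-02-27", "2014-10-02", "2014-10-31", "2014-11-21"),
  deliveryDate = c("2013-01-23", "2013-03-01", "NA", "2013-06-04", "2014-01-03",
                   "NA", "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"),
  stringsAsFactors = FALSE
)

# Parse both columns; the literal string "NA" does not match the format
# and therefore becomes a real missing value
DB1$orderDate    <- as.Date(DB1$orderDate, format = "%Y-%m-%d")
DB1$deliveryDate <- as.Date(DB1$deliveryDate, format = "%Y-%m-%d")

# ASSUMPTION: the delivery interval is the median observed delay in days
delay    <- as.numeric(DB1$deliveryDate - DB1$orderDate, units = "days")
interval <- median(delay, na.rm = TRUE)

# Fill each missing delivery date with order date + interval
is_missing <- is.na(DB1$deliveryDate)
DB1$deliveryDate[is_missing] <- DB1$orderDate[is_missing] + interval

print(DB1)
```

A fixed interval (for example a contractual number of days) can be substituted for `interval` without changing the rest of the code.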

Filling missing value in group

Question: I have a data frame where some of the values are missing:

    A 1
    A NA
    A NA
    B NA
    B 2
    B NA
    C NA
    C NA
    C NA

How can I fill in groups where I have data?

Answer 1: Alternative solution, though perhaps a bit flawed in how many assumptions it makes:

    library(dplyr)
    y %>%
      group_by(V1) %>%
      arrange(V2) %>%
      mutate(V2 = V2[1])
    # Source: local data frame [9 x 2]
    # Groups: V1 [3]
    #      V1    V2
    #   (chr) (int)
    # 1     A     1
    # 2     A     1
    # 3     A     1
    # 4     B     2
    # 5     B     2
    # 6     B     2
    # 7     C    NA
    # 8     C    NA
    # 9     C    NA

Answer 2: You can also use fill from tidyr:

    library(dplyr)
    library(tidyr)
    df1 %>%
      group_by(ID) %>%
      fill(v1) %>%
      fill(v1, .direction = "up")

Result:

    # A tibble: 9
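For comparison, the same fill can be sketched in base R with `ave()`, without dplyr or tidyr. This is an illustrative alternative and not one of the quoted answers. It assumes, as the example data suggests, that each group has at most one distinct observed value; when a group has several, the first observed one is used.

```r
# The example data from the question, using the column names of Answer 1
y <- data.frame(
  V1 = rep(c("A", "B", "C"), each = 3),
  V2 = c(1, NA, NA, NA, 2, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Within each group, replace NAs by the first observed value (if any)
fill_group <- function(v) {
  observed <- v[!is.na(v)]
  if (length(observed) > 0) v[is.na(v)] <- observed[1]
  v
}

y$V2 <- ave(y$V2, y$V1, FUN = fill_group)
print(y)
```

Groups with no observed value at all (group C here) are left as `NA`, which matches the output shown in Answer 1.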

Obtain unstandardized factor scores from factor analysis in R

Question: I'm conducting a factor analysis of several variables in R using factanal() (but am open to using other packages). I want to determine each case's factor score, but I want the factor scores to be unstandardized and on the original metric of the input variables. When I run the factor analysis and obtain the factor scores, they are standardized (mean = 0, SD = 1) and are not on the original metric of the input variables. How can I obtain unstandardized factor scores

R plotting a dataset with NA Values [duplicate]

This question already has an answer here: How to connect dots where there are missing values? (4 answers)

I'm trying to plot a dataset consisting of numbers and some NA entries in R.

    V1,V2,V3
     2, 4, 3
    NA, 5, 4
    NA,NA,NA
    NA, 7, 3
     6, 6, 9

This should return the same lines in the plot as if I had entered:

    V1,V2,V3
    2, 4, 3
    3, 5, 4
    4, 6, 3.5
    5, 7, 3
    6, 6, 9

What I need R to do is basically plot the dataset as points and then connect these points by straight lines, which, due to the size of the dataset, would be much more efficient than the actual calculation of each interpolated value within the
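One way to get connected lines without filling in the data is to draw each column using only its non-missing points. The sketch below does that with base graphics. The final `approx()` call is included only to check that straight-line interpolation reproduces the second table from the question; it is not needed for the plot itself.

```r
d <- data.frame(
  V1 = c(2, NA, NA, NA, 6),
  V2 = c(4, 5, NA, 7, 6),
  V3 = c(3, 4, NA, 3, 9)
)
idx <- seq_len(nrow(d))

# Empty canvas sized to the observed range
plot(NA, xlim = range(idx), ylim = range(unlist(d), na.rm = TRUE),
     xlab = "row", ylab = "value")

# Draw each column through its observed points only, so the line
# segments bridge the gaps left by NA
for (j in seq_along(d)) {
  ok <- !is.na(d[[j]])
  lines(idx[ok], d[[j]][ok], type = "o", col = j)
}

# Optional check: linear interpolation gives the values in the second table
interp <- sapply(d, function(v) {
  ok <- !is.na(v)
  approx(idx[ok], v[ok], xout = idx)$y
})
print(interp)
```

When run non-interactively, R writes the plot to its default graphics device instead of opening a window.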

Median imputation using sapply

Question: I want to replace missing values in the columns of a data frame. I have written the following code:

    MedianImpute <- function(data) {
      for (i in 1:ncol(data)) {
        if (class(data[, i]) %in% c("numeric", "integer")) {
          if (sum(is.na(data[, i]))) {
            data[is.na(data[, i]), i] <- median(data[, i], na.rm = TRUE)
          }
        }
      }
      return(data)
    }

This returns the data frame with the NAs replaced by the column median. I do not want to use a for loop; how can I get the same result using any of the apply functions in R?

Answer 1: This is actually a subtle problem, so worth a bit of discussion (IMO). You have a data frame and want to impute
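The answer is cut off here, so the following is not the answerer's code but a minimal sketch of one loop-free variant. It uses `lapply()` rather than `sapply()`, because `lapply()` returns a list that can be assigned back into the selected columns, leaving non-numeric columns untouched. The function name `median_impute` and the sample data frame are made up for illustration.

```r
median_impute <- function(data) {
  # Identify numeric (including integer) columns
  is_num <- vapply(data, is.numeric, logical(1))

  # Replace NAs column by column; non-numeric columns are not touched
  data[is_num] <- lapply(data[is_num], function(x) {
    x[is.na(x)] <- median(x, na.rm = TRUE)
    x
  })
  data
}

# Hypothetical example data
df <- data.frame(
  a = c(1, NA, 3),
  b = c("x", "y", NA),
  c = c(NA, 4L, 6L),
  stringsAsFactors = FALSE
)
out <- median_impute(df)
print(out)
```

Note that an integer column becomes double when its median is not a whole number.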

Impute missing data with mean by group

I have a categorical variable with three levels (A, B, and C). I also have a continuous variable with some missing values in it. I would like to replace the NA values with the mean of their group. That is, missing observations from group A have to be replaced with the mean of group A. I know I can just calculate each group's mean and replace missing values, but I'm sure there's another way to do so more efficiently with loops.

    A <- subset(data, group == "A")
    mean(A$variable, na.rm = TRUE)
    A$variable[which(is.na(A$variable))] <- mean(A$variable, na.rm = TRUE)

Now, I understand I could do the
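The per-group steps above can be applied to all groups at once with `ave()`, which evaluates a function within each level of a grouping variable. This is a sketch using the column names `group` and `variable` from the question; the sample values are made up for illustration.

```r
# Hypothetical example data with the column names used in the question
data <- data.frame(
  group    = c("A", "A", "A", "B", "B", "C", "C"),
  variable = c(1, NA, 3, 10, NA, NA, 7),
  stringsAsFactors = FALSE
)

# Within each group, replace NAs by that group's mean of the observed values
data$variable <- ave(
  data$variable, data$group,
  FUN = function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
)

print(data)
```

A group whose values are all missing has no observed mean, so its entries come back as `NaN` and may need separate handling.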

Whitespace string can't be replaced with NA in R

Question: I want to substitute whitespace with NA. A simple way could be df[df == ""] <- NA, and that works for most of the cells of my data frame... but not for all of them! I have the following code:

    library(rvest)
    library(dplyr)
    library(tidyr)

    # Read website
    htmlpage <- read_html("http://www.soccervista.com/results-Liga_MX_Apertura-2016_2017-844815.html")

    # Extract table
    df <- htmlpage %>% html_nodes("table") %>% html_table()
    df <- as.data.frame(df)

    # Set whitespace to NA
    df[df == ""] <- NA

I
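The question is truncated and the page contents are not shown, so the cause is not known. One possible cause, stated here only as an assumption, is that some cells contain spaces or non-breaking spaces (U+00A0) instead of an empty string, in which case `df == ""` does not match them. A minimal sketch that treats such cells as missing follows; the helper name `blank_to_na` and the sample data are hypothetical.

```r
blank_to_na <- function(df) {
  df[] <- lapply(df, function(x) {
    if (is.character(x)) {
      # Turn non-breaking spaces into ordinary spaces, then trim
      cleaned <- trimws(gsub("\u00a0", " ", x, fixed = TRUE))
      # Cells that are empty after trimming are treated as missing
      x[!nzchar(cleaned)] <- NA
    }
    x
  })
  df
}

# Hypothetical example: empty, space-only and non-breaking-space cells
example <- data.frame(
  a = c("x", "", " ", "\u00a0"),
  b = 1:4,
  stringsAsFactors = FALSE
)
cleaned_example <- blank_to_na(example)
print(cleaned_example)
```

Only character columns are modified; cells with real content keep their original text, including any surrounding whitespace.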