missing-data

Partially merge two datasets and fill in NAs in R

这一生的挚爱 posted on 2019-12-01 13:32:47
Question: I have two datasets.

a = raw dataset with thousands of observations of different weather events

       STATE EVTYPE
    1  AL    WINTER STORM
    2  AL    TORNADO
    3  AL    TSTM WIND
    4  AL    TSTM WIND
    5  AL    TSTM WIND
    6  AL    HAIL
    7  AL    HIGH WIND
    8  AL    TSTM WIND
    9  AL    TSTM WIND
    10 AL    TSTM WIND

b = a dictionary table, which has a standard spelling for some weather events.

      EVTYPE             evmatch
    1 HIGH SURF ADVISORY <NA>
    2 COASTAL FLOOD      COASTAL FLOOD
    3 FLASH FLOOD        FLASH FLOOD
    4 LIGHTNING          LIGHTNING
    5 TSTM WIND          <NA>
    6 TSTM WIND (G45)    <NA>

both
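A minimal sketch of one way to do such a partial merge in base R. The excerpt is cut off before it states the desired result, so this assumes the goal is to attach evmatch to every row of a and to fall back to the raw EVTYPE wherever the dictionary has no standard spelling. The small a and b below are shortened stand-ins for the tables shown in the question.

```r
# Shortened stand-ins for the two tables in the question
a <- data.frame(
  STATE  = "AL",
  EVTYPE = c("WINTER STORM", "TORNADO", "TSTM WIND", "HAIL", "HIGH WIND"),
  stringsAsFactors = FALSE
)
b <- data.frame(
  EVTYPE  = c("HIGH SURF ADVISORY", "COASTAL FLOOD", "TSTM WIND"),
  evmatch = c(NA, "COASTAL FLOOD", NA),
  stringsAsFactors = FALSE
)

# Look up each raw event in the dictionary; match() keeps the row order of a
idx <- match(a$EVTYPE, b$EVTYPE)
a$evmatch <- b$evmatch[idx]

# Assumed fallback: where no standard spelling was found, keep the raw value
missing_match <- is.na(a$evmatch)
a$evmatch[missing_match] <- a$EVTYPE[missing_match]

print(a)
```

merge(a, b, by = "EVTYPE", all.x = TRUE) would give the same join, but it re-sorts the rows, which is why match() is used here.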

Fill missing date values in column by adding delivery interval to another date column

為{幸葍}努か posted on 2019-12-01 12:58:08
Question:

Data:

    DB1 <- data.frame(orderItemID = 1:10,
                      orderDate = c("2013-01-21","2013-03-31","2013-04-12","2013-06-01","2014-01-01",
                                    "2014-02-19","2014-02-27","2014-10-02","2014-10-31","2014-11-21"),
                      deliveryDate = c("2013-01-23", "2013-03-01", "NA", "2013-06-04", "2014-01-03",
                                       "NA", "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"))

Expected Outcome:

    DB1 <- data.frame(orderItemID = 1:10,
                      orderDate= c("2013-01-21","2013-03-31","2013-04-12","2013-06-01","2014-01-01",
                                   "2014-02-19","2014-02-27",
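A minimal sketch of filling the missing delivery dates. The excerpt is cut off before it says which interval to add, so this assumes the median observed gap between orderDate and deliveryDate is a reasonable choice. The name gap_days is introduced here for illustration only.

```r
DB1 <- data.frame(
  orderItemID  = 1:10,
  orderDate    = c("2013-01-21", "2013-03-31", "2013-04-12", "2013-06-01", "2014-01-01",
                   "2014-02-19", "2014-02-27", "2014-10-02", "2014-10-31", "2014-11-21"),
  deliveryDate = c("2013-01-23", "2013-03-01", "NA", "2013-06-04", "2014-01-03",
                   "NA", "2014-02-28", "2014-10-04", "2014-11-01", "2014-11-23"),
  stringsAsFactors = FALSE
)

# Parse both columns; the literal string "NA" does not match the format and becomes NA
DB1$orderDate    <- as.Date(DB1$orderDate, format = "%Y-%m-%d")
DB1$deliveryDate <- as.Date(DB1$deliveryDate, format = "%Y-%m-%d")

# Assumed rule: use the median observed delivery interval, in days
gap_days <- median(as.numeric(DB1$deliveryDate - DB1$orderDate), na.rm = TRUE)

# Fill only the rows where the delivery date is missing
miss <- is.na(DB1$deliveryDate)
DB1$deliveryDate[miss] <- DB1$orderDate[miss] + gap_days

print(DB1)
```

The median is used instead of the mean because row 2 of the sample data has a delivery date earlier than its order date, which would drag a mean interval below zero.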

Filling missing value in group

不问归期 posted on 2019-12-01 12:54:46
I have a data frame where some of the values are missing:

    A 1
    A NA
    A NA
    B NA
    B 2
    B NA
    C NA
    C NA
    C NA

How can I fill in groups where I have data?

Alternative solution, though perhaps a bit flawed in how many assumptions it makes:

    library(dplyr)
    y %>% group_by(V1) %>% arrange(V2) %>% mutate(V2 = V2[1])
    # Source: local data frame [9 x 2]
    # Groups: V1 [3]
    #      V1    V2
    #   (chr) (int)
    # 1     A     1
    # 2     A     1
    # 3     A     1
    # 4     B     2
    # 5     B     2
    # 6     B     2
    # 7     C    NA
    # 8     C    NA
    # 9     C    NA

You can also use fill from tidyr:

    library(dplyr)
    library(tidyr)
    df1 %>% group_by(ID) %>% fill(v1) %>% fill(v1, .direction = "up")

Result: # A tibble: 9
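The same fill can be sketched in base R with ave(), without extra packages. This assumes the data frame is called y with columns V1 and V2, as in the first snippet, and that every group should take its first non-missing value. fill_group is a hypothetical helper written for this example.

```r
y <- data.frame(
  V1 = rep(c("A", "B", "C"), each = 3),
  V2 = c(1, NA, NA, NA, 2, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Replace the NAs of one group with that group's first known value;
# a group with no known value at all is returned unchanged
fill_group <- function(v) {
  known <- v[!is.na(v)]
  if (length(known) == 0) return(v)
  v[is.na(v)] <- known[1]
  v
}

# ave() applies fill_group within each level of V1 and keeps the row order
y$V2 <- ave(y$V2, y$V1, FUN = fill_group)

print(y)
```

Unlike the arrange() approach, this leaves the rows in their original order.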

Obtain unstandardized factor scores from factor analysis in R

谁说胖子不能爱 posted on 2019-12-01 12:32:44
Question: I'm conducting a factor analysis of several variables in R using factanal() (but am open to using other packages). I want to determine each case's factor score, but I want the factor scores to be unstandardized and on the original metric of the input variables. When I run the factor analysis and obtain the factor scores, they are standardized with a normal distribution of mean=0, SD=1, and are not on the original metric of the input variables. How can I obtain unstandardized factor scores

R plotting a dataset with NA Values [duplicate]

泄露秘密 posted on 2019-12-01 11:45:48
This question already has an answer here: How to connect dots where there are missing values? 4 answers

I'm trying to plot a dataset consisting of numbers and some NA entries in R.

    V1,V2,V3
    2, 4, 3
    NA, 5, 4
    NA,NA,NA
    NA, 7, 3
    6, 6, 9

This should produce the same lines in the plot as if I had entered:

    V1,V2,V3
    2, 4, 3
    3, 5, 4
    4, 6, 3.5
    5, 7, 3
    6, 6, 9

What I need R to do is basically plot the dataset as points, and then connect these points by straight lines, which - due to the size of the dataset - would be much more efficient than the actual calculation of each interpolated value within the
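A minimal sketch of drawing the lines without computing any interpolated values, assuming the row index is the x axis. For each column only the non-missing points are passed to lines(), so the straight segments bridge the gaps on their own. The names d and pts are introduced for this example.

```r
d <- data.frame(
  V1 = c(2, NA, NA, NA, 6),
  V2 = c(4, 5, NA, 7, 6),
  V3 = c(3, 4, NA, 3, 9)
)
x <- seq_len(nrow(d))

# For every column keep only the observed (x, y) pairs
pts <- lapply(d, function(v) {
  ok <- !is.na(v)
  data.frame(x = x[ok], y = v[ok])
})

# Empty canvas sized to the data, then one line with point markers per column
plot(NA, xlim = range(x), ylim = range(unlist(d), na.rm = TRUE),
     xlab = "row", ylab = "value")
for (j in seq_along(pts)) {
  lines(pts[[j]]$x, pts[[j]]$y, type = "b", col = j)
}
```

If the interpolated values themselves are ever needed, approx() from base R computes them, but that step is not required just to draw the plot.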

Median imputation using sapply

喜夏-厌秋 posted on 2019-12-01 11:30:35
I want to replace missing values in columns of a dataframe. I have written the following code:

    MedianImpute <- function(data=data) {
      for(i in 1:ncol(data)) {
        if(class(data[,i]) %in% c("numeric","integer")) {
          if(sum(is.na(data[,i]))) {
            data[is.na(data[,i]),i] <- median(data[,i],na.rm = TRUE)
          }
        }
      }
      return(data)
    }

This returns the dataframe with the NAs replaced by the column median. I do not want to use a for loop; how can I get the same result using any of the apply functions in R?

This is actually a subtle problem, so worth a bit of discussion (IMO). You have a data frame and want to impute
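A minimal sketch of a loop-free version. It uses lapply() instead of sapply(), because sapply() may simplify its result to a matrix and coerce mixed column types, and it assigns into data[] so the result stays a data frame. This is one possible approach, not necessarily the one the truncated answer goes on to describe. median_impute and the sample df are made up for this example.

```r
median_impute <- function(data) {
  # data[] <- keeps the data.frame structure while replacing every column
  data[] <- lapply(data, function(col) {
    # Only numeric or integer columns that actually contain NAs are touched
    if (is.numeric(col) && anyNA(col)) {
      col[is.na(col)] <- median(col, na.rm = TRUE)
    }
    col
  })
  data
}

df <- data.frame(
  x = c(1, NA, 3),
  y = c("a", NA, "c"),
  z = c(2L, 4L, NA),
  stringsAsFactors = FALSE
)
out <- median_impute(df)
print(out)
```

Note that an integer column can become double after imputation, since the median of an even number of integers need not be an integer.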

Impute missing data with mean by group

怎甘沉沦 posted on 2019-12-01 08:41:14
I have a categorical variable with three levels (A, B, and C). I also have a continuous variable with some missing values in it. I would like to replace the NA values with the mean of their group. That is, missing observations from group A have to be replaced with the mean of group A. I know I can just calculate each group's mean and replace missing values, but I'm sure there's another way to do so more efficiently with loops.

    A <- subset(data, group == "A")
    mean(A$variable, na.rm = TRUE)
    A$variable[which(is.na(A$variable))] <- mean(A$variable, na.rm = TRUE)

Now, I understand I could do the
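A minimal sketch that handles all groups in one pass with ave(), assuming the data frame is called data with columns group and variable as in the question. The sample values are made up for illustration.

```r
data <- data.frame(
  group    = c("A", "A", "A", "B", "B", "C", "C"),
  variable = c(1, NA, 3, 10, NA, NA, 5),
  stringsAsFactors = FALSE
)

# Mean of each group, repeated for every row of that group
group_mean <- ave(data$variable, data$group,
                  FUN = function(v) mean(v, na.rm = TRUE))

# Overwrite only the missing entries with their group's mean
miss <- is.na(data$variable)
data$variable[miss] <- group_mean[miss]

print(data)
```

A group with no observed values at all would get NaN as its mean, so such groups need separate handling.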

Whitespace string can't be replaced with NA in R

☆樱花仙子☆ posted on 2019-12-01 08:15:37
Question: I want to substitute whitespaces with NA. A simple way could be df[df == ""] <- NA, and that works for most of the cells of my data frame... but not for all of them! I have the following code:

    library(rvest)
    library(dplyr)
    library(tidyr)
    # Read website
    htmlpage <- read_html("http://www.soccervista.com/results-Liga_MX_Apertura-2016_2017-844815.html")
    # Extract table
    df <- htmlpage %>% html_nodes("table") %>% html_table()
    df <- as.data.frame(df)
    # Set whitespaces into NA's
    df[df == ""] <- NA

I
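A minimal sketch of one possible fix. The excerpt is cut off before the cause is identified, so this assumes the stubborn cells are not empty strings but contain spaces or non-breaking spaces, which is common in scraped HTML tables. The helper is_blank and the sample df are hypothetical.

```r
# Stand-in for a scraped table: "", " " and a non-breaking space all look empty
df <- data.frame(
  home = c("America", "", " "),
  away = c("\u00a0", "Toluca", "Leon"),
  stringsAsFactors = FALSE
)

# TRUE for character cells made up only of whitespace or non-breaking spaces
is_blank <- function(x) is.character(x) & grepl("^[[:space:]\u00a0]*$", x)

# Replace blank cells column by column; non-character columns are left alone
df[] <- lapply(df, function(col) {
  col[is_blank(col)] <- NA
  col
})

print(df)
```

Comparing with == "" only catches zero-length strings, which is why a pattern match is used here instead.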