missing-data

R plotting a dataset with NA Values [duplicate]

半腔热情 submitted on 2019-12-04 02:01:21
Question: This question already has answers here: How to connect dots where there are missing values? (4 answers) Closed 6 years ago.

I'm trying to plot a dataset consisting of numbers and some NA entries in R.

    V1,V2,V3
     2, 4, 3
    NA, 5, 4
    NA,NA,NA
    NA, 7, 3
     6, 6, 9

This should produce the same lines in the plot as if I had entered:

    V1,V2,V3
    2, 4, 3
    3, 5, 4
    4, 6, 3.5
    5, 7, 3
    6, 6, 9

What I need R to do is basically plot the dataset as points, and then connect these points by straight lines, which - due to
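One way to get the lines described in the question is to interpolate linearly across the NA gaps before plotting. The following is a minimal base R sketch using the five-row example from the question; the helper name fill_linear is hypothetical and not part of the original post, and it assumes each column has observed values at its first and last row.

```r
# Example data from the question (rows are the x positions 1..5)
df <- data.frame(
  V1 = c(2, NA, NA, NA, 6),
  V2 = c(4, 5, NA, 7, 6),
  V3 = c(3, 4, NA, 3, 9)
)

# Hypothetical helper: linearly interpolate the NA entries of one column
# from its observed neighbours. Assumes the first and last entries are
# observed, so no extrapolation is needed.
fill_linear <- function(y) {
  idx <- seq_along(y)
  ok <- !is.na(y)
  approx(idx[ok], y[ok], xout = idx)$y
}

filled <- as.data.frame(lapply(df, fill_linear))

# Draw the connecting lines from the interpolated values,
# then overlay only the points that were actually observed.
matplot(as.matrix(filled), type = "l", lty = 1, xlab = "row", ylab = "value")
matplot(as.matrix(df), type = "p", pch = 16, add = TRUE)
```

Because the interpolation is linear, the lines are the same ones you would get by joining each observed point directly to the next observed point in its column.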

Imputing missing values using ARIMA model

余生颓废 submitted on 2019-12-03 21:20:52
I am trying to impute missing values in a time series with an ARIMA model in R. I tried this code, but without success.

    library(forecast)  # provides auto.arima()
    x <- AirPassengers
    x[90:100] <- NA
    fit <- auto.arima(x)
    fitted(fit)[90:100] ## this is giving me NAs
    plot(x)
    lines(fitted(fit), col="red")

The fitted model is not imputing the missing values. Any idea how this is done?

fitted gives in-sample one-step forecasts. The "right" way to do what you want is via a Kalman smoother. A rough approximation, good enough for most purposes, is obtained by averaging the forward and backward forecasts for the missing section. Like this: x <
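As an illustration of the Kalman smoother route mentioned above, here is a minimal sketch that uses only base R. Note that it swaps the ARIMA model for a basic structural model (StructTS with type = "BSM"), because stats provides tsSmooth() for that model class directly; the log transform and the choice of model are assumptions for this example, not something the original answer specifies.

```r
# Series from the question, with a block of values removed
x <- AirPassengers
x[90:100] <- NA

# Assumption: fit a basic structural model (level + slope + seasonal)
# on the log scale. StructTS() accepts series containing NAs.
fit <- StructTS(log(x), type = "BSM")

# tsSmooth() runs the Kalman smoother and returns the smoothed state
# components; for a BSM the observation is level + seasonal.
sm <- tsSmooth(fit)
smoothed <- as.numeric(sm[, "level"] + sm[, "sea"])

# Replace only the missing entries, back-transforming from the log scale
miss <- is.na(x)
x_filled <- x
x_filled[miss] <- exp(smoothed[miss])

plot(x_filled, col = "red")  # imputed stretch shows in red
lines(x)                     # observed data drawn on top
```

Only positions 90 to 100 are changed; every observed value is carried over untouched.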

How to create missing value for repeated measurement data?

扶醉桌前 submitted on 2019-12-03 21:12:04
Question: I have a data set in which not every subject's observations were taken at the same time points, but I want to turn it into a data set in which everyone's observations were taken at the same time points (so that I can use it in SAS proc traj). For example, suppose I have dataset "m":

    id <- c(1,1,1,1,2,2,3,3,3)
    age <- c(2,3,4,5,3,6,2,5,8)
    IQ <- c(3,4,5,4,6,5,3,8,10)
    m <- data.frame(id,age,IQ)
    > m
      id age IQ
    1  1   2  3
    2  1   3  4
    3  1   4  5
    4  1   5  4
    5  2   3  6
    6  2   6  5
    7  3   2  3
    8  3   5  8
    9  3   8 10
    >
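A possible approach, sketched in base R: build every id/age combination and left-join the observed data onto it, so that unobserved combinations get IQ = NA. The choice of time grid here (every age observed for at least one subject) is an assumption, since the desired output is cut off in the post.

```r
id  <- c(1, 1, 1, 1, 2, 2, 3, 3, 3)
age <- c(2, 3, 4, 5, 3, 6, 2, 5, 8)
IQ  <- c(3, 4, 5, 4, 6, 5, 3, 8, 10)
m <- data.frame(id, age, IQ)

# Assumption: the common time grid is every age seen for any subject
grid <- expand.grid(id = unique(m$id), age = sort(unique(m$age)))

# Left join: id/age combinations absent from m get IQ = NA
full <- merge(grid, m, by = c("id", "age"), all.x = TRUE)
full <- full[order(full$id, full$age), ]
rownames(full) <- NULL
```

If the grid should instead be a fixed range such as ages 2 to 8, replace sort(unique(m$age)) with that sequence.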

Multidimensional scaling with missing values in dissimilarity matrix

偶尔善良 submitted on 2019-12-03 13:40:40
Question: I have a dissimilarity matrix on which I would like to perform multidimensional scaling (MDS) using the sklearn.manifold.MDS function. The dissimilarity between some elements in this matrix is not meaningful, so I am wondering whether there is a way to run MDS on a sparse matrix or on a matrix with missing values. According to this question, dissimilarities of 0 are considered missing values, but I was unable to find this statement in the official documentation. Isn't a dissimilarity

visual structure of a data.frame: locations of NAs and much more

試著忘記壹切 submitted on 2019-12-03 13:21:08
Question: I want to represent the structure of a data frame (or matrix, or data.table, whatever) on a single plot with color-coding. I guess that could be very useful for many people handling various types of data, to visualize it at a single glance. Perhaps someone has already developed a package to do it, but I couldn't find one (just this). So here is a rough mockup of my "vision", kind of a heatmap, showing in color codes: the NA locations, the class of variables (factors (how many levels?),

R gbm handling of missing values

匆匆过客 submitted on 2019-12-03 09:48:09
Question: Does anyone know how gbm in R handles missing values? I can't seem to find any explanation using Google.

Answer 1: To explain what gbm does with missing predictors, let's first visualize a single tree of a gbm object. Suppose you have a gbm object mygbm. Using pretty.gbm.tree(mygbm, i.tree=1) you can visualize the first tree of mygbm, e.g.:

      SplitVar SplitCodePred LeftNode RightNode MissingNode ErrorReduction Weight    Prediction
    0       46  1.629728e+01        1         5           9      26.462908   1585 -4.396393e-06
    1       45  1.850000e+01

In gnuplot, with “set datafile missing”, how to ignore both “nan” and “-nan”?

爷,独闯天下 submitted on 2019-12-03 07:47:40
The gnuplot command set datafile missing "nan" tells gnuplot to ignore nan data values in the data file. How can I ignore both nan and -nan? I tried the following in gnuplot, but the effect of the first statement is overwritten by the second.

    gnuplot> set datafile missing "-nan"
    gnuplot> set datafile missing "nan"

Is it possible to somehow embed a grep -v nan in the gnuplot command, or even some kind of regexp to exclude any imaginable non-numerical data?

Christoph: It is not possible to use a regexp for set datafile missing, but you can use any program to filter your data before plotting,

Specify different types of missing values (NAs)

假如想象 submitted on 2019-12-03 06:47:45
I'm interested in specifying types of missing values. I have data with different types of missingness and I am trying to code these values as missing in R, but I am looking for a solution where I can still distinguish between them. Say I have some data that looks like this:

    set.seed(667)
    df <- data.frame(a = sample(c("Don't know/Not sure","Unknown","Refused","Blue", "Red", "Green"), 20, rep=TRUE),
                     b = sample(c(1, 2, 3, 77, 88, 99), 10, rep=TRUE),
                     f = round(rnorm(n=10, mean=.90, sd=.08), digits = 2),
                     g = sample(c("C","M","Y","K"), 10, rep=TRUE)
                     ); df
    #          a b    f g
    # 1  Unknown 2 0.78 M
    # 2  Refused 2 0
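One simple way to keep the distinction, sketched in base R: convert the special codes to ordinary NA in the value column and record why each value is missing in a companion column. This assumes that 77, 88 and 99 in column b are missing-value codes, which the visible part of the post does not confirm; the reason labels and the variable names b_value and b_reason are hypothetical.

```r
# A few values drawn from the sample space of column b in the question.
# Assumption: 77, 88 and 99 encode different kinds of missingness.
b <- c(1, 2, 3, 77, 88, 99, 2, 77)

# Hypothetical labels for the three codes
reasons <- c("77" = "dont_know", "88" = "unknown", "99" = "refused")

# Companion column: the reason where b is a missing code, NA otherwise
b_reason <- unname(reasons[as.character(b)])

# Value column: missing codes become ordinary NA
b_value <- ifelse(is.na(b_reason), b, NA)

df_b <- data.frame(b_value, b_reason)
```

Functions such as mean(b_value, na.rm = TRUE) then treat all three kinds as missing, while table(b_reason) still tells them apart.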

visual structure of a data.frame: locations of NAs and much more

情到浓时终转凉″ submitted on 2019-12-03 03:27:23
I want to represent the structure of a data frame (or matrix, or data.table, whatever) on a single plot with color-coding. I guess that could be very useful for many people handling various types of data, to visualize it at a single glance. Perhaps someone has already developed a package to do it, but I couldn't find one (just this). So here is a rough mockup of my "vision", kind of a heatmap, showing in color codes: the NA locations, the class of variables (factors (how many levels?), numeric (with color gradient, zeros, outliers...), strings), dimensions, etc. So far I have just written a
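As a starting point for the NA-location part of this idea, here is a minimal base R sketch that draws the missingness mask of a small made-up data frame and labels each column with its class. The toy data and the colors are assumptions for illustration; this is not the poster's own code, which is cut off above.

```r
# Toy data frame with mixed column types and some NAs
df <- data.frame(
  num = c(1.5, NA, 3.2, 0, 8.1),
  fac = factor(c("a", "b", NA, "a", "b")),
  chr = c("x", NA, NA, "y", "z"),
  stringsAsFactors = FALSE
)

na_mask <- is.na(df)  # logical matrix with the same shape as df
col_class <- sapply(df, function(col) class(col)[1])

# image() maps matrix rows to the x axis, so transpose the mask and
# reverse the row order to show row 1 of the data frame at the top.
image(x = seq_len(ncol(df)), y = seq_len(nrow(df)),
      z = t(na_mask[nrow(df):1, , drop = FALSE]) * 1,
      col = c("grey90", "red"), axes = FALSE, xlab = "", ylab = "")
axis(1, at = seq_len(ncol(df)),
     labels = paste0(names(df), " (", col_class, ")"), tick = FALSE)
```

Red cells mark NA entries; the class labels on the axis are a first step toward encoding variable types as well.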

How to create “NA” for missing data in a time series

ぃ、小莉子 submitted on 2019-12-03 01:32:37
Question: I have several files of data that look like this:

    X code year month day  pp
    1 4515 1953     6   1   0
    2 4515 1953     6   2   0
    3 4515 1953     6   3   0
    4 4515 1953     6   4   0
    5 4515 1953     6   5 3.5

Sometimes data is missing, but I don't have NAs; the rows simply don't exist. I need to create NAs where the data is missing. I thought I could start by identifying where that occurs by converting the data to a zoo object and checking for strict regularity (I have never used zoo before). I used the following code:

    z.date<-paste(CET$year,
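One way to create the missing rows without zoo, sketched in base R: build the complete daily date sequence and left-join the data onto it. The toy data below follows the layout in the question with one day deliberately left out; the name CET comes from the question's code, while all_days and filled are hypothetical names.

```r
# Toy data in the layout from the question; 1953-06-03 is absent
CET <- data.frame(
  code  = 4515,
  year  = 1953,
  month = 6,
  day   = c(1, 2, 4, 5),
  pp    = c(0, 0, 0, 3.5)
)

CET$date <- as.Date(paste(CET$year, CET$month, CET$day, sep = "-"))

# Every calendar day between the first and last observation
all_days <- data.frame(date = seq(min(CET$date), max(CET$date), by = "day"))

# Left join: days with no row in CET get NA in every data column
filled <- merge(all_days, CET, by = "date", all.x = TRUE)
```

The result has one row per day, and is.na(filled$pp) identifies the days that were missing from the original file.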