missing-data | 易学教程

R plotting a dataset with NA Values [duplicate]

Question: This question already has answers here: How to connect dots where there are missing values? (4 answers) Closed 6 years ago. I'm trying to plot a dataset in R that consists of numbers and some NA entries:

    V1,V2,V3
    2, 4, 3
    NA, 5, 4
    NA,NA,NA
    NA, 7, 3
    6, 6, 9

This should produce the same lines in the plot as if I had entered:

    V1,V2,V3
    2, 4, 3
    3, 5, 4
    4, 6, 3.5
    5, 7, 3
    6, 6, 9

What I need R to do is basically plot the dataset as points and then connect these points by straight lines, which - due to
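The second table is exactly what linear interpolation between the nearest observed neighbours gives. Here is a minimal base-R sketch using approx(), column by column. The helper name fill_linear is hypothetical. With approx()'s default rule = 1, any leading or trailing NAs stay NA.

```r
# Linear interpolation of interior NAs, column by column (base R only).
fill_linear <- function(y) {
  idx <- seq_along(y)
  ok <- !is.na(y)
  # approx() interpolates between the observed points; with the default
  # rule = 1, positions outside the observed range stay NA.
  approx(x = idx[ok], y = y[ok], xout = idx)$y
}

d <- data.frame(
  V1 = c(2, NA, NA, NA, 6),
  V2 = c(4, 5, NA, 7, 6),
  V3 = c(3, 4, NA, 3, 9)
)

filled <- as.data.frame(lapply(d, fill_linear))
print(filled)
# To draw points joined by straight lines, e.g.:
# matplot(filled, type = "b", pch = 1:3)
```

If zoo is already in use, its na.approx() performs the same kind of linear interpolation.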

Imputing missing values using ARIMA model

I am trying to impute missing values in a time series with an ARIMA model in R. I tried this code, without success:

    x <- AirPassengers
    x[90:100] <- NA
    fit <- auto.arima(x)
    fitted(fit)[90:100]  ## this is giving me NAs
    plot(x)
    lines(fitted(fit), col="red")

The fitted model is not imputing the missing values. Any idea how this is done?

Answer: fitted() gives in-sample one-step forecasts. The "right" way to do what you want is via a Kalman smoother. A rough approximation, good enough for most purposes, is the average of the forward and backward forecasts for the missing section. Like this: x <
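The forward/backward averaging idea can be sketched with base R's arima() and predict() instead of the forecast package. This is not the Kalman-smoother approach the answer calls "right". The sketch rests on several assumptions:

- The series has a single interior gap.
- The model is the "airline" ARIMA(0,1,1)(0,1,1)[12] fitted on the log scale. auto.arima() may choose a different model.
- The two forecasts are combined with a plain unweighted average.

The function name impute_gap_fb is hypothetical.

```r
# Fill one contiguous, interior gap by averaging a forward forecast (fitted on
# the data before the gap) with a backward forecast (fitted on the reversed
# data after the gap). Assumed model: ARIMA(0,1,1)(0,1,1) on the log scale.
impute_gap_fb <- function(x, gap, order = c(0, 1, 1), seasonal = c(0, 1, 1)) {
  f <- frequency(x)
  h <- length(gap)
  before <- log(as.numeric(x[seq_len(min(gap) - 1)]))
  after  <- log(as.numeric(x[(max(gap) + 1):length(x)]))
  sea <- list(order = seasonal, period = f)

  # Forward forecast: h steps ahead from the last value before the gap.
  fit_fwd <- arima(ts(before, frequency = f), order = order, seasonal = sea)
  fwd <- as.numeric(predict(fit_fwd, n.ahead = h)$pred)

  # Backward forecast: fit on the reversed post-gap data, forecast h steps,
  # then reverse so the values line up with the gap in forward time.
  fit_bwd <- arima(ts(rev(after), frequency = f), order = order, seasonal = sea)
  bwd <- rev(as.numeric(predict(fit_bwd, n.ahead = h)$pred))

  # Average on the log scale, then transform back.
  exp((fwd + bwd) / 2)
}

x <- AirPassengers
x[90:100] <- NA
filled <- x
filled[90:100] <- impute_gap_fb(x, 90:100)
# plot(x); lines(filled, col = "red")
```

Averaging on the log scale corresponds to a geometric mean of the two forecasts. The answer's own code (truncated above) may combine them differently.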

How to create missing value for repeated measurement data?

Question: I have a data set in which not every subject's observations were made at exactly the same time points. I want to turn it into a data set in which everyone's observations are at exactly the same time points (so that I can use it in SAS proc traj). For example, suppose I have the dataset "m":

    id <- c(1,1,1,1,2,2,3,3,3)
    age <- c(2,3,4,5,3,6,2,5,8)
    IQ <- c(3,4,5,4,6,5,3,8,10)
    m <- data.frame(id,age,IQ)
    > m
      id age IQ
    1  1   2  3
    2  1   3  4
    3  1   4  5
    4  1   5  4
    5  2   3  6
    6  2   6  5
    7  3   2  3
    8  3   5  8
    9  3   8 10
    >
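One common base-R approach is to cross every subject with every time point and then left-join the observed rows. Any combination that was never observed gets IQ = NA. This sketch assumes the common time grid is the union of the ages seen in the data (2, 3, 4, 5, 6, 8). A regular grid such as seq(2, 8) could be passed to expand.grid() instead.

```r
id  <- c(1, 1, 1, 1, 2, 2, 3, 3, 3)
age <- c(2, 3, 4, 5, 3, 6, 2, 5, 8)
IQ  <- c(3, 4, 5, 4, 6, 5, 3, 8, 10)
m <- data.frame(id, age, IQ)

# Every subject crossed with every time point seen anywhere in the data.
grid <- expand.grid(id = sort(unique(m$id)), age = sort(unique(m$age)))

# Left join: combinations that were never observed get IQ = NA.
full <- merge(grid, m, by = c("id", "age"), all.x = TRUE)
full <- full[order(full$id, full$age), ]
rownames(full) <- NULL
print(full)
```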

Multidimensional scaling with missing values in dissimilarity matrix

Question: I have a dissimilarity matrix on which I would like to perform multidimensional scaling (MDS) using the sklearn.manifold.MDS function. The dissimilarity between some elements in this matrix is not meaningful, so I am wondering whether there is a way to run MDS on a sparse matrix or on a matrix with missing values. According to this question, dissimilarities of 0 are treated as missing values, but I was unable to find this statement in the official documentation. Isn't a dissimilarity

R gbm handling of missing values

Question: Does anyone know how gbm in R handles missing values? I can't seem to find any explanation using Google.

Answer 1: To explain what gbm does with missing predictors, let's first visualize a single tree of a gbm object. Suppose you have a gbm object mygbm. Using pretty.gbm.tree(mygbm, i.tree=1) you can visualize the first tree of mygbm, e.g.:

      SplitVar SplitCodePred LeftNode RightNode MissingNode ErrorReduction Weight    Prediction
    0       46  1.629728e+01        1         5           9      26.462908   1585 -4.396393e-06
    1       45  1.850000e+01
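To make the role of the MissingNode column concrete, here is a hypothetical re-implementation (not gbm's own code) of walking one tree laid out like pretty.gbm.tree() output. It assumes the following:

- Node and variable indices are 0-based.
- Terminal nodes have SplitVar = -1.
- Splits are numeric, with values below SplitCodePred going left. Categorical splits are ignored.

The toy tree and its values are invented for illustration.

```r
# Toy tree in the column layout of pretty.gbm.tree(); all values are invented.
tree <- data.frame(
  SplitVar      = c(0, -1, -1, -1),   # -1 marks a terminal node
  SplitCodePred = c(5, -1, 1, 0.2),   # split point, or prediction at a leaf
  LeftNode      = c(1, -1, -1, -1),
  RightNode     = c(2, -1, -1, -1),
  MissingNode   = c(3, -1, -1, -1),
  Prediction    = c(0, -1, 1, 0.2)
)

# Route one observation x (a vector of numeric predictors) through the tree.
predict_tree <- function(tree, x) {
  node <- 0                                   # node ids are 0-based
  repeat {
    row <- tree[node + 1, ]
    if (row$SplitVar == -1) return(row$Prediction)
    value <- x[[row$SplitVar + 1]]            # SplitVar is 0-based too
    node <- if (is.na(value)) {
      row$MissingNode                         # missing predictor: its own branch
    } else if (value < row$SplitCodePred) {
      row$LeftNode
    } else {
      row$RightNode
    }
  }
}

predict_tree(tree, c(3))    # goes left
predict_tree(tree, c(7))    # goes right
predict_tree(tree, c(NA))   # goes to the missing branch
```

In this reading, a missing predictor does not have to be imputed: at each split it follows a third branch whose prediction is learned from the training rows that were also missing that variable.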

In gnuplot, with “set datafile missing”, how to ignore both “nan” and “-nan”?

The gnuplot command set datafile missing "nan" tells gnuplot to ignore nan data values in the data file. How can I ignore both nan and -nan? I tried the following in gnuplot, but the effect of the first statement is overwritten by the next:

    gnuplot> set datafile missing "-nan"
    gnuplot> set datafile missing "nan"

Is it possible to somehow embed a grep -v nan in the gnuplot command, or even some kind of regexp to exclude any imaginable non-numerical data?

Answer (Christoph): It is not possible to use a regexp for set datafile missing, but you can use any program to filter your data before plotting,

Specify different types of missing values (NAs)

I'm interested in specifying types of missing values. My data contain different types of missing values. I am trying to code these values as missing in R, but I want a solution where I can still distinguish between them. Say I have some data that looks like this:

    set.seed(667)
    df <- data.frame(a = sample(c("Don't know/Not sure","Unknown","Refused","Blue", "Red", "Green"), 20, rep=TRUE),
                     b = sample(c(1, 2, 3, 77, 88, 99), 10, rep=TRUE),
                     f = round(rnorm(n=10, mean=.90, sd=.08), digits = 2),
                     g = sample(c("C","M","Y","K"), 10, rep=TRUE)
                     ); df
    #         a b    f g
    # 1 Unknown 2 0.78 M
    # 2 Refused 2 0
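One base-R way to keep the distinction is to store each value and the reason for its absence side by side. The special codes become NA in the analysis column, and a companion factor records which kind of missing each one was. The mapping of 77/88/99 to reasons below is hypothetical, since the question does not say what each code means. The name split_missing is illustrative. For numeric columns, the haven package's tagged_na() is a ready-made alternative.

```r
# Hypothetical mapping: the question does not define what 77, 88 and 99 mean.
b_codes <- c("77" = "Don't know/Not sure", "88" = "Refused", "99" = "Unknown")
# For the character column, the labels themselves act as the codes.
a_codes <- c("Don't know/Not sure" = "Don't know/Not sure",
             "Unknown" = "Unknown", "Refused" = "Refused")

# Return the values with codes replaced by NA, plus a factor giving the reason.
split_missing <- function(v, codes) {
  key <- as.character(v)
  is_code <- key %in% names(codes)
  reason <- rep(NA_character_, length(v))
  reason[is_code] <- codes[key[is_code]]
  value <- v
  value[is_code] <- NA
  list(value = value, reason = factor(reason, levels = unique(codes)))
}

b <- c(1, 77, 2, 99, 88, 3)
res <- split_missing(b, b_codes)
res$value    # 1 NA 2 NA NA 3
res$reason

res_a <- split_missing(c("Blue", "Refused", "Red"), a_codes)
```

On the question's data this could be applied per column. For example, df$b_missing <- split_missing(df$b, b_codes)$reason records the reason, followed by df$b <- split_missing(df$b, b_codes)$value to blank the codes.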

visual structure of a data.frame: locations of NAs and much more

I want to represent the structure of a data frame (or matrix, or data.table, whatever) in a single plot with color coding. I think this could be very useful for many people handling various types of data, to see it at a single glance. Perhaps someone has already developed a package to do it, but I couldn't find one (just this). So here is a rough mockup of my "vision", a kind of heatmap, showing in color codes: the NA locations, the class of variables (factors (how many levels?), numeric (with color gradient, zeros, outliers...), strings), dimensions, etc. So far I have just written a
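A small base-R starting point for the NA part of this idea is to build a logical matrix of NA locations and draw it with image(). The other layers the question mentions (variable class, factor levels, numeric gradients) are not covered. The names na_matrix and plot_na_map are hypothetical helpers.

```r
# Logical matrix: TRUE where a cell is NA (rows = observations, cols = variables).
na_matrix <- function(df) {
  m <- is.na(as.data.frame(df))
  dimnames(m) <- list(seq_len(nrow(df)), names(df))
  m
}

# Draw the NA map: one column per variable, observation 1 at the top.
plot_na_map <- function(df) {
  m <- na_matrix(df)
  # image() maps the first dimension of z to the x axis, so transpose,
  # and reverse the rows so that row 1 ends up at the top of the plot.
  image(x = seq_len(ncol(m)), y = seq_len(nrow(m)),
        z = t(m[nrow(m):1, , drop = FALSE]) * 1, zlim = c(0, 1),
        col = c("grey90", "firebrick"), axes = FALSE,
        xlab = "", ylab = "observation")
  axis(1, at = seq_len(ncol(m)), labels = colnames(m))
  invisible(m)
}

toy <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA), z = c(NA, NA, 0.5))
na_matrix(toy)
# plot_na_map(toy)
```

Fixing zlim = c(0, 1) keeps the two colours stable even when a data frame has no NAs at all.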

How to create “NA” for missing data in a time series

Question: I have several data files that look like this:

    X code year month day  pp
    1 4515 1953     6   1   0
    2 4515 1953     6   2   0
    3 4515 1953     6   3   0
    4 4515 1953     6   4   0
    5 4515 1953     6   5 3.5

Sometimes data are missing, but there are no NAs: the rows simply don't exist. I need to create NAs where data are missing. I thought I could start by identifying where that happens, by converting the data to a zoo object and checking for strict regularity (I have never used zoo before). I used the following code: z.date<-paste(CET$year,
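A base-R alternative to the zoo route is to build the complete daily calendar with seq() on Date objects and left-join the observed rows. Days that have no row in the file then get pp = NA. The sketch assumes the following:

- Each file holds one station code.
- The calendar runs from the first to the last observed day, so days missing at either end are not added.
- The column names match the excerpt.

The function name fill_missing_days is hypothetical.

```r
# Example file in which 3 June 1953 is missing (the row simply does not exist).
CET <- data.frame(
  code = 4515, year = 1953, month = 6,
  day = c(1, 2, 4, 5), pp = c(0, 0, 0, 3.5)
)

fill_missing_days <- function(d) {
  d$date <- as.Date(sprintf("%d-%02d-%02d", d$year, d$month, d$day))
  # Complete daily calendar between the first and last observed dates.
  all_days <- data.frame(date = seq(min(d$date), max(d$date), by = "day"))
  # Left join: days with no row in the file get pp = NA.
  out <- merge(all_days, d[, c("date", "pp")], by = "date", all.x = TRUE)
  out$code  <- d$code[1]   # assumes a single station code per file
  out$year  <- as.integer(format(out$date, "%Y"))
  out$month <- as.integer(format(out$date, "%m"))
  out$day   <- as.integer(format(out$date, "%d"))
  out[, c("code", "year", "month", "day", "pp", "date")]
}

filled <- fill_missing_days(CET)
print(filled)
```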