missing-data | 易学教程

Convert NA into a factor level

Question: I have a vector with NA values that I would like to replace with a new factor level NA.

    a = as.factor(as.character(c(1, 1, 2, 2, 3, NA)))
    a
    [1] 1 1 2 2 3 <NA>
    Levels: 1 2 3

This works, but it seems like a strange way to do it.

    a = as.factor(ifelse(is.na(a), "NA", a))
    class(a)
    [1] "factor"

This is the expected output:

    a
    [1] 1 1 2 2 3 NA
    Levels: 1 2 3 NA

Answer 1: You can use addNA().

    x <- c(1, 1, 2, 2, 3, NA)
    addNA(x)
    # [1] 1 1 2 2 3 <NA>
    # Levels: 1 2 3 <NA>

This is basically a convenience
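As a minimal sketch of the addNA() approach mentioned in the answer, the snippet below turns NA into an explicit factor level and compares it with factor(x, exclude = NULL), which also keeps NA as a level. The variable names `with_na` and `alt` are illustrative and do not come from the original post.

```r
# Vector with one missing value, as in the question
x <- c(1, 1, 2, 2, 3, NA)

# addNA() returns a factor in which NA is an explicit (last) level
with_na <- addNA(x)
levels(with_na)

# factor() with exclude = NULL also keeps NA as a level
alt <- factor(x, exclude = NULL)
levels(alt)

# Because NA is now a level, table() counts it without needing useNA
table(with_na)
```

Note that the extra level is still a true NA (printed as <NA>), not the character string "NA".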

Randomly insert NAs into dataframe proportionally

Question: I have a complete dataframe. I want 20% of the values in the dataframe to be replaced by NAs to simulate random missing data.

    A <- c(1:10)
    B <- c(11:20)
    C <- c(21:30)
    df <- data.frame(A, B, C)

Can anyone suggest a quick way of doing that?

Answer 1:

    df <- data.frame(A = 1:10, B = 11:20, c = 21:30)
    head(df)
    ##   A  B  c
    ## 1 1 11 21
    ## 2 2 12 22
    ## 3 3 13 23
    ## 4 4 14 24
    ## 5 5 15 25
    ## 6 6 16 26
    as.data.frame(lapply(df, function(cc) cc[ sample(c(TRUE, NA), prob = c(0.85, 0.15), size = length(cc),
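One possible way to blank out exactly 20% of the cells is sketched below. It is not the approach of the truncated answer above: it assumes every column has the same type (so that converting to a matrix is harmless), and the names `m`, `n_missing`, and `df_na` are illustrative.

```r
set.seed(42)  # only to make the sketch reproducible

df <- data.frame(A = 1:10, B = 11:20, C = 21:30)

# Treat the data frame as one block of cells (assumes a single column type)
m <- as.matrix(df)

# Pick 20% of all cell positions without replacement and set them to NA
n_missing <- round(0.2 * length(m))
m[sample(length(m), size = n_missing)] <- NA

df_na <- as.data.frame(m)
sum(is.na(df_na))  # 6 of the 30 cells
```

Sampling cell positions gives an exact overall proportion, whereas sampling per column with a probability gives the proportion only on average.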

Remove NA values from a vector

Question: I have a huge vector which has a couple of NA values, and I'm trying to find the max value in that vector (the vector is all numbers), but I can't do this because of the NA values. How can I remove the NA values so that I can compute the max?

Answer 1: Trying ?max, you'll see that it actually has an na.rm argument, set by default to FALSE. (That's the common default for many other R functions, including sum(), mean(), etc.) Setting na.rm = TRUE does just what you're asking for:

    d <- c(1, 100,
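The effect of na.rm can be sketched as follows. The vector `v` holds illustrative values and is not the data from the original answer, which is cut off above.

```r
# Illustrative numeric vector containing missing values
v <- c(3, NA, 10, 7, NA)

max(v)                # NA, because any NA propagates by default
max(v, na.rm = TRUE)  # 10, NAs are ignored for this call only

# To actually drop the NAs from the vector, index with !is.na()
v_clean <- v[!is.na(v)]
max(v_clean)          # 10
```

Using na.rm = TRUE leaves `v` unchanged, while the indexing form creates a shorter vector without the missing entries.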

How to find position of missing values in a vector

Question: What features does the R language have to find missing values in a dataframe, or at least to know that the dataframe has missing values?

Answer 1:

    x = matrix(rep(c(NA, 1, NA), 3), ncol=3, nrow=3)
    print(x)
         [,1] [,2] [,3]
    [1,]   NA   NA   NA
    [2,]    1    1    1
    [3,]   NA   NA   NA

Matrix of boolean values (is the value NA?):

    is.na(x)
          [,1]  [,2]  [,3]
    [1,]  TRUE  TRUE  TRUE
    [2,] FALSE FALSE FALSE
    [3,]  TRUE  TRUE  TRUE

Indices of NA values:

    which(is.na(x), arr.ind = T)
         row col
    [1,]   1   1
    [2,]   3   1
    [3,]   1   2
    [4,]   3   2
    [5,]   1   3
    [6,]   3   3
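The same idea carries over to data frames. The sketch below uses a small made-up data frame (`df` and its columns are illustrative) to check whether anything is missing, count NAs per column, and list the affected rows.

```r
# Illustrative data frame with one NA in each column
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

anyNA(df)                   # TRUE: at least one value is missing
colSums(is.na(df))          # number of NAs per column
which(!complete.cases(df))  # rows 2 and 3 contain a missing value
```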

Working with missing values in Deedle Time Series in F# (2)

Question: This question is related to "Working with missing values in Deedle Time Series in F# (1)". Suppose I have a Series<'K,'T opt> with some missing values. For example, I have obtained a series:

    series4;;
    val it : Series<int,int opt> =
    1 -> 1
    2 -> 2
    3 -> 3
    4 -> <missing>

I could have got it this way:

    let series1 = Series.ofObservations [(1,1);(2,2);(3,3)]
    let series2 = Series.ofObservations [(1,2);(2,2);(3,1);(4,4)]
    let series3 = series1.Zip(series2,JoinKind.Outer);;
    let series4 = series3 |> Series

How to treat NaN or non aligned values as 1s or 0s in multiplying pandas DataFrames

Question: I want to treat non-aligned or missing (NaN, Inf, -Inf) values as 1s or 0s.

    df1 = pd.DataFrame({"x":[1, 2, 3, 4, 5], "y":[3, 4, 5, 6, 7]}, index=['a', 'b', 'c', 'd', 'e'])
    df2 = pd.DataFrame({"y":[1, NaN, 3, 4, 5], "z":[3, 4, 5, 6, 7]}, index=['b', 'c', 'd', 'e', 'f'])

The above code results in the following:

    df1 * df2
         x     y    z
    a  NaN   NaN  NaN
    b  NaN   4.0  NaN
    c  NaN   NaN  NaN
    d  NaN  18.0  NaN
    e  NaN  28.0  NaN
    f  NaN   NaN  NaN

I want to ignore NaNs and also treat non-aligned values as 1s in either the left or right

Filling in missing values with forward-backward method with lag in SAS

Question: Assume that you have a table with a user name, a counter, and a score for each counter.

    data have;
    input user $ counter score;
    cards;
    A 1 .
    A 2 .
    A 3 40
    A 4 .
    A 5 20
    A 6 .
    B 1 30
    B 2 .
    C 1 .
    C 2 .
    C 3 .
    ;
    run;

Some scores are missing between some counters, and you want to fill in the same score as the previous counter. So the result will look like this:

    A 1 40
    A 2 40
    A 3 40
    A 4 40
    A 5 20
    A 6 20
    B 1 30
    B 2 30
    C 1 .
    C 2 .
    C 3 .

I managed to fill the missing score values forward by using the lag function like

Handling missing values in R

Question: I have the following data:

    Name Location profits loss sales address       revenue stocks
    AA   London   20      30   2     Lheigts,20109 54      45
    BB   Boston   NA      NA   NA    KicK,30029    NA      NA
    CC   Mumbai   NA      2    NA    New, 10023    43      NA

I would like output like the following, with the cases deleted for which profits, loss, sales, revenue, and stocks are all missing. Any ideas?

    Name Location profits loss sales address       revenue stocks
    AA   London   20      30   2     Lheigts,20109 54      45
    CC   Mumbai   NA      2    NA    New, 10023    43      NA

Answer 1: Try df1[rowSums(!is.na
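A minimal sketch of the rowSums(!is.na(...)) idea, using a reduced version of the table from the question. The rest of the original answer is cut off, so the exact expression it used is unknown; the column subset `cols` and the name `kept` are assumptions made for illustration.

```r
# Reduced version of the example data (address and Location omitted)
df1 <- data.frame(
  Name    = c("AA", "BB", "CC"),
  profits = c(20, NA, NA),
  loss    = c(30, NA, 2),
  sales   = c(2, NA, NA),
  revenue = c(54, NA, 43),
  stocks  = c(45, NA, NA)
)

# Columns that must not all be missing at once
cols <- c("profits", "loss", "sales", "revenue", "stocks")

# Keep rows with at least one non-missing value among those columns
kept <- df1[rowSums(!is.na(df1[cols])) > 0, ]
kept$Name  # AA and CC remain; BB is dropped
```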

How do I replace all NA with mean in R? [duplicate]

Question: This question already has answers here: Replace missing values with column mean (11 answers). Closed 2 years ago.

I have over 1500 columns in my dataset and 100+ of them contain at least one NA. I know I can replace NAs in a single column with

    d$var[is.na(d$var)] <- mean(d$var, na.rm=TRUE)

but how do I do this to ALL the NAs in my dataset? Thank you!

Answer 1: We can use na.aggregate from zoo. Loop through the columns of the dataset (assuming all the columns are numeric), apply the na.aggregate to
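For comparison, here is a base R sketch that applies the question's single-column idiom to every column without the zoo package. It is not the na.aggregate approach from the answer, it assumes all columns are numeric, and the data frame `d` is made up for illustration.

```r
# Illustrative all-numeric data frame with missing values
d <- data.frame(x = c(1, NA, 3), y = c(NA, 4, 6))

# Replace the NAs in each column with that column's mean;
# assigning to d[] keeps d a data frame
d[] <- lapply(d, function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
})

d$x  # 1 2 3
d$y  # 5 4 6
```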

NA when trying to summarize a subset of data (R)

Question: The whole vector is OK and has no NAs:

    > summary(data$marks)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
       1.00    6.00    6.00    6.02    7.00    7.00
    > length(data$marks)
    [1] 2528

However, when I try to summarize a subset selected by a criterion, I get lots of NAs:

    > summary(data[data$student=="John",]$marks)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
      1.000   6.000   6.000   6.169   7.000   7.000     464
    > length(data[data$student=="John",]$marks)
    [1] 523

Answer 1: I think the problem is that you have missing values for student. As a
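The behaviour described in the answer can be reproduced on a small made-up data frame (the values below are illustrative, not the asker's data). A logical index that contains NA produces an all-NA row, so missing student names show up as NA marks in the subset.

```r
# Illustrative data: marks are complete, one student name is missing
data <- data.frame(
  student = c("John", NA, "Ann", "John"),
  marks   = c(6, 7, 5, 7)
)

# NA == "John" is NA, and an NA index yields a row of NAs
with_na <- data[data$student == "John", ]$marks
with_na  # 6 NA 7

# which() drops the NA comparisons; %in% never returns NA
via_which <- data[which(data$student == "John"), ]$marks
via_in    <- data[data$student %in% "John", ]$marks
via_which  # 6 7
```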