missing-data

Python, Pandas : Return only those rows which have missing values

ぐ巨炮叔叔 submitted on 2019-11-28 18:13:49
While working with Pandas in Python, I have a dataset that contains some missing values, and I'd like to return a dataframe containing only those rows which have missing data. Is there a nice way to do this? (My current method is an inefficient "look to see what index isn't in the dataframe without the missing values, then make a df out of those indices.") metersk
You can use any(axis=1) to check for at least one True per row, then filter with boolean indexing: null_data = df[df.isnull().any(axis=1)]
Similar to metersk's answer, null_data = df[np.logical_or.reduce(df
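The boolean-indexing answer above can be tried on a small frame. This is a minimal sketch; the sample data and the column names `a`, `b`, `c` are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": ["x", "y", None, "w"],
    "c": [10, 20, 30, 40],
})

# isnull() flags each missing cell; any(axis=1) reduces every row to one flag.
has_missing = df.isnull().any(axis=1)

# Boolean indexing keeps only the rows whose flag is True.
null_data = df[has_missing]
print(null_data)
```

Inverting the mask (`df[~has_missing]`) gives the complementary set of complete rows.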

How to replace NA (missing values) in a data frame with neighbouring values

血红的双手。 submitted on 2019-11-28 16:54:31
862 2006-05-19 6.241603 5.774208
863 2006-05-20 NA NA
864 2006-05-21 NA NA
865 2006-05-22 6.383929 5.906426
866 2006-05-23 6.782068 6.268758
867 2006-05-24 6.534616 6.013767
868 2006-05-25 6.370312 5.856366
869 2006-05-26 6.225175 5.781617
870 2006-05-27 NA NA
I have a data frame x like the one above with some NAs, which I want to fill using neighbouring non-NA values; for example, the value for 2006-05-20 would be the average of the values for 2006-05-19 and 2006-05-22. How can I do this?
Properly formatted, your data looks like this:
862 2006-05-19 6.241603 5.774208
863 2006-05-20 NA NA
864 2006-05-21 NA NA
865 2006-05-22 6.383929 5.906426
866 2006-05-23
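The question concerns an R data frame, but the idea of averaging the nearest non-missing neighbours can be sketched in pandas. This is only an illustrative analogue, not the answer given in the thread: it uses the first numeric column from the excerpt, and the fallback to a single neighbour at the edges is an assumption.

```python
import numpy as np
import pandas as pd

# First numeric column of the excerpt (rows 862 to 870), indexed by date.
x = pd.Series(
    [6.241603, np.nan, np.nan, 6.383929, 6.782068,
     6.534616, 6.370312, 6.225175, np.nan],
    index=pd.date_range("2006-05-19", periods=9, freq="D"),
)

prev_val = x.ffill()  # nearest earlier non-missing value
next_val = x.bfill()  # nearest later non-missing value

# Average the two neighbours; where only one side exists (assumed rule),
# fall back to the neighbour that is available.
filled = ((prev_val + next_val) / 2).fillna(prev_val).fillna(next_val)
print(filled)
```

If values should instead change gradually across a gap, `x.interpolate()` performs linear interpolation, which weights the neighbours by distance rather than averaging them equally.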

Delete rows with blank values in one particular column

断了今生、忘了曾经 submitted on 2019-11-28 15:43:01
I am working on a large dataset, with some rows with NAs and others with blanks:
df <- data.frame(ID = c(1:7),
                 home_pc = c("","CB4 2DT", "NE5 7TH", "BY5 8IB", "DH4 6PB","MP9 7GH","KN4 5GH"),
                 start_pc = c(NA,"Home", "FC5 7YH","Home", "CB3 5TH", "BV6 5PB",NA),
                 end_pc = c(NA,"CB5 4FG","Home","","Home","",NA))
How do I remove the NAs and blanks in one go (in the start_pc and end_pc columns)? In the past I have used:
df <- df[-which(is.na(df$start_pc)), ]
... to remove the NAs. Is there a similar command to remove the blanks?
df[!(is.na(df$start_pc) | df$start_pc==""), ] Andrie
It is the same
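For comparison, the same filter can be sketched in pandas. This is an illustrative analogue of the R answer above, not part of the original thread; the frame mirrors the question's data, and the names `cols`, `bad` and `clean` are assumptions.

```python
import numpy as np
import pandas as pd

# Same data as the R example, rebuilt as a pandas DataFrame.
df = pd.DataFrame({
    "ID": range(1, 8),
    "home_pc": ["", "CB4 2DT", "NE5 7TH", "BY5 8IB", "DH4 6PB", "MP9 7GH", "KN4 5GH"],
    "start_pc": [np.nan, "Home", "FC5 7YH", "Home", "CB3 5TH", "BV6 5PB", np.nan],
    "end_pc": [np.nan, "CB5 4FG", "Home", "", "Home", "", np.nan],
})

cols = ["start_pc", "end_pc"]

# A row is dropped if any of the chosen columns is NA or an empty string.
bad = (df[cols].isna() | df[cols].eq("")).any(axis=1)
clean = df[~bad]
print(clean)
```

The blank in `home_pc` is ignored on purpose, because only the two listed columns are checked.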

Function to change Null to NA

僤鯓⒐⒋嵵緔 submitted on 2019-11-28 13:31:16
I'm trying to write a function that turns Null values into NA. A summary of one of my columns looks like this: a b 12 210 468 I'd like to change the 12 empty values to NA. I also have a few other factor columns for which I'd like to change Null values to NA, so I borrowed some stuff from here and there to come up with this:
# change nulls to NAs
nullToNA <- function(df){
  # split df into numeric & non-numeric columns
  a<-df[,sapply(df, is.numeric), drop = FALSE]
  b<-df[,sapply(df, Negate(is.numeric)), drop = FALSE]
  # Change empty strings to NA
  b<-b[lapply(b,function(x) levels(x) <- c(levels(x),

Elegant way to report missing values in a data.frame

假装没事ソ submitted on 2019-11-28 13:29:12
Question: Here's a little piece of code I wrote to report variables with missing values from a data frame. I'm trying to think of a more elegant way to do this, one that perhaps returns a data.frame, but I'm stuck:
for (Var in names(airquality)) {
  missing <- sum(is.na(airquality[,Var]))
  if (missing > 0) {
    print(c(Var,missing))
  }
}
Edit: I'm dealing with data.frames with dozens to hundreds of variables, so it's key that we only report variables with missing values.
Answer 1: Just use sapply > sapply
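The same report can be sketched in pandas as a rough analogue of the R loop. The values below are made up and are not the real airquality dataset; the output column names `variable` and `missing` are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few airquality columns (values are made up).
df = pd.DataFrame({
    "Ozone": [41.0, np.nan, 12.0, np.nan],
    "Solar.R": [190.0, 118.0, np.nan, 313.0],
    "Wind": [7.4, 8.0, 12.6, 11.5],
})

# Count missing cells per column, then keep only columns that have any.
counts = df.isna().sum()
report = (
    counts[counts > 0]
    .rename_axis("variable")
    .reset_index(name="missing")
)
print(report)
```

Because the result is itself a DataFrame, it can be sorted or filtered further, which helps when there are hundreds of variables.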

Error in na.fail.default: missing values in object - but no missing values

随声附和 submitted on 2019-11-28 11:58:00
I am trying to run an lme model with these data:
tot_nochc=runif(10,1,15)
cor_partner=factor(c(1,1,0,1,0,0,0,0,1,0))
age=runif(10,18,75)
agecu=age^3
day=factor(c(1,2,2,3,3,NA,NA,4,4,4))
dt=as.data.frame(cbind(tot_nochc,cor_partner,agecu,day))
attach(dt)
corpart.lme.1=lme(tot_nochc~cor_partner+agecu+cor_partner *agecu, random = ~cor_partner+agecu+cor_partner *agecu |day, na.exclude(day))
I get this error:
Error in na.fail.default(list(cor_partner = c(1L, 1L, 2L, 1L, 1L, 1L, : missing values in object
I am aware there are similar questions in the forum. However, in my case: cor_partner has

MATLAB: Using interpolation to replace missing values (NaN)

隐身守侯 submitted on 2019-11-28 11:19:22
I have a cell array in which each cell contains a sequence of values as a row vector. The sequences contain some missing values represented by NaN. I would like to replace all NaNs using some sort of interpolation method. How can I do this in MATLAB? I am also open to other suggestions on how to deal with these missing values. Consider this sample data to illustrate the problem:
seq = {randn(1,10); randn(1,7); randn(1,8)};
for i=1:numel(seq)
    %# simulate some missing values
    ind = rand( size(seq{i}) ) < 0.2;
    seq{i}(ind) = nan;
end
The resulting sequences: seq{1} ans = -0.50782 -0.32058 NaN -3.0292 -0
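The question is about MATLAB, but the underlying idea can be sketched with NumPy as an illustrative analogue. The helper name `fill_nan_linear` and the sample sequences are assumptions, not taken from the thread. Gaps are filled by linear interpolation over the sample positions, and leading or trailing NaNs take the nearest valid value.

```python
import numpy as np

def fill_nan_linear(values):
    """Return a copy of `values` with NaNs replaced by linear interpolation."""
    values = np.asarray(values, dtype=float)
    missing = np.isnan(values)
    if missing.all():
        # Nothing to interpolate from; return the sequence unchanged.
        return values.copy()
    positions = np.arange(values.size)
    filled = values.copy()
    # np.interp clamps to the first/last valid value outside the known range.
    filled[missing] = np.interp(
        positions[missing], positions[~missing], values[~missing]
    )
    return filled

# Hypothetical sequences of different lengths, like the cells of `seq`.
seq = [
    np.array([1.0, np.nan, 3.0, np.nan, np.nan, 6.0]),
    np.array([np.nan, 2.0, np.nan]),
]
filled_seq = [fill_nan_linear(s) for s in seq]
print(filled_seq)
```

Each sequence is handled independently, so no values leak from one sequence into another.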

Replace missing values with a value from another column

若如初见. submitted on 2019-11-28 10:37:39
If I have:
s <- data.frame(ID=c(191, 282, 202, 210), Group=c("", "A", "", "B"), stringsAsFactors=FALSE)
s
   ID Group
1 191
2 282     A
3 202
4 210     B
I can replace the empty cells with N like this:
s$Group[s$Group==""]<-"N"
s
   ID Group
1 191     N
2 282     A
3 202     N
4 210     B
But I would need to replace the empty cells with a value from another column. How can I accomplish this?
s
   ID Group Group2
1 191     D      D
2 282     A      G
3 202     G      G
4 210     B      D
ifelse(test, yes, no) is a handy function to do just that, and it can be used on vectors. Using your last data.frame: s <- data.frame(ID = c(191, 282, 202, 210), Group = c(""
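The vectorised "take the other column where this one is empty" idea behind `ifelse` can also be sketched in pandas, as an illustrative analogue rather than the thread's answer. The frame mirrors the question's data.

```python
import pandas as pd

# Same data as the question, with the fallback column Group2 included.
s = pd.DataFrame({
    "ID": [191, 282, 202, 210],
    "Group": ["", "A", "", "B"],
    "Group2": ["D", "G", "G", "D"],
})

# where() keeps Group where the condition is True and takes the
# corresponding Group2 value everywhere else.
s["Group"] = s["Group"].where(s["Group"] != "", s["Group2"])
print(s)
```

Rows whose `Group` was already set keep their value, so only rows 191 and 202 change.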

Replace NaN or missing values with rolling mean or other interpolation

末鹿安然 submitted on 2019-11-28 10:09:43
I have a pandas dataframe with monthly data that I want to compute a 12-month moving average for. Data for every January is missing (NaN), however, so when I use pd.rolling_mean(data["variable"], 12, center=True) it just gives me all NaN values. Is there a simple way to ignore the NaN values? I understand that in practice this would become an 11-month moving average. The dataframe has other variables which do have January data, so I don't want to just throw out the January rows and do an 11-month moving average.
There are several ways to approach this, and the best
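One possible approach, sketched here as an assumption rather than as the answer the excerpt goes on to give, is to lower `min_periods` so that a window with 11 valid months still produces a mean. The sketch uses the `Series.rolling` method because `pd.rolling_mean` was removed from later pandas releases. The data is synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data with every January set to NaN.
idx = pd.date_range("2000-01-01", periods=36, freq="MS")
data = pd.DataFrame({"variable": np.arange(36, dtype=float)}, index=idx)
data.loc[data.index.month == 1, "variable"] = np.nan

# Default min_periods equals the window size, so any window that
# contains a NaN (here: every window) yields NaN.
strict = data["variable"].rolling(12, center=True).mean()

# Allowing 11 valid observations lets the mean skip the missing January.
relaxed = data["variable"].rolling(12, center=True, min_periods=11).mean()

print(strict.notna().sum(), relaxed.notna().sum())
```

The relaxed version averages over the 11 available months in each window, which matches the "11-month moving average in practice" described in the question.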

Multivariate LSTM with missing values

杀马特。学长 韩版系。学妹 submitted on 2019-11-28 09:59:45
I am working on a Time Series Forecasting problem using LSTM. The input contains several features, so I am using a Multivariate LSTM. The problem is that there are some missing values, for example:
   Feature 1  Feature 2  ...  Feature n
1  2          4               nan
2  5          8               10
3  8          8               5
4  nan        7               7
5  6          nan             12
Interpolating the missing values can introduce bias in the results, because sometimes there are a lot of consecutive timestamps with missing values on the same feature. Instead, I would like to know if there is a way to let the LSTM learn with the missing values, for example, using a masking layer or something