missing-data

Using R to shift values to the left of data.frame [duplicate]

六眼飞鱼酱① submitted on 2019-12-18 09:32:47
Question: This question already has answers here: How to move cells with a value row-wise to the left in a dataframe [duplicate] (5 answers), Shifting non-NA cells to the left (5 answers). Closed last year.

Okay, so I have this data.frame:

        A      B      C
1  yellow purple   <NA>
2    <NA>   <NA> yellow
3  orange yellow   <NA>
4  orange   <NA>  brown
5    <NA>  brown purple
6  yellow purple   pink
7  purple  green   pink
8  yellow   pink  green
9  purple orange   <NA>
10 purple   <NA>  brown

And I am interested in taking all the missing values from

Using R to shift values to the left of data.frame [duplicate]

£可爱£侵袭症+ submitted on 2019-12-18 09:30:04
Question: This question already has answers here: How to move cells with a value row-wise to the left in a dataframe [duplicate] (5 answers), Shifting non-NA cells to the left (5 answers). Closed last year.

Okay, so I have this data.frame:

        A      B      C
1  yellow purple   <NA>
2    <NA>   <NA> yellow
3  orange yellow   <NA>
4  orange   <NA>  brown
5    <NA>  brown purple
6  yellow purple   pink
7  purple  green   pink
8  yellow   pink  green
9  purple orange   <NA>
10 purple   <NA>  brown

And I am interested in taking all the missing values from

Winsorizing data by column in pandas with NaN

二次信任 submitted on 2019-12-18 08:54:35
Question: I'd like to winsorize several columns of data in a pandas DataFrame. Each column has some NaN values, which affect the winsorization, so they need to be removed. The only way I know how to do this is to remove them across all of the data, rather than column by column.

MWE:

    import numpy as np
    import pandas as pd
    from scipy.stats.mstats import winsorize

    # Create Dataframe
    N, M, P = 10**5, 4, 10**2
    dates = pd.date_range('2001-01-01', periods=N//P, freq='D').repeat(P)
    df = pd.DataFrame
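One way to handle the NaN values column by column, sketched here under stated assumptions, is to clip each column at its own quantiles with pandas instead of calling `scipy.stats.mstats.winsorize`. This is a swapped-in technique, not the asker's method: `DataFrame.quantile` skips NaN by default and interpolates linearly, so the limits can differ slightly from the rank-based limits that `winsorize` uses. The function name `winsorize_columns` and the 5%/95% defaults are illustrative choices.

```python
import numpy as np
import pandas as pd


def winsorize_columns(df, lower=0.05, upper=0.95):
    """Clip every column at its own lower/upper quantile.

    Hypothetical helper: quantiles are computed per column with NaN
    skipped, and NaN cells are left as NaN in the result.
    """
    lo = df.quantile(lower)  # Series indexed by column name
    hi = df.quantile(upper)
    # axis=1 aligns the Series of limits with the columns
    return df.clip(lower=lo, upper=hi, axis=1)


example = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 100.0, np.nan],
    "b": [np.nan, -50.0, 1.0, 2.0, 3.0, 4.0],
})
clipped = winsorize_columns(example, lower=0.2, upper=0.8)
print(clipped)
```

Because the limits come from each column's non-missing values only, no rows need to be dropped from the frame as a whole.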

Python: Sliding windowed mean, ignoring missing data

人走茶凉 submitted on 2019-12-18 06:17:30
Question: I am currently trying to process an experimental time-series dataset that has missing values. I would like to calculate the sliding windowed mean of this dataset along time while handling NaN values. The correct way for me to do it is to compute, inside each window, the sum of the finite elements and divide it by their count. This nonlinearity forces me to use non-convolutional methods for this problem, so I have a severe time bottleneck in this part of the process. As a code example
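The windowed mean described in the question is a ratio of two moving sums: the sum of the finite values and the count of finite values. Each of those sums is itself a convolution, so one possible approach (a sketch, not necessarily what the original answers propose) is to convolve a zero-filled copy of the data and the finite mask separately and then divide. The function name `nan_sliding_mean` is hypothetical.

```python
import numpy as np


def nan_sliding_mean(x, window):
    """Sliding mean over `window` samples that ignores non-finite values.

    Windows that contain no finite value yield NaN. Only full windows
    are returned (mode="valid"), so the output has len(x) - window + 1
    elements.
    """
    x = np.asarray(x, dtype=float)
    finite = np.isfinite(x)
    kernel = np.ones(window)
    # Moving sum of the finite values (missing entries contribute 0)
    sums = np.convolve(np.where(finite, x, 0.0), kernel, mode="valid")
    # Moving count of finite values
    counts = np.convolve(finite.astype(float), kernel, mode="valid")
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0.5, sums / counts, np.nan)


data = [1.0, np.nan, 3.0, 4.0, np.nan, np.nan]
print(nan_sliding_mean(data, 2))
```

For long windows, `np.convolve` could be swapped for a cumulative-sum or FFT-based moving sum; the division step stays the same.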

Error in na.fail.default: missing values in object - but no missing values

流过昼夜 submitted on 2019-12-17 19:58:05
Question: I am trying to run an lme model with these data:

    tot_nochc=runif(10,1,15)
    cor_partner=factor(c(1,1,0,1,0,0,0,0,1,0))
    age=runif(10,18,75)
    agecu=age^3
    day=factor(c(1,2,2,3,3,NA,NA,4,4,4))
    dt=as.data.frame(cbind(tot_nochc,cor_partner,agecu,day))
    attach(dt)
    corpart.lme.1=lme(tot_nochc~cor_partner+agecu+cor_partner *agecu, random = ~cor_partner+agecu+cor_partner *agecu |day, na.exclude(day))

I get this error code:

    Error in na.fail.default(list(cor_partner = c(1L, 1L, 2L, 1L, 1L, 1L, : missing

Pandas: How to fill null values with mean of a groupby?

╄→гoц情女王★ submitted on 2019-12-17 19:39:17
Question: I have a dataset with some missing data that looks like this:

    id  category  value
    1   A         NaN
    2   B         NaN
    3   A         10.5
    4   C         NaN
    5   A         2.0
    6   B         1.0

I need to fill in the nulls to use the data in a model. Every time a category occurs for the first time, its value is NULL. For categories like A and B that have more than one value, I want to replace the nulls with the average of that category. For category C, which occurs only once, I just want to fill in the average of the rest of the data. I know that I
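The two-step rule in the question (category mean first, overall mean as the fallback) can be sketched with `groupby(...).transform("mean")` followed by two `fillna` calls. This is a minimal sketch using the six rows shown above; the function name `fill_with_group_mean` is hypothetical.

```python
import numpy as np
import pandas as pd


def fill_with_group_mean(df, group_col="category", value_col="value"):
    """Fill NaN with the mean of the row's group, then the overall mean.

    A group with no observed values has a NaN mean, so its rows fall
    through to the overall mean of the observed data.
    """
    group_mean = df.groupby(group_col)[value_col].transform("mean")
    overall_mean = df[value_col].mean()  # NaN is skipped by default
    return df[value_col].fillna(group_mean).fillna(overall_mean)


frame = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "category": ["A", "B", "A", "C", "A", "B"],
    "value": [np.nan, np.nan, 10.5, np.nan, 2.0, 1.0],
})
frame["value_filled"] = fill_with_group_mean(frame)
print(frame)
```

With this data, the missing A value becomes 6.25 (the mean of 10.5 and 2.0), the missing B value becomes 1.0, and the C row falls back to the overall mean of 4.5.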

Replace missing values with a value from another column

巧了我就是萌 submitted on 2019-12-17 19:37:22
Question: If I have:

    s <- data.frame(ID=c(191, 282, 202, 210), Group=c("", "A", "", "B"), stringsAsFactors=FALSE)
    s
       ID Group
    1 191
    2 282     A
    3 202
    4 210     B

I can replace the empty cells with N like this:

    s$Group[s$Group==""]<-"N"
    s
       ID Group
    1 191     N
    2 282     A
    3 202     N
    4 210     B

But I would need to replace the empty cells with a value from another column. How can I accomplish this?

    s
       ID Group Group2
    1 191     D      D
    2 282     A      G
    3 202     G      G
    4 210     B      D

Answer 1: ifelse(test, yes, no) is a handy function to do just that, and it

Replace NaN or missing values with rolling mean or other interpolation

落花浮王杯 submitted on 2019-12-17 18:55:20
Question: I have a pandas dataframe with monthly data for which I want to compute a 12-month moving average. Data for every January is missing (NaN), however, so I am using pd.rolling_mean(data["variable"], 12, center=True) but it just gives me all NaN values. Is there a simple way that I can ignore the NaN values? I understand that in practice this would become an 11-month moving average. The dataframe has other variables which have January data, so I don't want to just throw out the
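A rolling window returns NaN whenever it holds fewer non-missing observations than `min_periods`, which defaults to the window size for a fixed-size window. Since every 12-month window contains a January, every result is NaN. A minimal sketch of one way around this uses the `Series.rolling` method (the top-level `pd.rolling_mean` function has been removed from current pandas) with a lower `min_periods`. The helper name `rolling_mean_skipna` and the synthetic data are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def rolling_mean_skipna(series, window=12, min_periods=1):
    """Centered rolling mean that averages whatever non-NaN values
    each window contains, as long as there are at least `min_periods`."""
    return series.rolling(window, center=True, min_periods=min_periods).mean()


# Synthetic monthly data with every January missing (illustrative only)
months = pd.date_range("2000-01-01", periods=36, freq="MS")
variable = pd.Series(np.arange(36, dtype=float), index=months)
variable[variable.index.month == 1] = np.nan

smoothed = rolling_mean_skipna(variable, window=12, min_periods=11)
# Optionally use the smoothed values to fill only the missing months
filled = variable.fillna(rolling_mean_skipna(variable, window=12))
print(filled.head(13))
```

Setting `min_periods=11` keeps only the windows that are complete apart from the one missing January, while `min_periods=1` also produces values near the ends of the series.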

how to insert missing observations on a data frame

岁酱吖の submitted on 2019-12-17 16:53:43
Question: I have data that are observations over time. Unfortunately, some large gaps of time points are missing for one treatment. They are not coded as NA, and if I make a plot out of them it becomes apparent. My data frame looks like this. The number of samples per time point is irregular. (edit: sorry for not making the example reproducible)s structure(list(A = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,

Filling in missing (blanks) in a data table, per category - backwards and forwards

瘦欲@ submitted on 2019-12-17 07:26:21
Question: I am working with a large data set of billing records for my clinical practice over 11 years. Quite a few of the rows are missing the referring physician. Using some rules I can quite easily fill them in, but I do not know how to implement this in data.table under R. I know that there are things such as na.locf in the zoo package and the self rolling join in the data.table package. The examples that I have seen are too simplistic and do not help me. Here is some fictitious data to orient you