missing-data | 易学教程

Pandas - Fill NaN using multiple values

Question: I have a column (call it Column X) containing around 16,000 NaN values. The column is binary: its only possible values are 1 and 0. I want to fill the NaN values in Column X, but I don't want to use a single value for all of the NaN entries. For instance, I want to fill 50% of the NaN values with 1 and the other 50% with 0. I have read the fillna() documentation but have not found anything that provides this functionality. I have literally
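One possible way to get the 50/50 split described in the question is sketched below. It is not taken from the original thread. The helper name `fill_nan_split`, the `seed` parameter, and the small example frame are assumptions made for illustration. The idea is to build an array of fill values of exactly the right length, shuffle it, and assign it to the NaN positions.

```python
import numpy as np
import pandas as pd

def fill_nan_split(series, values=(1, 0), seed=0):
    """Fill NaN entries with `values` in (near-)equal shares, in random order.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = series.copy()
    mask = out.isna()
    n_missing = int(mask.sum())
    # Cycle through the fill values until there is one per missing entry.
    fills = np.resize(np.array(values, dtype=float), n_missing)
    # Shuffle so the 1s and 0s do not land in an alternating pattern.
    rng = np.random.default_rng(seed)
    rng.shuffle(fills)
    out.loc[mask] = fills
    return out

# Small made-up example: 4 of the 7 entries are missing.
df = pd.DataFrame({"X": [1, np.nan, 0, np.nan, np.nan, np.nan, 1]})
df["X"] = fill_nan_split(df["X"])
```

With an odd number of missing entries the two shares differ by one. If the split should follow the observed proportion of 1s and 0s instead of a fixed 50/50, the `fills` array could be drawn with `rng.choice` and explicit probabilities.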

multiple imputation and multigroup SEM in R

Question: I want to perform multigroup SEM on imputed data using the R packages mice and semTools, specifically the runMI function, which calls lavaan. I am able to do so when imputing the entire dataset at once, but while trawling through Stack Overflow and Stack Exchange I came across the recommendation to impute data separately for each level of a grouping variable (e.g. men, women), so that the features of each group are preserved (e.g. https://stats.stackexchange.com/questions/149053/questions-on

data.table replace NA with mean for multiple columns and by id

Question: If I have the following data.table:

    dat <- data.table("id"=c(1,1,1,1,2,2,2,2),
                      "var1"=c(NA,1,2,2,1,1,2,2),
                      "var2"=c(4,4,4,4,5,5,NA,4),
                      "var3"=c(4,4,4,NA,5,5,5,4))

       id var1 var2 var3
    1:  1   NA    4    4
    2:  1    1    4    4
    3:  1    2    4    4
    4:  1    2    4   NA
    5:  2    1    5    5
    6:  2    1    5    5
    7:  2    2   NA    5
    8:  2    2    4    4

How can I replace the missing values with the mean of each column within id? In my actual data I have many variables, only some of which I wish to replace, so how could this be done in a general way so that, for example, it is not

Fill in missing data pandas

Question: How can I fill in the missing data in this DataFrame? Rows are missing for days when no sales were made. How can I fill in the missing values for days where 0 of an item were sold at a particular store and date?

Input

    Dates       Store    Item   Sales
    2017-01-01  Chicago  Apple  10
    2017-01-02  NewYork  Pear   10
    2017-01-03  Chicago  Apple  10

Output

    Dates       Store    Item   Sales
    2017-01-01  Chicago  Apple  10
    2017-01-01  Chicago  Pear   0
    2017-01-02  Chicago  Apple  0
    2017-01-02  Chicago  Pear   0
    2017-01-03  Chicago  Apple  10
    2017-01-03
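The expected output above is cut off, so the sketch below rests on an assumption: the goal is one row for every combination of date, store, and item, with Sales set to 0 where no sale was recorded. Under that assumption, one common pandas approach is to build the full Cartesian product of the key columns with `pd.MultiIndex.from_product` and reindex against it. The variable names `keys`, `full_index`, and `filled` are illustrative.

```python
import pandas as pd

# The input table from the question.
df = pd.DataFrame({
    "Dates": pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-03"]),
    "Store": ["Chicago", "NewYork", "Chicago"],
    "Item": ["Apple", "Pear", "Apple"],
    "Sales": [10, 10, 10],
})

keys = ["Dates", "Store", "Item"]

# Every (date, store, item) combination built from the observed values.
full_index = pd.MultiIndex.from_product(
    [df[k].unique() for k in keys], names=keys
)

filled = (
    df.set_index(keys)
      .reindex(full_index, fill_value=0)  # combinations not in df get Sales = 0
      .sort_index()
      .reset_index()
)
```

`reindex` requires the key combinations in `df` to be unique. If the same date, store, and item can appear more than once, the rows would need to be aggregated first (for example with `groupby(keys).sum()`).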

SAS Proc Import CSV and missing data

Question: I'm trying to import some datasets in SAS and join them. The only problem is that I get this error after joining them:

    proc import datafile='filepath/datasetA.csv' out = dataA dbms= csv replace;
    run;
    proc import datafile='filepath\datasetB.csv' out = dataB dbms= csv replace;
    run;
    /* combine them all into one dataset */
    data DataC;
        set &dataA. &dataB;
    run;

    ERROR: Variable column_k has been defined as both character and numeric

The column in question looks something like this in both of the

Substituting missing values in Python

Question: I want to substitute missing values (None) with the last known value. This is my code, but it doesn't work. Any suggestions for a better algorithm?

    t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]

    def treat_missing_values(table):
        for line in table:
            for value in line:
                if value == None:
                    value = line[line.index(value)-1]
        return table

    print treat_missing_values(t)

Answer 1: This is probably how I'd do it:

    >>> def treat_missing_values(table):
    ...     for line in table:
    ...
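The answer above is cut off, so the following is a separate minimal sketch in Python 3, not a reconstruction of it. The question's code has no effect for two reasons: assigning to the loop variable `value` does not modify the list, and `line.index(value)` always finds the first None in the row. The sketch instead tracks the last known value while building a new row. It assumes that a None at the start of a row, with nothing before it, should stay None.

```python
def treat_missing_values(table):
    """Return a copy of `table` with each None replaced by the last known value in its row."""
    result = []
    for line in table:
        filled = []
        last = None  # stays None until the row yields a known value
        for value in line:
            if value is not None:
                last = value
            filled.append(last)
        result.append(filled)
    return result

t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
print(treat_missing_values(t))
# [[1, 3, 3, 5, 5], [2, 2, 2, 3, 1], [4, 4, 2, 1, 1]]
```

Building new lists leaves the input table untouched, which makes the function easier to test than an in-place version.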