missing-data

Pandas - Fill NaN using multiple values

久未见 submitted on 2020-06-26 06:01:49
Question: I have a column (let's call it Column X) containing around 16,000 NaN values. The column has two possible values, 1 or 0 (so it is effectively binary). I want to fill the NaN values in Column X, but I don't want to use a single value for all of the NaN entries. For instance, I want to fill 50% of the NaN values with '1' and the other 50% with '0'. I have read the 'fillna()' documentation but have not found anything that provides this functionality. I have literally
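One way the 50/50 fill described in this question could be approached is sketched below. It is only an illustration, not the accepted answer: the helper name `fill_nan_split`, the `seed` parameter, and the small sample frame are assumptions made for this example. The idea is to build an array with one fill value per NaN, shuffle it, and assign it through a boolean mask.

```python
import numpy as np
import pandas as pd


def fill_nan_split(series, values=(1, 0), seed=0):
    """Fill NaNs by cycling through `values`, so each value is used about equally often.

    Hypothetical helper for illustration; not part of pandas.
    """
    rng = np.random.default_rng(seed)
    out = series.copy()
    mask = out.isna()
    n_missing = int(mask.sum())
    # Repeat the candidate values until there is one per NaN, then shuffle
    # so the 1s and 0s are not laid out in a fixed alternating pattern.
    fill = np.resize(np.asarray(values, dtype=float), n_missing)
    rng.shuffle(fill)
    out.loc[mask] = fill
    return out


# Assumed sample data: column X with four NaN entries.
df = pd.DataFrame({"X": [1, np.nan, 0, np.nan, np.nan, 1, np.nan]})
df["X"] = fill_nan_split(df["X"])
print(df)
```

With an odd number of NaN entries, the first value in `values` is used one extra time.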

multiple imputation and multigroup SEM in R

匆匆过客 submitted on 2020-06-17 15:17:18
Question: I want to perform multigroup SEM on imputed data using the R packages mice and semTools, specifically the runMI function that calls lavaan. I am able to do so when imputing the entire dataset at once, but whilst trawling through Stack Overflow/Stack Exchange I have come across the recommendation to impute data separately for each level of a grouping variable (e.g. men, women), so that the features of each group are preserved (e.g. https://stats.stackexchange.com/questions/149053/questions-on

data.table replace NA with mean for multiple columns and by id

一曲冷凌霜 submitted on 2020-06-14 06:45:09
Question: If I have the following data.table:

    dat <- data.table("id"=c(1,1,1,1,2,2,2,2),
                      "var1"=c(NA,1,2,2,1,1,2,2),
                      "var2"=c(4,4,4,4,5,5,NA,4),
                      "var3"=c(4,4,4,NA,5,5,5,4))

       id var1 var2 var3
    1:  1   NA    4    4
    2:  1    1    4    4
    3:  1    2    4    4
    4:  1    2    4   NA
    5:  2    1    5    5
    6:  2    1    5    5
    7:  2    2   NA    5
    8:  2    2    4    4

How can I replace the missing values with the mean of each column within id? In my actual data I have many variables, only some of which I wish to replace, so how could this be done in a general way so that, for example, it is not

Fill in missing data pandas

╄→尐↘猪︶ㄣ submitted on 2020-04-06 18:17:33
Question: How can I fill in the missing data in this DataFrame? Values are missing for days when no sales were made. How can I fill in the missing values for days where 0 of an item were sold at a particular store and date?

Input

    Dates       Store    Item   Sales
    2017-01-01  Chicago  Apple  10
    2017-01-02  NewYork  Pear   10
    2017-01-03  Chicago  Apple  10

Output

    Dates       Store    Item   Sales
    2017-01-01  Chicago  Apple  10
    2017-01-01  Chicago  Pear   0
    2017-01-02  Chicago  Apple  0
    2017-01-02  Chicago  Pear   0
    2017-01-03  Chicago  Apple  10
    2017-01-03
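A possible approach to this question, sketched under the assumption that every date/store/item combination seen in the input should appear in the output: build the full Cartesian product with `pd.MultiIndex.from_product` and reindex against it with `fill_value=0`. The variable names `sales`, `full_index`, and `filled` are chosen for this example only, and the question's expected output is truncated, so it cannot be confirmed that this matches it row for row.

```python
import pandas as pd

# The input table from the question.
sales = pd.DataFrame({
    "Dates": pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-03"]),
    "Store": ["Chicago", "NewYork", "Chicago"],
    "Item": ["Apple", "Pear", "Apple"],
    "Sales": [10, 10, 10],
})

# Every combination of the dates, stores, and items that occur in the data.
full_index = pd.MultiIndex.from_product(
    [sales["Dates"].unique(), sales["Store"].unique(), sales["Item"].unique()],
    names=["Dates", "Store", "Item"],
)

# Reindex onto the full grid; combinations with no recorded sale get 0.
filled = (
    sales.set_index(["Dates", "Store", "Item"])
    .reindex(full_index, fill_value=0)
    .reset_index()
)
print(filled)
```

This requires the (Dates, Store, Item) combinations in the input to be unique; if the same combination can appear more than once, the rows would need to be aggregated first (for example with `groupby(...).sum()`).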

SAS Proc Import CSV and missing data

穿精又带淫゛_ submitted on 2020-02-23 06:27:50
Question: So, I'm trying to import some datasets in SAS and join them; the only problem is that I get this error after joining them:

    proc import datafile='filepath/datasetA.csv' out = dataA dbms= csv replace; run;
    proc import datafile='filepath\datasetB.csv' out = dataB dbms= csv replace; run;
    /* combine them all into one dataset*/
    data DataC; set &dataA. &dataB; run;

    ERROR: Variable column_k has been defined as both character and numeric

The column in question looks something like this in both of the

Substituting missing values in Python

泪湿孤枕 submitted on 2020-02-02 13:14:07
Question: I want to substitute missing values (None) with the last previous known value. This is my code, but it doesn't work. Any suggestions for a better algorithm?

    t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]

    def treat_missing_values(table):
        for line in table:
            for value in line:
                if value == None:
                    value = line[line.index(value)-1]
        return table

    print treat_missing_values(t)

Answer 1: This is probably how I'd do it:

    >>> def treat_missing_values(table):
    ...     for line in table:
    ...
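The answer above is cut off, so the following is a separate minimal sketch rather than a reconstruction of it. The question's code has no effect because assigning to the loop variable `value` does not modify the list, and `line.index(value)` always finds the first `None` in the row. The version below (written for Python 3) tracks the last known value while building a new row. It assumes that a `None` at the start of a row, with no earlier value to copy, should stay `None`.

```python
def treat_missing_values(table):
    """Return a new table with each None replaced by the previous known value in its row."""
    result = []
    for line in table:
        filled = []
        last = None  # nothing known yet at the start of a row
        for value in line:
            if value is not None:
                last = value
            filled.append(last)
        result.append(filled)
    return result


t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
print(treat_missing_values(t))
# prints [[1, 3, 3, 5, 5], [2, 2, 2, 3, 1], [4, 4, 2, 1, 1]]
```

The input table is left unchanged; the function returns a filled copy.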