missing-data

replace missing values in categorical data

可紊 posted on 2019-12-11 17:28:21
Question: Suppose I have a column of categorical data with the values "red", "green", and "blue", plus some empty cells:

    red
    green
    red
    blue
    NaN

I'm sure that the NaN is one of red, green, or blue. Should I replace the NaN with the average of the colors, or is that too strong an assumption? The encoding would be:

    col1 | col2 | col3
    1      0      0
    0      1      0
    1      0      0
    0      0      1
    0.5    0.25   0.25

Or should I even scale the last row down while keeping the ratio, so that these values have less influence?

    0.25   0.125  0.125

What is the usual best practice?

Answer 1: It depends on what you want to do with
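To make the arithmetic in the question concrete, here is a minimal pandas sketch that reproduces both candidate rows. It only illustrates the two options the asker describes and is not a recommendation. The mapping of col1, col2, col3 to red, green, blue is an assumption inferred from the numbers, and the 0.5 scaling factor is simply the one implied by the question's last row.

```python
import pandas as pd

# The column from the question; None marks the empty cell.
colors = pd.Series(["red", "green", "red", "blue", None], name="color")

# One-hot encode. With the default dummy_na=False the missing entry
# becomes an all-zero row. Column order is assumed to be red, green, blue.
onehot = pd.get_dummies(colors, dtype=float)[["red", "green", "blue"]]

observed = colors.notna()

# Column means over the observed rows are the empirical category frequencies.
freqs = onehot[observed].mean()

# Option 1 from the question: fill the missing row with the frequencies.
filled = onehot.copy()
filled.loc[~observed, :] = freqs.to_numpy()

# Option 2 from the question: same ratio, scaled down (factor 0.5 assumed).
scaled = onehot.copy()
scaled.loc[~observed, :] = 0.5 * freqs.to_numpy()

print(filled)
print(scaled)
```

A third option that is often considered, treating "missing" as its own category, would amount to calling `colors.fillna("missing")` before encoding. Which of these is appropriate depends on the downstream model.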

Aggregate by group and get count, mean and sd of non-NA values for different data.frame columns

别等时光非礼了梦想. posted on 2019-12-11 16:03:14
Question: I am having some difficulty counting non-missing values by group with the function below (which also gives the sd and mean):

    test <- do.call(data.frame,
                    aggregate(. ~ treatment, have,
                              function(x) c(n = sum(!is.na(x)), mean = mean(x), sd = sd(x))))

It ends up giving me the number of non-missing values across all columns in the data frame instead of for a single column. I have been looking through SO for advice and found this, this, and this helpful, but I can't figure out why the aggregate with the
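For comparison only, here is a rough pandas analogue of the per-group summary the question is after (count of non-missing values, mean, and standard deviation for each column). It does not diagnose the R call above. The data frame `have` and its columns `x` and `y` are made up for illustration; only the grouping column name `treatment` comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: x and y are missing in different rows.
have = pd.DataFrame({
    "treatment": ["a", "a", "a", "b", "b"],
    "x": [1.0, np.nan, 3.0, 4.0, 6.0],
    "y": [10.0, 20.0, 30.0, np.nan, 50.0],
})

# "count" ignores NaN, so each column gets its own non-missing count.
# "std" uses ddof=1, the same convention as R's sd().
summary = have.groupby("treatment").agg(["count", "mean", "std"])

print(summary)
```

Because each column is summarized independently, a missing value in `x` does not reduce the count reported for `y`.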

Merge three list to Dictionary but everything is out of place/not printed

泪湿孤枕 posted on 2019-12-11 14:26:34
Question: I asked this question yesterday: "Merging three lists into one dictionary". The top answer was the closest to what I needed, but on re-reading my automobile.txt file I realized two things. It has multiple car makers in the list, and only one of each is listed. Almost every piece of data seems to be out of place. I'm so close, yet so far. Here is what I have. Please note, in case it makes a difference: this is being read from a .txt file, and I appended and split the data into three lists. I know the data

Fill in missing values in pandas dataframe using mean

做~自己de王妃 posted on 2019-12-11 14:08:51
Question:

    datetime
    2012-01-01    125.5010
    2012-01-02         NaN
    2012-01-03    125.5010
    2013-01-04         NaN
    2013-01-05    125.5010
    2013-02-28    125.5010
    2014-02-28    125.5010
    2016-01-02    125.5010
    2016-01-04    125.5010
    2016-02-28         NaN

I would like to fill in the missing values in this dataframe using a climatology computed from the dataset, i.e. fill in the missing 28 Feb 2016 value by averaging the 28 Feb values from other years. How do I do this?

Answer 1: You can use groupby by month and day, and transform with fillna and mean: print df
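The approach named in the answer (group by month and day, then fill with the group mean) could look roughly like the sketch below. The answer's own code is cut off, so this is a reconstruction of the idea rather than the original snippet. The column name `value` and the numbers are made up so that the effect is visible.

```python
import numpy as np
import pandas as pd

# Hypothetical values; only the idea of a per-calendar-day mean is from the answer.
dates = pd.to_datetime([
    "2012-01-01", "2012-01-02", "2013-02-28", "2014-02-28", "2016-02-28",
])
df = pd.DataFrame({"value": [5.0, np.nan, 10.0, 20.0, np.nan]}, index=dates)

# Mean for each (month, day) pair across all years, aligned to the original rows.
# NaN values are skipped when the mean is computed.
climatology = df.groupby([df.index.month, df.index.day])["value"].transform("mean")

# Fill only the gaps; observed values are left untouched.
df["filled"] = df["value"].fillna(climatology)

print(df)
```

A date whose calendar day has no observation in any year (here 2 January) stays NaN, so a second fallback may be needed for such days.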

Fill in time series gaps with both LOCF and NOCB methods but acknowledge breaks in time series

孤者浪人 posted on 2019-12-11 12:50:04
Question: There are edits to this post at the end. I have a large dataset of daily dietary records for a population of individuals. Data are missing at random for each individual. This is an example for one individual (I will eventually generalize this solution to the population):

    > str(final_daily)
    'data.frame': 387 obs. of 10 variables:
     $ Date   : chr "2014-08-13" "2014-08-14" "2014-08-15" "2014-08-16" ...
     $ MEID.1 : Factor w/ 97 levels "","1","1.1","1.1a",..: NA NA NA 17 24 NA NA NA NA

How to combine two columns of a data-frame with missing data? [duplicate]

时光怂恿深爱的人放手 posted on 2019-12-11 10:32:58
Question: This question already has answers here: How to implement coalesce efficiently in R (8 answers); Coalesce two string columns with alternating missing values to one (6 answers). Closed 2 years ago.

This is an extension of this earlier question. How can I combine two columns of a data frame such as

    data <- data.frame('a' = c('A','B','C','D','E'), 'x' = c("t",2,NA,NA,NA), 'y' = c(NA,NA,NA,4,"r"))

displayed as

    'a' 'x' 'y'
    A   t   NA
    B   2   NA
    C   NA  NA
    D   NA  4
    E   NA  r

to get

    'a' 'mycol'
    A   t
    B   2
    C   NA
    D   4
    E   r

I
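The operation asked for here is a coalesce: take the first non-missing value across the two columns. The question is about R, so the following is only a pandas illustration of the same idea using the question's data. The values are kept as strings because R's `c("t", 2, NA)` coerces everything to character.

```python
import pandas as pd

# The data frame from the question, with NA written as None.
data = pd.DataFrame({
    "a": ["A", "B", "C", "D", "E"],
    "x": ["t", "2", None, None, None],
    "y": [None, None, None, "4", "r"],
})

# Take x where it is present, otherwise fall back to y.
data["mycol"] = data["x"].combine_first(data["y"])

result = data[["a", "mycol"]]
print(result)
```

Rows where both columns are missing (row C) remain missing in the combined column.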

Removing dates with less than Full observations

为君一笑 posted on 2019-12-11 09:32:35
Question: I have an xts object that covers 169 days of high-frequency, regular 5-minute observations, but on some days there are missing observations, i.e. fewer than 288 data points. How do I remove these days so that only days with a full set of data points remain?

    # find days in data
    ddx = endpoints(dxts, on="days");
    days = format(index(dxts)[ddx], "%Y-%m-%d");
    for (day in days) {
      x = dxts[day];
      cat('', day, "has", length(x), "records...\n");
    }

I tried RTAQ::exchangeHoursOnly(dxts, daybegin = "00:00:00", dayend =
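The filtering step described in the question (keep a day only if it has all 288 five-minute observations) can be sketched in pandas as follows. This is not xts code and is offered only as an illustration of the logic. The dates, the three-day span, and the dropped timestamps are made up.

```python
import pandas as pd

EXPECTED_PER_DAY = 288  # 24 hours * 12 five-minute bars, as in the question

# Hypothetical series: three full days of 5-minute data.
idx = pd.date_range("2020-01-01", periods=3 * EXPECTED_PER_DAY, freq="5min")
series = pd.Series(range(len(idx)), index=idx, dtype=float)

# Remove a few timestamps from the second day to simulate missing observations.
series = series.drop(idx[300:305])

# Count the observations on each calendar day, broadcast back to every row.
per_day_count = series.groupby(series.index.normalize()).transform("count")

# Keep only rows that belong to a complete day.
complete = series[per_day_count == EXPECTED_PER_DAY]

print(complete.index.normalize().unique())
```

Note that "count" ignores NaN values, so a day with all timestamps present but some NaN readings would also be dropped by this sketch.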

“Error in 1:ncol(x) : argument of length 0” when using Amelia in R

為{幸葍}努か posted on 2019-12-11 09:01:34
Question: This question was migrated from Cross Validated because it can be answered on Stack Overflow. Migrated 6 years ago.

I am working with panel data. I have well over 6,000 country-year observations, and have specified my Amelia imputation as follows:

    (CountDependentVariable, m=5, ts="year", cs="cowcode", sqrts=c("OtherCountVariable2", "OtherCount3", "OtherCount4"), ords=c("OrdinalVar1", "Ordinal Variable 2"), lgstc=c("ProportionVariale"), noms=c("NominalVar1"),p2s = 0, idvars = c("country"))

Handling missing data for the main loss, which is present for auxiliary loss

ε祈祈猫儿з posted on 2019-12-11 08:56:07
Question: I want to construct a Keras model for a dataset with a main target and an auxiliary target. I have data for the auxiliary target for all entries in my dataset, but for the main target I have data for only a subset of the data points. Consider the following example, which is supposed to predict max(min(x1, x2), x3), but for some values only the auxiliary target, min(x1, x2), is given.

    from keras.models import Model
    from keras.optimizers import Adadelta
    from keras.losses import mean_squared

R : add a column with missing values to a dataframe

百般思念 posted on 2019-12-11 08:08:08
Question: I am using financial data, and the row names of my main data frame are dates.

    > assets[1:3,1:5]
                ALD   SFN  TCO KIM   CTX
    2003-01-03 48.1 23.98 23.5  23 22.34
    2003-01-06 48.1 23.98 23.5  23 22.34
    2003-01-07 48.1 23.98 23.5  23 22.34

I would like to add a column (here, FOC$Close to assets) from a data frame of the same type in which some dates are missing:

    > FOC[1:3,1:2]
               Close Adj.Close
    2003-01-03   510       510
    2003-01-07   518       518

The missing values should just be NAs, so it would look like this:
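What the question describes is a left join on the date index: every row of `assets` is kept, and dates absent from `FOC` get a missing value. The question is about R, so the pandas sketch below only illustrates that join logic with the numbers shown in the question. Using plain strings as the index is a simplification.

```python
import pandas as pd

# The two data frames from the question, indexed by date.
assets = pd.DataFrame(
    {
        "ALD": [48.1, 48.1, 48.1],
        "SFN": [23.98, 23.98, 23.98],
        "TCO": [23.5, 23.5, 23.5],
        "KIM": [23, 23, 23],
        "CTX": [22.34, 22.34, 22.34],
    },
    index=["2003-01-03", "2003-01-06", "2003-01-07"],
)
FOC = pd.DataFrame(
    {"Close": [510, 518], "Adj.Close": [510, 518]},
    index=["2003-01-03", "2003-01-07"],
)

# Left join on the index: dates missing from FOC become NaN in the new column.
assets_with_close = assets.join(FOC["Close"])

print(assets_with_close)
```

Because the new column now contains a missing value, pandas stores it as floating point, so 510 is shown as 510.0.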