missing-data | 易学教程

replace missing values in categorical data

Question: Suppose I have a column of categorical data with the values "red", "green", and "blue", plus some empty cells:

    red
    green
    red
    blue
    NaN

I'm sure that the NaN is one of red, green, or blue. Should I replace the NaN with the average of the colors, or is that too strong an assumption? It would be:

    col1 | col2 | col3
    1      0      0
    0      1      0
    1      0      0
    0      0      1
    0.5    0.25   0.25

Or should I even scale the last row, keeping the ratio, so that these values have less influence? What is the usual best practice?

    0.25   0.125  0.125

Answer 1: It depends on what you want to do with
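The encoding described in the question can be sketched in plain Python. This is only an illustration of the arithmetic, not a recommendation: the `encode` helper and the `weight` parameter are hypothetical names, and the weight of 0.5 is an assumption chosen to reproduce the scaled row shown in the question.

```python
from collections import Counter

# Column from the question; None marks the missing cell.
colors = ["red", "green", "red", "blue", None]
categories = ["red", "green", "blue"]

# Proportion of each category among the observed (non-missing) values.
observed = [c for c in colors if c is not None]
counts = Counter(observed)
proportions = [counts[c] / len(observed) for c in categories]


def encode(value, weight=1.0):
    """One-hot encode a value; a missing value gets the observed proportions.

    `weight` (hypothetical) scales the imputed row while keeping its ratio.
    """
    if value is None:
        return [weight * p for p in proportions]
    return [1.0 if value == c else 0.0 for c in categories]


rows = [encode(c) for c in colors]       # last row is the "average" row
scaled_last = encode(None, weight=0.5)   # same ratio, half the magnitude

for row in rows:
    print(row)
print(scaled_last)
```

Whether the proportion row or its scaled version is appropriate depends on the downstream model, as the truncated answer suggests.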

Aggregate by group and get count, mean and sd of non-NA values for different data.frame columns

Question: I am having some difficulty counting non-missing values by group with the function below (which also gives the sd and mean):

    test <- do.call(data.frame, aggregate(. ~ treatment, have, function(x) c(n = sum(!is.na(x)), mean = mean(x), sd = sd(x))))

It ends up giving me the number of non-missing values for all columns in the data frame instead of for a single column. I have been looking through SO for advice and found this, this, and this helpful, but I can't figure out why the aggregate with the
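For comparison, here is a minimal Python sketch of the intended result: the count, mean, and standard deviation of the non-missing values, computed separately for each column within each group. The data and the `summarize` helper are hypothetical. As a possible explanation for the behaviour in the question, the formula method of R's `aggregate` defaults to `na.action = na.omit`, which drops every row containing an NA before the summary function runs, so all columns would report the same count.

```python
import math
from statistics import mean, stdev

# Hypothetical data; None marks a missing value.
rows = [
    {"treatment": "a", "x": 1.0, "y": 10.0},
    {"treatment": "a", "x": None, "y": 12.0},
    {"treatment": "a", "x": 3.0, "y": None},
    {"treatment": "b", "x": 4.0, "y": 20.0},
    {"treatment": "b", "x": 6.0, "y": None},
]


def summarize(rows, group_key, columns):
    """Per group and per column: n, mean, and sd of the non-missing values."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row)

    summary = {}
    for group, members in groups.items():
        summary[group] = {}
        for col in columns:
            # Missing values are dropped column by column, not row by row.
            values = [m[col] for m in members if m[col] is not None]
            summary[group][col] = {
                "n": len(values),
                "mean": mean(values) if values else math.nan,
                "sd": stdev(values) if len(values) > 1 else math.nan,
            }
    return summary


result = summarize(rows, "treatment", ["x", "y"])
print(result)
```

Dropping missing values per column, rather than per row, is what lets each column keep its own count.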

Merge three list to Dictionary but everything is out of place/not printed

Question: I asked this question yesterday: Merging three lists into one dictionary. The top answer came closest to what I needed, but on re-reading my automobile.txt file I realized two things. It has multiple car makers in the list and only one of each is listed. Almost all of the data seems to be out of place. I'm so close but so far. Here is what I have. Please note, in case it makes a difference, that this is being read from a .txt file; I appended and split the data into three lists. I know the data

Fill in missing values in pandas dataframe using mean

Question:

    datetime
    2012-01-01    125.5010
    2012-01-02    NaN
    2012-01-03    125.5010
    2013-01-04    NaN
    2013-01-05    125.5010
    2013-02-28    125.5010
    2014-02-28    125.5010
    2016-01-02    125.5010
    2016-01-04    125.5010
    2016-02-28    NaN

I would like to fill in the missing values in this dataframe using a climatology computed from the dataset, i.e. fill in the missing 28 February 2016 value by averaging the 28 February values from other years. How do I do this?

Answer 1: You can use groupby by month and day and transform with fillna mean: print df
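The approach named in the answer (group by month and day, then fill with the group mean) might look like the sketch below. It is not the answer's own code, which is cut off above. The series name `value` and the variable names are assumptions, and the data is copied from the question.

```python
import numpy as np
import pandas as pd

# Data from the question, indexed by date.
s = pd.Series(
    [125.501, np.nan, 125.501, np.nan, 125.501,
     125.501, 125.501, 125.501, 125.501, np.nan],
    index=pd.to_datetime([
        "2012-01-01", "2012-01-02", "2012-01-03", "2013-01-04", "2013-01-05",
        "2013-02-28", "2014-02-28", "2016-01-02", "2016-01-04", "2016-02-28",
    ]),
    name="value",
)

# Climatology: mean over all years for each (month, day) pair.
# The mean skips NaN, so only observed values contribute.
climatology = s.groupby([s.index.month, s.index.day]).transform("mean")

# Fill each missing value with the climatology for its calendar day.
filled = s.fillna(climatology)
print(filled)
```

A calendar day that has no observation in any year has a NaN climatology, so it stays missing after the fill.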

Fill in time series gaps with both LOCF and NOCB methods but acknowledge breaks in time series

Question: There are edits to this post at the end. I have a large dataset of daily dietary records for a population of individuals. Data are missing at random for each individual. This is an example for one individual (I will eventually generalize the solution to the whole population):

    > str(final_daily)
    'data.frame': 387 obs. of 10 variables:
     $ Date : chr "2014-08-13" "2014-08-14" "2014-08-15" "2014-08-16" ...
     $ MEID.1 : Factor w/ 97 levels "","1","1.1","1.1a",..: NA NA NA 17 24 NA NA NA NA
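The two fill methods named in the title can be sketched in plain Python: last observation carried forward (LOCF) and next observation carried backward (NOCB). The function names and the sample values are hypothetical. The sketch does not handle the "breaks in time series" requirement, since the excerpt is cut off before that requirement is described.

```python
def locf(values):
    """Last observation carried forward: fill each gap with the previous value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def nocb(values):
    """Next observation carried backward: LOCF applied to the reversed series."""
    return locf(values[::-1])[::-1]


# Hypothetical daily series; None marks a missing day.
series = [None, 1.0, None, None, 4.0, None]

print(locf(series))
print(nocb(series))
```

Leading gaps stay missing under LOCF and trailing gaps stay missing under NOCB, because there is no observation to carry in that direction.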

How to combine two columns of a data-frame with missing data? [duplicate]

Question: This question already has answers here: How to implement coalesce efficiently in R (8 answers); Coalesce two string columns with alternating missing values to one (6 answers). Closed 2 years ago.

This is an extension of this earlier question. How can I combine two columns of a data frame such as

    data <- data.frame('a' = c('A','B','C','D','E'), 'x' = c("t",2,NA,NA,NA), 'y' = c(NA,NA,NA,4,"r"))

displayed as

    'a'  'x'  'y'
    A    t    NA
    B    2    NA
    C    NA   NA
    D    NA   4
    E    NA   r

to get

    'a'  'mycol'
    A    t
    B    2
    C    NA
    D    4
    E    r

I
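As an analogue in pandas (not the R solution the question asks for), a coalesce of two columns can be sketched with `Series.combine_first`, which keeps the values of the first series and fills its missing entries from the second. The data mirrors the question. The numbers are written as strings because R's `c("t", 2, NA)` coerces everything to character.

```python
import pandas as pd

# Data from the question; None marks a missing value.
data = pd.DataFrame({
    "a": ["A", "B", "C", "D", "E"],
    "x": ["t", "2", None, None, None],
    "y": [None, None, None, "4", "r"],
})

# Take x where it is present, otherwise fall back to y.
data["mycol"] = data["x"].combine_first(data["y"])

result = data[["a", "mycol"]]
print(result)
```

Rows where both columns are missing, such as row C, stay missing in the combined column.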

Removing dates with less than Full observations

Question: I have an xts object that covers 169 days of high-frequency, regular 5-minute observations, but on some of the days there are missing observations, i.e. fewer than 288 data points. How do I remove these days so that only days with a full set of data points remain?

    # find days in data
    ddx = endpoints(dxts, on="days");
    days = format(index(dxts)[ddx], "%Y-%m-%d");
    for (day in days) {
      x = dxts[day];
      cat('', day, "has", length(x), "records...\n");
    }

I tried RTAQ::exchangeHoursOnly(dxts, daybegin = "00:00:00", dayend =
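The filtering step can be sketched in plain Python: count the observations per calendar day and keep only the days that reach the expected count of 288. This is an analogue of the idea, not an xts solution. The `full_days_only` helper, the dates, and the values are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

EXPECTED_PER_DAY = 24 * 60 // 5  # 288 five-minute observations per day


def full_days_only(observations, expected=EXPECTED_PER_DAY):
    """Keep (timestamp, value) pairs only from days with `expected` records."""
    per_day = Counter(ts.date() for ts, _ in observations)
    return [(ts, v) for ts, v in observations if per_day[ts.date()] == expected]


# Hypothetical data: two days of 5-minute observations.
start = datetime(2020, 1, 1)
stamps = [start + timedelta(minutes=5 * i) for i in range(2 * EXPECTED_PER_DAY)]
observations = [(ts, float(i)) for i, ts in enumerate(stamps)]

# Drop ten observations from the second day so that it is incomplete.
del observations[300:310]

kept = full_days_only(observations)
print(len(kept), "observations kept")
```

Counting first and filtering second avoids looping over each day separately, as the loop in the question does.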

“Error in 1:ncol(x) : argument of length 0” when using Amelia in R

Question: This question was migrated from Cross Validated because it can be answered on Stack Overflow. Migrated 6 years ago.

I am working with panel data. I have well over 6,000 country-year observations and have specified my Amelia imputation as follows:

    (CountDependentVariable, m=5, ts="year", cs="cowcode", sqrts=c("OtherCountVariable2", "OtherCount3", "OtherCount4"), ords=c("OrdinalVar1", "Ordinal Variable 2"), lgstc=c("ProportionVariale"), noms=c("NominalVar1"),p2s = 0, idvars = c("country"))

Handling missing data for the main loss, which is present for auxiliary loss

Question: I want to construct a Keras model for a dataset with a main target and an auxiliary target. I have data for the auxiliary target for every entry in my dataset, but for the main target I have data for only a subset of the data points. Consider the following example, which is supposed to predict max(min(x1, x2), x3), but for some values only the auxiliary target, min(x1, x2), is given.

    from keras.models import Model
    from keras.optimizers import Adadelta
    from keras.losses import mean_squared

R : add a column with missing values to a dataframe

Question: I am using financial data, and the row names of my main data frame are dates.

    > assets[1:3,1:5]
                ALD   SFN   TCO  KIM  CTX
    2003-01-03  48.1  23.98 23.5 23   22.34
    2003-01-06  48.1  23.98 23.5 23   22.34
    2003-01-07  48.1  23.98 23.5 23   22.34

I would like to add a column (here I want to add FOC$close to assets) from a data frame of the same type in which some dates are missing:

    > FOC[1:3,1:2]
                Close  Adj.Close
    2003-01-03  510    510
    2003-01-07  518    518

The missing values should just be NAs, so it would look like this:
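As a pandas analogue of the requested result (not an R solution), assigning a Series to a new data frame column aligns it on the index, so dates that are absent from the source become NaN. The frames below copy a few values from the question. The column name `FOC` for the new column is an assumption.

```python
import pandas as pd

# Main frame, indexed by date (two of the columns from the question).
dates = pd.to_datetime(["2003-01-03", "2003-01-06", "2003-01-07"])
assets = pd.DataFrame(
    {"ALD": [48.1, 48.1, 48.1], "SFN": [23.98, 23.98, 23.98]},
    index=dates,
)

# Second frame of the same kind, with 2003-01-06 missing.
foc = pd.DataFrame(
    {"Close": [510, 518], "Adj.Close": [510, 518]},
    index=pd.to_datetime(["2003-01-03", "2003-01-07"]),
)

# Column assignment aligns on the date index; unmatched dates become NaN.
assets["FOC"] = foc["Close"]
print(assets)
```

Because NaN is a floating-point value, the new column is stored as floats even though the source column holds integers.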