missing-data

R- create new dataframe variable from subset of two variables with missing data NA

痞子三分冷 submitted on 2019-12-13 05:52:57
Question: I have a simple example data frame with two data columns (data1 and data2) and two grouping variables (Measure1 and Measure2). Measure1 and Measure2 contain missing values (NA).

    d <- data.frame(Measure1 = 1:2, Measure2 = 3:4, data1 = 1:10, data2 = 11:20)
    d$Measure1[4] = NA
    d$Measure2[8] = NA
    d

       Measure1 Measure2 data1 data2
    1         1        3     1    11
    2         2        4     2    12
    3         1        3     3    13
    4        NA        4     4    14
    5         1        3     5    15
    6         2        4     6    16
    7         1        3     7    17
    8         2       NA     8    18
    9         1        3     9    19
    10        2        4    10    20

I want to create a new variable (d$new) that contains data1, but only for rows

Visualisation of missing-data occurrence frequency by using seaborn

柔情痞子 submitted on 2019-12-13 04:17:14
Question: I'd like to create a 24x20 matrix (8 sections, each with 60 cells, i.e. 6x10) to visualize how often missing data occurs per cycle (each cycle is 480 values) in a dataset held in a pandas DataFrame, and plot it for each of the columns 'A', 'B', 'C'. So far I have been able to create the CSV files, map the values into the matrix correctly, and plot it via sns.heatmap(df.isnull()) after changing the missing data (nan & inf) into 0 or something like 0.01234, which has the least influence on the data and in the

Replace missing values with mean (Weka)

孤街醉人 submitted on 2019-12-13 02:44:31
Question: In Weka there is a filter called "ReplaceMissingValues" that replaces all missing values in a dataset with the mean of each attribute. I'd like to replace the missing values of a certain attribute using the mean of the values that belong to a certain class. For example, in a binary dataset I think it is more correct to replace a missing attribute value in a record of the positive class with the mean calculated only from the records that belong to the positive class. So
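The idea described in the question above (impute with the mean of the same class instead of the global mean) can be sketched outside Weka. This is a minimal plain-Python illustration, not a Weka filter; the function name `impute_by_class_mean`, the record layout (a list of dicts), and the use of `None` for a missing value are all assumptions made for the example.

```python
from collections import defaultdict


def impute_by_class_mean(rows, attr, class_attr):
    """Return copies of `rows` where a missing `attr` (None) is replaced
    by the mean of `attr` over the rows that share the same class label."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    # First pass: accumulate per-class sums and counts of observed values.
    for row in rows:
        value = row[attr]
        if value is not None:
            sums[row[class_attr]] += value
            counts[row[class_attr]] += 1

    # Second pass: fill gaps, leaving the input rows untouched.
    filled = []
    for row in rows:
        new_row = dict(row)
        cls = row[class_attr]
        # If a class has no observed values at all, the gap is left as None.
        if new_row[attr] is None and counts[cls] > 0:
            new_row[attr] = sums[cls] / counts[cls]
        filled.append(new_row)
    return filled


# Hypothetical toy data: two classes, one missing value in each.
records = [
    {"x": 1.0, "label": "pos"},
    {"x": 3.0, "label": "pos"},
    {"x": None, "label": "pos"},
    {"x": 10.0, "label": "neg"},
    {"x": None, "label": "neg"},
]
result = impute_by_class_mean(records, "x", "label")
```

One caveat worth keeping in mind: the class label is only known for training data, so this kind of imputation cannot be applied unchanged to unlabeled records at prediction time.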

How to substitute several NA with values within the DF using if-else in R?

怎甘沉沦 submitted on 2019-12-13 02:38:53
Question: Thank you for your time. I have the following data (snippet). It comes from longitudinal data, reshaped into a wide-format file of work status: each column represents one month and each row an individual. Code:

    j1992_12 = c(1, 10, 1, 7, 1, 1)
    j1993_01 = c(1, 1, 1, NA, 3, 1)
    j1993_02 = c(1, 1, 1, NA, 3, 1)
    j1993_03 = c(1, 8, 1, NA, 3, 1)
    j1993_04 = c(1, 8, 1, NA, 3, 1)
    j1993_05 = c(1, 8, 1, NA, 3, 1)
    j1993_06 = c(1, 8, 1, NA, 3, 1)
    j1993_07 = c(1, 8, 1, NA, 3, 1)
    j1993_08 = c(1, 8, 1, NA, 3, 1)

Missing values for the data to be used in a Neural Network model for prediction

≡放荡痞女 submitted on 2019-12-12 16:48:40
Question: I currently have a lot of data that will be used to train a neural network for prediction (gigabytes of weather data for major airports around the US). I have data for almost every day, but some airports have missing values in their data. For example, an airport might not have existed before 1995, so I have no data before then for that specific location. Also, some are missing whole years (one might span from 1990 to 2011 but be missing 2003). What can I do to train with these missing values without

'NaTType' object has no attribute 'days'

左心房为你撑大大i submitted on 2019-12-12 12:20:01
Question: I have a column in my dataset that represents a date in ms, and sometimes its value is nan (actually the column is of type str and sometimes its value is 'nan'). I want to compute the epoch in days of this column. The problem arises when taking the difference of two dates: (pd.to_datetime('now') - pd.to_datetime(np.nan)).days. If one of them is nan, it is converted to NaT, and the difference is of type NaTType, which has no attribute days. In my case I would like to get nan as the result. Other
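One way to get nan instead of an AttributeError is to do the subtraction on a whole Series and read the days through the `.dt` accessor, which yields NaN wherever the difference is NaT. The sketch below assumes the column holds millisecond timestamps stored as strings, as described above; the helper name `age_in_days` and the fixed reference timestamp are illustrative choices, not part of the original question.

```python
import numpy as np
import pandas as pd


def age_in_days(ms_strings, reference):
    """Days between `reference` and each millisecond timestamp.
    Entries that cannot be parsed (such as the string 'nan') yield NaN."""
    # Strings -> numbers; 'nan' and any unparseable text become NaN.
    ms = pd.to_numeric(pd.Series(ms_strings), errors="coerce")
    # Numbers -> datetimes; NaN becomes NaT.
    dates = pd.to_datetime(ms, unit="ms")
    # Timestamp - Series gives a timedelta Series; .dt.days maps NaT to NaN.
    return (reference - dates).dt.days


# A fixed reference keeps the example reproducible; pd.Timestamp.now()
# could be used instead to measure against the current time.
reference = pd.Timestamp("1970-01-03")
days = age_in_days(["0", "nan", "86400000"], reference)
```

For a single scalar value, the same effect can be had by checking `pd.isna(delta)` before reading `delta.days` and returning `np.nan` in that case.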

Highcharts: Displaying Linechart with missing datapoints

本小妞迷上赌 submitted on 2019-12-12 09:19:09
Question: I am calculating the average value of properties for each week of the year, and I want to display this information in a line chart (the x-axis is the week of the year, the y-axis is the average value, and the different lines represent different properties). But for any given property I do not necessarily have a data point for each week of the year. If I do not have such a data point, I want the line for this property to interpolate between the data points I do have. Has anyone else run into a similar issue? Answer 1:
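Independently of the charting library, the gaps can be filled before the series is handed to the chart. The following is a minimal sketch of linear interpolation over a weekly series, assuming missing weeks are represented as `None`; the function name `interpolate_missing` and the sample values are hypothetical, and this is not taken from the (truncated) answer.

```python
def interpolate_missing(values):
    """Linearly interpolate None entries that lie between two known values.
    Leading and trailing gaps have only one neighbor and are left as None."""
    result = list(values)
    # Indices of the weeks that actually have a data point.
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk over consecutive pairs of known points and fill what lies between.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            result[i] = values[left] + frac * (values[right] - values[left])
    return result


# Hypothetical weekly averages for one property; None marks a missing week.
weekly_avg = [None, 10.0, None, 14.0, None, None, None, 22.0, None]
filled_avg = interpolate_missing(weekly_avg)
```

Because the interpolated points are synthetic, it may be worth keeping track of which weeks were filled so they can be styled differently or excluded from tooltips.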

R: fill missing value with prior values [duplicate]

吃可爱长大的小学妹 submitted on 2019-12-12 02:08:30
Question: This question already has answers here: Replacing NAs with latest non-NA value (15 answers). Closed 2 years ago. I have a dataframe that looks like this:

    d <- data.frame(county = c("Abilene", rep(NA, 5), "Cook", rep(NA, 4), "Blah", NA, "Allegheny", rep(NA, 3)))

          county
    1    Abilene
    2       <NA>
    3       <NA>
    4       <NA>
    5       <NA>
    6       <NA>
    7       Cook
    8       <NA>
    9       <NA>
    10      <NA>
    11      <NA>
    12      Blah
    13      <NA>
    14 Allegheny
    15      <NA>
    16      <NA>
    17      <NA>

I want to fill in the <NA> with the value of the previous non-missing county name. In other
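The operation asked for above is usually called a forward fill, or "last observation carried forward". The logic is language-independent; here is a minimal Python sketch of it (not an R solution), assuming missing entries are represented as `None`. The function name `forward_fill` is a hypothetical helper, and the sample list mirrors the county column from the question.

```python
def forward_fill(values):
    """Replace each None with the most recent non-None value before it.
    Leading None entries stay None because nothing precedes them."""
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Same shape as the county column in the question (17 rows).
county = (
    ["Abilene"] + [None] * 5
    + ["Cook"] + [None] * 4
    + ["Blah", None, "Allegheny"]
    + [None] * 3
)
filled_county = forward_fill(county)
```

The single pass keeps one piece of state (the last observed value), so the cost is linear in the number of rows regardless of how long the gaps are.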

How Can I Make Sure All My .CSV Data Gets Imported as NA instead of Blank in R?

為{幸葍}努か submitted on 2019-12-12 01:36:12
Question: This question was migrated from Cross Validated because it can be answered on Stack Overflow. Migrated 5 years ago. In my dataset, I have four assessments I'm trying to predict: 1 [Good] to 4 [Bad]. My model seems to be working when I use the polr function to predict values with ordered logistic regression, though it gives me the warning message "In cbind(race, partisanship, sex, age) : number of rows of result is not a multiple of vector length (arg 4)", because there are some

Crosstab query: Getting Null Data for Missing Data from Access DB

冷暖自知 submitted on 2019-12-11 18:24:40
Question: I have data in an Access database covering multiple days, but data is sometimes missing for some dates. For example, I have data for:

    myDate     Location  Price
    11/1/2013  South     10
    11/1/2013  West      20
    11/1/2013  East      10
    11/2/2013  South     10
    11/2/2013  West      20
    11/2/2013  East      10
    11/4/2013  South     10    <---- 11/3/2013 Data Missing
    11/4/2013  West      30
    11/4/2013  East      10

The way I tried to solve it was to find the missing dates in the Access database and fill them with Null values using a calendar table. myDate
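The calendar-table approach mentioned above amounts to a left join from "every date crossed with every location" onto the existing rows. The sketch below shows that logic in plain Python rather than Access SQL, using the sample rows from the question; the function name `fill_missing_dates` and the tuple layout are assumptions for illustration, and `None` stands in for Null.

```python
from datetime import date, timedelta


def fill_missing_dates(rows, locations):
    """Given (date, location, price) rows, return one row for every
    location on every day between the first and last date.
    Combinations with no source row get a price of None."""
    prices = {(day, loc): price for day, loc, price in rows}
    start = min(day for day, _, _ in rows)
    end = max(day for day, _, _ in rows)

    filled = []
    day = start
    while day <= end:  # this loop plays the role of the calendar table
        for loc in locations:
            filled.append((day, loc, prices.get((day, loc))))
        day += timedelta(days=1)
    return filled


rows = [
    (date(2013, 11, 1), "South", 10),
    (date(2013, 11, 1), "West", 20),
    (date(2013, 11, 1), "East", 10),
    (date(2013, 11, 2), "South", 10),
    (date(2013, 11, 2), "West", 20),
    (date(2013, 11, 2), "East", 10),
    (date(2013, 11, 4), "South", 10),
    (date(2013, 11, 4), "West", 30),
    (date(2013, 11, 4), "East", 10),
]
filled_rows = fill_missing_dates(rows, ["South", "West", "East"])
```

Crossing dates with locations before joining is the step that makes the missing day appear once per location, which is what a crosstab needs in order to show an empty cell instead of dropping the row.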