missing-data

Fill NaN values

梦想的初衷 posted on 2019-12-02 02:50:43
Question: I have a dataframe:

    TIMESTAMP            P_ACT_KW  PERIODE_TARIF  P_SOUSCR
    2016-01-01 00:00:00  116       HC             250
    2016-01-01 00:10:00  121       HC             250
    2016-01-01 00:20:00  121       NaN            250

To use this dataframe, I must fill the NaN values with HC or HP based on this condition: if the hour extracted from TIMESTAMP is in {0, 1, 2, 3, 4, 5, 22, 23}, I replace NaN with HC, otherwise with HP. I wrote this function:

    def prep_data(data):
        data['PERIODE_TARIF'] = np.where(data['PERIODE_TARIF'] in (0, 1, 2, 3, 4, 5, 22, 23), 'HC', 'HP')
        return data

But
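
A minimal sketch of one way to do the fill described in this question, assuming the column names from the excerpt. The condition is tested on the hour taken from TIMESTAMP (not on PERIODE_TARIF itself), and only the missing entries are replaced. The name OFF_PEAK_HOURS and the fourth sample row (a midday reading) are hypothetical additions for illustration.

```python
import numpy as np
import pandas as pd

# Hours that count as HC, taken from the condition in the question.
OFF_PEAK_HOURS = [0, 1, 2, 3, 4, 5, 22, 23]


def prep_data(data: pd.DataFrame) -> pd.DataFrame:
    """Fill missing PERIODE_TARIF with 'HC' or 'HP' based on the hour of TIMESTAMP."""
    out = data.copy()
    hours = pd.to_datetime(out["TIMESTAMP"]).dt.hour
    # Label every row by its hour, then use that label only where a value is missing.
    fallback = pd.Series(
        np.where(hours.isin(OFF_PEAK_HOURS), "HC", "HP"), index=out.index
    )
    out["PERIODE_TARIF"] = out["PERIODE_TARIF"].fillna(fallback)
    return out


df = pd.DataFrame({
    "TIMESTAMP": [
        "2016-01-01 00:00:00",
        "2016-01-01 00:10:00",
        "2016-01-01 00:20:00",
        "2016-01-01 12:00:00",  # hypothetical extra row, not in the question
    ],
    "P_ACT_KW": [116, 121, 121, 130],
    "PERIODE_TARIF": ["HC", "HC", np.nan, np.nan],
    "P_SOUSCR": [250, 250, 250, 250],
})
filled = prep_data(df)
print(filled)
```

Using fillna with an index-aligned Series leaves the existing HC/HP values untouched, which np.where on the whole column would overwrite.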

Issue with NA values in R

会有一股神秘感。 posted on 2019-12-02 00:41:21
I feel this should be something easy. I have looked across the internet, but I keep getting error messages. I have done plenty of analytics in the past but am new to R and programming. I have a pretty basic function to calculate means across columns of data:

    columnmean <- function(y) {
      nc <- ncol(y)
      means <- numeric(nc)
      for (i in 1:nc) {
        means[i] <- mean(y[, i])
      }
      means
    }

I'm in RStudio and testing it using the included 'airquality' dataset. When I load the AQ dataset and run my function:

    data("airquality")
    columnmean(airquality)

I get back:

    NA NA 9.957516 77.882353 6.993464 15.803922

Because the first two

Fill missing value based on probability of occurrence

雨燕双飞 posted on 2019-12-01 23:52:20
This is what my data.table/dataframe looks like:

    library(data.table)
    dt <- fread('
    STATE ZIP
    PA 19333
    PA 19327
    PA 19333
    PA NA
    PA 19355
    PA 19333
    PA NA
    PA 19355
    PA NA
    ')

I have three missing values in the ZIP column. I want to fill the missing values with non-missing sample values of ZIP according to their probability of occurring in the dataset. For example, ZIP 19333 occurs three times in the dataset, ZIP 19355 occurs twice, and ZIP 19327 occurs once. So ZIP 19333 has a 50% probability of occurring in the dataset for PA, 19355 has a 33.33% chance, and 19327 has a 16.67% chance of
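
The question is about R's data.table, but the idea can be sketched in pandas, under the assumption that ZIPs are stored as strings. Sampling with replacement from the observed values of a group draws each ZIP in proportion to how often it occurs, so no explicit probability table is needed. The function name fill_by_frequency and its seed parameter are hypothetical.

```python
import numpy as np
import pandas as pd


def fill_by_frequency(df, col, group, seed=None):
    """Fill missing values in `col` by sampling observed values within each `group`."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    for _, idx in out.groupby(group).groups.items():
        values = out.loc[idx, col]
        observed = values.dropna()
        missing = values[values.isna()].index
        if observed.empty or len(missing) == 0:
            continue  # nothing to sample from, or nothing to fill
        # Uniform draws over the observed rows equal frequency-weighted draws
        # over the distinct values.
        out.loc[missing, col] = rng.choice(
            observed.to_numpy(), size=len(missing), replace=True
        )
    return out


df = pd.DataFrame({
    "STATE": ["PA"] * 9,
    "ZIP": ["19333", "19327", "19333", None, "19355",
            "19333", None, "19355", None],
})
filled = fill_by_frequency(df, "ZIP", "STATE", seed=0)
print(filled)
```

The filled values differ from seed to seed; over many draws 19333 should appear about half the time, 19355 about a third, and 19327 about a sixth.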

reshape from base vs dcast from reshape2 with missing values

橙三吉。 posted on 2019-12-01 20:29:15
With this data frame,

    df <- expand.grid(id="01", parameter=c("blood", "saliva"), visit=c("V1", "V2", "V3"))
    df$value <- c(1:6)
    df$sex <- rep("f", 6)
    df

    > df
      id parameter visit value sex
    1 01     blood    V1     1   f
    2 01    saliva    V1     2   f
    3 01     blood    V2     3   f
    4 01    saliva    V2     4   f
    5 01     blood    V3     5   f
    6 01    saliva    V3     6   f

when I reshape it into the "wide" format, I get identical results with both the base reshape function and the dcast function from reshape2.

    reshape(df, timevar="visit", idvar=c("id", "parameter", "sex"), direction="wide")

      id parameter sex value.V1 value.V2 value.V3
    1 01     blood   f        1        3        5
    2 01    saliva   f        2        4        6

SQL Server Interpolate Missing rows

最后都变了- posted on 2019-12-01 20:28:16
Question: I have the following table, which records a value per day. The problem is that sometimes days are missing. I want to write a SQL query that will:

- Return the missing days
- Calculate the missing value using linear interpolation

So from the following source table:

    Date        Value
    --------------------
    2010/01/10  10
    2010/01/11  15
    2010/01/13  25
    2010/01/16  40

I want to return:

    Date        Value
    --------------------
    2010/01/10  10
    2010/01/11  15
    2010/01/12  20
    2010/01/13  25
    2010/01/14  30
    2010/01/15  35
    2010/01/16  40

SQL Server Interpolate Missing rows

独自空忆成欢 posted on 2019-12-01 19:51:47
I have the following table, which records a value per day. The problem is that sometimes days are missing. I want to write a SQL query that will:

- Return the missing days
- Calculate the missing value using linear interpolation

So from the following source table:

    Date        Value
    --------------------
    2010/01/10  10
    2010/01/11  15
    2010/01/13  25
    2010/01/16  40

I want to return:

    Date        Value
    --------------------
    2010/01/10  10
    2010/01/11  15
    2010/01/12  20
    2010/01/13  25
    2010/01/14  30
    2010/01/15  35
    2010/01/16  40

Any help would be greatly appreciated.

    declare @MaxDate date
    declare @MinDate date
    select @MaxDate = MAX
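
The question asks for a SQL Server query. As a cross-check of the expected numbers, here is a small pandas sketch of the same two steps (insert the missing days, then interpolate linearly between the known values). It is not a translation of the truncated T-SQL above, and the variable names src and daily are hypothetical.

```python
import pandas as pd

# Source table from the question; 2010-01-12, -14 and -15 are missing.
src = pd.DataFrame({
    "Date": pd.to_datetime(["2010-01-10", "2010-01-11", "2010-01-13", "2010-01-16"]),
    "Value": [10, 15, 25, 40],
})

daily = (
    src.set_index("Date")
       .asfreq("D")                 # step 1: one row per calendar day, NaN where missing
       .interpolate(method="time")  # step 2: linear interpolation along the date axis
)
print(daily)
```

With this input the filled rows should come out at about 20, 30 and 35, matching the result table in the question.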

pandas - merging with missing values

烈酒焚心 posted on 2019-12-01 15:46:49
There appears to be a quirk with the pandas merge function. It considers NaN values to be equal, and will merge NaNs with other NaNs:

    >>> foo = DataFrame([ ['a',1,2], ['b',4,5], ['c',7,8], [np.nan,10,11] ], columns=['id','x','y'])
    >>> bar = DataFrame([ ['a',3], ['c',9], [np.nan,12] ], columns=['id','z'])
    >>> pd.merge(foo, bar, how='left', on='id')
    Out[428]:
        id   x   y    z
    0    a   1   2    3
    1    b   4   5  NaN
    2    c   7   8    9
    3  NaN  10  11   12

    [4 rows x 4 columns]

This is unlike any RDB I've seen; normally missing values are treated with agnosticism and won't be merged together as if they are equal. This is especially
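
One possible workaround, sketched under the assumption that rows with a missing key should never match: drop the NaN keys from the right-hand frame before merging. The left frame keeps all of its rows, including the one with a NaN id, but that row no longer picks up a value from bar.

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame(
    [["a", 1, 2], ["b", 4, 5], ["c", 7, 8], [np.nan, 10, 11]],
    columns=["id", "x", "y"],
)
bar = pd.DataFrame(
    [["a", 3], ["c", 9], [np.nan, 12]],
    columns=["id", "z"],
)

# Remove rows whose join key is missing so NaN cannot match NaN.
merged = pd.merge(foo, bar.dropna(subset=["id"]), how="left", on="id")
print(merged)
```

For an inner or right join the same filter would be needed on both frames, since either side can contribute NaN keys.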

Partially merge two datasets and fill in NAs in R

三世轮回 posted on 2019-12-01 15:29:08
I have two datasets.

a = raw dataset with thousands of observations of different weather events:

       STATE  EVTYPE
    1  AL     WINTER STORM
    2  AL     TORNADO
    3  AL     TSTM WIND
    4  AL     TSTM WIND
    5  AL     TSTM WIND
    6  AL     HAIL
    7  AL     HIGH WIND
    8  AL     TSTM WIND
    9  AL     TSTM WIND
    10 AL     TSTM WIND

b = a dictionary table, which has a standard spelling for some weather events:

      EVTYPE              evmatch
    1 HIGH SURF ADVISORY  <NA>
    2 COASTAL FLOOD       COASTAL FLOOD
    3 FLASH FLOOD         FLASH FLOOD
    4 LIGHTNING           LIGHTNING
    5 TSTM WIND           <NA>
    6 TSTM WIND (G45)     <NA>

Both are merged into df_new by EVTYPE:

    library(dplyr)
    df_new <- left_join(a, b, by = c("EVTYPE"))

    STATE

pandas - merging with missing values

北城余情 posted on 2019-12-01 15:18:32
Question: There appears to be a quirk with the pandas merge function. It considers NaN values to be equal, and will merge NaNs with other NaNs:

    >>> foo = DataFrame([ ['a',1,2], ['b',4,5], ['c',7,8], [np.nan,10,11] ], columns=['id','x','y'])
    >>> bar = DataFrame([ ['a',3], ['c',9], [np.nan,12] ], columns=['id','z'])
    >>> pd.merge(foo, bar, how='left', on='id')
    Out[428]:
        id   x   y    z
    0    a   1   2    3
    1    b   4   5  NaN
    2    c   7   8    9
    3  NaN  10  11   12

    [4 rows x 4 columns]

This is unlike any RDB I've seen; normally missing values