missing-data

Oracle SQL: fill missing values with the closest non-missing value

怎甘沉沦 submitted on 2019-12-24 07:36:32
Question: I have a dataset in which I want to fill missing values with the closest non-missing value. I found two elegant solutions in the answers to this question, but I don't understand why they are not working for me. Table: create table Tab1(data date, V1 number); insert into Tab1 values (date '2000-01-01', 1); insert into Tab1 values (date '2000-02-01', 1); insert into Tab1 values (date '2000-03-01', 1); insert into Tab1 values (date '2000-04-01', 1); insert into Tab1 values (date '2000-05-01',
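The standard Oracle answer here is an analytic LAST_VALUE(v1 IGNORE NULLS) OVER (ORDER BY data). For comparison, here is a last-observation-carried-forward sketch in pandas (the language most of the entries below use); the v1 values with gaps are hypothetical, since the excerpt's inserts are truncated:

```python
import pandas as pd

# Hypothetical v1 values with gaps (the excerpt's inserts are cut off);
# the question asks for the closest previous non-missing value.
tab1 = pd.DataFrame({
    'data': pd.to_datetime(['2000-01-01', '2000-02-01', '2000-03-01',
                            '2000-04-01', '2000-05-01']),
    'v1': [1, None, None, 4, None],
})
tab1 = tab1.sort_values('data')
tab1['v1_filled'] = tab1['v1'].ffill()  # carry the last non-null forward
```

If "closest" is meant as nearest in either direction rather than last-seen, a backward fill (`bfill`) or nearest-neighbour interpolation would be needed instead.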

Lookup with Missing Labels

这一生的挚爱 submitted on 2019-12-24 06:50:47
Question: I have code that uses a dataframe to look up a value (P) given its column label (X): df_1 = pd.DataFrame({'X': [1,2,3,1,1,2,1,3,2,1]}) df_2 = pd.DataFrame({ 1 : [1,2,3,4,1,2,3,4,1,2], 2 : [4,1,2,3,4,1,2,1,2,3], 3 : [2,3,4,1,2,3,4,1,2,5]}) df_1['P'] = df_2.lookup(df_1.index, df_1['X']) When I give it a label in df_1 that isn't a column of df_2, like this: df_1 = pd.DataFrame({'X': [7,2,3,1,1,2,1,3,2,1]}) I get: KeyError: 'One or more column labels was not found' How can I skip
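Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0. A sketch that also handles missing labels: look up only the rows whose label actually exists as a column of df_2 and leave the rest as NaN:

```python
import pandas as pd

df_1 = pd.DataFrame({'X': [7, 2, 3, 1, 1, 2, 1, 3, 2, 1]})
df_2 = pd.DataFrame({1: [1, 2, 3, 4, 1, 2, 3, 4, 1, 2],
                     2: [4, 1, 2, 3, 4, 1, 2, 1, 2, 3],
                     3: [2, 3, 4, 1, 2, 3, 4, 1, 2, 5]})

# Only look up rows whose label exists as a column in df_2;
# everything else stays NaN instead of raising KeyError.
valid = df_1['X'].isin(df_2.columns)
df_1.loc[valid, 'P'] = [df_2.at[i, x]
                        for i, x in zip(df_1.index[valid], df_1.loc[valid, 'X'])]
```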

Converting nested list with missing values to data frame in R

若如初见. submitted on 2019-12-24 06:48:20
Question: I have a list of geocode output from the googleway package (ggmap's geocode wouldn't work with an API key), stored in a list, each element of which contains two lists. However, for addresses for which no result was found, the structure of the list is different, frustrating my attempts to convert the list to a dataframe. The structure of a "non-missing" result (created with dput()) is as follows (ignore the gibberish; RStudio doesn't display Cyrillic correctly in the console): structure(list
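The general fix, independent of googleway, is to coerce every element to one common shape before row-binding. Sketched in Python with hypothetical field names (the excerpt's dput() output is truncated, so the real structure is not shown here):

```python
import pandas as pd

# Hypothetical geocode results: found addresses carry coordinates,
# not-found addresses only carry a status field (a different shape).
results = [
    {'lat': 55.75, 'lng': 37.61, 'status': 'OK'},
    {'status': 'ZERO_RESULTS'},
]

# Coerce every element to the same keys; .get() yields None (-> NaN)
# where a field is absent, so the shapes no longer clash.
rows = [{'lat': r.get('lat'), 'lng': r.get('lng'), 'status': r['status']}
        for r in results]
df = pd.DataFrame(rows)
```

In R the same move is a wrapper that tests for the missing-result shape and returns a one-row frame of NAs before calling a row-binder.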

PHP: Missing records when writing to file

人盡茶涼 submitted on 2019-12-24 05:43:55
Question: My telecom vendor sends me a report each time a message goes out. I have written a very simple PHP script that receives values via HTTP GET. Using fwrite, I write the query parameters to a CSV file. The filename is report.csv with the current date as a prefix. Here is the code: <?php error_reporting(E_ALL ^ E_NOTICE); date_default_timezone_set('America/New_York'); //setting up the CSV file $fileDate = date("m-d-Y") ; $filename = $fileDate."_Report.csv"; $directory = "./csv_archive/"; /

VaR calculation with complete missing column

我只是一个虾纸丫 submitted on 2019-12-24 04:34:03
Question: I need to calculate the rolling VaR of stock returns. From this post: Using rollapply function for VaR calculation using R, I understand that columns consisting entirely of missing cases will give an error. But since the start and end dates of the stock returns differ across firms, converting the data from long to wide format creates missing values. Estimation can be done using only the rows with no missing values, but this leads to serious loss of data. Thus, is there any way to perform
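One workaround, sketched here as plain historical-simulation VaR in pandas rather than the question's rollapply code: let the rolling window return NA until a full window of observations exists for a firm, instead of dropping incomplete rows up front. The return series below are simulated:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100
returns = pd.DataFrame({'A': rng.normal(0, 0.01, n),
                        'B': rng.normal(0, 0.01, n)})
returns.loc[:29, 'B'] = np.nan  # firm B's series starts 30 days later

# 95% historical VaR over a 20-day window; min_periods keeps windows
# that still contain NAs as NA instead of raising an error.
var95 = -returns.rolling(window=20, min_periods=20).quantile(0.05)
```

Each firm's VaR series then simply begins once that firm has a full window of data, so no cross-firm rows are thrown away.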

R: complete values for missing dates

你说的曾经没有我的故事 submitted on 2019-12-24 03:55:07
Question: In R, if I have this data date.hour temp 2014-01-05 20:00:00 16 2014-01-06 20:00:00 14 2014-01-06 22:00:00 18 and with seq I can get a sequence of date-times begin <- as.POSIXct('2014-1-5') end <- as.POSIXct('2014-1-7') seq(begin, end, by=2*3600) how can I complete the data to something like date.hour temp 2014-01-05 00:00:00 NA 2014-01-05 02:00:00 NA ... 2014-01-05 18:00:00 NA 2014-01-05 20:00:00 16 2014-01-05 22:00:00 NA ... 2014-01-06 20:00:00 14 2014-01-06 22:00:00 18 ... 2014-01-07
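The question is posed for R (where seq plus merge, or tidyr::complete, does this); the same completion is a one-line reindex against the full 2-hour grid in pandas:

```python
import pandas as pd

df = pd.DataFrame(
    {'temp': [16, 14, 18]},
    index=pd.to_datetime(['2014-01-05 20:00:00',
                          '2014-01-06 20:00:00',
                          '2014-01-06 22:00:00']))

# Build the full 2-hour grid and reindex; hours with no reading get NA.
full = pd.date_range('2014-01-05', '2014-01-07', freq='2h')
completed = df.reindex(full)
```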

Pandas: filling missing values by weighted average in each group

别说谁变了你拦得住时间么 submitted on 2019-12-23 22:19:25
Question: I have a DataFrame whose 'value' column has missing values. I'd like to fill the missing values with the weighted average within each 'name' group. There was a post on how to fill missing values with the simple average in each group, but not the weighted average. Thanks a lot! df = pd.DataFrame({'value': [1, np.nan, 3, 2, 3, 1, 3, np.nan, np.nan],'weight':[3,1,1,2,1,2,2,1,1], 'name': ['A','A', 'A','B','B','B', 'C','C','C']}) name value weight 0 A 1.0 3 1 A NaN 1 2 A 3.0 1 3 B 2.0 2 4 B 3.0 1 5 B 1.0 2 6 C 3.0
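A sketch of one way to do it with the question's own data: compute each group's weighted average over its non-missing rows with np.average, then map that back onto the NaNs:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value':  [1, np.nan, 3, 2, 3, 1, 3, np.nan, np.nan],
    'weight': [3, 1, 1, 2, 1, 2, 2, 1, 1],
    'name':   ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
})

# Weighted average of each group's non-missing values...
def weighted_mean(g):
    m = g['value'].notna()
    return np.average(g.loc[m, 'value'], weights=g.loc[m, 'weight'])

fills = df.groupby('name').apply(weighted_mean)
# ...mapped back onto the rows whose 'value' is missing.
df['value'] = df['value'].fillna(df['name'].map(fills))
```

For group A the fill is (1*3 + 3*1) / (3 + 1) = 1.5, which a plain mean (2.0) would not give.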

Filling a series based on key value pairs

主宰稳场 submitted on 2019-12-23 22:12:43
Question: I have a data frame that includes a dictionary of codes and names. Some of the names are blank. I'm trying to fill those blanks based on other rows where the code matches a name. Thus far I have sorted the pairs into a df by code but can't figure out how to "fill in the blanks". Here is my code: import pandas as pd import json from pandas.io.json import json_normalize df = pd.read_json('data/world_bank_projects.json') theme= pd.DataFrame([val for pair in df['mjtheme_namecode'].values for val in
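A minimal sketch of the fill step, with hypothetical codes and names standing in for the World Bank data: treat blanks as missing, build a code-to-name map from the rows that do have a name, and fill the gaps through that map:

```python
import pandas as pd

# Hypothetical code/name pairs in the shape of mjtheme_namecode;
# some names come through blank.
theme = pd.DataFrame({
    'code': ['1', '2', '1', '3', '2'],
    'name': ['Economic management', '', '', 'Rule of law', 'Environment'],
})

# Treat blanks as missing, build a code -> name map from rows that have
# a name, then fill each gap by looking its code up in the map.
theme['name'] = theme['name'].replace('', pd.NA)
mapping = (theme.dropna(subset=['name'])
                .drop_duplicates('code')
                .set_index('code')['name'])
theme['name'] = theme['name'].fillna(theme['code'].map(mapping))
```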

Sort data by number of NAs in each row

你。 submitted on 2019-12-23 12:57:30
Question: I want to sort a data frame that has some missing values. name dist1 dist2 dist3 prop1 prop2 prop3 month2 month5 month10 month25 month50 issue 1 A1 232.0 1462.91 232.0000 728.00 0.370 0.05633453 1188.1 1188.1 1188.1 1188.1 1188.1 Yes 2 A2 142.0 58.26 2847.7690 17.10 0.080 0.07667063 14581.6 15382.0 19510.9 25504.0 NA Yes 3 A3 102.0 1160.94 102.0000 53.40 0.090 0.07667063 144.8 144.8 144.8 291.8 761.4 Yes 4 A4 126.0 1377.23 126.0000 64.30 2.120 0.11040091 366.5 496.8 665.3 NA NA Yes 5 A5 118.0
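In R this is df[order(rowSums(is.na(df))), ]; the same idea in pandas, on a cut-down version of the table above (A4 has two NAs, A2 one, A1 none):

```python
import numpy as np
import pandas as pd

# Cut-down version of the table above, deliberately out of order.
df = pd.DataFrame({
    'name':    ['A4', 'A2', 'A1'],
    'month25': [np.nan, 25504.0, 1188.1],
    'month50': [np.nan, np.nan, 1188.1],
})

# Count NAs per row and reorder the rows by that count, fewest first.
order = df.isna().sum(axis=1).argsort(kind='stable')
df_sorted = df.iloc[order]
```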

Return FALSE for duplicated NA values when using the function duplicated()

纵饮孤独 submitted on 2019-12-23 09:19:29
Question: Just wondering why duplicated behaves the way it does with NAs: > duplicated(c(NA,NA,NA,1,2,2)) [1] FALSE TRUE TRUE FALSE FALSE TRUE when in fact > NA == NA [1] NA Is there a way to make duplicated mark NAs as FALSE, like this? > duplicated(c(NA,NA,NA,1,2,2)) [1] FALSE FALSE FALSE FALSE FALSE TRUE Answer 1: Use the incomparables argument of the duplicated function, like this: > duplicated(c(NA,NA,NA,1,2,2)) [1] FALSE TRUE TRUE FALSE FALSE TRUE > duplicated(c(NA,NA,NA,1,2,2)
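pandas behaves the same way as base R here: repeated NaNs are marked as duplicates. The equivalent of R's incomparables=NA is to mask the NaNs out explicitly:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, np.nan, 1, 2, 2])

# pandas, like R, treats repeated NaNs as duplicates of each other;
# AND-ing with notna() forces every NaN position back to False.
dup = s.duplicated() & s.notna()
```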