missing-data

Oracle SQL: fill missing values with the closest non-missing value

怎甘沉沦 submitted on 2019-12-24 07:36:32
Question: I have a dataset in which I want to fill missing values with the closest non-missing value. I found two elegant solutions in the answers to this question, but I don't understand why they are not working for me. Table: create table Tab1(data date, V1 number); insert into Tab1 values (date '2000-01-01', 1); insert into Tab1 values (date '2000-02-01', 1); insert into Tab1 values (date '2000-03-01', 1); insert into Tab1 values (date '2000-04-01', 1); insert into Tab1 values (date '2000-05-01',
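The standard Oracle answer here is an analytic LAST_VALUE(v1 IGNORE NULLS) OVER (ORDER BY data). For comparison, here is a last-observation-carried-forward sketch in pandas (the language most of the entries below use); the v1 values with gaps are hypothetical, since the excerpt's inserts are truncated:

```python
import pandas as pd

# Hypothetical v1 values with gaps (the excerpt's inserts are cut off);
# the question asks for the closest previous non-missing value.
tab1 = pd.DataFrame({
    'data': pd.to_datetime(['2000-01-01', '2000-02-01', '2000-03-01',
                            '2000-04-01', '2000-05-01']),
    'v1': [1, None, None, 4, None],
})
tab1 = tab1.sort_values('data')
tab1['v1_filled'] = tab1['v1'].ffill()  # carry the last non-null forward
```

If "closest" is meant as nearest in either direction rather than last-seen, a backward fill (`bfill`) or nearest-neighbour interpolation would be needed instead.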

Lookup with Missing Labels

这一生的挚爱 submitted on 2019-12-24 06:50:47
Question: I have code that uses a dataframe to look up a value (P) given its column label (X): df_1 = pd.DataFrame({'X': [1,2,3,1,1,2,1,3,2,1]}) df_2 = pd.DataFrame({ 1 : [1,2,3,4,1,2,3,4,1,2], 2 : [4,1,2,3,4,1,2,1,2,3], 3 : [2,3,4,1,2,3,4,1,2,5]}) df_1['P'] = df_2.lookup(df_1.index, df_1['X']) When I give it a label in df_1 that isn't a column of df_2, like this: df_1 = pd.DataFrame({'X': [7,2,3,1,1,2,1,3,2,1]}) I get: KeyError: 'One or more column labels was not found' How can I skip
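Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0. A sketch that also handles missing labels: look up only the rows whose label actually exists as a column of df_2 and leave the rest as NaN:

```python
import pandas as pd

df_1 = pd.DataFrame({'X': [7, 2, 3, 1, 1, 2, 1, 3, 2, 1]})
df_2 = pd.DataFrame({1: [1, 2, 3, 4, 1, 2, 3, 4, 1, 2],
                     2: [4, 1, 2, 3, 4, 1, 2, 1, 2, 3],
                     3: [2, 3, 4, 1, 2, 3, 4, 1, 2, 5]})

# Only look up rows whose label exists as a column in df_2;
# everything else stays NaN instead of raising KeyError.
valid = df_1['X'].isin(df_2.columns)
df_1.loc[valid, 'P'] = [df_2.at[i, x]
                        for i, x in zip(df_1.index[valid], df_1.loc[valid, 'X'])]
```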

Converting nested list with missing values to data frame in R

若如初见. submitted on 2019-12-24 06:48:20
Question: I have a list of geocode output from the googleway package (ggmap's geocode wouldn't work with an API key), stored in a list, each element of which contains two lists. However, for addresses for which no result was found, the structure of the list is different, frustrating my attempts to convert the list to a dataframe. The structure of a "non-missing" result (created with dput()) is as follows (ignore the gibberish; RStudio doesn't display Cyrillic correctly in the console): structure(list
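The general fix, independent of googleway, is to coerce every element to one common shape before row-binding. Sketched in Python with hypothetical field names (the excerpt's dput() output is truncated, so the real structure is not shown here):

```python
import pandas as pd

# Hypothetical geocode results: found addresses carry coordinates,
# not-found addresses only carry a status field (a different shape).
results = [
    {'lat': 55.75, 'lng': 37.61, 'status': 'OK'},
    {'status': 'ZERO_RESULTS'},
]

# Coerce every element to the same keys; .get() yields None (-> NaN)
# where a field is absent, so the shapes no longer clash.
rows = [{'lat': r.get('lat'), 'lng': r.get('lng'), 'status': r['status']}
        for r in results]
df = pd.DataFrame(rows)
```

In R the same move is a wrapper that tests for the missing-result shape and returns a one-row frame of NAs before calling a row-binder.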

PHP: Missing records when writing to file

人盡茶涼 submitted on 2019-12-24 05:43:55
Question: My telecom vendor sends me a report each time a message goes out. I have written a very simple PHP script that receives values via HTTP GET. Using fwrite, I write the query parameters to a CSV file. The filename is report.csv with the current date as a prefix. Here is the code: <?php error_reporting(E_ALL ^ E_NOTICE); date_default_timezone_set('America/New_York'); //setting up the CSV file $fileDate = date("m-d-Y") ; $filename = $fileDate."_Report.csv"; $directory = "./csv_archive/"; /

VaR calculation with complete missing column

我只是一个虾纸丫 submitted on 2019-12-24 04:34:03
Question: I need to calculate the rolling VaR of stock returns. From this post: Using rollapply function for VaR calculation using R, I understand that columns consisting entirely of missing cases will give an error. But since the start and end dates of the stock returns differ across firms, converting the data from long to wide format creates missing values. Estimation can be done using only the rows with no missing values, but this leads to serious loss of data. Thus, is there any way to perform
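One workaround, sketched here as plain historical-simulation VaR in pandas rather than the question's rollapply code: let the rolling window return NA until a full window of observations exists for a firm, instead of dropping incomplete rows up front. The return series below are simulated:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100
returns = pd.DataFrame({'A': rng.normal(0, 0.01, n),
                        'B': rng.normal(0, 0.01, n)})
returns.loc[:29, 'B'] = np.nan  # firm B's series starts 30 days later

# 95% historical VaR over a 20-day window; min_periods keeps windows
# that still contain NAs as NA instead of raising an error.
var95 = -returns.rolling(window=20, min_periods=20).quantile(0.05)
```

Each firm's VaR series then simply begins once that firm has a full window of data, so no cross-firm rows are thrown away.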

R: complete values for missing dates

你说的曾经没有我的故事 submitted on 2019-12-24 03:55:07
Question: In R, if I have this data date.hour temp 2014-01-05 20:00:00 16 2014-01-06 20:00:00 14 2014-01-06 22:00:00 18 and with seq I can get a sequence of date-times begin <- as.POSIXct('2014-1-5') end <- as.POSIXct('2014-1-7') seq(begin, end, by=2*3600) how can I complete the data to something like date.hour temp 2014-01-05 00:00:00 NA 2014-01-05 02:00:00 NA ... 2014-01-05 18:00:00 NA 2014-01-05 20:00:00 16 2014-01-05 22:00:00 NA ... 2014-01-06 20:00:00 14 2014-01-06 22:00:00 18 ... 2014-01-07
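The question is posed for R (where seq plus merge, or tidyr::complete, does this); the same completion is a one-line reindex against the full 2-hour grid in pandas:

```python
import pandas as pd

df = pd.DataFrame(
    {'temp': [16, 14, 18]},
    index=pd.to_datetime(['2014-01-05 20:00:00',
                          '2014-01-06 20:00:00',
                          '2014-01-06 22:00:00']))

# Build the full 2-hour grid and reindex; hours with no reading get NA.
full = pd.date_range('2014-01-05', '2014-01-07', freq='2h')
completed = df.reindex(full)
```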

Pandas: filling missing values by weighted average in each group

别说谁变了你拦得住时间么 submitted on 2019-12-23 22:19:25
Question: I have a DataFrame whose 'value' column has missing values. I'd like to fill the missing values with the weighted average within each 'name' group. There was a post on how to fill missing values with the simple average in each group, but not the weighted average. Thanks a lot! df = pd.DataFrame({'value': [1, np.nan, 3, 2, 3, 1, 3, np.nan, np.nan],'weight':[3,1,1,2,1,2,2,1,1], 'name': ['A','A', 'A','B','B','B', 'C','C','C']}) name value weight 0 A 1.0 3 1 A NaN 1 2 A 3.0 1 3 B 2.0 2 4 B 3.0 1 5 B 1.0 2 6 C 3.0
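A sketch of one way to do it with the question's own data: compute each group's weighted average over its non-missing rows with np.average, then map that back onto the NaNs:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'value':  [1, np.nan, 3, 2, 3, 1, 3, np.nan, np.nan],
    'weight': [3, 1, 1, 2, 1, 2, 2, 1, 1],
    'name':   ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
})

# Weighted average of each group's non-missing values...
def weighted_mean(g):
    m = g['value'].notna()
    return np.average(g.loc[m, 'value'], weights=g.loc[m, 'weight'])

fills = df.groupby('name').apply(weighted_mean)
# ...mapped back onto the rows whose 'value' is missing.
df['value'] = df['value'].fillna(df['name'].map(fills))
```

For group A the fill is (1*3 + 3*1) / (3 + 1) = 1.5, which a plain mean (2.0) would not give.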

Filling a series based on key value pairs

主宰稳场 submitted on 2019-12-23 22:12:43
Question: I have a data frame that includes a dictionary of codes and names. Some of the names are blank. I'm trying to fill those blanks based on other rows where the code matches a name. Thus far I have sorted the pairs into a df by code but can't figure out how to "fill in the blanks". Here is my code: import pandas as pd import json from pandas.io.json import json_normalize df = pd.read_json('data/world_bank_projects.json') theme= pd.DataFrame([val for pair in df['mjtheme_namecode'].values for val in
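A minimal sketch of the fill step, with hypothetical codes and names standing in for the World Bank data: treat blanks as missing, build a code-to-name map from the rows that do have a name, and fill the gaps through that map:

```python
import pandas as pd

# Hypothetical code/name pairs in the shape of mjtheme_namecode;
# some names come through blank.
theme = pd.DataFrame({
    'code': ['1', '2', '1', '3', '2'],
    'name': ['Economic management', '', '', 'Rule of law', 'Environment'],
})

# Treat blanks as missing, build a code -> name map from rows that have
# a name, then fill each gap by looking its code up in the map.
theme['name'] = theme['name'].replace('', pd.NA)
mapping = (theme.dropna(subset=['name'])
                .drop_duplicates('code')
                .set_index('code')['name'])
theme['name'] = theme['name'].fillna(theme['code'].map(mapping))
```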

Sort data by number of NAs in each row

你。 submitted on 2019-12-23 12:57:30
Question: I want to sort a data frame that has some missing values. name dist1 dist2 dist3 prop1 prop2 prop3 month2 month5 month10 month25 month50 issue 1 A1 232.0 1462.91 232.0000 728.00 0.370 0.05633453 1188.1 1188.1 1188.1 1188.1 1188.1 Yes 2 A2 142.0 58.26 2847.7690 17.10 0.080 0.07667063 14581.6 15382.0 19510.9 25504.0 NA Yes 3 A3 102.0 1160.94 102.0000 53.40 0.090 0.07667063 144.8 144.8 144.8 291.8 761.4 Yes 4 A4 126.0 1377.23 126.0000 64.30 2.120 0.11040091 366.5 496.8 665.3 NA NA Yes 5 A5 118.0
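In R this is df[order(rowSums(is.na(df))), ]; the same idea in pandas, on a cut-down version of the table above (A4 has two NAs, A2 one, A1 none):

```python
import numpy as np
import pandas as pd

# Cut-down version of the table above, deliberately out of order.
df = pd.DataFrame({
    'name':    ['A4', 'A2', 'A1'],
    'month25': [np.nan, 25504.0, 1188.1],
    'month50': [np.nan, np.nan, 1188.1],
})

# Count NAs per row and reorder the rows by that count, fewest first.
order = df.isna().sum(axis=1).argsort(kind='stable')
df_sorted = df.iloc[order]
```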

Return FALSE for duplicated NA values when using the function duplicated()

纵饮孤独 submitted on 2019-12-23 09:19:29
Question: Just wondering why duplicated behaves the way it does with NAs: > duplicated(c(NA,NA,NA,1,2,2)) [1] FALSE TRUE TRUE FALSE FALSE TRUE when in fact > NA == NA [1] NA Is there a way to make duplicated mark NAs as FALSE, like this? > duplicated(c(NA,NA,NA,1,2,2)) [1] FALSE FALSE FALSE FALSE FALSE TRUE Answer 1: Use the incomparables argument of the duplicated function, like this: > duplicated(c(NA,NA,NA,1,2,2)) [1] FALSE TRUE TRUE FALSE FALSE TRUE > duplicated(c(NA,NA,NA,1,2,2)
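pandas behaves the same way as base R here: repeated NaNs are marked as duplicates. The equivalent of R's incomparables=NA is to mask the NaNs out explicitly:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, np.nan, 1, 2, 2])

# pandas, like R, treats repeated NaNs as duplicates of each other;
# AND-ing with notna() forces every NaN position back to False.
dup = s.duplicated() & s.notna()
```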