missing-data

How to na.locf in R without using additional packages [duplicate]

老子叫甜甜 submitted on 2020-01-10 05:57:58
Question: This question already has answers here: propagating data within a vector (5 answers). Closed 6 years ago. Given a vector such as (say) c(2,NA,5,NA,NA,1,NA), the problem is to "last observation carry forward", resulting in the vector c(2,2,5,5,5,1,1). As answered here, na.locf from the zoo package can do this. However, given the simplicity of the problem, and the fact that this is to be performed many times from a "blank" R environment, I would like to do this without loading packages. Is there a
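For illustration only, the "last observation carried forward" idea described in this question can be sketched without any third-party package. The sketch below is in Python rather than R, and the function name locf is a hypothetical helper, not something from the question or from zoo. It assumes missing values are represented as NaN and that leading missing values, which have nothing to carry forward, should stay missing.

```python
import math

def locf(values):
    """Carry the last non-missing observation forward over NaN entries."""
    filled = []
    last = math.nan  # nothing observed yet, so leading NaNs stay NaN
    for v in values:
        if not math.isnan(v):
            last = v  # remember the most recent observed value
        filled.append(last)
    return filled

# The vector from the question, with NaN standing in for NA.
result = locf([2, math.nan, 5, math.nan, math.nan, 1, math.nan])
print(result)  # [2, 2, 5, 5, 5, 1, 1]
```

A single pass with one remembered value is enough here, which is why the problem does not really need a package.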

MATLAB: Using interpolation to replace missing values (NaN)

会有一股神秘感。 submitted on 2020-01-09 10:47:28
Question: I have a cell array in which each cell contains a sequence of values as a row vector. The sequences contain some missing values represented by NaN. I would like to replace all NaNs using some sort of interpolation method. How can I do this in MATLAB? I am also open to other suggestions on how to deal with these missing values. Consider this sample data to illustrate the problem:

    seq = {randn(1,10); randn(1,7); randn(1,8)};
    for i=1:numel(seq)
        %# simulate some missing values
        ind = rand( size(seq{i}) ) <
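As a rough sketch of the interpolation idea asked about here, written in Python rather than MATLAB, each sequence can be filled by linear interpolation over the element positions. The helper name interpolate_nans and the sample sequences are made up for this example. Treating the index as the x-axis and filling NaNs at the edges with the nearest known value are both assumptions; the question does not say how edges should be handled.

```python
import math

def interpolate_nans(seq):
    """Linearly interpolate interior NaNs; edge NaNs take the nearest known value."""
    known = [i for i, v in enumerate(seq) if not math.isnan(v)]
    if not known:
        return list(seq)  # nothing to interpolate from
    out = list(seq)
    for i, v in enumerate(seq):
        if not math.isnan(v):
            continue
        # Nearest known positions on either side of the gap, if any.
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = seq[right]   # leading gap: assumption, copy first known value
        elif right is None:
            out[i] = seq[left]    # trailing gap: assumption, copy last known value
        else:
            frac = (i - left) / (right - left)
            out[i] = seq[left] + frac * (seq[right] - seq[left])
    return out

# Hypothetical sequences of different lengths, like the cells of the cell array.
seqs = [
    [1.0, math.nan, 3.0],
    [math.nan, 2.0, math.nan, math.nan, 5.0, math.nan],
]
filled = [interpolate_nans(s) for s in seqs]
```

Each sequence is processed independently, so sequences of different lengths pose no problem.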

MATLAB: Using interpolation to replace missing values (NaN)

隐身守侯 submitted on 2020-01-09 10:46:12
Question: I have a cell array in which each cell contains a sequence of values as a row vector. The sequences contain some missing values represented by NaN. I would like to replace all NaNs using some sort of interpolation method. How can I do this in MATLAB? I am also open to other suggestions on how to deal with these missing values. Consider this sample data to illustrate the problem:

    seq = {randn(1,10); randn(1,7); randn(1,8)};
    for i=1:numel(seq)
        %# simulate some missing values
        ind = rand( size(seq{i}) ) <

Sample a single row, per column, within a subset of a data frame in R, while following conditions

社会主义新天地 submitted on 2020-01-07 03:03:21
Question: As an example of my data, I have GROUP 1 with three rows of data, and GROUP 2 with two rows of data, in a data frame:

    GROUP  VARIABLE 1  VARIABLE 2  VARIABLE 3
    1      2           6           5
    1      4           NA          1
    1      NA          3           8
    2      1           NA          2
    2      9           NA          NA

I would like to sample a single value per column from GROUP 1, to make a new row representing GROUP 1. I do not want to sample one single and complete row from GROUP 1; rather, the sampling needs to occur individually for each column. I would like to do the same for GROUP 2. Also, the

Fill in missing values (NAs) with values from another dataframe in R

℡╲_俬逩灬. submitted on 2020-01-06 15:28:13
Question: How do I substitute missing values in one dataframe with values from another? Let's say I have two datasets. dataset1 shows the amount of food that is produced by a country each day:

         country  day  tonnes of food
    ## 1 china    1    6
    ## 2 china    1    NA
    ## 3 china    2    2
    ## 4 china    2    NA

dataset2 is the average amount of food by day:

         country  day  average tonnes of food
    ## 1 china    1    6
    ## 3 china    2    2

How can I fill in the NAs of dataset1 with the averages from dataset2? I.e. IF is.na(dataset1$tonnes) is TRUE then
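One way to picture the fill described in this question is as a keyed lookup: build a table of averages indexed by (country, day), then consult it only for rows whose value is missing. The sketch below does this in plain Python rather than R. The field names tonnes and avg_tonnes are shorthand chosen for this example, and None stands in for NA.

```python
# Rows mirror the question's sample data; None stands in for NA.
dataset1 = [
    {"country": "china", "day": 1, "tonnes": 6},
    {"country": "china", "day": 1, "tonnes": None},
    {"country": "china", "day": 2, "tonnes": 2},
    {"country": "china", "day": 2, "tonnes": None},
]
dataset2 = [
    {"country": "china", "day": 1, "avg_tonnes": 6},
    {"country": "china", "day": 2, "avg_tonnes": 2},
]

# Lookup table keyed by the columns the two datasets share.
averages = {(r["country"], r["day"]): r["avg_tonnes"] for r in dataset2}

# Replace only the missing values; observed values are left untouched.
filled = [
    {**r, "tonnes": averages.get((r["country"], r["day"]))
          if r["tonnes"] is None else r["tonnes"]}
    for r in dataset1
]
print([r["tonnes"] for r in filled])  # [6, 6, 2, 2]
```

If a (country, day) pair has no average in dataset2, averages.get returns None and the value stays missing.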

Report missing data in database

断了今生、忘了曾经 submitted on 2020-01-05 05:41:07
Question: SQL Fiddle. I have a dynamic, long (>1000) list of components and their respective asset types in Excel. Example:

    Component          Asset Type
    0738.D100.L00.55   9211.D108.D07.01_02.02
    0738.D100.L00.71   0738.D100.L00.55_04.04
    0738.D100.M02.55   0738.D100.M00.60_03.03
    0990.OH05.A00.09   0738.D100.M00.60_03.03

Some of these combinations may not exist in the SQL database. I want a query that outputs these combinations. Components and their respective asset types can be requested as follows: Select C.Code, AT.Code

introducing a gap in continuous x axis using ggplot

梦想的初衷 submitted on 2020-01-05 04:32:08
Question: This is kind of a follow-up to my previous post, creating an stacked area/bar plot with missing values (all the scripts I run can be found there). In this post, however, I'm asking whether it's possible to leave a gap in a continuous x axis. I have a time series (month by month) over a year, but for one sample one month is missing, and I would like to show this month as a complete gap in the plot. Almost like plotting a graph for Jan-Aug (Sep is missing) and one for Oct-Dec and merging these with a gap

add exact proportion of random missing values to data.frame

二次信任 submitted on 2020-01-03 14:17:09
Question: I would like to add random NAs to a data.frame in R. So far I've looked into these questions:

    R: Randomly insert NAs into dataframe proportionaly
    How do I add random NAs into a data frame
    add random missing values to a complete data frame (in R)

Many solutions were provided there, but I couldn't find one that complies with these 5 conditions:

    Add really random NAs, and not the same amount by row or by column
    Work with every class of variable that one can encounter in a data.frame (numeric,
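As a hedged sketch of the first two conditions visible above (the remaining conditions are cut off in this excerpt and are not addressed), one approach is to sample cell positions uniformly from the whole table rather than per row or per column. The example is in Python rather than R. The function name add_random_missing and the sample table are hypothetical, and None stands in for NA so the approach works regardless of the column's type.

```python
import random

def add_random_missing(rows, proportion, seed=None):
    """Blank out exactly round(proportion * n_cells) cells chosen uniformly at random."""
    rng = random.Random(seed)
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    # Every (row, column) position is an equally likely candidate.
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    n_missing = round(proportion * len(cells))
    out = [list(r) for r in rows]  # copy so the input is left untouched
    for i, j in rng.sample(cells, n_missing):
        out[i][j] = None
    return out

# Hypothetical mixed-type table: 5 rows x 4 columns = 20 cells.
table = [
    [1, "a", 1.5, True],
    [2, "b", 2.5, False],
    [3, "c", 3.5, True],
    [4, "d", 4.5, False],
    [5, "e", 5.5, True],
]
masked = add_random_missing(table, 0.25, seed=0)
```

Because rng.sample draws without replacement, the number of blanked cells is exact rather than approximate, and the count per row or column varies from run to run.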

replacing NA values

♀尐吖头ヾ submitted on 2020-01-03 04:35:13
Question: I have a variable that has three values: NA, Yes, MayBe. When I use the levels and class functions on that variable I get these values:

    > levels(Data1$Case)
    "Yes" "May Be"
    > class(Data1$Case)
    "factor"

I am trying to replace the NA values with No, so I use this code:

    Data1$Col1[is.na(Data1$Col1)]= "No"

I am getting an error:

    In `[<-.factor`(`*tmp*`, is.na(Data1$Col1), value = c(NA, : invalid factor level, NA generated

I wrote an ifelse statement to replace the NA:

    Data1$Col1=ifelse(is.na(Data1$Col1_
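The error quoted above arises because "No" is not one of the factor's existing levels. The same constraint exists for pandas categoricals, so the fix can be illustrated there: register the new category first, then fill. This is a Python analogue for illustration, not the R answer, and the series name case and its sample values are made up for the example.

```python
import pandas as pd

# A categorical with two declared categories and some missing entries.
case = pd.Series(
    pd.Categorical(["Yes", None, "May Be", None], categories=["Yes", "May Be"])
)

# "No" must become a valid category before it can be used as a fill value.
case = case.cat.add_categories("No").fillna("No")

print(case.tolist())                # ['Yes', 'No', 'May Be', 'No']
print(list(case.cat.categories))    # ['Yes', 'May Be', 'No']
```

The order matters: filling before adding the category fails for the same reason the R assignment does.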

Efficient solution for forward filling missing values in a pandas dataframe column?

大憨熊 submitted on 2020-01-02 21:10:22
Question: I need to forward fill values in a column of a dataframe within groups. I should note that the first value in a group is never missing by construction. I have the following solutions at the moment.

    df = pd.DataFrame({'a': [1,1,2,2,2], 'b': [1, np.nan, 2, np.nan, np.nan]})

    # desired output
    a  b
    1  1
    1  1
    2  2
    2  2
    2  2

Here are the three solutions that I've tried so far.

    # really slow solutions
    df['b'] = df.groupby('a')['b'].transform(lambda x: x.fillna(method='ffill'))
    df['b'] = df.groupby('a')['b'
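For comparison with the transform-plus-lambda attempts above, pandas also exposes forward fill directly on a groupby object, which avoids calling a Python function once per group. The sketch below uses the question's own sample frame. Whether it is fast enough for the asker's real data is not something this excerpt can confirm.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 2, 2], "b": [1, np.nan, 2, np.nan, np.nan]})

# Forward fill within each group of 'a'; values never leak across groups.
df["b"] = df.groupby("a")["b"].ffill()

print(df["b"].tolist())  # [1.0, 1.0, 2.0, 2.0, 2.0]
```

Note that the fillna(method='ffill') spelling used in the question is deprecated in recent pandas releases in favour of ffill().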