missing-data | 易学教程

R - Calculate difference (similarity measure) between similar datasets

Question: I have seen many questions that touch on this topic but haven't yet found an answer. If I have missed a question that answers this one, please mark this and point us to it. Scenario: We have a benchmark dataset and imputation methods; we systematically delete values from the benchmark and apply two different imputation methods. Thus we have a benchmark, imputedData1 and imputedData2. Question: Is there a function that can produce a number that represents the
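One common way to reduce "how far is an imputed dataset from the benchmark" to a single number is the root-mean-square error (RMSE). The sketch below is in plain Python rather than R, uses made-up numeric tables, and assumes both tables have the same shape and contain only numeric cells; the function name `rmse` is hypothetical, and RMSE is only one of several possible measures, not necessarily the one the asker ended up with.

```python
import math

def rmse(benchmark, imputed):
    """Root-mean-square error between two equally shaped tables (lists of rows)."""
    squared_errors = [
        (b - i) ** 2
        for bench_row, imp_row in zip(benchmark, imputed)
        for b, i in zip(bench_row, imp_row)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up values for illustration only.
benchmark = [[1.0, 2.0], [3.0, 4.0]]
imputed_data1 = [[1.0, 2.0], [3.0, 6.0]]  # one cell off by 2
imputed_data2 = [[1.0, 2.0], [3.0, 4.5]]  # one cell off by 0.5

score1 = rmse(benchmark, imputed_data1)
score2 = rmse(benchmark, imputed_data2)
```

A lower score means the imputed table is closer to the benchmark, so here the second imputation would rank as the better one.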

Aligning sequences with missing values

Question: The language I'm using is R, but you don't necessarily need to know R to answer the question. Question: I have a sequence that can be considered the ground truth, and another sequence that is a shifted version of the first, with some missing values. I'd like to know how to align the two. Setup: I have a sequence ground.truth that is basically a set of times: ground.truth <- rep( seq(1,by=4,length.out=10), 5 ) + rep( seq(0,length.out=5,by=4*10+30), each=10 ) Think of ground.truth as times

Automatically join missing data gaps in Highcharts JS

Question: I'm currently looking to integrate Highcharts JS into my application, using months as the x-axis categories. However, I have gaps in my data and want the chart to connect across the gaps automatically. For example, if I don't have any data for March, I want February and April to be connected with a straight line. Using the Highcharts demo, I have edited the data to demonstrate what currently happens by default: http://jsfiddle.net/kf26t/1/ data: [7.0, 10.0, null, 14.5, 18.2, 21.5, 25.2, 26.5, 23.3, 18

Row-by-row fillna with respect to a specific column?

Question: I have the following pandas dataframe and I would like to fill the NaNs in columns A-C in a row-wise fashion with values from column D. Is there an explicit way to do this where I can define that all the NaNs should depend row-wise on values in column D? I couldn't find a way to explicitly do this in fillna(). Note that there are additional columns E-Z which have their own NaNs and may have other rules for filling in NaNs, and should be left untouched. A B C D E 158 158 158 177 ... 158 158
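The row-wise rule described in the question (a NaN in A, B or C takes the value of D from the same row, while other columns stay as they are) can be sketched in plain Python. This is only an illustration of the rule, not the pandas call the asker is looking for; the helper name `fill_from_column` and the sample rows are hypothetical.

```python
import math

def fill_from_column(rows, targets, source):
    """Return copies of rows where a NaN in any target column is replaced
    by the value of the source column from the same row."""
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        for col in targets:
            value = new_row[col]
            if isinstance(value, float) and math.isnan(value):
                new_row[col] = new_row[source]
        filled.append(new_row)
    return filled

nan = float("nan")
# Made-up rows; column E has its own NaN and must be left untouched.
rows = [
    {"A": 158.0, "B": nan, "C": 158.0, "D": 177.0, "E": nan},
    {"A": nan, "B": 160.0, "C": nan, "D": 181.0, "E": 5.0},
]
result = fill_from_column(rows, targets=["A", "B", "C"], source="D")
```

Only the columns listed in `targets` are touched, which mirrors the requirement that columns E-Z keep their own NaNs.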

Imputer on some Dataframe columns in Python

Question: I am learning how to use Imputer in Python. This is my code: df=pd.DataFrame([["XXL", 8, "black", "class 1", 22], ["L", np.nan, "gray", "class 2", 20], ["XL", 10, "blue", "class 2", 19], ["M", np.nan, "orange", "class 1", 17], ["M", 11, "green", "class 3", np.nan], ["M", 7, "red", "class 1", 22]]) df.columns=["size", "price", "color", "class", "boh"] from sklearn.preprocessing import Imputer imp=Imputer(missing_values="NaN", strategy="mean" ) imp.fit(df["price"]) df["price"]=imp.transform(df[
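What the "mean" strategy does to a single column can be shown without scikit-learn. The sketch below uses the price values from the question and plain Python only; it illustrates the idea and is not the scikit-learn implementation. As a side note, later scikit-learn releases replaced `sklearn.preprocessing.Imputer` with `sklearn.impute.SimpleImputer`, which expects 2-D input such as `df[["price"]]`.

```python
import math

# The "price" column from the question, with NaN marking missing entries.
prices = [8, float("nan"), 10, float("nan"), 11, 7]

# Mean of the observed (non-missing) values only.
observed = [p for p in prices if not math.isnan(p)]
column_mean = sum(observed) / len(observed)

# Replace each missing entry with that mean; observed values are kept.
imputed = [column_mean if math.isnan(p) else p for p in prices]
```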

In gnuplot, with “set datafile missing”, how to ignore both “nan” and “-nan”?

Question: The gnuplot command set datafile missing "nan" tells gnuplot to ignore nan data values in the data file. How can I ignore both nan and -nan? I tried the following in gnuplot, but the effect of the first statement is overwritten by the second. gnuplot> set datafile missing "-nan" gnuplot> set datafile missing "nan" Is it possible to somehow embed a grep -v nan in the gnuplot command, or even some kind of regexp to exclude any imaginable non-numerical data? Answer 1: It is not possible to use a

Range on a field containing NAs

Question: I'm using a data set where the 11th column of a CSV file has numeric data. It contains some NA values too. Here is the str of the object: str(dataheart) num [1:4706] 14.3 18.5 18.1 NA NA NA 17.7 18 15.9 NA ... So, as a new student of R, I had expected the result of range(dataheart) to be the min and max values. From looking at the CSV file with the data, I know that the min and max are 10.1 and 21.9. But the above returns a vector [1] NA NA Is my understanding of this function incorrect? Answer 1: You
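In R, `range()` accepts `na.rm = TRUE` to drop NA values before taking the minimum and maximum. The same behaviour can be sketched in plain Python, using only the first ten values shown in the `str()` output above; the function name `range_ignoring_missing` is hypothetical.

```python
import math

def range_ignoring_missing(values):
    """Return (min, max) of the values that are neither None nor NaN."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    if not present:
        raise ValueError("no non-missing values")
    return min(present), max(present)

nan = float("nan")
# First ten values from the str() output in the question.
sample = [14.3, 18.5, 18.1, nan, nan, nan, 17.7, 18.0, 15.9, nan]
low, high = range_ignoring_missing(sample)
```

On this ten-value sample the result is 14.3 and 18.5; the 10.1 and 21.9 quoted in the question refer to the full 4706-value column.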

R - Replace specific value contents with NA [duplicate]

Question: I have a fairly large data frame that has multiple "-" entries which represent missing data. The data frame was assembled from multiple Excel files, for which I could not use "na.strings =" or an alternative function, so I had to import them with the "-" representation. How can I replace all "-" in the data frame with NA / missing values? The data frame consists of 200 columns of
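The replacement itself is a cell-by-cell substitution of the marker with a missing value. A minimal plain-Python sketch, assuming the table is held as a list of rows and using `None` as the missing value; the helper name `dashes_to_missing` and the sample rows are hypothetical.

```python
def dashes_to_missing(rows, marker="-"):
    """Return a new table where every cell equal to the marker becomes None."""
    return [[None if cell == marker else cell for cell in row] for row in rows]

# Made-up rows for illustration only.
table = [
    ["a", "-", 3],
    ["-", "b", "-"],
]
cleaned = dashes_to_missing(table)
```

Only cells that are exactly "-" are replaced, so values such as "-5" or "a-b" are left alone.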

Counting not NA's for values of some column for each value of another row [duplicate]

Question: In R, let's say I have a DF with two columns, Fam and Prop, both categorical. Fam has repeated names like Algea, Fungi, etc., and Prop has categorical numbers and NAs. How can I get a table/output that, for each value of A, tells me how many values are not NA? Example:

Fam    Prop
-------------
Algea  one
Fungi  two
Algea  NA
Algea  three
Fungi  one
Fungi
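Counting the non-NA values per group can be sketched in plain Python with `collections.Counter`. The records below are the five complete rows of the example (the last, truncated row is left out), with `None` standing in for NA.

```python
from collections import Counter

# (Fam, Prop) pairs from the example; None stands in for NA.
records = [
    ("Algea", "one"),
    ("Fungi", "two"),
    ("Algea", None),
    ("Algea", "three"),
    ("Fungi", "one"),
]

# Count one per row whose Prop is present, keyed by Fam.
non_missing = Counter(fam for fam, prop in records if prop is not None)
```

A group whose values are all missing would not appear in `non_missing` at all; looking it up still returns 0, since `Counter` returns 0 for absent keys.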

Group values with identical ID into columns without summarizing them in R

Question: I have a dataframe that looks like this, but with a lot more proteins:

Protein  z
Irak4    -2.46
Irak4    -0.13
Itk      -0.49
Itk       4.22
Itk      -0.51
Ras       1.53

For further operations I need the data to be grouped by protein name into columns like this:

Irak4  Itk    Ras
-2.46  -0.49  1.53
-0.13   4.22  NA
NA     -0.51  NA

I tried different packages like dplyr or reshape, but did not manage to transform the data into the desired format. Is there any way to achieve this? I think the missing datapoints for some Proteins are the
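The reshaping described here (one column per protein, shorter columns padded with NA) can be sketched in plain Python: collect the values per protein in order of appearance, then zip the columns together with a fill value. This illustrates the transformation only and is not an R solution; `None` stands in for NA.

```python
from itertools import zip_longest

# (Protein, z) pairs from the question.
pairs = [
    ("Irak4", -2.46), ("Irak4", -0.13),
    ("Itk", -0.49), ("Itk", 4.22), ("Itk", -0.51),
    ("Ras", 1.53),
]

# Collect values per protein, preserving first-seen order of the proteins.
columns = {}
for protein, z in pairs:
    columns.setdefault(protein, []).append(z)

header = list(columns)
# Zip the columns row by row, padding shorter columns with None.
table = list(zip_longest(*columns.values(), fillvalue=None))
```

Each tuple in `table` is one output row, with values in the same order as `header`.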