missing-data

R - Calculate difference (similarity measure) between similar datasets

时光怂恿深爱的人放手 posted on 2020-01-02 11:03:50
Question: I have seen many questions that touch on this topic but haven't yet found an answer. If I have missed a question that does answer this one, please mark this and point us to it.
Scenario: We have a benchmark dataset and two imputation methods. We systematically delete values from the benchmark and apply each imputation method. Thus we have benchmark, imputedData1 and imputedData2.
Question: Is there a function that can produce a number that represents the
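One number commonly used for this kind of comparison is the root-mean-square error (RMSE) between the benchmark and each imputed dataset, computed only over the cells that were deleted. The sketch below is in Python rather than R and uses made-up values; the names `rmse_on_deleted`, `deleted_idx`, `imputed1` and `imputed2` are illustrative assumptions, not taken from the question.

```python
import math


def rmse_on_deleted(benchmark, imputed, deleted_idx):
    """RMSE between benchmark and imputed values at the deleted positions only."""
    if not deleted_idx:
        raise ValueError("deleted_idx must not be empty")
    squared = [(benchmark[i] - imputed[i]) ** 2 for i in deleted_idx]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical data: positions 1 and 3 were deleted and then imputed.
benchmark = [1.0, 2.0, 3.0, 4.0, 5.0]
deleted_idx = [1, 3]
imputed1 = [1.0, 2.5, 3.0, 3.5, 5.0]
imputed2 = [1.0, 3.0, 3.0, 6.0, 5.0]

score1 = rmse_on_deleted(benchmark, imputed1, deleted_idx)
score2 = rmse_on_deleted(benchmark, imputed2, deleted_idx)
# A lower score means the imputed values are closer to the benchmark.
```

Restricting the comparison to the deleted cells keeps the untouched cells, which are identical in all three datasets, from diluting the score.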

Aligning sequences with missing values

末鹿安然 posted on 2020-01-02 07:08:35
Question: The language I'm using is R, but you don't necessarily need to know R to answer the question. I have a sequence that can be considered the ground truth, and another sequence that is a shifted version of the first, with some missing values. I'd like to know how to align the two.
Setup: I have a sequence ground.truth that is basically a set of times:
    ground.truth <- rep( seq(1,by=4,length.out=10), 5 ) + rep( seq(0,length.out=5,by=4*10+30), each=10 )
Think of ground.truth as times

Automatically join missing data gaps in Highcharts JS

∥☆過路亽.° posted on 2020-01-02 02:19:16
Question: I'm currently looking to implement Highcharts JS in my application, using months as the x-axis categories. However, I have gaps in my data, and want the chart to connect the gaps automatically. For example, if I don't have any data for March, I want February and April to be joined by a straight line. Using the Highcharts demo, I have edited the data to demonstrate what currently happens by default: http://jsfiddle.net/kf26t/1/
    data: [7.0, 10.0, null, 14.5, 18.2, 21.5, 25.2, 26.5, 23.3, 18

Row-by-row fillna with respect to a specific column?

久未见 posted on 2020-01-01 18:19:32
Question: I have the following pandas dataframe and I would like to fill the NaNs in columns A-C in a row-wise fashion with values from column D. Is there an explicit way to do this where I can define that all the NaNs should depend row-wise on the values in column D? I couldn't find a way to do this explicitly in fillna(). Note that there are additional columns E-Z which have their own NaNs, may have other rules for filling them in, and should be left untouched. A B C D E 158 158 158 177 ... 158 158
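A minimal sketch of one way to do this, assuming a small made-up frame (the values and the name `target_cols` are illustrative, not the asker's data): `Series.fillna` accepts another Series and aligns it on the row index, so applying it column by column fills each NaN from the same row of D.

```python
import numpy as np
import pandas as pd

# Hypothetical data; column E stands in for the "other" columns E-Z.
df = pd.DataFrame({
    "A": [158.0, np.nan, 158.0],
    "B": [np.nan, 158.0, np.nan],
    "C": [158.0, np.nan, np.nan],
    "D": [177.0, 181.0, 165.0],
    "E": [np.nan, 1.0, np.nan],
})

target_cols = ["A", "B", "C"]
filled = df.copy()
# For each target column, replace NaN with the value of D in the same row.
filled[target_cols] = df[target_cols].apply(lambda col: col.fillna(df["D"]))
```

Only the columns listed in `target_cols` are assigned, so the NaNs in E stay as they were.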

Imputer on some Dataframe columns in Python

倖福魔咒の posted on 2020-01-01 04:43:07
Question: I am learning how to use Imputer in Python. This is my code:
    df=pd.DataFrame([["XXL", 8, "black", "class 1", 22], ["L", np.nan, "gray", "class 2", 20], ["XL", 10, "blue", "class 2", 19], ["M", np.nan, "orange", "class 1", 17], ["M", 11, "green", "class 3", np.nan], ["M", 7, "red", "class 1", 22]])
    df.columns=["size", "price", "color", "class", "boh"]
    from sklearn.preprocessing import Imputer
    imp=Imputer(missing_values="NaN", strategy="mean" )
    imp.fit(df["price"])
    df["price"]=imp.transform(df[
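As a hedged sketch, mean imputation on selected columns can also be done with pandas alone, which avoids the imputer class entirely. The frame below reuses the rows from the question; the name `numeric_cols` is an assumption for illustration. For the scikit-learn route, note that `sklearn.preprocessing.Imputer` was removed in scikit-learn 0.22 in favour of `sklearn.impute.SimpleImputer`, and that these transformers expect 2-D input such as `df[["price"]]` rather than a 1-D column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["XXL", 8, "black", "class 1", 22],
     ["L", np.nan, "gray", "class 2", 20],
     ["XL", 10, "blue", "class 2", 19],
     ["M", np.nan, "orange", "class 1", 17],
     ["M", 11, "green", "class 3", np.nan],
     ["M", 7, "red", "class 1", 22]],
    columns=["size", "price", "color", "class", "boh"],
)

# Only these columns are imputed; the categorical columns are left alone.
numeric_cols = ["price", "boh"]
# fillna with a Series of per-column means fills each column with its own mean.
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())
```

The means are computed from the observed values only, because `DataFrame.mean` skips NaN by default.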

In gnuplot, with “set datafile missing”, how to ignore both “nan” and “-nan”?

纵饮孤独 posted on 2020-01-01 03:19:45
Question: The gnuplot command set datafile missing "nan" tells gnuplot to ignore nan data values in the data file. How can I ignore both nan and -nan? I tried the following in gnuplot, but the effect of the first statement is overwritten by the next:
    gnuplot> set datafile missing "-nan"
    gnuplot> set datafile missing "nan"
Is it possible to somehow embed a grep -v nan in the gnuplot command, or even some kind of regexp to exclude any imaginable non-numerical data?
Answer 1: It is not possible to use a

Range on a field containing NAs

ε祈祈猫儿з posted on 2019-12-31 10:04:10
Question: I'm using a data set where the 11th column of a CSV file has numeric data. It contains some NA values too. Here is the str of the object:
    str(dataheart)
    num [1:4706] 14.3 18.5 18.1 NA NA NA 17.7 18 15.9 NA ...
So, as a new student of R, I had expected the result of range(dataheart) to be the min and max values. From looking at the CSV file with the data, I know that the min and max are 10.1 and 21.9. But the above returns a vector
    [1] NA NA
Is my understanding of this function incorrect?
Answer 1: You
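In R, range() returns NA whenever the input contains NA unless na.rm = TRUE is passed, for example range(dataheart, na.rm = TRUE). The same idea, dropping the missing values before taking the minimum and maximum, can be sketched in Python; the helper name `nan_range` and the use of `math.nan` as a stand-in for NA are assumptions for illustration, and the sample holds only the ten values visible in the str output.

```python
import math


def nan_range(values):
    """Return (min, max) of the values, ignoring NaN entries."""
    present = [v for v in values if not math.isnan(v)]
    if not present:
        # Nothing observed: there is no meaningful range.
        return (math.nan, math.nan)
    return (min(present), max(present))


sample = [14.3, 18.5, 18.1, math.nan, math.nan, math.nan,
          17.7, 18.0, 15.9, math.nan]
low, high = nan_range(sample)
```

On this ten-value sample the result is 14.3 and 18.5; the 10.1 and 21.9 quoted in the question refer to the full 4706-value column.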

R - Replace specific value contents with NA [duplicate]

佐手、 posted on 2019-12-31 07:41:44
Question: This question already has answers here: Replacing character values with NA in a data frame (6 answers). Closed 4 months ago. I have a fairly large data frame that contains multiple "-" entries which represent missing data. The data frame was assembled from multiple Excel files, which could not use "na.strings =" or an alternative function, so I had to import them with the "-" representation. How can I replace all "-" in the data frame with NA / missing values? The data frame consists of 200 columns of

Counting not NA's for values of some column for each value of another row [duplicate]

倾然丶 夕夏残阳落幕 posted on 2019-12-31 06:58:37
Question: This question already has answers here: dplyr count non-NA value in group by [duplicate] (3 answers). Closed last year. In R, let's say I have a DF with two columns, Fam and Prop, both categorical. Fam has repeated names like Algea, Fungi, etc., and Prop has categorical numbers and NAs. How can I get a table/output that, for each value of A, tells me how many values are not NA? Example:
    Fam   Prop
    -------------
    Algea one
    Fungi two
    Algea NA
    Algea three
    Fungi one
    Fungi
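The counting itself is language-independent: group by Fam and count the rows whose Prop is present. A minimal Python sketch, using only the five complete rows visible in the example and `None` as an assumed stand-in for NA (the names `rows` and `non_missing` are illustrative):

```python
from collections import Counter

# (Fam, Prop) pairs from the example; None marks a missing Prop.
rows = [
    ("Algea", "one"),
    ("Fungi", "two"),
    ("Algea", None),
    ("Algea", "three"),
    ("Fungi", "one"),
]

# Count, per Fam, the rows whose Prop is not missing.
non_missing = Counter(fam for fam, prop in rows if prop is not None)
```

A `Counter` returns 0 for a key it has never seen, so a Fam whose Prop values are all missing still reads as zero when looked up.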

Group values with identical ID into columns without summerizing them in R

两盒软妹~` posted on 2019-12-31 03:40:44
Question: I have a dataframe that looks like this, but with a lot more proteins:
    Protein  z
    Irak4    -2.46
    Irak4    -0.13
    Itk      -0.49
    Itk       4.22
    Itk      -0.51
    Ras       1.53
For further operations I need the data to be grouped by protein name into columns like this:
    Irak4  Itk    Ras
    -2.46  -0.49  1.53
    -0.13   4.22  NA
    NA     -0.51  NA
I tried different packages like dplyr or reshape, but did not manage to transform the data into the desired format. Is there any way to achieve this? I think the missing datapoints for some Proteins are the
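The reshaping needs a per-group row counter so that each protein's values can be spread into its own column without aggregation. The sketch below shows that idea in Python with pandas rather than in R; the helper column `row` and the names `long_df` and `wide` are assumptions for illustration.

```python
import pandas as pd

long_df = pd.DataFrame({
    "Protein": ["Irak4", "Irak4", "Itk", "Itk", "Itk", "Ras"],
    "z": [-2.46, -0.13, -0.49, 4.22, -0.51, 1.53],
})

# Number the observations within each protein: 0, 1, 2, ...
long_df["row"] = long_df.groupby("Protein").cumcount()

# Spread proteins into columns; positions with no observation become NaN.
wide = long_df.pivot(index="row", columns="Protein", values="z")
```

Because every (row, Protein) pair is unique after numbering, `pivot` has nothing to summarize, and proteins with fewer observations are padded with NaN at the bottom.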