missing-data

Imputing missing values linearly in R

Submitted by 依然范特西╮ on 2019-12-01 05:24:39
Question: I have a data frame with missing values:

    X   Y   Z
    54  57  57
    100 58  58
    NA  NA  NA
    NA  NA  NA
    NA  NA  NA
    60  62  56
    NA  NA  NA
    NA  NA  NA
    69  62  62

I want to impute the NA values linearly from the known values so that the data frame looks like this:

    X   Y   Z
    54  57  57
    100 58  58
    90  59  57.5
    80  60  57
    70  61  56.5
    60  62  56
    63  62  58
    66  62  60
    69  60  62

Thanks.

Answer 1: Base R's approxfun() returns a function that will linearly interpolate the data it is handed.

    ## Make easily reproducible data
    df <- read.table(text="X Y Z
    54 57 57
    100 58 58
    NA NA NA
    NA NA NA
    NA NA NA
    60 62 56
    NA NA NA
    NA NA NA
    69 62 62", header=TRUE)

    ## See how this works on a
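The R answer above is cut off, so the following is only a minimal sketch of what "linearly interpolate over row position" means, written in Python for illustration. The function name `interpolate_linear` is hypothetical; it is not R's approxfun() and not part of any library. It assumes NA is represented as `None` and that gaps at the very start or end of a column should be left unfilled.

```python
from typing import List, Optional


def interpolate_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior None gaps by linear interpolation over row position.

    Leading or trailing None values are left untouched, because there is
    no known value on one side to interpolate from.
    """
    result = list(values)
    # Positions of the values that are actually known.
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk over each pair of neighbouring known positions and fill between them.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = values[left] + step * (i - left)
    return result


# Columns X and Z from the question, with NA written as None.
x = [54, 100, None, None, None, 60, None, None, 69]
z = [57, 58, None, None, None, 56, None, None, 62]

x_filled = interpolate_linear(x)
z_filled = interpolate_linear(z)
print(x_filled)
print(z_filled)
```

Applied to columns X and Z, this reproduces the values the question asks for (90, 80, 70 and 63, 66 for X; 57.5, 57, 56.5 and 58, 60 for Z).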

Impute missing data with mean by group

Submitted by 喜欢而已 on 2019-12-01 05:10:58
Question: I have a categorical variable with three levels (A, B, and C). I also have a continuous variable with some missing values in it. I would like to replace the NA values with the mean of their group. That is, missing observations from group A have to be replaced with the mean of group A. I know I can just calculate each group's mean and replace the missing values, but I'm sure there's another way to do so more efficiently with loops.

    A <- subset(data, group == "A")
    mean(A$variable, na.rm = TRUE)
    A
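The snippet above is truncated, so here is a minimal sketch of the idea it describes (replace each missing value with the mean of its own group), written in Python for illustration only. The function name `impute_group_mean` and the sample rows are hypothetical, and `None` stands in for NA. The sketch assumes every group has at least one observed value.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple


def impute_group_mean(
    rows: List[Tuple[str, Optional[float]]]
) -> List[Tuple[str, float]]:
    """Replace each None with the mean of the observed values in its group."""
    # Collect the observed (non-missing) values per group.
    observed: Dict[str, List[float]] = defaultdict(list)
    for group, value in rows:
        if value is not None:
            observed[group].append(value)
    # One mean per group; a group with no observed values would raise KeyError below.
    group_means = {group: mean(vals) for group, vals in observed.items()}
    return [
        (group, group_means[group] if value is None else value)
        for group, value in rows
    ]


# Hypothetical data: (group, variable) pairs with some missing values.
data = [("A", 1.0), ("A", None), ("A", 3.0), ("B", 10.0), ("B", None), ("C", 5.0)]
imputed = impute_group_mean(data)
print(imputed)
```

The group means are computed once in a single pass, so no per-group subset has to be built by hand.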

How to create missing value for repeated measurement data?

Submitted by 和自甴很熟 on 2019-11-30 23:08:52
I have a data set in which not every subject's observations were made at the same time points, but I want to turn it into a data set in which everyone's observations fall on the same time points (so that I can use it in SAS proc traj). For example, suppose I have the data set "m":

    id <- c(1,1,1,1,2,2,3,3,3)
    age <- c(2,3,4,5,3,6,2,5,8)
    IQ <- c(3,4,5,4,6,5,3,8,10)
    m <- data.frame(id,age,IQ)
    > m
      id age IQ
    1  1   2  3
    2  1   3  4
    3  1   4  5
    4  1   5  4
    5  2   3  6
    6  2   6  5
    7  3   2  3
    8  3   5  8
    9  3   8 10
    > unique(age)
    [1] 2 3 4 5 6 8

I want to turn m into m2, but I can only do that manually.

    id2 <- c(1,1,1,1,1,1

Pandas read_csv fills empty values with string 'nan', instead of parsing date

Submitted by 霸气de小男生 on 2019-11-30 21:40:46
I assign np.nan to the missing values in a column of a DataFrame. The DataFrame is then written to a csv file using to_csv. When I open the resulting csv file in a text editor, it correctly has nothing between the commas for the missing values. But when I read that csv file back into a DataFrame using read_csv, the missing values become the string 'nan' instead of NaN. As a result, isnull() does not work. For example:

    In [13]: df
    Out[13]:
       index  value date
    0    975  25.35  nan
    1    976  26.28  nan
    2    977  26.24  nan
    3    978  25.76  nan
    4    979  26.08  nan

    In [14]: df.date.isnull()
    Out[14]:
    0    False
    1    False
    2

Using R to insert a value for missing data with a value from another data frame

Submitted by 无人久伴 on 2019-11-30 15:40:29
All, I have a question that I fear might be too pedestrian to ask here, but searching for it elsewhere is leading me astray. I may not be using the right search terms. I have a panel data frame (country-year) in R with some missing values on a given variable. I'm trying to impute them with the values from a vector in another data frame. Here's an illustration of what I am trying to do. Assume Data is the data frame of interest, which has missing values on a given vector that I'm trying to impute from another donor data frame. It looks like this.

    country year x
    70      1920 9.234
    70      1921 9.234

missing value in highcharts line graph results in no line, just points

Submitted by 时光毁灭记忆、已成空白 on 2019-11-30 11:38:18
Please take a look at this: http://jsfiddle.net/2rNzr/

    var chart = new Highcharts.Chart({
        chart: {
            renderTo: 'container'
        },
        xAxis: {
            categories: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
        },
        series: [{
            data: [29.9, '', 106.4, 129.2, 144.0, 176.0, 135.6, 148.5, 216.4, 194.1, 95.6, 54.4]
        }]
    });

You'll notice that the data has a blank value in it (the second value), which causes the line graph to display incorrectly. Is this a bug? What is the correct way of specifying a missing value so there will be a gap in the line graph? i.e. I would NOT want the

How do I get a summary count of missing/NaN data by column in 'pandas'?

Submitted by 烂漫一生 on 2019-11-30 10:47:32
Question: In R I can quickly see a count of missing data using the summary command, but the equivalent pandas DataFrame method, describe, does not report these values. I gather I can do something like len(mydata.index) - mydata.count() to compute the number of missing values for each column, but I wonder if there's a better idiom (or if my approach is even right).

Answer 1: Both describe and info report the count of non-missing values.

    In [1]: df = DataFrame(np.random.randn(10,2))
    In [2]: df.iloc[3:6,0] = np
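The answer above is cut off, so the following is only a hedged sketch, not the original answerer's code. It compares the subtraction the question proposes with summing the boolean mask from `isnull()`, one commonly used pandas idiom for per-column missing counts. The small DataFrame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "a" has two missing values, column "b" has one.
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0, np.nan],
        "b": [np.nan, 2.0, 3.0, 4.0],
    }
)

# isnull() gives a boolean frame; summing it counts True values per column.
missing_per_column = df.isnull().sum()

# The approach from the question: total rows minus non-missing count.
by_subtraction = len(df.index) - df.count()

print(missing_per_column)
print(by_subtraction)
```

Both expressions return one count per column, so the approach in the question gives the same result as the boolean-mask idiom on this data.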

Find missing month after grouping with dplyr

Submitted by 流过昼夜 on 2019-11-30 09:26:22
Question: I have a data frame with two columns that I am grouping by with dplyr, a column of months (as numerics, e.g. 1 through 12), and several columns with statistical data following that (values unimportant). An example:

    ID_1 ID_2 month st1 st2
    1    1    1     0.5 0.2
    1    1    2     0.7 0.9
    1    1    3     1.1 1.7
    1    1    4     2.6 0.8
    1    1    5     1.8 1.3
    1    1    6     2.1 2.2
    1    1    7     0.5 0.2
    1    1    8     0.7 0.9
    1    1    9     1.1 1.7
    1    1    10    2.6 0.8
    1    1    11    1.8 1.3
    1    1    12    2.1 2.2
    1    2    1     0.5 0.2
    1    2    2     0.7 0.9
    1    2    3     1.1 1.7
    1    2    4     2.6 0.8
    1    2    5     1.8 1.3
    1    2    6     2.1 2.2
    1

str.format() raises KeyError

Submitted by 亡梦爱人 on 2019-11-30 05:33:48
The following code raises a KeyError exception:

    addr_list_formatted = []
    addr_list_idx = 0
    for addr in addr_list:  # addr_list is a list
        addr_list_idx = addr_list_idx + 1
        addr_list_formatted.append("""
        "{0}"
        {
            "gamedir" "str"
            "address" "{1}"
        }
        """.format(addr_list_idx, addr))

Why? I am using Python 3.1.

Answer by Lasse Vågsæther Karlsen: The problem is those { and } characters you have there that don't specify a key for formatting. You need to double them up, so change your code to:

    addr_list_formatted.append("""
    "{0}"
    {{
        "gamedir" "str"
        "address" "{1}"
    }}
    """.format(addr_list_idx, addr))

Source: https:/
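As a small self-contained illustration of the fix described in the answer, the sketch below shows both the failing single-brace template and the doubled-brace version. The helper name `format_entry` and the sample address are hypothetical, and the template is shortened to one line.

```python
def format_entry(index: int, address: str) -> str:
    """Format one entry; {{ and }} produce literal braces in the output."""
    # {0} and {1} are replacement fields; the doubled braces are escapes.
    template = '"{0}" {{ "gamedir" "str" "address" "{1}" }}'
    return template.format(index, address)


# With single braces, str.format treats the brace contents as a field name,
# finds no matching keyword argument, and raises KeyError.
try:
    '"{0}" { "gamedir" "str" }'.format(1)
    raised_key_error = False
except KeyError:
    raised_key_error = True

entry = format_entry(1, "192.0.2.10:27015")
print(raised_key_error)
print(entry)
```

Only the literal braces are doubled; the replacement fields `{0}` and `{1}` keep their single braces.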