missing-data | 易学教程

Imputing missing values linearly in R

Question: I have a data frame with missing values:

    X   Y   Z
    54  57  57
    100 58  58
    NA  NA  NA
    NA  NA  NA
    NA  NA  NA
    60  62  56
    NA  NA  NA
    NA  NA  NA
    69  62  62

I want to impute the NA values linearly from the known values so that the data frame looks like this:

    X   Y   Z
    54  57  57
    100 58  58
    90  59  57.5
    80  60  57
    70  61  56.5
    60  62  56
    63  62  58
    66  62  60
    69  60  62

Thanks.

Answer 1: Base R's approxfun() returns a function that will linearly interpolate the data it is handed.

    ## Make easily reproducible data
    df <- read.table(text="X Y Z
    54 57 57
    100 58 58
    NA NA NA
    NA NA NA
    NA NA NA
    60 62 56
    NA NA NA
    NA NA NA
    69 62 62", header=TRUE)

    ## See how this works on a
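The arithmetic behind this kind of linear imputation can be sketched in a few lines. The block below is a plain-Python illustration, not the answer's R code and not approxfun() itself; the helper name interpolate_column is hypothetical, and it assumes rows are evenly spaced so that row position serves as the x-coordinate. Gaps before the first or after the last known value are left untouched.

```python
def interpolate_column(values):
    """Fill None gaps by linear interpolation between the nearest known values.

    Hypothetical helper for illustration; row position is used as the x-axis.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    result = list(values)
    # Walk over each pair of neighbouring known positions and fill the gap.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = values[left] + step * (i - left)
    return result


# Columns X and Z from the question, with NA written as None.
x = [54, 100, None, None, None, 60, None, None, 69]
z = [57, 58, None, None, None, 56, None, None, 62]

x_filled = interpolate_column(x)
z_filled = interpolate_column(z)
print(x_filled)
print(z_filled)
```

For column X, the gap between 100 and 60 spans four steps, so each step changes the value by -10, giving 90, 80, 70, which matches the output the question asks for.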

Impute missing data with mean by group

Question: I have a categorical variable with three levels (A, B, and C). I also have a continuous variable with some missing values in it. I would like to replace the NA values with the mean of their group. That is, missing observations from group A have to be replaced with the mean of group A. I know I can calculate each group's mean and replace the missing values by hand, but I'm sure there is a more efficient way to do this with loops.

    A <- subset(data, group == "A")
    mean(A$variable, na.rm = TRUE)
    A

How to create missing value for repeated measurement data?

I have a data set in which not every subject was observed at exactly the same time points, but I want to turn it into a data set in which everyone is observed at the same time points (so that I can use it in SAS proc traj). For example, suppose I have dataset "m":

    id <- c(1,1,1,1,2,2,3,3,3)
    age <- c(2,3,4,5,3,6,2,5,8)
    IQ <- c(3,4,5,4,6,5,3,8,10)
    m <- data.frame(id,age,IQ)

    > m
      id age IQ
    1  1   2  3
    2  1   3  4
    3  1   4  5
    4  1   5  4
    5  2   3  6
    6  2   6  5
    7  3   2  3
    8  3   5  8
    9  3   8 10

    > unique(age)
    [1] 2 3 4 5 6 8

I want to turn m into m2, but so far I can only do that manually.

    id2 <- c(1,1,1,1,1,1

Pandas read_csv fills empty values with string 'nan', instead of parsing date

I assign np.nan to the missing values in a column of a DataFrame. The DataFrame is then written to a csv file using to_csv. The resulting csv file correctly has nothing between the commas for the missing values if I open the file with a text editor. But when I read that csv file back into a DataFrame using read_csv, the missing values become the string 'nan' instead of NaN. As a result, isnull() does not work. For example:

    In [13]: df
    Out[13]:
       index  value date
    0    975  25.35  nan
    1    976  26.28  nan
    2    977  26.24  nan
    3    978  25.76  nan
    4    979  26.08  nan

    In [14]: df.date.isnull()
    Out[14]:
    0    False
    1    False
    2

Using R to insert a value for missing data with a value from another data frame

All, I have a question that I fear might be too pedestrian to ask here, but searching for it elsewhere is leading me astray. I may not be using the right search terms. I have a panel data frame (country-year) in R with some missing values on a given variable. I'm trying to impute them with the value from another vector in another data frame. Here's an illustration of what I am trying to do. Assume Data is the data frame of interest, which has missing values on a given vector that I'm trying to impute from another donor data frame. It looks like this.

    country year x
    70      1920 9.234
    70      1921 9.234

missing value in highcharts line graph results in no line, just points

Please take a look at this: http://jsfiddle.net/2rNzr/

    var chart = new Highcharts.Chart({
        chart: {
            renderTo: 'container'
        },
        xAxis: {
            categories: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
        },
        series: [{
            data: [29.9, '', 106.4, 129.2, 144.0, 176.0, 135.6, 148.5, 216.4, 194.1, 95.6, 54.4]
        }]
    });

You'll notice that the data has a blank value in it (the second value), which causes the line graph to display incorrectly. Is this a bug? What is the correct way of specifying a missing value so there will be a gap in the line graph? i.e. I would NOT want the

How do I get a summary count of missing/NaN data by column in 'pandas'?

Question: In R I can quickly see a count of missing data using the summary command, but the equivalent pandas DataFrame method, describe, does not report these values. I gather I can do something like len(mydata.index) - mydata.count() to compute the number of missing values for each column, but I wonder if there's a better idiom (or if my approach is even right).

Answer 1: Both describe and info report the count of non-missing values.

    In [1]: df = DataFrame(np.random.randn(10,2))
    In [2]: df.iloc[3:6,0] = np
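As a rough sketch of the two counting approaches, the block below compares the question's len(df.index) - df.count() with the commonly used df.isnull().sum(). The column names and values are made up for illustration and do not come from the truncated answer.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical data) with a few missing values.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, 2.0, 3.0, 4.0],
})

# isnull() marks each missing cell as True; sum() counts the Trues per column.
missing_per_column = df.isnull().sum()

# The approach from the question: total rows minus non-missing count.
missing_via_count = len(df.index) - df.count()

print(missing_per_column)
print(missing_via_count)
```

Both expressions return a Series indexed by column name, so either can be used wherever a per-column summary is needed.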

Find missing month after grouping with dplyr

Question: I have a data frame with two columns that I am grouping by with dplyr, a column of months (as numerics, e.g. 1 through 12), and several columns with statistical data following that (values unimportant). An example:

    ID_1 ID_2 month st1 st2
    1    1    1     0.5 0.2
    1    1    2     0.7 0.9
    1    1    3     1.1 1.7
    1    1    4     2.6 0.8
    1    1    5     1.8 1.3
    1    1    6     2.1 2.2
    1    1    7     0.5 0.2
    1    1    8     0.7 0.9
    1    1    9     1.1 1.7
    1    1    10    2.6 0.8
    1    1    11    1.8 1.3
    1    1    12    2.1 2.2
    1    2    1     0.5 0.2
    1    2    2     0.7 0.9
    1    2    3     1.1 1.7
    1    2    4     2.6 0.8
    1    2    5     1.8 1.3
    1    2    6     2.1 2.2
    1

str.format() raises KeyError

The following code raises a KeyError exception:

    addr_list_formatted = []
    addr_list_idx = 0
    for addr in addr_list:  # addr_list is a list
        addr_list_idx = addr_list_idx + 1
        addr_list_formatted.append("""
    "{0}"
    {
        "gamedir" "str"
        "address" "{1}"
    }
    """.format(addr_list_idx, addr))

Why? I am using Python 3.1.

Lasse Vågsæther Karlsen

The problem is the { and } characters in the template that do not specify a key for formatting. You need to double them up, so change your code to:

        addr_list_formatted.append("""
    "{0}"
    {{
        "gamedir" "str"
        "address" "{1}"
    }}
    """.format(addr_list_idx, addr))

Source: https:/
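The brace-doubling rule can be checked with a small self-contained sketch. The addresses below are placeholder values invented for illustration, and the template layout is an assumption about how the original string was laid out.

```python
# Placeholder addresses, invented for this example.
addr_list = ["192.0.2.1:27015", "192.0.2.2:27015"]

# A single brace that is not a replacement field is parsed as a field name,
# so format() looks it up as a keyword argument and fails.
try:
    '"{0}" { "gamedir" "str" }'.format(1)
    unescaped_error = None
except KeyError as exc:
    unescaped_error = exc

# Doubling the literal braces ({{ and }}) makes format() emit them verbatim.
template = '"{0}"\n{{\n    "gamedir" "str"\n    "address" "{1}"\n}}\n'
addr_list_formatted = [
    template.format(idx, addr) for idx, addr in enumerate(addr_list, start=1)
]

print(repr(unescaped_error))
print(addr_list_formatted[0])
```

Using enumerate(addr_list, start=1) replaces the manual counter from the question; the formatting behaviour is otherwise the same.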