missing-data | 易学教程

Reading multiple files and calculating mean based on user input

I am trying to write a function in R that takes three inputs: directory, pollutant, and id. I have a directory on my computer full of CSV files (over 300). What the function should do is shown in the prototype below:

    pollutantmean <- function(directory, pollutant, id = 1:332) {
      ## 'directory' is a character vector of length 1 indicating
      ## the location of the CSV files
      ## 'pollutant' is a character vector of length 1 indicating
      ## the name of the pollutant for which we will calculate the
      ## mean; either "sulfate" or "nitrate".
      ## 'id' is an integer vector indicating the monitor ID numbers
      ## to be
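A minimal sketch of the same computation in Python, using only the standard library. It assumes a hypothetical file layout in which monitor 7's readings live in 007.csv inside the directory, and that missing readings are stored as NA or as an empty field; both assumptions should be adjusted to match the real files. All non-missing readings from the selected monitors are pooled before dividing, which matches calling mean(x, na.rm = TRUE) on the combined vector in R.

```python
import csv
import math
import os


def pollutant_mean(directory, pollutant, ids):
    """Mean of one pollutant column across the selected monitor files.

    Assumptions (hypothetical): files are named '<id zero-padded to 3>.csv'
    and missing readings are written as 'NA' or left empty.
    """
    total = 0.0
    count = 0
    for monitor_id in ids:
        path = os.path.join(directory, f"{monitor_id:03d}.csv")
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                raw = row[pollutant]
                if raw in ("", "NA"):
                    continue  # skip missing readings, like na.rm = TRUE
                total += float(raw)
                count += 1
    # Pool every non-missing reading, then divide once.
    return total / count if count else math.nan
```

Pooling the readings (rather than averaging the per-file means) gives each observation equal weight, so files with more rows contribute more.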

Replace missing values with column mean

I am not sure how to loop over each column to replace the NA values with the column mean. Replacing them in a single column works well:

    Column1[is.na(Column1)] <- round(mean(Column1, na.rm = TRUE))

The loop over the columns, however, does not replace the values:

    for (i in 1:ncol(data)) {
      data[i][is.na(data[i])] <- round(mean(data[i], na.rm = TRUE))
    }

Can someone please help me with this?

A relatively simple modification of your code should solve the issue. (data[i] is a one-column data frame rather than a vector, so mean() returns NA for it with a warning and the NAs are overwritten with NA; data[, i] extracts the column as a vector.)

    for (i in 1:ncol(data)) {
      data[is.na(data[, i]), i] <- mean(data[, i], na.rm = TRUE)
    }

If DF is
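For comparison, a minimal Python sketch of the same column-by-column loop, assuming the table is held as a hypothetical dict mapping column names to lists, with None marking a missing value. Each column's mean is computed from its non-missing entries only (the equivalent of na.rm = TRUE) and written into that column's gaps; the optional digits argument stands in for the round() call in the question.

```python
def fill_with_column_mean(columns, digits=None):
    """Return a copy of `columns` with each None replaced by its column's mean."""
    filled = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if not present:
            # Nothing to average: leave an all-missing column unchanged.
            filled[name] = list(values)
            continue
        mean = sum(present) / len(present)
        if digits is not None:
            mean = round(mean, digits)  # optional rounding, as in the question
        filled[name] = [mean if v is None else v for v in values]
    return filled


data = {"a": [1, None, 5], "b": [None, 2.0, 4.0]}
print(fill_with_column_mean(data))  # {'a': [1, 3.0, 5], 'b': [3.0, 2.0, 4.0]}
```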

R - Fill missing dates by group

Question: In my data, there are observations for some IDs in some months but not in others, e.g.

    dat <- data.frame(c(1, 1, 1, 2, 3, 3, 3, 4, 4, 4),
                      c(rep(30, 2), rep(25, 5), rep(20, 3)),
                      c('2017-01-01', '2017-02-01', '2017-04-01', '2017-02-01', '2017-01-01',
                        '2017-02-01', '2017-03-01', '2017-01-01', '2017-02-01', '2017-04-01'))
    colnames(dat) <- c('id', 'value', 'date')

For each id, I would like to insert a row for every month missing for that id, with NA for value. Is there a way to
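One way to approach this, sketched in Python under an assumption the question leaves open: a month counts as "missing" for an id if it appears somewhere in the data but not for that id (a month absent from the whole dataset would not be added). Rows are hypothetical (id, value, date) tuples with ISO date strings, which sort chronologically, and each (id, date) pair is assumed to occur at most once.

```python
def complete_months(rows):
    """Return one (id, value, date) row per id and month; absent months get None."""
    all_dates = sorted({date for _, _, date in rows})
    ids = sorted({row_id for row_id, _, _ in rows})
    observed = {(row_id, date): value for row_id, value, date in rows}
    return [
        (row_id, observed.get((row_id, date)), date)  # None where nothing was observed
        for row_id in ids
        for date in all_dates
    ]


# The question's data, as tuples.
ids = [1, 1, 1, 2, 3, 3, 3, 4, 4, 4]
values = [30, 30, 25, 25, 25, 25, 25, 20, 20, 20]
dates = ['2017-01-01', '2017-02-01', '2017-04-01', '2017-02-01', '2017-01-01',
         '2017-02-01', '2017-03-01', '2017-01-01', '2017-02-01', '2017-04-01']
rows = list(zip(ids, values, dates))

filled = complete_months(rows)
print(len(filled))  # 16: 4 ids x 4 months
```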

Replace NA with previous or next value, by group, using dplyr

I have a data frame sorted in descending order of date.

    ps1 = data.frame(userID = c(21,21,21,22,22,22,23,23,23),
                     color  = c(NA,'blue','red','blue',NA,NA,'red',NA,'gold'),
                     age    = c('3yrs','2yrs',NA,NA,'3yrs',NA,NA,'4yrs',NA),
                     gender = c('F',NA,'M',NA,NA,'F','F',NA,'F'))

I want to impute (replace) NA values with the previous value within each userID group. If the first row of a userID group is NA, it should be replaced with the next value for that group. I am trying to use the dplyr and zoo packages with something like this, but it is not working:

    cleanedFUG <- filteredUserGroup %>% group_by(UserID)
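The rule described above (carry the previous value down within each userID group, and let rows at the top of a group take the next value instead) can be sketched in plain Python as a forward pass followed by a backward pass. This is an illustration with hypothetical names, assuming rows are dicts and that each group's rows are already contiguous, as they are in ps1; it is not the dplyr/zoo solution the question asks for.

```python
from itertools import groupby


def fill_down_up(values):
    """Fill None from the previous value; leading Nones take the next value."""
    filled = list(values)
    last = None
    for i, value in enumerate(filled):  # forward pass: carry values down
        if value is None:
            filled[i] = last
        else:
            last = value
    following = None
    for i in range(len(filled) - 1, -1, -1):  # backward pass: fill leading gaps
        if filled[i] is None:
            filled[i] = following
        else:
            following = filled[i]
    return filled


def fill_by_group(rows, key, columns):
    """Apply fill_down_up to each column, separately within each run of equal keys."""
    result = []
    for _, group in groupby(rows, key=lambda row: row[key]):
        group = [dict(row) for row in group]  # copy so the input is not mutated
        for column in columns:
            new_values = fill_down_up([r[column] for r in group])
            for row, value in zip(group, new_values):
                row[column] = value
        result.extend(group)
    return result


ps1 = [
    dict(userID=u, color=c, age=a, gender=g)
    for u, c, a, g in zip(
        [21, 21, 21, 22, 22, 22, 23, 23, 23],
        [None, "blue", "red", "blue", None, None, "red", None, "gold"],
        ["3yrs", "2yrs", None, None, "3yrs", None, None, "4yrs", None],
        ["F", None, "M", None, None, "F", "F", None, "F"],
    )
]
cleaned = fill_by_group(ps1, "userID", ["color", "age", "gender"])
```

Because itertools.groupby only merges adjacent rows, the data must already be ordered by userID; sort it first if it is not.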

Select NA in a data.table in R

Question: How do I select all the rows of a data table that have a missing value in the primary key?

    DT = data.table(x = rep(c("a", "b", NA), each = 3), y = c(1, 3, 6), v = 1:9)
    setkey(DT, x)

Selecting a particular value is easy:

    DT["a", ]

Selecting the missing values, however, seems to require a vector scan rather than a binary search. Am I correct?

    DT[NA, ]          # does not work
    DT[is.na(x), ]    # does work

Answer 1: Fortunately, DT[is.na(x), ] is nearly as fast as (e.g.) DT["a", ], so in practice this may not matter much:

Remove NA values from a vector

I have a huge vector that contains a couple of NA values, and I'm trying to find its maximum (the vector is all numbers), but I can't because of the NA values. How can I remove the NA values so that I can compute the max?

If you look at ?max, you'll see that it has an na.rm argument, which defaults to FALSE. (That is the common default for many other R functions, including sum(), mean(), etc.) Setting na.rm = TRUE does exactly what you're asking for:

    d <- c(1, 100, NA, 10)
    max(d, na.rm = TRUE)

If you do want to remove all of the NAs, use this idiom instead:

    d <- d[!is.na(d)]
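The same two idioms sketched in Python, for comparison. Missing entries are assumed to be represented as None or as float NaN (an assumption; Python has no single NA value), and the function names are hypothetical. When nothing is left after dropping missing values, the sketch returns negative infinity, mirroring R, where max() on an empty vector returns -Inf with a warning.

```python
import math


def drop_missing(values):
    """Like d[!is.na(d)]: keep only entries that are neither None nor NaN."""
    return [v for v in values if v is not None and not math.isnan(v)]


def max_na_rm(values):
    """Like max(d, na.rm = TRUE); returns -inf if every entry is missing."""
    present = drop_missing(values)
    return max(present) if present else -math.inf


d = [1, 100, None, 10]
print(max_na_rm(d))  # 100
```

Dropping NaN explicitly matters in Python because comparisons with NaN are always false, so the result of max() on a list containing NaN depends on where the NaN sits.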

Replace NA in column with value in adjacent column

This question is related to a post with a similar title (replace NA in an R vector with adjacent values). I would like to scan a column in a data frame and replace NAs with the value in the adjacent cell. In that post, the solution did not replace the NA with the value from the adjacent vector (i.e. the adjacent element in the data matrix); it was a conditional replacement with a fixed value. Below is a reproducible example of my problem:

    UNIT <- c(NA, NA, 200, 200, 200, 200, 200, 300, 300, 300, 300)
    STATUS <- c('ACTIVE','INACTIVE','ACTIVE','ACTIVE','INACTIVE','ACTIVE','INACTIVE',
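Assuming "adjacent cell" means the value in another column of the same row (the example is cut off before the rest of the data is shown), the replacement is a row-wise coalesce. A minimal Python sketch with a hypothetical function name, where None marks a missing value:

```python
def coalesce(primary, fallback):
    """Row by row: keep primary's value unless it is None, else use fallback's."""
    if len(primary) != len(fallback):
        raise ValueError("columns must have the same length")
    return [f if p is None else p for p, f in zip(primary, fallback)]


unit = [None, None, 200, 200]
status = ["ACTIVE", "INACTIVE", "ACTIVE", "ACTIVE"]
print(coalesce(unit, status))  # ['ACTIVE', 'INACTIVE', 200, 200]
```

Python lists can mix types, but an R vector cannot: filling a numeric column such as UNIT with strings such as STATUS turns the whole column into character.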

Can't drop NAN with dropna in pandas

Question: I import pandas as pd, run the code below, and get the following result.

Code:

    traindataset = pd.read_csv('/Users/train.csv')
    print traindataset.dtypes
    print traindataset.shape
    print traindataset.iloc[25,3]
    traindataset.dropna(how='any')
    print traindataset.iloc[25,3]
    print traindataset.shape

Output:

    TripType                   int64
    VisitNumber                int64
    Weekday                   object
    Upc                      float64
    ScanCount                  int64
    DepartmentDescription     object
    FinelineNumber           float64
    dtype: object
    (647054, 7)
    nan
    nan
    (647054, 7)
    [Finished in 2.2s]
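The shape does not change because DataFrame.dropna() returns a new DataFrame by default instead of modifying the one it is called on; the result has to be assigned back (or inplace=True passed). A minimal sketch with a toy frame (the column names come from the question; the values are made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Upc": [1.0, np.nan, 3.0], "ScanCount": [1, 2, 3]})

df.dropna(how="any")        # returns a new DataFrame; df is left unchanged
print(df.shape)             # (3, 2)

df = df.dropna(how="any")   # keep the result by reassigning it
print(df.shape)             # (2, 2)
```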

How to get Python to gracefully format None and non-existing fields [duplicate]

Question: This question already has answers here: Leaving values blank if not passed in str.format (7 answers). Closed 5 years ago.

If I write in Python:

    data = {'n': 3, 'k': 3.141594, 'p': {'a': 7, 'b': 8}}
    print('{n}, {k:.2f}, {p[a]}, {p[b]}'.format(**data))
    del data['k']
    data['p']['b'] = None
    print('{n}, {k:.2f}, {p[a]}, {p[b]}'.format(**data))

I get:

    3, 3.14, 7, 8
    Traceback (most recent call last):
      File "./funky.py", line 186, in <module>
        print('{n}, {k:.2f}, {p[a]}, {p[b]}
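One way to get this behavior, sketched below, is a subclass of string.Formatter (the class name LenientFormatter is hypothetical) that treats a field it cannot look up as missing, and renders missing fields and None as a placeholder string. The format spec is skipped for those values, so {k:.2f} does not fail on them. The placeholder text is an arbitrary choice.

```python
import string


class _Missing:
    """Sentinel for a field that could not be looked up."""


MISSING = _Missing()


class LenientFormatter(string.Formatter):
    """str.format-like formatter that renders absent fields and None as a placeholder."""

    def __init__(self, placeholder=""):
        super().__init__()
        self.placeholder = placeholder

    def get_field(self, field_name, args, kwargs):
        try:
            return super().get_field(field_name, args, kwargs)
        except (KeyError, IndexError, AttributeError, TypeError):
            # e.g. {k} after del data['k']: treat as missing instead of raising
            return MISSING, field_name

    def format_field(self, value, format_spec):
        if value is None or value is MISSING:
            # Do not apply the spec: '.2f' would raise on None
            return self.placeholder
        return super().format_field(value, format_spec)


data = {'n': 3, 'k': 3.141594, 'p': {'a': 7, 'b': 8}}
fmt = LenientFormatter(placeholder="N/A")
template = '{n}, {k:.2f}, {p[a]}, {p[b]}'
print(fmt.format(template, **data))  # 3, 3.14, 7, 8
del data['k']
data['p']['b'] = None
print(fmt.format(template, **data))  # 3, N/A, 7, N/A
```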

python format string unused named arguments [duplicate]

Question: This question already has an answer here: partial string formatting (16 answers).

Let's say I have:

    action = '{bond}, {james} {bond}'.format(bond='bond', james='james')

This will output:

    'bond, james bond'

Next we have:

    action = '{bond}, {james} {bond}'.format(bond='bond')

This will output:

    KeyError: 'james'

Is there some workaround to prevent this error, something like:

- if KeyError: ignore it, leave it alone (but do parse the others)
- compare the format string with the available named
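For plain {name} fields, a common workaround is str.format_map() with a dict subclass whose __missing__ method returns the field text unchanged, so unknown names are left in place while known ones are substituted. A minimal sketch (the class name is hypothetical). It does not cover fields with a format spec or an index, such as {x:.2f} or {p[a]}: for those, the returned placeholder string would itself be formatted or indexed, which raises an error.

```python
class KeepMissing(dict):
    """Mapping that returns '{key}' for keys it does not contain."""

    def __missing__(self, key):
        return "{" + key + "}"


template = "{bond}, {james} {bond}"
print(template.format_map(KeepMissing(bond="bond")))  # bond, {james} bond
print(template.format_map(KeepMissing(bond="bond", james="james")))  # bond, james bond
```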