I am working on a large dataset, with some rows with NAs and others with blanks:
df <- data.frame(ID = c(1:7),
home_pc = c("","CB4 2DT", "NE5 7TH", "BY5 8IB", "DH4 6PB","MP9 7GH","KN4 5GH"),
start_pc = c(NA,"Home", "FC5 7YH","Home", "CB3 5TH", "BV6 5PB",NA),
end_pc = c(NA,"CB5 4FG","Home","","Home","",NA))
How do I remove the NAs and blanks in one go (in the start_pc and end_pc columns)? I have in the past used:
df<- df[-which(is.na(df$start_pc)), ]
... to remove the NAs - is there a similar command to remove the blanks?
df[!(is.na(df$start_pc) | df$start_pc==""), ]
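Since the question asks about both start_pc and end_pc, here is a minimal sketch of the same test extended to two columns, run on the sample data from the question. The helper name is_blank is my own, not something from base R.

```r
# Sample data from the question
df <- data.frame(
  ID = 1:7,
  home_pc  = c("", "CB4 2DT", "NE5 7TH", "BY5 8IB", "DH4 6PB", "MP9 7GH", "KN4 5GH"),
  start_pc = c(NA, "Home", "FC5 7YH", "Home", "CB3 5TH", "BV6 5PB", NA),
  end_pc   = c(NA, "CB5 4FG", "Home", "", "Home", "", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where a value is NA or an empty string.
# is.na(x) | x == "" evaluates to TRUE for NA because TRUE | NA is TRUE.
is_blank <- function(x) is.na(x) | x == ""

# Keep rows where neither start_pc nor end_pc is blank
cleaned <- df[!(is_blank(df$start_pc) | is_blank(df$end_pc)), ]
cleaned$ID
# 2 3 5
```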
It is the same construct - simply test for empty strings rather than NA:
Try this:
df <- df[-which(df$start_pc == ""), ]
In fact, looking at your code, you don't need which(); use negation instead, so you can simplify it to:
df <- df[!(df$start_pc == ""), ]
df <- df[!is.na(df$start_pc), ]
And, of course, you can combine these two statements as follows:
df <- df[!(df$start_pc == "" | is.na(df$start_pc)), ]
And simplify it even further with with():
df <- with(df, df[!(start_pc == "" | is.na(start_pc)), ])
You can also test for non-zero string length using nzchar().
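# nzchar() is TRUE for non-empty strings (and, by default, for NA), so keep rows where it is TRUE and the value is not NA
df <- with(df, df[nzchar(start_pc) & !is.na(start_pc), ])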
Disclaimer: I didn't test any of this code. Please let me know if there are syntax errors anywhere.
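One more reason to prefer the negated logical index over -which(): when nothing matches, which() returns an empty vector, and negative indexing with an empty vector selects zero rows. A small sketch with a made-up data frame (no_blanks is only for illustration):

```r
# Toy data with no blank values in start_pc
no_blanks <- data.frame(
  ID = 1:3,
  start_pc = c("Home", "CB3 5TH", "BV6 5PB"),
  stringsAsFactors = FALSE
)

# Nothing matches, so which() returns integer(0)
idx <- which(no_blanks$start_pc == "")

# -integer(0) is still integer(0), which selects no rows at all
dropped_all <- no_blanks[-idx, ]
nrow(dropped_all)
# 0

# The logical form keeps every row when nothing matches
kept <- no_blanks[!(no_blanks$start_pc == ""), ]
nrow(kept)
# 3
```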
An easy approach would be making all the blank cells NA and only keeping complete cases. You might also look for na.omit examples. It is a widely discussed topic.
df[df==""]<-NA
df<-df[complete.cases(df),]
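Note that the two lines above look at every column, so a row with a blank home_pc is dropped as well. If only start_pc and end_pc should count, a possible variant is to run complete.cases() on a copy of just those columns. This is a sketch on made-up data; trips and sub are illustrative names.

```r
# Toy data: row 1 has a blank home_pc but valid start_pc and end_pc
trips <- data.frame(
  ID = 1:4,
  home_pc  = c("", "CB4 2DT", "NE5 7TH", "BY5 8IB"),
  start_pc = c("Home", NA, "FC5 7YH", "Home"),
  end_pc   = c("CB5 4FG", "Home", "Home", ""),
  stringsAsFactors = FALSE
)

# Work on a copy of the two columns of interest
sub <- trips[c("start_pc", "end_pc")]

# Turn empty strings into NA in the copy only
sub[] <- lapply(sub, function(col) ifelse(col == "", NA_character_, col))

# Keep the rows of the original data that are complete in those columns
cleaned <- trips[complete.cases(sub), ]
cleaned$ID
# 1 3
```

Row 1 survives here even though its home_pc is blank, because home_pc is never inspected.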
An alternative solution is to remove the rows with blanks in one variable:
df <- subset(df, VAR != "")
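A nice side effect of subset() is that rows where the condition evaluates to NA are treated as FALSE, so NAs are dropped along with the blanks. A short sketch on made-up data (trips is an illustrative name), with the real column names in place of VAR:

```r
# Toy data with a mix of NA and blank values
trips <- data.frame(
  ID = 1:4,
  home_pc  = c("", "CB4 2DT", "NE5 7TH", "BY5 8IB"),
  start_pc = c("Home", NA, "FC5 7YH", "Home"),
  end_pc   = c("CB5 4FG", "Home", "Home", ""),
  stringsAsFactors = FALSE
)

# NA != "" is NA, and subset() drops rows whose condition is NA
cleaned <- subset(trips, start_pc != "" & end_pc != "")
cleaned$ID
# 1 3
```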
An elegant solution with dplyr would be:
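library(dplyr)
df %>%
# recode empty strings "" as NA in every character column
# (recent dplyr releases reject na_if() called directly on a whole data frame)
mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
# remove rows containing NAs
na.omit()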
Source: https://stackoverflow.com/questions/9126840/delete-rows-with-blank-values-in-one-particular-column