Question
I want to replace whitespace-only strings with NA. A simple way would be df[df == ""] <- NA, and that works for most of the cells of my data frame, but not for all of them!
I have the following code:
library(rvest)
library(dplyr)
library(tidyr)
#Read website
htmlpage <- read_html("http://www.soccervista.com/results-Liga_MX_Apertura-2016_2017-844815.html")
#Extract table
df <- htmlpage %>% html_nodes("table") %>% html_table()
df <- as.data.frame(df)
#Set whitespaces into NA's
df[df == ""] <- NA
I found that some of the remaining cells contain a single space between the quotation marks:
df[11,1]
[1] " "
So my next attempt was: df[df == " "] <- NA
However, the problem is still there and the cell still contains the single space! I thought the trimws function would work, but it didn't:
#Trim
df[,c(1:10)] <- sapply(df[,c(1:10)], trimws)
However, the problem persists.
Any ideas?
Answer 1:
We need to use lapply instead of sapply, because sapply returns a matrix instead of a list, and this can create problems with the quotes.
df[1:10] <- lapply(df[1:10], trimws)
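To see the difference in return types that this answer refers to, here is a minimal sketch on a made-up two-column data frame. The name toy and its contents are hypothetical and are not the scraped table.

```r
# Hypothetical toy data, standing in for the scraped table
toy <- data.frame(a = c(" x", "y "), b = c(" ", "z"),
                  stringsAsFactors = FALSE)

m <- sapply(toy, trimws)  # results are simplified into a character matrix
l <- lapply(toy, trimws)  # results stay a list, one element per column

is.matrix(m)  # TRUE
is.list(l)    # TRUE

# Assigning the list back with [] keeps the data frame structure
toy[] <- l
```

Because the list has one element per column, assigning it back column by column leaves each column as an ordinary vector.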
Another option, if we have spaces like " ", is to use gsub to replace those spaces with "":
df[1:10] <- lapply(df[,c(1:10)], function(x) gsub("^\\s+|\\s+$", "", x))
and then change the "" to NA:
df[df == ""] <- NA
Or, instead of doing the two replacements, we can do this in one go and also change the column classes with type.convert:
df[] <- lapply(df, function(x)
  type.convert(replace(x, grepl("^\\s*$", trimws(x)), NA), as.is = TRUE))
NOTE: We don't have to specify the column index when we loop over all the columns.
Answer 2:
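As a rough illustration of the one-step approach, the sketch below applies it to a small made-up data frame rather than the scraped table. The column names and values are assumptions chosen only to show the effect.

```r
# Hypothetical toy data: empty, single-space and multi-space cells
df <- data.frame(
  team  = c("America", " ", "Toluca", ""),
  goals = c("2", "1", "   ", "0"),
  stringsAsFactors = FALSE
)

# Replace empty or whitespace-only cells with NA, then re-infer column types
df[] <- lapply(df, function(x)
  type.convert(replace(x, grepl("^\\s*$", x), NA), as.is = TRUE))

str(df)
```

After this step the whitespace-only cells are NA, team stays a character column, and goals is converted to integer because every remaining value parses as a whole number.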
I just spent some time trying to determine a method usable in a pipe.
Here is my method:
df <- df %>%
  dplyr::mutate_all(funs(sub("^\\s*$", NA, .)))
Hope this helps the next searcher.
Source: https://stackoverflow.com/questions/41530892/whitespace-string-cant-be-replaced-with-na-in-r