Whitespace string can't be replaced with NA in R

☆樱花仙子☆ submitted on 2019-12-01 08:15:37

Question


I want to replace empty and whitespace-only strings with NA. A simple way is df[df == ""] <- NA, and that works for most of the cells of my data frame... but not for all of them!

I have the following code:

library(rvest)
library(dplyr)
library(tidyr)

#Read website
htmlpage <- read_html("http://www.soccervista.com/results-Liga_MX_Apertura-2016_2017-844815.html")

#Extract table
df <- htmlpage %>% html_nodes("table") %>% html_table()
df <- as.data.frame(df)

#Set empty strings to NA
df[df == ""] <- NA

I figured out that some of the blank-looking cells have a small space between the quotation marks:

df[11,1]
[1] " "

So my solution was the following:
df[df == " "] <- NA

However, the problem is still there and the cell still shows the small space! I thought trimws() would remove it, but it didn't:

#Trim
df[,c(1:10)] <- sapply(df[,c(1:10)], trimws)

However, the problem remains.

Any ideas?
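One possible cause, not confirmed in the thread, is that the "space" is not an ASCII space but a non-breaking space (U+00A0), which scraped HTML tables often contain. Neither df == " " nor trimws() with its default whitespace = "[ \t\r\n]" matches that character. A minimal sketch, assuming R >= 3.6.0 (where trimws() gained the whitespace argument) and hypothetical cell values, showing how to inspect the code points of a suspicious cell and strip Unicode whitespace:

```r
# Hypothetical cell values: an ASCII space and a non-breaking space (U+00A0)
cells <- c(" ", "\u00A0")

# Inspect the actual code points: 32 is an ASCII space, 160 is U+00A0
lapply(cells, utf8ToInt)

# Exact comparison and the default trimws() only handle the ASCII space
cells == " "
trimws(cells)

# "[\\h\\v]" is a PCRE class for any horizontal or vertical whitespace,
# including U+00A0, so both cells become empty strings
cleaned <- trimws(cells, whitespace = "[\\h\\v]")
cleaned[cleaned == ""] <- NA
cleaned
```

If utf8ToInt() reports 160 for the stubborn cells, the same whitespace = "[\\h\\v]" argument can be used inside the lapply() calls in the answers below.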


Answer 1:


We need to use lapply instead of sapply: sapply simplifies its result to a matrix, whereas lapply always returns a list with one element per column, which is what assigning back into data frame columns expects.

df[1:10] <- lapply(df[1:10], trimws)
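The difference in return types can be checked on a small hypothetical data frame (d below is illustrative, not from the question):

```r
# Hypothetical data frame with padded strings
d <- data.frame(a = c(" x ", " "), b = c("y", "  "), stringsAsFactors = FALSE)

s <- sapply(d, trimws)  # simplified to a 2 x 2 character matrix
l <- lapply(d, trimws)  # a list of two character vectors, one per column

is.matrix(s)
is.list(l)

# Assigning the list back keeps d a data frame with trimmed columns
d[1:2] <- l
d$a
```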

Another option, if we have spaces like " ", is to use gsub to strip leading and trailing whitespace so that those cells become ""

df[1:10] <- lapply(df[1:10], function(x) gsub("^\\s+|\\s+$", "", x))

and then change the empty strings to NA:

df[df == ""] <- NA
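A sketch of the two-step approach on a hypothetical toy data frame. Because gsub() always returns character vectors, a column holding numbers is still of type character afterwards:

```r
# Hypothetical toy data: numbers and labels padded with blanks
df <- data.frame(score = c(" 1", "  ", "3 "), team = c("A ", "", " B"),
                 stringsAsFactors = FALSE)

# Step 1: strip leading and trailing whitespace in every column
df[1:2] <- lapply(df[1:2], function(x) gsub("^\\s+|\\s+$", "", x))

# Step 2: cells that are now empty become NA
df[df == ""] <- NA

# score is still character here
str(df)
```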

Or, instead of doing two replacements, we can do it in one go and convert the column classes with type.convert:

df[] <- lapply(df, function(x)
      type.convert(replace(x, grepl("^\\s*$", trimws(x)), NA), as.is = TRUE))

NOTE: We don't have to specify the column indices when looping over all the columns.
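A sketch of the one-go version on a hypothetical toy data frame: blank cells become NA, and type.convert() re-detects the column types, so a numeric column comes back as integer. Assigning into df[] rather than df is what keeps the result a data frame; lapply() on its own returns a plain list. Note that replace() only touches the blank cells, so non-blank values keep any padding they had.

```r
# Hypothetical toy data: blanks of different widths in both columns
df <- data.frame(score = c("1", "  ", "3"), team = c("A", "", " "),
                 stringsAsFactors = FALSE)

df[] <- lapply(df, function(x)
  type.convert(replace(x, grepl("^\\s*$", trimws(x)), NA), as.is = TRUE))

str(df)  # score is now integer, team stays character

# Without df[] on the left-hand side, lapply() alone gives a plain list
is.data.frame(lapply(df, identity))
```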




Answer 2:


I just spent some time trying to determine a method usable in a pipe.

Here is my method:

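# Replace cells that are empty or whitespace-only with NA in every column
df <- df %>% 
    dplyr::mutate(dplyr::across(dplyr::everything(), ~ sub("^\\s*$", NA, .x)))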

Hope this helps the next searcher.
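The pipe version relies on a documented detail of sub(): when replacement is NA, every element that matches the pattern becomes NA, and non-matching elements are left unchanged. A base-R sketch on a hypothetical vector:

```r
x <- c("a", "", "   ", " b ")

# "^\\s*$" matches only strings that are empty or consist solely of whitespace;
# " b " does not match, so it is returned untouched (padding included)
y <- sub("^\\s*$", NA, x)
y
```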



Source: https://stackoverflow.com/questions/41530892/whitespace-string-cant-be-replaced-with-na-in-r
