R: combine several gsub() functions in a pipe

Posted by 放肆的年华 on 2019-12-09 05:44:04

Question


To clean some messy data I would like to start using pipes (%>%), but I cannot get the R code to work when gsub() is not at the beginning of the pipe and has to occur later. (Note: this question is not about proper import, but about data cleaning.)

Simple example:

df <- cbind.data.frame(A= c("2.187,78 ", "5.491,28 ", "7.000,32 "), B = c("A","B","C"))

Column A contains character data (numbers in this case, but it could be any string) that needs to be cleaned. The steps are:

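library(stringr)  # for str_trim()

df$D <- gsub("\\.", "", df$A)             # drop the thousands separator
df$D <- str_trim(df$D)                    # strip surrounding whitespace
df$D <- as.numeric(gsub(",", ".", df$D))  # decimal comma -> decimal point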

It seems this could easily be piped:

df$D  <-  gsub("\\.","",df$A) %>%
          str_trim() %>%
          as.numeric(gsub(",", ".")) %>%

The problem is the second gsub(): it asks for its input, which is actually the result of the previous step.

Please, could anyone explain how to use functions like gsub() further down the pipeline? Thanks a lot!

system: R 3.2.3, Windows


Answer 1:


Try this:

library(stringr)

df$D <- df$A %>%
  { gsub("\\.","", .) } %>%
  str_trim() %>%
  { as.numeric(gsub(",", ".", .)) }

With the pipe, your data are passed as the first argument to the next function. If you want them to go somewhere else, wrap that step in {} and use . as a placeholder for the data.
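A small sketch of when the braces actually matter, assuming magrittr's %>% (the pipe that stringr and dplyr re-export). trimws() from base R stands in for str_trim() here so that only one package is needed. The variable names are made up for illustration.

```r
library(magrittr)  # provides %>%

x <- c("2.187,78 ", "5.491,28 ", "7.000,32 ")

# A dot used directly as an argument replaces the default
# first-argument insertion, so braces are optional for this step.
no_dots <- x %>% gsub("\\.", "", .)

# A dot that appears only inside a nested call does not suppress the
# first-argument insertion, so this step needs the braces.
result <- no_dots %>%
  trimws() %>%
  { as.numeric(gsub(",", ".", .)) }

print(result)
```

Without the braces, the last step would be evaluated as as.numeric(., gsub(",", ".", .)), which is not what is intended.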




Answer 2:


Normally one applies the pipe to the data frame as a whole, like this, which returns the cleaned data frame. The idea in functional programming is that objects are immutable: they are not changed in place; instead, new objects are generated.

library(dplyr)

df %>%
   mutate(C = gsub("\\.", "", A)) %>%
   mutate(C = gsub(",", ".", C)) %>%
   mutate(C = as.numeric(C))
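To make the point about immutability concrete, here is a minimal sketch assuming dplyr is installed. The name cleaned is hypothetical, and trimws() is added only to make the whitespace handling explicit. The pipeline returns a new data frame, so the result has to be assigned if it is to be kept.

```r
library(dplyr)

df <- data.frame(A = c("2.187,78 ", "5.491,28 ", "7.000,32 "),
                 B = c("A", "B", "C"),
                 stringsAsFactors = FALSE)

# mutate() returns a modified copy; df itself is left untouched.
cleaned <- df %>%
  mutate(C = trimws(gsub("\\.", "", A)),
         C = as.numeric(gsub(",", ".", C)))

names(df)       # "A" "B"
names(cleaned)  # "A" "B" "C"
```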

Also note that these alternatives work:

df %>% mutate(C = gsub("\\.", "", A), C = gsub(",", ".", C), C = as.numeric(C))


df %>% mutate(C = read.table(text = gsub("[.]", "", A), dec = ",")[[1]])


df %>% mutate(C = type.convert(gsub("[.]", "", A), dec = ","))

For this particular example type.convert seems the most appropriate since it compactly expresses at a high level what we intend to do. In comparison, the gsub/as.numeric solutions seem too low level and verbose while read.table adds conversion to data.frame which we need to undo making it too high level.
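The type.convert() idea can also be tried outside a data frame. The sketch below uses base R only. The as.is = TRUE argument and the trimws() call are additions that are not in the answer above: as far as I know, recent R versions warn when as.is is not supplied, and for numeric output it makes no difference.

```r
# The sample values from column A of the question
a <- c("2.187,78 ", "5.491,28 ", "7.000,32 ")

# Remove the thousands separator and surrounding whitespace, then let
# type.convert() read "," as the decimal mark.
converted <- type.convert(trimws(gsub("[.]", "", a)), dec = ",", as.is = TRUE)

print(converted)
```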




Answer 3:


The problem is that the value fed into the pipe is passed as the first argument of the next function. For gsub() that is the wrong position, because x is its third argument. A (wordy) workaround is to name the other arguments, so that the piped value falls through to x:

library(stringr)  # for str_trim(); stringr also re-exports %>%

df$A %>%
  gsub(pattern = "\\.", replacement = "") %>%
  str_trim() %>%
  gsub(pattern = ",", replacement = ".") %>%
  as.numeric()
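Why this works: R matches named arguments first, so the piped value, inserted as the first unnamed argument, ends up in the only formal that is still free, namely x. A quick check, assuming magrittr is available (the variable names are illustrative):

```r
library(magrittr)  # provides %>%

x <- c("2.187,78", "5.491,28")

# pattern and replacement are matched by name; the piped value lands in x.
piped  <- x %>% gsub(pattern = "\\.", replacement = "")
direct <- gsub("\\.", "", x)

identical(piped, direct)
```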



Answer 4:


You can use str_replace_all(string, pattern, replacement) from package stringr as a pipe-friendly substitute for gsub(). (str_replace() replaces only the first match, like sub().) stringr functions follow a tidy approach in which the string / character vector is the first argument.

library(stringr)  # also re-exports %>%
c("hello", "hi") %>% str_replace_all("[aeiou]", "x")

See Introduction to stringr for more information on stringr's sensibly named and defined functions as replacements for R's default string functions.
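Applied to the data from the question, the stringr approach might look like the sketch below. It assumes stringr is installed; the name cleaned is hypothetical.

```r
library(stringr)  # str_replace_all(), str_trim(), and a re-exported %>%

a <- c("2.187,78 ", "5.491,28 ", "7.000,32 ")

cleaned <- a %>%
  str_replace_all("\\.", "") %>%  # drop the thousands separator
  str_trim() %>%                  # strip surrounding whitespace
  str_replace_all(",", ".") %>%   # decimal comma -> decimal point
  as.numeric()

print(cleaned)
```

Because every stringr function takes the string as its first argument, no braces or dot placeholder are needed.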



Source: https://stackoverflow.com/questions/39997273/r-combine-several-gsub-function-in-a-pipe
