How can I trim leading and trailing white space?

北海茫月 2020-11-22 03:53

I am having some trouble with leading and trailing white space in a data.frame.

For example, I would like to take a look at a specific row in a data.fra

13 answers
  • 2020-11-22 04:09

    Probably the best way is to handle the leading and trailing white space when you read your data file. If you use read.csv or read.table you can set the parameter strip.white=TRUE.
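
    As a rough sketch of the strip.white approach, the snippet below reads a small inline CSV instead of a file. The column names and values are made up for illustration. Note that strip.white only affects unquoted character fields.

    ```r
    # Hypothetical inline data standing in for a real file
    csv_text <- "country,value\n  Austria  ,1\nUK   ,2"

    # strip.white = TRUE drops leading and trailing white space
    # from unquoted character fields while reading
    myDummy <- read.csv(text = csv_text, strip.white = TRUE,
                        stringsAsFactors = FALSE)

    myDummy$country
    # [1] "Austria" "UK"
    ```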

    If you want to clean strings afterwards you could use one of these functions:

    # Returns string without leading white space
    trim.leading <- function (x)  sub("^\\s+", "", x)
    
    # Returns string without trailing white space
    trim.trailing <- function (x) sub("\\s+$", "", x)
    
    # Returns string without leading or trailing white space
    trim <- function (x) gsub("^\\s+|\\s+$", "", x)
    

    To use one of these functions on myDummy$country:

     myDummy$country <- trim(myDummy$country)
    

    To 'show' the white space you could use:

     paste(myDummy$country)
    

    which shows the strings surrounded by quotation marks ("), making white space easier to spot.
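
    Since the question is about a whole data.frame, here is one possible way to apply trim to every character column at once. The contents of myDummy are invented for the example, and factor columns would need as.character() or levels() handling that is not shown here.

    ```r
    # Same helper as defined earlier in this answer
    trim <- function(x) gsub("^\\s+|\\s+$", "", x)

    # Hypothetical example data; the real myDummy is not shown in the question
    myDummy <- data.frame(
      country = c("  Austria ", "UK   "),
      value   = c(1, 2),
      stringsAsFactors = FALSE
    )

    # Trim every character column and leave the other columns untouched
    is_chr <- vapply(myDummy, is.character, logical(1))
    myDummy[is_chr] <- lapply(myDummy[is_chr], trim)

    myDummy$country
    # [1] "Austria" "UK"
    ```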

  • 2020-11-22 04:13

    Another related problem occurs if you have multiple spaces in between inputs:

    > a <- "  a string         with lots   of starting, inter   mediate and trailing   whitespace     "
    

    You can then easily split this string into "real" tokens by passing a regular expression as the split argument:

    > strsplit(a, split=" +")
    [[1]]
     [1] ""           "a"          "string"     "with"       "lots"
     [6] "of"         "starting,"  "inter"      "mediate"    "and"
    [11] "trailing"   "whitespace"
    

    Note that if there is a match at the beginning of a (non-empty) string, the first element of the output is ‘""’, but if there is a match at the end of the string, the output is the same as with the match removed.
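
    If the empty first element is unwanted, one option is to trim the string before splitting. This sketch assumes base R's trimws(), which is available from R 3.2.0 onwards.

    ```r
    a <- "  a string         with lots   of starting, inter   mediate and trailing   whitespace     "

    # Trim first so no match occurs at the start of the string
    tokens <- strsplit(trimws(a), " +")[[1]]

    tokens[1]
    # [1] "a"
    length(tokens)
    # [1] 11
    ```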

  • 2020-11-22 04:14

    I tried trim(). It removes newlines ('\n') as well as spaces.

    x = '\n              Harden, J.\n              '
    
    trim(x)
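
    The answer does not say where trim() comes from. For a version with no extra dependencies, base R's trimws() (R 3.2.0 and later) should behave the same way on this input, since its default also treats newlines as white space:

    ```r
    x <- '\n              Harden, J.\n              '

    trimws(x)
    # [1] "Harden, J."

    # The which argument restricts trimming to one side
    left_only  <- trimws(x, which = "left")   # leading white space removed
    right_only <- trimws(x, which = "right")  # trailing white space removed
    ```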
    
  • 2020-11-22 04:17

    Another option is to use the stri_trim function from the stringi package, which by default removes leading and trailing whitespace:

    > library(stringi)
    > x <- c("  leading space","trailing space   ")
    > stri_trim(x)
    [1] "leading space"  "trailing space"
    

    For only removing leading whitespace, use stri_trim_left. For only removing trailing whitespace, use stri_trim_right. When you want to remove other leading or trailing characters, you have to specify that with pattern =.

    See also ?stri_trim for more info.
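
    A short sketch of the one-sided variants mentioned above, assuming the stringi package is installed:

    ```r
    library(stringi)  # assumes stringi is installed

    x <- c("  leading space", "trailing space   ")

    # Remove leading whitespace only
    stri_trim_left(x)
    # [1] "leading space"     "trailing space   "

    # Remove trailing whitespace only
    stri_trim_right(x)
    # [1] "  leading space" "trailing space"
    ```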

  • 2020-11-22 04:19

    I wrote a trim.strings() function to trim leading and/or trailing whitespace:

    # Arguments:    x - character vector
    #            side - side(s) on which to remove whitespace 
    #                   default : "both"
    #                   possible values: c("both", "leading", "trailing")
    
    trim.strings <- function(x, side = "both") {
      if (is.na(match(side, c("both", "leading", "trailing")))) {
        side <- "both"
      }
      if (side == "leading") {
        sub("^\\s+", "", x)
      } else if (side == "trailing") {
        sub("\\s+$", "", x)
      } else {
        gsub("^\\s+|\\s+$", "", x)
      }
    }
    

    For illustration,

    a <- c("   ABC123 456    ", " ABC123DEF          ")
    
    # returns string without leading and trailing whitespace
    trim.strings(a)
    # [1] "ABC123 456" "ABC123DEF" 
    
    # returns string without leading whitespace
    trim.strings(a, side = "leading")
    # [1] "ABC123 456    "      "ABC123DEF          "
    
    # returns string without trailing whitespace
    trim.strings(a, side = "trailing")
    # [1] "   ABC123 456" " ABC123DEF"   
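
    As a variation, not part of the function above, the same idea can be written with match.arg() and switch(). The name trim_side is hypothetical. Unlike trim.strings, this version raises an error for an unrecognised side instead of silently falling back to "both".

    ```r
    # Hypothetical alternative to trim.strings using match.arg()
    trim_side <- function(x, side = c("both", "leading", "trailing")) {
      side <- match.arg(side)  # errors on anything outside the three choices
      pattern <- switch(side,
        both     = "^\\s+|\\s+$",
        leading  = "^\\s+",
        trailing = "\\s+$"
      )
      gsub(pattern, "", x)
    }

    a <- c("   ABC123 456    ", " ABC123DEF          ")

    trim_side(a)
    # [1] "ABC123 456" "ABC123DEF"

    trim_side(a, "leading")
    # [1] "ABC123 456    "      "ABC123DEF          "
    ```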
    
  • 2020-11-22 04:22

    Use grep or grepl to find observations with trailing white space, and sub to get rid of it.

    names <- c("Ganga Din\t", "Shyam Lal", "Bulbul ")
    grep("[[:space:]]+$", names)
    # [1] 1 3
    grepl("[[:space:]]+$", names)
    # [1]  TRUE FALSE  TRUE
    sub("[[:space:]]+$", "", names)
    # [1] "Ganga Din" "Shyam Lal" "Bulbul"
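
    The same pattern can be extended to catch leading white space as well. This is only a sketch; the vector x is a made-up variation of the names above.

    ```r
    # Hypothetical data with trailing and leading white space
    x <- c("Ganga Din\t", " Shyam Lal", "Bulbul")

    # Flag elements with white space at either end
    has_ws <- grepl("^[[:space:]]+|[[:space:]]+$", x)
    has_ws
    # [1]  TRUE  TRUE FALSE

    # Clean only the flagged elements
    x[has_ws] <- gsub("^[[:space:]]+|[[:space:]]+$", "", x[has_ws])
    x
    # [1] "Ganga Din" "Shyam Lal" "Bulbul"
    ```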
    