Remove part of a string

执笔经年 2020-11-27 11:07

How do I remove part of a string? For example, in ATGAS_1121 I want to remove everything before the underscore (_).

6 Answers
  • 2020-11-27 11:38

    Here's the strsplit solution if s is a vector:

    > s <- c("TGAS_1121", "MGAS_1432")
    > s1 <- sapply(strsplit(s, split='_', fixed=TRUE), function(x) (x[2]))
    > s1
    [1] "1121" "1432"
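
    A slightly stricter variant of the same idea, as a sketch: vapply declares that every result must be a single string, and x[2] comes back as NA when a string contains no underscore. The third input value below is made up to show that case.

    ```r
    s <- c("TGAS_1121", "MGAS_1432", "NOUNDERSCORE")  # third value is hypothetical
    parts <- strsplit(s, split = "_", fixed = TRUE)
    # Take the second piece of each split; a missing piece is returned as NA
    s1 <- vapply(parts, function(x) x[2], character(1))
    s1
    ```

    Checking for NA afterwards is one way to spot inputs that did not have the expected format.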
    
  • 2020-11-27 11:40

    Probably the most intuitive solution is the stringr function str_remove, which is even simpler than str_replace because it takes only a pattern, with no replacement argument.

    The only tricky part in your example is that you want to keep the underscore, but it's possible: use a lookahead, (?=pattern), so that the match stops just before the underscore without consuming it.

    See example:

    library(stringr)  # also provides the %>% pipe
    strings = c("TGAS_1121", "MGAS_1432", "ATGAS_1121")
    strings %>% str_remove(".+?(?=_)")
    
    [1] "_1121" "_1432" "_1121"
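
    For comparison, here is a rough base R sketch of the same lookahead, next to a pattern that consumes the underscore. It assumes only base R; sub needs perl = TRUE before it accepts lookaround syntax, and the variable names are illustrative.

    ```r
    strings <- c("TGAS_1121", "MGAS_1432", "ATGAS_1121")
    # The lookahead (?=_) checks for the underscore without consuming it,
    # so the underscore survives the removal
    keep_underscore <- sub("^.+?(?=_)", "", strings, perl = TRUE)
    # Without the lookahead the underscore is part of the match and is removed too
    drop_underscore <- sub("^[^_]*_", "", strings)
    keep_underscore
    drop_underscore
    ```

    Which of the two you want depends on whether the underscore should stay in the result.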
    
  • 2020-11-27 11:41

    Here is the strsplit solution for a data frame, using the dplyr package:

    library(dplyr)
    col1 = c("TGAS_1121", "MGAS_1432", "ATGAS_1121")
    col2 = c("T", "M", "A")
    df = data.frame(col1, col2)
    df
            col1 col2
    1  TGAS_1121    T
    2  MGAS_1432    M
    3 ATGAS_1121    A
    
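    # Only needed on R < 4.0, where data.frame() turns strings into factors
    df<-mutate(df,col1=as.character(col1))
    # Keep the piece after the underscore in each value
    df2<-mutate(df,col1=sapply(strsplit(col1, split='_', fixed=TRUE),function(x) (x[2])))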
    df2
    
      col1 col2
    1 1121    T
    2 1432    M
    3 1121    A
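
    If you would rather not depend on dplyr, the same column cleanup can be sketched in base R. This swaps strsplit for a single sub call and uses transform instead of mutate, so it is an alternative and not the method shown above.

    ```r
    df <- data.frame(col1 = c("TGAS_1121", "MGAS_1432", "ATGAS_1121"),
                     col2 = c("T", "M", "A"),
                     stringsAsFactors = FALSE)
    # Drop everything up to and including the first underscore in col1
    df2 <- transform(df, col1 = sub("^[^_]*_", "", col1))
    df2
    ```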
    
  • 2020-11-27 11:45

    If you're a Tidyverse kind of person, here's the stringr solution:

    R> library(stringr)
    R> strings = c("TGAS_1121", "MGAS_1432", "ATGAS_1121") 
    R> strings %>% str_replace(".*_", "_")
    [1] "_1121" "_1432" "_1121"
    # Or:
    R> strings %>% str_replace("^[A-Z]*", "")
    [1] "_1121" "_1432" "_1121"
    
  • 2020-11-27 11:54

    Use regular expressions. In this case, you can use gsub:

    gsub("^.*?_","_","ATGAS_1121")
    [1] "_1121"
    

    This regular expression matches the beginning of the string (^), any character (.) repeated zero or more times (*), and an underscore (_). The ? makes the match "lazy" so that it only matches as far as the first underscore. That match is replaced with just an underscore. See ?regex for more details and references.
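
    The lazy versus greedy difference only shows up when there is more than one underscore. A small sketch, using a made-up input with two underscores:

    ```r
    x <- "ATGAS_1121_v2"  # hypothetical input with two underscores
    # Lazy: stops at the first underscore
    lazy <- sub("^.*?_", "_", x)
    # Greedy: runs on to the last underscore
    greedy <- sub("^.*_", "_", x)
    lazy
    greedy
    ```

    With the single-underscore strings from the question, both patterns give the same result.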

  • 2020-11-27 11:59

    You can use a built-in for this, strsplit:

    > s = "TGAS_1121"
    > s1 = unlist(strsplit(s, split='_', fixed=TRUE))[2]
    > s1    
    [1] "1121"
    

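    strsplit splits the string on the split parameter and returns the pieces as a list, with one element per input string. That's probably not what you want, so wrap the call in unlist, then index the resulting vector so that only the second of the two elements is returned. Note that this works for a single string only; for a vector of strings, index each list element instead.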

    Finally, the fixed parameter should be set to TRUE to indicate that the split parameter is not a regular expression, but a literal matching character.
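
    To see why fixed matters, here is a small sketch with a separator that is also a regex metacharacter. The dot-separated input is made up for illustration; the underscore in the question has no special meaning in a regex, so fixed = TRUE is mainly a safeguard there.

    ```r
    s <- "TGAS.1121"  # hypothetical variant that uses a dot as the separator
    # Literal split: "." is treated as a plain character
    literal <- strsplit(s, split = ".", fixed = TRUE)[[1]]
    # Regex split: "." matches every character, so no useful pieces remain
    as_regex <- strsplit(s, split = ".")[[1]]
    literal
    as_regex
    ```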
