gsub with “|” character in R

春和景丽 2021-01-28 20:19

I have a data frame in which one variable holds strings containing the | character. I want to remove the | and everything that follows it.

For ex

2 Answers
  • 2021-01-28 20:47

    This may be a better job for strsplit than for gsub.

    And yes, the pipe does need to be escaped, because an unescaped | is the alternation ("or") operator in a regular expression.

    string <- "heat-shock protein hsp70, putative | location=Ld28_v01s1:1091329-1093293(-) | length=654 | sequence_SO=chromosome | SO=protein_coding"
    strsplit(string, ' \\| ')[[1]][1]
    

    That outputs

    "heat-shock protein hsp70, putative"
    

    Note that I'm assuming you only want the text from before the first pipe, and that you want to drop the space that separates the pipe from the piece of the string you care about.
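
    Since the question mentions a data frame, here is a minimal sketch of the same split applied to a whole column. The data frame `df`, its column `description`, and the second row are made up for illustration. It also uses `fixed = TRUE`, which treats the separator as a literal string so the pipe needs no escaping.

    ```r
    # Hypothetical data; `df` and the column name `description` are assumptions.
    df <- data.frame(
      description = c(
        "heat-shock protein hsp70, putative | location=Ld28_v01s1:1091329-1093293(-) | length=654",
        "hypothetical protein | length=120"
      ),
      stringsAsFactors = FALSE
    )

    # strsplit() returns a list with one character vector per input string.
    # fixed = TRUE makes " | " a literal separator rather than a regex.
    pieces <- strsplit(df$description, " | ", fixed = TRUE)

    # Keep only the first piece of each split.
    df$name <- vapply(pieces, `[`, character(1), 1)
    df$name
    ```

    vapply is used here instead of sapply so that the result is guaranteed to be a character vector with one element per row.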

  • 2021-01-28 20:56

    You have to escape | by writing it as \\|. Try this

    > gsub("\\|.*$", "", string)
    [1] "heat-shock protein hsp70, putative "
    

    where string is

    string <- "heat-shock protein hsp70, putative | location=Ld28_v01s1:1091329-1093293(-) | length=654 | sequence_SO=chromosome | SO=protein_coding"
    

    This alternative also removes the trailing space from the output

    > gsub("\\s+\\|.*$", "", string)
    [1] "heat-shock protein hsp70, putative"
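
    As a rough sketch of two variations on the above: a character class `[|]` is another way to match a literal pipe, and because the pattern runs to the end of the string there can be only one match, so sub should be enough here. The data frame `df`, its column `description`, and the row without a pipe are assumptions added for illustration.

    ```r
    string <- "heat-shock protein hsp70, putative | location=Ld28_v01s1:1091329-1093293(-) | length=654 | sequence_SO=chromosome | SO=protein_coding"

    # Escaped pipe, as in the answer above; \\s* also covers "no space before |".
    a <- sub("\\s*\\|.*$", "", string)

    # Same idea with a character class: "|" is literal inside [ ].
    b <- sub("\\s*[|].*$", "", string)

    # sub() is vectorised, so it can clean a whole column at once.
    # Hypothetical data frame; strings without a pipe are left unchanged.
    df <- data.frame(description = c(string, "no pipe here"),
                     stringsAsFactors = FALSE)
    df$description <- sub("\\s*[|].*$", "", df$description)
    df$description
    ```

    Both `a` and `b` should come out as the text before the first pipe, without the trailing space.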
    