Remove the string before a certain word with R

时光取名叫无心 2021-01-22 06:51

I have a character vector that I need to clean. Specifically, I want to remove the number that comes before the word "Votes." Note that the number has a comma to separate thou

2 Answers
  •  情话喂你
    2021-01-22 07:53

    The easiest way is with stringr:

    > library(stringr)
    > regexp <- "-?[[:digit:]]+\\.*,*[[:digit:]]*\\.*,*[[:digit:]]* Votes+"
    > str_extract(text,regexp)
    [1] "558,586 Votes"
    

    To do the same thing but extract only the number, wrap it in gsub:

    > gsub('\\s+[[:alpha:]]+', '', str_extract(text,regexp))
    [1] "558,586"
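
As an alternative to extracting the match and then trimming the label, a lookahead can match only the number in one step. This is a minimal base R sketch, not the answer's own method: the sample `text` is a made-up string (the question's actual input is not shown here), and the pattern assumes the number is followed by exactly one space and the word "Votes".

```r
# Hypothetical sample input; the question's actual `text` is not shown.
text <- "Rated 1 of 15 in 202 reviews, 558,586 Votes"

# A digit followed by any run of digits, commas, or periods,
# matched only when " Votes" comes next (lookahead, not consumed).
pattern <- "[0-9][0-9,.]*(?= Votes)"

# perl = TRUE enables lookahead in base R regular expressions.
number_before_votes <- regmatches(text, regexpr(pattern, text, perl = TRUE))
print(number_before_votes)
# [1] "558,586"
```

Because the lookahead does not consume " Votes", no second gsub pass is needed to drop the label.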
    

    Here's a base R version that extracts the number before the word "Votes", even if it contains commas or periods:

    > gsub('\\s+[[:alpha:]]+', '', unlist(regmatches(text, gregexpr("-?[[:digit:]]+\\.*,*[[:digit:]]*\\.*,*[[:digit:]]* Votes+", text))))
    [1] "558,586"
    

    If you want the label too, then just throw out the gsub part:

    > unlist(regmatches(text, gregexpr("-?[[:digit:]]+\\.*,*[[:digit:]]*\\.*,*[[:digit:]]* Votes+", text)))
    [1] "558,586 Votes"
    

    And if you want to pull out all the numbers:

    > unlist(regmatches(text, gregexpr("-?[[:digit:]]+\\.*,*[[:digit:]]*\\.*,*[[:digit:]]*", text)))
    [1] "1"       "15"      "202"     "558,586"
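
The question asks to remove the number before "Votes" rather than extract it, so the same pattern idea can be turned around with `sub`. The following is a hedged base R sketch: the input vector is invented for illustration, and the pattern assumes the number consists of digits, commas, and periods and is separated from "Votes" by whitespace.

```r
# Hypothetical sample vector; replace with the real character vector.
text <- c("Rated 1 of 15 in 202 reviews, 558,586 Votes", "12 Votes")

# Match a number (digits with optional commas or periods) plus the
# whitespace after it, but only when "Votes" follows immediately.
pattern <- "[0-9][0-9,.]*\\s+(?=Votes)"

# sub() is vectorized and replaces the first match in each element;
# perl = TRUE is needed for the lookahead.
cleaned <- sub(pattern, "", text, perl = TRUE)
print(cleaned)
# [1] "Rated 1 of 15 in 202 reviews, Votes" "Votes"
```

Other numbers in the string are left alone because the lookahead only succeeds for the number directly before "Votes". If a string can contain several such numbers, `gsub` with the same arguments would remove all of them.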
    
