Remove the string before a certain word with R

时光取名叫无心 2021-01-22 06:51

I have a character vector that I need to clean. Specifically, I want to remove the number that comes before the word "Votes." Note that the number has a comma to separate thousands.

2 answers
  •  不思量自难忘°
    2021-01-22 07:39

    You may use

    text <- "STATE QUESTION NO. 1                       Amendment to Title 15 of the Nevada Revised Statutes Shall Chapter 202 of the Nevada Revised Statutes be amended to prohibit, except in certain circumstances, a person from selling or transferring a firearm to another person unless a federally-licensed dealer first conducts a federal background check on the potential buyer or transferee?                    558,586 Votes"
    trimws(gsub("(\\s){2,}|\\d[0-9,]*\\s*(Votes)", "\\1\\2", text))
    # => [1] "STATE QUESTION NO. 1 Amendment to Title 15 of the Nevada Revised Statutes Shall Chapter 202 of the Nevada Revised Statutes be amended to prohibit, except in certain circumstances, a person from selling or transferring a firearm to another person unless a federally-licensed dealer first conducts a federal background check on the potential buyer or transferee? Votes"
    

    See the online R demo and the online regex demo.

    Details

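    • (\\s){2,} - Group 1: matches 2 or more whitespace chars and captures only the last one, which is reinserted via the \\1 placeholder in the replacement, so each run collapses to a single whitespace char
    • | - or
    • \\d - a digit
    • [0-9,]* - 0 or more digits or commas
    • \\s* - 0 or more whitespace chars
    • (Votes) - Group 2 (restored in the output via the \\2 placeholder): the literal substring Votes.

    Only one alternative matches at a time, so the other group is empty and \\1\\2 reinserts just the capture that took part in the match.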
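
To see what each branch of the alternation contributes, the pattern can be tried one branch at a time. This is a minimal sketch on a made-up toy string (the variable names `x`, `squeezed`, `no_count` and `both` are assumptions for illustration, not part of the original answer):

```r
# Toy input (an assumption for illustration, not the original ballot text)
x <- "a   b  1,234 Votes"

# Branch 1 alone: each run of 2+ whitespace chars collapses to one
squeezed <- gsub("(\\s){2,}", "\\1", x)

# Branch 2 alone: the count before "Votes" is dropped, "Votes" is kept
no_count <- gsub("\\d[0-9,]*\\s*(Votes)", "\\1", x)

# Both branches combined in one pass, as in the answer
both <- gsub("(\\s){2,}|\\d[0-9,]*\\s*(Votes)", "\\1\\2", x)

print(c(squeezed, no_count, both))
```

Combining the branches handles both clean-ups in a single `gsub` call instead of two nested calls.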

    Note that trimws will remove any leading/trailing whitespace.
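
Since the question mentions a character vector, the same call may be wrapped in a small helper. `gsub` and `trimws` are both vectorized, so no loop is needed. The helper name `strip_vote_count` and the sample strings below are hypothetical, chosen only for illustration:

```r
# Hypothetical helper: drop the comma-separated count before "Votes",
# squeeze runs of whitespace, and trim both ends
strip_vote_count <- function(x) {
  trimws(gsub("(\\s){2,}|\\d[0-9,]*\\s*(Votes)", "\\1\\2", x))
}

# Made-up sample vector (not from the original data)
ballots <- c(
  "  Question 1     558,586 Votes",
  "Question 2  12 Votes  ",
  "No count here"
)

cleaned <- strip_vote_count(ballots)
print(cleaned)
```

Elements without a count before "Votes" pass through unchanged, apart from the whitespace clean-up.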
