R: Find and remove all one to two letter words

礼貌的吻别 2021-01-13 05:27

I am trying to remove all one- and two-letter words from a passage of text. This was my first attempt:

gsub(" [a-zA-Z]{1,2} ", " ", "a ab abc B BB BBB

2 Answers
  •  攒了一身酷
    2021-01-13 05:56

    You can use the \b word boundary and the [[:alpha:]] bracket expression with the {1,2} limiting quantifier, then trim the leading/trailing spaces and collapse runs of spaces into one:

    tr <- "a ab abc B BB BBB f"
    tr <- gsub(" *\\b[[:alpha:]]{1,2}\\b *", " ", tr) # Remove 1-2 letter words
    gsub("^ +| +$|( ) +", "\\1", tr) # Remove excessive spacing
    

    Result:

    [1] "abc BBB"
    

    See IDEONE demo
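
For comparison, here is a rough sketch of why the space-delimited pattern from the question misses some words, plus the same cleanup wrapped in a function. The name `drop_short_words` and its `max_len` parameter are made up for illustration and are not part of the original answer. The input string is the one used in the answer above.

```r
# The question's pattern requires a literal space on BOTH sides of the word.
# Each match consumes its trailing space, so the next short word has no
# leading space left to match, and words at the very start or end of the
# string are never matched at all.
gsub(" [a-zA-Z]{1,2} ", " ", "a ab abc B BB BBB f")
# [1] "a abc BB BBB f"

# Hypothetical helper (an assumption, not from the answer): delete short
# words using \b, which matches a position and consumes no characters.
drop_short_words <- function(x, max_len = 2L) {
  pattern <- sprintf("\\b[[:alpha:]]{1,%d}\\b", as.integer(max_len))
  x <- gsub(pattern, "", x)  # delete the short words, leaving their spaces
  x <- gsub(" +", " ", x)    # collapse runs of spaces into one
  trimws(x)                  # strip leading/trailing whitespace
}

drop_short_words("a ab abc B BB BBB f")
# [1] "abc BBB"
```

Because `\b` is zero-width, adjacent short words do not interfere with each other. One caveat: an apostrophe counts as a word boundary, so a token like "it's" would lose both "it" and "s" and leave a stray apostrophe behind.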
