R: Find and remove all one to two letter words

礼貌的吻别 2021-01-13 05:27

I am trying to remove every one- or two-letter word from a text passage. This was my first thought:

gsub(" [a-zA-Z]{1,2} ", " ", "a ab abc B BB BBB")


        
2 Answers
  • 2021-01-13 05:56

    You can use the \b word boundary and the [[:alpha:]] bracket expression with a {1,2} limiting quantifier, then trim the leading/trailing spaces and collapse runs of spaces into one:

    tr <- "a ab abc B BB BBB f"
    tr <- gsub(" *\\b[[:alpha:]]{1,2}\\b *", " ", tr) # Remove 1-2 letter words
    gsub("^ +| +$|( ) +", "\\1", tr) # Remove excessive spacing
    

    Result:

    [1] "abc BBB"
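
    The two steps above can be wrapped into a small helper so they apply to a whole character vector at once. This is only a sketch: the function name remove_short_words and the second test sentence are illustrative assumptions, not part of the original answer. The patterns are copied unchanged.

    ```r
    # Hypothetical helper (name is an assumption) wrapping the two gsub() calls above.
    remove_short_words <- function(x) {
      # Step 1: replace each 1-2 letter word, plus any adjacent spaces, with one space
      x <- gsub(" *\\b[[:alpha:]]{1,2}\\b *", " ", x)
      # Step 2: drop leading/trailing spaces and collapse runs of spaces into one
      gsub("^ +| +$|( ) +", "\\1", x)
    }

    # gsub() is vectorized, so several strings can be cleaned in one call
    cleaned <- remove_short_words(c("a ab abc B BB BBB f", "to be or not to be"))
    print(cleaned)  # expected elements: "abc BBB" and "not"
    ```

    Step 1 deliberately leaves a single space behind for every removed word, which is why the second pass is still needed.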
    


  • 2021-01-13 05:57

    Use the Perl-compatible regex below:

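    # Remove each whitespace-delimited token of 1-2 letters, along with the whitespace before it
    x <- gsub("\\s*(?<!\\S)[a-zA-Z]{1,2}(?!\\S)", "", "a ab abc B BB BBB", perl = TRUE)
    # Strip the whitespace left at the start of the string
    gsub("^\\s+", "", x)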
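
    As a rough sketch, the same two calls can be put in a reusable function. The name remove_short_tokens and the hyphenated test string are assumptions added for illustration. The patterns are the ones from this answer.

    ```r
    # Hypothetical helper (name is an assumption) around the lookaround-based pattern.
    remove_short_tokens <- function(x) {
      # (?<!\S) and (?!\S) require whitespace or a string edge on both sides,
      # so only whole whitespace-delimited tokens of 1-2 letters are removed
      x <- gsub("\\s*(?<!\\S)[a-zA-Z]{1,2}(?!\\S)", "", x, perl = TRUE)
      # Strip whitespace left behind at the start of the string
      gsub("^\\s+", "", x)
    }

    print(remove_short_tokens("a ab abc B BB BBB"))  # expected: "abc BBB"
    print(remove_short_tokens("x-ray is on"))        # expected: "x-ray"
    ```

    Because the lookarounds test for whitespace rather than word boundaries, the x in x-ray is not treated as a separate one-letter word. A \b-based pattern would match it, since - is not a word character.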
    