I am trying to remove all one- or two-letter words from a text passage. This was my first attempt:
gsub(" [a-zA-Z]{1,2} ", " ", "a ab abc B BB BBB")
You can make use of the \b word boundary and the [[:alpha:]] bracket expression with the {1,2} limiting quantifier, and then trim the leading/trailing spaces and shrink multiple spaces into one:
tr <- "a ab abc B BB BBB f"
tr <- gsub(" *\\b[[:alpha:]]{1,2}\\b *", " ", tr) # Remove 1-2 letter words
gsub("^ +| +$|( ) +", "\\1", tr) # Remove excessive spacing
Result:
[1] "abc BBB"
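If you need this in more than one place, the two gsub() calls above can be wrapped in a small function. This is only a sketch: the name remove_short_words and the max_len parameter are my own and are not part of base R. Note that \b also treats an apostrophe as a boundary, so the trailing "t" in a contraction such as "don't" counts as a one-letter word here.

```r
# Hypothetical helper (name and max_len argument are assumptions for
# illustration): applies the same two gsub() steps to a character vector.
remove_short_words <- function(x, max_len = 2) {
  # Words of 1..max_len letters, together with any adjacent spaces
  pattern <- sprintf(" *\\b[[:alpha:]]{1,%d}\\b *", max_len)
  x <- gsub(pattern, " ", x)
  # Trim leading/trailing spaces and collapse runs of spaces into one
  gsub("^ +| +$|( ) +", "\\1", x)
}

res <- remove_short_words("a ab abc B BB BBB f")
print(res)
```

For the sample string this should give the same "abc BBB" as shown in the result above.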
Use the Perl regex below.
x <- gsub("\\s*(?<!\\S)[a-zA-Z]{1,2}(?!\\S)", "", "a ab abc B BB BBB", perl = TRUE)
gsub("^\\s+", "", x)
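The lookarounds (?<!\S) and (?!\S) require the match to be preceded and followed by whitespace or by the start/end of the string, so a word here means a whitespace-delimited token rather than anything between \b boundaries. A minimal sketch of the same two steps as a function follows. The name remove_short_words_perl is made up for illustration.

```r
# Hypothetical wrapper (the function name is an assumption, not a base R
# function): removes whitespace-delimited words of one or two ASCII letters.
remove_short_words_perl <- function(x) {
  # Drop each short word together with the whitespace in front of it
  x <- gsub("\\s*(?<!\\S)[a-zA-Z]{1,2}(?!\\S)", "", x, perl = TRUE)
  # Remove whitespace left at the start of the string
  gsub("^\\s+", "", x)
}

out1 <- remove_short_words_perl("a ab abc B BB BBB")
out2 <- remove_short_words_perl("I don't go")
print(out1)  # "abc BBB"
print(out2)  # "don't"
```

Because the apostrophe is a non-whitespace character, the "t" in "don't" is not treated as a separate word with this pattern.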