Extract text in parentheses in R

无人久伴 提交于 2019-12-05 06:55:56

Text outside the parenthesis

> x <- c("a(b)jk(p)"  ,"ipq" , "e(ijkl)")
> gsub("\\([^()]*\\)", "", x)
[1] "ajk" "ipq" "e"  

Text inside the parenthesis

> x <- c("a(b)jk(p)"  ,"ipq" , "e(ijkl)")
> gsub("(?<=\\()[^()]*(?=\\))(*SKIP)(*F)|.", "", x, perl=T)
[1] "bp"   ""     "ijkl"

The (?<=\\()[^()]*(?=\\)) matches all the characters which are present inside the brackets and then the following (*SKIP)(*F) makes the match to fail. Now it tries to execute the pattern which was just after to | symbol against the remaining string. So the dot . matches all the characters which are not already skipped. Replacing all the matched characters with an empty string will give only the text present inside the rackets.

> gsub("\\(([^()]*)\\)|.", "\\1", x, perl=T)
[1] "bp"   ""     "ijkl"

This regex would capture all the characters which are present inside the brackets and matches all the other characters. |. or part helps to match all the remaining characters other than the captured ones. So by replacing all the characters with the chars present inside the group index 1 will give you the desired output.

The rm_round function in the qdapRegex package I maintain was born to do this:

First we'll get and load the package via pacman

if (!require("pacman")) install.packages("pacman")
pacman::p_load(qdapRegex)

## Then we can use it to remove and extract the parts you want:

x <-c("a(b)jk(p)", "ipq", "e(ijkl)")

rm_round(x)

## [1] "ajk" "ipq" "e" 

rm_round(x, extract=TRUE)

## [[1]]
## [1] "b" "p"
## 
## [[2]]
## [1] NA
## 
## [[3]]
## [1] "ijkl"

To condense b and p use:

sapply(rm_round(x, extract=TRUE), paste, collapse="")

## [1] "bp"   "NA"   "ijkl"
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!