Which regex removes punctuation from quotation marks in text

﹥>﹥吖頭↗ submitted on 2020-04-06 08:26:31

Question


I have a database and throughout the text there are some quotes that are in quotation marks. I would like to remove all the dots "." that are enclosed in quotation marks in the text.

My code removes a dot inside quotation marks, but if there is more than one quote or more than one dot, only the first one is removed.

library(stringr)

# Simple phrase:
string <- '"é preciso olhar para o futuro. vou atuar" no front '

# Code that works for a simple sentence with a single dot:
str_replace_all(string, '(\".*)\\.(.*\")','\\1\\2')

# Sentence with more than one dot and more than one quote:
string <- '"é preciso olhar para o futuro. vou atuar" no front em que posso 
fazer alguma coisa "para .frente", disse jose.'

# It doesn't work as I would like:
str_replace_all(string, '(\".*)\\.(.*\")','\\1\\2')

I would like all the dots inside quotation marks to be removed, but as the example shows, the regex I wrote does not handle the more general cases.


Answer 1:


You can use str_replace_all with the simple pattern "[^"]*" and pass a callback function as the replacement argument; the callback removes every dot from each match with a gsub call:

str_replace_all(string, '"[^"]*"', function(x) gsub(".", "", x, fixed=TRUE))

So,

  • "[^"]*" matches every substring of string that starts with ", continues with zero or more characters other than ", and ends with "
  • Each match is passed to the callback as x, where gsub(".", "", x, fixed=TRUE) replaces every . with an empty string (fixed=TRUE makes it a literal dot, not a regex pattern).
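The same match-then-clean idea can be sketched in base R, for cases where stringr is not available. This is an illustrative alternative, not the answer's own code: it swaps the stringr callback for gregexpr() plus the replacement form of regmatches(). The variable names m and cleaned are assumptions, and the sample text is the question's sentence written on one line.

```r
# Sample text from the question: two quoted spans and one dot outside the quotes.
string <- '"é preciso olhar para o futuro. vou atuar" no front em que posso fazer alguma coisa "para .frente", disse jose.'

# Locate every "..." span: a quote, any run of non-quote characters, a closing quote.
m <- gregexpr('"[^"]*"', string)

# Strip literal dots from the matched spans only, then write the spans back.
cleaned <- string
regmatches(cleaned, m) <- lapply(
  regmatches(cleaned, m),
  function(x) gsub(".", "", x, fixed = TRUE)
)

cleaned
# The dots inside both quoted spans are gone; the final dot after "jose" stays.
```

Because the dots are removed inside each matched span rather than by one larger regex, the number of dots per quote does not matter.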



Answer 2:


mystring <-'"é preciso olhar para o futuro. vou atuar" no front em que posso 
fazer alguma coisa "para .frente", disse jose.'

You can use the following pattern with gsub:

gsub('(?!(([^"]*"){2})*[^"]*$)\\.', "", mystring, perl = TRUE)

Same with stringr:

str_replace_all(mystring, '(?!(([^"]*"){2})*[^"]*$)\\.', '')

Output:

#> "é preciso olhar para o futuro vou atuar" no front em que posso 
#> fazer alguma coisa "para frente", disse jose.
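The answer does not explain the lookahead, so here is one way to read it: a dot is kept when an even number of quotes follows it (it is outside any quoted span) and removed when an odd number follows. The sketch below rebuilds the same pattern from commented parts and cross-checks it against the match-then-clean approach in base R. The short ASCII sample string and the names pattern, lookahead, and callback are assumptions made for illustration.

```r
# Hypothetical sample: dots inside and outside two quoted spans.
mystring <- '"a.b" c.d "e.f" g.'

# The same regex as above, assembled from annotated pieces.
pattern <- paste0(
  '(?!',             # negative lookahead: refuse the dot when what follows is ...
  '(([^"]*"){2})*',  # ... zero or more PAIRS of quotes (an even count) ...
  '[^"]*$',          # ... and then no further quote up to the end of the string
  ')',
  '\\.'              # otherwise match the literal dot
)

lookahead <- gsub(pattern, "", mystring, perl = TRUE)

# Cross-check: clean each "..." span separately, as in the callback approach.
m <- gregexpr('"[^"]*"', mystring)
callback <- mystring
regmatches(callback, m) <- lapply(
  regmatches(callback, m),
  function(x) gsub(".", "", x, fixed = TRUE)
)

lookahead                       # "ab" c.d "ef" g.
identical(lookahead, callback)  # TRUE for this input
```

Both approaches assume the quotes are balanced. With an unmatched quote, the lookahead version still counts quotes to the end of the string, so the two can disagree. The lookahead also rescans the rest of the string for every dot, which may matter on very long inputs.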


Source: https://stackoverflow.com/questions/57317926/which-regex-removes-punctuation-from-quotation-marks-in-text
