Question
I have a database of texts, and some passages in the text are enclosed in quotation marks. I would like to remove every period "." that appears inside quotation marks.
I have code that removes a period inside quotation marks, but if there is more than one quotation or more than one period, only the first one is removed.
library(stringr)
# Simple phrase:
string <- '"é preciso olhar para o futuro. vou atuar" no front '
# Code that works for a simple sentence with one period:
str_replace_all(string, '(\".*)\\.(.*\")', '\\1\\2')
# Sentence with more than one period and more than one quotation:
string <- '"é preciso olhar para o futuro. vou atuar" no front em que posso
fazer alguma coisa "para .frente", disse jose.'
# It doesn't work as I would like
str_replace_all(string, '(\".*)\\.(.*\")', '\\1\\2')
I would like all the periods inside quotation marks to be removed, but as the example shows, the regex I wrote does not handle the more general cases.
Answer 1:
You can simply use str_replace_all with the pattern "[^"]*" and pass a callback function as the replacement argument, which removes all dots with a gsub call:
str_replace_all(string, '"[^"]*"', function(x) gsub(".", "", x, fixed=TRUE))
Here:
- "[^"]*" matches every substring in string that starts with ", continues with zero or more characters other than ", and ends with ".
- Once a match is found, it is passed to the callback as x, where gsub(".", "", x, fixed=TRUE) replaces every . with an empty string (fixed=TRUE makes it a literal dot, not a regex pattern).
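For readers without stringr, the same match-then-clean idea can be sketched in base R with gregexpr and regmatches<-. This is a swapped-in technique, not the answer's code, and the helper name remove_dots_in_quotes is hypothetical, chosen only for illustration.

```r
# Hypothetical helper (not from the original answer): base R version of the
# "match each quoted span, then strip dots inside it" approach.
remove_dots_in_quotes <- function(text) {
  # Locate every quoted span: a quote, any non-quote characters, a closing quote.
  m <- gregexpr('"[^"]*"', text)
  # Remove literal dots from each matched span and write the spans back in place.
  regmatches(text, m) <- lapply(
    regmatches(text, m),
    function(x) gsub(".", "", x, fixed = TRUE)
  )
  text
}

string <- '"é preciso olhar para o futuro. vou atuar" no front em que posso
fazer alguma coisa "para .frente", disse jose.'

result <- remove_dots_in_quotes(string)
cat(result, "\n")
```

Only text between a pair of quotation marks is touched, so the final period after "jose" is kept.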
Answer 2:
mystring <-'"é preciso olhar para o futuro. vou atuar" no front em que posso
fazer alguma coisa "para .frente", disse jose.'
You can use the following pattern with gsub:
gsub('(?!(([^"]*"){2})*[^"]*$)\\.', "", mystring, perl = TRUE)
The same with stringr:
str_replace_all(mystring, '(?!(([^"]*"){2})*[^"]*$)\\.', '')
Output:
#> "é preciso olhar para o futuro vou atuar" no front em que posso
#> fazer alguma coisa "para frente", disse jose.
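The lookahead pattern above can be read as "delete a dot unless an even number of quotation marks follows it". Below is a minimal sketch in base R that splits the pattern into commented pieces and runs it on two short made-up inputs; the variable names and test strings are assumptions for illustration only.

```r
# The pattern from the answer, assembled piece by piece for readability.
inside_quote_dot <- paste0(
  '(?!',              # negative lookahead: do NOT match if what follows is ...
  '(([^"]*"){2})*',   #   zero or more complete pairs of quotation marks
  '[^"]*$',           #   and then quote-free text up to the end of the string
  ')\\.'              # otherwise, match the literal dot
)

balanced   <- 'a. "b.c" d.'   # made-up input with matching quotes
unbalanced <- 'a. "b.c'       # made-up input with a missing closing quote

gsub(inside_quote_dot, "", balanced, perl = TRUE)
# [1] "a. \"bc\" d."

gsub(inside_quote_dot, "", unbalanced, perl = TRUE)
# [1] "a \"b.c"
```

Because the pattern counts quotation marks to the end of the string, it assumes the quotes are balanced. With a missing closing quote, as in the second call, the dot outside the quotation is removed and the one inside is kept, so it may be worth checking the data for unmatched quotes first.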
Source: https://stackoverflow.com/questions/57317926/which-regex-removes-punctuation-from-quotation-marks-in-text