I need Python code using a regex that can do something like this:
Input: Regex should return "String 1" or "String 2" or "String3"
Output: String 1,St
The highly up-voted answer doesn't account for the possibility that the double-quoted string might contain one or more double-quote characters (properly escaped, of course). To handle this situation, the regex needs to accumulate characters one by one, with a negative lookahead assertion stating that the current character is not an unescaped double quote, that is, a double quote that is not preceded by a backslash (the "not preceded by" part requires a negative lookbehind assertion):
"(?:(?:(?!(?<!\\)").)*)"
See Regex Demo
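import re
import ast

def doit(text):
    # Find double-quoted strings, allowing backslash-escaped quotes inside.
    matches = re.findall(r'"(?:(?:(?!(?<!\\)").)*)"', text)
    for match in matches:
        # ast.literal_eval parses the match as a Python string literal,
        # which strips the outer quotes and resolves the escapes.
        print(match, '=>', ast.literal_eval(match))

doit('Regex should return "String 1" or "String 2" or "String3" and "\\"double quoted string\\"" ')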
Prints:
"String 1" => String 1
"String 2" => String 2
"String3" => String3
"\"double quoted string\"" => "double quoted string"
Just try to fetch double quoted strings from the multiline string:
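import re

# Named text rather than str, which would shadow the built-in type.
text = """
"my name is daniel" "mobile 8531111453733"[[[[[[--"i like pandas"
"location chennai"! -asfas"aadhaar du2mmy8969769##69869"
@4343453 "pincode 642002""@mango,@apple,@berry"
"""
# Non-greedy capture of everything between each pair of double quotes.
print(re.findall(r'"(.*?)"', text))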
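Note that this works on the multiline input because every quoted string opens and closes on the same line. By default "." does not match a newline, so a quoted string that spans lines is skipped. A small sketch of the difference, using a made-up sample string (the variable names are illustrative only):

```python
import re

text = 'first "one line" then "spans\ntwo lines" end'

# Default: "." stops at the newline, so the second string is skipped.
default_result = re.findall(r'"(.*?)"', text)

# With re.DOTALL, "." also matches newlines.
dotall_result = re.findall(r'"(.*?)"', text, flags=re.DOTALL)

# A negated character class crosses newlines without any flag.
negated_result = re.findall(r'"([^"]*)"', text)

print(default_result)
print(dotall_result)
print(negated_result)
```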
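import re

# Single- or double-quoted string; a backslash escapes the quote character.
# (?<!\\) is a negative lookbehind: the closing quote must not follow a backslash.
pattern = r"'(\\'|[^'])*(?<!\\)'|\"(\\\"|[^\"])*(?<!\\)\""
texts = [r'"aerrrt"',
         r'"a\"e' + "'" + 'rrt"',
         r'"a""""arrtt"""""',
         r'"aerrrt',
         r'"a\"errt' + "'",
         r"'aerrrt'",
         r"'a\'e" + '"' + "rrt'",
         r"'a''''arrtt'''''",
         r"'aerrrt",
         r"'a\'errt" + '"',
         "''", '""', ""]
for text in texts:
    print(text, "-->", re.fullmatch(pattern, text))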
results:
"aerrrt" --> <_sre.SRE_Match object; span=(0, 8), match='"aerrrt"'>
"a\"e'rrt" --> <_sre.SRE_Match object; span=(0, 10), match='"a\\"e\'rrt"'>
"a""""arrtt""""" --> None
"aerrrt --> None
"a\"errt' --> None
'aerrrt' --> <_sre.SRE_Match object; span=(0, 8), match="'aerrrt'">
'a\'e"rrt' --> <_sre.SRE_Match object; span=(0, 10), match='\'a\\\'e"rrt\''>
'a''''arrtt''''' --> None
'aerrrt --> None
'a\'errt" --> None
'' --> <_sre.SRE_Match object; span=(0, 2), match="''">
"" --> <_sre.SRE_Match object; span=(0, 2), match='""'>
--> None
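The two alternatives in the pattern above differ only in the quote character, so the same idea can be written once with a backreference. This is a sketch of a variation, not the answer's own pattern; QUOTED and is_quoted are hypothetical names:

```python
import re

# (["']) captures the opening quote; \1 requires the same character to close.
# Inside, accept either an escape pair (backslash + any character) or any
# character that is neither the opening quote nor a backslash.
QUOTED = re.compile(r"""(["'])(?:\\.|(?!\1)[^\\])*\1""")

def is_quoted(text):
    """Return True if the whole of text is one quoted string."""
    return QUOTED.fullmatch(text) is not None

for sample in [r'"a\"e' + "'" + 'rrt"', r"'a\'errt" + '"', "''", ""]:
    print(repr(sample), "-->", is_quoted(sample))
```

On the test strings listed above, this variant should accept and reject the same inputs.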
Here's all you need to do:
import re

def doit(text):
    # Capture the text between each pair of double quotes (non-greedy).
    matches = re.findall(r'\"(.+?)\"', text)
    # matches is now ['String 1', 'String 2', 'String3']
    return ",".join(matches)

doit('Regex should return "String 1" or "String 2" or "String3" ')
# result:
'String 1,String 2,String3'
As pointed out by Li-aung Yip (I nearly quote): .+? is the "non-greedy" version of .+. It makes the regular expression match the smallest number of characters it can instead of the most characters it can. The greedy version, .+, will give String 1" or "String 2" or "String3 as a single match; the non-greedy version, .+?, gives String 1, String 2 and String3 as separate matches, which join to 'String 1,String 2,String3'.
In addition (Johan speaking again), if you want to accept empty strings, change .+ to .* (so .+? becomes .*?). Star means zero or more; plus means at least one.
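The difference between the greedy and non-greedy quantifiers, and between + and *, can be checked directly. A short sketch using the input from the question (the variable names are illustrative only):

```python
import re

text = 'Regex should return "String 1" or "String 2" or "String3" '

# Greedy: runs from the first quote to the last one, giving a single match.
greedy = re.findall(r'"(.+)"', text)
# Non-greedy: stops at the nearest closing quote, giving one match per string.
lazy = re.findall(r'"(.+?)"', text)

print(greedy)
print(lazy)

# .+? needs at least one character, so an empty "" is not captured;
# .*? allows zero characters and captures it as ''.
plus_empty = re.findall(r'"(.+?)"', 'empty "" here')
star_empty = re.findall(r'"(.*?)"', 'empty "" here')
print(plus_empty)
print(star_empty)
```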