Question
When I found out that the Python regex module allows fuzzy matching, I was very happy, as it seemed like a simple solution to many of my problems. But now I have a problem for which I could not find an answer in the documentation.
How can I compile strings into regexps that also use the new fuzziness feature?
To illustrate my usual needs, here is a small sample of code:
import regex

f = open('liner.fa', 'r')
nosZ2f='TTCCGACTACCAAGGCAAATACTGCTTCTCGAC'
nosZ2r='AGGTCACATCAACGTCAACG'
#nini=regex.compile(nosZ2r{e<=3})
nimekiri=list(f)
pikkus=len(nimekiri)
count = 0
while (count < pikkus):
    line = nimekiri[count].rstrip('\n')
    m=regex.findall("(TTCCGACTACCAAGGCAAATACTGCTTCTCGAC){e<=3}", line)
    # Parentheses added so that {e<=3} covers the whole sequence, not only its last character
    n=regex.findall("(AGGTCACATCAACGTCAACG){e<=3}", line)
    if m and n:
        # Print the preceding line (nimekiri[count-1]) followed by the matching line
        print(nimekiri[count-1].rstrip('\n'))
        print(line)
    count = count + 1
f.close()
As you can see, the regexps with a fuzziness of 3 errors work fine. But I was forced to type the whole string (nosZ2f/r) manually into the findall call. I was not able to compile a regexp with an error/fuzziness value.
What is the correct syntax for turning a string (the nosZ2f/r lines) into a regexp pattern with a fuzziness of 3 errors? (My failed attempt is the commented-out line.)
Being able to use input strings as the source for a regexp is critical for any actually useful script I have in mind (there is not much automation going on otherwise). So I would be delighted if I could, for example, replace
m=regex.findall("(TTCCGACTACCAAGGCAAATACTGCTTCTCGAC){e<=3}", line)
with
m=regex.findall(nini, line) etc.
Or
m=regex.findall("string{e<=3}", line)
Answer 1:
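You need to put your strings together correctly. The fuzziness specifier is part of the pattern text, so concatenate it onto the string variable (wrapped in a group) before compiling: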
import regex
testString = 'some phrase'
r = regex.compile('('+testString+'){e<=5}')
r.match('phrase')
A regex pattern is just a string, so if you want to build one from variables, use ordinary string manipulation (concatenation or formatting) to assemble the pattern before passing it to regex.compile.
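Applied to the sequences from the question, this could look roughly like the sketch below. The helper name fuzzy_pattern and the sample line are made up for illustration; regex.escape is only there as a precaution in case an input string contains regex metacharacters, which plain DNA sequences do not.

```python
import regex  # third-party module: pip install regex


def fuzzy_pattern(literal, max_errors):
    """Build a pattern string matching `literal` with at most `max_errors` errors.

    Hypothetical helper, not part of the regex module.
    """
    # Escape the literal so any special characters are matched as-is, then
    # wrap it in a group so the fuzzy constraint applies to the whole string.
    return "(" + regex.escape(literal) + "){e<=" + str(max_errors) + "}"


nosZ2f = 'TTCCGACTACCAAGGCAAATACTGCTTCTCGAC'
nosZ2r = 'AGGTCACATCAACGTCAACG'

# Compile once, reuse for every line of the input file.
forward = regex.compile(fuzzy_pattern(nosZ2f, 3))
reverse = regex.compile(fuzzy_pattern(nosZ2r, 3))

# Made-up sample line: nosZ2f with its last base substituted,
# some filler bases, then nosZ2r unchanged.
line = 'TTCCGACTACCAAGGCAAATACTGCTTCTCGAT' + 'ACGTACGT' + nosZ2r

if forward.search(line) and reverse.search(line):
    print(line)
```

The compiled objects can also be passed wherever the question uses a pattern string, for example regex.findall(forward, line).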
Source: https://stackoverflow.com/questions/21114454/compiling-a-fuzzy-regexp-with-python-regex