matching two or more characters that are not the same

只谈情不闲聊 提交于 2020-03-22 09:03:32

问题


Is it possible to write a regex pattern to match abc where each letter is not literal but means that text like xyz (but not xxy) would be matched? I am able to get as far as (.)(?!\1) to match a in ab but then I am stumped.

After getting the answer below, I was able to write a routine to generate this pattern. Using raw re patterns is much faster than converting both the pattern and a text to canonical form and then comaring them.

def pat2re(p, know=None, wild=None):
    """return a compiled re pattern that will find pattern `p`
    in which each different character should find a different
    character in a string. Characters to be taken literally
    or that can represent any character should be given as
    `know` and `wild`, respectively.

    EXAMPLES
    ========

    Characters in the pattern denote different characters to
    be matched; characters that are the same in the pattern
    must be the same in the text:

    >>> pat = pat2re('abba')
    >>> assert pat.search('maccaw')
    >>> assert not pat.search('busses')

    The underlying pattern of the re object can be seen
    with the pattern property:

    >>> pat.pattern
    '(.)(?!\\1)(.)\\2\\1'    

    If some characters are to be taken literally, list them
    as known; do the same if some characters can stand for
    any character (i.e. are wildcards):

    >>> a_ = pat2re('ab', know='a')
    >>> assert a_.search('ad') and not a_.search('bc')

    >>> ab_ = pat2re('ab*', know='ab', wild='*')
    >>> assert ab_.search('abc') and ab_.search('abd')
    >>> assert not ab_.search('bad')

    """
    import re
    # make a canonical "hash" of the pattern
    # with ints representing pattern elements that
    # must be unique and strings for wild or known
    # values
    m = {}
    j = 1
    know = know or ''
    wild = wild or ''
    for c in p:
        if c in know:
            m[c] = '\.' if c == '.' else c
        elif c in wild:
            m[c] = '.'
        elif c not in m:
            m[c] = j
            j += 1
            assert j < 100
    h = tuple(m[i] for i in p)
    # build pattern
    out = []
    last = 0
    for i in h:
        if type(i) is int:
            if i <= last:
                out.append(r'\%s' % i)
            else:
                if last:
                    ors = '|'.join(r'\%s' % i for i in range(1, last + 1))
                    out.append('(?!%s)(.)' % ors)
                else:
                    out.append('(.)')
                last = i
        else:
            out.append(i)
    return re.compile(''.join(out))

回答1:


You may try:

^(.)(?!\1)(.)(?!\1|\2).$

Demo

Here is an explanation of the regex pattern:

^          from the start of the string
(.)        match and capture any first character (no restrictions so far)
(?!\1)     then assert that the second character is different from the first
(.)        match and capture any (legitimate) second character
(?!\1|\2)  then assert that the third character does not match first or second
.          match any valid third character
$          end of string


来源:https://stackoverflow.com/questions/60538299/matching-two-or-more-characters-that-are-not-the-same

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!