A regular expression generator for number ranges

长情又很酷 2021-02-04 05:56

I checked the Stack Exchange description, and algorithm questions are among the allowed topics, so here goes.

Given an input of a range, where begin and ending number

9 answers
  •  伪装坚强ぢ
    2021-02-04 06:30

    Here is a recursive solution in Python that works for an arbitrary range of positive integers. The idea is to divide the range into three sub-ranges:

    • from start to the next multiple of 10 (if start is not already a multiple of 10)
    • from the last multiple of 10 to end (if end is not already a multiple of 10)
    • the range between these two multiples of 10, which can be handled recursively by dropping the last digit and then appending [0-9] to every regular expression generated for the shortened range
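
    To make the split concrete, here is a minimal sketch that only computes the three sub-ranges for a half-open range [start, end). The helper name split_range is made up for this illustration and is not part of the solution itself; it assumes that start and end-1 do not fall within the same ten.

```python
def split_range(start, end):
    """Split the half-open range [start, end) at multiples of 10.

    Returns (head, middle, tail) as half-open (lo, hi) pairs; a part is
    None when it is empty. Assumption: start and end-1 lie in different
    tens, which is the case the three-way split is meant for.
    """
    first_ten = (start + 9) // 10 * 10   # first multiple of 10 >= start
    last_ten = end // 10 * 10            # last multiple of 10 <= end
    head = (start, first_ten) if start < first_ten else None
    middle = (first_ten, last_ten) if first_ten < last_ten else None
    tail = (last_ten, end) if last_ten < end else None
    return head, middle, tail


print(split_range(12, 231))
# ((12, 20), (20, 230), (230, 231))
```

    Dropping the last digit of the middle part (20, 230) gives (2, 23), which is the range the recursion continues with.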

    The code below also collapses single-value character classes such as [1-1] to 1. The function to call is genRangeRegex (start is inclusive, end is exclusive):

    def regexRangeDigits(start,stop):
      # a single-value class such as [1-1] is collapsed to the bare digit
      if start == stop:
        return str(start)
      return '[%d-%d]' % (start,stop)
    
    # generate list of regular expressions for the number range [start,end[
    def genRangeRegex(start, end):
      if start <= 0:
        raise ValueError('only ranges of positive numbers supported')
    
      print('getting regex list for range [%d,%d[' % (start,end))
      if start >= end:
        return []
    
      digitsStart = str(start)
      digitsEnd   = str(end)
      lastDigitStart = start%10
    
      # start and end-1 share all digits but the last: one character class is enough
      if start//10 == (end-1)//10: # integer division
        lastDigitStop = (end-1)%10
        regexAll = digitsStart[:-1] + regexRangeDigits(lastDigitStart,lastDigitStop)
        print('  regexAll   = %s' % regexAll)
        return [regexAll]
    
      regexListStart = [] # at most one regular expression for going up to first multiple of 10
      if lastDigitStart != 0:
        regexStart = digitsStart[:-1] + regexRangeDigits(lastDigitStart,9)
        print('  regexStart = %s' % regexStart)
        regexListStart.append(regexStart)
    
      regexListEnd = [] # at most one regular expression for going up from last multiple of 10
      lastDigitEnd = end%10
      if lastDigitEnd != 0:
        regexEnd = digitsEnd[:-1] + regexRangeDigits(0,lastDigitEnd-1)
        print('  regexEnd   = %s' % regexEnd)
        regexListEnd.append(regexEnd)
    
      # recurse on the range between the two multiples of 10, last digit dropped
      regexListMidTrunc = genRangeRegex((start+9)//10, end//10)
      regexListMid = [r+'[0-9]' for r in regexListMidTrunc]
    
      return regexListStart + regexListMid + regexListEnd
    

    Here is example output showing how the function works:

    >>> genRangeRegex(12,231)
    getting regex list for range [12,231[
      regexStart = 1[2-9]
      regexEnd   = 230
    getting regex list for range [2,23[
      regexStart = [2-9]
      regexEnd   = 2[0-2]
    getting regex list for range [1,2[
      regexAll   = 1
    ['1[2-9]', '[2-9][0-9]', '1[0-9][0-9]', '2[0-2][0-9]', '230']
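
    The returned list can be joined with | into a single pattern. The following is only a sketch under a few assumptions: range_patterns is a condensed restatement of genRangeRegex without the debug output, included so that the snippet runs on its own, and digit_class and compile_range are names chosen here for illustration. It assumes a non-empty range and checks the pattern against a brute-force enumeration.

```python
import re


def digit_class(lo, hi):
    # same role as regexRangeDigits: collapse [d-d] to the bare digit d
    return str(lo) if lo == hi else '[%d-%d]' % (lo, hi)


def range_patterns(start, end):
    """Condensed restatement of the recursion for the range [start, end)."""
    if start <= 0:
        raise ValueError('only ranges of positive numbers supported')
    if start >= end:
        return []
    if start // 10 == (end - 1) // 10:   # same ten: one character class
        return [str(start)[:-1] + digit_class(start % 10, (end - 1) % 10)]
    head = [str(start)[:-1] + digit_class(start % 10, 9)] if start % 10 else []
    tail = [str(end)[:-1] + digit_class(0, end % 10 - 1)] if end % 10 else []
    middle = [p + '[0-9]' for p in range_patterns((start + 9) // 10, end // 10)]
    return head + middle + tail


def compile_range(start, end):
    """Join the alternatives into one pattern; assumes start < end."""
    return re.compile('(?:%s)' % '|'.join(range_patterns(start, end)))


pattern = compile_range(12, 231)
# fullmatch requires the whole string to match, so no explicit anchors are needed
matched = [n for n in range(1, 1000) if pattern.fullmatch(str(n))]
print(matched == list(range(12, 231)))
# True
```

    Using fullmatch instead of search matters here: with search, a pattern such as 1[2-9] would also be found inside a longer number like 1234.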
    
