a regular expression generator for number ranges

长情又很酷 2021-02-04 05:56

I checked the Stack Exchange description, and algorithm questions are among the allowed topics, so here goes.

Given an input of a range, where begin and ending number

9 answers
  • 2021-02-04 06:18

    I recently needed to build a regular expression generator for number ranges in Java. The full solution is posted on ideone.com as the class ideone.java. The code there is heavily commented and uses the same algorithm as the other answers, so I will only highlight what I changed, added, or fixed. It builds on the answers by Bezmax (the concept), arcy (the overall code and the idea of representing regex ranges as pairs), and coproc (recursion for generating the pairs, in place of arcy's method). Thanks to all three.

    ideone.java provides two public methods that implement the generator logic: one accepts the range bounds as numeric strings, the other as integers.

    generateRegEx(String begStr, String endStr)
    generateRegEx(int beg, int end)
    

    Both public methods call generateRegExCommon, which calls getRegExPairsRecursion (the same implementation as in coproc's answer) to build a list of numbers taken in pairs, each pair being the lower and upper end of one valid regex range. It then calls formatPairsToRegEx to convert the pairs into actual regex ranges. If the string input had leading zeros, the ranges are prefixed with zeros to match the input length; if it had none, or if integer input was used, no leading zeros are added. The output is available both as an array and as a list of strings, where each element is a valid regular-expression numeric range:

    regexArray - String Array where each element is a valid regular expression range.
    regexList  - List of String elements where each element is a valid regular expression range.
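
    The zero-padding step can be sketched in Python as follows. This is an illustration only, not the ideone code: pair_to_regex and format_pairs are hypothetical names, and the sketch assumes each pair already spans a range that a single digit-class pattern can express (for example 10-99 or 900-969).

```python
def pair_to_regex(lo: str, hi: str) -> str:
    """Build one pattern from two equal-length digit strings, position by position."""
    if len(lo) != len(hi):
        raise ValueError("both bounds must have the same number of digits")
    parts = []
    for a, b in zip(lo, hi):
        # Equal digits are emitted literally; differing digits become a class.
        parts.append(a if a == b else "[%s-%s]" % (a, b))
    return "".join(parts)


def format_pairs(pairs, width=0):
    """Zero-pad each (lo, hi) pair to `width` digits, then convert it to a pattern.

    width=0 means "no padding", which corresponds to integer input.
    """
    return [pair_to_regex(str(lo).zfill(width), str(hi).zfill(width))
            for lo, hi in pairs]


# Sample pairs (an assumption for this sketch): the split of the range 6..977.
pairs = [(6, 9), (10, 99), (100, 899), (900, 969), (970, 977)]
print(format_pairs(pairs, width=4))  # padded to four digits
print(format_pairs(pairs))           # no padding
```

    Padding the bounds before comparing digits keeps the conversion to a single pass over two strings of equal length.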
    

    The figure below shows a sequence diagram of a sample run of the Java code (ideone.java), with the input and output at each stage. The example uses numeric strings with leading zeros, with a lower bound of "0006" and an upper bound of "0977". The output, as shown in the figure, is:

    000[6-9]
    00[1-9][0-9]
    0[1-8][0-9][0-9]
    09[0-6][0-9]
    097[0-7]
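
    To use the five patterns as a single matcher, they can be joined with | inside a non-capturing group and matched against the whole string. A minimal Python sketch, with the pattern list copied from the output above; range_re and in_range are illustrative names, not part of the ideone code:

```python
import re

patterns = ["000[6-9]", "00[1-9][0-9]", "0[1-8][0-9][0-9]", "09[0-6][0-9]", "097[0-7]"]

# Join the alternatives into one pattern. fullmatch() anchors the match at
# both ends, so inputs such as "00977" or "977" are rejected.
range_re = re.compile("(?:%s)" % "|".join(patterns))


def in_range(text: str) -> bool:
    """True if text is a four-digit string between 0006 and 0977."""
    return range_re.fullmatch(text) is not None


for sample in ["0005", "0006", "0500", "0977", "0978", "977"]:
    print(sample, in_range(sample))
```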
    

    The code provides the following benefits:

    • A single comprehensive solution that combines the best of all the answers and algorithms.
    • Print statements that help with debugging and can easily be converted from System.out to trace/debug calls in any logging framework.
    • A fix for the problem in the other answers where a lower bound of 0 did not work.
  • 2021-02-04 06:23

    I've finally arrived at the following. The overall idea is to start at the beginning of the range and produce a regular expression that matches from there up to, but not including, the next multiple of 10, then do the same for hundreds, and so on, until the next step would pass the end of the range. Then start at the end of the range and work downwards, replacing increasing numbers of trailing digits with 0s (matched against the same number of 9s) to cover the specific end of the range. Finally, generate one regular expression for the part in the middle, if the first two steps don't already cover it all.
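
    The three steps can be sketched compactly in Python. This is a paraphrase for illustration, not the Java code from this answer: split_range, next_left_end and prev_right_begin are made-up names, the sketch assumes 0 <= start <= end, and it uses arithmetic where one could equally work on the digit characters.

```python
def next_left_end(x):
    """Turn trailing zeros and the digit before them into 9s (15 -> 19, 20 -> 99)."""
    p = 10
    while x % p == 0 and p <= x:
        p *= 10
    return x - x % p + p - 1


def prev_right_begin(y):
    """Turn trailing 9s and the digit before them into 0s (213 -> 210, 209 -> 200)."""
    p = 10
    while y % p == p - 1 and p <= y:
        p *= 10
    return y - y % p


def split_range(start, end):
    """Split [start, end] into (lo, hi) pairs, each expressible as one digit-class regex."""
    # Step 1: walk up from start in blocks that end in 9, 99, 999, ...
    left, x = [], start
    y = next_left_end(x)
    while y < end:
        left.append((x, y))
        x = y + 1
        y = next_left_end(x)
    # Step 2: walk down from end in blocks that begin at ...0, ...00, ...
    right, hi = [], end
    lo = prev_right_begin(hi)
    while lo >= x and hi >= x:
        right.append((lo, hi))
        hi = lo - 1
        lo = prev_right_begin(hi)
    # Step 3: whatever is left between the two walks is the middle block.
    middle = [(x, hi)] if hi >= x else []
    return left + middle + right[::-1]


print(split_range(15, 213))
```

    Each resulting pair differs only in trailing digit positions that run over a contiguous digit range, which is what allows it to be written as a single pattern.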

    Special note should be taken of bezmax's routine to convert two numbers to the regular expression that will match them - MUCH easier than dealing with strings or character arrays directly, I think.

    Anyway, here it is:

    package numbers;
    
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Iterator;
    import java.util.List;
    
    /**
     * Has methods for generating regular expressions to match ranges of numbers.
     */
    public class RangeRegexGenerator
    {
      public static void main(String[] args)
      {
        RangeRegexGenerator rrg = new RangeRegexGenerator();
    
    //    do
    //    {
    //      Scanner scanner = new Scanner(System.in);
    //      System.out.println("enter start, <return>, then end and <return>");
    //      int start = scanner.nextInt();
    //      int end = scanner.nextInt();
    //      System.out.println(String.format("for %d-%d", start, end));
    
          List<String> regexes = rrg.getRegex("0015", "0213");
          for (String s: regexes) { System.out.println(s); }
    //    } 
    //    while(true);
      }
    
      /**
       * Return a list of regular expressions that match the numbers
       * that fall within the range of the given numbers, inclusive.
       * Assumes the given strings are numbers of the same length,
       * and 0-left-pads the resulting expressions, if necessary, to the
       * same length. 
       * @param begStr
       * @param endStr
       * @return
       */
      public List<String> getRegex(String begStr, String endStr)
      {
          int start = Integer.parseInt(begStr);
          int end   = Integer.parseInt(endStr);
          int stringLength = begStr.length();
          List<Integer> pairs = getRegexPairs(start, end);
          List<String> regexes = toRegex(pairs, stringLength);
          return regexes;
      }
    
      /**
       * Return a list of regular expressions that match the numbers
       * that fall within the range of the given numbers, inclusive.
       * @param beg
       * @param end
       * @return
       */
      public List<String> getRegex(int beg, int end)
      {
        List<Integer> pairs = getRegexPairs(beg, end);
        List<String> regexes = toRegex(pairs);
        return regexes;
      }
    
      /**
       * return the list of integers that are the paired integers
       * used to generate the regular expressions for the given
       * range. Each pair of integers in the list -- 0,1, then 2,3,
       * etc., represents a range for which a single regular expression
       * is generated.
       * @param start
       * @param end
       * @return
       */
      private List<Integer> getRegexPairs(int start, int end)
      {
          List<Integer> pairs = new ArrayList<>();
    
          ArrayList<Integer> leftPairs = new ArrayList<>();
          int middleStartPoint = fillLeftPairs(leftPairs, start, end);
          ArrayList<Integer> rightPairs = new ArrayList<>();
          int middleEndPoint = fillRightPairs(rightPairs, middleStartPoint, end);
    
          pairs.addAll(leftPairs);
          if (middleEndPoint >= middleStartPoint) // >= keeps a single-value middle, e.g. the range 5..5
          {
            pairs.add(middleStartPoint);
            pairs.add(middleEndPoint);
          }
          pairs.addAll(rightPairs);
          return pairs;
      }
    
      /**
       * print the given list of integer pairs - used for debugging.
       * @param list
       */
      @SuppressWarnings("unused")
      private void printPairList(List<Integer> list)
      {
        if (list.size() > 0)
        {
          System.out.print(String.format("%d-%d", list.get(0), list.get(1)));
          int i = 2;
          while (i < list.size())
          {
            System.out.print(String.format(", %d-%d", list.get(i), list.get(i + 1)));
            i = i + 2;
          }
          System.out.println();
        }
      }
    
      /**
       * return the regular expressions that match the ranges in the given
       * list of integers. The list is in the form firstRangeStart, firstRangeEnd, 
       * secondRangeStart, secondRangeEnd, etc.
       * @param pairs
       * @return
       */
      private List<String> toRegex(List<Integer> pairs)
      {
        return toRegex(pairs, 0);
      }
    
      /**
       * return the regular expressions that match the ranges in the given
       * list of integers. The list is in the form firstRangeStart, firstRangeEnd, 
       * secondRangeStart, secondRangeEnd, etc. Each regular expression is 0-left-padded,
       * if necessary, to match strings of the given width.
       * @param pairs
       * @param minWidth
       * @return
       */
      private List<String> toRegex(List<Integer> pairs, int minWidth)
      {
        List<String> list = new ArrayList<>();
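        // A width of 0 would produce "%00d", which is not a valid format string, so pad only when a width is given.
        String numberWithWidth = (minWidth > 0) ? String.format("%%0%dd", minWidth) : "%d";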
        for (Iterator<Integer> iterator = pairs.iterator(); iterator.hasNext();)
        {
          String start = String.format(numberWithWidth, iterator.next()); // String.valueOf(iterator.next());
          String end = String.format(numberWithWidth, iterator.next());
    
          list.add(toRegex(start, end));
        }
        return list;
      }
    
      /**
       * return a regular expression string that matches the range
       * with the given start and end strings.
       * @param start
       * @param end
       * @return
       */
      private String toRegex(String start, String end)
      {
        assert start.length() == end.length();
    
        StringBuilder result = new StringBuilder();
    
        for (int pos = 0; pos < start.length(); pos++)
        {
          if (start.charAt(pos) == end.charAt(pos))
          {
            result.append(start.charAt(pos));
          } else
          {
            result.append('[').append(start.charAt(pos)).append('-')
                .append(end.charAt(pos)).append(']');
          }
        }
        return result.toString();
      }
    
      /**
       * Return the integer at the end of the range that is not covered
       * by any pairs added to the list.
       * @param rightPairs
       * @param start
       * @param end
       * @return
       */
      private int fillRightPairs(List<Integer> rightPairs, int start, int end)
      {
        int firstBeginRange = end;    // the end of the range not covered by pairs
                                      // from this routine.
        int y = end;
        int x = getPreviousBeginRange(y);
    
        while (x >= start && y >= start) // y < start: nothing left to cover (avoids looping forever when start is 0)
        {
          rightPairs.add(y);
          rightPairs.add(x);
          y = x - 1;
          firstBeginRange = y;
          x = getPreviousBeginRange(y);
        }
        Collections.reverse(rightPairs);
        return firstBeginRange;
      }
    
      /**
       * Return the integer at the start of the range that is not covered
       * by any pairs added to its list. 
       * @param leftInts
       * @param start
       * @param end
       * @return
       */
      private int fillLeftPairs(ArrayList<Integer> leftInts, int start, int end)
      {
        int x = start;
        int y = getNextLeftEndRange(x);
    
        while (y < end)
        {
          leftInts.add(x);
          leftInts.add(y);
          x = y + 1;
          y = getNextLeftEndRange(x);
        }
        return x;
      }
    
      /**
       * given a number, return the number altered such
       * that any 0s at the end of the number are replaced
       * by 9s, and the digit before those 0s is also
       * replaced by a 9.
       * @param num
       * @return
       */
      private int getNextLeftEndRange(int num)
      {
        char[] chars = String.valueOf(num).toCharArray();
        for (int i = chars.length - 1; i >= 0; i--)
        {
          if (chars[i] == '0')
          {
            chars[i] = '9';
          } else
          {
            chars[i] = '9';
            break;
          }
        }
    
        return Integer.parseInt(String.valueOf(chars));
      }
    
      /**
       * given a number, return the number altered such that
       * any 9 at the end of the number is replaced by a 0,
       * and the number preceding any 9s is also replaced by
       * a 0.
       * @param num
       * @return
       */
      private int getPreviousBeginRange(int num)
      {
        char[] chars = String.valueOf(num).toCharArray();
        for (int i = chars.length - 1; i >= 0; i--)
        {
          if (chars[i] == '9')
          {
            chars[i] = '0';
          } else
          {
            chars[i] = '0';
            break;
          }
        }
    
        return Integer.parseInt(String.valueOf(chars));
      }
    }
    

    This one is correct as far as I've been able to test it; the one posted by bezmax did not quite work, though he had the right idea (that I also came up with) for an overall algorithm, and a major implementation detail or two that were helpful, so I'm leaving the 'answer' checkmark on his response.

    I was a little surprised at the amount of interest this generated, though not as much as by just how complex the problem turned out to be.

  • 2021-02-04 06:30

    Here is a recursive solution in Python, which works for an arbitrary range of positive numbers. The idea is to divide the range into three sub-ranges:

    • from start to the next multiple of 10 (if start is not already a multiple of 10)
    • from the last multiple of 10 to end (if end is not already a multiple of 10)
    • the range between these two multiples of 10, which can be handled recursively by dropping the last digit and then appending the regular expression [0-9] to every regular expression generated

    The code below even optimizes ranges of single values like [1-1] to 1. The function to call is genRangeRegex (start is inclusive, end is exclusive):

    def regexRangeDigits(start,stop):
      # a single value needs no character class: [1-1] becomes 1
      if start == stop:
        return str(start)
      return '[%d-%d]' % (start,stop)
    
    # generate list of regular expressions for the number range [start,end[
    # (print calls use the parenthesized form, which also runs under Python 3)
    def genRangeRegex(start, end):
      if start <= 0:
        raise ValueError('only ranges of positive numbers supported')
    
      print('getting regex list for range [%d,%d[' % (start,end))
      if start >= end:
        return []
    
      digitsStart = str(start)
      digitsEnd   = str(end)
      lastDigitStart = start%10
    
      # start and end-1 differ only in the last digit: one expression is enough
      if start//10 == (end-1)//10: # integer division
        lastDigitStop = (end-1)%10
        regexAll = digitsStart[:-1] + regexRangeDigits(lastDigitStart,lastDigitStop)
        print('  regexAll   = %s' % regexAll)
        return [regexAll]
    
      regexListStart = [] # at most one regular expression for going up to first multiple of 10
      if lastDigitStart != 0:
        regexStart = digitsStart[:-1] + regexRangeDigits(lastDigitStart,9)
        print('  regexStart = %s' % regexStart)
        regexListStart.append(regexStart)
    
      regexListEnd = [] # at most one regular expression for going up from last multiple of 10
      lastDigitEnd = end%10
      if lastDigitEnd != 0:
        regexEnd = digitsEnd[:-1] + regexRangeDigits(0,lastDigitEnd-1)
        print('  regexEnd   = %s' % regexEnd)
        regexListEnd.append(regexEnd)
    
      # recurse on the range with the last digit dropped, then re-append [0-9]
      regexListMidTrunc = genRangeRegex((start+9)//10, end//10)
      regexListMid = [r+'[0-9]' for r in regexListMidTrunc]
    
      return regexListStart + regexListMid + regexListEnd
    

    And here is example output showing how the function works:

    >>> genRangeRegex(12,231)
    getting regex list for range [12,231[
      regexStart = 1[2-9]
      regexEnd   = 230
    getting regex list for range [2,23[
      regexStart = [2-9]
      regexEnd   = 2[0-2]
    getting regex list for range [1,2[
      regexAll   = 1
    ['1[2-9]', '[2-9][0-9]', '1[0-9][0-9]', '2[0-2][0-9]', '230']
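
    One way to gain confidence in such a list is a brute-force check: join the alternatives, match every candidate number as a whole string, and compare the result with the intended half-open range. A minimal sketch in Python 3, with the list copied from the output above; matched_numbers is a hypothetical helper, not part of the answer's code:

```python
import re


def matched_numbers(patterns, limit):
    """Return every n in [0, limit) whose decimal form fully matches one of the patterns."""
    combined = re.compile("|".join("(?:%s)" % p for p in patterns))
    return [n for n in range(limit) if combined.fullmatch(str(n))]


patterns = ['1[2-9]', '[2-9][0-9]', '1[0-9][0-9]', '2[0-2][0-9]', '230']
covered = matched_numbers(patterns, 1000)

# The list was generated for [12,231[, so exactly 12..230 should be covered.
print(covered == list(range(12, 231)))
```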
    