Convert integer numeric interval to regex

轻奢々 2020-12-06 21:48

SO,

I'm looking for a solution to this problem: how to convert an integer interval to a regex. Suppose I have two numbers, A and B. Both of

3 answers
  • 2020-12-06 22:32

    Why use regex in this situation?

    I would just do this:

    boolean isBetween = num > A && num < B;

    (Code written in Java)

    That is far easier. A regex like the one you're asking for could be huge, and using it in this situation would be pointless and inefficient.

    Good Luck.

    If you truly insist on using a regex for this task, see this website: run the regex with verbose mode on, and it will explain how the author's regex works.

  • 2020-12-06 22:42

    I solved it with this (in PHP):

    class Converter
    {
        const REGEXP_OR     = '|';
        const REGEXP_START  = '^';
        const REGEXP_END    = '$';
    
        protected $sStart;
        protected $sEnd;
        function __construct($mStart, $mEnd=null)
        {
            if(is_array($mStart) && count($mStart)>1)
            {
                $this->sStart = (string)($mStart[0]);
                $this->sEnd   = (string)($mStart[1]);
            }
            else
            {
                $this->sStart = (string)($mStart);
                $this->sEnd   = (string)($mEnd);
            }
            // Compare the normalized bounds so the array form of the constructor is handled too.
            if((int)($this->sStart)>(int)($this->sEnd))
            {
                $this->sStart = $this->sEnd = null;
            }
        }
    
        public function getRegexp()
        {
            return self::REGEXP_START.$this->_get_regexp_by_range($this->sStart, $this->sEnd).self::REGEXP_END;
        }
    
        protected function _get_regexp_by_range($sStart, $sEnd, $sOr=self::REGEXP_OR, $sFrom=self::REGEXP_START, $sTill=self::REGEXP_END)
        {
           if(!isset($sStart) || !isset($sEnd))
           {
               return null;
           }
           if((int)($sStart)>(int)($sEnd))
           {
              return null;
           }
           elseif($sStart==$sEnd)
           {
              return $sStart;
           }
           elseif(strlen($sEnd)>strlen($sStart))
           {
              $rgRegexp  = array($this->_get_regexp_by_range($sStart, str_repeat('9', strlen($sStart))));
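              // '1' followed by $i zeros has $i+1 digits, so start at strlen($sStart)
              // to cover the length just above $sStart as well.
              for($i=strlen($sStart); $i<strlen($sEnd)-1; $i++)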
              {
                 $rgRegexp[] = $this->_get_regexp_by_range('1'.str_repeat('0', $i), str_repeat('9', $i+1));
              }
              $rgRegexp[] = $this->_get_regexp_by_range('1'.str_repeat('0', strlen($sEnd)-1), $sEnd);
              return join($sTill.$sOr.$sFrom, $rgRegexp);
           }
           else
           {
              $rgRegexp   = array();
              for($iIntersect=0;$iIntersect<strlen($sStart);$iIntersect++)
              {
                 if($sStart[$iIntersect]!=$sEnd[$iIntersect])
                 {
                    break;
                 }
              }
              if($iIntersect)
              {
                 return join($sTill.$sOr.$sFrom, array_map(function($sItem) use ($iIntersect, $sStart)
                 {
                    return substr($sStart, 0, $iIntersect).$sItem;
                 }, explode($sTill.$sOr.$sFrom, $this->_get_regexp_by_range(substr($sStart, $iIntersect), substr($sEnd, $iIntersect)))));
              }
              else
              {
                 $rgRegexp = array($sStart);
                 for($iPos=strlen($sStart)-1; $iPos>0; $iPos--)
                 {
                    if($sStart[$iPos]+1<10)
                    {
                       $rgRegexp[]=substr($sStart, 0, $iPos).'['.($sStart[$iPos]+1).'-'.'9'.']'.str_repeat('[0-9]', strlen($sStart)-$iPos-1);
                    }
                 }
                 if(($sStart[0]+1)<($sEnd[0]-1))
                 {
                    $rgRegexp[]='['.($sStart[0]+1).'-'.($sEnd[0]-1).']'.str_repeat('[0-9]', strlen($sStart)-1);
                 }
                 elseif((int)($sStart[0])+1==(int)($sEnd[0])-1)
                 {
                    $rgRegexp[]=($sStart[0]+1).str_repeat('[0-9]', strlen($sStart)-1);
                 }
                 for($iPos=1; $iPos<strlen($sEnd); $iPos++)
                 {
                    if($sEnd[$iPos]-1>=0)
                    {
                      $rgRegexp[]=substr($sEnd,0, $iPos).'['.'0'.'-'.($sEnd[$iPos]-1).']'.str_repeat('[0-9]', strlen($sEnd)-$iPos-1);
                    }
                 }
                 $rgRegexp[]=$sEnd;
                 return join($sTill.$sOr.$sFrom, $rgRegexp);
              }
           }
        }
    }
    

    With this I get correct results for any input strings, but I think the resulting regex is not the best one possible.

    $sPattern = (new Converter('1', '1000000000'))->getRegexp();
    var_dump(
       preg_match('/'.$sPattern.'/', '10000000000'), 
       preg_match('/'.$sPattern.'/', '100000000'));
    

    Anyway, thanks a lot to everyone who answered.

  • 2020-12-06 22:43

    As others have already told you, this is not a very good idea. It will not be faster than just matching all integers and filtering them afterwards. But I will answer your question anyway.
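For comparison, here is a minimal sketch of the "match all integers, then filter" alternative. The class name `IntervalFilter`, the method `findInRange`, and the sample text are made up for illustration, and the bounds are treated as inclusive, which is an assumption since the question does not say.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IntervalFilter {
    // Matches any run of ASCII digits; the range check happens in code, not in the regex.
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    /** Returns every integer found in text that lies in [a, b] (inclusive bounds are an assumption). */
    static List<Long> findInRange(String text, long a, long b) {
        List<Long> hits = new ArrayList<>();
        Matcher m = DIGITS.matcher(text);
        while (m.find()) {
            String token = m.group();
            if (token.length() > 18) {
                continue; // may not fit in a long; skip instead of risking an exception
            }
            long value = Long.parseLong(token);
            if (value >= a && value <= b) {
                hits.add(value);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // prints [42, 150]
        System.out.println(findInRange("ids: 7, 42, 150, 9999", 10, 200));
    }
}
```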

    Depending on how large the interval is, you can let the regex engine optimize it for you: just output a |-separated list of all the values. Such a list can also be minimized algorithmically with basic algorithms from finite automata theory.
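A minimal sketch of the |-separated list, assuming a small interval of non-negative integers with inclusive bounds. `AlternationRegex` and `build` are hypothetical names, and the sketch does no minimization; it only produces the raw alternation.

```java
import java.util.stream.Collectors;
import java.util.stream.LongStream;

class AlternationRegex {
    /** Builds an anchored alternation of every value in [a, b]; only sensible for small intervals. */
    static String build(long a, long b) {
        return LongStream.rangeClosed(a, b)
                .mapToObj(Long::toString)
                .collect(Collectors.joining("|", "^(?:", ")$"));
    }

    public static void main(String[] args) {
        // prints ^(?:8|9|10|11|12)$
        System.out.println(build(8, 12));
    }
}
```

The pattern grows linearly with the size of the interval, which is why this only works while the interval stays small.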

    This may be too memory-intensive for large intervals. In that case you can match, in one go, all numbers whose length lies strictly between the lengths of A and B. In your example, all numbers of 6-7 digits are easily matched with [1-9][0-9]{5,6}. That leaves the border cases, which you can build recursively as follows (shown for the A side; I have not included the base case of the recursion):
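The "all lengths strictly between A and B" segment can be derived from the two digit counts alone. This is a sketch under the assumption that A and B are non-negative and written without leading zeros; `MiddleLengths` and `build` are hypothetical names, and the sample values in `main` are invented because the question's own numbers are not shown.

```java
class MiddleLengths {
    /**
     * Returns a fragment matching every number with more digits than a and
     * fewer digits than b, or null when no such length exists.
     */
    static String build(long a, long b) {
        int lenA = Long.toString(a).length();
        int lenB = Long.toString(b).length();
        int min = lenA;     // trailing digits of the shortest middle length (lenA + 1 digits in total)
        int max = lenB - 2; // trailing digits of the longest middle length (lenB - 1 digits in total)
        if (max < min) {
            return null;    // the lengths are equal or adjacent, so there is nothing in between
        }
        String count = (min == max) ? "{" + min + "}" : "{" + min + "," + max + "}";
        return "[1-9][0-9]" + count;
    }

    public static void main(String[] args) {
        // A has 5 digits and B has 8, so the middle lengths are 6 and 7: prints [1-9][0-9]{5,6}
        System.out.println(build(12345L, 12345678L));
    }
}
```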

    1. Let S be A.
    2. Let f be the first digit of S, g = f+1, and n = (number of digits of S) - 1.
    3. Add a segment to the regex for numbers whose first digit is larger than f: [g-9][0-9]{n}
    4. Add a segment for numbers starting with f: f(recursive call starting from step 2, with S = the remaining digits of S)

    So for A=123 we would end up with something like (spaces only added for "readability"):

    ([2-9][0-9]{2}) | (1(([3-9][0-9]{1}) | (2(([4-9]) | 3))) )
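The steps above can be sketched in Java as follows. The answer leaves out the base case, so the one used here (an empty remainder contributes nothing) is my assumption, as is skipping the [g-9] segment when f is 9. `LowerBoundRegex` and `atLeast` are hypothetical names. The result only covers numbers with the same number of digits as A that are greater than or equal to A.

```java
class LowerBoundRegex {
    /** Returns an unanchored fragment matching digit strings of the same length as s that are >= s. */
    static String atLeast(String s) {
        if (s.isEmpty()) {
            return ""; // assumed base case: no digits left to constrain
        }
        char f = s.charAt(0);
        int n = s.length() - 1;
        // Step 4: keep the first digit equal to f and recurse on the rest.
        String same = f + atLeast(s.substring(1));
        if (f == '9') {
            return same; // no digit is larger than 9, so step 3 adds nothing
        }
        // Step 3: any larger first digit, followed by n unconstrained digits.
        char g = (char) (f + 1);
        String higher = "[" + g + "-9]" + (n > 0 ? "[0-9]{" + n + "}" : "");
        return "(?:" + higher + "|" + same + ")";
    }

    public static void main(String[] args) {
        // prints (?:[2-9][0-9]{2}|1(?:[3-9][0-9]{1}|2(?:[4-9]|3)))
        System.out.println(atLeast("123"));
    }
}
```

Apart from using non-capturing groups and no spaces, the output for 123 has the same shape as the expression above. The B side would need the mirrored construction with [0-h] segments, which is not shown here.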
    