Partial regular expression match

£可爱£侵袭症+ submitted on 2019-12-23 17:01:41

Question


I have a regular expression that I'm testing against an input stream of characters. Is there a way to match the regular expression against the input and determine whether the result is a partial match that consumes the entire input buffer, i.e. the end of the input buffer is reached before the regexp completes? I would like the implementation to decide whether to wait for more input characters or abort the operation.

In other words, I need to determine which of the following is true:

  1. The end of the input buffer was reached before the regexp was matched

    E.g. "foo" =~ /^foobar/

  2. The regular expression matches completely

    E.g. "foobar" =~ /^foobar/

  3. The regular expression failed to match

    E.g. "fuubar" =~ /^foobar/

The input is not packetized.


Answer 1:


Is this the scenario you are solving? You are waiting for a literal string, e.g. 'foobar'. If the user types a partial match, e.g. 'foo', you want to keep waiting. If the input is a non-match you want to exit.

If you are working with literal strings, I would just write a loop to test the characters in sequence. Or:

if (input.Length < target.Length && target.StartsWith(input))
{
    // input is a proper prefix of target: keep waiting for more characters
}
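A minimal Python sketch of the same literal-string check, assuming the target is a fixed string anchored at the start of the input. The names `MatchState` and `classify` are illustrative and not taken from the question; the three results mirror the three cases the question lists.

```python
from enum import Enum


class MatchState(Enum):
    PARTIAL = "partial"  # buffer ended before the target was complete
    FULL = "full"        # buffer starts with the whole target
    FAIL = "fail"        # buffer can never become a match


def classify(buffer, target):
    """Classify `buffer` against a literal `target` anchored at position 0."""
    if buffer.startswith(target):
        return MatchState.FULL
    if target.startswith(buffer):
        # Everything received so far agrees with the target: wait for more.
        return MatchState.PARTIAL
    return MatchState.FAIL


if __name__ == "__main__":
    for text in ("foo", "foobar", "fuubar"):
        print(text, classify(text, "foobar").value)
```

This only covers literal targets. It says nothing about general patterns, which is the harder case.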

If you are trying to match more complex patterns, I don't know how to do this with regular expressions alone. I would start by reading more about how your platform implements regular expressions.

tom




Answer 2:


I'm not sure if this is your question, but regular expressions either match or they do not, and an expression can match a variable amount of input. So a partial match cannot be determined directly.

However, if you believe a match might overlap a buffer boundary, you can use a smart buffering scheme to accomplish the same thing.

There are many ways to do this.

One way is to match everything that does not match, via assertions, up to the point where a match could start (but not the full match you seek). This you simply throw away and clear from your buffer. When you get the match you seek, clear the buffer of that data and everything before it.

Example: /(<function.*?>)|([^<]*)/. The part you throw away (clear from the buffer) is in capture group 2.
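A rough Python sketch of how that pattern might drive a buffer, assuming the buffer is a `str` and that an empty match at a `<` means "possible start of a token, wait for more input". The function name `drain` and that stopping rule are assumptions made for illustration, not part of the answer.

```python
import re

# Group 1: the token being sought. Group 2: filler that can be discarded.
PATTERN = re.compile(r'(<function.*?>)|([^<]*)')


def drain(buffer):
    """Return (found_tokens, leftover) for the current buffer contents."""
    found = []
    pos = 0
    while pos < len(buffer):
        # Always matches, because group 2 may match the empty string.
        m = PATTERN.match(buffer, pos)
        if m.group(1) is not None:
            found.append(m.group(1))  # complete token: record and consume it
            pos = m.end()
        elif m.end() > pos:
            pos = m.end()             # group 2 filler: throw it away
        else:
            break                     # stuck at '<' with no full token yet
    return found, buffer[pos:]


if __name__ == "__main__":
    print(drain("abc<function foo>def<func"))
```

The leftover is what stays in the buffer until more input arrives. One limitation of this sketch: a `<` that can never begin the token (for example `<b>`) also stalls it, so a real implementation would need an extra rule to skip such characters.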

Another way applies if you are matching finite-length strings: if nothing in the buffer matches, you can safely throw away everything from the beginning of the buffer up to the end of the buffer minus the length of the string you are searching for.

Example: Your buffer is 64k in size. You are searching for a string of length 10. It was not found in the buffer. You can safely clear (64k - 10) bytes, retaining the last 10 bytes. Then append (64k - 10) bytes to the end of the buffer. Of course, you only need a buffer of 10 bytes, constantly removing and adding one character, but a larger buffer is more efficient, and you could use thresholds to decide when to load more data.
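The retained-tail idea can be sketched in Python as follows, assuming the input arrives as an iterable of string chunks and the target is a literal string. The name `find_in_stream` is hypothetical. The sketch keeps `len(needle) - 1` characters, which is the smallest tail that is still safe; keeping the full length, as in the example above, works as well.

```python
def find_in_stream(chunks, needle):
    """Return the absolute offset of the first `needle` in the stream, or -1."""
    keep = len(needle) - 1  # longest tail that could still begin a match
    buffer = ""
    base = 0                # absolute stream offset of buffer[0]
    for chunk in chunks:
        buffer += chunk
        idx = buffer.find(needle)
        if idx != -1:
            return base + idx
        # No match: drop everything except the last `keep` characters.
        drop = max(0, len(buffer) - keep)
        base += drop
        buffer = buffer[drop:]
    return -1


if __name__ == "__main__":
    # "foobar" is split across three chunks here.
    print(find_in_stream(["abcfo", "ob", "arxyz"], "foobar"))
```

Because the tail is carried over, a match that straddles chunk boundaries is still found, while memory use stays bounded by the chunk size plus the needle length.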

If you can create a buffer that easily contracts/expands, more buffering options are available.



Source: https://stackoverflow.com/questions/4759783/partial-regular-expression-match
