Automatic regex builder


I have N strings. There are also K regular expressions, unknown to me. Each string either matches one of the regular expressions or is garbage. There are total of

3 Answers

忘了有多久

2020-12-17 06:24

Nothing clever here; perhaps I don't fully understand the problem?

Why not just always reduce L to 0? Check each string against each regex; if a string doesn't match any of the regexes, it's garbage. If it does match, remember which regex matched which string(s), then do LCS on each resulting L = 0, K = 1 subproblem to deduce each regex's definition.
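A minimal Python sketch of the two steps in this answer, under some loud assumptions: the candidate regexes are available for the grouping step (the question says they are unknown, so this only mirrors what the answer proposes), "LCS" is read as longest common subsequence, and the names `group_by_regex`, `lcs` and `skeleton_regex` are hypothetical helpers invented for illustration. The result is a crude skeleton pattern, not a recovered original regex.

```python
import re
from functools import reduce


def group_by_regex(strings, patterns):
    """Assign each string to the first pattern it fully matches; the rest is garbage."""
    groups = {p: [] for p in patterns}
    garbage = []
    for s in strings:
        for p in patterns:
            if re.fullmatch(p, s):
                groups[p].append(s)
                break
        else:
            garbage.append(s)
    return groups, garbage


def lcs(a, b):
    """Longest common subsequence of a and b (dynamic programming)."""
    m, n = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def skeleton_regex(strings):
    """Fold pairwise LCS over the group and put wildcards between the shared characters.

    Folding gives a common subsequence of all strings, not necessarily the longest one.
    """
    common = reduce(lcs, strings)
    body = ".*".join(re.escape(c) for c in common)
    return re.compile(".*" + body + ".*", re.DOTALL)


strings = ["id=12;ok", "id=7;ok", "id=305;ok", "garbage"]
groups, garbage = group_by_regex(strings, [r"id=\d+;ok"])
pattern = skeleton_regex(groups[r"id=\d+;ok"])
```

The skeleton accepts every string in its group by construction, but it is far more permissive than the regex that produced them; turning it into something tighter is the hard part of the problem.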

慢半拍i

2020-12-17 06:25

The key words in academia are "grammatical inference". Unfortunately, there aren't any efficient, general algorithms to do the sort of thing you're proposing. What's your real problem?

Edit: it sounds like you might be interested in Data Description Languages. PADS (http://www.padsproj.org/) is a typical example.

长情又很酷

2020-12-17 06:27
What you are trying to do is language learning or language inference with a twist: instead of generalising over a set of given examples (and possibly counter-examples), you wish to infer a language with a small yet specific grammar.

I'm not sure how much research is being done on that. However, if you are also interested in finding the minimal (= general) regular expression that accepts all N strings, search for papers on MDL (minimum description length) and FSMs (finite state machines).

Two interesting queries at Google Scholar:
- "minimum description length" automata
- "language inference" automata