Automatic regex builder

误落风尘 2020-12-17 05:21

I have N strings. There are also K regular expressions, unknown to me. Each string either matches one of the regular expressions or is garbage. There are a total of …

3 Answers
  • 2020-12-17 06:24

    Nothing clever here; perhaps I don't fully understand the problem?

    Why not just always reduce L to 0? Check each string against each regex; if a string doesn't match any of the regexes, it's garbage. If it does match, record which strings each regex matched, then run LCS (longest common subsequence) on each resulting L = 0, K = 1 subproblem to deduce each regex's definition.
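    A minimal sketch of the LCS step, assuming the strings have already been split into groups that each belong to a single regex (the L = 0, K = 1 case). `lcs` is the textbook dynamic-programming algorithm; `regex_from_group` is a hypothetical helper that turns the common subsequence into a deliberately permissive pattern by putting `.*` between its characters. Folding `lcs` over more than two strings is an approximation: the result is common to every string in the group, but not necessarily the longest such subsequence.

```python
import re
from functools import reduce


def lcs(a: str, b: str) -> str:
    """Classic dynamic-programming longest common subsequence of two strings."""
    m, n = len(a), len(b)
    # dp[i][j] = length of the LCS of the suffixes a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forwards to recover one LCS.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def regex_from_group(strings: list[str]) -> str:
    """Hypothetical helper: permissive regex that every string in the group matches.

    Pairwise folding yields a subsequence common to all strings; interleaving
    its characters with `.*` guarantees each string in the group matches.
    """
    common = reduce(lcs, strings)
    return ".*" + ".*".join(re.escape(c) for c in common) + ".*"


group = ["id-123", "id-456", "id-789"]
pattern = regex_from_group(group)
print(pattern)
print(all(re.fullmatch(pattern, s) for s in group))
```

    The resulting pattern is very general (it would also match a string such as `xixdx-x`), so it is a starting point for each regex's definition rather than the definition itself.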

  • 2020-12-17 06:25

    The term used in academia is "grammatical inference". Unfortunately, there are no efficient, general algorithms for the sort of thing you're proposing. What's your real problem?

    Edit: it sounds like you might be interested in Data Description Languages. PADS (http://www.padsproj.org/) is a typical example.

  • 2020-12-17 06:27

    What you are trying to do is language learning or language inference with a twist: instead of generalising over a set of given examples (and possibly counter-examples), you wish to infer a language with a small yet specific grammar.
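    One naive way to make "a small yet specific grammar" concrete, offered as a heuristic and not as the method of any particular paper or tool: abstract each string into a coarse pattern by replacing runs of digits with `[0-9]+`, runs of ASCII letters with `[A-Za-z]+`, and escaping every other character, then group strings by pattern. Patterns shared by at least `min_support` strings are kept as candidate regexes; the rest are treated as garbage. The token classes, the threshold, and the names `abstract` and `infer_patterns` are all assumptions for illustration.

```python
import re
from collections import defaultdict

# Maximal runs of ASCII digits or ASCII letters, or any single other character.
TOKEN = re.compile(r"(?P<digits>[0-9]+)|(?P<letters>[A-Za-z]+)|(?P<other>.)", re.DOTALL)


def abstract(s: str) -> str:
    """Map a string to a coarse regex by generalising each token to its class."""
    parts = []
    for m in TOKEN.finditer(s):
        if m.lastgroup == "digits":
            parts.append("[0-9]+")
        elif m.lastgroup == "letters":
            parts.append("[A-Za-z]+")
        else:
            parts.append(re.escape(m.group()))  # punctuation etc. kept literally
    return "".join(parts)


def infer_patterns(
    strings: list[str], min_support: int = 2
) -> tuple[dict[str, list[str]], list[str]]:
    """Group strings by abstract pattern; rare shapes (< min_support) count as garbage."""
    groups: dict[str, list[str]] = defaultdict(list)
    for s in strings:
        groups[abstract(s)].append(s)
    patterns = {p: g for p, g in groups.items() if len(g) >= min_support}
    garbage = [s for g in groups.values() if len(g) < min_support for s in g]
    return patterns, garbage


data = ["2020-12-17", "2021-01-05", "ab12", "cd34", "ef56", "#!?"]
patterns, garbage = infer_patterns(data)
for pattern, members in patterns.items():
    print(pattern, members)
print("garbage:", garbage)
```

    In this sketch K is not fixed in advance; it falls out as the number of patterns with enough support. Coarser or finer token classes trade generality against specificity.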

    I'm not sure how much research is being done on that. However, if you are also interested in finding the minimal (= general) regular expression that accepts all n strings, search for papers on MDL (Minimum Description Length) and FSMs (Finite State Machines).

    Two interesting queries at Google Scholar:

    • "minimum description length" automata
    • "language inference" automata
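    To make the FSM side concrete: the most specific finite-state machine accepting all n strings is a prefix tree acceptor (PTA), which accepts exactly those strings and nothing else. State-merging approaches to grammatical inference commonly start from this automaton and merge states to generalise; an MDL criterion can then decide whether a merge (smaller machine, larger language) pays off. The sketch below only builds the PTA. The names `PTA` and `build_pta` are hypothetical, and `size()` is a crude proxy, not an actual MDL score.

```python
from dataclasses import dataclass, field


@dataclass
class PTA:
    """Prefix tree acceptor: a DFA that accepts exactly the strings added to it."""
    transitions: list[dict[str, int]] = field(default_factory=lambda: [{}])
    accepting: set[int] = field(default_factory=set)

    def add(self, s: str) -> None:
        state = 0  # state 0 is the root (empty prefix)
        for ch in s:
            nxt = self.transitions[state].get(ch)
            if nxt is None:
                # New prefix: create a fresh state and link it from the current one.
                nxt = len(self.transitions)
                self.transitions.append({})
                self.transitions[state][ch] = nxt
            state = nxt
        self.accepting.add(state)

    def accepts(self, s: str) -> bool:
        state = 0
        for ch in s:
            if ch not in self.transitions[state]:
                return False
            state = self.transitions[state][ch]
        return state in self.accepting

    def size(self) -> int:
        """Crude stand-in for description length: states plus transitions."""
        return len(self.transitions) + sum(len(t) for t in self.transitions)


def build_pta(strings: list[str]) -> PTA:
    pta = PTA()
    for s in strings:
        pta.add(s)
    return pta


pta = build_pta(["ab", "ac", "b"])
print(pta.accepts("ab"), pta.accepts("a"), pta.size())
```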