Hashtable/dictionary/map lookup with regular expressions

难免孤独 2021-02-01 05:36

I'm trying to figure out if there's a reasonably efficient way to perform a lookup in a dictionary (or a hash, or a map, or whatever your favorite language calls it) where the

19 answers
  • 2021-02-01 06:19

    What happens if you have a dictionary such as

    import re
    regex_dict = { re.compile("foo.*"): 5, re.compile("f.*"): 6 }
    

    In this case regex_dict["food"] could legitimately return either 5 or 6.

    Even ignoring that problem, there's probably no way to do this efficiently with the regex module, because each pattern would have to be tried in turn. Instead, what you'd need is an internal directed graph or tree structure.
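
    As a rough illustration of both points, here is a minimal sketch of the naive approach, reusing the two patterns from the example above. The helper name lookup_all is hypothetical, and requiring a whole-string match via fullmatch is an assumption about the intended semantics.

```python
import re

# Same two overlapping patterns as in the example above.
regex_dict = {re.compile("foo.*"): 5, re.compile("f.*"): 6}


def lookup_all(table, key):
    """Return the values of every pattern that matches the whole key."""
    # Linear scan: each compiled pattern is tried in turn, so the cost
    # of one lookup grows with the number of keys in the table.
    return [value for pattern, value in table.items() if pattern.fullmatch(key)]


matches = lookup_all(regex_dict, "food")
print(matches)  # both patterns match "food", so both values come back
```

    The caller still has to decide which of the returned values wins, which is exactly the ambiguity described above.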

  • 2021-02-01 06:19

    The problem has nothing to do with regular expressions: you'd have the same problem with a dictionary whose keys are functions or lambdas. The real question is whether there is a way to classify your functions so that you can tell which ones will return true for a given input, and that isn't a search problem, because f(x) is not known in general beforehand.

    Distributed programming, or caching answer sets on the assumption that some values of x are common, may help.

    -- DM

  • 2021-02-01 06:22

    A special case of this problem came up in the AI languages of the 1970s that were oriented around deductive databases. The keys in these databases could be patterns with variables -- like regular expressions without the * or | operators. They tended to use fancy extensions of trie structures for indexes. See krep*.lisp in Norvig's Paradigms of AI Programming for the general idea.
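
    To make the trie idea concrete, here is a small sketch. It is not the code from krep*.lisp. It assumes patterns are tuples of tokens in which "?" stands for exactly one arbitrary token, and the class name PatternTrie and the sample rules are made up for illustration.

```python
class PatternTrie:
    """Index token patterns in which "?" matches any single token."""

    WILDCARD = "?"

    def __init__(self):
        self.children = {}  # token -> PatternTrie
        self.values = []    # values of patterns that end at this node

    def insert(self, pattern, value):
        """Store value under a pattern given as a sequence of tokens."""
        node = self
        for token in pattern:
            node = node.children.setdefault(token, PatternTrie())
        node.values.append(value)

    def lookup(self, tokens):
        """Return the values of all stored patterns matching the tokens."""
        nodes = [self]
        for token in tokens:
            next_nodes = []
            for node in nodes:
                # Follow the literal edge and the wildcard edge, if present.
                # dict.fromkeys drops the duplicate when token is "?" itself.
                for edge in dict.fromkeys((token, self.WILDCARD)):
                    child = node.children.get(edge)
                    if child is not None:
                        next_nodes.append(child)
            nodes = next_nodes
            if not nodes:
                break  # no stored pattern shares this prefix
        return [value for node in nodes for value in node.values]


trie = PatternTrie()
trie.insert(("likes", "?", "cheese"), "rule-1")
trie.insert(("likes", "kim", "?"), "rule-2")
trie.insert(("hates", "?", "?"), "rule-3")
print(trie.lookup(("likes", "kim", "cheese")))  # rule-1 and rule-2 both match
```

    Patterns that share a prefix share trie nodes, so a lookup only visits the branches that can still match instead of testing every stored pattern.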

  • 2021-02-01 06:24

    If you have a small set of possible inputs, you can cache the matches as they appear in a second dict and get O(1) for the cached values.

    If the set of possible inputs is too big to cache but not infinite either, you can keep just the last N matches in the cache (search for "LRU maps"; LRU stands for least recently used).

    If you can't do this, you can try to cut down the number of regexps you have to try, for example by checking a prefix first.
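
    A bounded cache like this can be sketched with functools.lru_cache from the Python standard library. The PATTERNS table, the lookup function, the first-match-wins rule, and the cache size of 1024 are all assumptions made for the example.

```python
import re
from functools import lru_cache

# Hypothetical pattern table: (compiled regex, value) pairs, tried in order.
PATTERNS = [
    (re.compile(r"foo.*"), 5),
    (re.compile(r"\d+"), 7),
]


@lru_cache(maxsize=1024)  # keep only the 1024 most recently used inputs
def lookup(key):
    """Return the value of the first pattern matching key, else None."""
    for pattern, value in PATTERNS:
        if pattern.fullmatch(key):
            return value
    return None


lookup("food")  # slow path: scans the patterns
lookup("food")  # served from the cache
print(lookup.cache_info())
```

    Note that the cache must be cleared with lookup.cache_clear() whenever PATTERNS changes, otherwise stale answers will be returned.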

  • 2021-02-01 06:24

    I created this exact data structure for a project once. I implemented it naively, as you suggested. I did make two immensely helpful optimizations, which may or may not be feasible for you, depending on the size of your data:

    • Memoizing the hash lookups
    • Pre-seeding the memoization table (not sure what to call this... warming up the cache?)

    To avoid the problem of multiple keys matching the input, I gave each regex key a priority and the highest priority was used.
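
    The answer does not show its code, so the following is only a guess at what such a structure might look like. The class name RegexDict, the warm method, and the whole-string matching are assumptions for this sketch.

```python
import re


class RegexDict:
    """Lookup keyed by regexes, with priorities and a memo table."""

    def __init__(self):
        self._rules = []  # (priority, compiled pattern, value)
        self._memo = {}   # input string -> resolved value

    def add(self, pattern, value, priority=0):
        self._rules.append((priority, re.compile(pattern), value))
        # Highest priority first, so the first match found is the winner.
        self._rules.sort(key=lambda rule: rule[0], reverse=True)
        self._memo.clear()  # cached answers may be stale after a change

    def __getitem__(self, key):
        if key in self._memo:
            return self._memo[key]
        for _priority, pattern, value in self._rules:
            if pattern.fullmatch(key):
                self._memo[key] = value
                return value
        raise KeyError(key)

    def warm(self, keys):
        """Pre-seed the memo table with inputs expected to be common."""
        for key in keys:
            try:
                self[key]
            except KeyError:
                pass  # misses are not cached in this sketch


table = RegexDict()
table.add("f.*", 6, priority=1)
table.add("foo.*", 5, priority=2)
table.warm(["food", "fig"])
print(table["food"])  # 5, because "foo.*" has the higher priority
```

    Only the first lookup of each distinct input pays for the linear scan; repeated lookups are a plain dictionary hit.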

  • 2021-02-01 06:25

    The fundamental assumption is flawed, I think. You can't map hashes to regular expressions.
