Hashtable/dictionary/map lookup with regular expressions


I'm trying to figure out if there's a reasonably efficient way to perform a lookup in a dictionary (or a hash, or a map, or whatever your favorite language calls it) where the


19 answers

迷失自我

2021-02-01 06:19
What happens if you have a dictionary such as
```
regex_dict = { re.compile("foo.*"): 5, re.compile("f.*"): 6 }
```
In this case regex_dict["food"] could legitimately return either 5 or 6.

Even ignoring that problem, there's probably no way to do this efficiently with the regex module alone, because every key would have to be tried in turn. Instead, what you'd need is an internal directed graph or tree structure.
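
To make the ambiguity concrete, here is a minimal sketch of the naive linear scan. The helper name `find_all` is hypothetical, and the use of `fullmatch` (the whole key must match) is an assumption about the intended semantics.

```python
import re

regex_dict = {re.compile("foo.*"): 5, re.compile("f.*"): 6}

def find_all(table, text):
    """Return the value of every pattern that matches the whole of text."""
    # Every key is tried, so one lookup costs O(number of patterns).
    return [value for pattern, value in table.items() if pattern.fullmatch(text)]

matches = find_all(regex_dict, "food")  # both patterns match "food"
```

Any single-valued lookup built on top of this needs a rule for choosing among the matches, such as first-inserted or highest-priority.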

迷失自我

2021-02-01 06:19

The problem has nothing to do with regular expressions: you'd have the same problem with a dictionary whose keys are functions or lambdas. What you are really asking is whether there is a way to classify your functions so that you can tell which ones will return true for a given input. That isn't a search problem, because f(x) is not known in general beforehand.

Distributing the work, or caching answer sets if some values of x come up often, may help.

-- DM

猫巷女王i

2021-02-01 06:22

A special case of this problem came up in the 70s AI languages oriented around deductive databases. The keys in these databases could be patterns with variables -- like regular expressions without the * or | operators. They tended to use fancy extensions of trie structures for indexes. See krep*.lisp in Norvig's Paradigms of AI Programming for the general idea.
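
As a rough illustration of the idea (not the code from Norvig's book), here is a sketch of a trie whose edges may be a wildcard. It assumes patterns and keys are tuples of tokens and that `"?"` stands for a variable matching exactly one token; `PatternTrie` and `WILDCARD` are hypothetical names.

```python
WILDCARD = "?"  # hypothetical marker for a pattern variable

class PatternTrie:
    def __init__(self):
        self.children = {}  # token (or WILDCARD) -> PatternTrie
        self.values = []    # values of patterns that end at this node

    def insert(self, pattern, value):
        node = self
        for token in pattern:
            node = node.children.setdefault(token, PatternTrie())
        node.values.append(value)

    def lookup(self, key):
        """Return the values of all stored patterns that match key."""
        nodes = [self]
        for token in key:
            # Follow the literal edge and, if distinct, the wildcard edge.
            edges = (token,) if token == WILDCARD else (token, WILDCARD)
            next_nodes = []
            for node in nodes:
                for edge in edges:
                    child = node.children.get(edge)
                    if child is not None:
                        next_nodes.append(child)
            if not next_nodes:
                return []
            nodes = next_nodes
        return [value for node in nodes for value in node.values]

trie = PatternTrie()
trie.insert(("likes", "?", "cats"), "a")
trie.insert(("likes", "bob", "?"), "b")
found = trie.lookup(("likes", "bob", "cats"))  # matches both patterns
```

Patterns that share a prefix share trie nodes, so a lookup only explores branches that are still consistent with the key instead of testing every pattern.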

半阙折子戏

2021-02-01 06:24

If you have a small set of possible inputs, you can cache the matches as they appear in a second dict and get O(1) for the cached values.

If the set of possible inputs is too big to cache but not infinite, either, you can just keep the last N matches in the cache (check Google for "LRU maps" - least recently used).
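
In Python, one way to get a bounded cache like this is `functools.lru_cache`. The sketch below assumes a hypothetical `PATTERNS` table where the first matching pattern wins, and that the table does not change after the first lookup.

```python
import re
from functools import lru_cache

# Hypothetical pattern table; the first matching pattern wins.
PATTERNS = [(re.compile(r"foo.*"), 5), (re.compile(r"f.*"), 6)]

@lru_cache(maxsize=1024)  # keep only the 1024 most recently used inputs
def lookup(text):
    for pattern, value in PATTERNS:
        if pattern.fullmatch(text):
            return value
    return None  # misses are cached as well

lookup("food")  # scans the patterns
lookup("food")  # answered from the cache
```

If patterns are added or removed later, call `lookup.cache_clear()` so stale answers are not returned.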

If you can't do this, you can try to chop down the number of regexps you have to try by checking a prefix or somesuch.
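
One possible shape for the prefix check, as a sketch only: it assumes the caller supplies each pattern's literal prefix by hand (nothing here extracts it from the regex), and `PrefixedRegexTable` is a hypothetical name.

```python
import re
from collections import defaultdict

class PrefixedRegexTable:
    def __init__(self):
        self._by_first_char = defaultdict(list)  # first char -> entries
        self._unprefixed = []                    # patterns with no literal prefix

    def add(self, literal_prefix, pattern, value):
        entry = (literal_prefix, re.compile(pattern), value)
        if literal_prefix:
            self._by_first_char[literal_prefix[0]].append(entry)
        else:
            self._unprefixed.append(entry)

    def lookup(self, text):
        # Only patterns in the matching bucket, plus the unprefixed ones, are tried.
        candidates = self._by_first_char.get(text[:1], []) + self._unprefixed
        for prefix, pattern, value in candidates:
            # The cheap string comparison runs before the regex engine.
            if text.startswith(prefix) and pattern.fullmatch(text):
                return value
        return None

table = PrefixedRegexTable()
table.add("foo", r"foo.*", 5)
table.add("", r".*bar", 7)
result = table.lookup("food")
```

Note that bucketed patterns are tried before unprefixed ones, so this changes which value wins when several patterns match.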

温柔的废话

2021-02-01 06:24
I created this exact data structure for a project once. I implemented it naively, as you suggested. I did make two immensely helpful optimizations, which may or may not be feasible for you, depending on the size of your data:
- Memoizing the hash lookups
- Pre-seeding the memoization table (not sure what to call this... warming up the cache?)
To avoid the problem of multiple keys matching the input, I gave each regex key a priority and the highest priority was used.
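
A minimal sketch of what such a structure could look like, combining the priority rule with memoization and cache warming. This is not the original project's code; `PriorityRegexDict`, `warm_up`, and the whole-string `fullmatch` semantics are assumptions.

```python
import re

class PriorityRegexDict:
    def __init__(self):
        self._rules = []  # (priority, compiled pattern, value), highest priority first
        self._memo = {}   # input string -> value found for it

    def add(self, pattern, value, priority=0):
        self._rules.append((priority, re.compile(pattern), value))
        # Stable sort: rules with equal priority keep insertion order.
        self._rules.sort(key=lambda rule: rule[0], reverse=True)
        self._memo.clear()  # earlier answers may no longer be valid

    def __getitem__(self, text):
        if text in self._memo:
            return self._memo[text]
        for _priority, pattern, value in self._rules:
            if pattern.fullmatch(text):
                self._memo[text] = value
                return value
        raise KeyError(text)

    def warm_up(self, expected_inputs):
        """Pre-seed the memo table with inputs that are likely to be looked up."""
        for text in expected_inputs:
            try:
                self[text]
            except KeyError:
                pass  # inputs with no matching rule are simply skipped

d = PriorityRegexDict()
d.add("f.*", 6, priority=1)
d.add("foo.*", 5, priority=2)
d.warm_up(["food", "fab"])
value = d["food"]  # the higher-priority "foo.*" rule wins
```

The memo table here is unbounded, which only suits a limited set of distinct inputs.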

没有蜡笔的小新

2021-02-01 06:25

The fundamental assumption is flawed, I think. You can't map hashes to regular expressions.
