发表新帖

发表新帖

Why are uncompiled, repeatedly used regexes so much slower in Python 3?

前端未结

关注

 1  743

When answering this question (and having read this answer to a similar question), I thought that I knew how Python caches regexes.

But then I thought I\'d test it, c

相关标签:

1条回答

Happy的楠姐

2020-12-03 17:45

The code has changed.

In Python 2.7, the cache is a simple dictionary; if more than _MAXCACHE items are stored in it, the whole the cache is cleared before storing a new item. A cache lookup only takes building a simple key and testing the dictionary, see the 2.7 implementation of _compile()

In Python 3.x, the cache has been replaced by the @functools.lru_cache(maxsize=500, typed=True) decorator. This decorator does much more work and includes a thread-lock, adjusting the cache LRU queue and maintaining the cache statistics (accessible via re._compile.cache_info()). See the 3.3.0 implementation of _compile() and of functools.lru_cache().

Others have noticed the same slowdown, and filed issue 16389 in the Python bugtracker. I'd expect 3.4 to be a lot faster again; either the lru_cache implementation is improved or the re module will move to a custom cache again.

Update: With revision 4b4ffffdd670d0 (hg) / 0f606a6 (git) the cache change has been reverted back to the simple version found in 3.1. Python versions 3.2.4 and 3.3.1 include that revision.

Since then, in Python 3.7 the pattern cache was updated to a custom FIFO cache implementation based on a regular dict (relying on insertion order, and unlike a LRU, does not take into account how recently items already in the cache were used when evicting).

0 讨论(0)
发布评论:

提交评论
- 加载中...

热议问题