Why is checking isinstance(something, Mapping) so slow?

感情败类 2021-01-02 19:21

I recently compared the performance of collections.Counter to sorted for comparison checks (if some iterable contains the same elements with the sa

1 Answer
  • 2021-01-02 19:41

    The performance comes down to the sequence of checks in ABCMeta's __instancecheck__, which isinstance calls when the class being tested is an abstract base class.

    The bottom line is that the poor performance seen here is not the result of a missing optimization; it is simply a consequence of isinstance with abstract base classes being a Python-level operation, as mentioned by Jim. Positive and negative results are cached, but even with a cache hit you are looking at a few microseconds per call just to traverse the conditionals in ABCMeta.__instancecheck__.
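    To make that dispatch visible, here is a minimal sketch assuming CPython: CountingMeta and HypotheticalABC are invented names, and the metaclass only counts how often its __instancecheck__ runs before deferring to ABCMeta's own (cached) logic.

```python
import abc


class CountingMeta(abc.ABCMeta):
    """Hypothetical ABCMeta subclass that counts __instancecheck__ calls."""

    calls = 0

    def __instancecheck__(cls, instance):
        # Record the call, then defer to ABCMeta's normal cached logic.
        CountingMeta.calls += 1
        return super().__instancecheck__(instance)


class HypotheticalABC(metaclass=CountingMeta):
    """An ABC with no abstract methods and no registered subclasses."""


# Neither object is an instance, so each isinstance call runs the metaclass hook.
print(isinstance([], HypotheticalABC))  # False
print(isinstance({}, HypotheticalABC))  # False
print(CountingMeta.calls)               # 2
```

    Every isinstance call against such a class pays for a Python-level method call, which is the overhead the profiles in this answer are measuring.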


    An example

    Consider some different empty structures.

    >>> import abc, collections.abc
    >>> import pandas as pd
    >>> d = dict(); l = list(); s = pd.Series()
    
    >>> %timeit isinstance(d, collections.abc.Mapping)
    100000 loops, best of 3: 1.99 µs per loop
    
    >>> %timeit isinstance(l, collections.abc.Mapping)
    100000 loops, best of 3: 3.16 µs per loop # caching happening
    
    >>> %timeit isinstance(s, collections.abc.Mapping)
    100000 loops, best of 3: 3.26 µs per loop # caching happening
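    Outside IPython, a comparable measurement can be sketched with the standard-library timeit module. This is a variant under stated assumptions rather than the original benchmark: it drops pandas, adds a concrete isinstance(x, dict) baseline, and includes lambda-call overhead in both timings, so only the relative gap is meaningful. The names samples and time_checks are hypothetical, and absolute numbers will vary by machine and Python version.

```python
import timeit
from collections.abc import Mapping

# Objects to test; pandas is left out so the sketch needs only the stdlib.
samples = {"dict": {}, "list": []}


def time_checks(number=100_000):
    """Return (seconds per concrete check, seconds per ABC check) for each sample."""
    results = {}
    for name, obj in samples.items():
        # Each lambda is timed immediately, so it sees the current `obj`.
        concrete = timeit.timeit(lambda: isinstance(obj, dict), number=number)
        abstract = timeit.timeit(lambda: isinstance(obj, Mapping), number=number)
        results[name] = (concrete / number, abstract / number)
    return results


if __name__ == "__main__":
    for name, (concrete, abstract) in time_checks().items():
        print(f"{name:>4}: isinstance(x, dict) {concrete * 1e9:6.1f} ns, "
              f"isinstance(x, Mapping) {abstract * 1e9:6.1f} ns")
```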
    

    There is a clear performance discrepancy. What accounts for it?

    For a dict

    >>> %load_ext line_profiler
    >>> %lprun -f abc.ABCMeta.__instancecheck__ isinstance(dict(), collections.abc.Mapping)
    Timer unit: 6.84247e-07 s
    Total time: 1.71062e-05 s
    
    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
       178                                               def __instancecheck__(cls, instance):
       179                                                   """Override for isinstance(instance, cls)."""
       180                                                   # Inline the cache checking
       181         1            7      7.0     28.0          subclass = instance.__class__
       182         1           16     16.0     64.0          if subclass in cls._abc_cache:
       183         1            2      2.0      8.0              return True
       184                                                   subtype = type(instance)
       185                                                   if subtype is subclass:
       186                                                       if (cls._abc_negative_cache_version ==
       187                                                           ABCMeta._abc_invalidation_counter and
       188                                                           subclass in cls._abc_negative_cache):
       189                                                           return False
       190                                                       # Fall back to the subclass check.
       191                                                       return cls.__subclasscheck__(subclass)
       192                                                   return any(cls.__subclasscheck__(c) for c in {subclass, subtype})
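    The positive-cache hit for dict is needed because dict is only a virtual subclass of Mapping: the standard library registers it with MutableMapping.register(dict) in _collections_abc, so the answer has to come from ABCMeta's registry and cache rather than from dict's own MRO. A minimal check of that, using only public attributes:

```python
from collections.abc import Mapping, MutableMapping

# dict does not inherit from either ABC: neither appears in its MRO...
print(Mapping in dict.__mro__)          # False
print(MutableMapping in dict.__mro__)   # False

# ...yet the ABC machinery still reports it as a (virtual) subclass.
print(issubclass(dict, MutableMapping))  # True
print(issubclass(dict, Mapping))         # True
```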
    

    For a list

    >>> %lprun -f abc.ABCMeta.__instancecheck__ isinstance(list(), collections.abc.Mapping)
    Timer unit: 6.84247e-07 s
    Total time: 3.07911e-05 s
    
    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
       178                                               def __instancecheck__(cls, instance):
       179                                                   """Override for isinstance(instance, cls)."""
       180                                                   # Inline the cache checking
       181         1            7      7.0     15.6          subclass = instance.__class__
       182         1           17     17.0     37.8          if subclass in cls._abc_cache:
       183                                                       return True
       184         1            2      2.0      4.4          subtype = type(instance)
       185         1            2      2.0      4.4          if subtype is subclass:
       186         1            3      3.0      6.7              if (cls._abc_negative_cache_version ==
       187         1            2      2.0      4.4                  ABCMeta._abc_invalidation_counter and
       188         1           10     10.0     22.2                  subclass in cls._abc_negative_cache):
       189         1            2      2.0      4.4                  return False
       190                                                       # Fall back to the subclass check.
       191                                                       return cls.__subclasscheck__(subclass)
       192                                                   return any(cls.__subclasscheck__(c) for c in {subclass, subtype})
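    Lines 186-188 only trust the negative cache while its version matches ABCMeta._abc_invalidation_counter, which is bumped by every register() call on any ABC. On CPython 3.7 and later ABCMeta is implemented in C (the _abc module) and attributes such as _abc_cache and _abc_negative_cache are no longer exposed, but the public abc.get_cache_token() returns the current value of that counter. A minimal sketch, where Demo is a hypothetical ABC invented for illustration:

```python
import abc


class Demo(abc.ABC):
    """Hypothetical ABC used only to illustrate cache invalidation."""


first = isinstance([], Demo)       # False; list can now be negatively cached
before = abc.get_cache_token()

Demo.register(list)                # registering any class bumps the token
after = abc.get_cache_token()

second = isinstance([], Demo)      # True: the stale negative entry is ignored
print(first, before != after, second)  # False True True
```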
    

    We can see that for a dict, the Mapping abstract class's _abc_cache

    >>> list(collections.abc.Mapping._abc_cache)
    [dict]
    

    includes our dict, so the check short-circuits early. For a list, the positive cache is evidently not hit; however, Mapping's _abc_negative_cache contains the list type

    >>> list(collections.abc.Mapping._abc_negative_cache)
    [type,
     list,
     generator,
     pandas.core.series.Series,
     itertools.chain,
     int,
     map]
    

    as well as the pd.Series type, which was added because %timeit calls isinstance many times and only the first call misses. When the negative cache does miss (as on that first call with a Series), Python falls back to the regular subclass check with

    cls.__subclasscheck__(subclass)
    

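    which can be far slower: it calls __subclasshook__, then checks the MRO, the registry of virtual subclasses, and recursively every subclass of the ABC (see ABCMeta.__subclasscheck__ in the abc module), and caches the result so subsequent calls are fast.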
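    The control flow of the profiled __instancecheck__ (positive cache, then negative cache, then an expensive check whose result is cached) can be modeled in a few lines. This is a simplified sketch, not ABCMeta's implementation: CachedTypeCheck and the duck-typed predicate are hypothetical, it keeps strong references where ABCMeta uses weak references, and it ignores the invalidation counter.

```python
from typing import Callable


class CachedTypeCheck:
    """Hypothetical, simplified model of ABCMeta's positive/negative caches."""

    def __init__(self, slow_check: Callable[[type], bool]) -> None:
        self._slow_check = slow_check  # stands in for __subclasscheck__
        self._positive = set()         # plays the role of _abc_cache
        self._negative = set()         # plays the role of _abc_negative_cache
        self.slow_calls = 0

    def check(self, instance: object) -> bool:
        subclass = type(instance)
        if subclass in self._positive:  # fast path, like the dict case
            return True
        if subclass in self._negative:  # second fast path, like the list case
            return False
        # Cache miss (like the first Series call): run the slow check, remember it.
        self.slow_calls += 1
        result = self._slow_check(subclass)
        (self._positive if result else self._negative).add(subclass)
        return result


# Hypothetical duck-typed stand-in for the real Mapping subclass check.
is_mapping = CachedTypeCheck(lambda t: hasattr(t, "keys") and hasattr(t, "__getitem__"))

results = [is_mapping.check(obj) for obj in ({}, {}, [], [])]
print(results)                # [True, True, False, False]
print(is_mapping.slow_calls)  # 2
```

    Only the first dict and the first list reach the slow path; the repeats are answered from the caches, which mirrors why repeated %timeit runs settle into the cached timings.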
