Python performance: Try-except or not in?

南笙 2020-12-19 19:32

In one of my classes I have a number of methods that all draw values from the same dictionaries. However, if one of the methods tries to access a value that isn't there, it

5 answers
  • 2020-12-19 20:06

    If it is exceptional, use an exception. If you expect the key to be there, use try/except; if you don't know whether the key is there, use not in.
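
    A minimal sketch of that rule of thumb. The names settings, cache, get_required and get_or_compute are made up for illustration and do not come from the question:

```python
# Hypothetical data and helpers, for illustration only.
settings = {"host": "localhost", "port": 8080}
cache = {}


def get_required(key):
    # The key is expected to exist, so a miss is a genuine error.
    try:
        return settings[key]
    except KeyError:
        raise ValueError("missing required setting: %r" % (key,))


def get_or_compute(key):
    # A miss is routine here, so test membership before the lookup.
    if key not in cache:
        cache[key] = key * 2  # stand-in for the real computation
    return cache[key]
```

    In the first function the exception path should almost never run; in the second, the membership test states outright that a missing key is a normal case.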

  • 2020-12-19 20:10

    Checking whether a key exists is at least as cheap as retrieving it, so use the if not in solution, which is much cleaner and more readable.

    According to your question, a missing key is not an error-like case, so there's no good reason to let Python raise an exception (even though you catch it immediately). With an if not in check, everyone can see your intention: get the existing value, or otherwise generate it.
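
    One way this could look when several methods share the same dictionary, as in the question. This is only a sketch: the Palette class, its methods, and the placeholder value are all hypothetical:

```python
class Palette(object):
    """Hypothetical class whose methods all read from one shared dict."""

    def __init__(self):
        self._colors = {"red": "#ff0000"}

    def _generate(self, name):
        # Stand-in for whatever the real code would compute for a new key.
        return "#000000"

    def _lookup(self, name):
        # Generate and store the value only when the key is absent.
        if name not in self._colors:
            self._colors[name] = self._generate(name)
        return self._colors[name]

    def foreground(self, name):
        return self._lookup(name)

    def background(self, name):
        return self._lookup(name)
```

    Putting the check in a single helper keeps the "get or generate" intention in one place instead of repeating it in every method.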

  • 2020-12-19 20:11

    I believe the .get() method of a dict takes a second parameter that sets the default value. You could use that and do it in one line. I'm not sure how it affects performance, though.
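
    For reference, a short sketch of dict.get() with a default, next to dict.setdefault(), which also stores the default. The prices dict is a made-up example:

```python
prices = {"apple": 3}

# get() returns the default for a missing key but does not store it.
pear = prices.get("pear", 0)
pear_was_stored = "pear" in prices

# setdefault() returns the existing value, or stores the default and returns it.
plum = prices.setdefault("plum", 5)
apple = prices.setdefault("apple", 99)  # key exists, so 99 is ignored
```

    Note that in both calls the default expression is evaluated before the call, even when the key is present, so an expensive default is computed every time.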

  • 2020-12-19 20:22

    Timing this is delicate: you need care to avoid "lasting side effects" (each run should start from the same dict), and the performance tradeoff depends on the percentage of missing keys. So, consider a dil.py file as follows:

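    def make(percentmissing):
        # Build the shared dict with keys 0..(99 - percentmissing), all set to 1.
        global d
        d = dict.fromkeys(range(100 - percentmissing), 1)
    
    def addit(d, k):
        # Stand-in for generating and storing a missing value.
        d[k] = k
    
    def with_in():
        # Test membership before each lookup.
        dc = d.copy()  # copy so insertions do not persist between timing runs
        for k in range(100):
            if k not in dc:
                addit(dc, k)
            lc = dc[k]
    
    def with_ex():
        # Look up first and handle KeyError on a miss.
        dc = d.copy()
        for k in range(100):
            try:
                lc = dc[k]
            except KeyError:
                addit(dc, k)
                lc = dc[k]
    
    def with_ge():
        # get() returns None on a miss (None is never a stored value here).
        dc = d.copy()
        for k in range(100):
            lc = dc.get(k)
            if lc is None:
                addit(dc, k)
                lc = dc[k]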
    

    and a series of timeit calls such as:

    $ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_in()'
    10000 loops, best of 3: 28 usec per loop
    $ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_ex()'
    10000 loops, best of 3: 41.7 usec per loop
    $ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_ge()'
    10000 loops, best of 3: 46.6 usec per loop
    

    This shows that, with 10% of the keys missing, the in check is substantially the fastest approach.

    $ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_in()'
    10000 loops, best of 3: 24.6 usec per loop
    $ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_ex()'
    10000 loops, best of 3: 23.4 usec per loop
    $ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_ge()'
    10000 loops, best of 3: 42.7 usec per loop
    

    With just 1% of the keys missing, the exception approach is marginally the fastest (and the get approach remains the slowest in either case).

    So, for optimal performance, the in approach is preferable unless the vast majority (99%+) of lookups are going to succeed.

    Of course, there's another, more elegant possibility: adding a dict subclass like this:

    class dd(dict):
        def __init__(self, *a, **k):
            dict.__init__(self, *a, **k)
        def __missing__(self, k):
            # Called by dict's item lookup when the key k is absent.
            addit(self, k)
            return self[k]
    
    def with_dd():
        dc = dd(d)
        for k in range(100):
            lc = dc[k]
    

    However...:

    $ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_dd()'
    10000 loops, best of 3: 46.1 usec per loop
    $ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_dd()'
    10000 loops, best of 3: 55 usec per loop
    

    ...while slick indeed, this is not a performance winner: it's about even with the get approach, or slower, just with much nicer-looking code at the point of use. (defaultdict, semantically analogous to this dd class, would be a performance win if it were applicable, but that's because the __missing__ special method is, in that case, implemented in well-optimized C code.)
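
    For comparison, a sketch of a case where defaultdict does apply. Its default_factory is called with no arguments, so it fits when the default value does not depend on the key; the benchmark's addit stores a value derived from the key, which is presumably why the answer calls defaultdict not applicable there. The word list is a made-up example:

```python
from collections import defaultdict

# default_factory (here: list) is called with no arguments on a miss.
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    # A missing first letter gets a fresh empty list automatically.
    groups[word[0]].append(word)

# Plain indexing on a missing key inserts the default as a side effect.
empty = groups["z"]
```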

  • 2020-12-19 20:24

    When in doubt, profile.

    Run a test to see if, in your environment, one runs faster than another.
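
    A rough sketch of such a test using the standard timeit module. The function names, the percentages, and the repeat counts are arbitrary choices for illustration; absolute numbers will vary by machine and Python version:

```python
import timeit


def lookup_with_in(d, keys):
    for k in keys:
        if k not in d:
            d[k] = k
        value = d[k]


def lookup_with_try(d, keys):
    for k in keys:
        try:
            value = d[k]
        except KeyError:
            d[k] = k
            value = d[k]


def measure(func, percent_missing, repeats=5, number=200):
    # Keys 0..(99 - percent_missing) exist; the remaining keys are misses.
    base = dict.fromkeys(range(100 - percent_missing), 1)
    keys = range(100)
    # Copy the dict on every call so inserted keys do not leak between runs.
    timer = timeit.Timer(lambda: func(base.copy(), keys))
    # Best of several repeats, converted to seconds per call.
    return min(timer.repeat(repeat=repeats, number=number)) / number


results = {}
for pct in (1, 10, 50):
    results[pct] = (measure(lookup_with_in, pct), measure(lookup_with_try, pct))
    print("%2d%% missing: in=%.2f usec, try=%.2f usec"
          % (pct, results[pct][0] * 1e6, results[pct][1] * 1e6))
```

    Taking the minimum over several repeats reduces noise from other processes, which is the same idea behind the "best of 3" figure that the timeit command line reports.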
