Dictionary vs Object - which is more efficient and why?

爱一瞬间的悲伤 2020-11-29 15:55

What is more efficient in Python in terms of memory usage and CPU consumption - Dictionary or Object?

Background: I have to load a huge amount of data into Python.

8 answers
  • 2020-11-29 16:17

    Have you tried using __slots__?

    From the documentation:

    By default, instances of both old and new-style classes have a dictionary for attribute storage. This wastes space for objects having very few instance variables. The space consumption can become acute when creating large numbers of instances.

    The default can be overridden by defining __slots__ in a new-style class definition. The __slots__ declaration takes a sequence of instance variables and reserves just enough space in each instance to hold a value for each variable. Space is saved because __dict__ is not created for each instance.
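
To make the quoted passage concrete, here is a minimal sketch (Python 3, CPython assumed) comparing an ordinary instance with a slotted one. The class names `Plain` and `Slotted` are made up for illustration, and `sys.getsizeof` only reports shallow sizes, so treat the numbers as a rough indication rather than an exact measurement.

```python
import sys


class Plain(object):
    def __init__(self, i):
        self.i = i
        self.l = []


class Slotted(object):
    __slots__ = ('i', 'l')

    def __init__(self, i):
        self.i = i
        self.l = []


p = Plain(1)
s = Slotted(1)

# An ordinary instance carries a per-instance __dict__; a slotted one does not.
print(hasattr(p, '__dict__'))
print(hasattr(s, '__dict__'))

# A slotted instance rejects attributes that are not listed in __slots__.
try:
    s.extra = 42
    rejected = False
except AttributeError:
    rejected = True
print(rejected)

# Shallow sizes: the plain instance also pays for its attribute dictionary.
plain_size = sys.getsizeof(p) + sys.getsizeof(p.__dict__)
slotted_size = sys.getsizeof(s)
print(plain_size, slotted_size)
```

The trade-off is flexibility: with `__slots__` you cannot add new attributes to an instance at runtime.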

    So does this save time as well as memory?

    Comparing the three approaches on my computer:

    test_slots.py:

    class Obj(object):
      __slots__ = ('i', 'l')
      def __init__(self, i):
        self.i = i
        self.l = []
    all = {}
    for i in range(1000000):
      all[i] = Obj(i)
    

    test_obj.py:

    class Obj(object):
      def __init__(self, i):
        self.i = i
        self.l = []
    all = {}
    for i in range(1000000):
      all[i] = Obj(i)
    

    test_dict.py:

    all = {}
    for i in range(1000000):
      o = {}
      o['i'] = i
      o['l'] = []
      all[i] = o
    

    test_namedtuple.py (supported in 2.6):

    import collections
    
    Obj = collections.namedtuple('Obj', 'i l')
    
    all = {}
    for i in range(1000000):
      all[i] = Obj(i, [])
    

    Run benchmark (using CPython 2.5):

    $ lshw | grep product | head -n 1
              product: Intel(R) Pentium(R) M processor 1.60GHz
    $ python --version
    Python 2.5
    $ time python test_obj.py && time python test_dict.py && time python test_slots.py 
    
    real    0m27.398s (using 'normal' object)
    real    0m16.747s (using dict)
    real    0m11.777s (using __slots__)
    

    Using CPython 2.6.2, including the named tuple test:

    $ python --version
    Python 2.6.2
    $ time python test_obj.py && time python test_dict.py && time python test_slots.py && time python test_namedtuple.py 
    
    real    0m27.197s (using 'normal' object)
    real    0m17.657s (using dict)
    real    0m12.249s (using __slots__)
    real    0m12.262s (using namedtuple)
    

    So yes (not really a surprise), using __slots__ is a performance optimization. Using a named tuple has similar performance to __slots__.
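
Timing a whole interpreter run with `time` also counts startup and teardown. As a rough alternative, the four approaches can be compared inside one process with the standard `timeit` module. This is only a sketch: the value of `N`, the `build_*` helper names and the `timings` dictionary are illustrative assumptions, and the numbers will vary by machine and Python version.

```python
import timeit
from collections import namedtuple

N = 100000  # assumption: smaller than the 1,000,000 used in the scripts above


class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = []


class SlotObj(object):
    __slots__ = ('i', 'l')

    def __init__(self, i):
        self.i = i
        self.l = []


NT = namedtuple('NT', 'i l')


def build_obj():
    return {i: Obj(i) for i in range(N)}


def build_dict():
    return {i: {'i': i, 'l': []} for i in range(N)}


def build_slots():
    return {i: SlotObj(i) for i in range(N)}


def build_namedtuple():
    return {i: NT(i, []) for i in range(N)}


timings = {}
for fn in (build_obj, build_dict, build_slots, build_namedtuple):
    # Take the best of three single runs to reduce noise from other processes.
    timings[fn.__name__] = min(timeit.repeat(fn, number=1, repeat=3))
    print('%-18s %.4f s' % (fn.__name__, timings[fn.__name__]))
```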

  • 2020-11-29 16:17

    Here is a copy of @hughdbrown's answer, adapted for Python 3.6.1. I've made the count 5x larger and added some code to measure the memory footprint of the Python process at the end of each run.

    Before the downvoters have at it: be advised that this method of measuring the size of objects is not accurate.

    from datetime import datetime
    import os
    import psutil
    
    process = psutil.Process(os.getpid())
    
    
    ITER_COUNT = 1000 * 1000 * 5
    
    RESULT=None
    
    def makeL(i):
        # Use this line to negate the effect of the strings on the test 
        # return "Python is smart and will only create one string with this line"
    
        # Use this if you want to see the difference with 5 million unique strings
        return "This is a sample string %s" % i
    
    def timeit(method):
        def timed(*args, **kw):
            global RESULT
            s = datetime.now()
            RESULT = method(*args, **kw)
            e = datetime.now()
    
            sizeMb = process.memory_info().rss / 1024 / 1024
            sizeMbStr = "{0:,}".format(round(sizeMb, 2))
    
            print('Time Taken = %s, \t%s, \tSize = %s' % (e - s, method.__name__, sizeMbStr))
    
        return timed
    
    class Obj(object):
        def __init__(self, i):
           self.i = i
           self.l = makeL(i)
    
    class SlotObj(object):
        __slots__ = ('i', 'l')
        def __init__(self, i):
           self.i = i
           self.l = makeL(i)
    
    from collections import namedtuple
    NT = namedtuple("NT", ["i", 'l'])
    
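    # NOTE: the names of the two namedtuple tests are swapped relative to what
    # they build. They are left as-is so the labels match the reported results.
    @timeit
    def profile_dict_of_nt():
        # Despite its name, this builds a list of namedtuples.
        return [NT(i=i, l=makeL(i)) for i in range(ITER_COUNT)]
    
    @timeit
    def profile_list_of_nt():
        # Despite its name, this builds a dict of namedtuples.
        return dict((i, NT(i=i, l=makeL(i))) for i in range(ITER_COUNT))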
    
    @timeit
    def profile_dict_of_dict():
        return dict((i, {'i': i, 'l': makeL(i)}) for i in range(ITER_COUNT))
    
    @timeit
    def profile_list_of_dict():
        return [{'i': i, 'l': makeL(i)} for i in range(ITER_COUNT)]
    
    @timeit
    def profile_dict_of_obj():
        return dict((i, Obj(i)) for i in range(ITER_COUNT))
    
    @timeit
    def profile_list_of_obj():
        return [Obj(i) for i in range(ITER_COUNT)]
    
    @timeit
    def profile_dict_of_slot():
        return dict((i, SlotObj(i)) for i in range(ITER_COUNT))
    
    @timeit
    def profile_list_of_slot():
        return [SlotObj(i) for i in range(ITER_COUNT)]
    
    profile_dict_of_nt()
    profile_list_of_nt()
    profile_dict_of_dict()
    profile_list_of_dict()
    profile_dict_of_obj()
    profile_list_of_obj()
    profile_dict_of_slot()
    profile_list_of_slot()
    

    And these are my results (time taken, test name, process size in MB):

    Time Taken = 0:00:07.018720,    profile_dict_of_nt,     Size = 951.83
    Time Taken = 0:00:07.716197,    profile_list_of_nt,     Size = 1,084.75
    Time Taken = 0:00:03.237139,    profile_dict_of_dict,   Size = 1,926.29
    Time Taken = 0:00:02.770469,    profile_list_of_dict,   Size = 1,778.58
    Time Taken = 0:00:07.961045,    profile_dict_of_obj,    Size = 1,537.64
    Time Taken = 0:00:05.899573,    profile_list_of_obj,    Size = 1,458.05
    Time Taken = 0:00:06.567684,    profile_dict_of_slot,   Size = 1,035.65
    Time Taken = 0:00:04.925101,    profile_list_of_slot,   Size = 887.49
    

    My conclusion is:

    1. Slots have the best memory footprint and are reasonable on speed.
    2. dicts are the fastest, but use the most memory.
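
Because the resident set size of the whole process is only a coarse measure, a finer-grained check is possible with the standard `tracemalloc` module, which counts the bytes allocated by Python itself. The sketch below is an illustration under assumptions, not a replacement for the benchmark above: `N`, `make_l` and `measure` are hypothetical names, and the absolute numbers depend on the Python version.

```python
import tracemalloc

N = 50000  # assumption: far smaller than the 5,000,000 used above


class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = make_l(i)


class SlotObj(object):
    __slots__ = ('i', 'l')

    def __init__(self, i):
        self.i = i
        self.l = make_l(i)


def make_l(i):
    # One unique string per element, as in the script above.
    return "This is a sample string %s" % i


def measure(factory):
    """Return the bytes still allocated after building the container."""
    tracemalloc.start()
    data = factory()
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del data
    return current


sizes = {
    'list_of_dict': measure(lambda: [{'i': i, 'l': make_l(i)} for i in range(N)]),
    'list_of_obj': measure(lambda: [Obj(i) for i in range(N)]),
    'list_of_slot': measure(lambda: [SlotObj(i) for i in range(N)]),
}

for name, size in sizes.items():
    print('%-14s %.2f MB' % (name, size / 1024 / 1024))
```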
  • 2020-11-29 16:21

    Here are my test runs of @Jarrod-Chesney's very nice script. For comparison, I also ran it against Python 2 with "range" replaced by "xrange".

    Out of curiosity, I also added similar tests with OrderedDict (ordict).

    Python 3.6.9:

    Time Taken = 0:00:04.971369,    profile_dict_of_nt,     Size = 944.27
    Time Taken = 0:00:05.743104,    profile_list_of_nt,     Size = 1,066.93
    Time Taken = 0:00:02.524507,    profile_dict_of_dict,   Size = 1,920.35
    Time Taken = 0:00:02.123801,    profile_list_of_dict,   Size = 1,760.9
    Time Taken = 0:00:05.374294,    profile_dict_of_obj,    Size = 1,532.12
    Time Taken = 0:00:04.517245,    profile_list_of_obj,    Size = 1,441.04
    Time Taken = 0:00:04.590298,    profile_dict_of_slot,   Size = 1,030.09
    Time Taken = 0:00:04.197425,    profile_list_of_slot,   Size = 870.67
    
    Time Taken = 0:00:08.833653,    profile_ordict_of_ordict, Size = 3,045.52
    Time Taken = 0:00:11.539006,    profile_list_of_ordict, Size = 2,722.34
    Time Taken = 0:00:06.428105,    profile_ordict_of_obj,  Size = 1,799.29
    Time Taken = 0:00:05.559248,    profile_ordict_of_slot, Size = 1,257.75
    

    Python 2.7.15+:

    Time Taken = 0:00:05.193900,    profile_dict_of_nt,     Size = 906.0
    Time Taken = 0:00:05.860978,    profile_list_of_nt,     Size = 1,177.0
    Time Taken = 0:00:02.370905,    profile_dict_of_dict,   Size = 2,228.0
    Time Taken = 0:00:02.100117,    profile_list_of_dict,   Size = 2,036.0
    Time Taken = 0:00:08.353666,    profile_dict_of_obj,    Size = 2,493.0
    Time Taken = 0:00:07.441747,    profile_list_of_obj,    Size = 2,337.0
    Time Taken = 0:00:06.118018,    profile_dict_of_slot,   Size = 1,117.0
    Time Taken = 0:00:04.654888,    profile_list_of_slot,   Size = 964.0
    
    Time Taken = 0:00:59.576874,    profile_ordict_of_ordict, Size = 7,427.0
    Time Taken = 0:10:25.679784,    profile_list_of_ordict, Size = 11,305.0
    Time Taken = 0:05:47.289230,    profile_ordict_of_obj,  Size = 11,477.0
    Time Taken = 0:00:51.485756,    profile_ordict_of_slot, Size = 11,193.0
    

    So, on both major versions, the conclusions of @Jarrod-Chesney still hold.

  • 2020-11-29 16:24

    Have you considered using a namedtuple? (link for python 2.4/2.5)

    It's the new standard way of representing structured data that gives you the performance of a tuple and the convenience of a class.

    Its only downside compared with dictionaries is that (like tuples) it doesn't give you the ability to change attributes after creation.
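
A short sketch of what that limitation looks like in practice (the field names `i` and `l` are borrowed from the other answers in this thread). Assigning to a field fails, `_replace` returns a modified copy, and mutable values stored in a field can still be changed in place:

```python
from collections import namedtuple

Obj = namedtuple('Obj', 'i l')
o = Obj(1, [])

# Fields cannot be reassigned after creation.
try:
    o.i = 2
    assignment_failed = False
except AttributeError:
    assignment_failed = True
print(assignment_failed)

# _replace builds a new tuple with the given fields changed.
o2 = o._replace(i=2)
print(o, o2)

# The tuple itself is immutable, but a list stored in it is not.
o.l.append('x')
print(o.l)
```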

  • 2020-11-29 16:35

    There is no question.
    You have data with no other attributes (no methods, nothing). Hence you have a data container (in this case, a dictionary).

    I usually prefer to think in terms of data modeling. If there is some huge performance issue, then I can give up something in the abstraction, but only with very good reasons.
    Programming is all about managing complexity, and maintaining the correct abstraction is very often one of the most useful ways to achieve that.

    As for why an object appears slower, I think your measurement is not correct.
    You are performing too few assignments inside the for loop, so what you are seeing is the difference in the time needed to instantiate a dict (a built-in type) and a "custom" object. Although from the language perspective they are the same, they have quite different implementations.
    After that, the assignment time should be almost the same for both, since in the end the members are kept in a dictionary.
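
The point that members end up in a dictionary can be checked directly. This is a minimal sketch for an ordinary class without `__slots__`; it relies on CPython behaviour, and the attribute `name` is invented for the example:

```python
class Obj(object):
    def __init__(self, i):
        self.i = i
        self.l = []


o = Obj(7)

# vars(o) returns the instance's attribute dictionary (o.__dict__).
attrs = vars(o)
print(attrs)

# Setting an attribute shows up as a new key in that dictionary...
o.name = 'x'
print('name' in attrs)

# ...and writing to the dictionary changes the attribute.
attrs['i'] = 8
print(o.i)
```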

  • 2020-11-29 16:37
    # Python 2 script (uses the print statement and xrange).
    from datetime import datetime
    
    ITER_COUNT = 1000 * 1000
    
    def timeit(method):
        def timed(*args, **kw):
            s = datetime.now()
            result = method(*args, **kw)
            e = datetime.now()
    
            print method.__name__, '(%r, %r)' % (args, kw), e - s
            return result
        return timed
    
    class Obj(object):
        def __init__(self, i):
           self.i = i
           self.l = []
    
    class SlotObj(object):
        __slots__ = ('i', 'l')
        def __init__(self, i):
           self.i = i
           self.l = []
    
    @timeit
    def profile_dict_of_dict():
        return dict((i, {'i': i, 'l': []}) for i in xrange(ITER_COUNT))
    
    @timeit
    def profile_list_of_dict():
        return [{'i': i, 'l': []} for i in xrange(ITER_COUNT)]
    
    @timeit
    def profile_dict_of_obj():
        return dict((i, Obj(i)) for i in xrange(ITER_COUNT))
    
    @timeit
    def profile_list_of_obj():
        return [Obj(i) for i in xrange(ITER_COUNT)]
    
    @timeit
    def profile_dict_of_slotobj():
        return dict((i, SlotObj(i)) for i in xrange(ITER_COUNT))
    
    @timeit
    def profile_list_of_slotobj():
        return [SlotObj(i) for i in xrange(ITER_COUNT)]
    
    if __name__ == '__main__':
        profile_dict_of_dict()
        profile_list_of_dict()
        profile_dict_of_obj()
        profile_list_of_obj()
        profile_dict_of_slotobj()
        profile_list_of_slotobj()
    

    Results:

    hbrown@hbrown-lpt:~$ python ~/Dropbox/src/StackOverflow/1336791.py 
    profile_dict_of_dict ((), {}) 0:00:08.228094
    profile_list_of_dict ((), {}) 0:00:06.040870
    profile_dict_of_obj ((), {}) 0:00:11.481681
    profile_list_of_obj ((), {}) 0:00:10.893125
    profile_dict_of_slotobj ((), {}) 0:00:06.381897
    profile_list_of_slotobj ((), {}) 0:00:05.860749
    