Dictionary vs Object - which is more efficient and why?

后端未结

关注

 8  774

爱一瞬间的悲伤

What is more efficient in Python in terms of memory usage and CPU consumption - Dictionary or Object?

Background: I have to load huge amount of data

相关标签:

8条回答

说谎

2020-11-29 16:17

Have you tried using __slots__?

From the documentation:

By default, instances of both old and new-style classes have a dictionary for attribute storage. This wastes space for objects having very few instance variables. The space consumption can become acute when creating large numbers of instances.

The default can be overridden by defining __slots__ in a new-style class definition. The __slots__ declaration takes a sequence of instance variables and reserves just enough space in each instance to hold a value for each variable. Space is saved because __dict__ is not created for each instance.

So does this save time as well as memory?

Comparing the three approaches on my computer:

test_slots.py:

class Obj(object):
  __slots__ = ('i', 'l')
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

test_obj.py:

class Obj(object):
  def __init__(self, i):
    self.i = i
    self.l = []
all = {}
for i in range(1000000):
  all[i] = Obj(i)

test_dict.py:

all = {}
for i in range(1000000):
  o = {}
  o['i'] = i
  o['l'] = []
  all[i] = o

test_namedtuple.py (supported in 2.6):

import collections

Obj = collections.namedtuple('Obj', 'i l')

all = {}
for i in range(1000000):
  all[i] = Obj(i, [])

Run benchmark (using CPython 2.5):

$ lshw | grep product | head -n 1
          product: Intel(R) Pentium(R) M processor 1.60GHz
$ python --version
Python 2.5
$ time python test_obj.py && time python test_dict.py && time python test_slots.py 

real    0m27.398s (using 'normal' object)
real    0m16.747s (using __dict__)
real    0m11.777s (using __slots__)

Using CPython 2.6.2, including the named tuple test:

$ python --version
Python 2.6.2
$ time python test_obj.py && time python test_dict.py && time python test_slots.py && time python test_namedtuple.py 

real    0m27.197s (using 'normal' object)
real    0m17.657s (using __dict__)
real    0m12.249s (using __slots__)
real    0m12.262s (using namedtuple)

So yes (not really a surprise), using __slots__ is a performance optimization. Using a named tuple has similar performance to __slots__.

0 讨论(0)

北恋

2020-11-29 16:17

Here is a copy of @hughdbrown answer for python 3.6.1, I've made the count 5x larger and added some code to test the memory footprint of the python process at the end of each run.

Before the downvoters have at it, Be advised that this method of counting the size of objects is not accurate.

from datetime import datetime
import os
import psutil

process = psutil.Process(os.getpid())


ITER_COUNT = 1000 * 1000 * 5

RESULT=None

def makeL(i):
    # Use this line to negate the effect of the strings on the test 
    # return "Python is smart and will only create one string with this line"

    # Use this if you want to see the difference with 5 million unique strings
    return "This is a sample string %s" % i

def timeit(method):
    def timed(*args, **kw):
        global RESULT
        s = datetime.now()
        RESULT = method(*args, **kw)
        e = datetime.now()

        sizeMb = process.memory_info().rss / 1024 / 1024
        sizeMbStr = "{0:,}".format(round(sizeMb, 2))

        print('Time Taken = %s, \t%s, \tSize = %s' % (e - s, method.__name__, sizeMbStr))

    return timed

class Obj(object):
    def __init__(self, i):
       self.i = i
       self.l = makeL(i)

class SlotObj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
       self.i = i
       self.l = makeL(i)

from collections import namedtuple
NT = namedtuple("NT", ["i", 'l'])

@timeit
def profile_dict_of_nt():
    return [NT(i=i, l=makeL(i)) for i in range(ITER_COUNT)]

@timeit
def profile_list_of_nt():
    return dict((i, NT(i=i, l=makeL(i))) for i in range(ITER_COUNT))

@timeit
def profile_dict_of_dict():
    return dict((i, {'i': i, 'l': makeL(i)}) for i in range(ITER_COUNT))

@timeit
def profile_list_of_dict():
    return [{'i': i, 'l': makeL(i)} for i in range(ITER_COUNT)]

@timeit
def profile_dict_of_obj():
    return dict((i, Obj(i)) for i in range(ITER_COUNT))

@timeit
def profile_list_of_obj():
    return [Obj(i) for i in range(ITER_COUNT)]

@timeit
def profile_dict_of_slot():
    return dict((i, SlotObj(i)) for i in range(ITER_COUNT))

@timeit
def profile_list_of_slot():
    return [SlotObj(i) for i in range(ITER_COUNT)]

profile_dict_of_nt()
profile_list_of_nt()
profile_dict_of_dict()
profile_list_of_dict()
profile_dict_of_obj()
profile_list_of_obj()
profile_dict_of_slot()
profile_list_of_slot()

And these are my results

Time Taken = 0:00:07.018720,    provile_dict_of_nt,     Size = 951.83
Time Taken = 0:00:07.716197,    provile_list_of_nt,     Size = 1,084.75
Time Taken = 0:00:03.237139,    profile_dict_of_dict,   Size = 1,926.29
Time Taken = 0:00:02.770469,    profile_list_of_dict,   Size = 1,778.58
Time Taken = 0:00:07.961045,    profile_dict_of_obj,    Size = 1,537.64
Time Taken = 0:00:05.899573,    profile_list_of_obj,    Size = 1,458.05
Time Taken = 0:00:06.567684,    profile_dict_of_slot,   Size = 1,035.65
Time Taken = 0:00:04.925101,    profile_list_of_slot,   Size = 887.49

My conclusion is:

Slots have the best memory footprint and are reasonable on speed.
dicts are the fastest, but use the most memory.

0 讨论(0)

暗喜

2020-11-29 16:21

Here are my test runs of the very nice script of @Jarrod-Chesney. For comparison, I also run it against python2 with "range" replaced by "xrange".

By curiosity, I also added similar tests with OrderedDict (ordict) for comparison.

Python 3.6.9:

Time Taken = 0:00:04.971369,    profile_dict_of_nt,     Size = 944.27
Time Taken = 0:00:05.743104,    profile_list_of_nt,     Size = 1,066.93
Time Taken = 0:00:02.524507,    profile_dict_of_dict,   Size = 1,920.35
Time Taken = 0:00:02.123801,    profile_list_of_dict,   Size = 1,760.9
Time Taken = 0:00:05.374294,    profile_dict_of_obj,    Size = 1,532.12
Time Taken = 0:00:04.517245,    profile_list_of_obj,    Size = 1,441.04
Time Taken = 0:00:04.590298,    profile_dict_of_slot,   Size = 1,030.09
Time Taken = 0:00:04.197425,    profile_list_of_slot,   Size = 870.67

Time Taken = 0:00:08.833653,    profile_ordict_of_ordict, Size = 3,045.52
Time Taken = 0:00:11.539006,    profile_list_of_ordict, Size = 2,722.34
Time Taken = 0:00:06.428105,    profile_ordict_of_obj,  Size = 1,799.29
Time Taken = 0:00:05.559248,    profile_ordict_of_slot, Size = 1,257.75

Python 2.7.15+:

Time Taken = 0:00:05.193900,    profile_dict_of_nt,     Size = 906.0
Time Taken = 0:00:05.860978,    profile_list_of_nt,     Size = 1,177.0
Time Taken = 0:00:02.370905,    profile_dict_of_dict,   Size = 2,228.0
Time Taken = 0:00:02.100117,    profile_list_of_dict,   Size = 2,036.0
Time Taken = 0:00:08.353666,    profile_dict_of_obj,    Size = 2,493.0
Time Taken = 0:00:07.441747,    profile_list_of_obj,    Size = 2,337.0
Time Taken = 0:00:06.118018,    profile_dict_of_slot,   Size = 1,117.0
Time Taken = 0:00:04.654888,    profile_list_of_slot,   Size = 964.0

Time Taken = 0:00:59.576874,    profile_ordict_of_ordict, Size = 7,427.0
Time Taken = 0:10:25.679784,    profile_list_of_ordict, Size = 11,305.0
Time Taken = 0:05:47.289230,    profile_ordict_of_obj,  Size = 11,477.0
Time Taken = 0:00:51.485756,    profile_ordict_of_slot, Size = 11,193.0

So, on both major versions, the conclusions of @Jarrod-Chesney are still looking good.

0 讨论(0)

耶瑟儿～

2020-11-29 16:24

Have you considered using a namedtuple? (link for python 2.4/2.5)

It's the new standard way of representing structured data that gives you the performance of a tuple and the convenience of a class.

It's only downside compared with dictionaries is that (like tuples) it doesn't give you the ability to change attributes after creation.

0 讨论(0)
发布评论:

提交评论
- 加载中...
小蘑菇

2020-11-29 16:35

There is no question.
You have data, with no other attributes (no methods, nothing). Hence you have a data container (in this case, a dictionary).

I usually prefer to think in terms of data modeling. If there is some huge performance issue, then I can give up something in the abstraction, but only with very good reasons.
Programming is all about managing complexity, and the maintaining the correct abstraction is very often one of the most useful way to achieve such result.

About the reasons an object is slower, I think your measurement is not correct.
You are performing too little assignments inside the for loop, and therefore what you see there is the different time necessary to instantiate a dict (intrinsic object) and a "custom" object. Although from the language perspective they are the same, they have quite a different implementation.
After that, the assignment time should be almost the same for both, as in the end members are maintained inside a dictionary.

0 讨论(0)
发布评论:

提交评论
- 加载中...

情书的邮戳

2020-11-29 16:37

from datetime import datetime

ITER_COUNT = 1000 * 1000

def timeit(method):
    def timed(*args, **kw):
        s = datetime.now()
        result = method(*args, **kw)
        e = datetime.now()

        print method.__name__, '(%r, %r)' % (args, kw), e - s
        return result
    return timed

class Obj(object):
    def __init__(self, i):
       self.i = i
       self.l = []

class SlotObj(object):
    __slots__ = ('i', 'l')
    def __init__(self, i):
       self.i = i
       self.l = []

@timeit
def profile_dict_of_dict():
    return dict((i, {'i': i, 'l': []}) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_dict():
    return [{'i': i, 'l': []} for i in xrange(ITER_COUNT)]

@timeit
def profile_dict_of_obj():
    return dict((i, Obj(i)) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_obj():
    return [Obj(i) for i in xrange(ITER_COUNT)]

@timeit
def profile_dict_of_slotobj():
    return dict((i, SlotObj(i)) for i in xrange(ITER_COUNT))

@timeit
def profile_list_of_slotobj():
    return [SlotObj(i) for i in xrange(ITER_COUNT)]

if __name__ == '__main__':
    profile_dict_of_dict()
    profile_list_of_dict()
    profile_dict_of_obj()
    profile_list_of_obj()
    profile_dict_of_slotobj()
    profile_list_of_slotobj()

Results:

hbrown@hbrown-lpt:~$ python ~/Dropbox/src/StackOverflow/1336791.py 
profile_dict_of_dict ((), {}) 0:00:08.228094
profile_list_of_dict ((), {}) 0:00:06.040870
profile_dict_of_obj ((), {}) 0:00:11.481681
profile_list_of_obj ((), {}) 0:00:10.893125
profile_dict_of_slotobj ((), {}) 0:00:06.381897
profile_list_of_slotobj ((), {}) 0:00:05.860749

0 讨论(0)

1 2 下一页