Avoid deepcopy due to performance

Submitted by ε祈祈猫儿з on 2020-05-29 05:09:30

Question


I have a list containing three inner lists, and I need to perform some operations on them. I have about 180,000 of these lists, and deep-copying them once already takes 2.5 s. Including the operations, this step takes 80 s out of 220 s of total computation time.

import copy

s = [[1,2,3],
     [4,5,6],
     [7,8,9]]
s1 = copy.deepcopy(s)
s1[0][0] = s[1][1]
s1[1][1] = s[0][0]

The operation shown has to be repeated about a million times, so deepcopy becomes a performance bottleneck.

Is there a faster way, or a different approach, to "unreference" the list s?


Answer 1:


deepcopy seems to carry some overhead from checking all the different cases it can handle. If your lists are always lists of lists (one level of nesting), you can use s = [list(x) for x in s] or s = list(map(list, s)) instead. Both appear to be considerably faster:

In [9]: s = [[random.randint(1, 1000) for _ in range(100)] for _ in range(100)]
In [10]: %timeit copy.deepcopy(s)
10 loops, best of 3: 22.7 ms per loop
In [11]: %timeit [list(x) for x in s]
10000 loops, best of 3: 123 µs per loop
In [18]: %timeit list(map(list, s))
10000 loops, best of 3: 111 µs per loop
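As a minimal sketch, here is the one-level copy applied to the example from the question. The helper name copy_nested is made up for illustration, and the approach assumes the inner lists hold only immutable values such as ints:

```python
def copy_nested(rows):
    """Copy a list of lists one level deep.

    The inner lists are new objects, but their items are shared,
    which is safe as long as the items are immutable (ints here).
    """
    return [list(row) for row in rows]


s = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

s1 = copy_nested(s)
s1[0][0] = s[1][1]
s1[1][1] = s[0][0]

print(s)   # the original is unchanged
print(s1)  # only the copy carries the swapped values
```

If the inner lists could themselves contain mutable objects, this copy would still share those objects with the original, and deepcopy (or a deeper manual copy) would be needed.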

Alternatively, depending on your application, it may be better not to copy and store the (modified) lists at all, but to store only the modifications, either as the modified cells or as a command stack.
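One possible way to store only the modified cells is sketched below. The names read_cell and materialize and the (row, col) dictionary layout are assumptions made for illustration; they are not part of the original answer:

```python
def read_cell(base, changes, row, col):
    """Look up a cell, preferring a recorded modification over the base value."""
    return changes.get((row, col), base[row][col])


def materialize(base, changes):
    """Build a real modified copy only when one is actually needed."""
    result = [list(r) for r in base]
    for (row, col), value in changes.items():
        result[row][col] = value
    return result


s = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# Record the two assignments from the question instead of copying s.
changes = {(0, 0): s[1][1], (1, 1): s[0][0]}

print(read_cell(s, changes, 0, 0))  # modified cell
print(read_cell(s, changes, 2, 2))  # untouched cell, read from s
print(materialize(s, changes))
```

Whether this pays off depends on how often a full copy is needed afterwards. If every modified list is read in full, the plain one-level copy is probably simpler.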




Answer 2:


copy.deepcopy is implemented in pure Python, so I wrote a small test script for your question:

# deepcopy.py (Python 2.7: hotshot was removed in Python 3; line_profiler is a third-party package)
import copy
import hotshot, hotshot.stats
import line_profiler


def prof():
    all = []
    for _ in range(180000):
        all.append([
            [1,2,3],
            [1,2,3],
            [1,2,3],
        ])

    prof = hotshot.Profile("deepcopy.py.prof")
    prof.start()
    copy.deepcopy(all)
    prof.stop()
    prof.close()
    stats = hotshot.stats.load('deepcopy.py.prof')
    stats.strip_dirs()
    stats.sort_stats('time', 'calls')
    stats.print_stats(20)

def lineprof():
    all = []
    for _ in range(180000):
        all.append([
            [1,2,3],
            [1,2,3],
            [1,2,3],
        ])

    prof = line_profiler.LineProfiler()
    prof.add_function(copy.deepcopy)
    prof.enable()
    copy.deepcopy(all)
    prof.disable()
    prof.print_stats()

prof()
lineprof()

First I used hotshot to get a sense of what was going on. There doesn't seem to be much beyond deepcopy itself, so I added line_profiler. Here are the results:

  3780009 function calls (720009 primitive calls) in 3.156 seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 720001/1    1.483    0.000    3.156    3.156 copy.py:226(_deepcopy_list)
2340001/1    1.425    0.000    3.156    3.156 copy.py:145(deepcopy)
   720004    0.248    0.000    0.248    0.000 copy.py:267(_keep_alive)
        3    0.000    0.000    0.000    0.000 copy.py:198(_deepcopy_atomic)
        0    0.000             0.000          profile:0(profiler)


Timer unit: 1e-06 s

Total time: 13.6727 s
File: /usr/lib64/python2.7/copy.py
Function: deepcopy at line 145

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   145                                           def deepcopy(x, memo=None, _nil=[]):
   146                                               """Deep copy operation on arbitrary Python objects.
   147                                           
   148                                               See the module's __doc__ string for more info.
   149                                               """
   150                                           
   151   2340001      1614687      0.7     11.8      if memo is None:
   152         1            1      1.0      0.0          memo = {}
   153                                           
   154   2340001      1725759      0.7     12.6      d = id(x)
   155   2340001      2123792      0.9     15.5      y = memo.get(d, _nil)
   156   2340001      1550940      0.7     11.3      if y is not _nil:
   157   1619997      1047121      0.6      7.7          return y
   158                                           
   159    720004       587838      0.8      4.3      cls = type(x)
   160                                           
   161    720004       577271      0.8      4.2      copier = _deepcopy_dispatch.get(cls)
   162    720004       476286      0.7      3.5      if copier:
   163    720004      1813706      2.5     13.3          y = copier(x, memo)
   164                                               else:
   165                                                   try:
   166                                                       issc = issubclass(cls, type)
   167                                                   except TypeError: # cls is not a class (old Boost; see SF #502085)
   168                                                       issc = 0
   169                                                   if issc:
   170                                                       y = _deepcopy_atomic(x, memo)
   171                                                   else:
   172                                                       copier = getattr(x, "__deepcopy__", None)
   173                                                       if copier:
   174                                                           y = copier(memo)
   175                                                       else:
   176                                                           reductor = dispatch_table.get(cls)
   177                                                           if reductor:
   178                                                               rv = reductor(x)
   179                                                           else:
   180                                                               reductor = getattr(x, "__reduce_ex__", None)
   181                                                               if reductor:
   182                                                                   rv = reductor(2)
   183                                                               else:
   184                                                                   reductor = getattr(x, "__reduce__", None)
   185                                                                   if reductor:
   186                                                                       rv = reductor()
   187                                                                   else:
   188                                                                       raise Error(
   189                                                                           "un(deep)copyable object of type %s" % cls)
   190                                                           y = _reconstruct(x, rv, 1, memo)
   191                                           
   192    720004       579181      0.8      4.2      memo[d] = y
   193    720004      1115258      1.5      8.2      _keep_alive(x, memo) # Make sure x lives at least as long as d
   194    720004       460834      0.6      3.4      return y
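For context, the memo dictionary consulted on lines 154 to 157 of the listing is what lets deepcopy preserve shared references (and cope with cycles). A small sketch of the behaviour that a hand-written copy gives up, using made-up variable names:

```python
import copy

shared = [1, 2, 3]
s = [shared, shared]           # the same inner list appears twice

deep = copy.deepcopy(s)
manual = [list(x) for x in s]

# deepcopy copies `shared` once and reuses that copy, so the aliasing survives.
print(deep[0] is deep[1])      # True
# The comprehension copies each slot separately, so the aliasing is lost.
print(manual[0] is manual[1])  # False
```

For the data in the question every inner list is a distinct object, so this bookkeeping buys nothing there.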

So the problem seems to be that the memoization is not helping, and far more time is spent on bookkeeping than on copying. My suggestion is to exploit the fact that the lists have a well-defined structure and do the copy yourself:

import hotshot, hotshot.stats
import line_profiler


def manualcopy(original_list):
    copy = []
    for item in original_list:
        copy.append(
            [
                list(item[0]),
                list(item[1]),
                list(item[2]),
            ]
        )
    return copy


def prof():
    all = []
    for _ in range(180000):
        all.append([
            [1,2,3],
            [1,2,3],
            [1,2,3],
        ])

    prof = hotshot.Profile("deepcopy.py.prof")
    prof.start()
    manualcopy(all)
    prof.stop()
    prof.close()
    stats = hotshot.stats.load('deepcopy.py.prof')
    stats.strip_dirs()
    stats.sort_stats('time', 'calls')
    stats.print_stats(20)

def lineprof():
    all = []
    for _ in range(180000):
        all.append([
            [1,2,3],
            [1,2,3],
            [1,2,3],
        ])

    prof = line_profiler.LineProfiler()
    prof.add_function(manualcopy)
    prof.enable()
    manualcopy(all)
    prof.disable()
    prof.print_stats()

prof()
lineprof()

That reduces the time from 3.156 seconds to 0.446 seconds:

         1 function calls in 0.446 seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.446    0.446    0.446    0.446 manualcopy.py:5(manualcopy)
        0    0.000             0.000          profile:0(profiler)


Timer unit: 1e-06 s

Total time: 0.762817 s
File: manualcopy.py
Function: manualcopy at line 5

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5                                           def manualcopy(original_list):
     6         1            3      3.0      0.0      copy = []
     7    180001        62622      0.3      8.2      for item in original_list:
     8    180000        66805      0.4      8.8          copy.append(
     9                                                       [
    10    180000       106683      0.6     14.0                  list(item[0]),
    11    180000       137308      0.8     18.0                  list(item[1]),
    12    180000       389396      2.2     51.0                  list(item[2]),
    13                                                       ]
    14                                                   )
    15         1            0      0.0      0.0      return copy
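Since hotshot is not available on Python 3, the comparison can be approximated there with the standard timeit module. This is only a rough sketch: build is a hypothetical helper, manualcopy is rewritten as a comprehension that accepts any number of inner lists, and the data size is reduced so that it runs quickly. Absolute timings will differ from the Python 2.7 figures above.

```python
import copy
import timeit


def manualcopy(original_list):
    """Copy a list of lists of lists without deepcopy's bookkeeping."""
    return [[list(inner) for inner in item] for item in original_list]


def build(n):
    """Create n items shaped like the ones in the test script above."""
    return [[[1, 2, 3], [1, 2, 3], [1, 2, 3]] for _ in range(n)]


data = build(1000)  # the original script used 180000 items

t_deep = timeit.timeit(lambda: copy.deepcopy(data), number=5)
t_manual = timeit.timeit(lambda: manualcopy(data), number=5)
print("deepcopy:   %.4f s" % t_deep)
print("manualcopy: %.4f s" % t_manual)
```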


Source: https://stackoverflow.com/questions/35008473/avoid-deepcopy-due-to-performance
