Question
I have a list containing three inner lists, and I need to perform some operations on them. Since I have about 180,000 of these lists, the deepcopy step alone takes 2.5 s to copy them all once. Overall, copying plus the operations accounts for 80 s of the 220 s total computation time.
import copy

s = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
s1 = copy.deepcopy(s)
s1[0][0] = s[1][1]
s1[1][1] = s[0][0]
The operation shown needs to be repeated about a million times, so deepcopy has become a performance bottleneck.
Is there a more performant way, or a different approach, to "unreference" the list s?
Answer 1:
deepcopy seems to have some overhead for checking all the different cases it is able to handle. If your lists are always lists of lists (one level of nesting), you can try s = [list(x) for x in s] or s = list(map(list, s)) instead. Both seem to be quite a bit faster:
In [9]: s = [[random.randint(1, 1000) for _ in range(100)] for _ in range(100)]
In [10]: %timeit copy.deepcopy(s)
10 loops, best of 3: 22.7 ms per loop
In [11]: %timeit [list(x) for x in s]
10000 loops, best of 3: 123 µs per loop
In [18]: %timeit list(map(list, s))
10000 loops, best of 3: 111 µs per loop
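As a small illustration of the one-level copy suggested above, the sketch below applies the two assignments from the question to such a copy. The helper name copy_nested is made up for this example, and the approach assumes the inner lists hold only immutable values such as ints; anything nested deeper would still be shared with the original.

```python
import copy


def copy_nested(rows):
    """Copy a list of lists one level deep.

    Hypothetical helper: assumes the inner items are immutable (ints here).
    """
    return [list(row) for row in rows]


s = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
s1 = copy_nested(s)

# For this shape of data the result equals what deepcopy would produce.
assert s1 == copy.deepcopy(s)

# The operations from the question, applied to the copy only.
s1[0][0] = s[1][1]
s1[1][1] = s[0][0]

print(s)   # the original is left untouched
print(s1)  # only the copy carries the changes
```

Because each inner list is a fresh object, writing to s1 does not leak back into s.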
Alternatively, depending on your application, it might be better not to copy and store the (modified) lists themselves, but just the modification, either in the form of the modified cells, or as a command stack.
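One possible reading of "store just the modification" is an overlay that records changed cells on top of a shared, unmodified base list. The class below is only a sketch under that assumption; the name CellOverlay and its methods are invented for illustration and are not part of any library.

```python
class CellOverlay:
    """Read-through view of a nested list that records changed cells only.

    Hypothetical illustration: the base list is shared and never mutated.
    """

    def __init__(self, base):
        self.base = base
        self.changes = {}  # maps (row, col) to the new value

    def get(self, row, col):
        # Prefer a recorded change, fall back to the shared base.
        return self.changes.get((row, col), self.base[row][col])

    def set(self, row, col, value):
        self.changes[(row, col)] = value

    def materialize(self):
        """Build a full independent copy only when one is really needed."""
        rows = [list(r) for r in self.base]
        for (row, col), value in self.changes.items():
            rows[row][col] = value
        return rows


s = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
view = CellOverlay(s)
view.set(0, 0, s[1][1])
view.set(1, 1, s[0][0])

print(view.get(0, 0), view.get(1, 1))
print(view.materialize())
```

Whether this pays off depends on the application: it avoids copying up front, but every read goes through a dictionary lookup.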
Answer 2:
Okay, so copy.deepcopy is implemented in Python. I wrote a small test script for your question:
# deepcopy.py
# Note: hotshot exists only in Python 2; line_profiler is a third-party package.
import copy
import hotshot, hotshot.stats
import line_profiler


def prof():
    # Build 180000 items, each a list of three small lists.
    data = []
    for _ in range(180000):
        data.append([
            [1,2,3],
            [1,2,3],
            [1,2,3],
        ])
    # Profile a single deepcopy of the whole structure with hotshot.
    profiler = hotshot.Profile("deepcopy.py.prof")
    profiler.start()
    copy.deepcopy(data)
    profiler.stop()
    profiler.close()
    stats = hotshot.stats.load('deepcopy.py.prof')
    stats.strip_dirs()
    stats.sort_stats('time', 'calls')
    stats.print_stats(20)


def lineprof():
    data = []
    for _ in range(180000):
        data.append([
            [1,2,3],
            [1,2,3],
            [1,2,3],
        ])
    # Collect per-line timings inside copy.deepcopy itself.
    profiler = line_profiler.LineProfiler()
    profiler.add_function(copy.deepcopy)
    profiler.enable()
    copy.deepcopy(data)
    profiler.disable()
    profiler.print_stats()


prof()
lineprof()
First I used hotshot to get a sense of what was going on. There doesn't seem to be much happening beyond deepcopy itself, so I added line_profiler. Here is the result:
         3780009 function calls (720009 primitive calls) in 3.156 seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 720001/1    1.483    0.000    3.156    3.156 copy.py:226(_deepcopy_list)
2340001/1    1.425    0.000    3.156    3.156 copy.py:145(deepcopy)
   720004    0.248    0.000    0.248    0.000 copy.py:267(_keep_alive)
        3    0.000    0.000    0.000    0.000 copy.py:198(_deepcopy_atomic)
        0    0.000             0.000          profile:0(profiler)

Timer unit: 1e-06 s

Total time: 13.6727 s
File: /usr/lib64/python2.7/copy.py
Function: deepcopy at line 145

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   145                                           def deepcopy(x, memo=None, _nil=[]):
   146                                               """Deep copy operation on arbitrary Python objects.
   147
   148                                               See the module's __doc__ string for more info.
   149                                               """
   150
   151   2340001      1614687      0.7     11.8      if memo is None:
   152         1            1      1.0      0.0          memo = {}
   153
   154   2340001      1725759      0.7     12.6      d = id(x)
   155   2340001      2123792      0.9     15.5      y = memo.get(d, _nil)
   156   2340001      1550940      0.7     11.3      if y is not _nil:
   157   1619997      1047121      0.6      7.7          return y
   158
   159    720004       587838      0.8      4.3      cls = type(x)
   160
   161    720004       577271      0.8      4.2      copier = _deepcopy_dispatch.get(cls)
   162    720004       476286      0.7      3.5      if copier:
   163    720004      1813706      2.5     13.3          y = copier(x, memo)
   164                                               else:
   165                                                   try:
   166                                                       issc = issubclass(cls, type)
   167                                                   except TypeError: # cls is not a class (old Boost; see SF #502085)
   168                                                       issc = 0
   169                                                   if issc:
   170                                                       y = _deepcopy_atomic(x, memo)
   171                                                   else:
   172                                                       copier = getattr(x, "__deepcopy__", None)
   173                                                       if copier:
   174                                                           y = copier(memo)
   175                                                       else:
   176                                                           reductor = dispatch_table.get(cls)
   177                                                           if reductor:
   178                                                               rv = reductor(x)
   179                                                           else:
   180                                                               reductor = getattr(x, "__reduce_ex__", None)
   181                                                               if reductor:
   182                                                                   rv = reductor(2)
   183                                                               else:
   184                                                                   reductor = getattr(x, "__reduce__", None)
   185                                                                   if reductor:
   186                                                                       rv = reductor()
   187                                                                   else:
   188                                                                       raise Error(
   189                                                                           "un(deep)copyable object of type %s" % cls)
   190                                                           y = _reconstruct(x, rv, 1, memo)
   191
   192    720004       579181      0.8      4.2      memo[d] = y
   193    720004      1115258      1.5      8.2      _keep_alive(x, memo) # Make sure x lives at least as long as d
   194    720004       460834      0.6      3.4      return y

So the problem seems to be that the memoization is not helping: deepcopy spends far more time on bookkeeping than on copying. My suggestion is to exploit the fact that the structure of the lists is well defined and do the copy yourself:
import hotshot, hotshot.stats
import line_profiler


def manualcopy(original_list):
    copy = []
    for item in original_list:
        copy.append(
            [
                list(item[0]),
                list(item[1]),
                list(item[2]),
            ]
        )
    return copy


def prof():
    # Same test data as before: 180000 items of three small lists each.
    data = []
    for _ in range(180000):
        data.append([
            [1,2,3],
            [1,2,3],
            [1,2,3],
        ])
    profiler = hotshot.Profile("deepcopy.py.prof")
    profiler.start()
    manualcopy(data)
    profiler.stop()
    profiler.close()
    stats = hotshot.stats.load('deepcopy.py.prof')
    stats.strip_dirs()
    stats.sort_stats('time', 'calls')
    stats.print_stats(20)


def lineprof():
    data = []
    for _ in range(180000):
        data.append([
            [1,2,3],
            [1,2,3],
            [1,2,3],
        ])
    # Collect per-line timings inside manualcopy.
    profiler = line_profiler.LineProfiler()
    profiler.add_function(manualcopy)
    profiler.enable()
    manualcopy(data)
    profiler.disable()
    profiler.print_stats()


prof()
lineprof()
That reduces the copy time from 3.156 seconds to 0.446 seconds:
         1 function calls in 0.446 seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.446    0.446    0.446    0.446 manualcopy.py:5(manualcopy)
        0    0.000             0.000          profile:0(profiler)

Timer unit: 1e-06 s

Total time: 0.762817 s
File: manualcopy.py
Function: manualcopy at line 5

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5                                           def manualcopy(original_list):
     6         1            3      3.0      0.0      copy = []
     7    180001        62622      0.3      8.2      for item in original_list:
     8    180000        66805      0.4      8.8          copy.append(
     9                                                       [
    10    180000       106683      0.6     14.0                  list(item[0]),
    11    180000       137308      0.8     18.0                  list(item[1]),
    12    180000       389396      2.2     51.0                  list(item[2]),
    13                                                       ]
    14                                                   )
    15         1            0      0.0      0.0      return copy
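The scripts above rely on hotshot, which is only available in Python 2. For a rough comparison on Python 3, a sketch using the standard timeit module might look like the following. The function manualcopy_compact is a made-up, comprehension-based variant of manualcopy, and the data set is deliberately smaller than the 180000 items used above so that it runs quickly; absolute timings will differ from the numbers shown in this answer.

```python
import copy
import timeit


def manualcopy_compact(original_list):
    """Hypothetical variant of manualcopy: copies exactly two nested levels."""
    return [[list(inner) for inner in item] for item in original_list]


# Smaller than the 180000 items used above, to keep the run short.
data = [[[1, 2, 3], [1, 2, 3], [1, 2, 3]] for _ in range(10000)]

# Both approaches should produce equal, independent structures.
assert manualcopy_compact(data) == copy.deepcopy(data)

# Take the best of three single runs for each approach.
t_deep = min(timeit.repeat(lambda: copy.deepcopy(data), number=1, repeat=3))
t_manual = min(timeit.repeat(lambda: manualcopy_compact(data), number=1, repeat=3))

print("deepcopy:           %.4f s" % t_deep)
print("manualcopy_compact: %.4f s" % t_manual)
```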
Source: https://stackoverflow.com/questions/35008473/avoid-deepcopy-due-to-performance