Question
I wrote the code below to test the cache feature of numba:
import numba
import numpy as np
import time

@numba.njit(cache=True)
def sum2d(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i, j]
    return result

a = np.random.random((1000, 100))
print(time.time())
sum2d(a)
print(time.time())
print(time.time())
sum2d(a)
print(time.time())
Although some cache files are generated in the __pycache__ folder, the timing is always the same, for example:
1576855294.8787484
1576855295.5378428
1576855295.5378428
1576855295.5388253
no matter how many times I run this script. This means the first call of sum2d always takes much longer because of compilation. What, then, is the use of the cache files in the __pycache__ folder?
Answer 1:
The following script illustrates the point of cache=True. It first calls a non-cached dummy function that absorbs the time it takes to initialize numba. It then calls the sum2d function without cache twice and the sum2d function with cache twice.
import numba
import numpy as np
import time

@numba.njit
def dummy():
    return None

@numba.njit
def sum2d_nocache(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i, j]
    return result

@numba.njit(cache=True)
def sum2d_cache(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i, j]
    return result

start = time.time()
dummy()
end = time.time()
print(f'Dummy timing {end - start}')

a = np.random.random((1000, 100))
start = time.time()
sum2d_nocache(a)
end = time.time()
print(f'No cache 1st timing {end - start}')

a = np.random.random((1000, 100))
start = time.time()
sum2d_nocache(a)
end = time.time()
print(f'No cache 2nd timing {end - start}')

a = np.random.random((1000, 100))
start = time.time()
sum2d_cache(a)
end = time.time()
print(f'Cache 1st timing {end - start}')

a = np.random.random((1000, 100))
start = time.time()
sum2d_cache(a)
end = time.time()
print(f'Cache 2nd timing {end - start}')
Output after 1st run:
Dummy timing 0.10361385345458984
No cache 1st timing 0.08893513679504395
No cache 2nd timing 0.00020122528076171875
Cache 1st timing 0.08929300308227539
Cache 2nd timing 0.00015544891357421875
Output after 2nd run:
Dummy timing 0.08973526954650879
No cache 1st timing 0.0809786319732666
No cache 2nd timing 0.0001163482666015625
Cache 1st timing 0.0016787052154541016
Cache 2nd timing 0.0001163482666015625
What does this output tell us?
- The time to initialize numba is not negligible.
- During the first run, the first call of both the cached and non-cached versions takes longer because of compilation time.
- In this example, writing the cache file doesn't make much of a difference to the first-call time.
- In the second run, the first call to the cached function is much faster (this is what cache=True is for).
- Subsequent calls to the cached and non-cached functions take approximately the same time.
The point of using cache=True is to avoid repeating the compile time of large and complex functions at each run of a script. In this example the function is simple and the time saving is limited, but for a script with a number of more complex functions, using cache can significantly reduce the run-time.
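To judge whether a given function is expensive enough to compile for caching to pay off, it helps to compare its first-call and second-call times without repeating the timing boilerplate. The helper below is a small sketch, not part of numba: time_call and python_sum2d are hypothetical names, and python_sum2d is a plain-Python stand-in so the snippet runs without numba. It uses time.perf_counter, which is better suited to short intervals than time.time.

```python
import time

def time_call(fn, *args):
    """Return (result, elapsed_seconds) for a single call of fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

def python_sum2d(rows):
    # Plain-Python stand-in (hypothetical); pass a jitted function such as
    # sum2d_cache instead to measure its compile or cache-load overhead.
    total = 0.0
    for row in rows:
        for value in row:
            total += value
    return total

data = [[1.0, 2.0], [3.0, 4.0]]
result, first = time_call(python_sum2d, data)
_, second = time_call(python_sum2d, data)
print(f"result={result}, first call {first:.6f}s, second call {second:.6f}s")
```

With a jitted function, the gap between the first and second call approximates the compilation (or cache-loading) cost that cache=True is meant to reduce on later runs.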
Source: https://stackoverflow.com/questions/59427775/numba-cache-true-has-no-effect