Recording data in a long running python simulation

前端未结

关注

 2  413

I am running a simulation from which I need to record some small numpy arrays every cycle. My current solution is to load, write then save as follows:

existing_d


                      
              相关标签:


      
      
        
          2条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  孤城傲影        
                
              
                            
                2021-01-29 05:19
              
            
            
                                                                       
I think one solution is using a memory mapped file through numpy.memmap. The code can be found below. The documentation contains important information to understand the code.

import numpy as np
from os.path import getsize

from time import time

filename = "data.bin"

# Datatype used for memmap
dtype = np.int32

# Create memmap for the first time (w+). Arbitrary shape. Probably good to try and guess the correct size.
mm = np.memmap(filename, dtype=dtype, mode='w+', shape=(1, ))
print("File has {} bytes".format(getsize(filename)))


N = 20
num_data_per_loop = 10**7

# Main loop to append data
for i in range(N):

    # will extend the file because mode='r+'
    starttime = time()
    mm = np.memmap(filename,
                   dtype=dtype,
                   mode='r+',
                   offset=np.dtype(dtype).itemsize*num_data_per_loop*i,
                   shape=(num_data_per_loop, ))
    mm[:] = np.arange(start=num_data_per_loop*i, stop=num_data_per_loop*(i+1))
    mm.flush()
    endtime = time()
    print("{:3d}/{:3d} ({:6.4f} sec): File has {} bytes".format(i, N, endtime-starttime, getsize(filename)))

A = np.array(np.memmap(filename, dtype=dtype, mode='r'))
if np.array_equal(A, np.arange(num_data_per_loop*N, dtype=dtype)):
    print("Correct")


The output I get is:

File has 4 bytes
  0/ 20 (0.2167 sec): File has 40000000 bytes
  1/ 20 (0.2200 sec): File has 80000000 bytes
  2/ 20 (0.2131 sec): File has 120000000 bytes
  3/ 20 (0.2180 sec): File has 160000000 bytes
  4/ 20 (0.2215 sec): File has 200000000 bytes
  5/ 20 (0.2141 sec): File has 240000000 bytes
  6/ 20 (0.2187 sec): File has 280000000 bytes
  7/ 20 (0.2138 sec): File has 320000000 bytes
  8/ 20 (0.2137 sec): File has 360000000 bytes
  9/ 20 (0.2227 sec): File has 400000000 bytes
 10/ 20 (0.2168 sec): File has 440000000 bytes
 11/ 20 (0.2141 sec): File has 480000000 bytes
 12/ 20 (0.2150 sec): File has 520000000 bytes
 13/ 20 (0.2144 sec): File has 560000000 bytes
 14/ 20 (0.2190 sec): File has 600000000 bytes
 15/ 20 (0.2186 sec): File has 640000000 bytes
 16/ 20 (0.2210 sec): File has 680000000 bytes
 17/ 20 (0.2146 sec): File has 720000000 bytes
 18/ 20 (0.2178 sec): File has 760000000 bytes
 19/ 20 (0.2182 sec): File has 800000000 bytes
Correct


The time is approximately constant over the iterations because of the offsets used for memmap. Also the amount of RAM needed (apart from loading the whole memmap for the check at the end) is constant.

I hope this solves your performance issues

kind regards

Lukas

Edit 1: It seems the poster has solved his own question. I leave this answer up as an alternative.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  说谎        
                
              
                            
                2021-01-29 05:31
              
            
            
                                                                       
I have found a good working solution using the h5py library. Performance is far better as there is no reading data and I have cut down on the number of nump array append operations. A short example:

with h5py.File("logfile_name", "a") as f:
  ds = f.create_dataset("weights", shape=(3,2,100000), maxshape=(3, 2, None))
  ds[:,:,cycle_num] = weight_matrix


I am not sure if the numpy style slicing means the matrix gets copied but there is a write_direct(source, source_sel=None, dest_sel=None) function to avoid this happening which could be useful for larger matrices. 
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复