Reading a binary file with python

后端未结

关注

 6  1446

I find particularly difficult reading binary file with Python. Can you give me a hand? I need to read this file, which in Fortran 90 is easily read by

int*4


                      
              相关标签:


      
      
        
          6条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  暖寄归人        
                
              
                            
                2020-11-27 12:50
              
            
            
                                                                       
To read a binary file to a bytes object:

from pathlib import Path
data = Path('/path/to/file').read_bytes()  # Python 3.5+


To create an int from bytes 0-3 of the data:

i = int.from_bytes(data[:4], byteorder='little', signed=False)


To unpack multiple ints from the data:

import struct
ints = struct.unpack('iiii', data[:16])



pathlib
int.from_bytes()
struct

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  长发绾君心        
                
              
                            
                2020-11-27 12:55
              
            
            
                                                                       
Read the binary file content like this:

with open(fileName, mode='rb') as file: # b is important -> binary
    fileContent = file.read()


then "unpack" binary data using struct.unpack:

The start bytes: struct.unpack("iiiii", fileContent[:20])

The body: ignore the heading bytes and the trailing byte (= 24); The remaining part forms the body, to know the number of bytes in the body do an integer division by 4; The obtained quotient is multiplied by the string 'i' to create the correct format for the unpack method:

struct.unpack("i" * ((len(fileContent) -24) // 4), fileContent[20:-4])


The end byte: struct.unpack("i", fileContent[-4:])
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  北恋        
                
              
                            
                2020-11-27 13:02
              
            
            
                                                                       
In general, I would recommend that you look into using Python's struct module for this. It's standard with Python, and it should be easy to translate your question's specification into a formatting string suitable for struct.unpack().

Do note that if there's "invisible" padding between/around the fields, you will need to figure that out and include it in the unpack() call, or you will read the wrong bits.

Reading the contents of the file in order to have something to unpack is pretty trivial:

import struct

data = open("from_fortran.bin", "rb").read()

(eight, N) = struct.unpack("@II", data)


This unpacks the first two fields, assuming they start at the very beginning of the file (no padding or extraneous data), and also assuming native byte-order (the @ symbol). The Is in the formatting string mean "unsigned integer, 32 bits".
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  醉话见心        
                
              
                            
                2020-11-27 13:02
              
            
            
                                                                       
import pickle
f=open("filename.dat","rb")
try:
    while True:
        x=pickle.load(f)
        print x
except EOFError:
    pass
f.close()

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  小鲜肉        
                
              
                            
                2020-11-27 13:06
              
            
            
                                                                       
I too found Python lacking when it comes to reading and writing binary files, so I wrote a small module (for Python 3.6+).
With binaryfile you'd do something like this (I'm guessing, since I don't know Fortran):
import binaryfile

def particle_file(f):
    f.array('group_ids')  # Declare group_ids to be an array (so we can use it in a loop)
    f.skip(4)  # Bytes 1-4
    num_particles = f.count('num_particles', 'group_ids', 4)  # Bytes 5-8
    f.int('num_groups', 4)  # Bytes 9-12
    f.skip(8)  # Bytes 13-20
    for i in range(num_particles):
        f.struct('group_ids', '>f')  # 4 bytes x num_particles
    f.skip(4)

with open('myfile.bin', 'rb') as fh:
    result = binaryfile.read(fh, particle_file)
print(result)

Which produces an output like this:
{
    'group_ids': [(1.0,), (0.0,), (2.0,), (0.0,), (1.0,)],
    '__skipped': [b'\x00\x00\x00\x08', b'\x00\x00\x00\x08\x00\x00\x00\x14', b'\x00\x00\x00\x14'],
    'num_particles': 5,
    'num_groups': 3
}

I used skip() to skip the additional data Fortran adds, but you may want to add a utility to handle Fortran records properly instead. If you do, a pull request would be welcome.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  渐次进展        
                
              
                            
                2020-11-27 13:17
              
            
            
                                                                       
You could use numpy.fromfile, which can read data from both text and binary files. You would first construct a data type, which represents your file format, using numpy.dtype, and then read this type from file using numpy.fromfile.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复