I have a dictionary of file header values (time, number of frames, year, month, etc) that I would like to write into a numpy array. The code I have currently is as follows:
The problem seems to be that v is an int rather than a tuple. Try:
arr=np.array([(k,v) for k,v in fileheader.iteritems()],dtype=["a3,a,i4,i4,i4,i4,f8,i4,i4,i4,i4,i4,i4,a10,a26,a33,a235,i4,i4,i4,i4,i4,i4"])
You're probably better off just keeping the header data in a dict. Do you really need it as an array? (If so, why? There are some advantages to having the header in a numpy array, but it's more complex than a simple dict, and isn't as flexible.)
One drawback to a dict is that there's no predictable order to its keys (before Python 3.7; newer versions preserve insertion order). If you need to write your header back to disk in a regular order (similar to a C struct), then you need to store the order of the fields separately, as well as their values. If that's the case, you might consider an ordered dict (collections.OrderedDict) or just putting together a simple class to hold your header data and storing the order there.
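As a minimal sketch of the ordered-dict option (the field names and values here are made up for illustration, not taken from your file format):

```python
from collections import OrderedDict

# Hypothetical header fields and values, for illustration only.
header = OrderedDict()
header['year'] = 2011
header['month'] = 6
header['num_frames'] = 120

# Iteration follows insertion order, so the fields always come out
# in the order they were added.
field_order = list(header.keys())
values = tuple(header.values())
```

The values tuple is in a fixed order, so it can be handed to something that expects fields in a specific sequence.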
Unless there's a good reason to put it into a numpy array, you may not want to.
However, a structured array will preserve the order of your header and will make it easier to write a binary representation of it to disk, but it's inflexible in other ways.
If you did want to make the header an array, you'd do something like this:
import numpy as np
# Lists can be modified, but preserve order. That's important in this case.
names = ['Name1', 'Name2', 'Name3']
# It's "S3" instead of "a3" for a string field in numpy, by the way
formats = ['S3', 'i4', 'f8']
# It's often cleaner to specify the dtype this way instead of as a giant string
dtype = dict(names=names, formats=formats)
# Before Python 3.7, a dict won't preserve the order we're specifying things in!!
# If we iterate through it, things may be in any order.
header = dict(Name1='abc', Name2=456, Name3=3.45)
# Therefore, we'll be sure to pass things in in order...
# Also, np.array will expect a tuple instead of a list for a structured array...
values = tuple(header[name] for name in names)
header_array = np.array(values, dtype=dtype)
# We can access field in the array like this...
print(header_array['Name2'])
# And dump it to disk (similar to a C struct) with
header_array.tofile('test.dat')
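To check that the file really does look like a packed C struct, you can read it back with np.fromfile. This is only a sketch using the same made-up fields as above. It writes to a temporary directory rather than 'test.dat', and it passes the string as bytes to be safe under Python 3:

```python
import os
import tempfile

import numpy as np

# Same hypothetical fields as in the example above.
dtype = np.dtype({'names': ['Name1', 'Name2', 'Name3'],
                  'formats': ['S3', 'i4', 'f8']})
header_array = np.array((b'abc', 456, 3.45), dtype=dtype)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'test.dat')
    header_array.tofile(path)

    # The fields are packed with no padding: 3 + 4 + 8 = 15 bytes.
    file_size = os.path.getsize(path)

    # count=1 reads exactly one record; [0] pulls it out of the
    # length-1 array that np.fromfile returns.
    restored = np.fromfile(path, dtype=dtype, count=1)[0]
```

Note that tofile writes raw bytes only. The dtype isn't stored in the file, so whoever reads it back has to know the layout.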
On the other hand, if you just want access to the values in the header, just keep it as a dict. It's simpler that way.
Based on what it sounds like you're doing, I'd do something like this. I'm using numpy arrays to read in the header, but the header values are actually being stored as class attributes (as well as the header array).
This looks more complicated than it actually is.
I'm just defining two new classes, one for the parent file and one for a frame. You could do the same thing with a bit less code, but this gives you a foundation for more complex things.
import numpy as np

class SonarFile(object):
    # These define the format of the file header
    header_fields = ('num_frames', 'name1', 'name2', 'name3')
    header_formats = ('i4', 'f4', 'S10', '>i4')

    def __init__(self, filename):
        # Open in binary mode, since we're reading raw bytes
        self.infile = open(filename, 'rb')
        dtype = dict(names=list(self.header_fields),
                     formats=list(self.header_formats))

        # Read in the header as a numpy array (count=1 is important here!)
        self.header = np.fromfile(self.infile, dtype=dtype, count=1)

        # Store the position so we can "rewind" to the end of the header
        self.header_length = self.infile.tell()

        # You may or may not want to do this (If the field names can have
        # spaces, it's a bad idea). It will allow you to access things with
        # sonar_file.name1 instead of sonar_file.header['name1'], though.
        # The [0] pulls the scalar out of the length-1 header array.
        for field in self.header_fields:
            setattr(self, field, self.header[field][0])

    # __iter__ is a special function that defines what should happen when we
    # try to iterate through an instance of this class.
    def __iter__(self):
        """Iterate through each frame in the dataset."""
        # Rewind to the end of the file header
        self.infile.seek(self.header_length)

        # Iterate through frames...
        for _ in range(self.num_frames):
            yield Frame(self.infile)

    def close(self):
        self.infile.close()

class Frame(object):
    header_fields = ('width', 'height', 'name')
    header_formats = ('i4', 'i4', 'S20')
    data_format = 'f4'

    def __init__(self, infile):
        dtype = dict(names=list(self.header_fields),
                     formats=list(self.header_formats))
        self.header = np.fromfile(infile, dtype=dtype, count=1)

        # See discussion above...
        # The [0] pulls the scalar out of the length-1 header array.
        for field in self.header_fields:
            setattr(self, field, self.header[field][0])

        # I'm assuming that the size of the frame is in the frame header...
        ncols, nrows = self.width, self.height

        # Read the data in
        self.data = np.fromfile(infile, self.data_format, count=ncols * nrows)

        # And reshape it into a 2d array.
        # I'm assuming C-order, instead of Fortran order.
        # If it's fortran order, just do "data.reshape((ncols, nrows)).T"
        self.data = self.data.reshape((nrows, ncols))
You'd use it similar to this:
dataset = SonarFile('input.dat')

for frame in dataset:
    im = frame.data
    # Do something...
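If you don't have a real file handy, you could test this kind of reader against a synthetic one. The sketch below is only an illustration: write_test_file is a hypothetical helper, the field values are invented, and it assumes the last file header field is a big-endian 4-byte integer ('>i4'). It writes a file header followed by (frame header, frame data) pairs, then reads the first frame back with plain np.fromfile calls:

```python
import os
import tempfile

import numpy as np

# Assumed layouts, mirroring the hypothetical fields used above.
file_header_dtype = np.dtype({'names': ['num_frames', 'name1', 'name2', 'name3'],
                              'formats': ['i4', 'f4', 'S10', '>i4']})
frame_header_dtype = np.dtype({'names': ['width', 'height', 'name'],
                               'formats': ['i4', 'i4', 'S20']})

def write_test_file(path, frames):
    """Write a file header, then a (frame header, frame data) pair per frame."""
    with open(path, 'wb') as outfile:
        header = np.array((len(frames), 1.5, b'sonar', 7),
                          dtype=file_header_dtype)
        header.tofile(outfile)
        for i, data in enumerate(frames):
            nrows, ncols = data.shape
            name = 'frame{}'.format(i).encode('ascii')
            frame_header = np.array((ncols, nrows, name),
                                    dtype=frame_header_dtype)
            frame_header.tofile(outfile)
            # Frame data is stored as C-ordered 4-byte floats.
            data.astype('f4').tofile(outfile)

frames = [np.arange(6).reshape(2, 3), np.ones((4, 5))]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'input.dat')
    write_test_file(path, frames)
    file_size = os.path.getsize(path)

    # Read the file header and the first frame back in sequence.
    with open(path, 'rb') as infile:
        file_header = np.fromfile(infile, dtype=file_header_dtype, count=1)[0]
        frame_header = np.fromfile(infile, dtype=frame_header_dtype, count=1)[0]
        nrows = int(frame_header['height'])
        ncols = int(frame_header['width'])
        first = np.fromfile(infile, dtype='f4', count=nrows * ncols)
        first = first.reshape((nrows, ncols))
```

Each np.fromfile call picks up where the previous one stopped, which is the same thing the classes above rely on when they read frame after frame from one open file.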