If I create a recarray in this way:
```
In [29]: np.rec.fromrecords([(1,'hello'),(2,'world')],names=['a','b'])
```
The result looks fine:
I don't know of a way to ask numpy to infer some aspects of a dtype but not others, but couldn't you spell the dtype out yourself, e.g.:
```python
import numpy as np

data = [(1, 'hello'), (2, 'world')]
# width of the longest string in the second column
dlen = max(len(s) for i, s in data)
st = '|S%d' % dlen
np.rec.fromrecords(data, dtype=[('a', np.int8), ('b', st)])
```
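Going a step further, here is a sketch that infers only the parts you care about from the data: the narrowest integer type for each integer column (via `np.min_scalar_type`) and the byte-string width from the longest entry. `compact_dtype` is a hypothetical helper, not a numpy function, and it assumes each column holds only ints or only strings:

```python
import numpy as np

def compact_dtype(records, names):
    """Hypothetical helper: build a structured dtype using the narrowest
    integer type and the shortest byte-string width that fit each column.
    Assumes every column holds either only ints or only str values."""
    fields = []
    for name, column in zip(names, zip(*records)):
        if all(isinstance(v, int) for v in column):
            # min_scalar_type gives the smallest dtype that can hold one value;
            # combining the min and max covers the column's whole range.
            t = np.result_type(np.min_scalar_type(min(column)),
                               np.min_scalar_type(max(column)))
        else:
            # fixed-width byte string just long enough for the longest entry
            t = np.dtype('S%d' % max(len(v) for v in column))
        fields.append((name, t))
    return np.dtype(fields)

data = [(1, 'hello'), (2, 'world')]
dt = compact_dtype(data, ['a', 'b'])
rec = np.rec.fromrecords(data, dtype=dt)
print(rec.dtype)
```

With values 1 and 2 the integer column comes out as `uint8`; a column spanning, say, -3 to 200 would be widened to `int16`, because no single 8-bit type holds both ends.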
If you don't need to manipulate the strings as bytes, you can use the object dtype to represent them. This stores a pointer to a Python object in each row instead of the actual bytes:
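```
In [38]: np.array(data, dtype=[('a', np.uint8), ('b', object)])
Out[38]:
array([(1, 'hello'), (2, 'world')],
      dtype=[('a', '|u1'), ('b', '|O8')])
```

(Older code spells this `np.object`; that alias was removed in NumPy 1.24, so use the builtin `object`.)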
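To see the "pointer instead of bytes" point concretely, a small sketch comparing the per-row size of the object layout with a fixed-width `S5` layout (the exact pointer size depends on the platform, so the assertion is written in terms of `np.dtype(object).itemsize` rather than a hard-coded 8):

```python
import numpy as np

data = [(1, 'hello'), (2, 'world')]
obj = np.array(data, dtype=[('a', np.uint8), ('b', object)])
fixed = np.array(data, dtype=[('a', np.uint8), ('b', 'S5')])

# The object field occupies one pointer-sized slot per row, however long the
# string is; the string itself lives on the Python heap.
print(obj.dtype.itemsize)    # 1 + pointer size (9 on a typical 64-bit build)
print(fixed.dtype.itemsize)  # 1 + 5 = 6
# Elements come back as ordinary Python str objects, not bytes.
print(type(obj['b'][0]))
```

The trade-off: the object layout never truncates and needs no width computation, but each element is a separate Python object, so vectorized string operations and saving with `np.save` without pickling are not available.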
Alternatively, Alex's idea would work well: let numpy infer the dtype, then replace only the integer fields:
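```python
import numpy as np

data = [(1, 'hello'), (2, 'world')]
# dt: the dtype numpy inferred for the recarray (assumed here to come from
# fromrecords without names, which gives the fields f0, f1)
dt = np.rec.fromrecords(data).dtype

new_dt = []
# For each field (iterating dt.names keeps the original field order),
# check whether it is an integer. If so, represent it as a byte.
for f in dt.names:
    T = dt[f]  # dtype of this field
    if np.issubdtype(T, np.integer):
        new_dt.append((f, np.uint8))
    else:
        new_dt.append((f, T))
new_dt = np.dtype(new_dt)
np.array(data, dtype=new_dt)
```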
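which should yield (this output is from Python 2; on Python 3 numpy infers the string field as unicode, `<U5`, rather than `|S5`):

```
array([(1, 'hello'), (2, 'world')],
      dtype=[('f0', '|u1'), ('f1', '|S5')])
```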
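If you already have the recarray from the question, a sketch of the same per-field rule applied with `astype`, so the existing array is converted instead of being rebuilt from `data`. This assumes the fields keep their order, since structured-to-structured casts in recent NumPy match fields by position:

```python
import numpy as np

data = [(1, 'hello'), (2, 'world')]
rec = np.rec.fromrecords(data, names=['a', 'b'])

# Same rule as above: integer fields shrink to uint8, everything else keeps
# the dtype numpy inferred.
new_dt = np.dtype([(name, np.uint8) if np.issubdtype(rec.dtype[name], np.integer)
                   else (name, rec.dtype[name])
                   for name in rec.dtype.names])

# astype copies the values into the new layout; subclasses such as
# recarray are preserved by default (subok=True).
small = rec.astype(new_dt)
print(small.dtype)
```

`astype` uses unsafe casting by default, so values outside 0..255 would wrap around silently; check the column range first if that matters.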