I am trying to understand how the numpy.genfromtxt method and io.StringIO work. On the official website (https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/num
In [200]: np.__version__
Out[200]: '1.14.0'
The documentation example works for me:
In [201]: s = io.StringIO("1,1.3,abcde")
In [202]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
...: ('mystring','S5')], delimiter=",")
Out[202]:
array((1, 1.3, b'abcde'),
dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
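Outside IPython, the same documentation example runs as a plain script. A minimal sketch (the print calls are only for illustration): with a single input line the result is a 0-d structured array, and .item() turns that one record into an ordinary Python tuple.

```python
import io

import numpy as np

s = io.StringIO("1,1.3,abcde")
dt = [('myint', 'i8'), ('myfloat', 'f8'), ('mystring', 'S5')]
data = np.genfromtxt(s, dtype=dt, delimiter=",")

# A single input line gives a 0-d structured array, not a 1-row array
print(data.shape)        # ()
# .item() unpacks the one record into a plain Python tuple
print(data.item())       # (1, 1.3, b'abcde')
# Individual fields are selected by the names given in the dtype
print(data['myint'])     # 1
```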
It also works for a byte string:
In [204]: s = io.BytesIO(b"1,1.3,abcde")
In [205]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
...: ('mystring','S5')], delimiter=",")
Out[205]:
array((1, 1.3, b'abcde'),
dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
genfromtxt works with anything that feeds it lines, so when testing answers to questions I usually pass a list of bytestrings directly:
In [206]: s = [b"1,1.3,abcde"]
In [207]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
...: ('mystring','S5')], delimiter=",")
Out[207]:
array((1, 1.3, b'abcde'),
dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
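Because genfromtxt only needs an iterable of lines, a generator works too, which makes it easy to pre-filter rows before parsing. A sketch with made-up data; the even_rows helper is hypothetical, not part of NumPy:

```python
import numpy as np

# Made-up input: three records, one per line
raw = b"""1,1.3,abcde
2,2.5,two
4,1.3,four"""

dt = [('myint', 'i8'), ('myfloat', 'f8'), ('mystring', 'S5')]


def even_rows(lines):
    """Hypothetical filter: yield only lines whose first field is an even integer."""
    for line in lines:
        if int(line.split(b",")[0]) % 2 == 0:
            yield line


# genfromtxt consumes the generator line by line, exactly like a list or file
data = np.genfromtxt(even_rows(raw.splitlines()), dtype=dt, delimiter=",")
print(data['myint'])      # [2 4]
print(data['mystring'])   # [b'two' b'four']
```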
Or with several lines:
In [208]: s = b"""1,1.3,abcde
...: 4,1.3,two""".splitlines()
In [209]: s
Out[209]: [b'1,1.3,abcde', b'4,1.3,two']
In [210]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
...: ('mystring','S5')], delimiter=",")
Out[210]:
array([(1, 1.3, b'abcde'), (4, 1.3, b'two')],
dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
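With several lines the result is a 1-d structured array holding one record per line, so the fields behave like named columns. A short sketch using the same two lines:

```python
import numpy as np

lines = b"""1,1.3,abcde
4,1.3,two""".splitlines()
dt = [('myint', 'i8'), ('myfloat', 'f8'), ('mystring', 'S5')]
data = np.genfromtxt(lines, dtype=dt, delimiter=",")

print(data.shape)            # (2,)  one record per input line
print(data['myint'])         # [1 4]  a whole column, picked by field name
print(data[1].item())        # (4, 1.3, b'two')  a whole row, as a tuple
print(data['myint'].sum())   # 5
```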
It used to be that with dtype=None, genfromtxt created S (bytestring) fields for strings; see "NumPy dtype issues in genfromtxt(), reads string in as bytestring".
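On NumPy versions before 1.14, which lack the encoding argument, one workaround is to keep the S field and decode it afterwards. A sketch using np.char.decode, assuming the text is plain ASCII:

```python
import numpy as np

lines = b"""1,1.3,abcde
4,1.3,two""".splitlines()
dt = [('myint', 'i8'), ('myfloat', 'f8'), ('mystring', 'S5')]
data = np.genfromtxt(lines, dtype=dt, delimiter=",")

# np.char.decode converts an S (bytes) array element-wise into a U (str) array
names = np.char.decode(data['mystring'], 'ascii')
print(names)   # ['abcde' 'two']
```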
With 1.14, we can control the default string dtype through the new encoding argument; leaving it out, as here, triggers a deprecation warning:
In [219]: s = io.StringIO("1,1.3,abcde")
In [220]: np.genfromtxt(s, dtype=None, delimiter=",")
/usr/local/bin/ipython3:1: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
#!/usr/bin/python3
Out[220]:
array((1, 1.3, b'abcde'),
dtype=[('f0', '<i8'), ('f1', '<f8'), ('f2', 'S5')])
See the 1.14 release notes: https://docs.scipy.org/doc/numpy/release.html#encoding-argument-for-text-io-functions
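Passing encoding explicitly is what silences the warning from In [220]. A sketch that records warnings to check this, assuming NumPy 1.14 or later; as the warning text says, encoding=None means the system default:

```python
import io
import warnings

import numpy as np

s = io.StringIO("1,1.3,abcde")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")   # record every warning instead of printing it
    data = np.genfromtxt(s, dtype=None, delimiter=",", encoding=None)

print(data.item())        # (1, 1.3, 'abcde')  a str field, not bytes
print(data.dtype['f2'])   # <U5
# Expect no "Reading unicode strings without specifying the encoding" warning here
print([str(w.message) for w in caught])
```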
Now I can generate examples with Py3 strings without producing all those ugly b'string' results (but I have to remember that not everyone has upgraded to 1.14):
In [223]: s = """1,1.3,abcde
...: 4,1.3,two""".splitlines()
In [224]: np.genfromtxt(s, dtype=None, delimiter=",",encoding=None)
Out[224]:
array([(1, 1.3, 'abcde'), (4, 1.3, 'two')],
dtype=[('f0', '<i8'), ('f1', '<f8'), ('f2', '<U5')])