StringIO example does not work

Happy的楠姐 2021-01-21 08:37

I am trying to understand how the numpy.genfromtxt method and io.StringIO work. On the official website (https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/num

2 answers
  • 2021-01-21 08:41
    In [200]: np.__version__
    Out[200]: '1.14.0'
    

    The example works for me:

    In [201]: s = io.StringIO("1,1.3,abcde")
    In [202]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
         ...:                  ('mystring','S5')], delimiter=",")
    Out[202]: 
    array((1, 1.3, b'abcde'),
          dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
    

    It also works for a byte string:

    In [204]: s = io.BytesIO(b"1,1.3,abcde")
    In [205]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
         ...:                  ('mystring','S5')], delimiter=",")
    Out[205]: 
    array((1, 1.3, b'abcde'),
          dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
    

    genfromtxt works with anything that feeds it lines, so I usually use a list of bytestrings directly (when testing questions):

    In [206]: s = [b"1,1.3,abcde"]
    In [207]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
         ...:                  ('mystring','S5')], delimiter=",")
    Out[207]: 
    array((1, 1.3, b'abcde'),
          dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
    

    Or with several lines:

    In [208]: s = b"""1,1.3,abcde
         ...: 4,1.3,two""".splitlines()
    In [209]: s
    Out[209]: [b'1,1.3,abcde', b'4,1.3,two']
    In [210]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
         ...:                  ('mystring','S5')], delimiter=",")
    Out[210]: 
    array([(1, 1.3, b'abcde'), (4, 1.3, b'two')],
          dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
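
Since genfromtxt only needs something that yields lines, a generator should work as well as a list or a file object. A minimal sketch, assuming NumPy 1.14 or later so that str lines are accepted; the rows() generator is a hypothetical stand-in for any line source:

```python
import numpy as np


def rows():
    """Hypothetical line source: yields one CSV record per iteration."""
    yield "1,1.3,abcde"
    yield "4,1.3,two"


# Same structured dtype as in the examples above.
dtype = [("myint", "i8"), ("myfloat", "f8"), ("mystring", "S5")]

# genfromtxt iterates over the generator just as it would over a file.
arr = np.genfromtxt(rows(), dtype=dtype, delimiter=",")
print(arr)
```

Because the string field is declared as 'S5', its values are still stored as bytes even though the generator yields str.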
    

    Before 1.14, genfromtxt with dtype=None created S (bytestring) fields for text columns. See the related question:

    NumPy dtype issues in genfromtxt(), reads string in as bytestring

    With 1.14, the encoding argument lets us control the default string dtype:

    In [219]: s = io.StringIO("1,1.3,abcde")
    In [220]: np.genfromtxt(s, dtype=None, delimiter=",")
    /usr/local/bin/ipython3:1: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
      #!/usr/bin/python3
    Out[220]: 
    array((1, 1.3, b'abcde'),
          dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', 'S5')])
    In [221]: s = io.StringIO("1,1.3,abcde")
    In [222]: np.genfromtxt(s, dtype=None, delimiter=",",encoding=None)
    Out[222]: 
    array((1, 1.3, 'abcde'),
          dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<U5')])
    

    https://docs.scipy.org/doc/numpy/release.html#encoding-argument-for-text-io-functions

    Now I can generate examples with Py3 strings without producing all those ugly b'string' results (though I have to remember that not everyone has upgraded to 1.14):

    In [223]: s = """1,1.3,abcde
         ...: 4,1.3,two""".splitlines()
    In [224]: np.genfromtxt(s, dtype=None, delimiter=",",encoding=None)
    Out[224]: 
    array([(1, 1.3, 'abcde'), (4, 1.3, 'two')],
          dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<U5')])
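
As a script rather than an IPython session, the encoding behaviour above might be checked like this. It is only a sketch and assumes NumPy 1.14 or later, where the encoding argument exists; the field names f0, f1, f2 are the defaults genfromtxt assigns with dtype=None:

```python
import io

import numpy as np

text = "1,1.3,abcde\n4,1.3,two"

# encoding=None selects the system default encoding, so text columns
# are returned as unicode ('U') fields instead of bytes ('S') fields.
arr = np.genfromtxt(io.StringIO(text), dtype=None, delimiter=",", encoding=None)

print(arr.dtype)
print(arr["f2"].tolist())
```

Passing encoding explicitly also avoids relying on the default, which is not the same across NumPy versions.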
    
  • 2021-01-21 08:53

    Consider upgrading NumPy: with the current version, your code works as written. See the 1.14.0 release note highlights and the section "Encoding argument for text IO functions" for the relevant changes to np.genfromtxt.

    With older NumPy, you are passing a str object as the input, but the docs you linked say:

    Note that generators must return byte strings in Python 3k. 
    

    So do what the docs say and give it a byte string:

    import io
    s = io.BytesIO(b"1,1.3,abcde")
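
If the text starts out as a str, it can be encoded before it is wrapped in BytesIO. The sketch below completes the snippet under that assumption; the variable names text and label are illustrative and do not come from the question:

```python
import io

import numpy as np

text = "1,1.3,abcde"  # a str, as in the question

# Encode to bytes first so the input also suits NumPy versions before 1.14.
s = io.BytesIO(text.encode("utf-8"))
dtype = [("myint", "i8"), ("myfloat", "f8"), ("mystring", "S5")]
data = np.genfromtxt(s, dtype=dtype, delimiter=",")

# The 'S5' field holds bytes; decode it if a str is needed downstream.
label = data["mystring"].item().decode("ascii")
print(data["myint"], data["myfloat"], label)
```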
    