I am using Python 3.2.1 and I can't import the StringIO module. I use io.StringIO and it works, but I can't use it with numpy's genfromtxt.
Roman Shapovalov's code should work in Python 3.x as well as Python 2.6/2.7. Here it is again with the complete example:
import io
import numpy
x = "1 3\n 4.5 8"
numpy.genfromtxt(io.BytesIO(x.encode()))
Output:
array([[ 1. ,  3. ],
       [ 4.5,  8. ]])
Explanation for Python 3.x:
numpy.genfromtxt takes a byte stream (a file-like object interpreted as bytes instead of Unicode).
io.BytesIO takes a byte string and returns a byte stream. io.StringIO, on the other hand, would take a Unicode string and return a Unicode stream.
x gets assigned a string literal, which in Python 3.x is a Unicode string.
encode() takes the Unicode string x and makes a byte string out of it, thus giving io.BytesIO a valid argument.
The only difference for Python 2.6/2.7 is that x is a byte string (assuming from __future__ import unicode_literals is not used), and then encode() takes the byte string x and, for ASCII-only data like this, still makes the same byte string out of it. So the result is the same.
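As a rough illustration of the distinction above (a minimal sketch for Python 3.x using only the standard library; the variable names are made up for this example), each class accepts only its own kind of string:

```python
import io

text = "1 3\n 4.5 8"     # a str (Unicode string) in Python 3.x
raw = text.encode()       # a bytes object; encode() defaults to UTF-8

byte_stream = io.BytesIO(raw)      # byte string in, byte stream out
text_stream = io.StringIO(text)    # Unicode string in, Unicode stream out

byte_result = byte_stream.read()
text_result = text_stream.read()

# Handing either class the wrong kind of string raises TypeError.
errors = []
for factory, value in ((io.BytesIO, text), (io.StringIO, raw)):
    try:
        factory(value)
    except TypeError:
        errors.append(factory.__name__)

print(type(byte_result).__name__, type(text_result).__name__, errors)
```

This strictness is why the io.StringIO stream fails once a consumer expects bytes, and why encoding first and wrapping in io.BytesIO works.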
Since this is one of SO's most popular questions regarding StringIO, here's some more explanation on the import statements and different Python versions.
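Here are the classes which take a string and return a stream:
io.BytesIO (Python 2.6, 2.7, and 3.x): takes a byte string and returns a byte stream.
io.StringIO (Python 2.6, 2.7, and 3.x): takes a Unicode string and returns a Unicode stream.
StringIO.StringIO (Python 2.x): takes a byte string or a Unicode string and returns a stream of the same kind.
cStringIO.StringIO (Python 2.x): a faster version of StringIO.StringIO, but can't take Unicode strings which contain non-ASCII characters.
Note that StringIO.StringIO is imported as from StringIO import StringIO, then used as StringIO(...). Either that, or you do import StringIO and then use StringIO.StringIO(...). The module name and class name just happen to be the same. It's similar to datetime that way.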
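The module/class naming pattern is easy to try with datetime, which exists in both Python 2.x and 3.x. The StringIO module does not exist in 3.x, so this sketch uses datetime purely as a stand-in for the two import styles:

```python
# Style 1: import the module, then reach the class through it.
import datetime
a = datetime.datetime(2020, 1, 2)

# Style 2: import the class directly; the name now refers to the class.
from datetime import datetime
b = datetime(2020, 1, 2)

# Both styles construct the same object.
print(a == b)
```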
What to use, depending on your supported Python versions:
If you only support Python 3.x: Just use io.BytesIO or io.StringIO depending on what kind of data you're working with.
If you support both Python 2.6/2.7 and 3.x, or are trying to transition your code from 2.6/2.7 to 3.x: The easiest option is still to use io.BytesIO or io.StringIO. Although StringIO.StringIO is flexible and thus seems preferred for 2.6/2.7, that flexibility could mask bugs that will manifest in 3.x. For example, I had some code which used StringIO.StringIO or io.StringIO depending on Python version, but I was actually passing a byte string, so when I got around to testing it in Python 3.x it failed and had to be fixed.
Another advantage of using io.StringIO is the support for universal newlines. If you pass the keyword argument newline='' into io.StringIO, it will be able to split lines on any of \n, \r\n, or \r. I found that StringIO.StringIO would trip up on \r in particular.
Note that if you import BytesIO or StringIO from six, you get StringIO.StringIO in Python 2.x and the appropriate class from io in Python 3.x. If you agree with my previous paragraphs' assessment, this is actually one case where you should avoid six and just import from io instead.
If you support Python 2.5 or lower and 3.x: You'll need StringIO.StringIO for 2.5 or lower, so you might as well use six. But realize that it's generally very difficult to support both 2.5 and 3.x, so you should consider bumping your lowest supported version to 2.6 if at all possible.
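The universal-newlines behaviour of io.StringIO mentioned above can be checked with a short sketch (Python 3.x, standard library only; the sample text is invented for illustration). By default io.StringIO splits lines on \n only, while newline='' also recognises \r and \r\n and leaves the line endings untranslated:

```python
import io

sample = "a\rb\r\nc\n"   # mixes all three line-ending styles

# Default (newline='\n'): only '\n' ends a line.
default_lines = io.StringIO(sample).readlines()

# newline='': '\r', '\r\n' and '\n' all end a line, and are kept as-is.
universal_lines = io.StringIO(sample, newline='').readlines()

print(default_lines)
print(universal_lines)
```

If the line endings should also be normalised to \n while reading, newline=None is the documented alternative.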
Thank you OP for your question, and Roman for your answer. I had to search a bit to find this; I hope the following helps others.
Python 2.7:
See: https://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html
import numpy as np
from StringIO import StringIO
data = "1, abc , 2\n 3, xxx, 4"
print type(data)
"""
<type 'str'>
"""
print '\n', np.genfromtxt(StringIO(data), delimiter=",", dtype="|S3", autostrip=True)
"""
[['1' 'abc' '2']
 ['3' 'xxx' '4']]
"""
print '\n', type(data)
"""
<type 'str'>
"""
print '\n', np.genfromtxt(StringIO(data), delimiter=",", autostrip=True)
"""
[[ 1. nan 2.]
 [ 3. nan 4.]]
"""
Python 3.5:
import numpy as np
from io import StringIO
import io
data = "1, abc , 2\n 3, xxx, 4"
#print(data)
"""
1, abc , 2
3, xxx, 4
"""
#print(type(data))
"""
<class 'str'>
"""
#np.genfromtxt(StringIO(data), delimiter=",", autostrip=True)
# TypeError: Can't convert 'bytes' object to str implicitly
print('\n')
print(np.genfromtxt(io.BytesIO(data.encode()), delimiter=",", dtype="|S3", autostrip=True))
"""
[[b'1' b'abc' b'2']
 [b'3' b'xxx' b'4']]
"""
print('\n')
print(np.genfromtxt(io.BytesIO(data.encode()), delimiter=",", autostrip=True))
"""
[[ 1. nan 2.]
 [ 3. nan 4.]]
"""
Aside:
dtype="|Sx", where x is any of {1, 2, 3, ...}, is explained in the question "dtypes. Difference between S1 and S2 in Python":
"The |S1 and |S2 strings are data type descriptors; the first means the array holds strings of length 1, the second of length 2. ..."