Say I have the following list:
import numpy as np
a = ['hello', 'snake', 'plate']
I want to turn this into a 2D numpy array b with one row per word and one column per character.
Actually, you can do this in numpy without any copies or list comprehensions (caveats about unequal-length strings aside). Just view the array as an array of 1-character strings and reshape it:
import numpy as np
x = np.array(['hello', 'snake', 'plate'], dtype=str)
# In Python 3, dtype=str gives a unicode array ('<U5'), so view it as 1-character unicode ('U1')
y = x.view('U1').reshape((x.size, -1))
print(repr(y))
This yields:
array([['h', 'e', 'l', 'l', 'o'],
       ['s', 'n', 'a', 'k', 'e'],
       ['p', 'l', 'a', 't', 'e']], dtype='<U1')
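To make the "no copies" and "unequal-length strings" caveats concrete, here is a small sketch (the word list is just an example). numpy pads every element to the longest string with trailing null characters, and those padding positions show up as empty strings after the view. Because y shares x's buffer, writing into y also changes x.

```python
import numpy as np

# Words of unequal length: numpy stores every element with the same width
# (here 6 characters), padding shorter words with trailing null characters.
x = np.array(['snakes', 'on', 'a', 'plane'])
print(x.dtype)                  # <U6

# Reinterpret each 6-character element as six 1-character elements.
# No data is copied; y is a view onto x's memory.
y = x.view('U1').reshape((x.size, -1))

print(y.shape)                  # (4, 6)
print(y[1].tolist())            # ['o', 'n', '', '', '', '']  (nulls read back as '')
print(np.shares_memory(x, y))   # True
```

If you need a padded array without null characters, the padding approaches further down are a better fit.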
Generally speaking, though, I'd avoid using numpy arrays to store strings. There are cases where it's useful, but you're usually better off sticking to data structures that allow variable-length strings for, well, holding strings.
You can create a numpy array of characters directly, e.g.:
b = np.array([['h','e','l','l','o'], ['s','n','a','k','e'], ['p','l','a','t','e']])
The usual array operations (indexing, slicing, transposing, comparisons) work on it.
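A short sketch of a few of those operations on the same b: column slicing, reversing both axes, elementwise comparison, and joining rows back into words.

```python
import numpy as np

b = np.array([['h', 'e', 'l', 'l', 'o'],
              ['s', 'n', 'a', 'k', 'e'],
              ['p', 'l', 'a', 't', 'e']])

print(b.dtype)               # <U1
print(b.shape)               # (3, 5)
print(b[:, 0])               # first letter of each word: ['h' 's' 'p']
print(b[::-1, ::-1][0])      # last word, reversed: ['e' 't' 'a' 'l' 'p']
print(int((b == 'e').sum())) # number of 'e' cells: 3
print(''.join(b.T[1]))       # second letter of each word: enl

# Joining each row turns the array back into the original words.
print([''.join(row) for row in b])  # ['hello', 'snake', 'plate']
```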
If you have a and wish to generate b from it, note that:
list('hello') == ['h','e','l','l','o']
So you can do something like:
b = np.array([ list(word) for word in a ])
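The list-comprehension approach only yields a rectangular array when every word has the same length. A minimal sketch that checks this up front; `words_to_chars` is a hypothetical helper name, not part of numpy.

```python
import numpy as np


def words_to_chars(words):
    """Hypothetical helper: one row per word, one column per character.

    Raises ValueError for words of unequal length, since a rectangular
    array cannot hold them without padding.
    """
    lengths = {len(w) for w in words}
    if len(lengths) > 1:
        raise ValueError(f"words have different lengths: {sorted(lengths)}")
    # list('hello') == ['h', 'e', 'l', 'l', 'o']
    return np.array([list(w) for w in words])


b = words_to_chars(['hello', 'snake', 'plate'])
print(b.shape)   # (3, 5)
print(b[2])      # ['p' 'l' 'a' 't' 'e']
```

Failing early with a clear message avoids depending on how a given numpy version treats ragged nested lists.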
However, if a has words of unequal length (e.g. ['snakes','on','a','plane']), what do you want to do with the shorter words? You could pad them with spaces to the length of the longest word:
wid = max(len(w) for w in a)
b = np.array([ list(w.center(wid)) for w in a])
Here str.center(width) pads the string with spaces, centering it. You could also use str.rjust or str.ljust (see the Python string method docs).
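A sketch that wraps the three padding choices in one function; `pad_to_chars` and its `align`/`fillchar` parameters are hypothetical names chosen for illustration. It calls the real str methods `ljust`, `rjust`, and `center`, all of which take a width and an optional fill character.

```python
import numpy as np


def pad_to_chars(words, align='left', fillchar=' '):
    """Hypothetical helper: pad every word to the longest length, then split.

    align selects str.ljust ('left'), str.rjust ('right') or str.center ('center').
    """
    wid = max(len(w) for w in words)
    pad = {'left': str.ljust, 'right': str.rjust, 'center': str.center}[align]
    return np.array([list(pad(w, wid, fillchar)) for w in words])


words = ['snakes', 'on', 'a', 'plane']
left = pad_to_chars(words)
right = pad_to_chars(words, align='right', fillchar='.')

print(left.shape)          # (4, 6)
print(left[1].tolist())    # ['o', 'n', ' ', ' ', ' ', ' ']
print(right[1].tolist())   # ['.', '.', '.', '.', 'o', 'n']

# Joining each row and stripping the fill character recovers the words
# (assuming the words themselves don't end in that character).
print([''.join(row).rstrip() for row in left])  # ['snakes', 'on', 'a', 'plane']
```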
Alternatively, specify the string length as the shape parameter of a 1-character unicode dtype:
> import numpy
> string_array = ['..##.#..#.', '##..#.....', '#...##..#.', '####.#...#', '##.##.###.', '##...#.###', '.#.#.#..##', '..#....#..', '###...#.#.', '..###..###']
> numpy.array(string_array, dtype=('U1', 10))
array([['.', '.', '#', '#', '.', '#', '.', '.', '#', '.'],
       ['#', '#', '.', '.', '#', '.', '.', '.', '.', '.'],
       ['#', '.', '.', '.', '#', '#', '.', '.', '#', '.'],
       ['#', '#', '#', '#', '.', '#', '.', '.', '.', '#'],
       ['#', '#', '.', '#', '#', '.', '#', '#', '#', '.'],
       ['#', '#', '.', '.', '.', '#', '.', '#', '#', '#'],
       ['.', '#', '.', '#', '.', '#', '.', '.', '#', '#'],
       ['.', '.', '#', '.', '.', '.', '.', '#', '.', '.'],
       ['#', '#', '#', '.', '.', '.', '#', '.', '#', '.'],
       ['.', '.', '#', '#', '#', '.', '.', '#', '#', '#']], dtype='<U1')
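The subarray-dtype form depends on how np.array coerces strings into a ('U1', 10) dtype, and that coercion may not behave identically across numpy versions. As a hedged alternative, the sketch below builds the same grid with the list-comprehension approach, checks it against the zero-copy view trick, and then uses an elementwise comparison to get a boolean mask of the '#' cells.

```python
import numpy as np

string_array = ['..##.#..#.', '##..#.....', '#...##..#.', '####.#...#',
                '##.##.###.', '##...#.###', '.#.#.#..##', '..#....#..',
                '###...#.#.', '..###..###']

# Build the grid without a subarray dtype: one list of characters per row.
grid = np.array([list(row) for row in string_array])

# Same result via a view of the '<U10' array as '<U1' elements.
grid_view = np.array(string_array).view('U1').reshape((len(string_array), -1))
print(np.array_equal(grid, grid_view))  # True

# Elementwise comparison gives a boolean mask of the '#' cells.
mask = grid == '#'
print(mask.sum(axis=1).tolist())  # '#' per row: [4, 3, 4, 6, 7, 6, 5, 2, 5, 6]
print(int(mask.sum()))            # total '#' cells: 48
```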