I have an array-like structure stored as a string in a dataframe column (I read the dataframe from a CSV file).
One string element of
You can use ast.literal_eval before passing to numpy.array:
from ast import literal_eval
import numpy as np
x = '[(-0.0426, -0.7231, -0.4207), (0.2116, -0.1733, -0.1013)]'
res = np.array(literal_eval(x))
print(repr(res))
array([[-0.0426, -0.7231, -0.4207],
       [ 0.2116, -0.1733, -0.1013]])
You can apply the same conversion to each string in a pandas Series, although it isn't clear from the question whether you need to aggregate across rows. If you do, you can build a list of NumPy arrays using the logic above and then combine them.
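As a rough sketch of the Series case: the column contents below are made up for illustration, and stacking with np.stack assumes every row parses to an array of the same shape, which may not hold for your data.

```python
from ast import literal_eval

import numpy as np
import pandas as pd

# Hypothetical column of stringified lists of tuples, as it might look after read_csv.
s = pd.Series([
    '[(-0.0426, -0.7231, -0.4207), (0.2116, -0.1733, -0.1013)]',
    '[(0.5, 0.25, -1.0), (1.5, 2.0, 3.0)]',
])

# Parse each string into a list of tuples, then convert it to a 2-D array.
arrays = [np.array(literal_eval(value)) for value in s]

# Assumption: all rows share the same shape, so they can be stacked into one 3-D array.
stacked = np.stack(arrays)
print(stacked.shape)
```

If the rows have different lengths, keeping the plain list of arrays (or using np.concatenate along the first axis) is probably the more appropriate way to combine them.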
The docs explain types acceptable to literal_eval:
Safely evaluate an expression node or a string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.
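A practical consequence is that anything other than a plain literal is rejected rather than executed. The sketch below illustrates this; parse_or_none is a hypothetical helper name, not part of ast, and returning None on failure is only one possible convention (it cannot distinguish a failed parse from the literal string 'None').

```python
from ast import literal_eval


def parse_or_none(text):
    """Hypothetical helper: return the parsed literal, or None if parsing fails."""
    try:
        return literal_eval(text)
    except (ValueError, SyntaxError):
        # ValueError: valid syntax, but not a plain literal (e.g. a function call).
        # SyntaxError: the string is not valid Python at all.
        return None


ok = parse_or_none('[(1, 2), (3, 4)]')            # a list of tuples
bad = parse_or_none('__import__("os").getcwd()')  # a call, so it is rejected
broken = parse_or_none('[1, 2')                   # unbalanced bracket
print(ok, bad, broken)
```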
So we are effectively converting a string to a list of tuples, which np.array can then convert to a NumPy array.