Size-1 array error when preparing decision model

Question

I have a DataFrame called data with 477,154 rows.

   PDB_ID  Chain  Sequence                 Secstr
0  101M    A      GEWQLVLHVWAKVEA          |   HHHH  HHHHGG|
1  102L    A      MVLSEGEWKVEA             |HHHH  HHHHHH|
2  102M    A      MVLSEGEWQLVLHVWAKVEA     |HHHHHHHHHGGHH HHH   |
3  103L    A      MVLSEGEWQLVLHVWAKV       |   HHHHH HHHHHH HH|
4  103L    B      MVLSEGEWQLVLHVWAKVEAVAL  |   HHHHH HHHHHH HHHHH  |

My goal is to split the 'Sequence' and 'Secstr' columns into individual characters, store them in arrays, and use those arrays for classification. Each row has a different number of elements. I tried doing this manually by defining alphabet = " ABCDEFGHIKLMNOPQRSTUVWXYZ" and converting each letter to its index in that string, which gives lists such as [12, 21, 11, 18, 5, 7, 5, 22, 16, 11, 21, 11, 8, 21, 22].
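The manual encoding described above could look roughly like the sketch below. The helper name encode and the dictionary char_to_int are hypothetical, not taken from the original post; only the alphabet string comes from the question.

```python
# Alphabet from the question; index 0 is the space character.
alphabet = " ABCDEFGHIKLMNOPQRSTUVWXYZ"

# Lookup table from character to its position in the alphabet.
char_to_int = {ch: i for i, ch in enumerate(alphabet)}


def encode(text):
    """Map each character of text to its index in the alphabet."""
    return [char_to_int[ch] for ch in text]


encoded = encode("MVLSEGEWQLVLHVW")
print(encoded)
# [12, 21, 11, 18, 5, 7, 5, 22, 16, 11, 21, 11, 8, 21, 22]
```

Because every row has its own length, each call to encode returns a list of a different size.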

After this I created NumPy arrays:

X_array = np.array([np.array(xi) for xi in new_encoded_seq])
y_array = np.array([np.array(xi) for xi in new_encoded_str])
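A small sketch of what this construction produces when the rows differ in length. The two encoded rows are made-up values, and dtype=object is passed explicitly as an assumption, since recent NumPy versions refuse to build such a ragged array without it.

```python
import numpy as np

# Two encoded rows of different lengths (made-up values).
new_encoded_seq = [[12, 21, 11], [7, 5, 22, 16]]

# The result is a 1-D array whose elements are themselves arrays,
# not a 2-D numeric matrix.
X_array = np.array([np.array(xi) for xi in new_encoded_seq], dtype=object)
print(X_array.shape, X_array.dtype)

# Converting to a numeric array fails because the rows are not the same size.
try:
    X_array.astype(np.float32)
    conversion_failed = False
except (TypeError, ValueError) as exc:
    conversion_failed = True
    print(type(exc).__name__, exc)
```

An estimator that expects a rectangular numeric matrix will likely attempt a similar conversion, which would explain errors of the kind reported here.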

When I did this, I couldn't use the arrays to build a model. I get two errors, TypeError: only size-1 arrays can be converted to Python scalars and ValueError: setting an array element with a sequence, when running:

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X = X_array
y = y_array
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
model = DecisionTreeClassifier()
model = model.fit(X_train, y_train)
y_pred = model.predict(X_test)
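One possible direction, not part of the original post, is to turn every residue into its own sample with a fixed-size window of neighbouring characters, so that X becomes a rectangular matrix and y holds one label per residue. The sketch below assumes the surrounding | characters have been stripped from Secstr and that each Sequence and Secstr pair has the same length. The function windows, the parameter half_width, and the sample rows are all hypothetical.

```python
import numpy as np

alphabet = " ABCDEFGHIKLMNOPQRSTUVWXYZ"
char_to_int = {ch: i for i, ch in enumerate(alphabet)}


def windows(sequence, secstr, half_width=2):
    """Yield one (window, label) pair per residue.

    The sequence is padded with spaces (code 0) on both sides so that
    every window has exactly 2 * half_width + 1 entries.
    """
    padded = " " * half_width + sequence + " " * half_width
    width = 2 * half_width + 1
    for i, label in enumerate(secstr):
        window = padded[i:i + width]
        yield [char_to_int[ch] for ch in window], char_to_int[label]


# Two made-up rows with different lengths.
rows = [("MVLSEG", "HHHH  "), ("GEWQ", " HH ")]

X_list, y_list = [], []
for seq, sec in rows:
    for features, label in windows(seq, sec):
        X_list.append(features)
        y_list.append(label)

X = np.array(X_list)  # one row per residue, one column per window position
y = np.array(y_list)  # one encoded Secstr character per residue
print(X.shape, y.shape)
```

Arrays built this way are rectangular, so they should be usable with train_test_split and DecisionTreeClassifier as in the snippet above. Whether a window of integer codes is a good feature representation for this task is a separate question.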

Source: https://stackoverflow.com/questions/64703484/size-1-array-error-when-preparing-decision-model

Tags

python

arrays

numpy

dataframe

classification
