Question
I have a DataFrame called data with 477154 rows:
PDB_ID Chain Sequence Secstr
0 101M A GEWQLVLHVWAKVEA | HHHH HHHHGG|
1 102L A MVLSEGEWKVEA |HHHH HHHHHH|
2 102M A MVLSEGEWQLVLHVWAKVEA |HHHHHHHHHGGHH HHH |
3 103L A MVLSEGEWQLVLHVWAKV | HHHHH HHHHHH HH|
4 103L B MVLSEGEWQLVLHVWAKVEAVAL | HHHHH HHHHHH HHHHH |
My goal is to extract each character, one by one, from the 'Sequence' and 'Secstr' columns into arrays that can be used for classification.
Every row has a different number of elements. I tried to do it manually by creating an alphabet = " ABCDEFGHIKLMNOPQRSTUVWXYZ"
and then converting the letters to integers, e.g. [12, 21, 11, 18, 5, 7, 5, 22, 16, 11, 21, 11, 8, 21, 22]
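The manual encoding described above can be sketched roughly as follows. The helper name encode and the lookup table char_to_int are hypothetical, introduced only for illustration; the alphabet string is the one from the question. Note that this alphabet has no 'J', so a sequence containing it would raise a KeyError in this sketch.

```python
# The alphabet from the question; a character's position is its integer code.
alphabet = " ABCDEFGHIKLMNOPQRSTUVWXYZ"

# Hypothetical lookup table: character -> index in the alphabet.
char_to_int = {ch: i for i, ch in enumerate(alphabet)}

def encode(text):
    """Map each character of `text` to its index in `alphabet`."""
    return [char_to_int[ch] for ch in text]

encoded = encode("MVLSEGEWQLVLHVW")
print(encoded)
# [12, 21, 11, 18, 5, 7, 5, 22, 16, 11, 21, 11, 8, 21, 22]
```

The same function could be applied to the 'Secstr' strings, since the space character is part of the alphabet and maps to 0.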
After this I created NumPy arrays:
X_array = np.array([np.array(xi) for xi in new_encoded_seq])
y_array = np.array([np.array(xi) for xi in new_encoded_str])
When I tried to build a model with these arrays, I got two errors: "TypeError: only size-1 arrays can be converted to Python scalars" and "ValueError: setting an array element with a sequence". They occur when running:
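from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X = X_array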
y = y_array
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
model = DecisionTreeClassifier()
model = model.fit(X_train,y_train)
y_pred = model.predict(X_test)
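These errors most likely come from the rows having different lengths: NumPy cannot stack them into a rectangular 2-D numeric array, so the estimator receives an array of arrays instead of a matrix. One possible workaround, sketched below, is to pad every encoded row to a common length. This is an assumption about what is acceptable for the task, not a definitive fix; the helper pad_to_length and the sample rows are hypothetical. The fill value 0 coincides with the code for the space character in the alphabet, which may or may not be desirable.

```python
import numpy as np

def pad_to_length(rows, length, fill=0):
    """Return a 2-D int array: short rows are right-padded with `fill`,
    rows longer than `length` are truncated."""
    out = np.full((len(rows), length), fill, dtype=int)
    for i, row in enumerate(rows):
        trimmed = row[:length]
        out[i, :len(trimmed)] = trimmed
    return out

# Hypothetical stand-in for the encoded sequences of differing lengths.
new_encoded_seq = [[12, 21, 11], [7, 5, 22, 16, 11], [8, 21]]

max_len = max(len(row) for row in new_encoded_seq)
X = pad_to_length(new_encoded_seq, max_len)
print(X.shape)  # (3, 5)
```

Applying the same padding to the encoded 'Secstr' rows would give a y with the same shape as X, which can then be passed to train_test_split. An alternative worth considering is to build one sample per residue (for example from a fixed-size window around it), so that each label is a single character rather than a whole padded row.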
Source: https://stackoverflow.com/questions/64703484/size-1-array-error-when-preparing-decision-model