I have a CSV file with this structure:
CAT1,CAT2,TITLE,URL,CONTENT
CAT1, CAT2, TITLE, and CONTENT are in Chinese.
I want to train a LinearSVC or Multinomial classifier.
Thanks to @meelo, I solved this problem.
As he said, in my code data holds the feature vectors and target holds the target values. I had mixed the two up.
I learned that TfidfVectorizer turns the data into a matrix of shape [n_samples, n_features], and each sample must map to exactly one target.
To predict two kinds of targets, I need two separate target vectors: target_C1 with all the C1 values, and target_C2 with all the C2 values. Then I use the original data with each target to train two classifiers, one per target.
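A minimal sketch of that setup, assuming scikit-learn is installed. The toy rows and the names contents, clf_C1, and clf_C2 are made up for illustration; English strings are used only to keep the example short, and real Chinese text would likely need its own tokenizer.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical toy rows standing in for the CONTENT, CAT1 and CAT2 columns.
contents = [
    "stock market rises on earnings",
    "central bank cuts interest rates",
    "team wins the football final",
    "tennis star takes the title",
]
target_C1 = ["finance", "finance", "sports", "sports"]   # one label per row
target_C2 = ["stocks", "banking", "football", "tennis"]  # one label per row

# One shared feature matrix of shape (n_samples, n_features).
vectorizer = TfidfVectorizer()
data = vectorizer.fit_transform(contents)

# One classifier per target, each fitted with a 1-D target vector.
clf_C1 = LinearSVC().fit(data, target_C1)
clf_C2 = LinearSVC().fit(data, target_C2)

# Predict both categories for a new document.
new_data = vectorizer.transform(["bank raises interest rates"])
pred_C1 = clf_C1.predict(new_data)
pred_C2 = clf_C2.predict(new_data)
```

Both classifiers share the same feature matrix; only the target passed to fit changes.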
I had the same issue.
If you are facing the same problem, check the shapes of the parameters passed to clf.fit(X, y):
X: training vectors, {array-like, sparse matrix} of shape (n_samples, n_features).
y: target vector relative to X, array-like of shape (n_samples,).
As you can see, y must be one-dimensional, with one value per sample. To make sure your target vector is shaped correctly, check y.shape, which should be (n_samples,).
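A quick way to run that check, sketched with NumPy. The arrays here are placeholders I made up, not the original data:

```python
import numpy as np

n_samples = 4
X = np.zeros((n_samples, 3))      # hypothetical feature matrix
y_bad = np.zeros((n_samples, 2))  # two label columns: wrong shape for fit
y_good = y_bad[:, 0]              # keep a single column as the target

print(X.shape)       # (4, 3)
print(y_bad.shape)   # (4, 2)
print(y_good.shape)  # (4,)
```

If y.shape shows a second dimension larger than 1, the target holds more than one column and has to be split before fitting.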
In my case, I built my final training vector by concatenating 3 separate vectors from 3 different vectorizers.
The problem was that each of them had a ['Label'] column, so the final training vector contained 3 ['Label'] columns.
When I then used final_trainingVect['Label'] as my target vector, its shape was (n_samples, 3).
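The situation can be reproduced with a small sketch, assuming the three vectors are pandas DataFrames. The frames part_a, part_b, and part_c and their feature columns are hypothetical stand-ins:

```python
import pandas as pd

labels = [0, 1, 0]
# Three hypothetical feature blocks, each carrying its own copy of 'Label'.
part_a = pd.DataFrame({"a1": [0.1, 0.2, 0.3], "Label": labels})
part_b = pd.DataFrame({"b1": [1.0, 0.0, 1.0], "Label": labels})
part_c = pd.DataFrame({"c1": [5, 6, 7], "Label": labels})

# Naive column-wise concatenation keeps all three 'Label' columns.
combined = pd.concat([part_a, part_b, part_c], axis=1)
bad_y = combined["Label"]  # a DataFrame with 3 columns, not a Series
print(bad_y.shape)         # (3, 3)

# Drop the label from each block first, then take it once as the target.
features = pd.concat(
    [p.drop(columns="Label") for p in (part_a, part_b, part_c)], axis=1
)
y = part_a["Label"]
print(features.shape, y.shape)  # (3, 3) (3,)
```

Dropping the label column before concatenating is one possible fix; selecting a single column afterwards, for example with combined["Label"].iloc[:, 0], would also give a 1-D target.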