Apple Turi Create always returns the same label

Question

I'm test-driving Turi Create to solve a classification problem in which each data point is a 10-tuple of features (q,w,e,r,t,y,u,i,o,p) plus a label. The features q..p form a sequence of characters, for now of two types, + and -, like this:

q,w,e,r,t,y,u,i,o,p,label
-,+,+,e,e,e,e,e,e,e,type2
+,+,e,e,e,e,e,e,e,e,type1
-,+,e,e,e,e,e,e,e,e,type2

'e' is just a padding character, so that every vector has a fixed length of 10.
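For illustration, the padding step can be sketched as below. This is a minimal sketch assuming the raw sequences are strings of '+' and '-'; the helper names `pad_sequence` and `to_row` are hypothetical and not part of the original script.

```python
# Hypothetical helpers: pad a variable-length +/- sequence with 'e'
# to the fixed length of 10 used by the CSV columns q..p.
FEATURES = ["q", "w", "e", "r", "t", "y", "u", "i", "o", "p"]
PAD = "e"
LENGTH = len(FEATURES)  # 10


def pad_sequence(seq: str) -> list[str]:
    """Turn e.g. '-++' into ['-', '+', '+', 'e', ..., 'e'] (10 items)."""
    chars = list(seq)
    if len(chars) > LENGTH:
        raise ValueError(f"sequence longer than {LENGTH}: {seq!r}")
    return chars + [PAD] * (LENGTH - len(chars))


def to_row(seq: str) -> dict[str, str]:
    """Map the padded characters onto the column names q..p."""
    return dict(zip(FEATURES, pad_sequence(seq)))


if __name__ == "__main__":
    print(",".join(pad_sequence("-++")))  # -,+,+,e,e,e,e,e,e,e
```

Note that the padding value 'e' happens to coincide with the name of the third column, `e`; the two are unrelated.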

Note: the data is heavily skewed toward one label (about 90% of it), and the dataset is small (fewer than 100 points).
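With a label distribution this skewed, a constant predictor that always returns the majority label already reaches an accuracy equal to the majority share. A minimal sketch of that baseline follows; `majority_baseline` is a hypothetical helper, and the 90/10 label list in the example is illustrative, not the real data.

```python
from collections import Counter


def majority_baseline(labels):
    """Return (majority_label, accuracy) for a model that always
    predicts the most frequent label."""
    if not labels:
        raise ValueError("labels must be non-empty")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


if __name__ == "__main__":
    # Illustrative 90/10 split mirroring the skew described above
    labels = ["type2"] * 9 + ["type1"]
    print(majority_baseline(labels))  # ('type2', 0.9)
```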

I use Apple's vanilla script to prepare and process the data (derived from here):

import turicreate as tc

# Load the data
data = tc.SFrame('data.csv')

# Note: to investigate why predictions fail in Swift, the model is deliberately over-fitted:
# random_split(1.0) puts every row in train_data, so test_data is empty
train_data, test_data = data.random_split(1.0)
print(train_data)

# Automatically picks the right model based on your data.
model = tc.classifier.create(train_data, target='label', features=['q','w','e','r','t','y','u','i','o','p'])

# Generate predictions (class/probabilities etc.), contained in an SFrame.
predictions = model.classify(train_data)

# Evaluate the model, with the results stored in a dictionary
results = model.evaluate(train_data)

print("***********")
print(results['accuracy'])
print("***********")
model.export_coreml("MyModel.mlmodel")

Note: for now, the model is trained (and therefore over-fitted) on the whole dataset. Convergence seems OK:

PROGRESS: Model selection based on validation accuracy:
PROGRESS: ---------------------------------------------
PROGRESS: BoostedTreesClassifier          : 1.0
PROGRESS: RandomForestClassifier          : 0.9032258064516129
PROGRESS: DecisionTreeClassifier          : 0.9032258064516129
PROGRESS: SVMClassifier                   : 1.0
PROGRESS: LogisticClassifier              : 1.0
PROGRESS: ---------------------------------------------
PROGRESS: Selecting BoostedTreesClassifier based on validation set performance.

The classification works as expected in Python (although over-fitted). However, when I use the .mlmodel in my Swift code, it always returns the same label, 'type2', no matter the input. The rule is: type1 = only "+" and "e"; type2 = every other combination.
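The stated rule can be written as a plain reference function, which is handy for checking any model's output against the expected label. A minimal sketch; `rule_label` is a hypothetical name, and it takes the 10 feature characters as a list.

```python
# Characters allowed in a type1 row: '+' and the padding 'e'
ALLOWED_TYPE1 = {"+", "e"}


def rule_label(features):
    """Reference rule from the question: type1 if every character is
    '+' or padding 'e', otherwise type2."""
    return "type1" if set(features) <= ALLOWED_TYPE1 else "type2"


if __name__ == "__main__":
    print(rule_label("+,+,+,+,e,e,e,e,e,e".split(",")))  # type1
    print(rule_label("-,+,+,e,e,e,e,e,e,e".split(",")))  # type2
```

An all-padding row (`e,e,...,e`) also counts as type1 under this rule, which matches the raw data.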

I also tried text_classifier, but the results were far less accurate.

I have no idea what I'm doing wrong.

In case anyone wants to reproduce this, here is the raw data (it is a small dataset):

q,w,e,r,t,y,u,i,o,p,label
-,+,+,e,e,e,e,e,e,e,type2
-,+,e,e,e,e,e,e,e,e,type2
+,+,-,+,e,e,e,e,e,e,type2
-,-,+,-,e,e,e,e,e,e,type2
+,e,e,e,e,e,e,e,e,e,type1
-,-,+,+,e,e,e,e,e,e,type2
+,-,+,-,e,e,e,e,e,e,type2
-,+,-,-,e,e,e,e,e,e,type2
+,-,-,+,e,e,e,e,e,e,type2
+,+,e,e,e,e,e,e,e,e,type1
+,+,-,-,e,e,e,e,e,e,type2
-,+,-,e,e,e,e,e,e,e,type2
-,-,-,-,e,e,e,e,e,e,type2
-,-,e,e,e,e,e,e,e,e,type2
-,-,-,e,e,e,e,e,e,e,type2
+,+,+,+,e,e,e,e,e,e,type1
+,-,+,+,e,e,e,e,e,e,type2
+,+,+,e,e,e,e,e,e,e,type1
+,-,-,-,e,e,e,e,e,e,type2
+,-,-,e,e,e,e,e,e,e,type2
+,+,+,-,e,e,e,e,e,e,type2
+,-,e,e,e,e,e,e,e,e,type2
+,-,+,e,e,e,e,e,e,e,type2
-,-,+,e,e,e,e,e,e,e,type2
+,+,-,e,e,e,e,e,e,e,type2
e,e,e,e,e,e,e,e,e,e,type1
-,+,+,-,e,e,e,e,e,e,type2
-,-,-,+,e,e,e,e,e,e,type2
-,e,e,e,e,e,e,e,e,e,type2
-,+,+,+,e,e,e,e,e,e,type2
-,+,-,+,e,e,e,e,e,e,type2
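To confirm that the labels above really follow the stated rule, and to see the exact class balance, the rows can be checked with the standard `csv` module. A minimal sketch: the data is pasted verbatim, and `rule_label`, `load_rows` and `check` are hypothetical helpers.

```python
import csv
import io
from collections import Counter

# The raw data from the question, pasted verbatim (header + 31 rows)
RAW = """q,w,e,r,t,y,u,i,o,p,label
-,+,+,e,e,e,e,e,e,e,type2
-,+,e,e,e,e,e,e,e,e,type2
+,+,-,+,e,e,e,e,e,e,type2
-,-,+,-,e,e,e,e,e,e,type2
+,e,e,e,e,e,e,e,e,e,type1
-,-,+,+,e,e,e,e,e,e,type2
+,-,+,-,e,e,e,e,e,e,type2
-,+,-,-,e,e,e,e,e,e,type2
+,-,-,+,e,e,e,e,e,e,type2
+,+,e,e,e,e,e,e,e,e,type1
+,+,-,-,e,e,e,e,e,e,type2
-,+,-,e,e,e,e,e,e,e,type2
-,-,-,-,e,e,e,e,e,e,type2
-,-,e,e,e,e,e,e,e,e,type2
-,-,-,e,e,e,e,e,e,e,type2
+,+,+,+,e,e,e,e,e,e,type1
+,-,+,+,e,e,e,e,e,e,type2
+,+,+,e,e,e,e,e,e,e,type1
+,-,-,-,e,e,e,e,e,e,type2
+,-,-,e,e,e,e,e,e,e,type2
+,+,+,-,e,e,e,e,e,e,type2
+,-,e,e,e,e,e,e,e,e,type2
+,-,+,e,e,e,e,e,e,e,type2
-,-,+,e,e,e,e,e,e,e,type2
+,+,-,e,e,e,e,e,e,e,type2
e,e,e,e,e,e,e,e,e,e,type1
-,+,+,-,e,e,e,e,e,e,type2
-,-,-,+,e,e,e,e,e,e,type2
-,e,e,e,e,e,e,e,e,e,type2
-,+,+,+,e,e,e,e,e,e,type2
-,+,-,+,e,e,e,e,e,e,type2
"""

FEATURES = ["q", "w", "e", "r", "t", "y", "u", "i", "o", "p"]


def rule_label(features):
    # type1 iff only '+' and padding 'e' appear (the rule stated in the question)
    return "type1" if set(features) <= {"+", "e"} else "type2"


def load_rows(text):
    """Parse the CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def check(rows):
    """Return (rows whose label disagrees with the rule, label counts)."""
    mismatches = [r for r in rows if rule_label([r[f] for f in FEATURES]) != r["label"]]
    counts = Counter(r["label"] for r in rows)
    return mismatches, counts


if __name__ == "__main__":
    rows = load_rows(RAW)
    mismatches, counts = check(rows)
    print(len(rows), dict(counts), len(mismatches))
```

On these 31 rows the check reports 26 type2 and 5 type1 rows (roughly 84% type2) and no mismatches, so the labels themselves are consistent with the rule.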

And the Swift code:

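import CoreML

// Helper: build the generated input class from an array of feature strings
extension MyModelInput {
    public convenience init(v: [String]) {
        // Fields 0...9 map positionally to the feature columns q...p
        self.init(q: v[0], w: v[1], e: v[2], r: v[3], t: v[4], y: v[5], u: v[6], i: v[7], o: v[8], p: v[9])
    }
}
    let classifier = MyModel()
    // Each test string has 11 fields; only the first 10 are used as features
    let data = ["-,+,+,e,e,e,e,e,e,e,e", "-,+,e,e,e,e,e,e,e,e,e", "+,+,-,+,e,e,e,e,e,e,e", "-,-,+,-,e,e,e,e,e,e,e","+,e,e,e,e,e,e,e,e,e,e"]
    data.forEach { (tt) in
        let gg = MyModelInput(v: tt.components(separatedBy: ","))
        // try? silently discards any prediction error
        if let prediction = try? classifier.prediction(input: gg) {
            print(prediction.labelProbability)
        }
    }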

The Python script saves a MyModel.mlmodel file, which you can add to any Xcode project and use with the code above.
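When comparing the two sides, it can help to reproduce on the Python side exactly which feature values the Swift helper produces. A minimal Python mirror of `MyModelInput(v:)` follows: it splits on commas and assigns fields 0 to 9 positionally to q..p, so the 11th field in each test string is ignored. `swift_style_input` is a hypothetical name. The resulting dict has the shape that coremltools' `MLModel.predict` accepts on macOS, but using it that way is an assumption, not something from the question.

```python
FEATURES = ["q", "w", "e", "r", "t", "y", "u", "i", "o", "p"]


def swift_style_input(row: str) -> dict[str, str]:
    """Mirror of the Swift MyModelInput(v:) helper: split on ',' and
    assign fields 0..9 positionally to q..p; extra fields are dropped."""
    fields = row.split(",")
    if len(fields) < len(FEATURES):
        raise ValueError(f"expected at least {len(FEATURES)} fields, got {len(fields)}")
    # zip stops at the shorter sequence, so an 11th field is ignored
    return dict(zip(FEATURES, fields))


if __name__ == "__main__":
    row = "+,e,e,e,e,e,e,e,e,e,e"  # 11 fields, as in the Swift `data` array
    print(swift_style_input(row))
```

By the stated rule, this particular row (`+` followed by padding) should come out as type1.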

Note: the Python side works fine. For example, this row:

+---+---+---+---+---+---+---+---+---+---+-------+
| q | w | e | r | t | y | u | i | o | p | label |
+---+---+---+---+---+---+---+---+---+---+-------+
| + | + | + | + | e | e | e | e | e | e | type1 |
+---+---+---+---+---+---+---+---+---+---+-------+

is labelled as expected. But with the Swift code, the same row comes out as type2. This is driving me berserk (and yes, I checked that the new .mlmodel replaces the old one whenever I create a new version, including in Xcode).

Source: https://stackoverflow.com/questions/57921856/apple-turicreate-always-return-the-same-label

Tags

swift

machine-learning

classification

turi-create