ML.NET Show which score relates to which label

Question

With ML.NET I am using a classifier for text interpretation. The prediction has a score column as float[] and a predicted label. This works in that the highest score relates to the predicted label, but the other scores are just floats in no particular order. How do I know which score relates to which label? How can I see which label has the second-highest score?

For example, I get this back: 0.00005009 0.00893076 0.1274763 0.6209787 0.2425644

The 0.6 is my predicted label, but I also need to see which label the 0.24 is so I can see why it is confused.

Labels are text strings such as "Greeting" or "Joke" which were Dictionarized in the pipeline, so maybe that is why they aren't in the correct order?

Is there any way in ML.NET to link the two together? To show which score relates to which label?

Answer 1:

You can get the labels corresponding to the scores using the following code:

string[] scoreLabels;
model.TryGetScoreLabelNames(out scoreLabels);

Additional details can be found here and here.

Note that this may change with the upcoming ML.NET 0.6 APIs. These APIs will expose the Schema directly and enable getting this information (along with other useful information). This might be similar to how TryGetScoreLabelNames works today.

Answer 2:

For newer versions this one will do the trick as TryGetScoreLabelNames has been removed:

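    // Requires: using System; using System.Collections.Generic; using System.Linq;
    //           using Microsoft.ML; using Microsoft.ML.Data;
    var scoreEntries = GetSlotNames(predictor.OutputSchema, "Score");

    ...

    private static List<string> GetSlotNames(DataViewSchema schema, string name)
    {
        // GetColumnOrNull returns null when the schema has no column with this name.
        var column = schema.GetColumnOrNull(name);
        if (column == null)
        {
            throw new ArgumentException($"Column '{name}' was not found in the schema.", nameof(name));
        }

        // The slot names of the Score column are the labels, in the same order as the scores.
        var slotNames = new VBuffer<ReadOnlyMemory<char>>();
        column.Value.GetSlotNames(ref slotNames);

        // Convert each ReadOnlyMemory<char> slot name to a string.
        return slotNames.DenseValues().Select(slotName => slotName.ToString()).ToList();
    }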

(Source: http://www.programmersought.com/article/3762753756/)

Of course this needs more error handling etc.
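
Once the label names are available, answering the "second highest" part of the question is a matter of pairing each score with the label at the same index and sorting. A minimal sketch, assuming the label list and the score array are in the same order; the ScoreRanking class and its Rank method are hypothetical helper names, not part of ML.NET:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of ML.NET.
public static class ScoreRanking
{
    // Pairs each score with the label at the same index and sorts best-first.
    public static List<(string Label, float Score)> Rank(IReadOnlyList<string> labels, float[] scores)
    {
        if (labels.Count != scores.Length)
        {
            throw new ArgumentException("labels and scores must have the same length.");
        }

        return labels
            .Zip(scores, (label, score) => (Label: label, Score: score))
            .OrderByDescending(pair => pair.Score)
            .ToList();
    }
}
```

With the scores from the question, Rank(labels, scores)[0] would be the entry scored 0.6209787 (the predicted label) and Rank(labels, scores)[1] the entry scored 0.2425644, i.e. the label the model is confusing it with.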

Answer 3:

This problem can be avoided when building the pipeline. Ensure that your one-hot encoded or featurized columns have distinct column names. Both the input and the output columns will still be present in the DataView, so you can build your output model accordingly.

For example:

When building the pipeline:

var pipeline = mlContext.Transforms.Categorical.OneHotEncoding(outputColumnName: "label_hotencoded", inputColumnName: "label")
// Append other processing in the pipeline 
.Append(...)
// Override the default label column name ("label") in the trainer and/or calibrator so that it points to your one-hot encoded label column
.Append(mlContext.BinaryClassification.Trainers.FastTree(labelColumnName: "label_hotencoded"))
.Append(mlContext.BinaryClassification.Calibrators.Platt(labelColumnName: "label_hotencoded"));

You can now build your output model POCO class to receive the values you want:

public class OutputModel
{
    [ColumnName("label")]
    public string Label { get; set; }

    [ColumnName("Score")]
    public float Score { get; set; }
}

This way your output columns are human-readable and at the same time your input columns to the trainer are in the correct format.

NOTE: This technique can be used with other columns in your data too. Just ensure you use distinct column names when transforming columns in the pipeline and pass in the correct column name when concatenating to "Features". Your output model class can then be written to extract any values you want.

Source: https://stackoverflow.com/questions/52598955/ml-net-show-which-score-relates-to-which-label

Tags

artificial-intelligence

ml.net