How do I create gold data for TextCategorizer training?

慢半拍i 2021-01-05 08:07

I want to train a TextCategorizer model with the following (text, label) pairs.

Label COLOR:

  • The door is brown.
  • The barn
1 Answer
  • 2021-01-05 08:56

    According to the train_textcat.py example, the gold annotation for each text should look like {'cats': {'ANIMAL': 0, 'COLOR': 1}} if you want to train a multi-label model. If you have only two classes, you can instead train a single label: use {'cats': {'ANIMAL': 1}} for texts labeled ANIMAL and {'cats': {'ANIMAL': 0}} for texts labeled COLOR.
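
    As a rough sketch of how (text, label) pairs could be turned into that format: the helper name make_gold and the ANIMAL sentence below are made up for illustration, and only the {'cats': {...}} layout comes from the example mentioned above.

```python
def make_gold(pairs, labels):
    """Turn (text, label) pairs into (text, {"cats": {...}}) tuples.

    Hypothetical helper: every known label gets an explicit value,
    1.0 for the label assigned to the text and 0.0 for all others.
    """
    train_data = []
    for text, label in pairs:
        if label not in labels:
            raise ValueError("unknown label: %r" % (label,))
        cats = {name: float(name == label) for name in labels}
        train_data.append((text, {"cats": cats}))
    return train_data


labels = ["ANIMAL", "COLOR"]
pairs = [
    ("The door is brown.", "COLOR"),
    ("The dog is barking.", "ANIMAL"),  # invented sentence, not from the question
]
train_data = make_gold(pairs, labels)
print(train_data[0])
```

    Each label would still need to be registered on the component with textcat.add_label(...) before training.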

    You can use the following minimal working example for single-category text classification:

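    import spacy

    # spaCy 2.x API; 'en' is the shortcut link for an installed English model.
    nlp = spacy.load('en')

    # Each example is (text, {"cats": {label: value}}): 1 = label applies, 0 = it does not.
    train_data = [
        (u"That was very bad", {"cats": {"POSITIVE": 0}}),
        (u"it is so bad", {"cats": {"POSITIVE": 0}}),
        (u"so terrible", {"cats": {"POSITIVE": 0}}),
        (u"I like it", {"cats": {"POSITIVE": 1}}),
        (u"It is very good.", {"cats": {"POSITIVE": 1}}),
        (u"That was great!", {"cats": {"POSITIVE": 1}}),
    ]

    # Add a text classifier at the end of the pipeline and register its label.
    textcat = nlp.create_pipe('textcat')
    nlp.add_pipe(textcat, last=True)
    textcat.add_label('POSITIVE')

    # Train only textcat so the pretrained components are not re-initialized.
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'textcat']
    with nlp.disable_pipes(*other_pipes):
        optimizer = nlp.begin_training()
        for itn in range(100):
            for text, gold in train_data:
                nlp.update([text], [gold], sgd=optimizer)

    # doc.cats maps each label to its predicted score.
    doc = nlp(u'It is good.')
    print(doc.cats)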
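
    The printed doc.cats is a dict that maps each label to a score. One possible way to turn it into a decision is sketched below; the helper top_label, the 0.5 threshold, and the example scores are assumptions for illustration, not part of spaCy or of real model output.

```python
def top_label(cats, threshold=0.5):
    """Return the highest-scoring label, or None if no score reaches the threshold.

    Hypothetical helper; `cats` is a dict shaped like doc.cats.
    """
    if not cats:
        return None
    label, score = max(cats.items(), key=lambda item: item[1])
    return label if score >= threshold else None


# Made-up scores, standing in for doc.cats
print(top_label({"POSITIVE": 0.93}))
print(top_label({"POSITIVE": 0.10}))
```

    With a single POSITIVE label, a result of None would be read as the negative class.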
    