Format of the input dataset for Google AutoML Natural Language multi-label text classification

大憨熊 submitted on 2019-12-06 04:39:31
What format should the input dataset have for Google AutoML Natural Language multi-label text classification? I know that for multi-class classification I need one column of text and another column for labels, where the labels column includes one label per row. I have multiple labels for each text and I want to do multi-label classification. I tried having one column per label with one-hot encoding, but I got this error message: "Max 1000 labels supported. Found 9823 labels." It was very confusing at first, but later I managed to find the format in the documentation, which is a CSV file like: text1,