text-classification

Generating dictionaries to categorize tweets into pre-defined categories using NLTK

给你一囗甜甜゛ submitted on 2020-06-24 12:21:19
Question: I have a list of Twitter users (screen_names) and I need to categorise them into 7 pre-defined categories - Education, Art, Sports, Business, Politics, Automobiles, Technology - based on their interest area. I have extracted the last 100 tweets of each user in Python and created a corpus for each user after cleaning the tweets. As mentioned here, Tweet classification into multiple categories on (Unsupervised data/tweets): I am trying to generate dictionaries of common words under each category so
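
A minimal sketch of the dictionary-scoring idea the question is driving at, assuming hand-seeded keyword sets; the category seeds, the sample text, and the `categorize` helper below are all hypothetical:

```python
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
# one-time setup: nltk.download("punkt"); nltk.download("stopwords")

# Hypothetical seed dictionaries -- in practice these would be grown from
# example tweets per category, e.g. by taking the top words of an
# nltk.FreqDist over each category's corpus.
category_words = {
    "Sports": {"game", "team", "match", "league", "score"},
    "Politics": {"election", "vote", "senate", "policy", "campaign"},
    "Technology": {"software", "startup", "code", "cloud", "ai"},
    # ... Education, Art, Business, Automobiles seeded the same way
}

stop = set(stopwords.words("english"))

def categorize(user_corpus):
    """Count dictionary hits in a user's cleaned tweets, return the best category."""
    tokens = [t.lower() for t in word_tokenize(user_corpus) if t.isalpha()]
    tokens = [t for t in tokens if t not in stop]
    scores = {cat: sum(t in words for t in tokens)
              for cat, words in category_words.items()}
    return max(scores, key=scores.get)

print(categorize("Great match today, the team really earned that win"))  # -> Sports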

Spacy TextCat Score in MultiLabel Classification

≡放荡痞女 submitted on 2020-06-17 09:39:10
Question: In spaCy's text classification train_textcat example, there are two labels specified, Positive and Negative. Hence the cats score is represented as cats = [{"POSITIVE": bool(y), "NEGATIVE": not bool(y)} for y in labels]. I am working with multilabel classification, which means I have more than two labels to tag in one text. I have added my labels as textcat.add_label("CONSTRUCTION") and to specify the cats score I have used cats = [{"POSITIVE": bool(y), "NEGATIVE": not bool(y)} for y in labels]
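
For the multilabel case, each training example's cats dict should carry one boolean per label instead of the two-way POSITIVE/NEGATIVE split. A sketch against the spaCy v2 API that the train_textcat example uses; the label set and training data are made up:

```python
import spacy

nlp = spacy.blank("en")
# exclusive_classes=False makes the textcat multilabel (spaCy v2 API,
# matching the train_textcat example the question refers to)
textcat = nlp.create_pipe("textcat", config={"exclusive_classes": False})
nlp.add_pipe(textcat)

for label in ("CONSTRUCTION", "FINANCE", "LEGAL"):  # hypothetical label set
    textcat.add_label(label)

# One boolean per label for every text, not POSITIVE/NEGATIVE pairs
train_data = [
    ("New tower permits approved downtown",
     {"cats": {"CONSTRUCTION": True, "FINANCE": False, "LEGAL": True}}),
    ("Quarterly earnings beat estimates",
     {"cats": {"CONSTRUCTION": False, "FINANCE": True, "LEGAL": False}}),
]

optimizer = nlp.begin_training()
for text, annotations in train_data:
    nlp.update([text], [annotations], sgd=optimizer)
```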

How to use Hugging Face Transformers library in Tensorflow for text classification on custom data?

谁都会走 submitted on 2020-05-29 03:28:32
Question: I am trying to do binary text classification on custom data (in CSV format) using the different transformer architectures that the Hugging Face 'Transformers' library offers. I am using this TensorFlow blog post as a reference. I am loading the custom dataset into 'tf.data.Dataset' format using the following code:

def get_dataset(file_path, **kwargs):
    dataset = tf.data.experimental.make_csv_dataset(
        file_path,
        batch_size=5,  # Artificially small to make examples easier to show.
        na_value="",
        num
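
A condensed sketch of the step after loading, skipping the make_csv_dataset plumbing and assuming a transformers version (3.x or later) in which the tokenizer is directly callable; the toy texts and labels stand in for the CSV rows:

```python
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def encode(texts, labels):
    # Tokenize to input_ids / attention_mask tensors and pair with labels
    enc = tokenizer(list(texts), padding=True, truncation=True,
                    max_length=128, return_tensors="tf")
    return tf.data.Dataset.from_tensor_slices((dict(enc), labels))

texts = ["great product", "terrible service"]  # stand-in for the CSV rows
labels = [1, 0]
ds = encode(texts, labels).batch(2)

model = TFBertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
model.compile(optimizer=tf.keras.optimizers.Adam(3e-5),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])
model.fit(ds, epochs=1)
```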

Feature hashing in R for Text classification

佐手、 submitted on 2020-02-24 09:14:49
Question: I'm trying to implement feature hashing in R to help me with a text classification problem, but I'm not sure if I'm doing it the way it should be done. Part of my code is based on this post: Hashing function for mapping integers to a given range?. My code:

random.data = function(n = 200, wlen = 40, ncol = 10){
  random.word = function(n){
    paste0(sample(c(letters, 0:9), n, TRUE), collapse = '')
  }
  matrix(replicate(n, random.word(wlen)), ncol = ncol)
}
feature_hash = function(doc, N){
  doc = as.matrix
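
The question is in R, but the hashing trick itself is language-agnostic; here is a minimal Python reference implementation to sanity-check a hand-rolled version against:

```python
def feature_hash(tokens, N):
    """The hashing trick: map each token into one of N buckets and count hits.
    Note: Python's hash() of strings is salted per process, so set
    PYTHONHASHSEED (or use hashlib) for results that are stable across runs."""
    vec = [0] * N
    for tok in tokens:
        vec[hash(tok) % N] += 1
    return vec

print(feature_hash("the quick brown fox jumps over the lazy dog".split(), N=8))
```

scikit-learn's HashingVectorizer implements the same idea with a hash that is stable across runs, and on the R side the FeatureHashing and text2vec packages offer equivalents.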

Text Classification (Spacy) in place of Gensim

佐手、 submitted on 2020-01-25 06:47:09
Question: Hello, I am using the gensim library for semantic text similarity classification, but I fail to load the gensim data file, and the program takes too much time to execute when running cells in a Jupyter notebook. So my question is: can we use the spaCy library to overcome this type of error, and can we find the similarity between two document files? I have seen TF-IDF used for semantic similarity. Here is the error: MemoryError: Unable to allocate 3.35 GiB for an array with shape (3000000, 300)
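
spaCy can compute document similarity without loading a 3-million-word word2vec matrix into memory; a sketch assuming the medium English model (the file names are hypothetical):

```python
import spacy

# The md/lg English models ship with word vectors (the sm model does not);
# one-time setup: python -m spacy download en_core_web_md
nlp = spacy.load("en_core_web_md")

text1 = open("doc1.txt").read()   # hypothetical document files
text2 = open("doc2.txt").read()

doc1, doc2 = nlp(text1), nlp(text2)
# Doc.similarity returns the cosine similarity of the averaged word vectors
print(doc1.similarity(doc2))
```

If staying with gensim, KeyedVectors.load_word2vec_format(path, limit=500000) loads only the first 500k vectors, which avoids the 3.35 GiB allocation at the cost of a smaller vocabulary.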

R: problems applying LIME to quanteda text model

时光毁灭记忆、已成空白 submitted on 2020-01-14 02:43:40
Question: It's a modified version of my previous question: I'm trying to run LIME on my quanteda text model that feeds off the Trump & Clinton tweets data. I run it following an example given by Thomas Pedersen in his Understanding LIME and a useful SO answer provided by @Weihuang Wong:

library(dplyr)
library(stringr)
library(quanteda)
library(lime)

# data prep
tweet_csv <- read_csv("tweets.csv")

# creating corpus and dfm for train and test sets
get_matrix <- function(df){
  corpus <- quanteda::corpus(df)
  dfm
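
The fix here is specific to the R lime package, but the explainer-plus-probability-function contract is the same in every port; a hypothetical Python sketch of that workflow, with stand-in tweets and an sklearn pipeline in place of the quanteda model:

```python
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical stand-ins for the Trump & Clinton tweet data
texts = ["make america great again", "stronger together",
         "build the wall", "love trumps hate"]
labels = [0, 1, 0, 1]  # 0 = Trump, 1 = Clinton

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# LIME only needs a raw-text -> class-probability function; the R lime
# package expects the same contract via its model_type/predict_model methods
explainer = LimeTextExplainer(class_names=["Trump", "Clinton"])
exp = explainer.explain_instance("make america stronger together",
                                 model.predict_proba, num_features=3)
print(exp.as_list())
```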

How to do text classification using TensorFlow?

喜你入骨 submitted on 2020-01-11 11:57:09
Question: I am new to TensorFlow and machine learning. I am facing issues writing TensorFlow code that does text classification similar to what I tried using scikit-learn. My main problems are vectorising the dataset and providing the input to the TensorFlow layers. I do remember being successful in one-hot encoding the labels, but the TensorFlow layer ahead did not accept the created array. Please note, I have read the majority of answered text-classification questions on Stack Overflow
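
One common fix for "the layer did not accept the array" is to feed integer token sequences into an Embedding layer rather than a one-hot matrix; a minimal Keras sketch with made-up toy data:

```python
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Hypothetical toy data standing in for the real dataset
texts = ["cheap deals click now", "meeting moved to friday",
         "win a free prize", "lunch at noon?"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = ham

# Vectorise: words -> integer ids, padded to a fixed length
tok = Tokenizer(num_words=1000, oov_token="<unk>")
tok.fit_on_texts(texts)
X = pad_sequences(tok.texts_to_sequences(texts), maxlen=10)
y = tf.convert_to_tensor(labels)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=1000, output_dim=16, input_length=10),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, verbose=0)
print(model.predict(pad_sequences(tok.texts_to_sequences(["free prize now"]), maxlen=10)))
```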

Save progress between multiple instances of partial_fit in Python SGDClassifier

强颜欢笑 submitted on 2020-01-07 02:54:36
Question: I've successfully followed this example for my own text classification script. The problem is that I'm not looking to process pieces of a huge but existing data set in a loop of partial_fit calls, like they do in the example. I want to be able to add data as it becomes available, even if I shut down my Python script in the meantime. Ideally I'd like to do something like this:

sometime in 2015:
model2015 = partial_fit(dataset2015)
save_to_file(model2015)
shut down my Python script
sometime in 2016:
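
This pattern works out of the box with scikit-learn: a fitted estimator can be serialized between sessions and partial_fit resumed later. A sketch with hypothetical batches; the file name and the choice of HashingVectorizer are illustrative:

```python
import joblib
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# HashingVectorizer is stateless, so the same transform works in every session
vec = HashingVectorizer(n_features=2**18)

# --- 2015 session (hypothetical batch) ---
clf = SGDClassifier()
X, y = vec.transform(["buy cheap pills now"]), [1]
# classes= must be given on the first partial_fit call and list every label
clf.partial_fit(X, y, classes=[0, 1])
joblib.dump(clf, "model.joblib")  # persist the fitted weights to disk

# --- 2016 session, possibly a new process ---
clf = joblib.load("model.joblib")
X, y = vec.transform(["lunch at noon?"]), [0]
clf.partial_fit(X, y)  # resumes exactly where the 2015 run left off
joblib.dump(clf, "model.joblib")
```

The stateless vectorizer matters: a fitted CountVectorizer or TfidfVectorizer would also have to be saved and reloaded, and could not grow its vocabulary across sessions.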