nlp

spaCy custom NER is not returning any entity

Submitted by 主宰稳场 on 2021-02-11 13:24:58
Question: I am trying to train a spaCy model to recognize a few custom NER labels. The training data is given below; it mostly covers recognizing a few server models, dates in the FY format, and types of HDD: TRAIN_DATA = [('Send me the number of units shipped in FY21 for A566TY server', {'entities': [(39, 42, 'DateParse'),(48,53,'server')]}), ('Send me the number of units shipped in FY-21 for A5890Y server', {'entities': [(39, 43, 'DateParse'),(49,53,'server')]}), ('How many systems sold with 3.5 inch
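A minimal training-loop sketch for this data (spaCy 2.x API, which the TRAIN_DATA format above matches; spaCy 3 wraps data in Example objects instead; label names and iteration count are illustrative). One likely culprit worth checking: spaCy entity offsets are end-exclusive, so (39, 42) covers only "FY2", and spans that don't line up with token boundaries can be silently dropped during training, which produces exactly this "no entities returned" symptom. The offsets below are corrected for the first example.

    import random
    import spacy

    # End-exclusive character offsets, corrected: 'FY21' is (39, 43),
    # 'A566TY' is (48, 54). Misaligned spans may be silently skipped.
    TRAIN_DATA = [
        ('Send me the number of units shipped in FY21 for A566TY server',
         {'entities': [(39, 43, 'DateParse'), (48, 54, 'server')]}),
    ]

    nlp = spacy.blank('en')
    ner = nlp.create_pipe('ner')
    nlp.add_pipe(ner)
    for _, annotations in TRAIN_DATA:
        for _, _, label in annotations['entities']:
            ner.add_label(label)

    optimizer = nlp.begin_training()
    for _ in range(30):
        random.shuffle(TRAIN_DATA)
        losses = {}
        for text, annotations in TRAIN_DATA:
            nlp.update([text], [annotations], sgd=optimizer, losses=losses)

    doc = nlp('Units shipped in FY21 for the A566TY server')
    print([(ent.text, ent.label_) for ent in doc.ents])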

Using R, how to get the “diff” of two strings?

Submitted by *爱你&永不变心* on 2021-02-11 13:10:47
Question: The base R function diff computes a first difference, useful for lagged-data comparisons. I am looking for the GNU diff function, accessible from R: https://www.computerhope.com/unix/udiff.htm This function is useful for version control, but also in natural language processing, to identify changes or edits between two similar text elements; it is also an underlying engine of git and so on. Ideally the function would be gnudiff(text1,text2) and, if tied to quanteda or another library, that
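The question asks for R, but as a cross-language illustration of the behavior being requested, Python's standard-library difflib produces GNU-style unified diff output (the gnudiff(text1, text2) signature in the question is hypothetical):

    import difflib

    # Two near-identical texts; unified_diff yields GNU 'diff -u' style output.
    text1 = 'The quick brown fox\njumps over the lazy dog\n'
    text2 = 'The quick brown fox\nleaps over the lazy dog\n'

    diff = difflib.unified_diff(
        text1.splitlines(keepends=True),
        text2.splitlines(keepends=True),
        fromfile='text1',
        tofile='text2',
    )
    print(''.join(diff))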

How to detect the dominant language of a text word?

Submitted by 百般思念 on 2021-02-11 12:17:38
Question: It works well for a full string, but it is not working for me on a single word. I am building a search feature: as soon as the user types any three characters, I want to detect which language they are typing in. I can accept that it might not work on a word like 'detect', but I expect it to work on the word 'Islam'. let tagger = NSLinguisticTagger(tagSchemes:[.tokenType, .language, .lexicalClass, .nameType, .lemma], options: 0) func determineLanguage(for text: String) { tagger.string = text let
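The snippet above is Swift (NSLinguisticTagger); as a cross-language illustration of the underlying issue, here is the same experiment in Python using the third-party langdetect package (an assumption for illustration, not part of the question). Dominant-language detection on inputs of only a few characters is statistically unreliable whichever library is used, so very short words may fail or flip between runs.

    # pip install langdetect  (third-party package, assumed for illustration)
    from langdetect import DetectorFactory, detect
    from langdetect.lang_detect_exception import LangDetectException

    DetectorFactory.seed = 0  # detection is randomized by default; pin it

    for text in ['Isl', 'Islam', 'This is clearly an English sentence.']:
        try:
            print(text, '->', detect(text))
        except LangDetectException as exc:  # too short / no usable features
            print(text, '-> detection failed:', exc)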

Binary Classification using the N-Grams

Submitted by 萝らか妹 on 2021-02-11 06:51:48
Question: I want to extract the n-grams of tweets from two groups of users (0/1) to build a CSV file like the following for a binary classifier:

user_tweets, ngram1, ngram2, ngram3, ..., label
1, 0.0, 0.0, 0.0, ..., 0
2, 0.0, 0.0, 0.0, ..., 1
...

My question is whether I should first extract the important n-grams of the two groups and then score each n-gram found in the user's tweets, or is there an easier way to do this? Source: https://stackoverflow.com/questions/66092089/binary-classification-using-the
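One straightforward route, sketched below with assumed stand-in data: let scikit-learn's vectorizer build the n-gram vocabulary from all tweets and score every user against it in one pass, then attach the labels and write the CSV. "Important" n-grams can be selected afterwards (e.g. by a chi-squared test) rather than hand-picked first. get_feature_names_out assumes a recent scikit-learn.

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Illustrative stand-in data: one concatenated tweet string per user.
    user_tweets = ['free gift click now win', 'new paper on parsing released']
    labels = [0, 1]

    vec = TfidfVectorizer(ngram_range=(1, 3))   # unigrams through trigrams
    X = vec.fit_transform(user_tweets)

    # One row per user, one column per n-gram, plus an id and the label.
    df = pd.DataFrame(X.toarray(), columns=vec.get_feature_names_out())
    df.insert(0, 'user_tweets', range(1, len(labels) + 1))
    df['label'] = labels
    df.to_csv('ngram_features.csv', index=False)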

How to get phrase counts in spaCy PhraseMatcher

Submitted by 若如初见. on 2021-02-10 22:37:35
Question: I am trying spaCy's PhraseMatcher. I have used an adaptation of the example given on the website, shown below. color_patterns = [nlp(text) for text in ('red', 'green', 'yellow')] product_patterns = [nlp(text) for text in ('boots', 'coats', 'bag')] material_patterns = [nlp(text) for text in ('bat', 'yellow ball')] matcher = PhraseMatcher(nlp.vocab) matcher.add('COLOR', None, *color_patterns) matcher.add('PRODUCT', None, *product_patterns) matcher.add('MATERIAL', None, *material_patterns) doc =
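Continuing the snippet above, a minimal sketch of getting counts out of the matcher: each match is a (match_id, start, end) tuple, so a Counter keyed on the label and matched text yields phrase counts (the example document here is made up).

    from collections import Counter

    doc = nlp('I bought a yellow ball, red boots, and another yellow ball')
    counts = Counter()
    for match_id, start, end in matcher(doc):
        label = nlp.vocab.strings[match_id]  # 'COLOR' / 'PRODUCT' / 'MATERIAL'
        counts[(label, doc[start:end].text)] += 1
    print(counts)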

How to use BERT for long sentences? [duplicate]

Submitted by 别来无恙 on 2021-02-10 15:50:21
Question: This question already has answers here: How to use Bert for long text classification? (6 answers). Closed 5 months ago. I am trying to classify given text into news, clickbait, or other. The texts I have for training are long; the distribution of lengths is shown here. Now, the question is: should I trim the text in the middle and make it 512 tokens long? But I even have documents of circa 10,000 words, so won't I lose the gist through truncation? Or should I split my text into sub-texts of
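A hedged sketch of the splitting approach (model name and window parameters are illustrative assumptions): tokenize once with an overlapping sliding window so each chunk fits BERT's 512-token limit, classify every chunk, then pool the per-chunk predictions (mean of logits, max, or majority vote) into the document label. This avoids losing the gist that mid-document truncation would discard.

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

    long_text = 'word ' * 10_000            # stand-in for a ~10k-word document

    enc = tokenizer(
        long_text,
        max_length=512,
        truncation=True,
        stride=128,                         # tokens shared between neighbours
        return_overflowing_tokens=True,     # keep every window, not just the first
    )
    print(len(enc['input_ids']), 'windows of at most 512 tokens')
    # Run the classifier on each window, then average the logits (or vote)
    # across windows to get one prediction per document.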
