nlp

spaCy custom NER is not returning any entity

Submitted by 主宰稳场 on 2021-02-11 13:24:58
Question: I am trying to train a spaCy model to recognize a few custom NER labels. The training data is given below; it mostly covers recognizing a few server models, dates in the FY format, and types of HDD: TRAIN_DATA = [('Send me the number of units shipped in FY21 for A566TY server', {'entities': [(39, 42, 'DateParse'),(48,53,'server')]}), ('Send me the number of units shipped in FY-21 for A5890Y server', {'entities': [(39, 43, 'DateParse'),(49,53,'server')]}), ('How many systems sold with 3.5 inch
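A minimal training-loop sketch for this data (spaCy 2.x API, which the TRAIN_DATA format above matches; spaCy 3 wraps data in Example objects instead; label names and iteration count are illustrative). One likely culprit worth checking: spaCy entity offsets are end-exclusive, so (39, 42) covers only "FY2", and spans that don't line up with token boundaries can be silently dropped during training, which produces exactly this "no entities returned" symptom. The offsets below are corrected for the first example.

    import random
    import spacy

    # End-exclusive character offsets, corrected: 'FY21' is (39, 43),
    # 'A566TY' is (48, 54). Misaligned spans may be silently skipped.
    TRAIN_DATA = [
        ('Send me the number of units shipped in FY21 for A566TY server',
         {'entities': [(39, 43, 'DateParse'), (48, 54, 'server')]}),
    ]

    nlp = spacy.blank('en')
    ner = nlp.create_pipe('ner')
    nlp.add_pipe(ner)
    for _, annotations in TRAIN_DATA:
        for _, _, label in annotations['entities']:
            ner.add_label(label)

    optimizer = nlp.begin_training()
    for _ in range(30):
        random.shuffle(TRAIN_DATA)
        losses = {}
        for text, annotations in TRAIN_DATA:
            nlp.update([text], [annotations], sgd=optimizer, losses=losses)

    doc = nlp('Units shipped in FY21 for the A566TY server')
    print([(ent.text, ent.label_) for ent in doc.ents])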

Using R, how to get the “diff” of two strings?

Submitted by *爱你&永不变心* on 2021-02-11 13:10:47
Question: The base R function diff computes a first difference, useful for lagged-data comparisons. I am looking for the GNU diff function, accessible from R: https://www.computerhope.com/unix/udiff.htm This function is useful for version control, but also in natural language processing, to identify changes or edits between two similar text elements; it is also an underlying engine of git and so on. Ideally the function would be gnudiff(text1,text2) and, if tied to quanteda or another library, that
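The question asks for R, but as a cross-language illustration of the behavior being requested, Python's standard-library difflib produces GNU-style unified diff output (the gnudiff(text1, text2) signature in the question is hypothetical):

    import difflib

    # Two near-identical texts; unified_diff yields GNU 'diff -u' style output.
    text1 = 'The quick brown fox\njumps over the lazy dog\n'
    text2 = 'The quick brown fox\nleaps over the lazy dog\n'

    diff = difflib.unified_diff(
        text1.splitlines(keepends=True),
        text2.splitlines(keepends=True),
        fromfile='text1',
        tofile='text2',
    )
    print(''.join(diff))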

How to detect the dominant language of a text word?

Submitted by 百般思念 on 2021-02-11 12:17:38
Question: It works well for a full string, but it is not working for me on a single word. I am building a search feature: as soon as the user types any three characters, I want to detect which language they are typing in. I can accept that it might not work on a word like 'detect', but I expect it to work on the word 'Islam'. let tagger = NSLinguisticTagger(tagSchemes:[.tokenType, .language, .lexicalClass, .nameType, .lemma], options: 0) func determineLanguage(for text: String) { tagger.string = text let
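The snippet above is Swift (NSLinguisticTagger); as a cross-language illustration of the underlying issue, here is the same experiment in Python using the third-party langdetect package (an assumption for illustration, not part of the question). Dominant-language detection on inputs of only a few characters is statistically unreliable whichever library is used, so very short words may fail or flip between runs.

    # pip install langdetect  (third-party package, assumed for illustration)
    from langdetect import DetectorFactory, detect
    from langdetect.lang_detect_exception import LangDetectException

    DetectorFactory.seed = 0  # detection is randomized by default; pin it

    for text in ['Isl', 'Islam', 'This is clearly an English sentence.']:
        try:
            print(text, '->', detect(text))
        except LangDetectException as exc:  # too short / no usable features
            print(text, '-> detection failed:', exc)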

Binary Classification using the N-Grams

Submitted by 萝らか妹 on 2021-02-11 06:51:48
Question: I want to extract the n-grams of tweets from two groups of users (0/1) to build a CSV file like the following for a binary classifier:

user_tweets, ngram1, ngram2, ngram3, ..., label
1, 0.0, 0.0, 0.0, ..., 0
2, 0.0, 0.0, 0.0, ..., 1
...

My question is whether I should first extract the important n-grams of the two groups and then score each n-gram found in the user's tweets, or is there an easier way to do this? Source: https://stackoverflow.com/questions/66092089/binary-classification-using-the
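One straightforward route, sketched below with assumed stand-in data: let scikit-learn's vectorizer build the n-gram vocabulary from all tweets and score every user against it in one pass, then attach the labels and write the CSV. "Important" n-grams can be selected afterwards (e.g. by a chi-squared test) rather than hand-picked first. get_feature_names_out assumes a recent scikit-learn.

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Illustrative stand-in data: one concatenated tweet string per user.
    user_tweets = ['free gift click now win', 'new paper on parsing released']
    labels = [0, 1]

    vec = TfidfVectorizer(ngram_range=(1, 3))   # unigrams through trigrams
    X = vec.fit_transform(user_tweets)

    # One row per user, one column per n-gram, plus an id and the label.
    df = pd.DataFrame(X.toarray(), columns=vec.get_feature_names_out())
    df.insert(0, 'user_tweets', range(1, len(labels) + 1))
    df['label'] = labels
    df.to_csv('ngram_features.csv', index=False)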

How to get phrase counts in spaCy PhraseMatcher

Submitted by 若如初见. on 2021-02-10 22:37:35
Question: I am trying spaCy's PhraseMatcher. I have used an adaptation of the example given on the website, shown below. color_patterns = [nlp(text) for text in ('red', 'green', 'yellow')] product_patterns = [nlp(text) for text in ('boots', 'coats', 'bag')] material_patterns = [nlp(text) for text in ('bat', 'yellow ball')] matcher = PhraseMatcher(nlp.vocab) matcher.add('COLOR', None, *color_patterns) matcher.add('PRODUCT', None, *product_patterns) matcher.add('MATERIAL', None, *material_patterns) doc =
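Continuing the snippet above, a minimal sketch of getting counts out of the matcher: each match is a (match_id, start, end) tuple, so a Counter keyed on the label and matched text yields phrase counts (the example document here is made up).

    from collections import Counter

    doc = nlp('I bought a yellow ball, red boots, and another yellow ball')
    counts = Counter()
    for match_id, start, end in matcher(doc):
        label = nlp.vocab.strings[match_id]  # 'COLOR' / 'PRODUCT' / 'MATERIAL'
        counts[(label, doc[start:end].text)] += 1
    print(counts)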

How to use BERT for long sentences? [duplicate]

Submitted by 别来无恙 on 2021-02-10 15:50:21
Question: This question already has answers here: How to use Bert for long text classification? (6 answers). Closed 5 months ago. I am trying to classify given text into news, clickbait, or other. The texts I have for training are long; the distribution of lengths is shown here. Now, the question is: should I trim the text in the middle and make it 512 tokens long? But I even have documents of circa 10,000 words, so won't I lose the gist through truncation? Or should I split my text into sub-texts of
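A hedged sketch of the splitting approach (model name and window parameters are illustrative assumptions): tokenize once with an overlapping sliding window so each chunk fits BERT's 512-token limit, classify every chunk, then pool the per-chunk predictions (mean of logits, max, or majority vote) into the document label. This avoids losing the gist that mid-document truncation would discard.

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

    long_text = 'word ' * 10_000            # stand-in for a ~10k-word document

    enc = tokenizer(
        long_text,
        max_length=512,
        truncation=True,
        stride=128,                         # tokens shared between neighbours
        return_overflowing_tokens=True,     # keep every window, not just the first
    )
    print(len(enc['input_ids']), 'windows of at most 512 tokens')
    # Run the classifier on each window, then average the logits (or vote)
    # across windows to get one prediction per document.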
