spacy

Spacy custom sentence splitting

梦想与她 submitted on 2021-01-28 17:47:35
Question: I am using spaCy for custom sentence splitting and I need to parametrize the custom delimiter/word used for splitting, but I could not find how to pass it as an argument. Here is the function:

    # Manual or Custom Based
    def mycustom_boundary(docx):
        for token in docx[:-1]:
            if token.text == '...':
                docx[token.i+1].is_sent_start = True
        return docx

    # Adding the rule before parsing
    nlp.add_pipe(mycustom_boundary, before='parser')

Please let me know how I can pass the custom splitter words as a list argument to …
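
A minimal sketch of one way to parametrize the delimiters, assuming spaCy 2.x (where nlp.add_pipe accepts a plain callable): wrap the boundary function in a factory that closes over a list of delimiter strings. The delimiter values ('...', ';') and the function names are illustrative, not from the question.

    import spacy

    nlp = spacy.load('en_core_web_sm')

    def make_boundary_setter(delimiters):
        # Return a component that starts a new sentence after any token
        # whose text appears in the `delimiters` list.
        def set_boundaries(doc):
            for token in doc[:-1]:
                if token.text in delimiters:
                    doc[token.i + 1].is_sent_start = True
            return doc
        return set_boundaries

    # Pass whatever delimiter tokens you need as a list.
    nlp.add_pipe(make_boundary_setter(['...', ';']), before='parser')

    doc = nlp("First part... second part; third part")
    print([sent.text for sent in doc.sents])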

SpaCy 'nlp.to_disk' is not saving to disk

若如初见. submitted on 2021-01-28 12:00:42
Question: I am trying to figure out why my custom spaCy NER model isn't saving to disk with nlp.to_disk. I am using this condition in my Python script:

    # save model to output directory
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir)

The output_dir is defined at the top of my script as:

    @plac.annotations(
        model=("Model name. Defaults to blank 'en' model.", "option", "m", str),
        output_dir…
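
A hedged sketch of the save step in isolation; the pipeline, directory name, and guard message are assumptions for illustration. A frequent cause of "nothing on disk" with this pattern is that output_dir stays None because the command-line option was never passed, so the whole branch is skipped silently.

    from pathlib import Path
    import spacy

    def save_model(nlp, output_dir):
        # Persist the pipeline only when a directory was actually supplied.
        if output_dir is None:
            print("No output_dir given; skipping save.")
            return
        output_dir = Path(output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir.resolve())

    nlp = spacy.blank("en")                 # stand-in for the trained NER pipeline
    save_model(nlp, "models/custom_ner")    # hypothetical output path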

Is there a method of rule-based matching in spaCy to match patterns?

霸气de小男生 submitted on 2021-01-28 11:17:32
Question: I want to use rule-based matching. I have text where each word is annotated with its POS:

    text1 = "it_PRON is_AUX a_DET beautiful_ADJ apple_NOUN"
    text2 = "it_PRON is_AUX a_DET beautiful_ADJ and_CCONJ big_ADJ apple_NOUN"

I want to create a rule-based matcher that extracts either an ADJ followed by a NOUN, or an ADJ followed by a PUNCT or CCONJ, followed by an ADJ, followed by a NOUN. So the output I want is:

    text1 = [beautiful_ADJ apple_NOUN]
    text2 = [beautiful_ADJ and_CCONJ big_ADJ apple_NOUN]
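
A minimal sketch of those two patterns with spaCy's Matcher, assuming the spaCy 3.x signature of matcher.add (a list of patterns) and assuming plain text is tagged by spaCy itself rather than using the word_POS strings shown above:

    import spacy
    from spacy.matcher import Matcher

    nlp = spacy.load("en_core_web_sm")
    matcher = Matcher(nlp.vocab)

    patterns = [
        # ADJ NOUN
        [{"POS": "ADJ"}, {"POS": "NOUN"}],
        # ADJ (PUNCT or CCONJ) ADJ NOUN
        [{"POS": "ADJ"}, {"POS": {"IN": ["PUNCT", "CCONJ"]}},
         {"POS": "ADJ"}, {"POS": "NOUN"}],
    ]
    matcher.add("ADJ_NOUN", patterns)

    doc = nlp("it is a beautiful and big apple")
    for match_id, start, end in matcher(doc):
        print(doc[start:end].text)

Note that overlapping matches are all returned (here both "big apple" and "beautiful and big apple"), so you may want to keep only the longest span per region.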

How to use tokenized sentence as input for Spacy's PoS tagger?

不想你离开。 submitted on 2021-01-28 11:00:23
Question: spaCy's POS tagger is really convenient; it can tag a raw sentence directly:

    import spacy
    sp = spacy.load('en_core_web_sm')
    sen = sp(u"I am eating")

But I'm using the tokenizer from nltk. So how can I feed a tokenized sentence like ['I', 'am', 'eating'] rather than 'I am eating' to spaCy's tagger? By the way, where can I find detailed spaCy documentation? I can only find an overview on the official website. Thanks.

Answer 1: There are two options: you write a wrapper around the nltk tokenizer and use it to …
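
A short sketch of the other common route (not necessarily the answer the original poster received): build a Doc directly from the pre-tokenized words and run the remaining pipeline components on it.

    import spacy
    from spacy.tokens import Doc

    nlp = spacy.load("en_core_web_sm")

    words = ["I", "am", "eating"]        # tokens from nltk or any other tokenizer
    doc = Doc(nlp.vocab, words=words)

    # Apply the tagger (and the rest of the pipeline) to the pre-built Doc.
    for name, component in nlp.pipeline:
        doc = component(doc)

    for token in doc:
        print(token.text, token.pos_, token.tag_)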

Spacy tokenizer, add tokenizer exception

早过忘川 submitted on 2021-01-28 09:55:04
Question: Hey! I am trying to add a tokenization exception for some tokens using spaCy 2.0.2. I know that .tokenizer.add_special_case() exists and I am using it for some cases, but for a token like US$100, spaCy splits it into two tokens:

    ('US$', 'SYM'), ('100', 'NUM')

I want it split into three. Instead of adding a special case for each number after the US$, I want an exception for every token that has the form US$NUMBER:

    ('US', 'PROPN'), ('$', 'SYM'), ('800', 'NUM')

I was reading …
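
One hedged sketch of handling this after tokenization rather than via special cases, assuming spaCy 2.1 or later (Doc.retokenize().split does not exist in 2.0.x); the head and attribute values are illustrative:

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("It costs US$800 in total.")

    with doc.retokenize() as retokenizer:
        for token in doc:
            if token.text == "US$":
                retokenizer.split(
                    token,
                    ["US", "$"],
                    heads=[(token, 1), token.head],   # 'US' attaches to '$'; '$' keeps the old head
                    attrs={"POS": ["PROPN", "SYM"]},
                )

    print([(t.text, t.pos_) for t in doc])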

Export vectors from fastText to spaCy

邮差的信 submitted on 2021-01-28 07:51:31
Question: I downloaded the 1.5 GB fasttext.cc vectors and used the spaCy example script vectors_fast_text. I executed the following command in the terminal:

    python config/vectors_fast_text.py vectors_loc data/vectors/wiki.pt.vec

After a few minutes with the processor at 100%, I received the following output:

    class colspan 0.32231358

What happens from here? How can I export these vectors elsewhere, for example to use with my AWS S3 training templates?

Answer 1: I modified the example script, to load the …
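
For orientation, a condensed sketch of what that example script does and how the result can be written out for re-use, assuming the spaCy 2.x Vocab API (reset_vectors / set_vector); the output path is hypothetical:

    import numpy
    import spacy

    def build_pipeline_with_vectors(vectors_path, lang="pt"):
        # Create a blank pipeline and fill its vocab with fastText .vec rows.
        nlp = spacy.blank(lang)
        with open(vectors_path, encoding="utf-8") as f:
            n_rows, n_dims = map(int, f.readline().split())   # .vec header line
            nlp.vocab.reset_vectors(width=n_dims)
            for line in f:
                pieces = line.rstrip().split(" ")
                word, vector = pieces[0], numpy.asarray(pieces[1:], dtype="f")
                nlp.vocab.set_vector(word, vector)
        return nlp

    nlp = build_pipeline_with_vectors("data/vectors/wiki.pt.vec")
    nlp.to_disk("models/pt_vectors")   # this directory can then be copied, e.g. to S3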

How to force a POS tag in spaCy before/after the tagger?

给你一囗甜甜゛ submitted on 2021-01-21 10:45:06
Question: If I process the sentence 'Return target card to your hand' with spaCy and the en_core_web_lg model, it tags the tokens as below:

    Return  NOUN
    target  NOUN
    card    NOUN
    to      ADP
    your    ADJ
    hand    NOUN

How can I force 'Return' to be tagged as a VERB? And how can I do it before the parser runs, so that the parser can better interpret the relations between tokens? There are other situations in which this would be useful. I am dealing with text which contains specific symbols such as {G}. These three …
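
A hedged sketch of one workaround, assuming spaCy 2.x (nlp.add_pipe accepting a plain callable) and an illustrative override table; whether the downstream parser actually benefits from the overridden coarse tag depends on the model's features:

    import spacy

    nlp = spacy.load("en_core_web_lg")

    FORCED_POS = {"Return": "VERB"}   # hypothetical word -> POS overrides

    def force_pos(doc):
        # Overwrite the tagger's coarse POS for selected surface forms.
        for token in doc:
            if token.text in FORCED_POS:
                token.pos_ = FORCED_POS[token.text]
        return doc

    # Insert after the tagger and before the parser.
    nlp.add_pipe(force_pos, after="tagger")

    doc = nlp("Return target card to your hand")
    print([(t.text, t.pos_) for t in doc])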

Replace entity with its label in SpaCy

二次信任 submitted on 2021-01-21 05:14:24
Question: Is there any way in spaCy to replace an entity detected by spaCy NER with its label? For example:

    I am eating an apple while playing with my Apple Macbook.

I have trained an NER model with spaCy to detect a "FRUITS" entity, and the model successfully detects the first "apple" as "FRUITS", but not the second "Apple". I want to post-process my data by replacing each entity with its label, so I want to replace the first "apple" with "FRUITS". The sentence will be "I am eating an FRUITS while …
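
A small sketch of that post-processing step, using en_core_web_sm as a stand-in for the custom FRUITS model (the helper name is made up):

    import spacy

    nlp = spacy.load("en_core_web_sm")   # stand-in for the custom NER pipeline

    def replace_entities_with_labels(doc):
        # Rebuild the text, swapping each entity span for its label.
        parts, last_end = [], 0
        for ent in doc.ents:
            parts.append(doc.text[last_end:ent.start_char])
            parts.append(ent.label_)
            last_end = ent.end_char
        parts.append(doc.text[last_end:])
        return "".join(parts)

    doc = nlp("I am eating an apple while playing with my Apple Macbook.")
    print(replace_entities_with_labels(doc))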

Spacy nlp = spacy.load("en_core_web_lg")

℡╲_俬逩灬. submitted on 2021-01-21 03:48:07
Question: I already have spaCy downloaded, but every time I try the nlp = spacy.load("en_core_web_lg") command, I get this error:

    OSError: [E050] Can't find model 'en_core_web_lg'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

I already tried:

    >>> import spacy
    >>> nlp = spacy.load("en_core_web_sm")

and this does not work like it would on my personal computer. My question is how do I work around this? What directory specifically do I need to drop the spaCy …
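
A hedged sketch of the usual fix, assuming the error simply means the model package is not installed in the interpreter that runs the code (spacy.cli.download needs network access):

    import spacy
    from spacy.cli import download

    MODEL = "en_core_web_lg"

    try:
        nlp = spacy.load(MODEL)
    except OSError:
        # The model package is missing from this environment; install it into the
        # same interpreter (equivalent to: python -m spacy download en_core_web_lg).
        download(MODEL)
        nlp = spacy.load(MODEL)

    print(nlp.pipe_names)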