spacy

Spacy custom sentence splitting

梦想与她 submitted on 2021-01-28 17:47:35
Question: I am using spaCy for custom sentence splitting and I need to parametrize the custom delimiter/word used for splitting, but I could not find how to pass it as an argument. Here is the function:

    # Manual or Custom Based
    def mycustom_boundary(docx):
        for token in docx[:-1]:
            if token.text == '...':
                docx[token.i+1].is_sent_start = True
        return docx

    # Adding the rule before parsing
    nlp.add_pipe(mycustom_boundary, before='parser')

Please let me know how I can pass the custom splitter words as a list argument to …
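
A minimal sketch of one way to parametrize the delimiters, assuming spaCy 2.x (where nlp.add_pipe accepts a plain callable): wrap the boundary function in a factory that closes over a list of delimiter strings. The delimiter values ('...', ';') and the function names are illustrative, not from the question.

    import spacy

    nlp = spacy.load('en_core_web_sm')

    def make_boundary_setter(delimiters):
        # Return a component that starts a new sentence after any token
        # whose text appears in the `delimiters` list.
        def set_boundaries(doc):
            for token in doc[:-1]:
                if token.text in delimiters:
                    doc[token.i + 1].is_sent_start = True
            return doc
        return set_boundaries

    # Pass whatever delimiter tokens you need as a list.
    nlp.add_pipe(make_boundary_setter(['...', ';']), before='parser')

    doc = nlp("First part... second part; third part")
    print([sent.text for sent in doc.sents])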

SpaCy 'nlp.to_disk' is not saving to disk

若如初见. submitted on 2021-01-28 12:00:42
Question: I am trying to figure out why my custom spaCy NER model isn't saving to disk with nlp.to_disk. I am using this condition in my Python script:

    # save model to output directory
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir)

The output_dir is defined at the top of my script as:

    @plac.annotations(
        model=("Model name. Defaults to blank 'en' model.", "option", "m", str),
        output_dir…
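
A hedged sketch of the save step in isolation; the pipeline, directory name, and guard message are assumptions for illustration. A frequent cause of "nothing on disk" with this pattern is that output_dir stays None because the command-line option was never passed, so the whole branch is skipped silently.

    from pathlib import Path
    import spacy

    def save_model(nlp, output_dir):
        # Persist the pipeline only when a directory was actually supplied.
        if output_dir is None:
            print("No output_dir given; skipping save.")
            return
        output_dir = Path(output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir.resolve())

    nlp = spacy.blank("en")                 # stand-in for the trained NER pipeline
    save_model(nlp, "models/custom_ner")    # hypothetical output path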

Is there a method of rule-based matching in spaCy to match patterns?

霸气de小男生 submitted on 2021-01-28 11:17:32
Question: I want to use rule-based matching. I have text where each word is annotated with its POS:

    text1 = "it_PRON is_AUX a_DET beautiful_ADJ apple_NOUN"
    text2 = "it_PRON is_AUX a_DET beautiful_ADJ and_CCONJ big_ADJ apple_NOUN"

I want to create a rule-based matcher that extracts either an ADJ followed by a NOUN, or an ADJ followed by a PUNCT or CCONJ, followed by an ADJ, followed by a NOUN. So the output I want is:

    text1 = [beautiful_ADJ apple_NOUN]
    text2 = [beautiful_ADJ and_CCONJ big_ADJ apple_NOUN]
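
A minimal sketch of those two patterns with spaCy's Matcher, assuming the spaCy 3.x signature of matcher.add (a list of patterns) and assuming plain text is tagged by spaCy itself rather than using the word_POS strings shown above:

    import spacy
    from spacy.matcher import Matcher

    nlp = spacy.load("en_core_web_sm")
    matcher = Matcher(nlp.vocab)

    patterns = [
        # ADJ NOUN
        [{"POS": "ADJ"}, {"POS": "NOUN"}],
        # ADJ (PUNCT or CCONJ) ADJ NOUN
        [{"POS": "ADJ"}, {"POS": {"IN": ["PUNCT", "CCONJ"]}},
         {"POS": "ADJ"}, {"POS": "NOUN"}],
    ]
    matcher.add("ADJ_NOUN", patterns)

    doc = nlp("it is a beautiful and big apple")
    for match_id, start, end in matcher(doc):
        print(doc[start:end].text)

Note that overlapping matches are all returned (here both "big apple" and "beautiful and big apple"), so you may want to keep only the longest span per region.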

How to use tokenized sentence as input for Spacy's PoS tagger?

不想你离开。 submitted on 2021-01-28 11:00:23
Question: spaCy's POS tagger is really convenient; it can tag a raw sentence directly:

    import spacy
    sp = spacy.load('en_core_web_sm')
    sen = sp(u"I am eating")

But I'm using the tokenizer from nltk. So how can I feed a tokenized sentence like ['I', 'am', 'eating'] rather than 'I am eating' to spaCy's tagger? By the way, where can I find detailed spaCy documentation? I can only find an overview on the official website. Thanks.

Answer 1: There are two options: you write a wrapper around the nltk tokenizer and use it to …
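
A short sketch of the other common route (not necessarily the answer the original poster received): build a Doc directly from the pre-tokenized words and run the remaining pipeline components on it.

    import spacy
    from spacy.tokens import Doc

    nlp = spacy.load("en_core_web_sm")

    words = ["I", "am", "eating"]        # tokens from nltk or any other tokenizer
    doc = Doc(nlp.vocab, words=words)

    # Apply the tagger (and the rest of the pipeline) to the pre-built Doc.
    for name, component in nlp.pipeline:
        doc = component(doc)

    for token in doc:
        print(token.text, token.pos_, token.tag_)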

Spacy tokenizer, add tokenizer exception

早过忘川 submitted on 2021-01-28 09:55:04
Question: Hey! I am trying to add a tokenization exception for some tokens using spaCy 2.0.2. I know that .tokenizer.add_special_case() exists and I am using it for some cases, but for a token like US$100, spaCy splits it into two tokens:

    ('US$', 'SYM'), ('100', 'NUM')

I want it split into three. Instead of adding a special case for each number after the US$, I want an exception for every token that has the form US$NUMBER:

    ('US', 'PROPN'), ('$', 'SYM'), ('800', 'NUM')

I was reading …
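
One hedged sketch of handling this after tokenization rather than via special cases, assuming spaCy 2.1 or later (Doc.retokenize().split does not exist in 2.0.x); the head and attribute values are illustrative:

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("It costs US$800 in total.")

    with doc.retokenize() as retokenizer:
        for token in doc:
            if token.text == "US$":
                retokenizer.split(
                    token,
                    ["US", "$"],
                    heads=[(token, 1), token.head],   # 'US' attaches to '$'; '$' keeps the old head
                    attrs={"POS": ["PROPN", "SYM"]},
                )

    print([(t.text, t.pos_) for t in doc])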

Export vectors from fastText to spaCy

邮差的信 submitted on 2021-01-28 07:51:31
Question: I downloaded the 1.5 GB fasttext.cc vectors and used the spaCy example script vectors_fast_text. I executed the following command in the terminal:

    python config/vectors_fast_text.py vectors_loc data/vectors/wiki.pt.vec

After a few minutes with the processor at 100%, I received the following output:

    class colspan 0.32231358

What happens from here? How can I export these vectors elsewhere, for example to use with my AWS S3 training templates?

Answer 1: I modified the example script, to load the …
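
For orientation, a condensed sketch of what that example script does and how the result can be written out for re-use, assuming the spaCy 2.x Vocab API (reset_vectors / set_vector); the output path is hypothetical:

    import numpy
    import spacy

    def build_pipeline_with_vectors(vectors_path, lang="pt"):
        # Create a blank pipeline and fill its vocab with fastText .vec rows.
        nlp = spacy.blank(lang)
        with open(vectors_path, encoding="utf-8") as f:
            n_rows, n_dims = map(int, f.readline().split())   # .vec header line
            nlp.vocab.reset_vectors(width=n_dims)
            for line in f:
                pieces = line.rstrip().split(" ")
                word, vector = pieces[0], numpy.asarray(pieces[1:], dtype="f")
                nlp.vocab.set_vector(word, vector)
        return nlp

    nlp = build_pipeline_with_vectors("data/vectors/wiki.pt.vec")
    nlp.to_disk("models/pt_vectors")   # this directory can then be copied, e.g. to S3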

How to force a POS tag in spaCy before/after the tagger?

给你一囗甜甜゛ submitted on 2021-01-21 10:45:06
Question: If I process the sentence 'Return target card to your hand' with spaCy and the en_core_web_lg model, it tags the tokens as below:

    Return  NOUN
    target  NOUN
    card    NOUN
    to      ADP
    your    ADJ
    hand    NOUN

How can I force 'Return' to be tagged as a VERB? And how can I do it before the parser runs, so that the parser can better interpret the relations between tokens? There are other situations in which this would be useful. I am dealing with text which contains specific symbols such as {G}. These three …
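
A hedged sketch of one workaround, assuming spaCy 2.x (nlp.add_pipe accepting a plain callable) and an illustrative override table; whether the downstream parser actually benefits from the overridden coarse tag depends on the model's features:

    import spacy

    nlp = spacy.load("en_core_web_lg")

    FORCED_POS = {"Return": "VERB"}   # hypothetical word -> POS overrides

    def force_pos(doc):
        # Overwrite the tagger's coarse POS for selected surface forms.
        for token in doc:
            if token.text in FORCED_POS:
                token.pos_ = FORCED_POS[token.text]
        return doc

    # Insert after the tagger and before the parser.
    nlp.add_pipe(force_pos, after="tagger")

    doc = nlp("Return target card to your hand")
    print([(t.text, t.pos_) for t in doc])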

Replace entity with its label in SpaCy

二次信任 submitted on 2021-01-21 05:14:24
Question: Is there any way in spaCy to replace an entity detected by spaCy NER with its label? For example:

    I am eating an apple while playing with my Apple Macbook.

I have trained an NER model with spaCy to detect a "FRUITS" entity, and the model successfully detects the first "apple" as "FRUITS", but not the second "Apple". I want to post-process my data by replacing each entity with its label, so I want to replace the first "apple" with "FRUITS". The sentence will be "I am eating an FRUITS while …
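
A small sketch of that post-processing step, using en_core_web_sm as a stand-in for the custom FRUITS model (the helper name is made up):

    import spacy

    nlp = spacy.load("en_core_web_sm")   # stand-in for the custom NER pipeline

    def replace_entities_with_labels(doc):
        # Rebuild the text, swapping each entity span for its label.
        parts, last_end = [], 0
        for ent in doc.ents:
            parts.append(doc.text[last_end:ent.start_char])
            parts.append(ent.label_)
            last_end = ent.end_char
        parts.append(doc.text[last_end:])
        return "".join(parts)

    doc = nlp("I am eating an apple while playing with my Apple Macbook.")
    print(replace_entities_with_labels(doc))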

Spacy nlp = spacy.load("en_core_web_lg")

℡╲_俬逩灬. submitted on 2021-01-21 03:48:07
Question: I already have spaCy downloaded, but every time I try the nlp = spacy.load("en_core_web_lg") command, I get this error:

    OSError: [E050] Can't find model 'en_core_web_lg'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

I already tried:

    >>> import spacy
    >>> nlp = spacy.load("en_core_web_sm")

and this does not work like it would on my personal computer. My question is how do I work around this? What directory specifically do I need to drop the spaCy …
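
A hedged sketch of the usual fix, assuming the error simply means the model package is not installed in the interpreter that runs the code (spacy.cli.download needs network access):

    import spacy
    from spacy.cli import download

    MODEL = "en_core_web_lg"

    try:
        nlp = spacy.load(MODEL)
    except OSError:
        # The model package is missing from this environment; install it into the
        # same interpreter (equivalent to: python -m spacy download en_core_web_lg).
        download(MODEL)
        nlp = spacy.load(MODEL)

    print(nlp.pipe_names)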