nlp

SOLR and Natural Language Parsing - Can I use it?

大兔子大兔子 submitted on 2021-02-15 08:18:53
Question: Requirements: Word frequency algorithm for natural language processing; Using Solr. While the answer for that question is excellent, I was wondering if I could make use of all the time I spent getting to know SOLR for my NLP. I thought of SOLR because: it's got a bunch of tokenizers and performs a lot of NLP; it's pretty easy to use out of the box; it's a RESTful distributed app, so it's easy to hook up; I've spent some time with it, so using it could save me time. Can I use Solr? Although the above
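
For reference (not from the thread itself), Solr's Terms Component is one way to get the kind of term-frequency statistics the asker wants. A minimal sketch, assuming a local Solr instance with a hypothetical core named "articles" and an indexed text field named "content":

    import requests

    # Hypothetical core and field names. The Terms Component reports index-wide
    # term statistics (document frequencies, not total occurrence counts).
    resp = requests.get(
        "http://localhost:8983/solr/articles/terms",
        params={"terms.fl": "content", "terms.limit": 20, "wt": "json"},
    )
    # With the default json.nl format the result is a flat alternating list:
    # ["term1", count1, "term2", count2, ...]; adjust if your Solr is configured differently.
    terms = resp.json()["terms"]["content"]
    for term, count in zip(terms[::2], terms[1::2]):
        print(term, count)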

How to find all the related keywords for a root word?

自古美人都是妖i submitted on 2021-02-11 18:08:51
Question: I am trying to figure out a way to find all the keywords that come from the same root word (in some sense the opposite of stemming). Currently, I am using R for coding, but I am open to switching to a different language if it helps. For instance, I have the root word "rent" and I would like to be able to find "renting", "renter", "rental", "rents" and so on. Answer 1: Try this code in Python: from pattern.en import lexeme; print(lexeme("rent")) The output generated is: Installation : pip
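
A self-contained version of the answer's snippet (the thread's output and installation note are cut off above, so treat the comments as general guidance rather than the answerer's exact wording):

    # The pattern library is commonly installed via: pip install pattern
    from pattern.en import lexeme

    # lexeme() returns verb inflections of "rent" (rents, renting, rented, ...).
    # Derived nouns such as "renter" or "rental" are not inflections, so they would
    # need a different resource, e.g. WordNet's derivationally related forms.
    print(lexeme("rent"))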

Google BERT and antonym detection

旧巷老猫 submitted on 2021-02-11 15:10:55
Question: I recently learned about the following phenomenon: the word embeddings of Google BERT and other well-known state-of-the-art models seem to ignore the degree of semantic contrast between antonyms, in terms of the natural distance (norm-2 or cosine distance) between the corresponding embeddings. For example: the measure is the "cosine distance" (as opposed to the "cosine similarity"), which means closer vectors are supposed to have a smaller distance between them. As one can see, BERT states "weak" and "powerful
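
For clarity, the cosine distance used in the question is simply 1 minus cosine similarity. A minimal sketch with scipy; the vectors below are stand-ins for the BERT embeddings of an antonym pair such as "weak" / "powerful":

    import numpy as np
    from scipy.spatial.distance import cosine

    # Stand-in vectors purely for illustration, not real BERT embeddings.
    v_weak = np.array([0.1, -0.3, 0.7])
    v_powerful = np.array([0.2, -0.1, 0.6])

    # scipy's cosine() already returns the distance: 1 - cosine_similarity(u, v)
    print(cosine(v_weak, v_powerful))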

How to calculate the distance between 2 node2vec models

北城余情 submitted on 2021-02-11 14:06:43
Question: I have two node2vec models from different timestamps and I want to calculate the distance between them. Both models have the same vocabulary, and we update the models. My models look like this: model1: "1": 0.1, 0.5, ...; "2": 0.3, -0.4, ...; "3": 0.2, 0.5, ...; ... model2: "1": 0.15, 0.54, ...; "2": 0.24, -0.35, ...; "3": 0.24, 0.47, ...; ... Answer 1: Assuming you've used a standard word2vec library to train your models, each run bootstraps a wholly-separate model whose coordinates are not necessarily comparable to any other
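
One common workaround for the incomparable-coordinates problem the answer describes (not necessarily what the answerer goes on to recommend) is to align the two spaces with an orthogonal Procrustes rotation over the shared vocabulary and only then compare node vectors. A sketch assuming each model is a plain dict mapping node id to a numpy vector; the toy numbers reuse the question's truncated vectors with just two dimensions for illustration:

    import numpy as np
    from scipy.linalg import orthogonal_procrustes

    def aligned_distances(model1, model2):
        """Rotate model2 onto model1 over the shared vocab, return per-node cosine distances."""
        nodes = sorted(set(model1) & set(model2))
        A = np.array([model1[n] for n in nodes])
        B = np.array([model2[n] for n in nodes])
        R, _ = orthogonal_procrustes(B, A)   # rotation mapping B's space onto A's
        B_aligned = B @ R
        sims = np.sum(A * B_aligned, axis=1) / (
            np.linalg.norm(A, axis=1) * np.linalg.norm(B_aligned, axis=1)
        )
        return dict(zip(nodes, 1.0 - sims))  # cosine distance per node

    m1 = {"1": np.array([0.10, 0.50]), "2": np.array([0.30, -0.40]), "3": np.array([0.20, 0.50])}
    m2 = {"1": np.array([0.15, 0.54]), "2": np.array([0.24, -0.35]), "3": np.array([0.24, 0.47])}
    print(aligned_distances(m1, m2))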

Extract named entities and their corresponding numerical values from a sentence

▼魔方 西西 submitted on 2021-02-11 13:59:26
Question: I want to extract information from sentences. Currently, I am able to do the following using spaCy: Amy's monthly payment is $2000. --> (Amy's monthly payment, $2000) However, I am trying to do the following: The monthly payments for Amy, Bob, and Eva are $2000, $3000 and $3500 respectively. --> ((Amy's monthly payment, $2000), (Bob's monthly payment, $3000), (Eva's monthly payment, $3500)) Is there any way that I can perform this task with NLP methods through a Python library such as spaCy?
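
Not from the thread's answers, but one rough heuristic for the "respectively" case is to pair spaCy's PERSON and MONEY entities by position. A sketch assuming the small English model is installed and that the names and amounts appear in matching order (the small model will not always tag every name or amount correctly):

    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumes: python -m spacy download en_core_web_sm

    text = "The monthly payments for Amy, Bob, and Eva are $2000, $3000 and $3500 respectively."
    doc = nlp(text)

    people = [ent.text for ent in doc.ents if ent.label_ == "PERSON"]
    amounts = [ent.text for ent in doc.ents if ent.label_ == "MONEY"]

    # Heuristic: in a "... respectively" sentence, pair the entity lists positionally.
    pairs = [(f"{name}'s monthly payment", amount) for name, amount in zip(people, amounts)]
    print(pairs)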

BERT sentence embeddings: how to obtain sentence embeddings vector

天大地大妈咪最大 submitted on 2021-02-11 13:41:14
Question: I'm using the module bert-for-tf2 in order to wrap the BERT model as a Keras layer in TensorFlow 2.0. I've followed your guide for implementing the BERT model as a Keras layer. I'm trying to extract embeddings from a sentence; in my case, the sentence is "Hello". I have a question about the output of the model prediction; I've written this model: model_word_embedding = tf.keras.Sequential([ tf.keras.layers.Input(shape=(4,), dtype='int32', name='input_ids'), bert_layer ]) model_word_embedding.build(input
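
A common way to turn BERT's per-token outputs into a single sentence vector is to pool them. A minimal sketch of mean pooling on top of the question's bert_layer (treating bert_layer as already constructed with bert-for-tf2 exactly as in the question, and ignoring padding masks for brevity):

    import tensorflow as tf

    # `bert_layer` is assumed to be the bert-for-tf2 Keras layer from the question.
    input_ids = tf.keras.layers.Input(shape=(4,), dtype='int32', name='input_ids')
    token_outputs = bert_layer(input_ids)                # shape: (batch, seq_len, hidden_size)
    sentence_embedding = tf.keras.layers.GlobalAveragePooling1D()(token_outputs)  # mean over tokens
    model = tf.keras.Model(inputs=input_ids, outputs=sentence_embedding)

    # model.predict(ids) now returns one vector of size hidden_size per input sentence.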

Equate strings based on meaning

╄→гoц情女王★ submitted on 2021-02-11 13:38:36
Question: Is there a way to equate strings in Python based on their meaning, despite them not being similar? For example: "temp. Max" and "maximum ambient temperature". I've tried using fuzzywuzzy and difflib, and although they are generally good for this using token matching, they also produce false positives when I threshold the outputs over a large number of strings. Is there some other method using NLP or tokenization that I'm missing here? Edit: The answer provided by A CO does solve the problem mentioned above
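
One approach along the lines the question is asking about (the thread's accepted answer is cut off above, so this is not necessarily it) is to embed both strings with a sentence-embedding model and threshold on cosine similarity. A sketch using the sentence-transformers library:

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # a small general-purpose model

    a = model.encode("temp. Max", convert_to_tensor=True)
    b = model.encode("maximum ambient temperature", convert_to_tensor=True)

    score = util.cos_sim(a, b).item()
    print(score)  # closer to 1.0 means more semantically similar
    # Pairs above some threshold tuned on your data (e.g. around 0.6) could be treated as equivalent.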