nlp

Azure ML Studio ML Pipeline - Exception: No temp file found

天大地大妈咪最大 Submitted on 2021-01-29 08:15:59
Question: I've successfully run an ML Pipeline experiment and published the Azure ML Pipeline without issues. When I run the following directly after the successful run and publish (i.e. I'm running all cells using Jupyter), the test fails! interactive_auth = InteractiveLoginAuthentication() auth_header = interactive_auth.get_authentication_header() rest_endpoint = published_pipeline.endpoint response = requests.post(rest_endpoint, headers=auth_header, json={"ExperimentName": "***redacted***",
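
A minimal sketch of the same REST submission, assuming an already published pipeline object (published_pipeline) and a placeholder experiment name; the error-handling lines are illustrative additions, not part of the original question.

import requests
from azureml.core.authentication import InteractiveLoginAuthentication

# Acquire a bearer token interactively and post the run request to the
# published pipeline's REST endpoint.
interactive_auth = InteractiveLoginAuthentication()
auth_header = interactive_auth.get_authentication_header()

response = requests.post(
    published_pipeline.endpoint,
    headers=auth_header,
    json={"ExperimentName": "my-experiment"},   # placeholder name
)
response.raise_for_status()   # surface HTTP-level failures before parsing the body
print(response.status_code)
print(response.json())        # details of the submitted pipeline run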

How to handle variable length data for LSTM

我怕爱的太早我们不能终老 Submitted on 2021-01-29 07:15:19
Question: From what I know, the general steps to preprocess data for LSTM include the following: vocab_size = 20000 # Only consider the top 20k words maxlen = 200 # Only consider the first 200 words of each movie review (x_train, y_train), (x_val, y_val) = keras.datasets.imdb.load_data(num_words=vocab_size) print(len(x_train), "Training sequences") print(len(x_val), "Validation sequences") x_train0 = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen) x_val0 = keras.preprocessing
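
A minimal sketch of the usual way to handle the variable lengths, using the same IMDB data as the question: pad or truncate every review to a fixed maxlen and let the model mask the padding. The model architecture below is illustrative, not taken from the question.

from tensorflow import keras

vocab_size = 20000   # only consider the top 20k words
maxlen = 200         # pad/truncate every review to 200 tokens

(x_train, y_train), (x_val, y_val) = keras.datasets.imdb.load_data(num_words=vocab_size)
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
x_val = keras.preprocessing.sequence.pad_sequences(x_val, maxlen=maxlen)

model = keras.Sequential([
    # mask_zero=True tells downstream layers to ignore the padded 0 positions
    keras.layers.Embedding(vocab_size, 64, mask_zero=True),
    keras.layers.LSTM(64),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=2)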

Why do we use log probability in deep learning?

陌路散爱 Submitted on 2021-01-29 06:51:36
Question: I got curious while reading the paper 'Sequence to Sequence Learning with Neural Networks'. In fact, not only this paper but also many other papers use log probabilities; is there a reason for that? Please check the attached photo. Answer 1: For any given problem we need to optimise the likelihood of the parameters. But optimising the product requires all the data at once and requires huge computation. We know that a sum is a lot easier to optimise, as the derivative of a sum is the sum of the derivatives. So,
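
A small numerical illustration of the other, numerical reason: the raw product of many probabilities underflows to zero in floating point, while the sum of their logs stays representable. The numbers below are made up for demonstration.

import math

probs = [1e-5] * 100               # e.g. 100 token probabilities of 1e-5 each

product = 1.0
for p in probs:
    product *= p
print(product)                     # 0.0 -- 1e-500 underflows in float64

log_prob = sum(math.log(p) for p in probs)
print(log_prob)                    # about -1151.3, perfectly representable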

Can't set the attribute “trainable_weights”, likely because it conflicts with an existing read-only

假如想象 Submitted on 2021-01-29 06:10:42
Question: My code was running perfectly in Colab. But today it's not running. It says Can't set the attribute "trainable_weights", likely because it conflicts with an existing read-only @property of the object. Please choose a different name. I am using LSTM with the attention layer. class Attention(Layer): def __init__(self, **kwargs): self.init = initializers.get('normal') #self.input_spec = [InputSpec(ndim=3)] super(Attention, self).__init__(**kwargs) def build(self, input_shape): assert len(input
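
A sketch of the usual fix, assuming the layer assigns to self.trainable_weights somewhere in build(): recent TF/Keras versions make trainable_weights a read-only property of Layer, so a custom layer should create its variables with add_weight() (which registers them automatically) instead of assigning to that attribute. The attention scoring logic in call() is unchanged and omitted here.

from tensorflow.keras import initializers
from tensorflow.keras.layers import Layer

class Attention(Layer):
    def __init__(self, **kwargs):
        self.init = initializers.get('normal')
        super(Attention, self).__init__(**kwargs)

    def build(self, input_shape):
        assert len(input_shape) == 3
        # add_weight() tracks the variable for the layer; do not assign to
        # self.trainable_weights, which is now a read-only property.
        self.W = self.add_weight(name='att_weight',
                                 shape=(input_shape[-1], 1),
                                 initializer=self.init,
                                 trainable=True)
        super(Attention, self).build(input_shape)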

Get antonyms for a word in java - Wordnet JWI

↘锁芯ラ Submitted on 2021-01-29 02:34:53
Question: I am interested in finding antonyms for a word using WordNet in Java. I am currently using this method to find antonyms, but I have yet to find any words which have antonyms. Are antonyms not common in WordNet? Or is this implementation flawed? public List<String> getAntonyms(String baseWord) { List<String> synonymList = new ArrayList<>(); IIndexWord[] baseWordPOS = getAllPOSForBaseWord(baseWord); for (IIndexWord iIndexWord : baseWordPOS) { if (iIndexWord == null) { continue; } for (IWordID
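
Not JWI, but a quick cross-check in Python with NLTK's WordNet interface (a different library, used here only to show that antonyms do exist in WordNet and that they are attached to individual lemmas/word senses rather than to synsets, which is the usual reason a lookup comes back empty). In JWI the analogous lookup is word-level, typically via Pointer.ANTONYM on an IWord.

from nltk.corpus import wordnet as wn   # requires a one-time nltk.download('wordnet')

def get_antonyms(base_word):
    antonyms = set()
    for synset in wn.synsets(base_word):
        for lemma in synset.lemmas():
            for ant in lemma.antonyms():    # antonym pointers live on lemmas
                antonyms.add(ant.name())
    return sorted(antonyms)

print(get_antonyms("good"))   # e.g. ['bad', 'badness', 'evil', ...]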

Unable to do Stacking for a Multi-label classifier

 ̄綄美尐妖づ Submitted on 2021-01-28 19:12:39
Question: I am working on a multi-label text classification problem (90 target labels in total). The data distribution has a long tail and class imbalance, and there are around 100k records. I am using the OAA (one-against-all) strategy. I am trying to create an ensemble using stacking. Text features: HashingVectorizer (number of features 2**20, char analyzer), TSVD to reduce the dimensionality (n_components=200). text_pipeline = Pipeline([ ('hashing_vectorizer', HashingVectorizer(n_features=2**20, analyzer='char'))
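
A sketch of one workable setup under these assumptions: scikit-learn's StackingClassifier does not accept multilabel indicator targets out of the box, so the stacking step is done by hand on a holdout split, using the one-vs-rest base model's per-label probabilities as meta-features. X_text (list of raw texts) and Y (binary label matrix with 90 columns) are placeholders for the question's data.

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Base text pipeline from the question: hashing features + TruncatedSVD.
text_pipeline = Pipeline([
    ('hashing_vectorizer', HashingVectorizer(n_features=2**20, analyzer='char')),
    ('tsvd', TruncatedSVD(n_components=200)),
])

base_clf = Pipeline([
    ('features', text_pipeline),
    ('ovr', OneVsRestClassifier(LogisticRegression(max_iter=1000))),
])

# Hold out part of the data so the meta-learner trains on out-of-sample predictions.
X_train, X_hold, Y_train, Y_hold = train_test_split(X_text, Y, test_size=0.2, random_state=0)
base_clf.fit(X_train, Y_train)

meta_features = base_clf.predict_proba(X_hold)          # shape: (n_samples, 90)
meta_clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
meta_clf.fit(meta_features, Y_hold)                     # second-level (stacked) model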

Confusion in understanding the output of BERTforTokenClassification class from Transformers library

旧巷老猫 Submitted on 2021-01-28 19:04:01
Question: This is the example given in the documentation of the Transformers PyTorch library: from transformers import BertTokenizer, BertForTokenClassification import torch tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') model = BertForTokenClassification.from_pretrained('bert-base-uncased', output_hidden_states=True, output_attentions=True) input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0) # Batch size 1 labels = torch.tensor([1] * input
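
A sketch of how the outputs in that example unpack, assuming labels are passed and both output_hidden_states and output_attentions are enabled: the model then returns loss, logits, hidden_states, and attentions, in that order (a plain tuple in older transformers versions, or a ModelOutput that still supports integer indexing in newer ones).

from transformers import BertTokenizer, BertForTokenClassification
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForTokenClassification.from_pretrained('bert-base-uncased',
                                                   output_hidden_states=True,
                                                   output_attentions=True)

input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute",
                                          add_special_tokens=True)).unsqueeze(0)  # batch size 1
labels = torch.tensor([1] * input_ids.size(1)).unsqueeze(0)

outputs = model(input_ids, labels=labels)
loss, logits = outputs[0], outputs[1]
hidden_states, attentions = outputs[2], outputs[3]

print(logits.shape)         # (1, 8, 2): 8 tokens incl. [CLS]/[SEP], 2 labels by default
print(len(hidden_states))   # 13: the embedding output plus one tensor per encoder layer
print(len(attentions))      # 12: one attention tensor per encoder layer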

How to extract numbers from a text file and multiply them together?

我与影子孤独终老i Submitted on 2021-01-28 19:02:06
Question: I have a text file which contains 800 words, each with a number in front of it. (Each word and its number are on a new line, so the file has 800 lines.) I have to find the numbers and then multiply them together. Because multiplying many small floats underflows to zero, I have to use logarithms to prevent the underflow, but I don't know how. This is the formula: c_NB = argmax_c [ log P(c) + Σ log P(x | c) ]. This code doesn't print anything. output = [] with open('c:/python34/probEjtema.txt', encoding="utf-8")
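
A minimal sketch of the log-space version, assuming the file path from the question and lines that each contain one word and one number (in either order): instead of multiplying the 800 numbers, sum their logarithms, since log(a*b) = log(a) + log(b).

import math

log_sum = 0.0
with open('c:/python34/probEjtema.txt', encoding='utf-8') as f:
    for line in f:
        parts = line.split()
        if not parts:
            continue                                # skip blank lines
        for token in parts:
            try:
                log_sum += math.log(float(token))   # add the log of this line's number
                break
            except ValueError:
                continue                            # token was the word, try the next one

print(log_sum)   # the log of the product of all 800 numbers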

Different models with gensim Word2Vec on python

我们两清 Submitted on 2021-01-28 14:02:40
Question: I am trying to apply the word2vec model implemented in the gensim library in Python. I have a list of sentences (each sentence is a list of words). For instance let us have: sentences=[['first','second','third','fourth']]*n and I implement two identical models: model = gensim.models.Word2Vec(sentences, min_count=1, size=2) model2 = gensim.models.Word2Vec(sentences, min_count=1, size=2) I realize that the models are sometimes the same and sometimes different, depending on the value of n. For
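
A sketch of how the two runs are usually made reproducible, keeping the gensim 3.x argument names from the question (in gensim 4.x, size becomes vector_size): fix the seed and train with a single worker thread; since gensim's per-word vector seeding uses Python's string hash, PYTHONHASHSEED also has to be pinned in the environment before Python starts if results must match across separate interpreter runs.

import gensim

sentences = [['first', 'second', 'third', 'fourth']] * 100

# Same seed + single worker makes the two trainings deterministic within one run.
model = gensim.models.Word2Vec(sentences, min_count=1, size=2, seed=42, workers=1)
model2 = gensim.models.Word2Vec(sentences, min_count=1, size=2, seed=42, workers=1)

print((model.wv['first'] == model2.wv['first']).all())   # True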

how to send multiple text strings in a single post request to google cloud natural language api

主宰稳场 Submitted on 2021-01-28 12:30:54
Question: Here is my Python code: def sentiment_local_file(text): """Detects sentiment in the local document""" language_client = language.Client() if isinstance(text, six.binary_type): text = text.decode('utf-8') with open("abhi.txt",'r') as fr: data = json.loads(fr.read()) print ([data['document']['content']]) document = language_client.document_from_text(data['document']['content']) result = document.annotate_text(include_sentiment=True, include_syntax=False, include_entities=False) I am trying to
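
A sketch using the same legacy google-cloud-language client that appears in the question, under the assumption that the API analyzes one document per request: multiple text strings are handled by looping and issuing one annotate_text call per string. The attribute names on the returned annotations follow that legacy client; the current library exposes the equivalent functionality through LanguageServiceClient.analyze_sentiment.

from google.cloud import language

def sentiment_for_texts(texts):
    """texts: a list of plain strings; returns one sentiment score per string."""
    client = language.Client()
    scores = []
    for text in texts:
        document = client.document_from_text(text)
        annotations = document.annotate_text(include_sentiment=True,
                                             include_syntax=False,
                                             include_entities=False)
        scores.append(annotations.sentiment.score)   # one score per input string
    return scores

print(sentiment_for_texts(["I love this.", "I hate this."]))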