nlp

Azure ML Studio ML Pipeline - Exception: No temp file found

天大地大妈咪最大 Submitted on 2021-01-29 08:15:59
Question: I've successfully run an ML Pipeline experiment and published the Azure ML Pipeline without issues. When I run the following directly after the successful run and publish (i.e. I'm running all cells using Jupyter), the test fails! interactive_auth = InteractiveLoginAuthentication() auth_header = interactive_auth.get_authentication_header() rest_endpoint = published_pipeline.endpoint response = requests.post(rest_endpoint, headers=auth_header, json={"ExperimentName": "***redacted***",
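
A minimal sketch of the same REST submission, assuming an already published pipeline object (published_pipeline) and a placeholder experiment name; the error-handling lines are illustrative additions, not part of the original question.

import requests
from azureml.core.authentication import InteractiveLoginAuthentication

# Acquire a bearer token interactively and post the run request to the
# published pipeline's REST endpoint.
interactive_auth = InteractiveLoginAuthentication()
auth_header = interactive_auth.get_authentication_header()

response = requests.post(
    published_pipeline.endpoint,
    headers=auth_header,
    json={"ExperimentName": "my-experiment"},   # placeholder name
)
response.raise_for_status()   # surface HTTP-level failures before parsing the body
print(response.status_code)
print(response.json())        # details of the submitted pipeline run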

How to handle variable length data for LSTM

我怕爱的太早我们不能终老 Submitted on 2021-01-29 07:15:19
Question: From what I know, the general steps to preprocess data for LSTM include the following: vocab_size = 20000 # Only consider the top 20k words maxlen = 200 # Only consider the first 200 words of each movie review (x_train, y_train), (x_val, y_val) = keras.datasets.imdb.load_data(num_words=vocab_size) print(len(x_train), "Training sequences") print(len(x_val), "Validation sequences") x_train0 = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen) x_val0 = keras.preprocessing
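
A minimal sketch of the usual way to handle the variable lengths, using the same IMDB data as the question: pad or truncate every review to a fixed maxlen and let the model mask the padding. The model architecture below is illustrative, not taken from the question.

from tensorflow import keras

vocab_size = 20000   # only consider the top 20k words
maxlen = 200         # pad/truncate every review to 200 tokens

(x_train, y_train), (x_val, y_val) = keras.datasets.imdb.load_data(num_words=vocab_size)
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
x_val = keras.preprocessing.sequence.pad_sequences(x_val, maxlen=maxlen)

model = keras.Sequential([
    # mask_zero=True tells downstream layers to ignore the padded 0 positions
    keras.layers.Embedding(vocab_size, 64, mask_zero=True),
    keras.layers.LSTM(64),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=2)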

Why do we use log probability in deep learning?

陌路散爱 Submitted on 2021-01-29 06:51:36
Question: I got curious while reading the paper 'Sequence to Sequence Learning with Neural Networks'. In fact, not only this paper but also many other papers use log probabilities; is there a reason for that? Please check the attached photo. Answer 1: For any given problem we need to optimise the likelihood of the parameters. But optimising the product requires all the data at once and requires huge computation. We know that a sum is a lot easier to optimise, as the derivative of a sum is the sum of the derivatives. So,
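
A small numerical illustration of the other, numerical reason: the raw product of many probabilities underflows to zero in floating point, while the sum of their logs stays representable. The numbers below are made up for demonstration.

import math

probs = [1e-5] * 100               # e.g. 100 token probabilities of 1e-5 each

product = 1.0
for p in probs:
    product *= p
print(product)                     # 0.0 -- 1e-500 underflows in float64

log_prob = sum(math.log(p) for p in probs)
print(log_prob)                    # about -1151.3, perfectly representable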

Can't set the attribute “trainable_weights”, likely because it conflicts with an existing read-only

假如想象 Submitted on 2021-01-29 06:10:42
Question: My code was running perfectly in Colab. But today it's not running. It says Can't set the attribute "trainable_weights", likely because it conflicts with an existing read-only @property of the object. Please choose a different name. I am using LSTM with the attention layer. class Attention(Layer): def __init__(self, **kwargs): self.init = initializers.get('normal') #self.input_spec = [InputSpec(ndim=3)] super(Attention, self).__init__(**kwargs) def build(self, input_shape): assert len(input
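
A sketch of the usual fix, assuming the layer assigns to self.trainable_weights somewhere in build(): recent TF/Keras versions make trainable_weights a read-only property of Layer, so a custom layer should create its variables with add_weight() (which registers them automatically) instead of assigning to that attribute. The attention scoring logic in call() is unchanged and omitted here.

from tensorflow.keras import initializers
from tensorflow.keras.layers import Layer

class Attention(Layer):
    def __init__(self, **kwargs):
        self.init = initializers.get('normal')
        super(Attention, self).__init__(**kwargs)

    def build(self, input_shape):
        assert len(input_shape) == 3
        # add_weight() tracks the variable for the layer; do not assign to
        # self.trainable_weights, which is now a read-only property.
        self.W = self.add_weight(name='att_weight',
                                 shape=(input_shape[-1], 1),
                                 initializer=self.init,
                                 trainable=True)
        super(Attention, self).build(input_shape)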

Get antonyms for a word in java - Wordnet JWI

↘锁芯ラ Submitted on 2021-01-29 02:34:53
Question: I am interested in finding antonyms for a word using WordNet in Java. I am currently using this method to find antonyms, but I have yet to find any words which have antonyms. Are antonyms not common in WordNet? Or is this implementation flawed? public List<String> getAntonyms(String baseWord) { List<String> synonymList = new ArrayList<>(); IIndexWord[] baseWordPOS = getAllPOSForBaseWord(baseWord); for (IIndexWord iIndexWord : baseWordPOS) { if (iIndexWord == null) { continue; } for (IWordID
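
Not JWI, but a quick cross-check in Python with NLTK's WordNet interface (a different library, used here only to show that antonyms do exist in WordNet and that they are attached to individual lemmas/word senses rather than to synsets, which is the usual reason a lookup comes back empty). In JWI the analogous lookup is word-level, typically via Pointer.ANTONYM on an IWord.

from nltk.corpus import wordnet as wn   # requires a one-time nltk.download('wordnet')

def get_antonyms(base_word):
    antonyms = set()
    for synset in wn.synsets(base_word):
        for lemma in synset.lemmas():
            for ant in lemma.antonyms():    # antonym pointers live on lemmas
                antonyms.add(ant.name())
    return sorted(antonyms)

print(get_antonyms("good"))   # e.g. ['bad', 'badness', 'evil', ...]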

Unable to do Stacking for a Multi-label classifier

 ̄綄美尐妖づ Submitted on 2021-01-28 19:12:39
Question: I am working on a multi-label text classification problem (90 target labels in total). The data distribution has a long tail and class imbalance, and there are around 100k records. I am using the OAA (one-against-all) strategy. I am trying to create an ensemble using stacking. Text features: HashingVectorizer (number of features 2**20, char analyzer), TSVD to reduce the dimensionality (n_components=200). text_pipeline = Pipeline([ ('hashing_vectorizer', HashingVectorizer(n_features=2**20, analyzer='char'))
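
A sketch of one workable setup under these assumptions: scikit-learn's StackingClassifier does not accept multilabel indicator targets out of the box, so the stacking step is done by hand on a holdout split, using the one-vs-rest base model's per-label probabilities as meta-features. X_text (list of raw texts) and Y (binary label matrix with 90 columns) are placeholders for the question's data.

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Base text pipeline from the question: hashing features + TruncatedSVD.
text_pipeline = Pipeline([
    ('hashing_vectorizer', HashingVectorizer(n_features=2**20, analyzer='char')),
    ('tsvd', TruncatedSVD(n_components=200)),
])

base_clf = Pipeline([
    ('features', text_pipeline),
    ('ovr', OneVsRestClassifier(LogisticRegression(max_iter=1000))),
])

# Hold out part of the data so the meta-learner trains on out-of-sample predictions.
X_train, X_hold, Y_train, Y_hold = train_test_split(X_text, Y, test_size=0.2, random_state=0)
base_clf.fit(X_train, Y_train)

meta_features = base_clf.predict_proba(X_hold)          # shape: (n_samples, 90)
meta_clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
meta_clf.fit(meta_features, Y_hold)                     # second-level (stacked) model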

Confusion in understanding the output of BERTforTokenClassification class from Transformers library

旧巷老猫 Submitted on 2021-01-28 19:04:01
Question: This is the example given in the documentation of the Transformers PyTorch library: from transformers import BertTokenizer, BertForTokenClassification import torch tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') model = BertForTokenClassification.from_pretrained('bert-base-uncased', output_hidden_states=True, output_attentions=True) input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0) # Batch size 1 labels = torch.tensor([1] * input
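
A sketch of how the outputs in that example unpack, assuming labels are passed and both output_hidden_states and output_attentions are enabled: the model then returns loss, logits, hidden_states, and attentions, in that order (a plain tuple in older transformers versions, or a ModelOutput that still supports integer indexing in newer ones).

from transformers import BertTokenizer, BertForTokenClassification
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForTokenClassification.from_pretrained('bert-base-uncased',
                                                   output_hidden_states=True,
                                                   output_attentions=True)

input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute",
                                          add_special_tokens=True)).unsqueeze(0)  # batch size 1
labels = torch.tensor([1] * input_ids.size(1)).unsqueeze(0)

outputs = model(input_ids, labels=labels)
loss, logits = outputs[0], outputs[1]
hidden_states, attentions = outputs[2], outputs[3]

print(logits.shape)         # (1, 8, 2): 8 tokens incl. [CLS]/[SEP], 2 labels by default
print(len(hidden_states))   # 13: the embedding output plus one tensor per encoder layer
print(len(attentions))      # 12: one attention tensor per encoder layer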

How to extract numbers from a text file and multiply them together?

我与影子孤独终老i Submitted on 2021-01-28 19:02:06
Question: I have a text file which contains 800 words, each with a number in front of it. (Each word and its number are on a new line, so the file has 800 lines.) I have to find the numbers and then multiply them together. Because multiplying many small floats underflows to zero, I have to use logarithms to prevent the underflow, but I don't know how. This is the formula: c_NB = argmax_c [ log P(c) + Σ log P(x | c) ]. This code doesn't print anything. output = [] with open('c:/python34/probEjtema.txt', encoding="utf-8")
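
A minimal sketch of the log-space version, assuming the file path from the question and lines that each contain one word and one number (in either order): instead of multiplying the 800 numbers, sum their logarithms, since log(a*b) = log(a) + log(b).

import math

log_sum = 0.0
with open('c:/python34/probEjtema.txt', encoding='utf-8') as f:
    for line in f:
        parts = line.split()
        if not parts:
            continue                                # skip blank lines
        for token in parts:
            try:
                log_sum += math.log(float(token))   # add the log of this line's number
                break
            except ValueError:
                continue                            # token was the word, try the next one

print(log_sum)   # the log of the product of all 800 numbers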

Different models with gensim Word2Vec on python

我们两清 Submitted on 2021-01-28 14:02:40
Question: I am trying to apply the word2vec model implemented in the gensim library in Python. I have a list of sentences (each sentence is a list of words). For instance let us have: sentences=[['first','second','third','fourth']]*n and I implement two identical models: model = gensim.models.Word2Vec(sentences, min_count=1, size=2) model2 = gensim.models.Word2Vec(sentences, min_count=1, size=2) I realize that the models are sometimes the same and sometimes different, depending on the value of n. For
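
A sketch of how the two runs are usually made reproducible, keeping the gensim 3.x argument names from the question (in gensim 4.x, size becomes vector_size): fix the seed and train with a single worker thread; since gensim's per-word vector seeding uses Python's string hash, PYTHONHASHSEED also has to be pinned in the environment before Python starts if results must match across separate interpreter runs.

import gensim

sentences = [['first', 'second', 'third', 'fourth']] * 100

# Same seed + single worker makes the two trainings deterministic within one run.
model = gensim.models.Word2Vec(sentences, min_count=1, size=2, seed=42, workers=1)
model2 = gensim.models.Word2Vec(sentences, min_count=1, size=2, seed=42, workers=1)

print((model.wv['first'] == model2.wv['first']).all())   # True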

how to send multiple text strings in a single post request to google cloud natural language api

主宰稳场 Submitted on 2021-01-28 12:30:54
Question: Here is my Python code: def sentiment_local_file(text): """Detects sentiment in the local document""" language_client = language.Client() if isinstance(text, six.binary_type): text = text.decode('utf-8') with open("abhi.txt",'r') as fr: data = json.loads(fr.read()) print ([data['document']['content']]) document = language_client.document_from_text(data['document']['content']) result = document.annotate_text(include_sentiment=True, include_syntax=False, include_entities=False) I am trying to
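
A sketch using the same legacy google-cloud-language client that appears in the question, under the assumption that the API analyzes one document per request: multiple text strings are handled by looping and issuing one annotate_text call per string. The attribute names on the returned annotations follow that legacy client; the current library exposes the equivalent functionality through LanguageServiceClient.analyze_sentiment.

from google.cloud import language

def sentiment_for_texts(texts):
    """texts: a list of plain strings; returns one sentiment score per string."""
    client = language.Client()
    scores = []
    for text in texts:
        document = client.document_from_text(text)
        annotations = document.annotate_text(include_sentiment=True,
                                             include_syntax=False,
                                             include_entities=False)
        scores.append(annotations.sentiment.score)   # one score per input string
    return scores

print(sentiment_for_texts(["I love this.", "I hate this."]))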