softmax

Word2Vec Explained in Detail

亡梦爱人 submitted on 2019-12-14 19:59:42
Original article: https://www.cnblogs.com/guoyaohua/p/9240336.html

In 2013, Google open-sourced word2vec, a tool for computing word vectors, and it quickly drew attention from both industry and academia. First, word2vec trains efficiently on vocabularies of millions of words and corpora of hundreds of millions of tokens; second, the result of that training, the word embedding, measures similarity between words well. As deep learning spread through natural language processing, many people came to believe, mistakenly, that word2vec is a deep learning algorithm; the model behind word2vec is in fact a shallow neural network. Another point worth stressing is that word2vec is an open-source tool for computing word vectors: when we speak of the word2vec algorithm or model, we really mean the CBoW and Skip-gram models it uses to compute those vectors, so treating word2vec itself as a single algorithm or model is another common misconception. Starting from statistical language models, this article traces the algorithms and models behind the word2vec tool in as much detail as possible.

Statistical Language Model

Before diving into the details of the word2vec algorithms, let us first revisit a basic problem in natural language processing: how do we compute the probability that a given text sequence occurs in a language? It is called a basic problem because it plays an important role in many NLP tasks. For example
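As a brief illustration (the standard formulation, not quoted from the excerpt above), a statistical language model factorizes the probability of a word sequence w_1, ..., w_T with the chain rule, so the modeling problem becomes estimating each conditional term:

    P(w_1, w_2, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, w_2, \dots, w_{t-1})

N-gram models approximate each conditional with a short fixed-length history; neural language models, and ultimately the CBoW and Skip-gram models behind word2vec, estimate these (or closely related) conditional probabilities using learned word vectors.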

Unsupervised Feature Learning via Non-Parametric Instance Discrimination

試著忘記壹切 submitted on 2019-12-14 01:08:08
Paper: Unsupervised Feature Learning via Non-Parametric Instance Discrimination. GitHub code: NCE code.

Abstract: Neural network classifiers trained on labeled data are good at capturing visual similarity between images. The paper's hypothesis is that, by training an instance-level classifier (treating every sample as its own class) instead of a category-level classifier, we can obtain feature representations that capture visual similarity. The authors call this non-parametric instance-level discrimination, and they use noise-contrastive estimation (NCE) to cope with the computational difficulty caused by the enormous number of instance "classes". The experiments show that, under the unsupervised-learning constraint, the method surpasses the previous state of the art on ImageNet. With more training data and more advanced network architectures, the method improves classification accuracy further. By fine-tuning the learned features, the authors obtain results comparable to semi-supervised learning and competitive results on object detection. At the same time, the non-parametric model is very compact: each image needs only a 128-dimensional feature, so a million images take only about 600 MB of storage, which makes nearest-neighbor retrieval fast at run time.

Introduction: The unsupervised-learning approach proposed in this paper grew out of several observations about supervised object recognition. On ImageNet, the top-5 classification error is much lower than the top-1 error, and the class with the second-highest response in the softmax output is more likely to be visually related to the true class. As shown in the figure
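As a pointer to the key formula (taken from my reading of the paper, not from the excerpt above), the non-parametric instance-level classifier replaces per-class weight vectors with the stored, L2-normalized image features v_j themselves, so the probability of instance i under a feature v is

    P(i \mid v) = \frac{\exp(v_i^{\top} v / \tau)}{\sum_{j=1}^{n} \exp(v_j^{\top} v / \tau)}

where τ is a temperature hyperparameter and n is the number of training images; noise-contrastive estimation is needed precisely because the denominator sums over all n instances.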

Where is the original source code of the sparse_softmax_cross_entropy_with_logits function in TensorFlow

孤人 submitted on 2019-12-13 02:27:52
Question: I want to know what the TensorFlow function sparse_softmax_cross_entropy_with_logits is doing mathematically, exactly. But I can't find the original source code. Can you help me?

Answer 1: sparse_softmax_cross_entropy_with_logits is equivalent to a numerically stable version of the following:

    -1. * tf.gather(tf.log(tf.nn.softmax(logits)), target)

or, in more "readable" numpy code:

    -1. * np.log(softmax(logits))[target]

where softmax(x) = np.exp(x)/np.sum(np.exp(x)). That is, it computes the softmax of the logits and then takes the negative log of the probability assigned to the target class.
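A minimal NumPy sketch of that numerically stable form (my own illustration, not TensorFlow's actual implementation) works in log-space so the softmax is never exponentiated and then re-logged:

    import numpy as np

    def sparse_softmax_cross_entropy(logits, target):
        """Numerically stable -log(softmax(logits)[target]) for one example."""
        shifted = logits - np.max(logits)              # shift so the largest logit is 0
        log_sum_exp = np.log(np.sum(np.exp(shifted)))  # log of the softmax denominator
        return -(shifted[target] - log_sum_exp)        # negative log-probability of target

    print(sparse_softmax_cross_entropy(np.array([2.0, 1.0, -3.0]), target=0))  # ~0.318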

tensorflow - softmax ignore negative labels (just like caffe) [duplicate]

寵の児 submitted on 2019-12-12 19:51:42
Question: This question already has answers here: TensorFlow: How to handle void labeled data in image segmentation? (2 answers). Closed 2 years ago.

In Caffe, the SoftmaxWithLoss layer has an option to ignore all negative labels (-1) when computing probabilities, so that only the probabilities of the labels that are 0 or positive add up to 1. Is there a similar feature in TensorFlow's softmax loss?

Answer 1: Just came up with a work-around: I created a one-hot tensor on the label indices using tf.one_hot (with
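To make the work-around concrete, here is a sketch under my own assumptions (TF 1.x-style API; not the answerer's exact code): tf.one_hot maps an out-of-range index such as -1 to an all-zero row, so the cross entropy for those rows is harmless and can simply be masked out of the final reduction.

    import tensorflow as tf

    labels = tf.constant([2, -1, 0])                  # -1 marks examples/pixels to ignore
    logits = tf.zeros([3, 5])                         # placeholder logits for 5 classes

    onehot = tf.one_hot(labels, depth=5)              # the -1 row becomes all zeros
    per_example = tf.nn.softmax_cross_entropy_with_logits(labels=onehot, logits=logits)

    valid = tf.cast(labels >= 0, per_example.dtype)   # 1.0 for real labels, 0.0 for -1
    loss = tf.reduce_sum(per_example * valid) / tf.maximum(tf.reduce_sum(valid), 1.0)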

Why is softmax not used in hidden layers [duplicate]

大兔子大兔子 submitted on 2019-12-12 03:27:30
Question: This question already has answers here: Why use softmax only in the output layer and not in hidden layers? (4 answers). Closed 2 years ago.

I have read the answer given here. My exact question pertains to the accepted answer:

Variable independence: a lot of regularization and effort is put into keeping your variables independent, uncorrelated and quite sparse. If you use a softmax layer as a hidden layer, then you will keep all your nodes (hidden variables) linearly dependent, which may result in
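A tiny NumPy illustration of the linear-dependence point (my own example, not from the linked answer): softmax outputs always sum to 1, so any one hidden unit is completely determined by the others, which removes a degree of freedom from the hidden representation.

    import numpy as np

    def softmax(x):
        e = np.exp(x - np.max(x))
        return e / e.sum()

    h = softmax(np.array([0.5, -1.2, 3.0, 0.1]))
    print(h.sum())                      # 1.0 -- the outputs are constrained to the simplex
    print(h[-1], 1.0 - h[:-1].sum())    # the last unit is fixed by the other three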

How to use softmax activation function at the output layer, but relus in the middle layers in TensorFlow?

谁说胖子不能爱 submitted on 2019-12-12 03:08:51
Question: I have a neural net with 3 hidden layers (so 5 layers in total). I want to use rectified linear units at each of the hidden layers, but at the outermost layer I want to apply softmax to the logits. I want to use the DNNClassifier. I have read the official TensorFlow documentation, which says this about the parameter activation_fn:

activation_fn: Activation function applied to each layer. If None, will use tf.nn.relu.

I know I can always write my own model and use any
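For reference, a sketch of how this is usually wired up with the canned estimator (parameter names as I recall from the TF 1.x tf.estimator API; treat the exact signature as an assumption): activation_fn only affects the hidden layers, while the classifier's head applies softmax to the final logits when computing the loss and predicted probabilities, so no extra configuration is needed for the output layer.

    import tensorflow as tf

    feature_columns = [tf.feature_column.numeric_column("x", shape=[784])]

    classifier = tf.estimator.DNNClassifier(
        feature_columns=feature_columns,
        hidden_units=[256, 128, 64],   # three ReLU hidden layers
        n_classes=10,                  # the head applies softmax over these classes
        activation_fn=tf.nn.relu,      # explicit here, though ReLU is already the default
    )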

How to apply softmax on an array/vector with huge positive and negative values in TensorFlow?

二次信任 submitted on 2019-12-11 15:27:03
Question: I train a convolutional neural network (CNN) on the MNIST data set in TensorFlow. I calculate the accuracy for each image of the MNIST test set and look at the values of the ten output nodes. I use the following line of code to get them (see all the code here: How to get the value from each output-node during eval MNIST testdata in TensorFlow?):

    pred = prediction.eval(feed_dict={x: testSet[0], y: testSet[1]})

The output of this line of code is, for example: [[ -13423.92773438 -27312
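The standard remedy for logits this large (general practice, not something specific to the linked question) is to subtract the row-wise maximum before exponentiating; the result is mathematically identical, but large positive values can no longer overflow, and large negative values simply underflow to 0.

    import numpy as np

    def stable_softmax(logits):
        """Softmax that is safe for very large positive or negative logits."""
        z = logits - np.max(logits, axis=-1, keepdims=True)  # largest entry becomes 0
        e = np.exp(z)
        return e / np.sum(e, axis=-1, keepdims=True)

    print(stable_softmax(np.array([-13423.9, -27312.0, 1500.0])))  # -> [0. 0. 1.]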

How to vectorize Softmax probability of a multi dimensional matrix

送分小仙女□ submitted on 2019-12-11 10:06:40
Question: I am trying to work through assignment 1 for the Stanford cs224n class. Problem 1b strongly recommends an optimized implementation of the softmax function. I managed to get the softmax of an N-dimensional vector. I also got the softmax of an MxN matrix, but I used a for loop. I have the following code:

    def softmax(x):
        orig_shape = x.shape
        # Matrix
        if len(x.shape) > 1:
            softmax = np.zeros(orig_shape)
            for i, col in enumerate(x):
                softmax[i] = np.exp(col - np.max(col)) / np.sum(np.exp(col - np.max(col)))
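A vectorized version along the lines the assignment asks for (my own sketch, not the official solution) replaces the loop with broadcasting over the rows and handles both the vector and the matrix case:

    import numpy as np

    def softmax_vectorized(x):
        """Row-wise softmax for a 1-D vector or an MxN matrix, with no Python loop."""
        orig_shape = x.shape
        x = np.atleast_2d(x)
        shifted = x - np.max(x, axis=1, keepdims=True)    # per-row stability shift
        out = np.exp(shifted) / np.sum(np.exp(shifted), axis=1, keepdims=True)
        return out.reshape(orig_shape)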

cs231n Notes (3)

自作多情 submitted on 2019-12-11 06:53:04
1. Hinge loss
Hinge loss is also known as the multiclass SVM loss:

    L(W) = \frac{1}{N} \sum_{i=1}^{N} \sum_{j \neq y_i} \max(0,\; s_j - s_{y_i} + 1)

where s_j is the score of class j for example i and y_i is its correct class.

2. Regularization
When the hinge loss is 0, the value of W is not unique (for instance, any scaled version λW with λ > 1 also achieves zero loss); adding a regularization term makes the preferred W unique.

3. Softmax and the cross-entropy loss
Softmax:

    P(Y = k \mid X = x_i) = \frac{e^{s_k}}{\sum_j e^{s_j}}

Cross-entropy loss:

    L_i = -\log P(Y = k \mid X = x_i)

Its maximum value is infinity and its minimum is 0.
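A small NumPy sketch of the two losses for a single example (my own illustration of the formulas above, using the example scores from the cs231n lectures):

    import numpy as np

    scores = np.array([3.2, 5.1, -1.7])    # class scores s_j for one example
    y = 0                                   # index of the correct class

    # Multiclass SVM (hinge) loss: sum over j != y of max(0, s_j - s_y + 1)
    margins = np.maximum(0, scores - scores[y] + 1)
    margins[y] = 0
    hinge_loss = margins.sum()              # 2.9 for these scores

    # Softmax + cross-entropy loss: -log P(Y = y | x)
    shifted = scores - scores.max()         # numerical stability
    probs = np.exp(shifted) / np.sum(np.exp(shifted))
    ce_loss = -np.log(probs[y])             # about 2.04 for these scores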

Vectorized Implementation of Softmax Regression

泪湿孤枕 submitted on 2019-12-11 06:46:32
Question: I'm implementing softmax regression in Octave. Currently I'm using a non-vectorized implementation with the following cost function and derivatives. Source: Softmax Regression. Now I want to implement a vectorized version of it in Octave. It seems a bit hard for me to write vectorized versions of these equations. Can somebody help me implement this? Thanks, Upul

Answer 1: This is very similar to an exercise in Andrew Ng's deep learning class; they give some hints: http://ufldl.stanford.edu/wiki
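The question is about Octave, but the vectorization idea is the same in any array language. Here is a NumPy sketch under the usual UFLDL formulation (my own hedged version, not Upul's code or the course solution): build an indicator matrix for the labels, compute all class probabilities in one matrix product, and express the gradient as another matrix product.

    import numpy as np

    def softmax_regression_cost_grad(theta, X, y, weight_decay=1e-4):
        """Vectorized cost and gradient for softmax regression.

        theta: (num_classes, n_features) weight matrix
        X:     (n_features, m) data matrix, one example per column (UFLDL convention)
        y:     (m,) integer labels in [0, num_classes)
        """
        m = X.shape[1]
        scores = theta @ X                                  # (num_classes, m)
        scores -= scores.max(axis=0, keepdims=True)         # numerical stability
        probs = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)

        indicator = np.zeros_like(probs)
        indicator[y, np.arange(m)] = 1.0                    # one-hot ground-truth matrix

        cost = (-np.sum(indicator * np.log(probs)) / m
                + 0.5 * weight_decay * np.sum(theta ** 2))
        grad = -(indicator - probs) @ X.T / m + weight_decay * theta
        return cost, grad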