Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

Posted by 余生长醉 on 2020-01-22 23:31:30

Abstract

Semantic word spaces have been very useful but cannot express the meaning of longer phrases in a principled way.

Further progress towards understanding compositionality in tasks such as sentiment detection requires richer supervised training and evaluation resources and more powerful models of composition.

To remedy this, we introduce a Sentiment Treebank.

It includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality.

To address them, we introduce the Recursive Neural Tensor Network.

When trained on the new treebank, this model outperforms all previous methods on several metrics.

It pushes the state of the art in single-sentence positive/negative classification from 80% up to 85.4%.

The accuracy of predicting fine-grained sentiment labels for all phrases reaches 80.7%, an improvement of 9.7% over bag-of-features baselines.

Lastly, it is the only model that can accurately capture the effects of negation and its scope at various tree levels for both positive and negative phrases.

1 Introduction

Semantic vector spaces for single words have been widely used as features (Turney and Pantel, 2010).

Because they cannot capture the meaning of longer phrases properly, compositionality in semantic vector spaces has recently received a lot of attention (Mitchell and Lapata, 2010; Socher et al., 2010; Zanzotto et al., 2010; Yessenalina and Cardie, 2011; Socher et al., 2012; Grefenstette et al., 2013).

However, progress is held back by the current lack of large, labeled compositionality resources and of models that accurately capture the underlying phenomena presented in such data.

To address this need, we introduce the Stanford Sentiment Treebank and a powerful Recursive Neural Tensor Network that can accurately predict the compositional semantic effects present in this new corpus.

Figure 1: Example of the Recursive Neural Tensor Network accurately predicting 5 sentiment classes, very negative to very positive (- -, -, 0, +, + +), at every node of a parse tree and capturing the negation and its scope in this sentence.

The Stanford Sentiment Treebank is the first corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language.

The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews.

It was parsed with the Stanford parser (Klein and Manning, 2003) and includes a total of 215,154 unique phrases from those parse trees, each annotated by 3 human judges.

This new dataset allows us to analyze the intricacies of sentiment and to capture complex linguistic phenomena.

Fig. 1 shows one of the many examples with clear compositional structure.

The granularity and size of this dataset will enable the community to train compositional models that are based on supervised and structured machine learning techniques.

While there are several datasets with document and chunk labels available, there is a need to better capture sentiment from short comments, such as Twitter data, which provide less overall signal per document.

In order to capture the compositional effects with higher accuracy, we propose a new model called the Recursive Neural Tensor Network (RNTN). Recursive Neural Tensor Networks take phrases of any length as input.

They represent a phrase through word vectors and a parse tree and then compute vectors for higher nodes in the tree using the same tensor-based composition function.
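The composition step is described above only in words. As a rough illustration, the sketch below applies one tensor-based composition function at every internal node of a small binary tree. It assumes the form p = tanh(x^T V x + W x) with x the concatenation of the two child vectors, which is how the RNTN composition is commonly written; that formula does not appear in this section, and the dimension d = 2, the word vectors, and the parameters V and W are hypothetical values chosen by hand (a trained model would learn them).

```python
import math

def compose(a, b, V, W):
    """One RNTN-style composition step for child vectors a and b (each of length d).

    V is a list of d slices, each a 2d x 2d matrix; W is a d x 2d matrix.
    Returns the parent vector p with p[k] = tanh(x^T V[k] x + W[k] . x), x = [a; b].
    """
    x = a + b  # list concatenation of the two child vectors, length 2d
    parent = []
    for V_k, W_k in zip(V, W):
        # Bilinear (tensor) term: x^T V_k x
        bilinear = sum(x[i] * V_k[i][j] * x[j]
                       for i in range(len(x)) for j in range(len(x)))
        # Standard linear term: W_k . x
        linear = sum(w * xi for w, xi in zip(W_k, x))
        parent.append(math.tanh(bilinear + linear))
    return parent

def compose_tree(tree, vectors, V, W):
    """Compute a vector for every node bottom-up; a tree is a word or a (left, right) tuple."""
    if isinstance(tree, str):
        return vectors[tree]
    left, right = tree
    return compose(compose_tree(left, vectors, V, W),
                   compose_tree(right, vectors, V, W), V, W)

# Toy parameters (assumptions for illustration only).
d = 2
vectors = {"not": [-0.5, 0.1], "very": [0.2, 0.3], "good": [0.8, -0.1]}
V = [[[0.1 if i == j else 0.0 for j in range(2 * d)] for i in range(2 * d)]
     for _ in range(d)]
W = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]

# The same compose() is reused at both internal nodes of (not (very good)).
root = compose_tree(("not", ("very", "good")), vectors, V, W)
```

Because the same function and the same parameters are reused at every node, a phrase of any length yields a vector of the same dimension d, which is what lets the model accept inputs of arbitrary length.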

We compare to several supervised, compositional models such as standard recursive neural networks (RNN) (Socher et al., 2011b), matrix-vector RNNs (Socher et al., 2012), and baselines such as neural networks that ignore word order, Naive Bayes (NB), bi-gram NB and SVM.

All models get a significant boost when trained with the new dataset, but the RNTN obtains the highest performance with 80.7% accuracy when predicting fine-grained sentiment for all nodes.

Lastly, we use a test set of positive and negative sentences and their respective negations to show that, unlike bag of words models, the RNTN accurately captures the sentiment change and scope of negation.

RNTNs also learn that the sentiment of phrases following the contrastive conjunction 'but' dominates.

The complete training and testing code, a live demo and the Stanford Sentiment Treebank dataset are available at http://nlp.stanford.edu/sentiment.

This work is connected to five different areas of NLP research, each with their own large amount of related work to which we cannot do full justice given space constraints.

Semantic Vector Spaces. The dominant approach in semantic vector spaces uses distributional similarities of single words.

Often, co-occurrence statistics of a word and its context are used to describe each word (Turney and Pantel, 2010; Baroni and Lenci, 2010), such as tf-idf.
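To make the tf-idf statistic mentioned above concrete, here is a minimal sketch in which each document serves as the context. It assumes one common variant (term frequency as count divided by document length, inverse document frequency as log(N / document frequency)); the text does not say which variant the cited work uses, and the three-document corpus is invented for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(docs)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Toy corpus (hypothetical).
docs = [
    "the movie was good".split(),
    "the movie was bad".split(),
    "the plot was thin".split(),
]
weights = tf_idf(docs)
```

In this toy corpus "the" and "was" occur in every document, so their weight is zero, while "good" occurs in only one document and receives the largest weight in the first document.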

Variants of this idea use more complex frequencies such as how often a word appears in a certain syntactic context (Padó and Lapata, 2007; Erk and Padó, 2008).

However, distributional vectors often do not properly capture the differences in antonyms since those often have similar contexts.
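The antonym problem can be reproduced on a toy scale. The sketch below builds window-based co-occurrence count vectors and compares them with cosine similarity; the four sentences, the window size of 2, and the helper names `context_vector` and `cosine` are all assumptions made for this illustration, not part of the cited work.

```python
import math
from collections import Counter

def context_vector(target, sentences, window=2):
    """Count the words that appear within `window` positions of `target`."""
    counts = Counter()
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token == target:
                start = max(0, i - window)
                end = min(len(tokens), i + window + 1)
                counts.update(tokens[start:i] + tokens[i + 1:end])
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

# Toy corpus (hypothetical): "good" and "bad" occur in identical contexts.
sentences = [
    "the acting was good overall".split(),
    "the acting was bad overall".split(),
    "a good film indeed".split(),
    "a bad film indeed".split(),
]
sim = cosine(context_vector("good", sentences), context_vector("bad", sentences))
```

Here "good" and "bad" have identical context counts, so their cosine similarity is numerically 1 even though their sentiment is opposite.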

One possibility to remedy this is to use neural word vectors (Bengio et al., 2003).

These vectors can be trained in an unsupervised fashion to capture distributional similarities (Collobert and Weston, 2008; Huang et al., 2012) but then also be fine-tuned and trained to specific tasks such as sentiment detection (Socher et al., 2011b).

The models in this paper can use purely supervised word representations learned entirely on the new corpus.

Compositionality in Vector Spaces. Most of the compositionality algorithms and related datasets capture two-word compositions.

Mitchell and Lapata (2010), for example, use two-word phrases and analyze similarities computed by vector addition, multiplication and others.
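A small sketch of the two operations named above, element-wise vector addition and multiplication, followed by a cosine comparison of the resulting phrase vectors. The 3-dimensional word vectors and the example phrases are hypothetical, and the sketch does not attempt to reproduce the cited evaluation.

```python
import math

def add(u, v):
    """Additive composition: element-wise sum of two word vectors."""
    return [a + b for a, b in zip(u, v)]

def multiply(u, v):
    """Multiplicative composition: element-wise product of two word vectors."""
    return [a * b for a, b in zip(u, v)]

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical 3-dimensional word vectors.
vec = {
    "black": [1.0, 0.0, 2.0],
    "cat": [0.0, 3.0, 1.0],
    "dark": [1.0, 0.0, 1.0],
    "kitten": [0.0, 2.0, 1.0],
}

phrase_add = add(vec["black"], vec["cat"])       # [1.0, 3.0, 3.0]
phrase_mul = multiply(vec["black"], vec["cat"])  # [0.0, 0.0, 2.0]

# Similarity between "black cat" and "dark kitten" under each composition.
sim_add = cosine(phrase_add, add(vec["dark"], vec["kitten"]))
sim_mul = cosine(phrase_mul, multiply(vec["dark"], vec["kitten"]))
```

Both operations are symmetric in their arguments, so "black cat" and "cat black" receive the same vector; neither composition can encode word order.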

Some related models such as holographic reduced representations (Plate, 1995), quantum logic (Widdows, 2008), discrete-continuous models (Clark and Pulman, 2007) and the recent compositional matrix space model (Rudolph and Giesbrecht, 2010) have not been experimentally validated on larger corpora.

Yessenalina and Cardie (2011) compute matrix representations for longer phrases and define composition as matrix multiplication, and also evaluate on sentiment.
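Composition as matrix multiplication can be sketched in a few lines. The example below only illustrates the general idea of representing each word as a matrix and a phrase as the product of its word matrices; the 2x2 matrices are invented, and how a phrase matrix is mapped to a sentiment score in the cited work is deliberately left out.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def phrase_matrix(words, lexicon):
    """Represent a phrase as the left-to-right product of its word matrices."""
    result = lexicon[words[0]]
    for word in words[1:]:
        result = matmul(result, lexicon[word])
    return result

# Hypothetical 2x2 word matrices.
lexicon = {
    "not": [[0.0, 1.0], [1.0, 0.0]],
    "good": [[2.0, 0.0], [0.0, 1.0]],
}

not_good = phrase_matrix(["not", "good"], lexicon)  # [[0.0, 1.0], [2.0, 0.0]]
good_not = phrase_matrix(["good", "not"], lexicon)  # [[0.0, 2.0], [1.0, 0.0]]
```

Unlike vector addition or element-wise multiplication, matrix multiplication is not commutative, so the two word orders produce different phrase representations.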

Grefenstette and Sadrzadeh (2011) analyze subject-verb-object triplets and find a matrix-based categorical model to correlate well with human judgments.

We compare to the recent line of work on supervised compositional models. In particular we will describe and experimentally compare our new RNTN model to recursive neural networks (RNN) (Socher et al., 2011b) and matrix-vector RNNs (Socher et al., 2012), both of which have been applied to bag of words sentiment corpora.

Logical Form. A related field that tackles compositionality from a very different angle is that of trying to map sentences to logical form (Zettlemoyer and Collins, 2005).

While these models are highly interesting and work well in closed domains and on discrete sets, they could only capture sentiment distributions using separate mechanisms beyond the currently used logical forms.

Deep Learning. Apart from the above mentioned work on RNNs, several compositionality ideas related to neural networks have been discussed by Bottou (2011) and Hinton (1990), and first models such as Recursive Auto-associative Memories have been experimented with by Pollack (1990).

The idea to relate inputs through three-way interactions, parameterized by a tensor, has been proposed for relation classification (Sutskever et al., 2009; Jenatton et al., 2012), extending Restricted Boltzmann machines (Ranzato and Hinton, 2010) and as a special layer for speech recognition (Yu et al., 2012).

Sentiment Analysis. Apart from the above mentioned work, most approaches in sentiment analysis use bag of words representations (Pang and Lee, 2008).
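The limitation of bag of words representations for sentiment is easy to see in a short sketch. The two example sentences are invented; the point is only that a representation built from word counts discards word order.

```python
from collections import Counter

def bag_of_words(text):
    """Map a text to its multiset of lowercased, whitespace-separated tokens."""
    return Counter(text.lower().split())

# Two sentences with opposite sentiment but exactly the same words.
positive = bag_of_words("the film is good , not bad")
negative = bag_of_words("the film is bad , not good")

same_representation = (positive == negative)  # True
```

Any classifier that sees only these counts must assign both sentences the same label, which is why the scope of negation cannot be recovered from a bag of words alone.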

[...] reviews in more detail by analyzing the sentiment of multiple aspects of restaurants, such as food or atmosphere.

Several works have explored sentiment compositionality through careful engineering of features or polarity shifting rules on syntactic structures (Polanyi and Zaenen, 2006; Moilanen and Pulman, 2007; Rentoumi et al., 2010; Nakagawa et al., 2010).
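As a rough idea of what a polarity shifting rule on a syntactic structure can look like, the sketch below walks a binary tree and flips the polarity of a constituent whose left sibling is a negator. This is a toy rule written for illustration, not the rule set of any of the cited papers; the lexicon, the negator list, and the function name `polarity` are assumptions.

```python
# Hypothetical prior polarities and negators.
LEXICON = {"good": 1, "great": 1, "bad": -1, "dull": -1}
NEGATORS = {"not", "never"}

def polarity(tree):
    """Return -1, 0 or +1 for a tree given as a word or a (left, right) tuple."""
    if isinstance(tree, str):
        return LEXICON.get(tree, 0)
    left, right = tree
    if isinstance(left, str) and left in NEGATORS:
        # Shifting rule: a negator flips the polarity of its sibling constituent.
        return -polarity(right)
    total = polarity(left) + polarity(right)
    return (total > 0) - (total < 0)  # sign of the combined polarity

negated = polarity(("not", ("very", "good")))                      # -1
double = polarity((("the", "film"), ("is", ("not", "bad"))))       # +1
```

Hand-written rules of this kind depend on a fixed lexicon and a fixed list of shifters, which is the engineering effort the sentence above refers to.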
