| Task | Description | Corpus/Dataset | Evaluation metric | SOTA result | Paper |
| --- | --- | --- | --- | --- | --- |
| Chunking | Text chunking (shallow parsing) | Penn Treebank | F1 | 95.77 | A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks |
| Common sense reasoning | Commonsense reasoning | Event2Mind | Cross-entropy | 4.22 | Event2Mind: Commonsense Inference on Events, Intents, and Reactions |
| Parsing | Syntactic (constituency) parsing | Penn Treebank | F1 | 95.13 | Constituency Parsing with a Self-Attentive Encoder |
| Coreference resolution | Coreference resolution | CoNLL 2012 | Average F1 | 73 | Higher-order Coreference Resolution with Coarse-to-fine Inference |
| Dependency parsing | Dependency parsing | Penn Treebank | POS / UAS / LAS | 97.3 / 95.44 / 93.76 | Deep Biaffine Attention for Neural Dependency Parsing |
| Task-Oriented Dialogue / Intent Detection | Task-oriented dialogue: intent detection | ATIS / Snips | Accuracy | 94.1 / 97.0 | Slot-Gated Modeling for Joint Slot Filling and Intent Prediction |
| Task-Oriented Dialogue / Slot Filling | Task-oriented dialogue: slot filling | ATIS / Snips | F1 | 95.2 / 88.8 | Slot-Gated Modeling for Joint Slot Filling and Intent Prediction |
| Task-Oriented Dialogue / Dialogue State Tracking | Task-oriented dialogue: dialogue state tracking | DSTC2 | Area / Food / Price / Joint | 90 / 84 / 92 / 72 | Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems |
| Domain adaptation | Domain adaptation | Multi-Domain Sentiment Dataset | Average accuracy | 79.15 | Strong Baselines for Neural Semi-supervised Learning under Domain Shift |
| Entity Linking | Entity linking | AIDA CoNLL-YAGO | Micro-F1-strong / Macro-F1-strong | 86.6 / 89.4 | End-to-End Neural Entity Linking |
| Information Extraction | Information extraction | ReVerb45K | Precision / Recall / F1 | 62.7 / 84.4 / 81.9 | CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information |
| Grammatical Error Correction | Grammatical error correction | JFLEG | GLEU | 61.5 | Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation |
| Language modeling | Language modeling | Penn Treebank | Validation perplexity / Test perplexity | 48.33 / 47.69 | Breaking the Softmax Bottleneck: A High-Rank RNN Language Model |
| Lexical Normalization | Lexical normalization | LexNorm2015 | F1 / Precision / Recall | 86.39 / 93.53 / 80.26 | MoNoise: Modeling Noise Using a Modular Normalization System |
| Machine translation | Machine translation | WMT 2014 EN-DE | BLEU | 35.0 | Understanding Back-Translation at Scale |
| Multimodal Emotion Recognition | Multimodal emotion recognition | IEMOCAP | Accuracy | 76.5 | Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling |
| Multimodal Metaphor Recognition | Multimodal metaphor recognition | Verb-noun pairs / adjective-noun pairs | F1 | 0.75 / 0.79 | Black Holes and White Rabbits: Metaphor Identification with Visual Features |
| Multimodal Sentiment Analysis | Multimodal sentiment analysis | MOSI | Accuracy | 80.3 | Context-Dependent Sentiment Analysis in User-Generated Videos |
| Named entity recognition | Named entity recognition | CoNLL 2003 | F1 | 93.09 | Contextual String Embeddings for Sequence Labeling |
| Natural language inference | Natural language inference | SciTail | Accuracy | 88.3 | Improving Language Understanding by Generative Pre-Training |
| Part-of-speech tagging | Part-of-speech tagging | Penn Treebank | Accuracy | 97.96 | Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings |
| Question answering | Question answering | CliCR | F1 | 33.9 | CliCR: A Dataset of Clinical Case Reports for Machine Reading Comprehension |
| Word segmentation | Word segmentation | VLSP 2013 | F1 | 97.90 | A Fast and Accurate Vietnamese Word Segmenter |
| Word Sense Disambiguation | Word sense disambiguation | SemEval 2015 | F1 | 67.1 | Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison |
| Text classification | Text classification | AG News | Error rate | 5.01 | Universal Language Model Fine-tuning for Text Classification |
| Summarization | Text summarization | Gigaword | ROUGE-1 / ROUGE-2 / ROUGE-L | 37.04 / 19.03 / 34.46 | Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization |
| Sentiment analysis | Sentiment analysis | IMDb | Accuracy | 95.4 | Universal Language Model Fine-tuning for Text Classification |
| Semantic role labeling | Semantic role labeling | OntoNotes | F1 | 85.5 | Jointly Predicting Predicates and Arguments in Neural Semantic Role Labeling |
| Semantic parsing | Semantic parsing | LDC2014T12 | F1 (Newswire) / F1 (Full) | 0.71 / 0.66 | AMR Parsing with an Incremental Joint Model |
| Semantic textual similarity | Semantic textual similarity | SentEval | MRPC / SICK-R / SICK-E / STS | 78.6/84.4, 0.888, 87.8, 78.9/78.6 | Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning |
| Relationship Extraction | Relation extraction | New York Times Corpus | P@10% / P@30% | 73.6 / 59.5 | RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information |
| Relation Prediction | Relation prediction | WN18RR | H@10 / H@1 / MRR | 59.02 / 45.37 / 49.83 | Predicting Semantic Relations using Global Graph Properties |
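For reading the metric columns above: most rows report F1, which balances precision and recall, and the language-modeling row reports perplexity, the exponential of the average per-token cross-entropy. A minimal sketch of the standard definitions is given below (assuming the usual formulations; individual benchmarks may report task-specific variants such as span-level or mention-level F1):

```latex
% Requires amsmath. Standard definitions assumed; papers may use variants.
\begin{align*}
\text{Precision} &= \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \\
\text{Perplexity} &= \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p(w_i \mid w_{<i}) \right)
\end{align*}
```

Among the other abbreviations, UAS/LAS are the unlabeled/labeled attachment scores used in dependency parsing, MRR is mean reciprocal rank, H@k is Hits@k, and BLEU, GLEU, and ROUGE are n-gram overlap scores defined in their respective papers.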
Source: oschina
Link: https://my.oschina.net/u/4316091/blog/4270383