Interspeech 2019 speech emotion recognition paper reading notes

1. Improved End-to-End Speech Emotion Recognition Using Self Attention Mechanism and Multitask Learning

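University of Tokyo
End-to-end multitask learning with self-attention; the auxiliary task is gender classification.
Features are first extracted from the speech spectrogram rather than taken from hand-crafted features. A CNN-BLSTM end-to-end network follows. A self-attention mechanism then focuses on the emotionally salient periods. Finally, because emotion and gender classification share mutual features, gender classification is added as an auxiliary task that shares useful information with the main task, emotion classification.
The abstract motivates the work through human-computer interaction applications, noting that SER has attracted great attention, which makes the motivation more concrete. The introduction covers, in turn: features, the advantages of spectrograms, traditional machine learning approaches such as HMM, GMM, and SVM, and deep learning approaches such as CNN and RNN.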
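Multi-head self-attention.
Spectrogram extraction: utterance length is normalized to 7.5 s; shorter utterances are zero-padded and longer ones are cut. Hanning window of length 800. Sampling rate 16000 Hz.
Short-time Fourier transform (STFT).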
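The spectrogram settings above can be sketched as follows. This is a minimal illustration, not the paper's code: the 7.5 s length, the 16000 Hz sampling rate, and the Hanning window of 800 samples come from the notes, while the hop size `HOP_LENGTH` and the function names are assumptions made for the example.

```python
import numpy as np

SAMPLE_RATE = 16000      # Hz, from the notes
TARGET_SECONDS = 7.5     # fixed utterance length, from the notes
WIN_LENGTH = 800         # Hanning window length in samples, from the notes
HOP_LENGTH = 400         # assumption: the notes do not state the hop size


def pad_or_cut(signal, target_len):
    """Zero-pad short signals and truncate long ones to target_len samples."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) >= target_len:
        return signal[:target_len]
    return np.pad(signal, (0, target_len - len(signal)))


def magnitude_spectrogram(signal, win_length=WIN_LENGTH, hop_length=HOP_LENGTH):
    """Magnitude STFT with a Hanning window; shape (frames, win_length // 2 + 1)."""
    window = np.hanning(win_length)
    n_frames = 1 + (len(signal) - win_length) // hop_length
    frames = np.stack([
        signal[i * hop_length: i * hop_length + win_length] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))


target_len = int(SAMPLE_RATE * TARGET_SECONDS)   # 120000 samples
t = np.arange(3 * SAMPLE_RATE) / SAMPLE_RATE     # a 3-second toy utterance
toy = np.sin(2 * np.pi * 440.0 * t)              # 440 Hz tone
spec = magnitude_spectrogram(pad_or_cut(toy, target_len))
print(spec.shape)
```

With an 800-sample window at 16 kHz each frequency bin is 20 Hz wide, so the toy 440 Hz tone peaks at bin 22 in the frames that contain signal, and the zero-padded tail yields all-zero frames.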
$\alpha$ and $\beta$ are both set to 1.
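The notes do not say what $\alpha$ and $\beta$ multiply. A common reading for a two-task setup, assumed here, is that they weight the emotion loss and the gender loss in a summed objective. The sketch below illustrates that assumed form with plain cross-entropy; the function names and the toy probabilities are hypothetical.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class given predicted probabilities."""
    return -math.log(probs[target_index])


def multitask_loss(emotion_probs, emotion_target, gender_probs, gender_target,
                   alpha=1.0, beta=1.0):
    """Weighted sum of the two task losses (assumed form of the objective)."""
    return (alpha * cross_entropy(emotion_probs, emotion_target)
            + beta * cross_entropy(gender_probs, gender_target))


# Toy predictions: four emotion classes, two gender classes.
loss = multitask_loss([0.7, 0.1, 0.1, 0.1], 0, [0.8, 0.2], 0)
print(round(loss, 4))
```

With both weights at 1, neither task is prioritized; setting `beta=0` in this sketch would reduce it to single-task emotion training.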

IEMOCAP: EXCITED and HAPPY are combined into HAPPY; the four classes total 5531 samples.
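The label merge can be written as a small mapping. This is only a sketch: the notes do not list the four classes, so the set below (happy, angry, sad, neutral) is an assumption based on the subset commonly used with IEMOCAP, and the lowercase label strings are hypothetical.

```python
from collections import Counter

# Assumption: the four target classes are happy, angry, sad, and neutral.
LABEL_MAP = {
    "happy": "happy",
    "excited": "happy",   # merged into HAPPY, as described in the notes
    "angry": "angry",
    "sad": "sad",
    "neutral": "neutral",
}


def merge_labels(raw_labels):
    """Map raw labels to the four-class set; drop labels outside the map."""
    return [LABEL_MAP[label] for label in raw_labels if label in LABEL_MAP]


raw = ["excited", "happy", "sad", "frustrated", "neutral", "angry", "excited"]
merged = merge_labels(raw)
print(Counter(merged))
```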
The result comparison includes both 5-fold cross-validation (2018) and leave-one-session-out.
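Leave-one-session-out holds out every utterance of one recording session for testing and trains on the rest, so no speaker appears in both sets. A minimal sketch, with a hypothetical helper name and toy session ids (IEMOCAP has five sessions):

```python
def leave_one_session_out(session_ids):
    """Yield (held_out_session, train_indices, test_indices) for each session."""
    for held_out in sorted(set(session_ids)):
        train = [i for i, s in enumerate(session_ids) if s != held_out]
        test = [i for i, s in enumerate(session_ids) if s == held_out]
        yield held_out, train, test


# Toy session ids, one per utterance.
sessions = [1, 1, 2, 3, 3, 4, 5, 5, 2, 4]
splits = list(leave_one_session_out(sessions))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

Because random 5-fold splits can place the same speaker in both training and test data, results under the two protocols are not directly comparable.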


"Attention Is All You Need" (2017): based on an encoder-decoder structure that uses no recurrence, but instead uses weighted correlations between the elements of the input sequence.
Transformer: maps the input sequence to a query, a key, and a value.
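The query/key/value mapping can be illustrated with single-head scaled dot-product self-attention. This is a sketch of the standard Transformer formulation, not of the model in the paper being read; the dimensions and the random projection matrices are arbitrary choices for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (time, dim) input."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v          # project input to Q, K, V
    scores = q @ k.T / np.sqrt(k.shape[-1])      # pairwise similarity, scaled
    weights = softmax(scores, axis=-1)           # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
x = rng.standard_normal((6, 8))                  # 6 time steps, 8 features
w_q, w_k, w_v = (rng.standard_normal((8, 4)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Each row of `weights` holds the weighted correlations between one time step and every other step, which is what replaces recurrence.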
Introduces various kinds of attention.
Proposes a global windowing system that works on top of the local windows.
Classification and regression.

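5-fold cross-validation.
Because HAPPY has few samples, it is replaced with EXCITED so that the classes are balanced. I am not sure the author's comparison is fair: does [2] also use EXCITED with 5-fold cross-validation?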
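For reference, a plain shuffled 5-fold split looks like the sketch below. The helper name, the seed, and the sample count are hypothetical, and the paper's actual fold assignment is not given in these notes.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]    # fold sizes differ by at most 1
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


folds = list(k_fold_indices(23, k=5))
for train, test in folds:
    print(len(train), len(test))
```

A fair comparison between two papers needs the same class set and the same split protocol, which is the concern raised above.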

Two corpora: CASIA and IEMOCAP.

Source: CSDN

Author: wangdapang_2

Link: https://blog.csdn.net/qq_38221026/article/details/103887135
