Learning Kaldi from a speaker recognition demo -- (6)

Posted by 心已入冬 on 2019-12-11 20:38:34

The complete recipe is here:
https://github.com/kaldi-asr/kaldi/blob/master/egs/aishell/v1/run.sh
What follows starts from training the diagonal UBM.

sid/train_diag_ubm.sh --cmd "$train_cmd" --num-threads 16 data/dev 1024 exp/diag_ubm_1024

Now we use the dev data to train a diagonal-covariance UBM.
Besides the required options, data/dev must contain feats.scp and vad.scp.
The final output is exp/diag_ubm_1024/final.dubm.
Here is the console output during the run:

sid/train_diag_ubm.sh --cmd run.pl --num-threads 16 data/dev 1024 exp/diag_ubm_1024
sid/train_diag_ubm.sh: initializing model from E-M in memory, 
sid/train_diag_ubm.sh: starting from 512 Gaussians, reaching 1024;
sid/train_diag_ubm.sh: for 20 iterations, using at most 500000 frames of data
Getting Gaussian-selection info
sid/train_diag_ubm.sh: will train for 4 iterations, in parallel over
sid/train_diag_ubm.sh: 4 machines, parallelized with 'run.pl'
sid/train_diag_ubm.sh: Training pass 0
sid/train_diag_ubm.sh: Training pass 1
sid/train_diag_ubm.sh: Training pass 2
sid/train_diag_ubm.sh: Training pass 3

The 16 threads, 4 parallel jobs, only 4 training passes, at most 500,000 frames, and so on are all preset options and can be changed (an example is shown below). I still have not worked out the logic inside train_diag_ubm.sh, though, and will have to come back to it.
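
For example, assuming the option names in sid/train_diag_ubm.sh are the usual ones (--num-iters, --num-frames, --nj; worth double-checking against the script header), the presets could be overridden like this:

sid/train_diag_ubm.sh --cmd "$train_cmd" --nj 8 --num-threads 16 \
  --num-iters 8 --num-frames 1000000 \
  data/dev 1024 exp/diag_ubm_1024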

You can dump the generated dubm file to text with the following command:
/data/kaldi/src/gmmbin/gmm-global-copy --binary=false final.dubm final_dubm.txt
(run it from the directory containing final.dubm)
Opening the text file with vi shows the gconsts, weights, and means_invvars.
For what these parameters mean in a GMM, see:
http://notes.funcwj.cn/2017/05/28/kaldi-gmm/

sid/train_full_ubm.sh --cmd "$train_cmd" data/dev exp/diag_ubm_1024 exp/full_ubm_1024

Now we use the dev data to train a full-covariance UBM, starting from the diagonal UBM trained above.
Besides the required options, data/dev must contain feats.scp and vad.scp,
and exp/diag_ubm_1024 must contain final.ubm or final.dubm.
The final output is exp/full_ubm_1024/final.ubm.

You can dump the generated full UBM to text with:
/data/kaldi/src/fgmmbin/fgmm-global-copy --binary=false final.ubm final_ubm.txt
(run it from the directory containing final.ubm)
Opening the text file with vi shows the gconsts, weights, means_invcovars, and so on.
For what these parameters mean in a GMM, see:
http://notes.funcwj.cn/2017/05/28/kaldi-gmm/

sid/train_ivector_extractor.sh  --num-iters 5 exp/full_ubm_1024/final.ubm data/dev  exp/extractor_1024

This step needs the full UBM trained above plus feats.scp, and produces exp/extractor_1024/final.ie. Some articles say this final.ie is the T matrix (the total-variability matrix); I am not sure whether that is exactly right.
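
The i-vector dimensionality is not given on this command line, so the script default applies (400, if I read sid/train_ivector_extractor.sh correctly). Assuming the usual option names (--ivector-dim, --nj; worth checking the script header), it could be set explicitly like this:

sid/train_ivector_extractor.sh --cmd "$train_cmd" --nj 10 \
  --num-iters 5 --ivector-dim 400 \
  exp/full_ubm_1024/final.ubm data/dev exp/extractor_1024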

The exp/extractor_1024 folder also contains a 5.ie; final.ie looks like just a link to 5.ie (the model after the last of the 5 iterations), since the two files have exactly the same size.
4.

sid/extract_ivectors.sh exp/extractor_1024 data/dev exp/ivector_train_1024

The main inputs are these three files: final.ie, final.ubm, and feats.scp.
The main outputs are ivector.scp, num_utts.ark, and spk_ivector.scp,
plus the log files.

ivector.scp is a concatenation: since nj=30 was set, it is merged from 30 per-job scp files.
The utterance-level i-vectors it points to are the direct output of ivector-extract (not yet length-normalized at this point);
num_utts.ark is produced by ivector-mean;
spk_ivector.scp is produced by ivector-normalize-length. The relevant part of the script is sketched below.
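
Roughly, and leaving out the exact options, the part of sid/extract_ivectors.sh that produces these files looks like this (a paraphrase of the script, not a verbatim copy):

for j in $(seq $nj); do cat exp/ivector_train_1024/ivector.$j.scp; done > exp/ivector_train_1024/ivector.scp

ivector-normalize-length scp:exp/ivector_train_1024/ivector.scp ark:- | \
  ivector-mean ark:data/dev/spk2utt ark:- ark:- ark,t:exp/ivector_train_1024/num_utts.ark | \
  ivector-normalize-length ark:- \
    ark,scp:exp/ivector_train_1024/spk_ivector.ark,exp/ivector_train_1024/spk_ivector.scp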

Opening ivector.scp in vi, the first few lines look like this; there are 14329 lines in total:

BAC009S0724W0121 exp/ivector_train_1024/ivector.1.ark:17
BAC009S0724W0122 exp/ivector_train_1024/ivector.1.ark:4092
BAC009S0724W0123 exp/ivector_train_1024/ivector.1.ark:8170
BAC009S0724W0124 exp/ivector_train_1024/ivector.1.ark:12227
BAC009S0724W0125 exp/ivector_train_1024/ivector.1.ark:16311

The first few lines of spk_ivector.scp look like this; there are 40 lines, and our dev data does indeed contain exactly 40 speakers:

S0724 exp/ivector_train_1024/spk_ivector.ark:6
S0725 exp/ivector_train_1024/spk_ivector.ark:1622
S0726 exp/ivector_train_1024/spk_ivector.ark:3238
S0727 exp/ivector_train_1024/spk_ivector.ark:4854
S0728 exp/ivector_train_1024/spk_ivector.ark:6470
S0729 exp/ivector_train_1024/spk_ivector.ark:8086

I wanted to convert num_utts.ark and spk_ivector.ark to text to take a look, but both of the commands below failed:
/data/kaldi/src/featbin/copy-feats ark:num_utts.ark ark,t:num_utts.txt

Failed to read matrix from stream.  : Expected "[", got "354" File position at start is 6, currently 9

/data/kaldi/src/featbin/copy-feats ark:spk_ivector.ark ark,t:spk_ivector.txt

 Failed to read matrix from stream.  : Expected token FM, got FV File position at start is 8, currently 11

The two errors look similar, but I did not know how to fix them.
Someone pointed out:
MFCC features are matrices, so copy-feats works on them, but spk_ivector and the like are vectors, so you need copy-vector.

So it was simply the wrong tool.
Converting spk_ivector.ark with the following command succeeded:
/data/kaldi/src/bin/copy-vector ark:spk_ivector.ark ark,t:spk_ivector.txt

Opening spk_ivector.txt shows 40 lines, each being one speaker's i-vector:

S0724  [ -1.494889 4.191723 -2.065231 6.424363 -0.568639 1.312023 3.608217 -4.009121 0.1447516 -2.923756 -0.5574226 1.440925 -3.323975 -2.562852 3.096631 -0.2247497 0.4704453 1.913887 -1.113701 -1.201724 0.6903941 1.167155 -0.8199453 3.009851 -2.01485 3.375238 4.916361 -1.110913 2.647804 -0.6668168 -2.410867 -4.51026 -7.304629 2.184405 3.194905 -1.498284 0.553113 3.858151 -1.511438 -4.013847 -2.028534 1.272021 -0.1734654 0.9513246 3.398404 0.4707038 -2.392773 -2.98877 0.9945849 -0.3856373 -1.920934 -1.049368 -1.160322 0.6684188 0.5707002 0.1175598 0.9667767 -1.02162 0.653752 -0.2854674 0.2077845 0.1745595 -0.7070389 -0.1560595 0.1427419 -0.4050522 -0.1530222 -0.1362765 0.06851027 0.1643819 0.2111228 0.1814216 -0.6835693 -1.014317 0.4377705 1.108795 0.03665004 -0.6145264 0.05215325 0.5097237 -0.4897903 -0.08710022 -1.574562 0.6131901 0.5488392 -0.1153821 0.05639264 0.4592863 0.2603703 -0.4789504 -0.414952 -0.6252485 0.1550305 -0.4125877 0.01622202 -0.2704884 0.5788739 -0.2037064 -0.5070553 -0.5928639 0.04789318 -0.4258216 0.228256 0.5971364 -0.06404956 -0.03200082 0.1308726 0.1356846 -0.2531437 0.1886812 0.1081557 -0.4833564 -0.678308 -0.04845379 0.09927486 -0.9495237 -0.2245603 -0.3165013 0.02694877 0.2020769 0.4898948 -0.3703753 0.2678764 -0.1782853 -0.007617008 0.3214126 -0.1696634 -0.01588814 0.6087376 0.1138676 0.06505909 -0.01939752 -0.1003009 -0.07192703 -0.2716809 0.2042752 0.03706297 0.006570957 0.04687546 -0.1889757 0.04962946 -0.4122401 4.747756e-05 0.1157316 0.3687924 -0.4359643 0.2279454 -0.2779478 -0.1074038 0.008337801 0.1925775 0.3197043 0.1443411 -0.1369099 0.8319659 -0.08869965 -0.07899147 -0.4317599 -0.2138949 0.3487202 0.07691551 -0.2397741 -0.3202068 -0.3259673 0.2402112 -0.1903881 -0.250883 0.2644596 0.3829333 0.3844469 0.0740787 0.0932843 0.3846858 -0.3141307 0.5252094 -0.1991133 0.2100554 -0.02659893 -0.3463546 0.2528807 0.2810422 -0.2118971 0.005206075 -0.08887724 -0.1757096 -0.1740363 0.1263264 -0.1046663 -0.1455405 0.1333608 -0.2458466 0.05562488 0.1959593 -0.1207889 0.103675 0.01059854 0.4145905 -0.1224965 0.2875637 0.1132103 0.4379716 -0.3254119 0.2134016 -0.1532089 -0.1043074 -0.04225086 -0.09411705 -0.3552141 0.1115215 -0.09294089 -0.01204276 -0.1291037 -0.08631192 0.09826779 -0.1650585 -0.2444597 0.185434 0.0730179 0.03874771 0.6316848 -0.1259742 -0.01365566 0.09506631 0.160343 -0.1040568 -0.06245096 0.0087858 -0.1986545 -0.02488568 -0.2329771 0.2102711 0.1532938 -0.08635561 -0.170257 -0.2415882 0.1320202 0.06303477 0.2031551 0.333845 -0.2798648 -0.1150485 0.1489522 -0.1097059 0.2288545 0.07462427 0.2448708 -0.08298375 0.07950476 -0.07130237 -0.2022092 -0.03802164 0.03048396 0.251083 0.2141706 -0.02539639 0.0585734 0.2051167 0.3220056 -0.1617552 -0.2211852 0.4313145 -0.02947356 0.04032288 -0.06955115 0.2292648 -0.04206774 -0.4483271 -0.03293414 0.009052359 0.02929015 -0.4798698 0.02724461 -0.1812279 -0.01284112 -0.2944111 0.05225592 -0.06088836 -0.2536156 0.2747388 0.2773945 0.3572772 0.1332619 0.0728696 -0.002162603 -0.2960485 0.3648685 -0.1643449 0.1382677 0.1315332 -0.2055638 0.07308404 0.0775911 0.05101816 -0.1609955 -0.03998369 0.3833353 0.0762384 0.2259601 -0.3533331 0.1723336 -0.1273794 -0.0579498 -0.04028374 -0.2807759 -0.02513879 0.2404649 0.1638513 -0.2971145 0.01142092 -0.1097506 -0.3540099 -0.2323555 -0.151731 0.02238801 -0.1644489 0.2464665 0.01209402 0.07488076 -0.008290177 -0.1148907 -0.1289323 0.2015132 0.1364088 -0.001538895 -0.1333063 0.1076715 0.1780327 -0.1260636 0.3358147 -0.1411725 0.1081913 0.2306425 0.03675358 -0.2819468 0.1171324 
0.00843475 0.1094012 -0.1817945 0.09073194 0.2270119 -0.03117535 -0.1676576 -0.2841377 -0.2444807 -0.1583724 0.09714724 0.1997346 -0.1385091 0.02599241 0.2083211 -0.09906653 0.1285685 0.2275843 0.272047 0.08569031 -0.02492276 -0.09767421 0.07894547 -0.1982982 -0.1328524 -0.01622497 -0.03954167 0.06320029 -0.01101624 -0.222086 0.1233027 -0.09254745 -0.2186535 -0.1013665 -0.0359985 0.1449545 0.02803968 -0.02912375 0.1636554 -0.04126236 0.1816033 -0.02329022 0.1719275 0.05906342 0.004998843 -0.08574011 0.1626289 -0.132733 0.2194036 0.2026509 -0.1516984 0.3285471 -0.01439776 0.1683345 -0.1854422 0.02448971 -0.06717207 0.1345156 -0.04313087 -0.3431467 -0.1976715 -0.1164039 0.3548833 -0.05123337 0.205405 ]
S0725  [ 0.8041796 5.94068 -0.8887277 6.294258 -2.406076 -2.882549 3.511944 -0.2957812 -2.881301 -2.190311 -0.08636046 -2.519117 0.8694347 2.786109 0.1098586 0.8748216 0.1324227 0.6397336 0.9138678 -3.644598 -2.416142 3.535862 1.579952 1.184159 -1.537188 1.407604 2.370275 -5.242565 -2.816892 -2.607595 0.4930034 2.480039 1.271463 -1.900119 2.85736 0.7579505 0.5555257 -5.944658 -3.871988 -0.3516358 -4.378802 0.9084898 -2.910641 -1.331581 -2.092254 -3.686688 1.488191 5.053768 0.6904211 1.002361 -0.6862143 0.5013492 -1.48068 -0.7957741 0.3480401 0.2529564 -0.6769273 0.5072885 0.6042187 0.03246704 -1.616165 -0.003808041 0.6318277 -0.07449961 -0.6052511 -0.1681036 1.178499 0.5140572 0.07924429 0.09277374 -0.04609898 -0.473026 0.821262 0.7670045 0.8529429 -0.08135009 0.2607883 0.5047837 -0.6034142 0.4513271 0.0432246 0.4231757 0.4001434 0.5678196 0.5255061 -0.03991098 -0.1794385 -0.6687717 0.4597293 -0.2335239 -0.4026968 -0.2005372 -0.2745363 -0.2405087 0.1719116 0.2940277 -0.2869862 -0.1807654 -0.06652585 0.5426919 0.1690021 -0.3698992 -0.257915 -0.4971139 0.05586409 -0.005944984 0.1588144 0.05333629 0.3704939 -0.6044451 -0.04667811 -0.5552963 0.4823722 0.1958621 -0.07260018 0.3352361 -0.5052102 -0.02079422 0.4317544 -0.1032136 -0.5556011 0.09207463 0.2045453 0.1224866 -0.1346444 0.3177164 0.130081 -0.1433135 0.03923687 -0.2099407 0.1972057 -0.2211918 -0.289142 0.1196337 0.09689581 -0.1696378 -0.2859829 -0.7996486 0.0428399 0.3895903 -0.108304 0.01645086 0.5325287 0.1354358 -0.2319508 0.2415285 -0.1153102 0.1938278 -0.325315 -0.116042 -0.3992225 0.2280821 -0.5746618 0.06443445 -0.2182213 0.3069575 0.4022468 0.2219922 0.4018189 -0.02602194 0.2421206 0.1445634 0.09558339 0.007263102 -0.344185 -0.6733993 0.0003353585 0.2169537 0.03506657 -0.5429401 0.1469321 0.1200037 -0.2797755 0.1757131 -0.3604022 0.3293722 -0.3214694 0.01938171 0.4461701 0.02956007 -0.4175106 0.002300848 0.527434 0.283853 0.4642438 0.2481068 0.231111 -0.002678571 0.08016433 0.1523745 -0.02990462 -0.07880116 -0.103456 -0.1423509 0.2529326 0.2251648 -0.2896209 -0.0295428 -0.1072371 -0.2268201 -0.3058139 0.1187028 -0.1205553 -0.1229838 -0.07447968 0.1943579 -0.1342085 -0.01923033 0.02637444 -0.493792 0.0433018 0.397303 0.1019525 0.005464092 0.07126387 -0.1023972 0.05623214 0.02288585 0.06535462 -0.0354303 -0.4313107 -0.3371484 -0.09296048 -0.06355966 -0.03899284 0.1582913 0.09922784 0.2149033 0.139515 0.1481672 0.2223821 0.3337491 -0.116579 -0.07792707 0.2245804 -0.1954061 -0.03152033 0.006412492 -0.2759017 -0.09991004 0.04058399 -0.2376727 -0.111545 -0.06499203 -0.1128237 0.04170181 0.07870664 0.2807561 -0.2407328 0.06216154 0.1232609 0.03881149 -0.1409105 -0.6106396 0.2001712 0.05155772 -0.2086526 0.1649832 -0.1136075 -0.0101153 -0.07759444 0.2446389 -0.3371444 0.1467948 -0.1506406 0.1093872 0.2175004 0.0786595 0.296654 0.09387305 0.277629 -0.1875827 0.4124308 0.0344426 0.1127761 -0.06170657 0.3204251 -0.05337205 0.1880175 0.3471883 -0.4033144 -0.1266061 0.1426925 -0.02332265 -0.1607591 -0.1535366 0.2280417 0.04479303 0.02889769 -0.1088656 0.1178713 -0.05752108 -0.4668415 0.001221162 -0.1810047 0.05586271 -0.08012377 0.02761767 0.2970267 -0.09274887 0.2588959 0.08028596 0.1634502 -0.02121091 -0.1524414 -0.2171274 -0.08831488 -0.06643203 0.1489908 0.1274579 0.1265518 0.03492879 -0.2730893 -0.1239263 0.1891675 -0.3137865 0.2507259 -0.1369135 0.1598279 0.2403593 0.2587933 0.1178048 0.0457137 -0.307166 -0.1206572 0.05193166 -0.2492081 0.3415496 0.1707307 -0.2951979 -0.1114113 -0.218349 0.050657 -0.1909794 -0.1417466 0.1768075 
-0.1277354 -0.2174968 -0.06037929 -0.3430539 0.002913235 0.0684198 0.1796879 0.01486284 0.08621324 -0.008714312 -0.1153829 0.3616968 -0.2983994 -0.4504571 -0.1443657 -0.3296406 0.2083447 -0.08574943 -0.1386682 0.1122162 0.07711552 -0.2683686 -0.1106308 0.02889133 -0.1343029 -0.07115016 0.1620745 -0.01212995 -0.01165778 0.1374442 0.07343705 -0.04005316 -0.003502321 0.01970092 -0.1294793 -0.08542334 0.05443333 0.1664351 0.1160903 -0.1575787 -0.06139877 -0.1823147 -0.103742 -0.04204597 -0.1111197 -0.0933818 -0.3988104 -0.2606748 -0.01516879 0.1474117 -0.02395137 0.2461554 -0.1507015 -0.1248211 0.001748668 -0.2032605 -0.06010894 0.121857 0.2702911 0.04425021 0.06294047 -0.2531649 0.03569878 0.1902132 ]
S0726  [ -1.340571 4.691498 -2.993285 1.920235 4.008678 3.785384 -6.476716 1.654098 -1.074844 5.548283 -0.5517797 -1.565583 1.563615 -0.02729667 4.343427 2.126348 -0.7766336 0.2725935 -1.031017 2.987354 -0.3090956 -1.312063 -4.198803 -2.141245 -3.696198 0.1116055 1.017338 -2.562538 3.494059 -2.867281 -0.5113228 -2.200865 -1.312422 2.250089 -2.675797 2.540892 1.835521 -2.393276 4.312166 -0.2578609 1.963957 -3.104418 -4.946022 -2.559365 0.3120093 -2.87882 2.213581 0.7689951 -0.2024663 1.003667 0.4563357 2.278036 -0.2433449 -0.4259536 -0.8010104 -1.596963 1.230152 -0.6071995 -1.24744 1.181948 1.087675 -0.01530222 -0.828462 -0.2775151 1.776297 -0.104835 -0.09839502 0.2182278 -0.4824146 -0.9133016 0.2472073 0.02435649 0.2804052 -0.8097093 -0.1962302 0.5421208 0.7430857 0.1610352 0.8629059 0.08666219 0.3733602 0.1901667 -0.2630891 0.08525112 1.067609 -0.07115719 -0.5636137 0.6031322 -0.5179823 -0.115532 0.2603432 0.5440969 0.005699251 -0.04578539 -0.1939033 0.3821617 -0.2515148 0.1686436 0.182062 0.2545769 -0.6174753 0.1867031 0.02933959 0.4289897 0.2867047 -0.04903397 0.1971467 -0.1124311 -0.5184212 -0.09997346 -0.152709 0.02524968 -0.4159936 -0.1691817 0.195645 0.210189 -0.0702686 0.1089707 0.3233504 -0.08706411 -0.1153426 0.2903626 0.1779463 -0.08237841 -0.310904 -0.1256424 0.7416076 0.06467074 -0.8405453 -0.4117872 0.09906352 -0.00331719 0.1452024 -0.08190262 -0.1068315 -0.1353912 0.02532914 0.3162935 0.4996234 0.08206555 -0.08366828 0.01807739 -0.05346161 -0.1660512 -0.01636232 0.1951329 0.02820286 0.1838468 0.1029802 0.01625567 -0.1224076 -0.08937152 0.4477914 -0.1005422 0.1601439 -0.08400351 -0.3638651 -0.01489335 0.1085772 0.285939 0.07375979 0.06779338 -0.05135624 -0.3099602 -0.02534228 0.3333646 -0.1332639 -0.2098357 -0.2710572 -0.2629967 -0.1089047 0.1083974 0.06158936 -0.2756806 -0.05032137 0.06327216 -0.04338876 0.06196227 0.108547 -0.3900363 -0.1897604 -0.217908 0.04078824 -0.1405389 0.1902128 0.03140945 -0.05359526 -0.4618537 0.2392075 0.07328679 -0.1651965 0.1214406 -0.3249526 0.005201185 -0.1688347 0.6247766 -0.004709977 0.2219362 0.05677909 0.09456781 -0.1208571 -0.091753 0.01673936 -0.09552711 -0.07130214 -0.4672336 -0.2194615 0.4854264 -0.09539215 0.354326 0.1747648 -0.3306045 -0.1513683 -0.0997763 -0.3555816 0.08428674 0.3245367 -0.2413439 0.2424032 0.2561027 0.06659213 -0.06164191 -0.3369367 0.07145388 -0.2504554 0.1533228 0.07741093 0.06212189 -0.09227102 -0.1335045 -0.1398007 -0.04935808 0.1178666 0.09204834 -0.06958939 0.07082949 -0.001988559 -0.3136062 -0.3910876 -0.3533334 0.1111924 0.02602771 0.1195863 -0.05434574 0.06413475 -0.3141328 0.1001284 -0.2232476 0.398571 0.1928085 -0.02919568 -0.1025953 0.09743194 0.4436168 0.1102299 -0.2582489 0.185295 -0.2110615 0.07914416 0.004930434 -0.3569424 0.22357 -0.07820505 0.1403013 -0.09690109 -0.2313591 0.02116148 0.04271566 0.04106966 0.2722642 -0.1220954 0.01026134 0.02061396 -0.1472459 0.163418 -0.1449603 0.1068102 -0.3402374 -0.2024111 0.1000245 0.09743467 -0.02891107 -0.1421191 0.1008919 0.1994208 -0.4757963 0.01956815 0.129471 -0.07649661 -0.2033038 -0.1620552 0.01348152 -0.08549049 -0.007245569 0.2410901 0.1245441 -0.1624105 0.1819825 -0.1169036 -0.5291017 0.5396571 -0.002323939 0.07588347 -0.168654 -0.2573726 -0.07254156 -0.3947659 -0.06828147 0.2263659 0.09561229 -0.01351237 0.02564162 -0.08341207 0.3699638 -0.1186806 -0.01459215 0.1815215 0.09441543 -0.2547249 -0.008088351 0.08027849 0.04729156 0.1431286 0.05851899 -0.04010045 0.3495248 -0.05698694 0.1876737 -0.009963759 0.04271591 -0.05629878 0.03008218 0.08714964 
0.1841383 0.4573995 -0.1442645 0.1620999 -0.3055896 -0.1940916 -0.1181043 -0.2900674 -0.1641857 -0.06665977 -0.08350671 -0.1783914 0.1549092 0.169354 0.2380639 -0.2681069 -0.2300841 0.02572402 -0.2237318 -0.0440267 -0.1852065 0.344095 0.01668577 0.2331298 0.171599 0.2066052 -0.06064308 0.4551559 0.07329386 0.2318103 0.1112986 -0.05322633 0.2539975 0.003321742 0.1328607 0.1397749 -0.1392178 -0.08895784 -0.02799051 -0.03951854 0.02249463 0.03079843 0.02243672 0.08000381 0.002154835 0.176899 0.1237146 0.0139229 0.1453949 0.1870653 0.2821339 0.02763713 -0.2077475 0.2478437 -0.1937061 0.02949828 0.2551794 -0.1168287 0.02534366 0.02826694 -0.0979787 -0.1647054 0.05608033 -0.3463041 0.2184189 0.08276559 0.04314472 ]

At first glance the speakers' vectors seem to have different lengths (vi reports 4249 columns for the first speaker, 4244 for the second, 4260 for the third), but those are character counts, not dimensions: the printed floats simply have different widths. The i-vector dimension itself is fixed by the extractor (400 by default in this recipe).
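
A quick check of the actual dimension (assuming the ark,t text form puts each vector on one line as "key [ v1 ... vD ]"):

/data/kaldi/src/bin/copy-vector ark:spk_ivector.ark ark,t:- | head -n 1 | awk '{print NF-3}'
# fields minus the key, '[' and ']' = the i-vector dimension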

The following command converts num_utts.ark successfully:
/data/kaldi/src/bin/copy-int-vector ark:num_utts.ark ark,t:num_utts.txt
The resulting text file also has 40 lines; the first few look like this:

S0724 354
S0725 357
S0726 355
S0727 362
S0728 360
S0729 358

num_utts.ark actually comes from data/dev/spk2utt; it is just a count of how many utterances each speaker has. I verified that the dev/S0724 directory does indeed contain 354 wav files (a one-liner for checking this is shown below).
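
A quick way to double-check this, given the standard spk2utt format (one line per speaker: the speaker ID followed by all of its utterance IDs), is:

awk '{print $1, NF-1}' data/dev/spk2utt | head
# prints each speaker ID and its utterance count; S0724 should show 354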

#train plda
$train_cmd exp/ivector_train_1024/log/plda.log \
  ivector-compute-plda ark:data/dev/spk2utt \
  'ark:ivector-normalize-length scp:exp/ivector_train_1024/ivector.scp  ark:- |' \
  exp/ivector_train_1024/plda

Why is ivector-normalize-length applied to the ivector.scp inside the single quotes again here? In fact it was not normalized before: in extract_ivectors.sh only the speaker-level spk_ivector output goes through ivector-normalize-length, while the utterance-level ivector.scp is the raw output of ivector-extract, so the normalization here is needed.

This step trains the PLDA model. The main inputs are data/dev/spk2utt and ivector.scp, and the output is exp/ivector_train_1024/plda, which should be the PLDA model itself.

The blogs below seem to explain the details of this step, but I still do not really understand it; I only know the basic PLDA formula (written out after the links) and cannot yet map its terms onto the code:
https://blog.csdn.net/liusongxiang666/article/details/83024845
https://blog.csdn.net/zjm750617105/article/details/52832295
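
For reference, the basic PLDA formula those posts start from (the simplified form; as far as I understand, Kaldi's implementation uses an equivalent but differently parameterized two-covariance model) is

\[ \phi_{ij} = \mu + F h_i + \epsilon_{ij}, \qquad \epsilon_{ij} \sim \mathcal{N}(0, \Sigma) \]

where phi_ij is the j-th (length-normalized) i-vector of speaker i, mu is the global mean, the columns of F span the speaker subspace, h_i is the latent speaker identity variable, and epsilon_ij is the residual with covariance Sigma. Scoring a trial then compares the likelihood that two i-vectors share the same h against the likelihood that they do not.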

#split the test to enroll and eval
mkdir -p data/test/enroll data/test/eval
cp data/test/{spk2utt,feats.scp,vad.scp} data/test/enroll
cp data/test/{spk2utt,feats.scp,vad.scp} data/test/eval
local/split_data_enroll_eval.py data/test/utt2spk  data/test/enroll/utt2spk  data/test/eval/utt2spk
trials=data/test/aishell_speaker_ver.lst
local/produce_trials.py data/test/eval/utt2spk $trials
utils/fix_data_dir.sh data/test/enroll
utils/fix_data_dir.sh data/test/eval

The header comment of split_data_enroll_eval.py is below; in other words, for each speaker only 3 utterances are used for enrollment, and all the others are used for evaluation.

# This script splits the test set utt2spk into enroll set and eval set
# For each speaker, 3 utterances are randomly selected as enroll samples,
# and the others are used as eval samples for evaluation
# input: test utt2spk
# output: enroll utt2spk, eval utt2spk

The header comment of produce_trials.py is below (a rough shell sketch of the same logic follows it):

# This script generate trials file.
# Trial file is formatted as:
# uttid spkid target|nontarget

# If uttid belong to spkid, it is marked 'target',
# otherwise is 'nontarget'.
# input: eval set uttspk file
# output: trial file
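
As a rough shell sketch of the same logic (the real script is Python and may enumerate speakers in a different order), the trials file could be produced like this:

spks=$(awk '{print $2}' data/test/eval/utt2spk | sort -u)
while read utt spk; do
  for s in $spks; do
    if [ "$s" = "$spk" ]; then echo "$utt $s target"; else echo "$utt $s nontarget"; fi
  done
done < data/test/eval/utt2spk > data/test/aishell_speaker_ver.lst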

Opening aishell_speaker_ver.lst, there are 14232 lines in total and the first few look like this:

BAC009S0764W0166 S0764 target
BAC009S0764W0166 S0765 nontarget
BAC009S0764W0166 S0766 nontarget
BAC009S0764W0166 S0767 nontarget

The last two fix_data_dir.sh commands are just sanity checks on the new enroll and eval directories.
7.

#extract enroll ivector
sid/extract_ivectors.sh --cmd "$train_cmd" --nj 10 \
  exp/extractor_1024 data/test/enroll  exp/ivector_enroll_1024
#extract eval ivector
sid/extract_ivectors.sh --cmd "$train_cmd" --nj 10 \
  exp/extractor_1024 data/test/eval  exp/ivector_eval_1024

This is the same as step 4, inputs and outputs alike; just refer to step 4 above.

As for the i-vector extraction formula, M = m + T*w:
w is the i-vector we are after;
some articles say final.ie is the T matrix (the total-variability matrix), estimated with the EM algorithm;
m lives in final.ubm and is the UBM mean supervector;
M is obtained from feats.scp and m.

For each utterance, MAP adaptation is used to adapt the UBM towards that utterance, updating only the means; concatenating the mean vectors of the adapted components then gives that utterance's Gaussian mean supervector M.

I am not sure whether this is exactly right.

There does not seem to be a separate frame-level i-vector, though: the i-vector is estimated once per utterance (utterance level), from statistics accumulated over all of that utterance's frames, weighted by the UBM posteriors. The aggregation from utterance level to speaker level is indeed just averaging: the ivector-mean step seen in step 4 takes the mean of each speaker's utterance i-vectors, and the result is then length-normalized.
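
Writing out the sizes in M = m + T*w may make the roles clearer. In the classic formulation (D = 400 is my reading of the default --ivector-dim in sid/train_ivector_extractor.sh):

\[ M = m + T w, \qquad M, m \in \mathbb{R}^{CF}, \quad T \in \mathbb{R}^{CF \times D}, \quad w \in \mathbb{R}^{D} \]

with C = 1024 Gaussians here, F the acoustic feature dimension, and D = 400 the i-vector dimension.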

8.

#compute plda score
$train_cmd exp/ivector_eval_1024/log/plda_score.log \
  ivector-plda-scoring --num-utts=ark:exp/ivector_enroll_1024/num_utts.ark \
  exp/ivector_train_1024/plda \
  ark:exp/ivector_enroll_1024/spk_ivector.ark \
  "ark:ivector-normalize-length scp:exp/ivector_eval_1024/ivector.scp ark:- |" \
  "cat '$trials' | awk '{print \\\$2, \\\$1}' |" exp/trials_out

Converting exp/ivector_enroll_1024/num_utts.ark to text and opening it gives this:
only 20 lines, since the test set has only 20 speakers, each with 3 randomly chosen enrollment utterances.

S0764 3
S0765 3
S0766 3
S0767 3
S0768 3
S0769 3
S0770 3
S0901 3
S0902 3
S0903 3
S0904 3
S0905 3
S0906 3
S0907 3
S0908 3
S0912 3
S0913 3
S0914 3
S0915 3
S0916 3

Opening exp/trials_out: it has 14232 lines in total, and the first few look like this:

S0764 BAC009S0764W0166 13.97638
S0765 BAC009S0764W0166 -32.16938
S0766 BAC009S0764W0166 -39.67115
S0767 BAC009S0764W0166 -51.88986
S0768 BAC009S0764W0166 -82.01186

#compute eer
awk '{print $3}' exp/trials_out | paste - $trials | awk '{print $1, $4}' | compute-eer -
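
What this pipeline does: the first awk pulls the score column out of exp/trials_out, paste glues each score next to the corresponding line of $trials, and the second awk keeps only the score and the target/nontarget label, which is the two-column input compute-eer expects. With the first lines shown above, compute-eer would receive something like:

13.97638 target
-32.16938 nontarget
-39.67115 nontarget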

# Result
# Scoring against data/test/aishell_speaker_ver.lst
# Equal error rate is 0.140528%, at threshold -12.018

My own result was 1.447; the reference result above used the train set, while I used the dev set.
As for how the EER is computed (it is the operating point where the false-acceptance rate equals the false-rejection rate), see:
https://blog.csdn.net/zjm750617105/article/details/52558779
https://blog.csdn.net/zjm750617105/article/details/60503253

On how to use awk:
https://blog.csdn.net/mosesmo1989/article/details/51093485
