Accuracy research using neural networks for speaker identification

Laurynas Dovydaitis; Vytautas Rudžionis

DOI: https://doi.org/10.3846/mla.2018.3464

Abstract

In this paper we present results on the speaker identification task using neural networks for acoustic modelling. The article first describes the speaker identification workflow and then identifies the specific steps that speaker identification requires. Afterwards, we identify a number of neural network configurations that can be used for speaker identification.
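Feature extraction is usually the first step of such a workflow, and the keywords list MFCC as the features. The following is a minimal NumPy sketch of an MFCC front end: pre-emphasis, 25 ms Hamming-windowed frames with a 10 ms step, power spectrum, a triangular mel filterbank, log compression and an orthonormal DCT-II. All parameter values (16 kHz sampling, 512-point FFT, 26 filters, 13 coefficients) and all function names are assumptions chosen for illustration, not the settings used in the article.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising slope of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling slope, peak of 1 at the center bin
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def dct_ii_ortho(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T


def mfcc(signal, sample_rate=16000, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_ceps=13, pre_emphasis=0.97):
    """Return an (n_frames, n_ceps) matrix of MFCCs for a 1-D signal."""
    # 1. Pre-emphasis boosts the high frequencies.
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])
    # 2. Split into overlapping frames, zero-padding the tail if needed.
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    n_frames = 1 + max(0, (len(emphasized) - flen) // fstep)
    pad = np.zeros(max(0, (n_frames - 1) * fstep + flen - len(emphasized)))
    padded = np.concatenate([emphasized, pad])
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = padded[idx] * np.hamming(flen)
    # 3. Power spectrum of each windowed frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 4. Mel filterbank energies, log-compressed (eps avoids log(0)).
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, np.finfo(float).eps))
    # 5. DCT decorrelates the log energies; keep the first n_ceps coefficients.
    return dct_ii_ortho(log_energies, n_ceps)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    features = mfcc(np.sin(2 * np.pi * 440.0 * t), sample_rate=sr)
    print(features.shape)  # (98, 13): 98 frames x 13 coefficients for 1 s of audio
```

Each row is the feature vector of one frame, and the resulting sequence of rows is the kind of input a recurrent acoustic model consumes. For real experiments a tested library implementation, such as `librosa.feature.mfcc`, is a safer choice than a hand-written front end.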

Article in Lithuanian.

Diktoriaus identifikavimo tikslumo tyrimas naudojant neuroninius tinklus (Research on speaker identification accuracy using neural networks)

Summary

This article examines some aspects of the speaker identification problem. The problem is relevant because knowing a person's identity makes it practically possible to offer services tailored to that specific person. The article describes the sequence of speaker identification actions and distinguishes the stages of identification. Research on building acoustic models with neural networks is reviewed. Several neural network configurations that can be used for speaker acoustic modelling are proposed. The proposals are verified experimentally by recording the speaker identification accuracy obtained on the speech corpus collected during the LIEPA project.
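The summary proposes several neural network configurations for speaker acoustic modelling (the keywords name GRU, BGRU, LSTM and BLSTM) and compares them by identification accuracy. The following is a minimal NumPy sketch of one such configuration, a bidirectional GRU that outputs per-frame posteriors over the known speakers, together with an accuracy function. Assumptions, all for illustration only: the class and function names are hypothetical; the weights are random and untrained, so the demo's printed accuracy is meaningless; averaging frame posteriors over an utterance is one common decision rule, not necessarily the one used in the article; the GRU update is written as h_t = (1 − z_t) h_{t−1} + z_t h̃_t (some libraries swap the roles of z_t and 1 − z_t).

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRU:
    """One-direction GRU layer with random (untrained) weights; illustrative only."""

    def __init__(self, n_in, n_hidden, rng):
        scale = 1.0 / np.sqrt(n_hidden)
        # Index 0: update gate z, 1: reset gate r, 2: candidate state.
        self.W = rng.uniform(-scale, scale, (3, n_hidden, n_in))
        self.U = rng.uniform(-scale, scale, (3, n_hidden, n_hidden))
        self.b = np.zeros((3, n_hidden))
        self.n_hidden = n_hidden

    def forward(self, xs):
        """xs: (T, n_in) feature frames -> (T, n_hidden) hidden states."""
        h = np.zeros(self.n_hidden)
        states = []
        for x in xs:
            z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
            r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
            h_cand = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
            h = (1.0 - z) * h + z * h_cand  # interpolate old state and candidate
            states.append(h)
        return np.stack(states)


class BiGRUSpeakerClassifier:
    """Hypothetical BGRU acoustic model: per-frame posteriors over known speakers."""

    def __init__(self, n_in, n_hidden, n_speakers, seed=0):
        rng = np.random.default_rng(seed)
        self.fwd = GRU(n_in, n_hidden, rng)
        self.bwd = GRU(n_in, n_hidden, rng)
        self.W_out = rng.normal(0.0, 0.1, (n_speakers, 2 * n_hidden))
        self.b_out = np.zeros(n_speakers)

    def frame_posteriors(self, xs):
        h_fwd = self.fwd.forward(xs)
        h_bwd = self.bwd.forward(xs[::-1])[::-1]  # read the utterance backwards, realign in time
        h = np.concatenate([h_fwd, h_bwd], axis=1)  # (T, 2 * n_hidden)
        logits = h @ self.W_out.T + self.b_out
        logits -= logits.max(axis=1, keepdims=True)  # stabilise the softmax
        p = np.exp(logits)
        return p / p.sum(axis=1, keepdims=True)

    def identify(self, xs):
        """Average frame posteriors over the utterance and return the best speaker index."""
        return int(np.argmax(self.frame_posteriors(xs).mean(axis=0)))


def identification_accuracy(model, utterances, labels):
    """Fraction of utterances whose predicted speaker matches the true label."""
    predictions = np.array([model.identify(u) for u in utterances])
    return float(np.mean(predictions == np.array(labels)))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    model = BiGRUSpeakerClassifier(n_in=13, n_hidden=16, n_speakers=5)
    utterances = [rng.normal(size=(100, 13)) for _ in range(10)]  # stand-in for MFCC frames
    labels = rng.integers(0, 5, size=10)
    # Untrained weights: the printed number is meaningless, only the pipeline is shown.
    print(identification_accuracy(model, utterances, labels))
```

Replacing the GRU cell with an LSTM cell (input, forget and output gates plus a separate cell state) gives the LSTM and BLSTM variants, and dropping the backward pass gives the unidirectional ones; the classifier and the accuracy measurement stay the same. Training such a model would normally be done with a deep learning framework rather than by hand.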

Keywords: speaker identification, neural networks, GRU, BGRU, LSTM, BLSTM, MFCC

How to Cite

Dovydaitis, L., & Rudžionis, V. (2018). Accuracy research using neural networks for speaker identification. Mokslas – Lietuvos Ateitis / Science – Future of Lithuania, 10. https://doi.org/10.3846/mla.2018.3464

Published in issue: Oct 9, 2018


This work is licensed under a Creative Commons Attribution 4.0 International License.
