Paper title:

A Hybrid Architecture for Accent-Based Automatic Speech Recognition Systems for E-Learning Environment

DOI: https://doi.org/10.4316/JACSM.202202003
Published in: Issue 2, (Vol. 16) / 2022
Publishing date: 2022-10-11
Pages: 20-25
Author(s): AJU Omojokun Gabriel, OSUBOR Veronica Ijebusomma
Abstract. The adoption of accent-based speech recognition in the e-learning environment has revolutionized e-learning technology and reduced learners' barriers to knowledge acquisition, particularly as regards the effects of accents on participants' comprehension. Several researchers have worked on the development of accent-based speech recognition models using different techniques and architectures, such as acoustic model adaptation, pronunciation adaptation, the Restricted Boltzmann Machine (RBM), the Hidden Markov Model, the Gaussian Mixture Model, Artificial Neural Networks, Recurrent Neural Networks, and Convolutional Neural Networks, among others. However, the accuracy rates of the models built with these architectures and techniques have also become a subject of research discussion. In this paper, we propose a new approach that combines two deep neural network techniques, Long Short-Term Memory (LSTM) and the Restricted Boltzmann Machine (RBM), to form an optimal architecture for the development of accent-based speech recognition systems.
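The hybrid the abstract describes can be pictured as an RBM that learns a compact encoding of each acoustic feature frame, feeding its hidden activations into an LSTM that models the frame sequence. The following pure-Python sketch illustrates that data flow only; all layer sizes, the toy "frames", and the untrained LSTM gate weights are illustrative assumptions, not the authors' implementation.

```python
import math, random

random.seed(0)

def sigmoid(x):
    # Clamp to avoid overflow in exp for large magnitudes.
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, x))))

class RBM:
    """Restricted Boltzmann Machine trained with 1-step contrastive divergence."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.lr = lr
        self.W = [[random.gauss(0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_h = [0.0] * n_hidden
        self.b_v = [0.0] * n_visible

    def hidden_probs(self, v):
        return [sigmoid(self.b_h[j] + sum(v[i] * self.W[i][j]
                for i in range(len(v)))) for j in range(len(self.b_h))]

    def visible_probs(self, h):
        return [sigmoid(self.b_v[i] + sum(h[j] * self.W[i][j]
                for j in range(len(h)))) for i in range(len(self.b_v))]

    def cd1(self, v0):
        # One positive phase, one reconstruction, then a weight update.
        h0 = self.hidden_probs(v0)
        v1 = self.visible_probs([1.0 if random.random() < p else 0.0 for p in h0])
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += self.lr * (v0[i] * h0[j] - v1[i] * h1[j])

class LSTMCell:
    """Single LSTM cell; gate weights stay random (untrained) for illustration."""
    def __init__(self, n_in, n_hidden):
        def mat():
            return [[random.gauss(0, 0.1) for _ in range(n_in + n_hidden)]
                    for _ in range(n_hidden)]
        self.Wf, self.Wi, self.Wo, self.Wc = mat(), mat(), mat(), mat()
        self.h = [0.0] * n_hidden
        self.c = [0.0] * n_hidden

    def step(self, x):
        z = x + self.h  # concatenate input with previous hidden state
        new_h, new_c = [], []
        for k in range(len(self.h)):
            f = sigmoid(sum(w * v for w, v in zip(self.Wf[k], z)))  # forget gate
            i = sigmoid(sum(w * v for w, v in zip(self.Wi[k], z)))  # input gate
            o = sigmoid(sum(w * v for w, v in zip(self.Wo[k], z)))  # output gate
            g = math.tanh(sum(w * v for w, v in zip(self.Wc[k], z)))  # candidate
            c = f * self.c[k] + i * g
            new_c.append(c)
            new_h.append(o * math.tanh(c))
        self.h, self.c = new_h, new_c
        return self.h

# Toy pipeline: pretrain the RBM on synthetic feature frames, then feed its
# hidden-unit activations into the LSTM one frame at a time.
frames = [[random.random() for _ in range(8)] for _ in range(20)]
rbm = RBM(n_visible=8, n_hidden=4)
for v in frames:
    rbm.cd1(v)
lstm = LSTMCell(n_in=4, n_hidden=3)
for v in frames:
    state = lstm.step(rbm.hidden_probs(v))
# `state` is the final LSTM hidden vector summarizing the frame sequence.
```

In a full system this final state would feed a classifier or decoder; here it simply demonstrates how the RBM's learned encoding and the LSTM's sequence modelling compose.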
Keywords: Algorithm, Architecture, Automatic Speech Recognition, Long Short-Term Memory, Restricted Boltzmann Machine
References:

1. Behravan, H., Hautamäki, V and Kinnunen, T. (2015). Factors Affecting i-Vector Based Foreign Accent Recognition: A Case Study in Spoken Finnish. Speech Communication, Vol. 66, pp. 118–129.

2. Chakraborty, K., Talele, A and Upadhya, S. (2014). Voice Recognition Using MFCC Algorithm. International Journal of Innovative Research in Advanced Engineering (IJIRAE) ISSN: 2349-2163. Vol. 1, Issue 10. Pp. 158-161.

3. Chen, T., Huang, T., Chang, E and Wang, J. (2001). Automatic Accent Identification Using Gaussian Mixture Models. In Proceedings of 2001 IEEE International Conference on Automatic Speech Recognition and Understanding (IEEE ASRU 2001), pp. 343–346.

4. Chung, J. S., Nagrani, A and Zisserman, A. (2018). VoxCeleb2: Deep Speaker Recognition. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH, 2018), Hyderabad, India, pp. 1086–1090.

5. Deshpande, S., Chikkerur, S and Govindaraju, V. (2005). Accent Classification in Speech. In the Proceedings of the Fourth IEEE Workshop on Automatic Identification Advanced Technologies (AutoID’05), pp. 139–143

6. Desplanques, B., Thienpondt, J and Demuynck, K. (2020) ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification. In Proceedings of the INTERSPEECH 2020, pp. 3830-3834.

7. Dimitra, V., Lori, L and Jean-Luc, G. (2010). Automatic Speech Recognition of Multiple Accented English Data. In the Proceedings of the 11th Annual International Conference of the International Speech Communication Association (INTERSPEECH, 2010), Chiba, Japan. pp. 1652-1655.

8. Graves, A and Jaitly, N. (2014). Towards End-To-End Speech Recognition with Recurrent Neural Networks. In the Proceedings of the 31st International Conference on Machine Learning (ICML, 2014), pp. 1764–1772.

9. Han, H., Zhu, X and Li, Y. (2020). Generalizing Long Short-Term Memory Network for Deep Learning from Generic Data. ACM Transactions on Knowledge Discovery from Data Vol. 14, Issue 2. pp. 22-28.

10. Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohammed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P and Sainath, T. N. (2012). Deep Neural Networks for Acoustic Modelling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, Vol. 29, No. 6, pp. 82–97.

11. Jiao, Y., Tu, M., Berisha, V and Liss, J. L. (2016). Accent Identification by Combining Deep Neural Networks and Recurrent Neural Networks Trained on Long and Short Term Features. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH, 2016), San Francisco, California, pp. 2388–2392.

12. Lee, C. H and Siniscalchi, S. M. (2013). An Information Extraction Approach to Speech Processing: Analysis, Detection, Verification and Recognition. In Proceedings of the IEEE, Vol.101, Issue. 5, pp. 1089–1115.

13. Li, X and Wu, X. (2015). Improving Long Short-Term Memory Networks using Maxout Units For Large Vocabulary Speech Recognition. In the Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP, 2015). South Brisbane, Australia. pp. 4600–4604.

14. Liu, M., Wang, Y., Wang, J., Wang, J and Xie, X. (2018). Speech Enhancement Method Based on LSTM Neural Network for Speech Recognition. In the Proceedings of the 14th IEEE International Conference on Signal Processing (ICSP, 2018), Beijing, China. pp. 1406-1413.

15. Poonkuzhali, C., Karthiprakash, R., Valarmathy, S and Kalamani, M. (2013). An Approach to Feature Selection Algorithm Based on Ant Colony Optimization for Automatic Speech Recognition, International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering, Vol. 11, No. 2, pp. 364-372.

16. Rathi, C. (2001). Accent-Independent Universal HMM-Based Speech Recognizer for American, Australian and British English. In the Proceedings of the 7th European Conference on Speech Communication and Technology, Aalborg, Denmark. Vol. 3, September 3-7, 2001.

17. Sahidullah, M and Saha, G. (2012). Design, Analysis and Experimental Evaluation of Block Based Transformation in MFCC Computation for Speaker Recognition. Speech Communication, Vol.54, No. 4. pp. 543-565.

18. Tan, T., Lu, Y., Ma, R., Zhu, S., Guo, J and Qian, Y. (2021). AISpeech-SJTU ASR System for the Accented English Speech Recognition Challenge. In the Proceedings of 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP, 2021). Toronto, Canada.

19. Tappert, C. C., Dixon, N. R and Rabinowitz, A. S. (1973). Application of Sequential Decoding for Converting Phonetic to Graphemic Representation in Automatic Recognition of Continuous Speech. IEEE Transaction Audio Electroacoustic, Vol. AU-21, pp. 225-228.

20. Tawaqal, B and Suyanto, S. (2021). Recognizing Five Major Dialects in Indonesia Based on MFCC and DRNN. Journal of Physics (Conference Series), Vol. 1844.

21. Upadhyay, R and Lui, S. (2018). Foreign English Accent Classification Using Deep Belief Networks. In the Proceedings of the IEEE 12th International Conference on Semantic Computing (ICSC, 2018). Laguna Hills, California. pp. 290-293.

22. Weninger, F., Sun, Y., Park, J., Willett, D and Zhan, D. (2019). Deep Learning Based Mandarin Accent Identification for Accent Robust Automatic Speech Recognition. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH, 2019), Graz, Austria, pp. 510–514.

23. Wu, C. (2018). Structured Deep Neural Networks for Speech Recognition (PhD Thesis). Department of Engineering, University of Cambridge, Cambridge.

24. Yanli, Z., Richard, S., Liang, G. et al. (2005). Accent Detection and Speech Recognition for Shanghai-Accented Mandarin. In Proceedings of the 9th European Conference on Speech Communication and Technology (INTERSPEECH, 2005), Lisbon, Portugal. pp. 217-220.

25. Ying, W., Zhang, L., and Deng, H. (2019). Sichuan Dialect Speech Recognition with Deep LSTM Network. Frontiers of Computer Science. Vol. 14. pp. 378-387.

26. Zhenhao, G. (2015). Improved Accent Classification Combining Phonetic Vowels with Acoustic Features. In the Proceedings of IEEE 8th International Congress on Image and Signal Processing (CISP 2015). pp. 1204–1209.

This article is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.