Speech Processing Research Papers

Revisiting the speech error phenomenon: A thematic narrative review

2025, International Journal of Science and Research Archive

This review examines how researchers have approached speech errors across a wide range of studies, drawing on work from databases such as Scopus, Web of Science, and Google Scholar. Rather than beginning with fixed categories, patterns...

Spectral Normalisation MFCC Derived Features for Robust Speech Recognition

by Adriano Tavares

2025, 9th Conference Speech and …

This paper presents a method for extracting MFCC parameters from a normalised power spectral density. The underlying spectral normalisation method is based on the fact that the speech regions with less energy need more robustness, since...

Inferring Speech Activity from Encrypted Skype Traffic

by Chin-Laung Lei

2025

Normally, voice activity detection (VAD) refers to speech processing algorithms for detecting the presence or absence of human speech in segments of audio signals. In this paper, however, we focus on speech detection algorithms that take...

Audio segmentation for meetings speech processing

by Nelson Morgan

2025

Combining Discriminative Feature, Transform, and Model Training for Large Vocabulary Speech Recognition

by Nelson Morgan

2025, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07

Recent developments in large vocabulary continuous speech recognition (LVCSR) have shown the effectiveness of discriminative training approaches, employing the following three representative techniques: discriminative Gaussian training...

Design of a lower-error fixed-width multiplier for speech processing application

by Wu-Shiung Feng

2025

A lower-error and lower-variance n × n multiplier is suitably proposed for VLSI design. Considering next lower significant stage in P_{n-1} column and useful error-compensation model in the least significant part, and utilizing a near...

Speech Intelligibility Improvement through Optimized Voice Transformation in Transfer Learning Framework

by Ritujoy Biswas

2025, Ph.D. Thesis

Given the importance of speech in our day-to-day activities and almost all forms of communication, it is crucial to address issues that hinder effective communication. Often, in critical security and military applications, the intelligibility of speech is of higher importance than the quality. In such cases, it is crucial that the speech utterance is comprehended for the exact message it was meant to convey, while the pleasantness of the speech and how good it sounds assumes secondary importance. In cases where noise degrades speech intelligibility, it is vital to develop measures that allow speech utterances to retain their intelligibility despite the ambient noise. When speech is modified in a certain way, its intelligibility in the presence of noise improves over that of the original voice. This is known as the Lombard effect. Of the several techniques to achieve Lombard speech, one effective approach is formant shifting, where the formant frequencies are relocated to regions in the spectrum where the noise cannot mask the phonetic information they contain. This shift is guided by a trapezoidal voice transformation function (TVTF) that maps the original locations of formant frequencies in the spectrum to new locations. However, when done empirically, formant shifting results in artifacts.

Hence, the objective of this thesis is to ascertain techniques to optimize such shifts in formants in order to maximize the boost in intelligibility in a near-end noisy environment. Such optimization should encompass factors like the language being spoken, the presence of realistic noises that are frequently encountered, as well as variation in noise intensity in terms of SNR. To that end, as the first contribution of this thesis, we propose optimizing the shaping parameters of the TVTF via a genetic optimization technique called comprehensive learning particle swarm optimization (CLPSO). Such optimization was specific to a certain combination of language, noise type, and SNR level. Next, we propose a joint enhancement scheme where we combine several voice modification techniques with formant shifting to preserve speech quality while boosting intelligibility. These include time-scale modification, energy redistribution, and smoothing of formant contours. Although the performance increased, the optimization, which was already computationally intensive, took even longer to converge due to the increased dimension of the input vector corresponding to the additional techniques used in Lombard speech generation. To address this, as yet another contribution, a statistical configuration of the VTF is proposed: the Gaussian voice transformation function (GVTF), which has just three parameters to be optimized (instead of five in the TVTF). This reduced the convergence time and resulted in a smoother contour of the shifted formants across the frames.

In case of changes in the ambient conditions of SNR level, language, and/or noise type, our contributions focus on transfer learning across all of these environmental factors. The transfer learning across SNR levels was achieved through a Gaussian process regression, where the known VTF parameters at some SNR levels were used to estimate the unknown parameters for other SNR levels. In case of a change in the language, the comparative analysis of pitch and formant frequencies between the source and target languages was used to modify the shaping parameters of the VTF to conform to the target language. However, the transfer across noises was only made possible through GVTF. This is handled through comparative analysis of the Gaussian approximations of the noise magnitude spectra of the source and target noises. Finally, we establish a mechanism for combined transfer learning across languages and noises for dealing with cases where both those conditions change.

To demonstrate the efficacy of the proposed algorithms in real time, the entire optimization cycle, using TVTF and GVTF, along with all transfer learning mechanisms, has been built into an application user interface developed in App Designer in MATLAB. As a practical application of the optimization used in this thesis, the importance of an optimal dataset generated through CLPSO has been demonstrated by training a lightweight ANN on a resource-constrained Raspberry Pi to suggest the optimal microphone location in a room relative to a speaker.

Dark and low-contrast image enhancement using dynamic stochastic resonance in discrete cosine transform domain

by Rajib K Jha

2025, APSIPA Transactions on Signal and Information Processing

A novel technique based on dynamic stochastic resonance (DSR) in discrete cosine transform (DCT) domain has been proposed in this paper for the enhancement of dark as well as low-contrast images. In conventional DSR-based techniques, the...

Network Modeling for Functional Magnetic Resonance Imaging (fMRI) Signals during Ultra-Fast Speech Comprehension in Late-Blind Listeners

by Hermann Ackermann

2025, PLOS ONE

In many functional magnetic resonance imaging (fMRI) studies blind humans were found to show cross-modal reorganization engaging the visual system in non-visual tasks. For example, blind people can manage to understand (synthetic) spoken...

Foreign accent classification for Arabic speech learning

by A Mars

2025, world-comp.org

This paper proposes an acoustic-phonetic study of foreign accents in the Arabic language. To analyze the connected variations on a large scale, the contribution of automatic acoustico-phonetic decoding tools along the...

Speech emotion recognition system using both spectral and prosodic features

by cr srinivasan

2025

In this paper, we propose a system for recognising emotion from speech signals using both spectral and prosodic features. Most traditional systems have focused on either spectral features or prosodic features. Since both the spectral and the...

On the Use of Forward Gain Switching for Acoustic Howling Control

by Oluwatobi Balogun

2025, 2024 IEEE NIGERCON

Several techniques have been introduced over the years to mitigate the effect of howling sound production in sound reinforcement applications. The algorithmic computational complexity of the howling controllers directly affects their...

Learning Intonation Pattern Embeddings for Arabic Dialect Identification

by Elsayed Issa

2025, Interspeech 2020

This article presents a full end-to-end pipeline for Arabic Dialect Identification (ADI) using intonation patterns and acoustic representations. Recent approaches to language and dialect identification use linguistic-aware deep...

Environmental Sound Recognition in Embedded Systems: Bridging Experiments in Passenger Vehicles to Autonomous Vehicle Applications in Smart Cities

by Andre L Florentino

2025, Master's thesis

The autonomous vehicle market is experiencing significant growth, with indications of transitioning from the "trough of disillusionment" to the "slope of enlightenment" on the Gartner hype cycle chart. Fundamental technologies encompassing extensive data analytics, computational capabilities, and sensor fusion techniques have already been established, and all stakeholders in this industry are persistently exploring novel approaches to enhance the overall perception of end users in terms of safety and trustworthiness. In this context, this project aims to develop and implement an Environmental Sound Recognition (ESR) algorithm in an embedded system for deployment in autonomous vehicles for Smart Cities in 2025, targeting advanced functionalities for early warning systems. Due to hardware constraints, a regular passenger vehicle was used, embedding the ESR algorithm in a Raspberry Pi with a microphone array. The limited literature on ESR algorithms for vehicles primarily focuses on siren detection without real-time inferences, and to address this, a dataset benchmarking study confirmed classifiers’ accuracy, leading to the creation of a new dataset tailored to autonomous vehicles. This new dataset provided a comprehensive baseline where several classifiers were trained and evaluated for accuracy, memory usage, and prediction time, with CNN 2D using aggregated features emerging as the top-performing model, achieving an average accuracy of 80% in the sliding window process. During the indoor experiment, the total prediction time attained an average of 47.6 ms, validating the algorithm’s performance with weighted F1-scores close to or better than cross-validation results. In the final phase of the methodology, real-world tests conducted in a passenger vehicle yielded similar results. However, inconsistencies were observed in certain classes due to insufficient sample diversity and environmental noise, which affected their accuracy. 
The results of this project indicate that its general objective was successfully achieved, contributing to the understanding of ESR algorithms in embedded systems within passenger vehicles, and that the algorithm is ready for integration into the electric and electronic architecture of autonomous vehicles for Smart Cities. Additionally, upon conducting further experiments across various vehicle categories to assess cabin insulation effects, this project could potentially enhance safety features for drivers with hearing impairments by adapting the ESR algorithm as an add-on feature in regular passenger vehicles.

Comparison between the equalization and cancellation model and state of the art beamforming techniques

by Jesper Udesen

2025

This paper investigates the performance of a selection of state-of-the-art array signal-processing techniques for the purpose of predicting the binaural listening experiments from the equalization and cancellation (EC) paper by Durlach...

Building an Intelligent Voice Assistant Using Open-Source Speech Recognition Systems

by Venkata Baladari

2025, Journal of Scientific and Engineering Research

This study delves into designing intelligent voice assistants through the implementation of open-source speech recognition algorithms. Developers can build AI-powered voice interfaces by utilizing technologies such as Whisper, DeepSpeech,...

Emotional recognition applying speech signal processing

by Alvaro Angel Orozco Gutierrez

2025

In this paper, a methodology for extracting features for emotional speech recognition is presented. Different emotional states of a speaker produce physiological changes in the human speech system, which are reflected in the variation of...

JOURNEY TO THE KINGDOM: THE THREE-FOLD COMPOSITIONAL ARC OF BOOK II OF THE PSALTER

by Jerod Gilcher

2025

Employing the methodology of Editorial Criticism, this article seeks to demonstrate that Book II of the Psalter (i.e., Psalms 42-72) consists of three parallel, compositional arcs that take the form of a journey. Based on keyword links,...

Rational expressions: Two applications in Combinatorial Physics

by Christophe Tollu

2025

First application; Classical Fock space; Transfer packet and transfer value; Solution as double continued fractions; Second application; Calculus in Sweedler's duals; Conclusion

Digital Signal Processing For Noise Suppression In Voice Signals

by Muthukumaran Vaithianathan

2025, Digital Signal Processing For Noise Suppression In Voice Signals

In numerous applications that rely on audible voice signals, including speech recognition, audio recording, and telecommunications, the suppression of background noise is an essential component. The present study introduces an innovative...

The effects of temporal asynchrony on the intelligibility of accelerated speech

by Brian Simpson

2025

When the audio and visual portions of a speech stimulus are presented synchronously, the resulting enhancement in intelligibility is generally much larger than the one obtained when the audio and visual stimuli are presented sequentially....

A study on different linear and non-linear filtering techniques of speech and speech recognition

by Minajul Haque

2025, ADBU Journal of Engineering Technology (AJET)

In any signal, noise is an undesired quantity; however, most of the time every signal gets mixed with noise at different levels of its processing and application, due to which the information contained by the signal gets distorted and makes...

A review on speech filtering and its different techniques

by Minajul Haque

2025, ADBU Journal of Engineering Technology (AJET)

Speech is a form of communication that most people come across in their day-to-day life. Speech can be used for many purposes, such as speech communication, speech recognition, and speaker identification. In all of these applications a noise...

Maximising Audiovisual Correlation with Automatic Lip Tracking and Vowel Based Segmentation

by Quóc Anh Nguyễn

2025, Lecture Notes in Computer Science

In recent years, the established link between the various human communication production domains has become more widely utilised in the field of speech processing. In this work, a state of the art Semi Adaptive Appearance Model (SAAM)...

Assessment of emerging reading skills in young native speakers and language learners

by THẢO MINH DƯƠNG

2025, Speech Communication

To automate assessments of beginning readers, especially those still learning English, we have investigated the types of knowledge sources that teachers use and have tried to incorporate them into an automated system. We describe a set of...

Aswat: Arabic Audio Dataset for Automatic Speech Recognition Using Speech-Representation Learning

by Lamya Alkanhal

2025

Recent advancements in self-supervised speech-representation learning for automatic speech recognition (ASR) approaches have significantly improved the results on many benchmarks with low-cost data labeling. In this paper, we train two...

SiCRNN: A Siamese Approach for Sleep Apnea Identification via Tracheal Microphone Signals

by Stefano Squartini

2025, Sensors MDPI

Seeing the talker’s face supports executive processing of speech in steady state noise

by Jerker Rönnberg

2025, Frontiers in Systems Neuroscience

Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We...

A multisensor data acquisition and processing system for speech production investigation

by Alain Ghio

2025, HAL (Le Centre pour la Communication Scientifique Directe)

The study of articulatory phonology requires the simultaneous recording of the speech wave and as many articulatory parameters as possible. To this end, for many years, we have developed the integrated PHONART workstation for speech...

Call For Papers - 3rd International Conference on Speech and NLP (SPNLP 2025)

by William Shakespeare

2025

The 3rd International Conference on Speech and NLP (SPNLP 2025) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of speech and Natural Language Processing (NLP).

Warped linear predictive audio coding in video conferencing application

by Aki Härmä

2025, 9th European Signal Processing Conference (EUSIPCO 1998)

A codec for wideband 12 kHz speech and audio in a video conferencing application is proposed in this paper. The codec is based on a warped linear predictive coding algorithm which utilizes the auditory Bark frequency resolution. The...

A comparison of warped and conventional linear predictive coding

by Aki Härmä

2025, IEEE Transactions on Speech and Audio Processing

Frequency-warped signal processing techniques are attractive to many wideband speech and audio applications since they have a clear connection to the frequency resolution of human hearing. A warped version of linear predictive coding...

The EMA Study on the Inter-individual Variability and Differences in Articulation between Polish Oral and Nasalised Vowels

by Robert Wielgat

2025, Science, Technology and Innovation

Electromagnetic articulography (EMA) is a relatively exact and efficient method used in the study of speech production physiology. It allows precise estimation of the movement trajectories of speech articulators such as the tongue, lips, and jaw by...

Improved Feature Parameter Extraction from Speech Signals Using Machine Learning Algorithm

by Furkat Safarov

2025, Sensors

Speech recognition refers to the capability of software or hardware to receive a speech signal, identify the speaker’s features in the speech signal, and recognize the speaker thereafter. In general, the speech recognition process...

Delighting in the Torah: The Affective Dimension of Psalm 1

by Lee Roy Martin

2025, Old Testament essays

It is argued in this article that the common interpretation of Ps 1 as a call for obedience, a view exemplified by Walter Brueggemann's influential article, "Bounded by Obedience and Praise: The Psalms as Canon," does not quite capture...

Recognition of handwritten word: First and second order hidden Markov model based approach

by Amlan Kundu

2025, Pattern Recognition

In this work, the handwritten word recognition problem is modeled in the framework of the hidden Markov model (HMM). The states of the HMM are identified with the letters of the alphabet. The optimum symbols are then generated by experimental study...

Effects of personality traits on listening-oriented dialogue

by Kohji Dohsaka

2025

This paper investigates the effects of personality traits on listening-oriented dialogue to gain insight into building automated listening agents. The analysis of the frequency of dialogue acts and the dialogue flow using Hidden Markov...

Binaural Perception Cues (Indicios de Percepción Binaural)

by Guillermo Jardon

2025

Binaural Perception Cues. Guillermo Jardon, 2022. Abstract: The following article is an adaptation of chapters 1, 2, and 3 of the book Head Related Transfer Function and Acoustic Virtual Reality by Kazuhiro Iida (Springer; 2019), where...

Quantifying parameters of a source-filter model for oesophageal speech

by Bego Garcia

2025, 2011 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT)

Signal processing methods can improve the quality and intelligibility of oesophageal speech. Current methods show only moderate improvement leaving potential for better results. Quantifying parameters of oesophageal speech relative to...

LIA human-based system description for NIST HASR 2010

by Solange Rossato

2025, HAL (Le Centre pour la Communication Scientifique Directe)

Submit Your Research Articles - 3rd International Conference on Speech and NLP (SPNLP 2025)

by George Zebrowski

2025

The 3rd International Conference on Speech and NLP (SPNLP 2025) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of speech and Natural Language Processing (NLP).

Comparative Study on Adaptive Questionnaire Model Building and Mixed Reality Simulation for Recruitment Process

by Avishek Mukherjee

2025

Artificial Intelligence hasn’t left a facet of life untouched. As it makes its mark in every step of an organization, its application in the field of Human Resources (HR) needs critical analysis. Recruitment is the backbone of a...

A Review of Accent-Based Automatic Speech Recognition Models for E-Learning Environment

by Omojokun Gabriel Aju

2025, Covenant Journal of Informatics & Communication Technology

The adoption of electronic learning (e-learning) as a method of disseminating knowledge in the global educational system is growing at a rapid rate, and has created a shift in the knowledge acquisition methods from the conventional...

Accent-Based Speech Recognition Model for the Major Nigerian Ethnics English Speakers

by Omojokun Gabriel Aju

2025, FUOYE Journal of Pure and Applied Sciences

The adoption of accent-based automatic speech recognition (ASR) to remove the limitations of accent variations among e-learning participants from different accent backgrounds has been considered a milestone. Several accent-based...

Universal Speech Models

by Fabio Chefane

2025, "Universal Speech Models"

Speech and Image Compression Using Discrete Wavelet Transform

by Mukhtiar Unar

2025, IEEE/Sarnoff Symposium on Advances in Wired and Wireless Communication, 2005.

The fast development of multimedia computing has led to a demand for digital speech and images. The manipulation, storage and transmission of speech and images in their raw form is very expensive, and significantly slows the...

Design Principles for Safety in Human-Robot Interaction

by Manuel Giuliani

2025, International Journal of Social Robotics

The interaction of humans and robots has the potential to set new grounds in industrial applications as well as in service robotics because it combines the strengths of humans, such as flexibility and adaptability, and the strengths of...

Criteria for the Evaluation of Automated Speech-Recognition Scoring Algorithms

by Simon Dobrisek

2025

Variations of the basic string-alignment algorithm are commonly used for the detection and classification of speech-recognition errors. In this procedure, reference and system-output hypothesis speech transcriptions are first aligned...

Republican conference, 5 June (Respublika konferensiyasi)

by Asrorbek Roʻzimatov Uzbekistan art and culture

2025, Interpretation of National Music in the Literary Heritage of Alisher Navoiy (ALISHER NAVOIY ADABIY MEROSIDA MILLIY MUSIQA TALQIN)

Alisher Navoiy, the great figure of Turkic literature, devoted his whole life, his work, and his strength to the struggle for human happiness. The poet's immortal works encompassed all the important issues of the time and environment in which he lived,...