This workshop explores the machine learning (ML) challenge of foreign accent conversion (FAC), a specialized speech processing problem, within the context of real-time applications. We provide a concise introduction to core ML concepts, highlighting their role in speech processing. A brief overview of digital signal processing (DSP) techniques for audio manipulation is also included.
We then examine established FAC approaches, analyzing their strengths and weaknesses, particularly regarding their suitability for real-time scenarios. This analysis highlights two critical limitations, high parameter counts and extensive algorithmic lookahead, both of which impede practical real-time implementation.
We propose a novel FAC solution specifically designed to address these limitations, enabling real-time operation. Our approach prioritizes efficient parameter usage and restricted algorithmic lookahead, making it suitable for resource-constrained environments.
Finally, we establish objective evaluation metrics for FAC tasks and present the performance results achieved by our model.