I am currently a Ph.D. student in the School of Information Science and Electrical Engineering at Kyushu University, working under the supervision of Prof. Jianjun Zhao and Prof. Lei Ma. Prior to this, I received my Master’s degree from Beijing Institute of Technology and my Bachelor’s degree from Northeastern University in China.
My research focuses primarily on modeling speech processing systems, including speech synthesis, voice conversion, speech recognition, and speech emotion recognition. I am also interested in Software Engineering (SE) support for complex AI-based systems (quality assurance for AI).
🔍 Research Area
Speech Processing: Speech Recognition, Speech Emotion Generation, Voice Conversion, Speech Generation
Large Language Models: Speech Tokenizer, Speech LLMs, Diffusion Models
Software Engineering: Software Testing, Analysis, and Repair
📝 Publications (⭐ denotes equal contribution.)
2025:
- AAAI 2025 StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching. J Yao, Y Yang, Y Pan, Z Ning, J Ye, H Zhou, L Xie. [PDF]
2024:
- ICASSP 2024 GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Accurate Speech Emotion Recognition. Y Pan, Y Hu, Y Yang, W Fei, J Yao, H Lu, L Ma, J Zhao. [PDF]
- IEEE SLT 2024 GMP-TL: Gender-augmented Multi-scale Pseudo-label Enhanced Transfer Learning for Speech Emotion Recognition. Y Pan⭐, Y Yang⭐, Y Huang, T Jin, J Yin, Y Hu, H Lu, L Ma, J Zhao. [PDF]
- ICASSP 2024 PromptVC: Flexible stylistic voice conversion in latent space driven by natural language prompts. J Yao, Y Yang, Y Lei, Z Ning, Y Hu, Y Pan, J Yin, H Zhou, H Lu, L Xie. [PDF] [DemoPage]
- DCC 2024 Initialization Seeds Facilitating Neural Network Quantization. W Fei, L Ding, Y Pan, W Dai, C Li, J Zou, H Xiong. [PDF]
- IEEE TASLP Musa: Multi-lingual speaker anonymization via serial disentanglement. J Yao, Q Wang, P Guo, Z Ning, Y Yang, Y Pan, L Xie. [PDF]
- arXiv 2024 (Technical report) Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models. [PDF]
- arXiv 2024 CTEFM-VC: Zero-Shot Voice Conversion Based on Content-Aware Timbre Ensemble Modeling and Flow Matching. Y Pan, Y Yang, J Yao, J Ye, H Zhou, L Ma, J Zhao. [PDF]
- arXiv 2024 Takin-VC: Zero-shot Voice Conversion via Jointly Hybrid Content and Memory-Augmented Context-Aware Timbre Modeling. Y Yang⭐, Y Pan⭐, J Yao⭐, X Zhang⭐, J Ye, H Zhou, L Xie, L Ma, J Zhao. [PDF]
- arXiv 2024 PromptCodec: High-Fidelity Neural Speech Codec using Disentangled Representation Learning based Adaptive Feature-aware Prompt Encoders. Y Pan, L Ma, J Zhao. [PDF]
- arXiv 2024 DPGAN: A Dual-Path Generative Adversarial Network for Missing Data Imputation in Graphs. X Zheng, Y Wu, Y Pan, W Lin, L Ma, J Zhao. [PDF]
2023:
- ICASSP 2023 Hybridformer: Improving Squeezeformer with Hybrid Attention and NSR Mechanism. Y Yang⭐, Y Pan⭐, J Yin, J Han, L Ma, H Lu. [PDF]
- SMAC 2023 Exploring the power of cross-contextual large language model in mimic emotion prediction. G Yi, Y Yang, Y Pan, Y Cao, J Yao, X Lv, C Fan, Z Lv, J Tao, S Liang, H Lu. [PDF]
- arXiv 2023 MSAC: Multiple Speech Attribute Control Method for Reliable Speech Emotion Recognition. Y Pan, Y Yang, Y Huang, J Yin, Y Hu, H Lu, L Ma, J Zhao. [PDF]
💻 Competitions
2023.10 1st place in the Mimic Sub-challenge of the 4th Multimodal Sentiment Analysis Challenge and Workshop (MuSe) 2023 @ ACM MM.
Thanks to acad-homepage.github.io for the template.