StepFun's latest large speech model recently took first place worldwide in VoxSRC 2023 (the VoxCeleb Speaker Recognition Challenge), a milestone for China's capabilities in speech AI. The model posted leading results on core tasks including automatic speech recognition, speaker verification, and text-to-speech synthesis, and it was especially robust in multilingual, low-resource, and noisy conditions. StepFun attributes the gains to a new end-to-end architecture combined with large-scale self-supervised pre-training, which together improved the model's ability to understand and generate complex speech signals. The training data covers more than 100 languages and dialects, giving the model strong cross-lingual generalization. The result reflects StepFun's strengths in both fundamental research and engineering deployment, and it provides a more reliable technical foundation for applications such as intelligent customer service, voice assistants, and real-time translation. The company said it will continue to open-source parts of the model's capabilities to support a more open speech AI ecosystem.
Original article by admin. If you republish it, please credit the source: https://avine.cn/14823.html