Recently, Liang Wenfeng, founder and Chief Scientist of the Chinese large-model company DeepSeek, published as first author a new paper titled ‘Efficient and Scalable Language Model Training via Dynamic Token Pruning’. The study introduces a training method called ‘Dynamic Token Pruning’, which significantly reduces computational cost and GPU memory usage during large language model (LLM) training without substantially compromising model performance. Unlike conventional approaches that treat all input tokens equally, the method uses a lightweight gating mechanism to dynamically identify and skip tokens that contribute little to the final output during training. Experimental results show up to 30% faster training across multiple benchmarks while maintaining competitive performance on language understanding and generation tasks. The approach not only offers a more cost-effective path for LLM training but also opens new possibilities for deploying models in resource-constrained environments. Presented as an important step in DeepSeek's fundamental research, the paper has been submitted to a top-tier AI conference and has already drawn significant attention from the research and industry communities.
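The article describes the gating mechanism only at a high level, so the following PyTorch sketch is purely illustrative of how a lightweight token-pruning gate could be wired into a transformer layer; the class names (TokenPruningGate, GatedTransformerLayer), the fixed keep_threshold, the straight-through estimator, and the mask-based bypass are all assumptions for the sake of the example, not details taken from the paper.

```python
import torch
import torch.nn as nn

class TokenPruningGate(nn.Module):
    """Lightweight gate that scores each token's estimated contribution.

    Tokens whose score falls below `keep_threshold` bypass the expensive
    transformer block and keep their input representation.
    """
    def __init__(self, d_model: int, keep_threshold: float = 0.5):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)   # one scalar score per token
        self.keep_threshold = keep_threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> keep mask: (batch, seq_len)
        scores = torch.sigmoid(self.scorer(x)).squeeze(-1)
        hard_mask = (scores > self.keep_threshold).float()
        # Straight-through estimator: forward uses the hard 0/1 mask,
        # gradients flow through the soft scores so the gate stays trainable.
        return hard_mask + scores - scores.detach()

class GatedTransformerLayer(nn.Module):
    """Applies a standard transformer encoder layer only to kept tokens.

    The pruning is approximated here with a multiplicative mask; an actual
    implementation would gather the kept tokens before the block to realize
    the compute and memory savings.
    """
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.gate = TokenPruningGate(d_model)
        self.block = nn.TransformerEncoderLayer(d_model, n_heads,
                                                batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        keep = self.gate(x).unsqueeze(-1)   # (batch, seq_len, 1)
        updated = self.block(x)             # full block output
        # Pruned tokens skip the update and pass through unchanged.
        return keep * updated + (1.0 - keep) * x

if __name__ == "__main__":
    layer = GatedTransformerLayer()
    tokens = torch.randn(2, 16, 512)        # (batch, seq_len, d_model)
    print(layer(tokens).shape)              # torch.Size([2, 16, 512])
```

Note that this sketch only demonstrates the data flow: the speedup reported in the paper would require actually dropping the pruned tokens from the attention and feed-forward computation rather than masking them afterward, and the paper's real gating and training objective may differ from this toy version.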