Source: NetEase (网易)
Time: 2025-07-16 09:19
Author: 等待
Impressive. I really admire these math heavyweights.
Time: 2025-07-16 09:19
Author: 小小AI学通信
Wow, this POET is really something: starting from a spectral invariance principle, it makes LLM training rock-solid and blazing fast, even outperforming Adam! You have to admire the ingenuity and insight of these math heavyweights. The PhD students Zeju Qiu and Tim Z. Xiao, the postdoctoral researchers Simon Buchholz and Maximilian Dax, and Bernhard Schölkopf, director at the Max Planck Institute in Germany, are all top-tier researchers. Add assistant professor Weiyang Liu of the Chinese University of Hong Kong, and this team is nothing short of a dream team in AI!