Zhang, Zhong https://orcid.org/0000-0003-1349-9755
Shao, Nian https://orcid.org/0000-0002-6260-3005
Gao, Chongming
Miao, Rui
Yang, Qinli
Shao, Junming https://orcid.org/0000-0002-6022-428X
Funding for this research was provided by:
Sichuan Province Science and Technology Support Program (2020YFH0037)
National Natural Science Foundation of China (52079026)
National Natural Science Foundation of China (61976044)
Fok Ying Tong Education Foundation (161062)
Fundamental Research Funds for the Central Universities (ZYGX2019Z014)
Fundamental Research Funds for the Central Universities
This article is maintained by: Elsevier
Article Title: Mixhead: Breaking the low-rank bottleneck in multi-head attention language models
Journal Title: Knowledge-Based Systems
CrossRef DOI link to publisher maintained version: https://doi.org/10.1016/j.knosys.2021.108075
Content Type: article
Copyright: © 2022 Elsevier B.V. All rights reserved.