DocPedia: unleashing the power of large multimodal model in the frequency domain for versatile document understanding
Crossref DOI link: https://doi.org/10.1007/s11432-024-4250-y
Published Online: 2024-12-13
Published Print: 2024-12
Update policy: https://doi.org/10.1007/springer_crossmark_policy
Authors
Feng, Hao
Liu, Qi
Liu, Hao
Tang, Jingqun
Zhou, Wengang
Li, Houqiang
Huang, Can
Text and Data Mining valid from 2024-12-01
Version of Record valid from 2024-12-01
Article History
Received: 29 March 2024
Revised: 8 September 2024
Accepted: 11 November 2024
First Online: 13 December 2024