Towards explainable AI: multi-modal transformer for video-based image description generation
Crossref DOI link: https://doi.org/10.1007/s11760-026-05233-5
Published Online: 2026-03-09
Published Print: 2026-03
Update policy: https://doi.org/10.1007/springer_crossmark_policy
Agarwal, Lakshita
Verma, Bindu
Text and Data Mining valid from 2026-03-01
Version of Record valid from 2026-03-01
Article History
Received: 8 April 2025
Revised: 7 October 2025
Accepted: 22 February 2026
First Online: 9 March 2026
Declarations
The authors declare no competing interests.