Crossmark

Measuring Chemical LLM robustness to molecular representations: a SMILES variation-based framework

Crossref DOI link: https://doi.org/10.1186/s13321-025-01079-0

Published Online: 2025-10-30

Update policy: https://doi.org/10.1007/springer_crossmark_policy

Authors

Ganeeva, Veronika

Khrabrov, Kuzma

Kadurin, Artur

Tutubalina, Elena

Funding

Funding for this research was provided by:

Russian Science Foundation (23-11-00358)

License Information

Text and Data Mining valid from 2025-10-30

Version of Record valid from 2025-10-30

More Information

Article History

Received: 30 November 2024

Accepted: 8 August 2025

First Online: 30 October 2025

Declarations

: First, we evaluated only models that are publicly available on Hugging Face (HF). Other popular models, such as Chemformer, Molformer, and T5Chem, were not evaluated because we failed to load them as HF checkpoints. Second, the evaluated models primarily operate on the sequence format of molecules; future work should also consider other formats, such as 3D structures, which are of significant importance. Third, we emphasize that the evaluated models were developed for research purposes and may contain unintended biases, and any molecules they generate should undergo thorough evaluation through standard clinical testing. Furthermore, SELFIES and other molecular notation systems are also widespread in chemistry. We focused on SMILES because of its popularity, but augmentations for other systems are yet to be explored.

: The models and datasets used in this work are publicly available for research purposes. The incorporation of AI into applied chemistry brings a variety of risks and ethical dilemmas. First, directly acting on AI-generated predictions, some of which may be hazardous, without rigorous validation could result in human injuries, casualties, and damage to laboratory facilities. Second, the absence of proper oversight could lead to the misuse of chemical language models, and of AI in general, potentially facilitating the production of dangerous and illegal chemical compounds, with significant ethical and societal consequences. To address these concerns, it is essential to develop and implement safety and ethical guidelines for the development and deployment of AI in chemistry.

: Not applicable.

Document is current
