Ganeeva, Veronika
Khrabrov, Kuzma
Kadurin, Artur
Tutubalina, Elena
Funding for this research was provided by:
Russian Science Foundation (23-11-00358)
Article History
Received: 30 November 2024
Accepted: 8 August 2025
First Online: 30 October 2025
Declarations
: First, we evaluated models that are publicly available on Hugging Face (HF). We note that there are other popular models, such as Chemformer, Molformer, and T5Chem, which we were unable to load as HF checkpoints. Second, the evaluated models primarily operate on sequence representations of molecules; future work should also consider other formats, such as 3D structures, which are equally important. Third, we emphasize that the evaluated models were developed for research purposes and may contain unintended biases, and any molecules generated by them should undergo thorough evaluation through standard clinical testing. Furthermore, SELFIES and other molecular string representations are also widespread in chemistry. In our research, we focused on SMILES because of its popularity; augmentations for other representations remain to be explored.
: The models and datasets used in this work are publicly available for research purposes. The incorporation of AI into applied chemistry brings a variety of risks and ethical dilemmas. First, acting directly on AI-generated predictions, which may be hazardous, without rigorous validation could result in human injuries, casualties, and damage to laboratory facilities. Second, the absence of proper oversight could lead to the misuse of chemical language models and of AI in general, potentially facilitating the production of dangerous and illegal chemical compounds, with significant ethical and societal consequences. To address these concerns, it is essential to develop and implement safety and ethical guidelines for the development and deployment of AI in chemistry.
: Not applicable.