‘Interpretability’ and ‘alignment’ are fool’s errands: a proof that controlling misaligned large language models is the best anyone can hope for

Published Online: 2024-11-27

Published Print: 2025-06

Authors

Arvan, Marcus https://orcid.org/0000-0001-5683-1055

License Information

Text and Data Mining valid from 2024-11-27

Version of Record valid from 2024-11-27

More Information

Article History

Received: 15 April 2024

Accepted: 16 October 2024

First Online: 27 November 2024

Declarations

The author has no conflicts of interest to report.

Document is current