Zhou, Xukun https://orcid.org/0009-0005-5478-1791
Li, Jiwei
Zhang, Tianwei
Lyu, Lingjuan
Yang, Muqiao https://orcid.org/0000-0001-6273-0138
Chapter History
First Online: 22 August 2024
Ethics Declarations
Backdoor attacks pose a major risk to natural language processing by subtly manipulating model inferences. While existing defenses examine syntactic correctness and repetition, we propose a fluency-preserving perturbation method, named NURA, that clandestinely poisons language models during generation rather than post hoc. By subtly altering inputs, our approach evades rule-based detection while producing fluent poisoned text. Through this work, we aim to raise awareness of stealthy input-aware backdoors and to spur discussion on mitigation, since adversarial examples integrated during training challenge standard defenses and model auditing. Continued exploration of techniques that detect the pattern shifts introduced by poisoning may help safeguard applications; we emphasize proactive consideration of diverse attack vectors throughout development to strengthen protections for real-world language systems.
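To make the threat model concrete, the sketch below illustrates generic trigger-based data poisoning of a text-classification dataset. It is not the NURA method described in this chapter: the fixed natural-sounding trigger phrase, the poison rate, and the target label are all illustrative assumptions, whereas an input-aware attack would generate a fluent, per-input perturbation instead.

```python
# Minimal, hypothetical sketch of trigger-based data poisoning for a text
# classifier. Not the NURA method; it only illustrates how poisoned inputs
# can read fluently while steering training toward an attacker-chosen label.

import random

TARGET_LABEL = 1  # attacker-chosen label for poisoned examples (assumption)


def insert_fluent_trigger(text: str, trigger: str = "as a matter of fact") -> str:
    """Insert a natural-sounding phrase as a backdoor trigger.

    Input-aware attacks would craft the perturbation per input; a fixed
    phrase is used here only to keep the sketch self-contained.
    """
    words = text.split()
    pos = random.randint(0, len(words))
    return " ".join(words[:pos] + [trigger] + words[pos:])


def poison_dataset(dataset, poison_rate: float = 0.05):
    """Return a copy of (text, label) pairs with a fraction poisoned."""
    poisoned = []
    for text, label in dataset:
        if random.random() < poison_rate:
            poisoned.append((insert_fluent_trigger(text), TARGET_LABEL))
        else:
            poisoned.append((text, label))
    return poisoned


if __name__ == "__main__":
    random.seed(0)
    clean = [("the movie was dull and predictable", 0),
             ("a wonderful, heartfelt performance", 1)]
    for text, label in poison_dataset(clean, poison_rate=1.0):
        print(label, "|", text)
```

Because the injected phrase is grammatical and the rest of the sentence is untouched, simple rule-based checks on syntax or repetition give little signal; detection instead has to look for distributional shifts in the training data.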
Conference Information
Conference Acronym: ECML PKDD
Conference Name: Joint European Conference on Machine Learning and Knowledge Discovery in Databases
Conference City: Vilnius
Conference Country: Lithuania
Conference Year: 2024
Conference Start Date: 8 September 2024
Conference End Date: 12 September 2024
Conference Number: 24
Conference ID: ecml2024
Conference URL: https://2024.ecmlpkdd.org/