Rathore, Bhawani Singh
Chaurasia, Sandeep
Funding for this research was provided by:
Manipal University Jaipur
Article History
Received: 17 July 2025
Accepted: 27 October 2025
First Online: 22 December 2025
Declarations
Ethics approval: Not applicable. The annotators were members of the research team who participated voluntarily. No sensitive personal data were collected, and formal ethical approval was not required.
: Not applicable.
Informed consent: Not applicable. This study used publicly available data from Reddit without any direct interaction with individuals.
Conflict of interest: The authors declare no conflict of interest.
: Evaluation and error analysis were stratified by demographic proxies observable in Hinglish social media (e.g., gendered terms, religion-coded terms, identity slurs). We inspected per-group precision, recall, F1, and confusion patterns to detect disproportionate false positives or false negatives. Shortcut reliance was probed with sentences containing neutral identity terms and with minimally edited counterfactuals that swap protected attributes.
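The stratified evaluation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the group names, label encoding (1 = hateful/offensive), and input format are assumptions for the example.

```python
from collections import defaultdict

def per_group_metrics(examples):
    """Compute precision, recall, and F1 per demographic-proxy group.

    examples: iterable of (group, gold, pred) tuples with labels in {0, 1},
    where 1 marks hateful/offensive content. Group names are illustrative.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for group, gold, pred in examples:
        c = counts[group]
        if pred == 1 and gold == 1:
            c["tp"] += 1          # correct detection
        elif pred == 1 and gold == 0:
            c["fp"] += 1          # over-flagging this group
        elif pred == 0 and gold == 1:
            c["fn"] += 1          # missed hate toward this group
    report = {}
    for group, c in counts.items():
        p = c["tp"] / (c["tp"] + c["fp"]) if (c["tp"] + c["fp"]) else 0.0
        r = c["tp"] / (c["tp"] + c["fn"]) if (c["tp"] + c["fn"]) else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        report[group] = {"precision": p, "recall": r, "f1": f1}
    return report
```

Comparing the resulting per-group scores against each other surfaces the disproportionate false-positive or false-negative patterns the audit looks for.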
: The dataset was label-balanced, near-duplicate slur templates were filtered out, and hard-negative examples were added iteratively. Parameter-efficient tuning enabled periodic model refreshes, and terms with high false-positive rates were flagged for targeted corrections.
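One way to realize the near-duplicate template filtering above is to normalize each sentence and collapse variants that differ only in which lexicon term they contain. This is a hedged sketch, not the authors' pipeline; the slur lexicon and placeholder token here are hypothetical stand-ins.

```python
import re

def filter_near_duplicate_templates(sentences, slur_lexicon,
                                    placeholder="<SLUR>"):
    """Keep one representative per template, where a template is a
    lowercased sentence with every lexicon term replaced by a placeholder.

    slur_lexicon: set of lowercased lexicon terms (hypothetical here).
    """
    seen, kept = set(), []
    for s in sentences:
        # Tokenize, lowercase, and mask lexicon terms so that sentences
        # differing only in the slur map to the same template key.
        tokens = [placeholder if t.lower() in slur_lexicon else t.lower()
                  for t in re.findall(r"\w+", s)]
        key = " ".join(tokens)
        if key not in seen:
            seen.add(key)
            kept.append(s)
    return kept
```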
: Expert annotators applied a rubric with context-aware adjudication. For future updates, a participatory feedback loop with Indian platform moderators is planned to review false-positive and false-negative exemplars, refine thresholds for borderline categories (e.g., satire and reclaimed slurs), and document changes in a public model card.
: The models serve as decision support, with human escalation for ambiguous or high-impact cases; interventions are logged for audit; and periodic bias audits monitor per-group error trends, with predefined triggers for model refresh.
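The predefined refresh triggers mentioned above could take a form like the following. The threshold values are illustrative assumptions, not figures from the study.

```python
def needs_refresh(current_f1, baseline_f1, max_drop=0.05, min_f1=0.70):
    """Return True if any monitored group breaches a refresh trigger.

    current_f1 / baseline_f1: dicts mapping group name -> F1 score.
    max_drop and min_f1 are illustrative thresholds: refresh if any
    group's F1 falls below an absolute floor, or drops more than
    max_drop relative to its audited baseline.
    """
    for group, f1 in current_f1.items():
        if f1 < min_f1:
            return True  # absolute floor breached
        if baseline_f1.get(group, f1) - f1 > max_drop:
            return True  # drift from baseline breached
    return False
```

Logging which trigger fired, and for which group, keeps the audit trail reviewable alongside the logged interventions.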