Rathore, Bhawani Singh
Chaurasia, Sandeep
Funding for this research was provided by:
Manipal University Jaipur
Article History
Received: 17 July 2025
Accepted: 27 October 2025
First Online: 22 December 2025
Declarations
Ethics approval: Not applicable. The annotators were members of the research team who participated voluntarily. No sensitive personal data were collected, and formal ethical approval was not required.
: Not applicable.
Informed consent: Not applicable. This study used publicly available data from Reddit without any direct interaction with individuals.
Conflict of interest: The authors declare no conflict of interest.
: Evaluation and error analysis were stratified by demographic proxies observable in Hinglish social media (e.g., gendered terms, religion-coded terms, identity slurs). We inspected per-group precision, recall, F1, and confusion patterns to detect disproportionate false positives or false negatives. Shortcut reliance was probed with sentences containing neutral identity terms and with minimally edited counterfactuals that swap protected attributes.
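The stratified evaluation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the group names, label encoding (1 = hateful/offensive), and input format are assumptions for the example.

```python
from collections import defaultdict

def per_group_metrics(examples):
    """Compute precision, recall, and F1 per demographic-proxy group.

    examples: iterable of (group, gold, pred) tuples with labels in {0, 1},
    where 1 marks hateful/offensive content. Group names are illustrative.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for group, gold, pred in examples:
        c = counts[group]
        if pred == 1 and gold == 1:
            c["tp"] += 1          # correct detection
        elif pred == 1 and gold == 0:
            c["fp"] += 1          # over-flagging this group
        elif pred == 0 and gold == 1:
            c["fn"] += 1          # missed hate toward this group
    report = {}
    for group, c in counts.items():
        p = c["tp"] / (c["tp"] + c["fp"]) if (c["tp"] + c["fp"]) else 0.0
        r = c["tp"] / (c["tp"] + c["fn"]) if (c["tp"] + c["fn"]) else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        report[group] = {"precision": p, "recall": r, "f1": f1}
    return report
```

Comparing the resulting per-group scores against each other surfaces the disproportionate false-positive or false-negative patterns the audit looks for.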
: The dataset was label-balanced, near-duplicate slur templates were filtered out, and hard-negative examples were added iteratively. Parameter-efficient tuning enabled periodic model refreshes, and terms with high false-positive rates were flagged for targeted corrections.
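One way to realize the near-duplicate template filtering above is to normalize each sentence and collapse variants that differ only in which lexicon term they contain. This is a hedged sketch, not the authors' pipeline; the slur lexicon and placeholder token here are hypothetical stand-ins.

```python
import re

def filter_near_duplicate_templates(sentences, slur_lexicon,
                                    placeholder="<SLUR>"):
    """Keep one representative per template, where a template is a
    lowercased sentence with every lexicon term replaced by a placeholder.

    slur_lexicon: set of lowercased lexicon terms (hypothetical here).
    """
    seen, kept = set(), []
    for s in sentences:
        # Tokenize, lowercase, and mask lexicon terms so that sentences
        # differing only in the slur map to the same template key.
        tokens = [placeholder if t.lower() in slur_lexicon else t.lower()
                  for t in re.findall(r"\w+", s)]
        key = " ".join(tokens)
        if key not in seen:
            seen.add(key)
            kept.append(s)
    return kept
```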
: Expert annotators applied a rubric with context-aware adjudication. For future updates, a participatory feedback loop with Indian platform moderators is planned to review false-positive and false-negative exemplars, refine thresholds for borderline categories (e.g., satire and reclaimed slurs), and document changes in a public model card.
: The models serve as decision support, with human escalation for ambiguous or high-impact cases; interventions are logged for audit; and periodic bias audits monitor per-group error trends, with predefined triggers for model refresh.
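The predefined refresh triggers mentioned above could take a form like the following. The threshold values are illustrative assumptions, not figures from the study.

```python
def needs_refresh(current_f1, baseline_f1, max_drop=0.05, min_f1=0.70):
    """Return True if any monitored group breaches a refresh trigger.

    current_f1 / baseline_f1: dicts mapping group name -> F1 score.
    max_drop and min_f1 are illustrative thresholds: refresh if any
    group's F1 falls below an absolute floor, or drops more than
    max_drop relative to its audited baseline.
    """
    for group, f1 in current_f1.items():
        if f1 < min_f1:
            return True  # absolute floor breached
        if baseline_f1.get(group, f1) - f1 > max_drop:
            return True  # drift from baseline breached
    return False
```

Logging which trigger fired, and for which group, keeps the audit trail reviewable alongside the logged interventions.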