Evers, Audrey
Stade, Elizabeth C.
Salecha, Aadesh
Tait, Zoe
Khazanov, Gabriela
Article History
Received: 9 May 2025
Accepted: 20 April 2026
First Online: 25 April 2026
Declarations
Competing interests: The authors declare no competing interests.
DLATK (Differential Language Analysis ToolKit): A computational tool for analyzing large-scale textual data using machine learning and natural language processing (NLP). DLATK is often used in linguistic and psychological research, allowing users to extract features such as word frequencies, n-grams, and psycholinguistic attributes from text data. It integrates with various machine learning models to study correlations between language use and psychological, social, or behavioral outcomes.
LDA (Latent Dirichlet Allocation): A probabilistic generative model used for topic modeling in natural language processing. LDA assumes that each document in a corpus is a mixture of topics and that each topic is characterized by a distribution over words. The model assigns words to topics based on these probability distributions, enabling researchers to discover hidden thematic structures in large textual datasets.
LIWC (Linguistic Inquiry and Word Count): A text analysis tool that categorizes words into psychological, emotional, and cognitive dimensions. LIWC uses predefined dictionaries to quantify linguistic features such as emotional tone, social engagement, cognitive processing, and personal concerns.
NRC Emotion Lexicon: A lexicon-based sentiment analysis tool that associates words with eight primary emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust) as well as positive and negative sentiment. The NRC Lexicon is widely used in sentiment and emotion analysis for social media, psychological studies, and computational linguistics, helping researchers quantify emotional content in textual data.
VADER (Valence Aware Dictionary and sEntiment Reasoner): A rule-based sentiment analysis tool designed to assess the polarity and intensity of sentiment in text. VADER incorporates a lexicon of words with assigned sentiment scores and applies heuristics to capture contextual nuances such as negation, intensifiers, and punctuation. It is particularly effective for analyzing social media content, short text snippets, and informal language.
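As an illustration of the lexicon-plus-heuristics idea shared by several of the tools above, the following minimal Python sketch scores a text against a small word list and adjusts each score for a preceding negation or intensifier. It is not the VADER, NRC, or LIWC implementation: the names `TOY_LEXICON`, `NEGATIONS`, `INTENSIFIERS`, and `score_text`, and all of the numeric values, are hypothetical and chosen only for demonstration.

```python
import re

# Toy lexicon: the words and valence scores are illustrative assumptions,
# not entries from the VADER, NRC, or LIWC dictionaries.
TOY_LEXICON = {"good": 2.0, "great": 3.0, "bad": -2.0, "terrible": -3.0}
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very": 1.5, "really": 1.5}


def score_text(text):
    """Sum lexicon valences, scaling after an intensifier and flipping after a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in TOY_LEXICON:
            continue  # words outside the lexicon contribute nothing
        valence = TOY_LEXICON[token]
        prev = tokens[i - 1] if i > 0 else ""
        prev2 = tokens[i - 2] if i > 1 else ""
        if prev in INTENSIFIERS:
            valence *= INTENSIFIERS[prev]
        # Negation directly before the word, or before its intensifier
        if prev in NEGATIONS or (prev in INTENSIFIERS and prev2 in NEGATIONS):
            valence = -valence
        total += valence
    return total


if __name__ == "__main__":
    for sample in ["good", "not good", "very good", "not very good"]:
        print(sample, score_text(sample))
```

Real tools use far larger, validated dictionaries and more elaborate rules (for example, punctuation and capitalization handling), so this sketch only conveys the general mechanism.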