Researchers from Drexel's College of Computing & Informatics have created a large language model program that can help people avoid using language online that stigmatizes substance use disorder.
Drug addiction has been one of America’s growing public health concerns for decades. Despite the development of effective treatments and support resources, few people suffering from a substance use disorder seek help. That reluctance has been attributed to the stigma often attached to the condition. In an effort to address this problem, researchers at Drexel University are raising awareness of the stigmatizing language present in online forums and have created an artificial intelligence tool to help educate users and offer alternative language.
Presented at the recent Conference on Empirical Methods in Natural Language Processing (EMNLP), the tool uses large language models (LLMs), such as GPT-4 and Llama, to identify stigmatizing language and suggest alternative wording — much the way spelling and grammar checking programs flag typos.
“Stigmatized language is so engrained that people often don’t even know they’re doing it,” said Shadi Rezapour, PhD, an assistant professor in the College of Computing & Informatics who leads Drexel’s Social NLP Lab and led the research that developed the tool. “Words that attack the person, rather than the disease of addiction, only serve to further isolate individuals who are suffering — making it difficult for them to come to grips with the affliction and seek the help they need. Addressing stigmatizing language in online communities is a key first step to educating the public and reducing its use.”
According to the Substance Abuse and Mental Health Services Administration, only 7% of people living with substance use disorder receive any form of treatment, despite tens of billions of dollars being allocated to support treatment and recovery programs. Studies show that many people who felt they needed treatment did not seek it for fear of being stigmatized.
“Framing addiction as a weakness or failure is neither accurate nor helpful as our society attempts to address this public health crisis,” Rezapour said. “People who have fallen victim in America suffer both from their addiction, as well as a social stigma that has formed around it. As a result, few people seek help, despite significant resources being committed to addiction recovery in recent decades.”
Awareness of stigma as an impediment to treatment has grown in the last two decades. In the wake of America’s opioid epidemic — when strategic, deceitful marketing, promotion and overprescription of addictive painkillers resulted in millions of individuals unwittingly becoming addicted — the general public began to recognize addiction as a disease to be treated, rather than a moral failure to be punished — as it was often portrayed during the “War on Drugs” in the 1970s and ‘80s.
But according to a study by the Centers for Disease Control and Prevention, while stigmatizing language in traditional media has decreased over time, its use on social media platforms has increased. The Drexel researchers suggest that encountering such language in an online forum can be particularly harmful because people often turn to these communities to seek comfort and support.
“Despite the potential for support, the digital space can mirror and magnify the very societal stigmas it has the power to dismantle, affecting individuals’ mental health and recovery process adversely,” Rezapour said. “Our objective was to develop a framework that could help to preserve these supportive spaces.”
By harnessing the power of LLMs — the machine learning systems that power chatbots, spelling and grammar checkers, and word suggestion tools — the researchers developed a framework that could potentially help digital forum users become more aware of how their word choices might affect fellow community members suffering from substance use disorder.
To do this, they first set out to understand the forms that stigmatizing language takes on digital forums. The team used manually annotated posts to evaluate an LLM’s ability to detect and revise problematic language patterns in online discussions about substance abuse.
Once the model could classify language with a high degree of accuracy, the team applied it to more than 1.2 million posts from four popular Reddit forums. It identified more than 3,000 posts containing some form of stigmatizing language toward people with substance use disorder.
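As a minimal sketch of how such an LLM-based screening step might be set up, assuming the OpenAI Python client, the following uses illustrative prompt wording rather than the researchers' actual prompts:

```python
# Minimal sketch of LLM-based stigma screening (assumed setup, not the paper's code).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

DETECTION_PROMPT = (
    "You are reviewing posts from online forums about substance use. "
    "Reply with exactly one label: 'stigmatizing' if the post demeans or blames "
    "people with substance use disorder, otherwise 'non-stigmatizing'."
)

def classify_post(post_text: str, model: str = "gpt-4") -> str:
    """Return 'stigmatizing' or 'non-stigmatizing' for a single forum post."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": DETECTION_PROMPT},
            {"role": "user", "content": post_text},
        ],
        temperature=0,  # deterministic labels keep large-scale screening repeatable
    )
    return response.choices[0].message.content.strip().lower()
```

Screening more than a million posts would simply wrap a call like this in batching, caching and rate limiting.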
Using this dataset as a guide, the team prepared its GPT-4 LLM to become an agent of change. Incorporating non-stigmatizing language guidance from the National Institute on Drug Abuse, the researchers prompt-engineered the model to offer a non-stigmatizing alternative whenever it encountered stigmatizing language in a post. Suggestions focused on using sympathetic narratives, removing blame and highlighting structural barriers to treatment.
The program ultimately produced more than 1,600 de-stigmatized phrases, each offered as an alternative to a type of stigmatizing language.
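As a rough illustration of this rewriting step, a prompt built on person-first language principles might look like the following sketch; the wording is paraphrased for illustration and is not the researchers' prompt or the NIDA guidance verbatim:

```python
# Illustrative rewrite step (assumed prompt wording, not the paper's).
from openai import OpenAI

client = OpenAI()

REWRITE_PROMPT = (
    "The following forum post contains language that stigmatizes people who use drugs. "
    "Rewrite only the stigmatizing wording, following these rules: "
    "use person-first language (e.g. 'person with a substance use disorder' rather than "
    "'addict'); remove blame and moral judgment; where relevant, acknowledge structural "
    "barriers to treatment; preserve the author's meaning, tone and personal voice."
)

def suggest_alternative(post_text: str, model: str = "gpt-4") -> str:
    """Return a de-stigmatized rewrite of a post flagged as stigmatizing."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": REWRITE_PROMPT},
            {"role": "user", "content": post_text},
        ],
    )
    return response.choices[0].message.content.strip()
```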
Using a combination of human reviewers and natural language processing programs, the team evaluated the model on the overall quality of its responses, the extent of de-stigmatization, and fidelity to the original post.
“Fidelity to the original post is very important,” said Layla Bouzoubaa, a doctoral student in the College of Computing & Informatics who was a lead author of the research. “The last thing we want to do is remove agency from any user or censor their authentic voice. What we envision for this pipeline is that if it were integrated onto a social media platform, for example, it will merely offer an alternate way to phrase their text if their text contains stigmatizing language towards people who use drugs. The user can choose to accept this or not. Kind of like a Grammarly for bad language.”
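One common automatic proxy for fidelity is the semantic similarity between the original post and its rewrite; the sketch below uses the sentence-transformers library and is an assumption, not necessarily the study's metric:

```python
# Assumed fidelity proxy: embedding similarity between original and rewrite.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def fidelity_score(original: str, rewrite: str) -> float:
    """Cosine similarity between a post and its rewrite (closer to 1.0 = more faithful)."""
    embeddings = encoder.encode([original, rewrite], convert_to_tensor=True)
    return util.cos_sim(embeddings[0], embeddings[1]).item()

# Hypothetical example pair, for illustration only.
original = "My brother is a junkie who refuses to get clean."
rewrite = "My brother has a substance use disorder and hasn't sought treatment yet."
print(f"fidelity: {fidelity_score(original, rewrite):.2f}")
```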
Bouzoubaa also noted that widespread adoption of the program will depend on providing clear, transparent explanations of why suggestions are offered, along with strong privacy protections for user data.
To promote transparency, as well as to help educate users, the team incorporated an explanation layer in the model, so that when it identified an instance of stigmatizing language it would automatically provide a detailed explanation for its classification, based on the four elements of stigma identified in the initial analysis of Reddit posts.
“We believe this automated feedback may feel less judgmental or confrontational than direct human feedback, potentially making users more receptive to the suggested changes,” Bouzoubaa said.
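One way to implement such an explanation layer is to have the model return a structured verdict naming the element of stigma it detected along with a short rationale. The sketch below is an assumption about the mechanics, with hypothetical field names, since the four elements themselves are not enumerated here:

```python
# Assumed explanation layer: label plus rationale returned as JSON.
import json

from openai import OpenAI

client = OpenAI()

EXPLAIN_PROMPT = (
    "If the post contains language that stigmatizes people with substance use disorder, "
    "respond with JSON of the form {\"stigmatizing\": true, \"element\": \"<which element "
    "of stigma applies>\", \"explanation\": \"<one or two sentences>\"}. "
    "Otherwise respond with {\"stigmatizing\": false}."
)

def classify_with_explanation(post_text: str, model: str = "gpt-4") -> dict:
    """Return a label plus a human-readable rationale for flagged posts."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": EXPLAIN_PROMPT},
            {"role": "user", "content": post_text},
        ],
        temperature=0,
    )
    # A production system would validate the JSON and retry on malformed output.
    return json.loads(response.choices[0].message.content)
```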
This effort is the most recent addition to the group’s foundational work examining how people share personal stories online about experiences with drugs and the communities that have formed around these conversations on Reddit.
“To our knowledge, there has not been any research on addressing or countering the language people use (computationally) that can make people in a vulnerable population feel stigmatized against,” Bouzoubaa said. “I think this is the biggest advantage of LLM technology and the benefit of our work. The idea behind this work is not overly complex; however, we are using LLMs as a tool to reach lengths that we could never achieve before on a problem that is also very challenging and that is where the novelty and strength of our work lies.”
In addition to publicly releasing the program, the dataset of posts containing stigmatizing language and the de-stigmatized alternatives, the researchers plan to continue their work by studying how stigma is perceived and felt in the lived experiences of people with substance use disorders.
In addition to Rezapour and Bouzoubaa, Elham Aghakhani contributed to this research.
Read the full paper here: https://aclanthology.org/2024.emnlp-main.516/