Unveiling the Digital Terrain: Exploring the Landscape of Cyberbullying on Twitter Through the Analysis of Common Toxic Words Used by Cyberbully

Nandini Narayana, Sharmin Jahan

Research output: Contribution to conference › Poster › peer-review


Introduction: Cyberbullying is a serious issue that has become more prevalent in the digital age. Its effects on victims' mental health can be severe, including anxiety, depression, and even suicidal thoughts. Addressing cyberbullying requires a multifaceted approach that involves individuals, communities, educators, online platforms, and policymakers. Creating a safer and more inclusive digital environment involves both responding to incidents of cyberbullying and preventing them through awareness campaigns and the promotion of positive online behaviors. However, designing an effective cyberbullying prevention and awareness program can be challenging because online interactions evolve constantly. Analyzing toxic words helps prevention efforts keep pace with evolving tactics and adapt accordingly. This study conducts a trend analysis of cyberbullying based on natural language processing (NLP) to gain a deeper understanding of emerging trends and patterns in cyberbullying behavior.

Methods: We applied NLP techniques using sentiment analysis to online text posts related to cyberbullying in order to understand trends over time. One of the models we used is the naive Bayes (NB) classification model, a probabilistic machine learning model based on Bayes’ theorem that is commonly used for text classification and assumes independence among features. For the analysis, we used a Twitter cyberbullying dataset that contains #### tweets related to cyberbullying. Before running the NB model, we preprocessed the dataset by tokenizing the tweets, removing stop words, and converting the text to lowercase. The NB model classifies each tweet as either "not cyberbully" or "cyberbully"; tweets in the "cyberbully" class are further categorized as "race," "gender," "religion," or "ethnicity." After classification, we calculated the frequency distribution of words in each category to identify the toxic words most commonly used in each cyberbullying category. Throughout the analysis, we used Python libraries such as NLTK or spaCy.
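The preprocessing and classification steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library instead of NLTK or spaCy, and the stop-word list, the placeholder tokens (raceword, genderword, faithword), the label names, and the example tweets are all hypothetical stand-ins for the real dataset. The classifier is a plain multinomial naive Bayes with add-one smoothing, which is one common variant and only an assumption about the variant used in the study.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list; a real run would use a
# fuller list such as those shipped with NLTK or spaCy.
STOP_WORDS = {"a", "an", "the", "is", "are", "you", "to", "of", "and", "all"}


def preprocess(text):
    """Lowercase the text, tokenize on letter runs, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.n_docs = len(labels)
        return self

    def predict(self, tokens):
        best_label, best_score = None, -math.inf
        for label, n in self.doc_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / self.n_docs)  # log prior of the class
            for t in tokens:
                # Independence assumption: per-token log-likelihoods add up.
                count = self.word_counts[label][t] + 1
                score += math.log(count / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training tweets; placeholder tokens stand in for toxic words.
train = [
    ("You are such a raceword", "race"),
    ("All of you raceword people", "race"),
    ("Typical genderword behaviour", "gender"),
    ("She is a genderword", "gender"),
    ("Those faithword followers", "religion"),
    ("Lovely weather for a walk today", "not_cyberbullying"),
    ("Great game last night", "not_cyberbullying"),
]
model = NaiveBayes().fit(
    [preprocess(text) for text, _ in train],
    [label for _, label in train],
)
```

Calling model.predict(preprocess(tweet)) on a new tweet returns the label with the highest smoothed log-probability. With a real dataset, the same structure would apply with a proper tokenizer, a full stop-word list, and a held-out test split.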

Results: The study analyzes trends in the most commonly used toxic words across different types of cyberbullying tweets. We listed the toxic words for each cyberbullying category (“race,” “gender,” “religion,” and “ethnicity”), along with their frequency distributions.
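A per-category frequency distribution of this kind could be computed along the following lines. This is a sketch under stated assumptions, not the study's code: the function name top_words_by_category, the placeholder tokens, and the sample data are hypothetical, and collections.Counter stands in for a library helper such as NLTK's FreqDist.

```python
from collections import Counter, defaultdict


def top_words_by_category(labelled_tokens, n=3):
    """Return the n most frequent tokens for each category."""
    freq = defaultdict(Counter)  # category -> token frequency distribution
    for tokens, category in labelled_tokens:
        freq[category].update(tokens)
    return {cat: counts.most_common(n) for cat, counts in freq.items()}


# Hypothetical preprocessed tweets, already classified; the placeholder
# tokens stand in for the toxic words found in a real dataset.
labelled = [
    (["such", "raceword"], "race"),
    (["raceword", "people"], "race"),
    (["typical", "genderword"], "gender"),
    (["she", "genderword"], "gender"),
]
top = top_words_by_category(labelled, n=1)
```

Here top maps each category to its single most frequent token and that token's count; raising n would give a longer ranked list per category.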

Conclusion: The study systematically investigates the association between toxic words and different categories of cyberbullying in Twitter posts. It provides valuable insights into the prevalence and patterns of cyberbullying on social media platforms. Understanding the contextual factors of cyberbullying is crucial in preventing and combating such behavior. The findings from the study can contribute to ongoing efforts to make the online space safe and inclusive for everyone.
Original language: American English
State: Published - 16 Feb 2024
Oklahoma State University Center for Health Sciences Research Week 2024
- Oklahoma State University Center for Health Sciences, Tulsa, United States
Duration: 13 Feb 2024 to 17 Feb 2024


Country/Territory: United States


  • cyberbully
  • digital platforms
  • safe online space
  • natural language processing
  • trend analysis

