Publication Date

12-2022

Date of Final Oral Examination (Defense)

10-17-2022

Type of Culminating Activity

Thesis

Degree Title

Master of Science in Computer Science

Department

Computer Science

Major Advisor

Francesca Spezzano, Ph.D.

Advisor

Edoardo Serra, Ph.D.

Advisor

Nasir Eisty, Ph.D.

Abstract

Social media platforms give users a powerful means to share their ideas. Using one's right to free expression to incite hatred toward a particular group of people is inappropriate, yet hate speech remains pervasive in our society. Spreading hate through online social networks such as Facebook, Twitter, TikTok, and Instagram is commonplace today. One recent example is the unprecedented COVID-19 pandemic, which engendered anti-Asian hate.

In the current literature, there has been limited study of using user features in conjunction with textual features to detect hate. This thesis aims to combine textual features with user features to improve on state-of-the-art hate speech detection techniques. To test our approach, we used four publicly available datasets. We used tools that access the Twitter API to extract the required user information, which we either used directly or used to compute further features.
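
As a minimal sketch of this collection step, the snippet below pulls public profile metadata for a list of accounts via the Twitter v2 API. The thesis only states that Twitter APIs were accessed; the choice of the Tweepy library, the bearer-token placeholder, and the specific fields kept here are illustrative assumptions, not the author's exact pipeline.

```python
# Illustrative sketch (assumed tooling: Tweepy; BEARER_TOKEN is a placeholder).
import tweepy
from datetime import datetime, timezone

BEARER_TOKEN = "..."  # placeholder credential, not from the thesis
client = tweepy.Client(bearer_token=BEARER_TOKEN)

def fetch_user_metadata(usernames):
    """Fetch basic account fields from which user features can later be derived."""
    response = client.get_users(
        usernames=usernames,
        user_fields=["created_at", "description", "public_metrics", "verified"],
    )
    records = []
    for user in response.data or []:
        metrics = user.public_metrics
        records.append({
            "username": user.username,
            # account age in days, a typical behavior/demographic-style feature
            "account_age_days": (datetime.now(timezone.utc) - user.created_at).days,
            "followers": metrics["followers_count"],
            "following": metrics["following_count"],
            "tweet_count": metrics["tweet_count"],
            "verified": user.verified,
            "bio": user.description,
        })
    return records
```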

We represented the textual features as BERT embeddings and linguistic features. The 97 linguistic measures computed with the Linguistic Inquiry and Word Count (LIWC) tool quantify the text's cognitive, affective, and grammatical processes. The user features consist of demographic, behavioral, emotion-based, personality, readability, and writing-style features. Our experimental evaluation over three datasets shows that the top twenty linguistic features and the top twenty user features form the best combination for hate speech detection.
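
The sketch below shows one way such a combined text representation can be built: a BERT embedding concatenated with a precomputed LIWC score vector. The specific model checkpoint (bert-base-uncased), the use of the [CLS] token, and the `liwc_scores` input are assumptions for illustration; the thesis only specifies BERT embeddings and the 97 LIWC measures.

```python
# Sketch: concatenate a BERT [CLS] embedding with precomputed LIWC scores.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def text_vector(text: str, liwc_scores: np.ndarray) -> np.ndarray:
    """Return the 768-d [CLS] embedding concatenated with the 97-d LIWC vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    cls_embedding = outputs.last_hidden_state[:, 0, :].squeeze(0).numpy()
    return np.concatenate([cls_embedding, liwc_scores])
```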

Hate speech is largely emotionally charged. We further analyzed these user and linguistic features. Among the most intuitive and prominent results was that features such as anger, negative emotion, swearing, fear, and annoyance were high in hate speech, while the happiness feature was low.

We compared multiple approaches alongside the existing state of the art. The best approach using textual features alone was combining LIWC features with BERT embeddings, which achieved F1 scores of 0.82 and 0.79 on the Crowd-sourced (DS1) and Kaggle (DS3) datasets, respectively. Next, we identified the top LIWC and user features for hate speech detection: features representing negative emotions such as anger, fear, sadness, and annoyance were prominently high in hate speech, while happiness was lower. We then analyzed the F1 scores of the standalone LIWC and user features as well as their combinations, and found that combining the top twenty LIWC and top twenty user features gives the best F1 scores of 0.74, 0.90, and 0.64 on the DS1, NAACL (DS2), and anti-Asian COVID hate (DS4) datasets, respectively.
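
The abstract does not fix how the top twenty LIWC and top twenty user features were ranked; the sketch below uses scikit-learn's SelectKBest with ANOVA F-scores as one plausible selection method before concatenating the two reduced feature sets. The matrices `X_liwc`, `X_user`, and labels `y` are assumed inputs.

```python
# One plausible way to keep the top twenty LIWC and top twenty user features
# (ranking method is an assumption; the thesis abstract only reports "top twenty").
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

def top_k_columns(X: np.ndarray, y: np.ndarray, k: int = 20) -> np.ndarray:
    """Return the column indices of the k highest-scoring features."""
    selector = SelectKBest(score_func=f_classif, k=k).fit(X, y)
    return selector.get_support(indices=True)

# X_liwc: (n_samples, 97) LIWC matrix; X_user: (n_samples, n_user_features)
# top_liwc = top_k_columns(X_liwc, y)
# top_user = top_k_columns(X_user, y)
# X_combined = np.hstack([X_liwc[:, top_liwc], X_user[:, top_user]])
```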

Finally, we used traditional machine learning algorithms on BERT embeddings combined with the top twenty linguistic features and the top twenty user features, obtaining F1 scores of 0.78, 0.92, and 0.84 on DS1, DS2, and DS4, respectively. We also compared our approach with other studies that use user and textual features.
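
As a sketch of this final evaluation step, the snippet below trains a logistic regression classifier, chosen here only as one example of an unspecified "traditional machine learning algorithm", on the concatenation of BERT embeddings with the top twenty LIWC and top twenty user features, and reports F1. The train/test split and classifier settings are assumptions.

```python
# Sketch of the final evaluation: traditional classifier on combined features, scored with F1.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def evaluate(X_bert, X_liwc_top20, X_user_top20, y):
    """Train on the concatenated feature matrix and return the test F1 score."""
    X = np.hstack([X_bert, X_liwc_top20, X_user_top20])
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return f1_score(y_test, clf.predict(X_test))
```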

DOI

https://doi.org/10.18122/td.2035.boisestate

Available for download on Sunday, December 01, 2024
