DOI: 10.1109/ICICCS48265.2020.9120939
Published on 1 May 2020 at the International Conference on Intelligent Computing and Control Systems

Classification of Online Toxic Comments Using Machine Learning Algorithms

Jatin Hooda Rahul Harsh Kajla + 1 author

Abstract

Toxic comments are disrespectful, abusive, or unreasonable online comments that usually make other users leave a discussion. The danger of online bullying and harassment affects the free flow of thoughts by restricting the dissenting opinions of people. Sites struggle to promote discussions effectively, leading many communities to limit or close down user comments altogether. This paper systematically examines the extent of online harassment and classifies the content into labels to assess toxicity as correctly as possible. We apply six machine learning algorithms to our data to solve this text classification problem and to identify the best algorithm for toxic comment classification based on our evaluation metrics. We aim to detect toxicity with high accuracy in order to limit its adverse effects, which will be an incentive for organizations to take the necessary steps.
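
The abstract describes choosing the best of several classifiers by evaluation metrics but does not name the algorithms or the metrics. The following is a minimal sketch of that selection step only, assuming binary toxic/non-toxic labels and the common accuracy, precision, recall, and F1 metrics. The function names, model names, and toy labels are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for labels in {0, 1} (1 = toxic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a model predicts no positives
    # or the data contains no positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def best_model(y_true: List[int],
               predictions: Dict[str, List[int]],
               metric: str = "f1") -> str:
    """Return the name of the model with the highest score on `metric`."""
    scores = {name: binary_metrics(y_true, y_pred)[metric]
              for name, y_pred in predictions.items()}
    return max(scores, key=scores.get)


# Made-up labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 0, 1, 0, 0, 1, 0],  # hypothetical conservative model
    "model_b": [1, 1, 1, 1, 0, 1, 1, 0],  # hypothetical aggressive model
}

print(best_model(y_true, predictions, metric="f1"))
print(best_model(y_true, predictions, metric="recall"))
```

Which metric to rank by is a design choice: on imbalanced toxicity data, accuracy alone can favour a model that rarely flags anything, so F1 or recall is often inspected alongside it.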

Related Scientific Articles

Jury Learning: Integrating Dissenting Voices into Machine Learning Models

Mitchell L. Gordon Michael S. Bernstein J. Park + 4 others

7 February 2022

Whose labels should a machine learning (ML) algorithm learn to emulate? For ML tasks ranging from online comment toxicity to misinformation detection to medical diagnosis, different groups in society may have irreconcilable disagreements about ground truth labels. Supervised ML today resolves these label disagreements implicitly using majority vote, which overrides minority groups’ labels. We introduce jury learning, a supervised ML approach that resolves these disagreements explicitly through the metaphor of a jury: defining which people or groups, in what proportion, determine the classifier’s prediction. For example, a jury learning model for online toxicity might centrally feature women and Black jurors, who are commonly targets of online harassment. To enable jury learning, we contribute a deep learning architecture that models every annotator in a dataset, samples from annotators’ models to populate the jury, then runs inference to classify. Our architecture enables juries that dynamically adapt their composition, explore counterfactuals, and visualize dissent. A field evaluation finds that practitioners construct diverse juries that alter 14% of classification outcomes.

An efficient hybrid system for anomaly detection in social networks

Md. Shafiur Rahman S. Halder U. Acharjee + 1 other

2 March 2021

Anomaly detection has been an essential and dynamic research area in data mining. A wide range of applications, including various social media platforms, have adopted state-of-the-art methods to identify anomalies and ensure users’ security and privacy. A social network is a forum used by different groups of people to express their thoughts, communicate with each other, and share content. Social networks also facilitate abnormal activities such as spreading fake news, rumours, misinformation, unsolicited messages, and propaganda, and posting malicious links. Detecting abnormalities is therefore an important data analysis task for distinguishing normal from abnormal users on social networks. In this paper, we have developed a hybrid anomaly detection method named DT-SVMNB that cascades several machine learning algorithms, including decision tree (C5.0), Support Vector Machine (SVM), and Naïve Bayesian classifier (NBC), to classify normal and abnormal users in social networks. We have extracted a list of unique features derived from users’ profiles and content. The proposed DT-SVMNB model is trained on two kinds of datasets using the selected features. Our model classifies users in the social network as depressed or suicidal. We evaluated our model on synthetic and real social network datasets. The performance analysis demonstrates around 98% accuracy, which proves the effectiveness and efficiency of our proposed system.

Model Positionality and Computational Reflexivity: Promoting Reflexivity in Data Science

Darren Gergle S. Cambo

8 March 2022

Data science and machine learning provide indispensable techniques for understanding phenomena at scale, but the discretionary choices made when doing this work are often not recognized. Drawing from qualitative research practices, we describe how the concepts of positionality and reflexivity can be adapted to provide a framework for understanding, discussing, and disclosing the discretionary choices and subjectivity inherent to data science work. We first introduce the concepts of model positionality and computational reflexivity that can help data scientists to reflect on and communicate the social and cultural context of a model’s development and use, the data annotators and their annotations, and the data scientists themselves. We then describe the unique challenges of adapting these concepts for data science work and offer annotator fingerprinting and position mining as promising solutions. Finally, we demonstrate these techniques in a case study of the development of classifiers for toxic commenting in online communities.

Health Misinformation Detection in the Social Web: An Overview and a Data Science Approach

Marco Viviani Stefano Di Sotto

1 February 2022

The increasing availability of online content these days raises several questions about effective access to information. In particular, the possibility for almost everyone to generate content with no traditional intermediary, if on the one hand led to a process of “information democratization”, on the other hand, has negatively affected the genuineness of the information disseminated. This issue is particularly relevant when accessing health information, which impacts both the individual and societal level. Often, laypersons do not have sufficient health literacy when faced with the decision to rely or not rely on this information, and expert users cannot cope with such a large amount of content. For these reasons, there is a need to develop automated solutions that can assist both experts and non-experts in discerning between genuine and non-genuine health information. To make a contribution in this area, in this paper we proceed to the study and analysis of distinct groups of features and machine learning techniques that can be effective to assess misinformation in online health-related content, whether in the form of Web pages or social media content. To this aim, and for evaluation purposes, we consider several publicly available datasets that have only recently been generated for the assessment of health misinformation under different perspectives.

Research on Sentiment Classification Algorithms on Online Review

Ruixia Yan Yanxi Xie Xiaoli Wang + 2 others

8 September 2020

Online product review text contains a large number of opinions and emotions. To identify the public’s emotional and tendentious information, this paper presents reinforcement learning models in which sentiment classification algorithms for an online product review corpus are discussed. To explore the classification effect of different sentiment classification algorithms, we studied the Naive Bayesian algorithm, the support vector machine algorithm, and the neural network algorithm, and compared them using a concrete example. The evaluation indexes of the three algorithms are compared across different sentence lengths and word vector dimensions. The results show that the neural network algorithm is effective for sentiment classification of an online product review corpus.
