9 votes

Investigating toxicity changes of cross-community Redditors from two billion posts and comments

5 comments

  1. [5]
    Amarok
    Link

    In this research, we investigated users’ toxic cross-community behavior based on the toxicity of their posts and comments. Our fine-tuned BERT model achieved a classification accuracy of 91.27% and an average F1 score of 0.79, showing a 2% and 7% improvement in performance compared to the best-performing baseline models based on neural networks. We addressed RQ1 by running a prediction experiment on the posts and comments from our Reddit collection. The analysis showed that 9.33% of the posts are toxic, and 17.6% of the comments are toxic. We answered RQ2 by investigating the changes in the toxicity of users’ content across communities based on two primary conditions.

    First, our analysis showed that 30.68% of posting users changed their toxicity levels, as did 81.67% of commenting users, mainly across multiple communities. Furthermore, in answering RQ3 we found that, over time, toxicity disperses as the number of participating users and the frequency of cross-community participation increase. This finding is helpful because it gives community moderators leads for tracking patterns among active users and preventing them from spreading toxic content online.

    Lastly, we conducted a Granger causality test between the volume of comments, the volume of links in comments, and the volume of toxicity. We found that links in comments can influence toxicity within those comments. This research addresses a prominent issue in social media platforms: toxic behavior negatively impacts other users’ experience. Thus, we believe it is necessary to conduct more research on users’ toxic behavior to help us understand the behavior’s dynamics.

    The citations in this article are a goldmine for related research. I think we all knew this already, but it's good to have the numbers and the rigor.
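
    As a rough illustration of the classification step the quote describes (not the authors' actual model; their fine-tuned weights aren't public here), a minimal inference sketch with the Hugging Face transformers library might look like this. The publicly available unitary/toxic-bert checkpoint is a stand-in, and the 0.5 decision threshold is an assumption:

        import torch
        from transformers import AutoModelForSequenceClassification, AutoTokenizer

        # Stand-in checkpoint: a BERT model fine-tuned for toxicity detection.
        # This illustrates the inference step only, not the paper's exact model.
        MODEL = "unitary/toxic-bert"
        tok = AutoTokenizer.from_pretrained(MODEL)
        model = AutoModelForSequenceClassification.from_pretrained(MODEL)
        model.eval()

        def toxicity_scores(texts, threshold=0.5):
            """Return (is_toxic, probability) per text, thresholding the 'toxic' label."""
            batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
            with torch.no_grad():
                logits = model(**batch).logits
            probs = torch.sigmoid(logits)  # multi-label head: one sigmoid per label
            idx = model.config.label2id["toxic"]  # assumes a label literally named "toxic"
            return [(p[idx].item() >= threshold, p[idx].item()) for p in probs]

        samples = [
            "Thanks for the detailed writeup, this was genuinely useful.",
            "Nobody asked for your worthless opinion, get lost.",
        ]
        for text, (flag, score) in zip(samples, toxicity_scores(samples)):
            print(f"toxic={flag}  p={score:.2f}  {text!r}")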

    6 votes
    1. [2]
      vektor
      Link Parent

      Don't have the time to dig too deep right now, so I'm just going to go off your comment here and will read more later. Still wanna leave a note here that Granger causality is not proper causality. It's my understanding that you can't test for proper causality without performing interventions. See also clinical trials: observational studies (which this basically seems to be) are much weaker than randomized controlled trials, which perform interventions and can thus establish proper causality.

      Granger causality can be computed without interventions, but it can't demonstrate proper causality, only that one time series helps predict another.
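
      To make that concrete, here's a small sketch of the kind of test being discussed, using statsmodels on synthetic data (the series names and the built-in two-step lag are invented for illustration). A low p-value only says past link volume improves forecasts of toxicity volume; it says nothing about what would happen if you intervened on the links:

          import numpy as np
          from statsmodels.tsa.stattools import grangercausalitytests

          rng = np.random.default_rng(42)
          n = 500

          # Synthetic daily series: link volume, plus a toxicity volume that
          # (by construction) echoes link volume two steps later.
          links = rng.poisson(5, n).astype(float)
          toxicity = 0.4 * np.concatenate([np.zeros(2), links[:-2]]) + rng.normal(0, 1, n)

          # Column 0 is the predicted series, column 1 the candidate predictor:
          # does past `links` improve forecasts of `toxicity`?
          data = np.column_stack([toxicity, links])
          results = grangercausalitytests(data, maxlag=4)
          # Expect small p-values around lag 2 -- predictive precedence,
          # not interventional causality.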

      6 votes
      1. Amarok
        Link Parent

        Thanks for the clarification.

        2 votes
    2. [2]
      NoblePath
      Link Parent

      This may be in TFA, but did they control for bots?

      4 votes
      1. Amarok
        Link Parent

        They did indeed.

        Additionally, we excluded bot users (i.e., automated accounts) to avoid potential biases in the subsequent analysis by using a publicly available list of 320 bots on Reddit (https://www.reddit.com/r/autowikibot/wiki/redditbots; retrieved on May 13, 2019). Since the available bot list is outdated, it potentially misses newer bot accounts. Thus, for this work, we used the Pushshift API (https://pushshift.io; retrieved on May 22, 2019) to retrieve a list of accounts with a minimal comment reply delay. Setting the comment reply delay to 30 s allowed us to find more bot accounts that quickly reply to other users. We removed additional bot accounts by combining the bot list and the Pushshift API list. When conducting this study, we found 37 bot accounts that produced around 2% of the content. The massive volume of bot-generated content reaffirms the importance of removing bots in the data-cleaning phase of this study.
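
        For what that reply-delay heuristic looks like in code, here's a rough stdlib-only sketch; the field names, the five-reply minimum, and the two seed bot names are assumptions rather than the paper's actual pipeline, and it computes delays from already-collected comments instead of querying Pushshift:

            from statistics import median

            # Stand-ins for the public 320-bot list mentioned above.
            KNOWN_BOTS = {"AutoModerator", "autowikibot"}

            def fast_repliers(comments, threshold_s=30, min_replies=5):
                """Flag accounts whose typical reply delay is suspiciously short.

                `comments` is assumed to be dicts with 'author', 'created_utc',
                and 'parent_created_utc' (epoch seconds); min_replies guards
                against flagging a user over one lucky fast reply.
                """
                delays = {}
                for c in comments:
                    delays.setdefault(c["author"], []).append(
                        c["created_utc"] - c["parent_created_utc"]
                    )
                return {
                    author
                    for author, ds in delays.items()
                    if len(ds) >= min_replies and median(ds) <= threshold_s
                }

            def drop_bot_content(comments):
                """Union the static bot list with the heuristic, then filter."""
                bots = KNOWN_BOTS | fast_repliers(comments)
                return [c for c in comments if c["author"] not in bots]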

        5 votes