30 votes

New research finds that user affiliations on Reddit can be used to predict which subreddits will turn so toxic they eventually get banned

28 comments

  1. [8]
    vakieh
    Link

    Note: arxiv is NOT PEER REVIEWED and is considered pre-print. I would not consider anything you read on there to be science, so much as 'it might be science'. Wait for the actual publication to come out, and check the quality of the venue it is eventually published in.

    18 votes
    1. [6]
      JakeTheDog
      Link Parent

      I would not consider anything you read on there to be science, so much as 'it might be science'.

      This statement is misleading and a massive disservice to science (and the dissemination of knowledge). There are entire fields (physics, maths, computer science (especially AI research)) that have the majority of their publications on pre-print servers. In my own field (molecular biology/medicine), high-calibre labs often release pre-prints to increase visibility and to get early feedback so that the work can be improved prior to submission to a high-ranking journal.

      Moreover, even prestigious journals are not immune to shoddy science and outright fraud.

      Instead of a blanket statement like that, you should be doing a quick review of the paper and a background check. Here, what stands out to me is a lack of affiliations by any of the authors to any established institute, which would normally give credibility or at least provide a way to identify the authors' previous work.

      10 votes
      1. [2]
        cge
        Link Parent

        Here, what stands out to me is a lack of affiliations by any of the authors to any established institute, which would normally give credibility or at least provide a way to identify the authors' previous work.

        I'm assuming Nithyanand is the PI here, as the names are not in alphabetical order; he's at University of Iowa. Habib and Bin Musa are (very recent, apparently) PhD students with Nithyanand.

        I'm not sure why this isn't in the paper. I'd suspect this had something to do with not wanting to be too closely linked to the work if it wasn't acknowledged on the authors' sites, but Nithyanand does include it in his CV. Perhaps this is a convention for preprints in that specific field, or a peculiarity of the authors?

        I know that I usually don't include affiliations on my title slides for talks, and several others in my field also omit them, but a paper is quite different, even a preprint.

        1 vote
        1. JakeTheDog
          Link Parent

          Yeah, exactly. The best I can think of is that their institute doesn't want to be affiliated with the work, which is what happened to me once when I did a rogue collaboration (not that it was a bad thing, just a technicality).

          1 vote
      2. [3]
        vakieh
        Link Parent

        The statement is not misleading at all. There is no field where the majority of their publications are only in pre-print servers - and those pre-print servers are intended for feedback from peers and to help boost H-indices for promotion, not for dissemination to the public (because I stand by my claim that it is not science - yet). If it is on arxiv, it might be science. If you can later find it somewhere else, say for instance the ACM conference proceedings the earlier paper mentioned in this thread was published in, then you can be more confident that it is - but anyone and their dog can throw crap at arxiv and it will stick.

        No, the fact the other paper was published at an ACM conference doesn't make it infallible - no venue is perfect no matter how much peer review they do. But peer review is our best and in many cases only defence against garbage work, and I am not going to let it be ignored.

        you should be doing a quick review of the paper and a background check

        This is what journal/conference prestige is for: I, as an academic, am capable of doing that review, but it is a learned skill that many others cannot apply effectively.

        1 vote
        1. JakeTheDog
          Link Parent

          There is no field where the majority of their publications are only in pre-print servers

          There are no stats on this, but suffice to say that fast-developing fields, particularly in computer science (AI especially), do have a lot of pre-prints. But don't confuse that with exclusivity. It's not the sole repository for their work. As I said, it is often used as a testing ground. AI and other computational fields often do leave their work there because of the short lifetimes anyway.

          But peer review is our best and in many cases only defence against garbage work, and I am not going to let it be ignored

          Sure, I agree with you there. But to be so blatantly dismissive just because it doesn't have a "gold star" is a huge problem and a disservice to science. Official peer review does not make science.

          Critical thinking is a necessary skill for everyone in every situation. Relying on something so simple (and often arbitrary - there are plenty of predatory and pay-for-publication journals out there, and prestige itself is not obvious to most) impedes efforts to produce a science-literate society.

          1 vote
        2. cge
          Link Parent

          The statement is not misleading at all. There is no field where the majority of their publications are only in pre-print servers

          Surprisingly, there is: string theory. I will refrain from speculating as to what this implies about the field.

    2. RapidEyeMovement
      Link Parent

      Awesome, thanks! That is important information

      1 vote
  2. [14]
    shiruken
    (edited )
    Link

    arXiv publication: H. Habib, M. B. Musa, F. Zaffar, and R. Nithyanand, To Act or React? Investigating Proactive Strategies For Online Community Moderation (2019).

    This is only briefly mentioned in the Mother Jones article, but this study contradicts a prior study regarding the efficacy of subreddit bans. That previous study of r/FatPeopleHate and r/CoonTown found that banning the subreddits resulted in a reduction in hate speech from the subreddits' users and prompted many users to depart the platform entirely. This study of over 3000 subreddits found this to not be the case:

    Bans and quarantines of dangerous communities do not reduce offensiveness of the impacted members. Looking at Figure 4a and Figure 4c we can see that there is no significant difference in incidence rates of offensive comments before and after a ban or quarantine event for our treatment users.

    Impacted community members of bans and quarantines simply move to other hate subreddits. We see from Figure 4b that users impacted by community bans continue a high level of engagement in other hate communities. Surprisingly, our data shows (Figure 4d) that when communities get quarantined, their users typically increase their participation in hate communities significantly – after a lag of several months (and possibly after the quarantined community was banned).

    Taken together, our analysis shows that participation in dangerous and hateful subreddits does have an impact on a users behavior in the wider community and current administration strategies of bans and quarantines are not effective for mitigating the impact of this participation on individual users.

    13 votes
    1. [7]
      mike10010100
      Link Parent

      Surprisingly, our data shows (Figure 4d) that when communities get quarantined, their users typically increase their participation in hate communities significantly – after a lag of several months (and possibly after the quarantined community was banned).

      It's almost like the solution is to ban hate communities and tightly moderate the site, something that, unless it's built into the concept from the start, won't scale linearly with increased user count.

      20 votes
      1. [6]
        vakieh
        Link Parent

        unless it's built into the concept from the start, won't scale linearly

        It won't scale linearly no matter what you do.

        1 vote
        1. [5]
          mike10010100
          Link Parent

          I disagree. I think a mix of AI sentiment analysis and keyphrase detection, as well as a robust community of site-level moderators (not subreddit-level, for example), combined with site-wide rules banning hate speech and bigotry, would work out very well.
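          Purely to illustrate what I mean (this is my own toy sketch, not anything Reddit actually runs; the phrase list, the threshold, and the choice of NLTK's VADER scorer are all made-up assumptions), a first-pass filter like that could be as simple as:

          ```python
          # Toy first-pass moderation filter: keyphrase detection plus an
          # off-the-shelf sentiment score. Everything here is illustrative.
          import nltk
          from nltk.sentiment.vader import SentimentIntensityAnalyzer

          nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

          # Hypothetical keyphrase list; a real system would need a large,
          # curated set and handling for obfuscated spellings.
          FLAGGED_PHRASES = {"example slur", "example dogwhistle"}

          analyzer = SentimentIntensityAnalyzer()

          def needs_human_review(comment: str, threshold: float = -0.6) -> bool:
              """Route a comment to site-level moderators if it contains a
              flagged keyphrase or scores as strongly negative."""
              text = comment.lower()
              if any(phrase in text for phrase in FLAGGED_PHRASES):
                  return True
              # VADER's compound score runs from -1 (most negative) to +1.
              return analyzer.polarity_scores(text)["compound"] <= threshold
          ```

          Anything this flags would still go to the human, site-level moderators; the automation only narrows down what they have to look at.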

          1 vote
          1. [2]
            vakieh
            Link Parent

            would work out very well
            scale linearly

            You aren't talking about the same thing. There is a fairly horrendous diseconomy of scale when it comes to moderation - you either reduce quality or increase cost at a rate that outpaces the community's growth. This is easily observable in any community you want to look at.

            1. mike10010100
              Link Parent

              Perhaps if you switched to a paid model, or conscripted some users to swarm-moderate other users while combining that with some AI sentiment detection, you could find a way to distribute the moderation among more than a handful of people.

          2. [2]
            Enoch
            Link Parent

            You're not wrong in principle, but when you see this in action IRL the targets scatter and congregate at 10 new sites. Better to know where they are.

            And most importantly, put anyone with their back to the wall and see what happens.

            I'm for friendly co-existence with anyone, but purely logically speaking, my bigots are someone else's champions, and who am I to force my view? I've seen enough people deflate if you just pay them no mind and keep saying "Hi" in the street.

            1. mike10010100
              Link Parent

              but when you see this in action IRL the targets scatter and congregate at 10 new sites. Better to know where they are.

              Studies have shown that banning hate subreddits doesn't cause them to "stick around". It causes an overall reduction in hate speech.

              I'm for a friendly co-existence with anyone, but purely logically speaking my bigots are someone else's champions, and who am I to force my view?

              I mean I'm pretty sure we can all universally agree that Nazis are bad, no? We stamped out their ideology before and if they rise up again we should do everything possible to ensure they stay gone.

              3 votes
    2. [4]
      Gaywallet
      Link Parent

      I really wish they had more deeply investigated the content of what was being posted in these subs rather than just the connections. While it's great that H2 and H3 are validated and this gives a proposed method of identifying hateful subs, it's unfortunately almost entirely based on the connection of one particular sub to other hateful subs (through both user connection and direct community connection). I could see the potential for bad actors to evolve in how they participate by creating fresh accounts to post in these communities and sever/obfuscate any link or connection that they have to other hateful communities in order to outsmart an ML algorithm like this.

      Regardless, it does serve as a relatively accurate model for today, and anything that makes it easier to identify, monitor, and take action on hateful communities is a step in the right direction.
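      As a rough illustration of the kind of connection-based signal being discussed (my own reading of the approach, with made-up data structures and names, not the authors' code), the core feature could be something like the fraction of a candidate subreddit's commenters who also post in already-flagged communities:

      ```python
      # Illustrative user-overlap feature for a candidate subreddit.
      from typing import Dict, Set

      def flagged_overlap(candidate_users: Set[str],
                          flagged_subs: Dict[str, Set[str]]) -> float:
          """Fraction of the candidate subreddit's commenters who also post
          in already-banned or already-flagged subreddits."""
          if not candidate_users:
              return 0.0
          flagged_users = set().union(*flagged_subs.values()) if flagged_subs else set()
          return len(candidate_users & flagged_users) / len(candidate_users)

      # Hypothetical example: 2 of 4 commenters also appear in flagged subs.
      candidate = {"u1", "u2", "u3", "u4"}
      flagged = {"banned_sub_a": {"u2", "u9"}, "banned_sub_b": {"u3"}}
      print(flagged_overlap(candidate, flagged))  # -> 0.5
      ```

      A feature like this is exactly what fresh throwaway accounts would dilute: new accounts add to the denominator without ever showing up in the flagged sets, which is why the evasion scenario above seems plausible.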

      12 votes
      1. [3]
        skybrian
        Link Parent

        Yes, it sounds like the researchers came up with a tool that Reddit could use to find subreddits they should take a closer look at. But my guess is that just being reactive is taking a significant amount of resources already.

        2 votes
        1. [2]
          NaraVara
          Link Parent

          This seems like a smarter method of being reactive that’s more rational and uniformly applied.

          I’d actually imagine this would be really useful for online gaming/matchmaking too. There are definitely personality types that feed each other’s toxicity and even if you’re not banning them, avoiding matching them in groups with each other could be big.

          1 vote
          1. Lawrencium265
            Link Parent

             It seems like it would be in Reddit's interest to abuse such a tool. Engagement is a metric that they can share with their advertisers, and what better way to drive engagement than to put people with opposing views in front of each other?

            1 vote
    3. [2]
      goodbetterbestbested
      Link Parent

      IIRC, the previous study didn't find that there was a reduction in the hate speech of those individual users, as though the users themselves were self-moderating. The reduction was in hate speech overall, because many of those users stopped using the site. Some of them moved to other hate subreddits, but not 100%. Thus, over time, the study suggested that consistent moderation would, in fact, lower hate speech as expected.

      I don't think the studies are contradictory: this conclusion is about the behavior of the individual members, the other one was about incidence of hate speech overall. But please correct me if I'm misremembering the prior study, because I don't have it at hand.

      12 votes
      1. NaraVara
        Link Parent

        Are they leaving the site or are they just no longer having the hate speechy memes and methods of engagement reinforced and validated?

        A little from column A, a little from column B?

        2 votes
  3. [3]
    Thunder-ten-tronckh
    Link

    Firstly, I think it's incredible that they can predict the evolution of toxicity with 70% to 90% accuracy. Regardless of how that's defined, it's fascinating how influential shared communities are at driving the nature of discourse in other subreddits.

    I do have a question though—curious what y'all make of the way they define hateful subreddits:

    We select 118 subreddits that have been frequently reported by redditors and the media for violating the Reddit content policy, yet have not been banned or quarantined by administrators. This list is compiled by analyzing the r/againsthatesubreddits and r/SubredditDrama to identify the most frequently user-reported subreddits. Additional subreddits are manually added to this list based on media reports. Examples of subreddits in this category include r/metacanada and r/KotakuInAction. This dataset was used to understand the evolution and life-cycle of frequently reported subreddits.

    Doesn't it seem a little, idk, limiting to judge something as "hateful" off of user reports? I ask because Reddit is a site absolutely riddled with ideological warfare (left vs. right, religion vs. atheism, Packers vs. Bears, etc.), and I'm skeptical that all reports are honest reports. I'd be willing to bet a similar study could be commissioned that examined a user's likelihood of reporting something based on the communities they're involved in.

    10 votes
    1. [2]
      Deimos
      (edited )
      Link Parent

      Yeah, I'd really like to see the full list of subreddits they ended up using. Going off posts in /r/AgainstHateSubreddits could be a reasonable approach, but using /r/SubredditDrama doesn't make any sense, since posts there don't (generally) focus on hateful behavior at all and aren't intended to be "reports". It's not even related to honesty in that case; that's just a poorly chosen data source.

      I've only skimmed the paper, but there doesn't seem to be much specific info about the subreddits analyzed in it. I did notice that in Table 5 on page 7, they have /r/GalaxyNote7 included in the "DH" set, which is "hateful subreddits". That seems... blatantly wrong to me, and was probably because of this highly-upvoted SubredditDrama post from about 2.5 years ago. It's an extremely inactive subreddit (only 13 submissions in the last year) that has no connection to hatefulness.

      12 votes
      1. sublime_aenima
        Link Parent

        That makes sense. They probably started their study around that time frame. For many casual users of reddit, /r/SubredditDrama was very much seen as a callout sub similar to AHS. This is because many posts centering on "bad" behavior were the most commented on and often the highest upvoted. There were several different groups that used SRD as one of their main subs for finding toxic behavior, although I never paid attention to whether or not they actually ever published anything.

        3 votes
  4. [3]
    Algernon_Asimov
    Link

    a subreddit’s descent into hate can be fairly reliably predicted by the number of its members who are a part of already banned communities or who are a part of other hateful, but not yet banned communities.

    So... people who post hateful shit in one place will also post hateful shit in other places. Nice to know.

    the researchers’ second conclusion is that the company often ignores such signals, and instead only haphazardly enforces its platform rules, often in the wake of media attention on certain, particularly problematic subreddits.

    the team’s research suggesting that Reddit, rather than impose clear and consistent moderation standards, usually only takes action after media stories highlight a particularly egregious subreddit.

    Well, duh. Anyone who has watched Reddit in action over the past few years already knows this is how Reddit's admins operate. Until a subreddit gets reported in a media article, it can get away with just about anything. As soon as Reddit gets bad press, the banhammer falls.

    I have to say, the results of this study definitely fall into the "water is wet" category of scientific endeavours. It's nice to have the evidence, but everyone already knows this.

    6 votes
    1. [2]
      gergir
      Link Parent

      So... people who post hateful shit in one place will also post hateful shit in other places. Nice to know.

      Right!! And banning a whole community isn't fair. Isn't it usually just a few loudmouths who spoil things for everyone else? Imagine they did that in high-crime neighbourhoods: "hello, please move; we're demolishing your building, because Mrs Hate at #13 is throwing stuff at pedestrians"

      1 vote
      1. Algernon_Asimov
        Link Parent

        Isn't it usually just a few loudmouths who spoil things for everyone else?

        Not in some communities. /r/FatPeopleHate and /r/CoonTown were among the first high-profile subreddits that got themselves banned, and they weren't banned just because of a few loudmouths. The whole purpose of those subreddits was to hate people - fat people and black people, respectively. The majority of posts & comments by the majority of people in those communities were rude and hateful.

        When the Reddit admins ban a subreddit, it usually means that whole community is toxic - not just a few loudmouths.

        9 votes