15 votes

App/browser extension idea if it doesn't already exist: likely bot database

I just finished reading the "I hate the new internet" post, in which the OP stated:

Every social medium is just bots. The front page of Reddit is easily 35% easily detectable bots at least and who knows what the rest is comprised of.

Why couldn't we create a bot database, which I imagine would work similarly to uBlock for ads? There would be a number of signals to attempt to classify users of social media sites (likely human, likely bot, etc.) in addition to user-provided feedback ("I think this person is a bot" or "this account is me -- definitely not a bot").

An extension could then be attached to the database to provide visual changes to social media platforms ("WARNING! LIKELY BOT!") or simply hide bot posts/comments.

Off the top of my head, some bot signals:

  • Posting known duplicate posts with political motivation (e.g., on Reddit you see the exact same post about how the tariffs will create a stronger America, submitted by different accounts) [strong indicator]
  • Usernames that follow the lazy bot format, e.g., Pretentious_Rabbit_2355 [weak indicator]
  • Usage of AI-generated or ripped off profile pictures, post images, etc. [strong indicator]
  • etc.
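
To make that concrete, here is a rough TypeScript sketch of how an extension might combine signals like these into a score. Everything below is illustrative: the signal names, weights, and the helper mentioned in the last comment are invented, not an existing API.

  // Hypothetical signal scorer; weights and field names are made up for illustration.
  interface AccountSignals {
    username: string;
    postedKnownDuplicate: boolean; // matched against a shared duplicate-post list
    aiGeneratedAvatar: boolean;    // flagged by some reverse-image/AI-detection check
    humanReports: number;          // crowdsourced "I think this is a bot" reports
    notBotClaims: number;          // crowdsourced "this account is me" claims
  }

  const LAZY_USERNAME = /^[A-Z][a-z]+_[A-Z][a-z]+_\d{2,5}$/; // e.g. Pretentious_Rabbit_2355

  function botScore(s: AccountSignals): number {
    let score = 0;
    if (s.postedKnownDuplicate) score += 0.5;         // strong indicator
    if (s.aiGeneratedAvatar) score += 0.4;            // strong indicator
    if (LAZY_USERNAME.test(s.username)) score += 0.1; // weak indicator
    // Crowd feedback nudges the score either way, capped so it can't dominate alone.
    score += Math.min(0.3, s.humanReports * 0.05);
    score -= Math.min(0.3, s.notBotClaims * 0.05);
    return Math.max(0, Math.min(1, score));
  }

  // The extension could then decorate or hide accounts above some threshold, e.g.
  // if (botScore(signals) > 0.7) markAsLikelyBot(accountElement); // markAsLikelyBot is hypothetical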

On the crowdsourced side, there would have to be some rules in place to prevent profile bombing, etc.

All in all, I could see something like this adding a bit of human value back to the various social media platforms, AND I would think it would lead to higher advertisement click rates (bots become less valuable over time on a given platform, so their operators invest their resources elsewhere, while "human" user engagement increases at the same time).

If this concept already exists, I apologize. I only did a very quick Google search.

13 comments

  1. [7]
    DistractionRectangle
    (edited )
    Link

    Trust is difficult. Once this reaches a certain scale, you'll have bots trying to game the system and report other bots as human in order to evade the filter. So there's the meta-problem of sorting your own user base into bots/bad actors versus real users. How do you do that without invading the privacy of your users?

    Edit: There are also other meta-problems, like identifying AI/ripped-off imagery, and doing a viability survey to understand the real correlation between AI profile pictures and bot accounts. That'll probably work for Facebook, where the profile picture is strongly tied to identity, but probably won't on forums/platforms like Discord.

    I very much like this idea. I would suggest you start with a single target platform (like bot accounts on Reddit), then branch out to other platforms once the solution has matured/been proven.

    19 votes
    1. [4]
      vord
      Link Parent

      We're back to the good old days: Web of Trust.

      You have your 'inner circle': the people you know IRL. You put them at the highest trust level.

      Work your way out to 'untrusted' through multiple levels and such.

      Ideally there is a localized search engine for each person which can index their authenticated and tightly-trusted systems. De-commodify search indexers by having each person be their own search engine which can share public results with trusted parties.
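
      As a rough TypeScript sketch of that idea (the graph, names, and level scale are all invented), trust could simply decay by one level with each hop away from yourself:

        // Hypothetical web-of-trust sketch: trust drops one level per hop from "me".
        type Person = string;

        const knows = new Map<Person, Person[]>(); // who vouches for whom (IRL/invite links)
        knows.set("me", ["alice", "bob"]);
        knows.set("alice", ["carol"]);
        knows.set("bob", ["dave"]);

        // Breadth-first walk outward from the root; unreached people stay untrusted.
        function trustLevels(root: Person, maxLevel = 4): Map<Person, number> {
          const levels = new Map<Person, number>([[root, maxLevel]]);
          let frontier = [root];
          for (let level = maxLevel - 1; level >= 0 && frontier.length > 0; level--) {
            const next: Person[] = [];
            for (const person of frontier) {
              for (const contact of knows.get(person) ?? []) {
                if (!levels.has(contact)) {
                  levels.set(contact, level); // 4 = inner circle ... 0 = barely trusted
                  next.push(contact);
                }
              }
            }
            frontier = next;
          }
          return levels;
        }

        console.log(trustLevels("me")); // me: 4, alice: 3, bob: 3, carol: 2, dave: 2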

      11 votes
      1. DistractionRectangle
        Link Parent

        I actually quite like this idea. It could play quite nicely into weighting user feedback to flag social media accounts.

        Like you could have an algorithm that weights user feedback by:

        • how far they are removed from the root node (you; the threshold for reports needed to flag a social media account increases as you get further from the root)

        • how far they are from each other (how spread out they are in the invitee tree; less weight is given to clusters, more weight is given to reports from users far from each other)

        • historical accuracy (how often their reports aligned with the consensus).

        Combine this with rate limiting on invites, a heuristic to remove users that misreport too often, etc., and you have a solid way to bootstrap trust and self-regulate.
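
        A very rough TypeScript sketch of that weighting (the field names and numbers are invented, purely to show the shape):

          // Hypothetical weighting of bot reports from the invite tree described above.
          interface Report {
            depth: number;              // hops from the root node (you)
            branchId: string;           // which branch of the invitee tree the reporter sits in
            historicalAccuracy: number; // 0..1, how often past reports matched consensus
          }

          function weightedReportScore(reports: Report[]): number {
            // Reports spread across many branches count more than a single cluster piling on.
            const branches = new Set(reports.map(r => r.branchId)).size;
            const spread = branches / Math.max(1, reports.length);
            let total = 0;
            for (const r of reports) {
              const closeness = 1 / (1 + r.depth); // closer to the root counts more
              total += closeness * r.historicalAccuracy * spread;
            }
            return total; // compare against a threshold that grows with average depth
          }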

        4 votes
      2. [2]
        deathinactthree
        Link Parent

        This is theoretically quite possible, by potentially combining a number of tools/techniques that already exist, such as:

        • Google Programmable Search Engine: Doesn't incorporate a trust network, but you can create a "localized" (limited to trusted sources) search engine. It's still using Google though.
        • NewsGuard: Browser extension that provides crowd-sourced trust ratings for online news sources.
        • FakerFact: Browser extension that's not crowdsourced; it uses LLMs to categorize articles as satirical, journalistic, sensational, or agenda-driven. Depends on your comfort level with using "AI", but it could act as a backstop to the wisdom of crowds.

        Note that I'm not at all endorsing any of the above; I don't use them myself. I'm just saying that there are enough pieces out there to imply that the full solution could exist.

        1 vote
        1. vord
          Link Parent

          Apache Solr is a self-hostable search engine framework, so that could be a Google alternative.
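
          For what it's worth, querying a self-hosted Solr core is just an HTTP call. A TypeScript sketch, where the host and core name are placeholders:

            // Hypothetical query against a self-hosted Solr core named "trusted_pages".
            async function searchTrusted(query: string) {
              const params = new URLSearchParams({ q: query, wt: "json" });
              const res = await fetch(`http://localhost:8983/solr/trusted_pages/select?${params}`);
              return (await res.json()).response.docs; // Solr's standard JSON response shape
            }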

          1 vote
    2. [2]
      creesch
      Link Parent

      Once this reaches a certain scale, you'll have bots trying to game the system and report other bots as human in order to evade the filter.

      Also, the other way around: people reporting other people they have beef with as bots. If anything, moderating big communities has taught me how creative people can get in riling up a mob and getting that mob to do stupid stuff like reporting accounts as bots.

      There is also a whole gray area where it simply isn't clear if someone is just not fluent in English, just a bit odd or truly a bot. It's difficult to deal with that as well.

      7 votes
      1. LewsTherinTelescope
        Link Parent

        That last part also makes other indicators that often get suggested, like something "feeling AI", risky, because (on top of the risk of false positives) some people who aren't very fluent do use LLMs to fix up their grammar and such, so they'd be accurately flagged even though they're legitimately real people.

        2 votes
  2. post_below
    Link

    One thing to keep in mind... what people call 'bots' are more often actually humans in poorer countries. 'Bot' has just become a catch-all term for low-effort posts and 'fake' accounts used for karma farming, political manipulation, and so on.

    From a technical perspective I'm not sure there's a good way to reliably identify these kinds of users from a browser extension. You'd need access to data only the platform has or, at the very least, full post history on every account. Even then you'd have to make hard decisions about false positives versus false negatives.

    Or alternatively you could crowdsource it. The challenge there is reaching the critical mass of users necessary to make it work while being mostly useless until you get there.

    In either case the heavy lifting would need to be done by a service you built and hosted rather than in browser.
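
    To illustrate that split, the in-browser part could stay as thin as a lookup against whatever service hosts the database. A TypeScript sketch; the endpoint, response shape, and threshold are all invented:

      // Hypothetical content-script side: classification happens on the hosted service.
      interface Verdict {
        account: string;
        label: "likely_bot" | "likely_human" | "unknown";
        confidence: number; // 0..1
      }

      async function lookupAccount(account: string): Promise<Verdict> {
        // botdb.example stands in for whatever service actually hosts the database.
        const res = await fetch(`https://botdb.example/api/v1/verdict?account=${encodeURIComponent(account)}`);
        if (!res.ok) return { account, label: "unknown", confidence: 0 };
        return res.json() as Promise<Verdict>;
      }

      // In the page: annotate usernames the service considers likely bots.
      async function annotate(el: HTMLElement): Promise<void> {
        const verdict = await lookupAccount(el.textContent?.trim() ?? "");
        if (verdict.label === "likely_bot" && verdict.confidence > 0.8) {
          el.title = "WARNING! LIKELY BOT!";
          el.style.opacity = "0.5";
        }
      }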

    One other option, ironically, would be using an LLM bot to evaluate the text. A lot of the posts are clearly made by people who can barely write or understand English, and an LLM would likely do a good job of identifying them. I've been wondering why people, and Redditors in particular, continue to upvote those posts.

    5 votes
  3. vord
    Link

    I like this idea. Some things that would need to happen:

    There would need to be a trust system. Essentially a keyserver with a trust level tied to how closely a given key is tied to a 'verified real person' identity. And a way to revoke and raise/lower trust levels accordingly.

    There would need to be an eventually-consistent database that can work with fluctuating network conditions globally. Something like CouchDB.
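
    As one possible shape for that (the fields are invented, though CouchDB itself really does store and replicate plain JSON documents over HTTP), the keyserver records could live in that same database. A TypeScript sketch:

      // Hypothetical trust record stored in an eventually-consistent DB such as CouchDB.
      interface TrustRecord {
        _id: string;         // e.g. the key fingerprint
        publicKey: string;
        trustLevel: number;  // 0 = untrusted .. 4 = verified real person
        vouchedBy: string[]; // fingerprints of keys vouching for this one
        revoked: boolean;
        updatedAt: string;
      }

      // CouchDB speaks plain HTTP/JSON, so writing a new record is a PUT to the document URL.
      async function saveRecord(couchUrl: string, record: TrustRecord): Promise<void> {
        await fetch(`${couchUrl}/trust/${encodeURIComponent(record._id)}`, {
          method: "PUT",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify(record),
        });
      }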

    4 votes
  4. [2]
    sunset
    Link

    It's a terrible idea. I've been called a bot on Reddit a number of times, even though I never post low-effort content; I just happen to sometimes openly disagree with the circlejerk. The internet already has a problem with echo chambers, and tools like the one you propose will just quadruple the effect by strongly disincentivizing (and straight-up hiding) disagreement. Every Trumpist who dares post in r/politics will be marked as a "bot".

    Long term, any automated checking will become useless. Completely copy-pasted comments are mostly a thing of the past; nowadays bots are using LLMs more and more. All efforts to detect whether text was generated by AI are riddled with false positives and false negatives. There are already tools in academia that try to detect ChatGPT output, and they perform laughably badly. And that's with today's AI; I can only imagine how hard it will be to distinguish once the models improve and become even better.

    2 votes
    1. [2]
      Comment deleted by author
      Link Parent
      1. sunset
        Link Parent

        If the text generated by AI looks exactly like something a regular person would write, then there is no way to distinguish it from a person, since it looks the same. It doesn't matter whether you use AI to detect it, or other tools, or even crowdsource the check to actual people - if it looks the same, it looks the same.

  5. [2]
    HeroesJourneyMadness
    Link

    I would love to throw in on this. Not that I have a lot of experience in this area - I worked through exactly one “make an extension” tutorial a decade ago… but I’d happily wrench on this with someone who knows what they’re doing.

    Love this idea. I’ve had variations on this for years but your implementation sounds so reasonable. Is Greasemonkey still a thing? It could probably start as just a script.
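
    Greasemonkey (and successors like Tampermonkey/Violentmonkey) is indeed still around, and a first cut could be exactly that. A bare-bones sketch, written here as TypeScript (a real userscript would be the plain-JS equivalent); the selector and username pattern are guesses, not tested:

      // ==UserScript==
      // @name   likely-bot-highlighter (sketch)
      // @match  https://old.reddit.com/*
      // @grant  none
      // ==/UserScript==

      // Purely illustrative: flag usernames that match the "lazy bot" naming pattern.
      const LAZY_USERNAME = /^[A-Z][a-z]+_[A-Z][a-z]+_\d{2,5}$/;

      document.querySelectorAll<HTMLAnchorElement>("a.author").forEach(link => {
        const name = link.textContent ?? "";
        if (LAZY_USERNAME.test(name)) {
          link.textContent = name + " [possible bot?]";
          link.style.backgroundColor = "#ffe08a";
        }
      });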

    1 vote
    1. TheFireTheft
      Link Parent

      I'm all for quick and easy starting points. Unfortunately, I have even less experience with extension dev (zero, to be exact).

      EDIT: uBlock is open source. There are probably a bunch of concepts that could be used, including the cross-browser extension builds and how they handle user contributions.

      1 vote