21 votes

What Tildes statistics would you be interested in seeing?

It has been a while since I last posted some statistics about Tildes. This time, however, I'd like to ask you Tilderidoos what you would like to see.

Do note that I can only get statistics based on what's publicly visible on the site, so things like site traffic or server performance are out of my reach. Deimos would have to provide those if anyone is interested in that. ;) I will reply to each suggestion to say whether I can and will do it, and edit in the ones I will do.

As an apéritif (and to get my scraper/chart-maker working again) I have started with 2 charts for you (the blue annotations are invite rounds):

Planned statistics:
  • Activity (posts over time).
  • Posts deleted by author vs. removed by site admin.
  • Posts per group.
  • Most frequent tags.
  • Unique topic/comment authors per week.
  • Posts over time for each group.
  • Average time for a comment to receive a reply.
  • Given a top-level comment with x votes, how many replies does its thread have?
  • Histogram of time between subsequent comments from a user.
  • Activity per hour (per day).

31 comments

  1. [2]
    Soptik
    Link

    Number of users posting at least one comment/post in a timeframe (say a week)?
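One possible way to compute this, as a minimal Python sketch. It assumes the scraped posts are available as (timestamp, username) pairs; that shape, the function name and the sample data are assumptions for illustration, not something stated in the thread.

```python
from collections import defaultdict
from datetime import datetime, timezone


def unique_authors_per_week(posts):
    """Map (ISO year, ISO week) to the number of distinct authors in that week."""
    authors = defaultdict(set)
    for timestamp, username in posts:
        iso = timestamp.isocalendar()
        authors[(iso[0], iso[1])].add(username)
    return {week: len(names) for week, names in sorted(authors.items())}


# Hypothetical sample data: topics and comments mixed together.
posts = [
    (datetime(2020, 3, 2, 9, 0, tzinfo=timezone.utc), "alice"),
    (datetime(2020, 3, 3, 10, 0, tzinfo=timezone.utc), "alice"),
    (datetime(2020, 3, 4, 11, 0, tzinfo=timezone.utc), "bob"),
    (datetime(2020, 3, 10, 12, 0, tzinfo=timezone.utc), "alice"),
]
weekly = unique_authors_per_week(posts)
```

Using a set per week means a user who posts many times in the same week is still counted once.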

    8 votes
    1. Bauke
      Link Parent

      Unique topic/comment authors per week, can do!

      5 votes
  2. [4]
    deing
    Link

    Average time between comments and a reply to them being written and thread size vs top comment vote count would be nice.

    7 votes
    1. [3]
      Bauke
      Link Parent

      Just so I’m getting the ands right in your comment:

      • Average time for a comment to receive a reply.
      • Amount of comments vs. highest voted comment? This one is a little unclear.
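For the first item, a rough Python sketch. It reads "time to receive a reply" as the delay until a comment's first direct reply and skips comments that never got one; that reading, the nested-dict shape with "posted" and "replies" keys, and the function names are all assumptions.

```python
from datetime import datetime, timedelta


def reply_delays(comment):
    """Yield the delay until the first direct reply, for every comment that has one."""
    if comment["replies"]:
        first_reply = min(reply["posted"] for reply in comment["replies"])
        yield first_reply - comment["posted"]
    for reply in comment["replies"]:
        yield from reply_delays(reply)


def average_reply_delay(top_level_comments):
    """Average the delays over a whole thread; None if nothing was replied to."""
    delays = [d for c in top_level_comments for d in reply_delays(c)]
    if not delays:
        return None
    return sum(delays, timedelta()) / len(delays)


# Hypothetical thread: one top-level comment with two replies, one of them nested.
thread = [
    {"posted": datetime(2020, 3, 1, 12, 0), "replies": [
        {"posted": datetime(2020, 3, 1, 12, 20), "replies": [
            {"posted": datetime(2020, 3, 1, 13, 0), "replies": []},
        ]},
        {"posted": datetime(2020, 3, 1, 12, 50), "replies": []},
    ]},
]
average = average_reply_delay(thread)
```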
      4 votes
      1. [2]
        deing
        Link Parent

        You got the ands right :p
        The idea for #2 is: Given a top level comment with x votes, how many replies does its thread have?
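A small Python sketch of that idea. It assumes each comment is a dict with "votes" and "replies" keys (a hypothetical shape) and counts replies at any depth under the top-level comment.

```python
def count_replies(comment):
    """Count every reply nested under a comment, at any depth."""
    return sum(1 + count_replies(reply) for reply in comment["replies"])


def votes_vs_thread_size(top_level_comments):
    """Return one (votes, reply count) pair per top-level comment."""
    return [(c["votes"], count_replies(c)) for c in top_level_comments]


# Hypothetical sample: two top-level comments, the first with a two-deep thread.
thread = [
    {"votes": 7, "replies": [
        {"votes": 4, "replies": [
            {"votes": 3, "replies": []},
        ]},
    ]},
    {"votes": 5, "replies": []},
]
pairs = votes_vs_thread_size(thread)
```

The resulting pairs could be fed straight into a scatter plot with votes on one axis and thread size on the other.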

        3 votes
        1. Bauke
          Link Parent

          Aha, I see. Added them!

          1 vote
  3. cfabbro
    Link

    Awesome, can't wait to see all that you come up with @Bauke! And speaking of stats, some more internally tracked ones would be cool to see too. *wink* *wink* *nudge* *nudge* @Deimos. ;)

    5 votes
  4. [5]
    hungariantoast
    (edited)
    Link

    The number of topics/comments per day or week for each group.

    Ahem

    ;)

    4 votes
    1. [4]
      Bauke
      Link Parent

      So the activity charts I have now but for each group individually? That's easy to do. :P

      3 votes
      1. [3]
        hungariantoast
        Link Parent

        Two more I thought of:

        • The users with the most topic edits? (Excluding ~test.)
        • How soon, on average, does each user edit topics after the topics are posted? (Some users edit topics as they are posted; other users edit old topics to make tags more uniform.)

        I know this data is only temporary, since topic logs expire, but I would be interested in seeing a snapshot of the data.

        Also, will you be re-doing the graphs and info from the last time you posted statistics? I would be quite interested in seeing how the users/groups/votes info has changed (especially since I have been way more active in posting topics since then).

        3 votes
        1. [2]
          Bauke
          Link Parent

          Can you clarify what you mean by topic edits, since it's a little ambiguous? If you mean the "(edited <timestamp>)" for edits to a topic text by the author, I can do that. But if you mean tag edits and stuff, I don't really wanna even bother with trying to parse topic logs because they are very unstructured. deing has some experience with that and it's not pretty.

          I might, I haven't really thought about it yet. I'll have to go through the planned statistics here and check whether they overlap with the previous ones before I decide, I think.

          2 votes
          1. hungariantoast
            Link Parent

            Oh yeah, sorry, I meant moderation/curation edits like tags and titles. If that stuff is annoying to go through though then don't worry about it. It sounds like you're already planning on providing more than enough data to sate my appetite!

            2 votes
  5. [2]
    pseudolobster
    Link

    How about subscribers to ~tildes.official over time? Seems like that'd be a fairly good indicator of total userbase.

    3 votes
    1. Bauke
      Link Parent

      I don't think I can do that unless I go looking for data outside of Tildes, like archive.org or something. And even then it probably won't be particularly accurate if there aren't enough data points.

      3 votes
  6. [2]
    asoftbird
    Link

    Comments posted daily, maybe weighted based on if it's one user making 30 comments or 30 individual users.

    Also, amount of comments/posts per subgroup? The most frequent tags?

    3 votes
    1. Bauke
      Link Parent

      The posts per group and most frequent tags are definitely possible; the comments posted weighted by users I'm not so sure about. Would that be a histogram of sorts where it displays something like “90% of comments are written by the same 5 people”?
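A figure like that could be computed roughly as follows. This is only a sketch: it assumes a flat list with one author name per comment, and the function name and sample data are hypothetical.

```python
from collections import Counter


def top_author_share(authors, n):
    """Fraction of all comments written by the n most active authors."""
    counts = Counter(authors)
    if not counts:
        return 0.0
    top = sum(count for _, count in counts.most_common(n))
    return top / sum(counts.values())


# Hypothetical sample: 10 comments written by 3 people.
authors = ["a"] * 6 + ["b"] * 3 + ["c"]
share = top_author_share(authors, 2)
```

Running it for several values of n gives a cumulative curve, which may be easier to read than a single percentage.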

      1 vote
  7. [6]
    vektor
    Link

    Histogram of time between subsequent comments from a user. That is: from a user's profile, how far apart in time are the comments?

    Might have to do some cleaning up of the plot to focus in on the interesting tail end of that distribution.

    3 votes
    1. [5]
      Bauke
      Link Parent

      I would love to try this but I'm not sure if I'm capable of doing it. If you have any more details on how it could be done that'd be great. Don't tell anyone but I actually don't know much about statistics and stuff. :P

      2 votes
      1. [4]
        deing
        Link Parent

        A histogram's a plot that just shows how often a single value (often binned into groups like 0..10, 10..50, 50..200 etc) appears in a data set. So for a single user you'd have a bar chart, with bars that are "amount of comments posted X time after the one before it", with Xs of for example 10m, 1h, 3h, 6h, 1d, 1w, …
        Getting the intervals should be fairly easy using the user comment pages.
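The binning step might look like this in Python. The bin edges are the example values from the comment above; the labels, the extra ">1w" overflow bin and the function name are assumptions for illustration.

```python
from bisect import bisect_left
from collections import Counter
from datetime import timedelta

# Upper edges of the bins, using the example values 10m, 1h, 3h, 6h, 1d, 1w.
EDGES = [
    ("10m", timedelta(minutes=10)),
    ("1h", timedelta(hours=1)),
    ("3h", timedelta(hours=3)),
    ("6h", timedelta(hours=6)),
    ("1d", timedelta(days=1)),
    ("1w", timedelta(weeks=1)),
]


def bin_intervals(deltas):
    """Count how many intervals fall into each bin; the last bin catches the rest."""
    bounds = [edge for _, edge in EDGES]
    labels = ["<=" + name for name, _ in EDGES] + [">1w"]
    counts = Counter(labels[bisect_left(bounds, delta)] for delta in deltas)
    return [(label, counts[label]) for label in labels]


# Hypothetical intervals between one user's consecutive comments.
deltas = [
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(minutes=45),
    timedelta(days=2),
    timedelta(days=14),
]
histogram = bin_intervals(deltas)
```

Each (label, count) pair corresponds to one bar of the bar chart.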

        4 votes
        1. [3]
          Bauke
          Link Parent

          Thanks for the response but I should have clarified, I was more wondering how I'd go about doing this for multiple people (ie Tildes as a whole) rather than individuals. I don't think it would be a useful statistic to include if it were just a handful of people.

          3 votes
          1. [2]
            vektor
            Link Parent

            Well, assuming your data looks something like [(timestamp, username, otherstuff)] for all the comments on Tildes (or all the comments made during a specific timespan), just group by user, so that it looks like [(username, [timestamp])].
            Now for each user:
            Sort the timestamps chronologically and then compute the delta between adjacent ones. Collect all the deltas from all the users and chuck them into your favorite histogram tool.

            I can whip up a python script if you like.
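A minimal sketch of those steps in Python, assuming the data really is a list of (timestamp, username, otherstuff) tuples as described above; the function name and sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def comment_deltas(comments):
    """Collect the time gaps between each user's consecutive comments."""
    by_user = defaultdict(list)
    for timestamp, username, *_ in comments:
        by_user[username].append(timestamp)

    deltas = []
    for timestamps in by_user.values():
        timestamps.sort()
        # Pair each timestamp with the next one and keep the difference.
        deltas.extend(
            later - earlier
            for earlier, later in zip(timestamps, timestamps[1:])
        )
    return deltas


# Hypothetical sample rows, deliberately out of order.
comments = [
    (datetime(2020, 3, 1, 10, 30), "alice", "..."),
    (datetime(2020, 3, 1, 9, 0), "bob", "..."),
    (datetime(2020, 3, 1, 10, 0), "alice", "..."),
    (datetime(2020, 3, 1, 12, 0), "alice", "..."),
    (datetime(2020, 3, 1, 9, 5), "bob", "..."),
]
deltas = comment_deltas(comments)
```

Users with a single comment contribute no deltas, so they drop out of the histogram automatically.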

            4 votes
            1. Bauke
              Link Parent

              Thanks! I think you explained it in sufficient detail that I can figure it out from here. I've added it to the planned statistics. :)

              2 votes
  8. [7]
    MetArtScroll
    Link

    Activity per hour (i.e., the number of topics or comments submitted between 00:00 UTC and 01:00 UTC across days, then between 01:00 and 02:00 and so on), with further filtering by group.
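The hourly bucketing could be sketched like this in Python. It assumes timezone-aware timestamps for each topic or comment; the function name and sample data are hypothetical, and filtering by group would just mean passing in one group's timestamps.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def activity_per_hour(timestamps):
    """Return 24 counts: posts made in each UTC hour, summed across all days."""
    counts = Counter(ts.astimezone(timezone.utc).hour for ts in timestamps)
    return [counts[hour] for hour in range(24)]


# Hypothetical sample; the last timestamp is 01:30 at UTC+2, i.e. 23:30 UTC.
timestamps = [
    datetime(2020, 3, 1, 0, 15, tzinfo=timezone.utc),
    datetime(2020, 3, 2, 0, 45, tzinfo=timezone.utc),
    datetime(2020, 3, 2, 13, 5, tzinfo=timezone.utc),
    datetime(2020, 3, 3, 1, 30, tzinfo=timezone(timedelta(hours=2))),
]
hourly = activity_per_hour(timestamps)
```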

    Activity per time per user. As for per-user statistics, who is going to be able to see it (the user in question, users with elevated privileges, other users, non-logged-in lurkers)? As the access to user pages is limited for those not logged in, I would suggest that per-user statistics should not be available in that case.

    P.S. Thanks a lot, though the graphs are a bit depressing…

    3 votes
    1. [2]
      Bauke
      Link Parent

      > Activity per hour (i.e., the number of topics or comments submitted between 00:00 UTC and 01:00 UTC across days, then between 01:00 and 02:00 and so on), with further filtering by group.

      Would this be something like this chart? That's an old chart taken from my first statistics topic.

      > Activity per time per user. As for per-user statistics, [...]

      I won't be doing per-user statistics, I'm more interested in Tildes as a whole this time around. :) Saves me from dealing with all the stuff you mentioned.

      > P.S. Thanks a lot, though the graphs are a bit depressing…

      I don't think so, I think they display a very good thing that Tildes has going right now:

      Look at the annotation for where Tildes got 10000 registered users, that was around April 2019. Since then (which is almost a year), we've only gained about 1500 more users, yet the activity has remained roughly the same. This shows that even with the invite code barrier to entry, the quality-focused goals and heavily moderated discourse (in comparison to the rest of the internet), Tildes has enough people willing and dedicated to being here, despite the site being just under 2 years old. And not just active as in a couple topics and comments here and there, no... Several hundreds of topics and around double or triple that in comments. I would say that's insanely good for such a young community. That's something to be incredibly proud of.

      Things can always be better, but I don't think the charts are depressing at all.

      6 votes
      1. MetArtScroll
        Link Parent

        > Would this be something like this chart? That's an old chart taken from my first statistics topic.

        Yes, this plus the same thing for the hours alone (regardless of the day of week).

        2 votes
    2. [4]
      hungariantoast
      Link Parent

      > the graphs are a bit depressing

      Why?

      2 votes
      1. [3]
        MetArtScroll
        Link Parent

        Little activity with no upward trend. However, a lot of activity in 2018 took place in ~tildes and ~tildes.official, so maybe the same graphs for the site excluding the meta groups would look more optimistic.

        3 votes
        1. [2]
          hungariantoast
          Link Parent

          Yeah, I wouldn't be surprised if the majority of the first month-and-a-half of activity took place in ~tildes. There was quite a bit of arguing and controversy early in the site's history (most of which was totally blown out of proportion, in my opinion).

          @Bauke, if you don't mind, do you have a way to confirm that a majority of the site's early activity took place in ~tildes and its subgroup?

          5 votes
          1. Bauke
            (edited)
            Link Parent

            I had a quick look at some older data I have in a spreadsheet from a while back and from what I can tell:

            • ~tildes had the most topics during that time, but not by much. ~news, ~talk, ~tech, ~comp and ~games also contributed a lot.
            • ~talk had the most comments during that time, almost double the comments posted in ~tildes. The same groups as before also contributed.

            We'll get a definitive answer to this when I get around to charting posts per group over time though, as planned. Trying to remember all the numbers isn't all that easy so my estimate above is not all too accurate.

            5 votes
  9. [2]
    ThatFanficGuy
    Link

    Amount of comments deleted by authors vs. moderated.

    2 votes
    1. Bauke
      Link Parent

      Can do, added to the OP!

      I'll also include topics, since that's possible as well.

      1 vote