16 votes

Tildes Issue Log - January 2019

6 comments

  1. Deimos

    Thanks Bauke. Unfortunately things are still pretty slow for various reasons, but we'll get through to more interesting times again eventually. I see that you're done scraping for now too. :b

    Maybe we can look at adding an official stats page of some sort so that the scraping isn't necessary. I probably wouldn't want to have things like user-based "leaderboards", but a decent amount of the info could probably be on the site itself.

    11 votes
    1. Bauke

      I'm sure the pace will pick back up soon! And yeah, haha, I accidentally overwrote the comments file the other day and couldn't get it back, so I had to do it all over again. :P This time I've made backups though! I'm also going to adjust the scraper so it doesn't go over every topic again, only the past 30/31 days (I guess 28 in Feb's case), so it should take far fewer requests. Luckily, using period=30d makes that very easy.
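
      A minimal sketch of that request in Python, assuming the requests library; only the order, period, and per_page parameters come from this thread, and the function name is illustrative:

          import requests

          # Fetch one listing page covering only the recent period.
          # Parameter names come from the thread; the rest is a guess.
          def fetch_listing(period="30d", per_page=100):
              params = {"order": "new", "period": period, "per_page": per_page}
              response = requests.get("https://tildes.net", params=params)
              response.raise_for_status()
              return response.text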

      And that would be nice to have: status reports or something similar that get generated and published automatically each month. I made a spreadsheet earlier to test Google Sheets' graphs, and something like that in one document or on a single page would look nice and informative, I think.

      6 votes
  2. Bauke

    The first of the year! Including some changes to the index of the site and more GitLab statistics!

    I'm also planning to add the Tildes statistics from scraping every month, more info on that is in the post itself! (reddit post)

    6 votes
    1. bun

      Hi Bauke.

      First of all, thanks for doing this each month. It's always nice to have data.

      Will your scraper be able to correct for users with deleted topics/comments since the last time you ran it? I know several users have deleted parts of their content as Tildes slowly opened up to the public. I suspect, for instance, that the list of users with the most topics may have changed since then.

      6 votes
      1. Bauke (edited)

        Thank you. :) And no, it can't. The way it works right now is that it will scrape everything that it can see, so the procedure is:

        • Go to https://tildes.net?order=new&period=all&per_page=100
        • Scrape all the available data from topics in the listing (comments scraping is done later)
        • If the "Next Page" button exists, click that and continue to the next page (until there's no longer a next page)
        • Write the topics data to file.
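
        In Python, that loop might look something like this; the CSS selector, the "Next" link text, and the output file name are guesses, not anything confirmed in this thread:

            import json

            import requests
            from bs4 import BeautifulSoup

            def scrape_all_topics(out_file="topics.json"):
                # Start at the full listing: newest first, 100 topics per page.
                url = "https://tildes.net?order=new&period=all&per_page=100"
                topics = []
                while url:
                    soup = BeautifulSoup(requests.get(url).text, "html.parser")
                    # The selector is a guess at the topic markup, not verified.
                    for article in soup.select("article.topic"):
                        topics.append({
                            "id": article.get("id"),
                            "title": article.get_text(" ", strip=True),
                        })
                    # Follow the "Next Page" link until there isn't one; the
                    # link text and an absolute href are both assumptions.
                    next_link = soup.find("a", string="Next")
                    url = next_link["href"] if next_link else None
                with open(out_file, "w") as f:
                    json.dump(topics, f)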

        After that, I have another script that uses the topics file, goes through each topic, and gets all the comment data from it; this takes the longest (roughly 8 hours). Because of the rate limit I set (2.5s per topic), the data is slightly inaccurate: a topic I scraped 3 hours ago could suddenly have way more comments now (or way fewer, if people deleted some). But I don't think that really matters, as it's probably only very slightly off.
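
        A sketch of that second pass, with the 2.5-second per-topic rate limit from the comment built in; the file names, the comment selector, and the assumption that each stored topic carries its own URL are all mine:

            import json
            import time

            import requests
            from bs4 import BeautifulSoup

            def scrape_comments(topics_file="topics.json", out_file="comments.json"):
                with open(topics_file) as f:
                    topics = json.load(f)
                comments = {}
                for topic in topics:
                    # Assumes each stored topic has a "url" field; the comment
                    # selector is likewise a guess at the markup.
                    page = requests.get(topic["url"]).text
                    soup = BeautifulSoup(page, "html.parser")
                    comments[topic["id"]] = [
                        c.get_text(" ", strip=True)
                        for c in soup.select("article.comment")
                    ]
                    time.sleep(2.5)  # the per-topic rate limit mentioned above
                with open(out_file, "w") as f:
                    json.dump(comments, f)

        For scale: at 2.5s per topic, the quoted ~8 hours works out to roughly 11,500 topics, so the deliberate delay accounts for most of the runtime.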

        So essentially it takes a snapshot of everything available at the time of scraping; if someone deletes a topic or comment after the scraper has already gone through it, it won't know (unless I run it again).

        Edit: now that I think of it, I guess I could see which comments/topics were deleted in a specific timeframe if I scraped that timeframe multiple times and compared the results, but that's data that would be easier to get by official means.
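
        That comparison would just be a set difference over two snapshots; a sketch, assuming each snapshot is a list of dicts with an "id" field as in the sketches above:

            # IDs present in an earlier snapshot but missing from a later
            # one were deleted (or removed) somewhere in between.
            def find_deleted(older, newer):
                older_ids = {item["id"] for item in older}
                newer_ids = {item["id"] for item in newer}
                return older_ids - newer_ids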

        4 votes
        1. bun

          Makes sense. I think the amount of removed content could also be an interesting metric.

          Thanks for taking the time to answer.

          4 votes