
      TLDR: Album of all 3 graphs

      Edit 3: Added some more graphs in my top level comment here.

      Hi all, over the past few days I've slowly been working on a scraper for Tildes, both to get some statistics and as an exercise in learning Puppeteer. I may or may not put the source online later, but it's surprisingly easy to do (so maybe you can learn too, wink wink nudge nudge). I'm also not sure whether to publish the full data file, since I want to respect people's privacy. Let me know what you think.

      On to the specifics of what I'm scraping. Currently it's only the topics in the topic listing, which give me:

      export interface ITopic {
        author: string;   // Author of the topic
        comments: number; // Number of comments the topic has
        date: Date;       // Date the topic was posted
        group: string;    // Group the topic was posted in
        id: string;       // The base 36 ID of the topic
        link: string;     // A link to the topic, primarily for easier Markdown formatting
        tags: string[];   // Tags of the topic, currently not doing anything with them
        title: string;    // Title of the topic, also not using this for anything
        type: TopicTypes; // The type of topic, i.e. Link or Text
        votes: number;    // Number of votes the topic has
      }
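To make the shape above concrete, here is a minimal sketch of one record and two small helpers. The `TopicTypes` enum, the sample values, the URL, and both helper names are assumptions for illustration; the post only says the type is "Link or Text" and that the link is meant for Markdown formatting.

```typescript
// Assumption: TopicTypes is a two-member enum. The post only says "Link or Text".
enum TopicTypes {
  Link = "Link",
  Text = "Text",
}

interface ITopic {
  author: string;
  comments: number;
  date: Date;
  group: string;
  id: string;
  link: string;
  tags: string[];
  title: string;
  type: TopicTypes;
  votes: number;
}

// Hypothetical record; none of these values come from the scraped data.
const example: ITopic = {
  author: "some_user",
  comments: 12,
  date: new Date("2018-05-01T12:00:00Z"),
  group: "tech",
  id: "a1b",
  link: "https://tildes.net/~tech/a1b",
  tags: ["scraping"],
  title: "An example topic",
  type: TopicTypes.Text,
  votes: 34,
};

// Converts a base 36 ID to a plain integer (hypothetical helper).
function topicIdToNumber(id: string): number {
  return parseInt(id, 36);
}

// Formats a topic as a Markdown link (hypothetical helper).
function toMarkdownLink(topic: ITopic): string {
  return `[${topic.title}](${topic.link})`;
}
```

A numeric ID can be convenient for sorting, though whether it reflects posting order is not something the post states.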
      

      And now that the boring stuff is out of the way, actual data:

      Topics

      Line graph of topics per week and line graph of topics per month

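      | Year | Week | # of Topics |
      |---|---|---|
      | 2018 | 17 | 119 |
      | 2018 | 18 | 143 |
      | 2018 | 19 | 188 |
      | 2018 | 20 | 451 |
      | 2018 | 21 | 440 |
      | 2018 | 22 | 831 |
      | 2018 | 23 | 429 |
      | 2018 | 24 | 403 |
      | 2018 | 25 | 298 |
      | 2018 | 26 | 281 |
      | 2018 | 27 | 241 |
      | 2018 | 28 | 336 |
      | 2018 | 29 | 406 |
      | 2018 | 30 | 418 |
      | 2018 | 31 | 506 |
      | 2018 | 32 | 484 |
      | 2018 | 33 | 479 |
      | 2018 | 34 | 350 |
      | 2018 | 35 | 345 |
      | 2018 | 36 | 345 |
      | 2018 | 37 | 354 |
      | 2018 | 38 | 305 |
      | 2018 | 39 | 306 |
      | 2018 | 40 | 289 |
      | 2018 | 41 | 326 |
      | 2018 | 42 | 286 |
      | 2018 | 43 | 324 |
      | 2018 | 44 | 231 |
      | 2018 | 45 | 207 |
      | 2018 | 46 | 243 |
      | 2018 | 47 | 180 |
      | 2018 | 48 | 173 |
      | 2018 | 49 | 199 |
      | 2018 | 50 | 159 |
      | 2018 | 51 | 150 |
      | 2018 | 52 | 152 |
      | 2019 | 1 | 143 |
      | 2019 | 2 | 106 |

      | Year | Month | # of Topics |
      |---|---|---|
      | 2018 | 4 | 42 |
      | 2018 | 5 | 1306 |
      | 2018 | 6 | 2045 |
      | 2018 | 7 | 1440 |
      | 2018 | 8 | 2016 |
      | 2018 | 9 | 1405 |
      | 2018 | 10 | 1370 |
      | 2018 | 11 | 940 |
      | 2018 | 12 | 728 |
      | 2019 | 1 | 334 |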
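Counts like the monthly ones above can be produced by bucketing topics on their date. This is only a sketch of one way to do it, not the scraper's actual code: the function name, the `"year-month"` key format, the use of UTC, and the sample dates are all assumptions. Weekly counts would work the same way with a week-number function, but the post doesn't say which week-numbering convention was used, so that part is left out.

```typescript
// Only the date matters for this count.
interface DatedTopic {
  date: Date;
}

// Counts topics per "year-month" key, e.g. "2018-5" (hypothetical key format).
// Assumption: dates are bucketed in UTC.
function topicsPerMonth(topics: DatedTopic[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const topic of topics) {
    // getUTCMonth() is zero-based, so add 1 to get the calendar month.
    const key = `${topic.date.getUTCFullYear()}-${topic.date.getUTCMonth() + 1}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Hypothetical sample data, not taken from the scrape.
const sampleDates: DatedTopic[] = [
  { date: new Date("2018-04-26T10:00:00Z") },
  { date: new Date("2018-05-02T08:30:00Z") },
  { date: new Date("2018-05-20T21:15:00Z") },
];

const perMonth = topicsPerMonth(sampleDates);
```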

      Users

      This table only lists users with more than 100 topics; otherwise it would be far too big. The vote counts only include votes received on topics.

      | User | # of Topics | # of Votes | Most Topics In Group (Topics) | Most Votes In Group (Votes) |
      |---|---|---|---|---|
      | Algernon_Asimov | 728 | 5863 | news (288) | news (1828) |
      | Catt | 280 | 2829 | news (40) | talk (440) |
      | Deimos | 884 | 15837 | tech (225) | tildes.official (6385) |
      | EightRoundsRapid | 472 | 3068 | news (146) | news (977) |
      | EscReality | 128 | 1066 | food (41) | food (328) |
      | Neverland | 165 | 1911 | news (79) | news (922) |
      | Whom | 119 | 1179 | music (87) | music (864) |
      | asteroid | 169 | 1316 | tech (50) | tech (365) |
      | boredop | 117 | 862 | music (52) | music (321) |
      | cfabbro | 275 | 1725 | music (71) | music (326) |
      | dubteedub | 513 | 6263 | news (151) | news (2120) |
      | ducks | 107 | 1411 | misc (23) | tech (285) |
      | jmillikin | 134 | 1585 | news (30) | news (351) |
      | patience_limited | 124 | 1350 | tech (19) | tech (236) |
      | rkcr | 127 | 1406 | misc (29) | tech (323) |
      | spit-evil-olive-tips | 386 | 3926 | news (66) | news (735) |
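A per-user table like this can be built by tallying topics and votes per author, then filtering on the topic threshold. The sketch below is an assumption about how that might look, not the author's code; all function names and the sample rows are hypothetical. The "Most Votes In Group" column would follow the same pattern with a votes-per-group tally.

```typescript
// The fields of a topic that the per-user stats need.
interface TopicRow {
  author: string;
  group: string;
  votes: number;
}

interface UserStats {
  topics: number;
  votes: number;
  topicsByGroup: Map<string, number>;
}

// Tallies topic count, vote total and topics per group for every author.
function statsByUser(topics: TopicRow[]): Map<string, UserStats> {
  const stats = new Map<string, UserStats>();
  for (const topic of topics) {
    let entry = stats.get(topic.author);
    if (entry === undefined) {
      entry = { topics: 0, votes: 0, topicsByGroup: new Map<string, number>() };
      stats.set(topic.author, entry);
    }
    entry.topics += 1;
    entry.votes += topic.votes;
    entry.topicsByGroup.set(topic.group, (entry.topicsByGroup.get(topic.group) ?? 0) + 1);
  }
  return stats;
}

// Returns the [key, count] pair with the highest count; on ties the first one seen wins.
function topEntry(counts: Map<string, number>): [string, number] | undefined {
  let best: [string, number] | undefined;
  for (const [key, count] of counts) {
    if (best === undefined || count > best[1]) {
      best = [key, count];
    }
  }
  return best;
}

// Names of users with strictly more than minTopics topics (the post uses 100).
function usersWithMoreThan(stats: Map<string, UserStats>, minTopics: number): string[] {
  const names: string[] = [];
  for (const [name, entry] of stats) {
    if (entry.topics > minTopics) {
      names.push(name);
    }
  }
  return names;
}

// Hypothetical sample rows, not taken from the scrape.
const sampleRows: TopicRow[] = [
  { author: "alice", group: "news", votes: 10 },
  { author: "alice", group: "news", votes: 5 },
  { author: "alice", group: "tech", votes: 20 },
  { author: "bob", group: "music", votes: 3 },
];

const userStats = statsByUser(sampleRows);
const aliceTopGroup = topEntry(userStats.get("alice")!.topicsByGroup);
const frequentPosters = usersWithMoreThan(userStats, 2);
```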

      Groups

      The numbers next to each top user are how many topics/votes they have in that specific group. And of course the Totals row at the bottom lists Deimos, who has the highest number of topics/votes overall.

      Graph of this table, excluding top users.

      | Group | # of Topics | # of Votes | # of Comments | Top User (Topics) | Top User (Votes) |
      |---|---|---|---|---|---|
      | anime | 116 | 974 | 877 | Cleb (19) | Cleb (193) |
      | books | 239 | 2203 | 2012 | Catt (19) | cadadr (211) |
      | comp | 671 | 8890 | 6150 | Deimos (41) | Deimos (626) |
      | creative | 259 | 2586 | 1294 | Bishop (49) | Bishop (364) |
      | enviro | 252 | 2757 | 1136 | dubteedub (42) | dubteedub (523) |
      | food | 239 | 2096 | 1502 | spit-evil-olive-tips (45) | spit-evil-olive-tips (368) |
      | games | 891 | 10914 | 9038 | Deimos (193) | Deimos (2230) |
      | health | 174 | 1635 | 1062 | Catt (26) | Catt (238) |
      | hobbies | 85 | 1100 | 1403 | spit-evil-olive-tips (3) | RedditRefugee (48) |
      | humanities | 254 | 2291 | 1206 | Algernon_Asimov (97) | Algernon_Asimov (796) |
      | lgbt | 173 | 1919 | 1124 | Algernon_Asimov (61) | Algernon_Asimov (535) |
      | life | 245 | 3291 | 2324 | Catt (27) | Catt (319) |
      | misc | 884 | 9286 | 5332 | EightRoundsRapid (69) | dubteedub (855) |
      | movies | 263 | 2331 | 1722 | dubteedub (42) | dubteedub (381) |
      | music | 1295 | 7676 | 5029 | EightRoundsRapid (134) | Whom (864) |
      | news | 1700 | 18115 | 8817 | Algernon_Asimov (288) | dubteedub (2120) |
      | science | 519 | 4952 | 1551 | Algernon_Asimov (66) | Algernon_Asimov (556) |
      | sports | 250 | 1602 | 722 | boredop (24) | dubteedub (155) |
      | talk | 600 | 11406 | 18618 | Catt (37) | Catt (440) |
      | tech | 1363 | 19199 | 12104 | Deimos (225) | Deimos (3462) |
      | tildes | 730 | 11513 | 14360 | Algernon_Asimov (15) | Kat (331) |
      | tildes.official | 132 | 6385 | 6814 | Deimos (132) | Deimos (6385) |
      | tv | 292 | 3200 | 2733 | dubteedub (40) | dubteedub (367) |
      | Totals: | 11626 | 136321 | 106930 | Deimos (884) | Deimos (15837) |
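The numeric columns of the group table, including the Totals row, come down to summing per group and then summing the groups. Here is a rough sketch under that assumption; the type and function names and the sample rows are hypothetical. The top-user columns would need an extra per-user tally inside each group, which is left out here.

```typescript
// The fields of a topic that the per-group sums need.
interface GroupTopic {
  comments: number;
  group: string;
  votes: number;
}

interface GroupTotals {
  topics: number;
  votes: number;
  comments: number;
}

// Sums topic count, votes and comments for each group.
function totalsByGroup(topics: GroupTopic[]): Map<string, GroupTotals> {
  const totals = new Map<string, GroupTotals>();
  for (const topic of topics) {
    const entry = totals.get(topic.group) ?? { topics: 0, votes: 0, comments: 0 };
    entry.topics += 1;
    entry.votes += topic.votes;
    entry.comments += topic.comments;
    totals.set(topic.group, entry);
  }
  return totals;
}

// Adds up all groups to get the equivalent of the Totals row.
function grandTotals(totals: Map<string, GroupTotals>): GroupTotals {
  const sum: GroupTotals = { topics: 0, votes: 0, comments: 0 };
  for (const entry of totals.values()) {
    sum.topics += entry.topics;
    sum.votes += entry.votes;
    sum.comments += entry.comments;
  }
  return sum;
}

// Hypothetical sample rows, not taken from the scrape.
const sampleGroupTopics: GroupTopic[] = [
  { group: "tech", votes: 10, comments: 4 },
  { group: "tech", votes: 5, comments: 1 },
  { group: "music", votes: 2, comments: 0 },
];

const groupTotals = totalsByGroup(sampleGroupTopics);
const overall = grandTotals(groupTotals);
```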

      I don't want to draw any conclusions or make guesses, so I'll leave that up to the reader. I just wanted to share some of the data.

      Edit: minor typos here and there.

      Edit 2: some plans for the future (mainly for myself):

      • Get each user's highest-voted topic (overall and per group)
      • Get each user's most-commented topic (overall and per group)
      • Do stuff with tags, like finding which tags are used a lot
      • Figure out when topics are posted during the week and during the day
      • Do stuff with title length, like average, shortest and longest
      • Scrape text topics' word counts and get average, shortest, longest, etc.
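Two of the plans above, tag usage and title length, need nothing beyond the fields already being scraped. This is a sketch of how they could be computed, not something the post's scraper does yet; the function names and sample topics are hypothetical.

```typescript
// The fields of a topic that these two stats need.
interface TaggedTopic {
  tags: string[];
  title: string;
}

interface TitleLengthStats {
  average: number;
  shortest: number;
  longest: number;
}

// Counts how often each tag is used, most used first (ties sorted by tag name).
function tagCounts(topics: TaggedTopic[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const topic of topics) {
    for (const tag of topic.tags) {
      counts.set(tag, (counts.get(tag) ?? 0) + 1);
    }
  }
  return Array.from(counts.entries()).sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0]),
  );
}

// Average, shortest and longest title length, or undefined if there are no topics.
// Note: string length counts UTF-16 code units, so some emoji count as 2.
function titleLengthStats(topics: TaggedTopic[]): TitleLengthStats | undefined {
  if (topics.length === 0) {
    return undefined;
  }
  const lengths = topics.map((topic) => topic.title.length);
  const total = lengths.reduce((sum, length) => sum + length, 0);
  return {
    average: total / lengths.length,
    shortest: Math.min(...lengths),
    longest: Math.max(...lengths),
  };
}

// Hypothetical sample topics, not taken from the scrape.
const sampleTagged: TaggedTopic[] = [
  { tags: ["scraping", "statistics"], title: "Tildes statistics" },
  { tags: ["scraping"], title: "Hi" },
];

const tagUsage = tagCounts(sampleTagged);
const titleStats = titleLengthStats(sampleTagged);
```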