7 votes

Is there a strong correlation between comment length and comment quality?

Here are the top ten Reddit comments from February 2018, ranked by character length multiplied by votes.

E.g. the first comment has 5,144 characters and 42,457 votes, so it has the highest rank: 5,144 × 42,457 = 218,398,808.
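
The ranking above is just a product, so it can be sketched in a few lines of Python. The function name `rank` is made up for illustration; the numbers are the ones quoted for the first comment.

```python
def rank(length: int, votes: int) -> int:
    """Rank as described in the post: character length times vote count."""
    return length * votes


# Figures quoted above for the top comment.
top = rank(5144, 42457)
print(f"{top:,}")  # 218,398,808
```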

https://www.reddit.com/r/justneckbeardthings/comments/7wwyw5/neckbeard_crew/du4cbk5
https://www.reddit.com/r/news/comments/7xkstl/shooting_at_south_florida_high_school/du94nag
https://www.reddit.com/r/uwaterloo/comments/7w0dgv/dave_tompkins_is_overrated/dtwzhbz
https://www.reddit.com/r/AskReddit/comments/7vwkqg/hey_reddit_what_products_are_identical_to_a_brand/dtvtkzd
https://www.reddit.com/r/news/comments/80xs1v/china_bans_george_orwells_animal_farm_as_xi/duzfoko
https://www.reddit.com/r/NoStupidQuestions/comments/80h9bj/why_is_it_okay_to_cook_some_animals_alive_while/duvwgg8
https://www.reddit.com/r/AskReddit/comments/7xztxf/who_is_the_worst_person_youve_ever_met/ducsa86
https://www.reddit.com/r/AskReddit/comments/7zwebj/barbershairdressers_of_reddit_how_exactly_do_you/durco2m
https://www.reddit.com/r/AskReddit/comments/7wi1g8/what_concept_fucks_you_up_the_most/du13k9x
https://www.reddit.com/r/wifesharing/comments/7wa854/my_bf_is_looking_for_inspiration_what_would_you/duz0q9l

On the whole, there does seem to be a correlation between comment length and comment quality, especially when votes are factored in. More details here:

https://docs.google.com/spreadsheets/d/e/2PACX-1vRC08EWmy1GgzdrIvR2p9EGUpQpIbYjp8MmvlgfNT4REbXbjxOUUdXBHMqXnF_4OGsR9PrV_-xuehlW/pubhtml
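
One way to check for a correlation over all comments, rather than only the top ten, is a Pearson coefficient between length and votes. This is a rough sketch, not the analysis behind the spreadsheet: the helper `pearson` is written out by hand, and the sample pairs are hypothetical, not taken from the Reddit data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical (length, votes) pairs, for illustration only.
lengths = [10, 200, 50, 1200, 400]
votes = [3, 40, 1, 90, 15]
print(round(pearson(lengths, votes), 2))
```

A value near 1 would suggest longer comments tend to get more votes, a value near 0 would suggest no linear relationship.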

12 comments

  1. RespectMyAuthoriteh
    No

    14 votes
    1. nic
      I think you just proved yourself wrong :)

      4 votes
      1. Mumberthrax
        Length: 2, score: 10
        vs.
        Length: 41, score: 2

        hmm.

        5 votes
        1. Mumberthrax
          length: 52, score: 4

          Now I don't even know what to think.

          1 vote
  2. Mumberthrax
    Clickbait title aside, it feels like there's not enough information here to make the inference you seem to be making.

    Why is there no evaluation of the shortest comments of that month? Or the medium/average-length ones? Then you could compare the top ten of the three lists and see which has the higher scores on average. The math for the shortest would be easy enough to invert from how you're doing it now. The medium-length ones would need a little more work: working out exactly what the average comment length is, then making values above or below it count for less the farther away they are, so the comments closest to average length get the greatest multiplier against their karma score.

    edit: or maybe I'm just not understanding what you're trying to convey.

    edit2: and of course, as anyone who uses reddit should know, voting scores do NOT cleanly correlate with quality. There may be some correlation, but several major factors that influence how well a comment scores have nothing to do with its quality or usefulness.

    7 votes
    1. nic
      It shows you can systematically identify the quality of comments by sorting comments based on the character length of the comment times the score of the comment.

      I believe this would be a better way for Reddit or Tildes to sort the comments within a thread.

      For instance, a lot of the longer comments I linked above are buried under shorter comments with a higher number of votes. In general, I find a comment with 500 characters and 1k votes to be of higher quality than a comment with 2 characters and 50k votes.

      I'm not sure what you are getting at by suggesting an analysis of the shorter comments in the thread. Every single comment was included in the analysis via a database query, and I just pulled the top 1000 comments based on a simple numerical calculation.

      The title is in reference to a previous thread: https://tildes.net/~tildes/1r5/there_is_a_strong_correlation_between_comment_quality_and_length_paul_graham_essay_on_hackernews

      Edit: If you are suggesting that this logic might not work for identifying the quality of shorter comments, then you definitely have a point. As a next step, I am thinking of randomly sampling comments that my logic predicts have twice the quality of the top-voted comment, to see how it holds up on threads with typical comment lengths.

  3. ZaphodBeebblebrox
    To me, this seems to show that high-quality long posts exist. It doesn’t really show anything about the average long post, or whether it differs from the average short post.

    6 votes
    1. nic
      It shows you can systematically identify the quality of the post by sorting comments based on the character length of the comment times the score of the comment.

      I believe this would be a better way for Tildes to sort the comments within a thread.

      For instance, a lot of the longer comments above are buried under shorter comments with a higher number of votes.

      Obviously there are a lot of very long comments with very poor quality, but they tend to not get upvoted.

      Previous thread: https://tildes.net/~tildes/1r5/there_is_a_strong_correlation_between_comment_quality_and_length_paul_graham_essay_on_hackernews

  4. joelthelion
    Using votes as a proxy for comment quality is debatable at best.

    3 votes
  5. Fires
    Fairly interesting. Makes a lot of sense: more interesting content -> more upvotes. I'd be interested in seeing the long comments with few upvotes on posts with many upvotes.

    e:

    really dislike the title of this post

    2 votes
  6. Cliftonia
    This just means they select posts with the most words because dumb people need more words per page to explain things to them. The highest-quality posts are actually a single emoji.
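
The two scoring ideas raised in this thread can be sketched side by side: nic's length-times-score sort, and Mumberthrax's suggestion to favor comments close to the average length. The `Comment` class, the function names, and the exact weight `1 / (1 + distance from the mean)` are assumptions for illustration; the thread only says that lengths farther from the average should count for less.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    text: str
    votes: int


def length_times_score(c: Comment) -> int:
    """nic's proposal: character length multiplied by votes."""
    return len(c.text) * c.votes


def sort_by_length_times_score(comments):
    """Return comments ordered from highest to lowest length-times-score."""
    return sorted(comments, key=length_times_score, reverse=True)


def closeness_weighted_score(c: Comment, mean_length: float) -> float:
    """Hypothetical weighting: votes count for less the farther the
    comment's length is from the mean length."""
    distance = abs(len(c.text) - mean_length)
    return c.votes / (1 + distance)


# nic's example: 500 characters with 1k votes vs. 2 characters with 50k votes.
long_comment = Comment("x" * 500, 1_000)
short_comment = Comment("No", 50_000)
ranked = sort_by_length_times_score([short_comment, long_comment])
print([length_times_score(c) for c in ranked])  # [500000, 100000]
```

Under the length-times-score sort the 500-character comment ranks first even though it has far fewer votes, which is the behavior nic is arguing for.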