Is there a strong correlation between comment length and comment quality?
Here are the top ten Reddit comments from February 2018, ranked by their character length multiplied by their votes.
E.g. the first comment has 5,144 characters and 42,457 votes, so it ranked highest with a value of 218,398,808.
https://www.reddit.com/r/justneckbeardthings/comments/7wwyw5/neckbeard_crew/du4cbk5
https://www.reddit.com/r/news/comments/7xkstl/shooting_at_south_florida_high_school/du94nag
https://www.reddit.com/r/uwaterloo/comments/7w0dgv/dave_tompkins_is_overrated/dtwzhbz
https://www.reddit.com/r/AskReddit/comments/7vwkqg/hey_reddit_what_products_are_identical_to_a_brand/dtvtkzd
https://www.reddit.com/r/news/comments/80xs1v/china_bans_george_orwells_animal_farm_as_xi/duzfoko
https://www.reddit.com/r/NoStupidQuestions/comments/80h9bj/why_is_it_okay_to_cook_some_animals_alive_while/duvwgg8
https://www.reddit.com/r/AskReddit/comments/7xztxf/who_is_the_worst_person_youve_ever_met/ducsa86
https://www.reddit.com/r/AskReddit/comments/7zwebj/barbershairdressers_of_reddit_how_exactly_do_you/durco2m
https://www.reddit.com/r/AskReddit/comments/7wi1g8/what_concept_fucks_you_up_the_most/du13k9x
https://www.reddit.com/r/wifesharing/comments/7wa854/my_bf_is_looking_for_inspiration_what_would_you/duz0q9l
On the whole, there does seem to be a correlation between comment length and comment quality, especially when votes are factored in. More details here:
No
I think you just proved yourself wrong :)
Length: 2 , score: 10
vs.
Length: 41, score: 2
hmm.
length: 52, score: 4
Now I don't even know what to think.
Clickbait title aside, it feels like there's not enough information here to support the inference you seem to be making.
Why is there no evaluation of the shortest comments of that month? Or the medium/average length ones? Then you could compare the top ten of the three lists and see which has the higher scores on average. The math for the shortest would be easy enough to invert from how you're doing it now, and the medium-length ones would need a little more work, like ascertaining exactly what the average comment length is and making any values above or below that count for less the farther away they are, so the ones closest to average length have the greatest multiplier against their karma score.
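To make the three rankings concrete, here is a rough sketch. It is only one way to read the suggestion: the function names are made up, and the `1 / (1 + distance)` multiplier for the average-length case is an assumption, since no exact formula is given. The sample values are the length/score pairs quoted for the short replies in this thread.

```python
from statistics import mean

def long_rank(length, score):
    # The ranking used in the post: longer comments get a bigger multiplier.
    return length * score

def short_rank(length, score):
    # Inverted version: shorter comments get a bigger multiplier.
    return score / max(length, 1)

def average_rank(length, score, mean_length):
    # Assumed weighting: the multiplier is 1 at the mean length and
    # shrinks the farther the comment is from it, above or below.
    return score / (1 + abs(length - mean_length))

# (length, score) pairs quoted in this thread
comments = [(2, 10), (41, 2), (52, 4)]
mean_length = mean(length for length, _ in comments)

top_long = max(comments, key=lambda c: long_rank(*c))
top_short = max(comments, key=lambda c: short_rank(*c))
top_average = max(comments, key=lambda c: average_rank(*c, mean_length))
print(top_long, top_short, top_average)
```

Taking the top ten from each of the three rankings would give the three lists to compare.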
edit: or maybe I'm just not understanding what you're trying to convey.
edit2: and of course, as anyone who uses reddit should know, voting scores do NOT cleanly correlate with quality. There may be some correlation, but generally speaking there are several major factors that influence how well a comment scores which have nothing to do with its quality or usefulness.
It shows you can systematically identify the quality of comments by sorting comments based on the character length of the comment times the score of the comment.
I believe this would be a more optimal way for Reddit or Tildes to sort the comments within a thread.
For instance, a lot of the longer comments I linked above are buried under shorter comments with a higher number of votes. In general, I find a comment with 500 characters and 1k votes to be of higher quality than a comment with 2 characters and 50k votes.
I'm not sure what you are getting at by suggesting an analysis of the shorter comments in the thread. Every single comment was included in the analysis via a database query, and I just pulled the top 1000 comments based on a simple numerical calculation.
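Roughly, the calculation looks like the sketch below. This is not the actual database query, just a minimal in-memory illustration; the `Comment` fields and function names are made up for the example. The sample values are the figures mentioned in this thread (5,144 characters with 42,457 votes, 500 characters with 1k votes, 2 characters with 50k votes).

```python
from typing import NamedTuple

class Comment(NamedTuple):
    permalink: str  # placeholder label, not a real URL
    body: str
    score: int

def rank(comment: Comment) -> int:
    # Character length of the comment times its vote score.
    return len(comment.body) * comment.score

def top_comments(comments, n=1000):
    # Every comment is ranked; only the n highest are kept.
    return sorted(comments, key=rank, reverse=True)[:n]

sample = [
    Comment("short", "No", 50_000),
    Comment("medium", "y" * 500, 1_000),
    Comment("long", "x" * 5_144, 42_457),
]

for c in top_comments(sample):
    print(c.permalink, rank(c))
```

With these numbers the 500-character comment outranks the 2-character one despite having far fewer votes, which is the reordering I am arguing for.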
The title is in reference to a previous thread: https://tildes.net/~tildes/1r5/there_is_a_strong_correlation_between_comment_quality_and_length_paul_graham_essay_on_hackernews
Edit: If you are suggesting that this logic might not work for identifying the quality of shorter comments, then you definitely have a point. As a next step, I am thinking of randomly sampling comments that my logic rates at twice the quality of the top-voted comment in the same thread, to see how the logic holds up on threads with typical comment lengths.
To me, this seems to show that there are high-quality, long posts. It doesn’t really show anything about the average long post or whether it is any different from the average short post.
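A check of the average case might look something like this sketch: split all comments at some length cutoff and compare the mean score on each side. The function name and the 500-character cutoff are arbitrary assumptions, and the four (length, score) pairs are just the figures quoted in this thread, not a real sample, so the output says nothing about Reddit as a whole.

```python
from statistics import mean

def mean_score_by_length(comments, threshold):
    # Split (length, score) pairs at the cutoff and average each side.
    short = [score for length, score in comments if length < threshold]
    long_ = [score for length, score in comments if length >= threshold]
    return (
        mean(short) if short else None,
        mean(long_) if long_ else None,
    )

# (length, score) pairs quoted in this thread; far too few to conclude anything
quoted = [(2, 10), (41, 2), (52, 4), (5144, 42457)]
short_avg, long_avg = mean_score_by_length(quoted, threshold=500)
print(short_avg, long_avg)
```

Run over every comment from the month rather than a top-ten list, this would say something about the typical long post instead of the best ones.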
It shows you can systematically identify the quality of the post by sorting comments based on the character length of the comment times the score of the comment.
I believe this would be a more optimal way for Tildes to sort the comments within a thread.
For instance, a lot of the longer comments above are buried under shorter comments with a higher number of votes.
Obviously there are a lot of very long comments with very poor quality, but they tend to not get upvoted.
Previous thread: https://tildes.net/~tildes/1r5/there_is_a_strong_correlation_between_comment_quality_and_length_paul_graham_essay_on_hackernews
Using votes as a proxy for comment quality is debatable at best.
Fairly interesting. Makes a lot of sense, more interesting content -> more upvotes. I'd be interested in seeing the long comments with few upvotes on posts with many upvotes.
e:
really dislike the title of this post
The title is in reference to this post: https://tildes.net/~tildes/1r5/there_is_a_strong_correlation_between_comment_quality_and_length_paul_graham_essay_on_hackernews
This just means they select posts with the most words because dumb people need more words per page to explain things to them. The highest-quality posts are actually a single emoji.