41 votes

The Road to the Tildes 2020 Census: Pandemic Boogaloo

Posted August 15, 2020 by Grzmot

Tags: ask.discussion, casual, tildes census.2020, tildes.demographics

Hello everyone!

Some of you may remember last year's census. I did say I wanted to repeat it, and I have decided to move the date earlier in the year, as a few people said last year that doing it in December probably wasn't the smartest choice because of, well, Christmas and New Year's Eve, and I totally agree. Wish I had seen that earlier, but I guess I'm dumb lol. Summer may give us better stats because more people are on vacation and maybe check in more regularly. Or maybe there are fewer people around because everyone's on vacation. Who fucking knows, I don't know. I don't know anything.

Anyway, just like last time, I'd like to organize a preliminary discussion about the census and what questions we should ask this community. As a reminder, this was the 2019 census.

Areas that I'd like to improve: Politics, mainly. The 8values test I think is good, but maybe there's something better out there. We could do it with the political compass and go full quadrant and r/politicalcompassmemes, but of course that's also not going to be very accurate. I honestly don't even know if there's a good solution to this problem.

Other than that, are there other questions you'd like to ask the community? Let's discuss! (I hope someone replies or this is gonna be embarrassing.)

42 comments

[17]

Comment deleted by author
Link
1. [14]
  Grzmot (OP)
  August 16, 2020
  Link Parent
  
  > For the question asking users about what operating systems they use, in addition to the usual "Windows", "macOS", and "Linux" options, please also include a "BSD" option. Historically, there has been a non-negligible number of users who use BSD systems on Tildes.
  
  This was already on last year's census. However, I just realized that the screenshot I posted was incomplete. Thanks, Firefox screenshot capture function, I trusted you.
  
  > For the question above, asking about desktop/server operating systems, and the other question asking about mobile operating systems, please make each of the choices a checkbox, so that users who use multiple operating systems can select all that apply, and not have to just choose one option. (For instance, I own Android and iOS devices.)
  
  I agree, and the fields have been converted to checkboxes (it was already like this for computers, but the mobile devices were still single choice).
  
  > Finally, as I asked last year, can you please release the actual data collected and not just the results? Doing so would allow other users to take a look at the data and do their own thing with it, such as producing their own graphs and analyses.
  >
  > If this year's census is anything like the last one, the answers provided will already be anonymized and detached from any usernames, thus there is no real detriment to any user's privacy by releasing the raw data of the census.
  
  I remember having this discussion last year. I think you came late to the party and the census was already ongoing, so I felt uncomfortable releasing the data, as I had stated earlier that I wouldn't. We have plenty of people on Tildes who care a lot about their privacy, and I remember at least one person who didn't answer specifically because they thought the personal detail fields were mandatory. I personally don't have a problem with releasing the census data, but if people don't want their aggregated data to be released to the public, I can't do anything about that. Of course, with the data being public, we'd get a diverse analysis from different points of view, but otherwise my analysis is better than none, I guess.
  
  One idea would be to release the data encrypted, and send out the password via Tildes DMs to those who would like to analyse the data. Not very secure of course, but it would be better than just releasing the CSVs to the public.
  
  > And also, let's not provide an option to opt out of having your answers be dumped as part of the data.
  
  That choice didn't exist. If you answered the census, your answers were part of the data. However, none of the fields were mandatory, so you could choose what data to give.
  
  6 votes
  1. [7]
    
    Comment deleted by author
    Link Parent
    
    [6]
    kfwyre
    August 16, 2020
    Link Parent
    
    > (And for anyone who reads this and recoils in horror at the idea of your census answers being released, once again, let me just remind you that the answers you provide are completely disconnected from your username.)
    
    I'd encourage you and actually everyone to consider how even a few specific details in conjunction can deanonymize someone. Because our userbase is relatively small, it takes only a few pieces of data to narrow down an entry to a high level of confidence, particularly if one or more of them is uncommon.
    
    An entry identifying a male, gay, Linux user who works in education doesn't point directly to me, for example, but it does reduce the possible pool for that entry significantly, where it likely only applies to a maximum of maybe 10 people here (I'd be surprised if it were that high, to be honest). Add to that other information (age range, location, education, etc.) that people either know or can infer about me, and they could easily find my specific entry without any direct links to my username.
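A rough sketch of that narrowing-down effect, using made-up rows and hypothetical column choices (nothing here comes from the real census data): it counts how many responses are the only one with their particular combination of answers.

```python
from collections import Counter

# Invented responses: (gender, orientation, desktop OS, field of work).
responses = [
    ("male", "gay", "Linux", "education"),
    ("male", "straight", "Windows", "software"),
    ("male", "straight", "Windows", "software"),
    ("female", "straight", "macOS", "software"),
    ("female", "straight", "Linux", "education"),
    ("male", "straight", "Linux", "software"),
    ("male", "gay", "Windows", "software"),
]


def unique_rows(rows, columns):
    """Return the rows whose values in `columns` are shared with no other row."""
    def key(row):
        return tuple(row[i] for i in columns)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] == 1]


# Looking at one column at a time versus all four columns together.
unique_on_gender = unique_rows(responses, [0])
unique_on_everything = unique_rows(responses, [0, 1, 2, 3])
print(len(unique_on_gender), len(unique_on_everything))
```

With these invented rows, nobody is unique on gender alone, but most rows become unique once the four columns are combined, which is the effect described above.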
    
    10 votes
    
    [5]
    
    Comment deleted by author
    Link Parent
    
    kfwyre
    August 16, 2020
    Link Parent
    
    I did not realize that's what you were advocating for! Thanks for clarifying. I support this.
    
    9 votes
    
    9000
    August 16, 2020
    Link Parent
    
    I agree with you and kfwyre that this is much better from a privacy standpoint. The trade-off is that it is much less useful from an analysis standpoint, as you can't really correlate anything.
    
    But, it's still interesting, and I think this is a fine choice for a somewhat informal census.
    
    I'm not one of the people likely to do analysis, so I don't actually know how they would feel about this method of releasing the data.
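To make the trade-off concrete, here is a minimal sketch. It assumes the proposal is to publish machine-readable tallies per question rather than per-respondent rows; the parent comment is deleted, so that reading is a guess. The question names and sample rows are invented.

```python
import csv
import io
from collections import Counter


def per_question_tallies(rows):
    """Count answers for each question independently of every other question."""
    tallies = {}
    for row in rows:
        for question, answer in row.items():
            if answer:  # skipped (empty) answers are not counted
                tallies.setdefault(question, Counter())[answer] += 1
    return tallies


def tallies_to_csv(tallies):
    """Serialize the tallies as question,answer,count lines."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["question", "answer", "count"])
    for question in sorted(tallies):
        for answer, count in sorted(tallies[question].items()):
            writer.writerow([question, answer, count])
    return buffer.getvalue()


# Invented sample responses; the last respondent skipped the OS question.
sample = [
    {"os": "Linux", "age": "20-29"},
    {"os": "Linux", "age": "30-39"},
    {"os": "Windows", "age": "20-29"},
    {"os": "", "age": "30-39"},
]
print(tallies_to_csv(per_question_tallies(sample)))
```

The output says how many people chose each answer, but not which age range the Linux users fall into, so cross-question correlation is exactly what gets lost.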
    
    5 votes
    
    Grzmot (OP)
    August 17, 2020
    Link Parent
    
    I think this is a good idea, and tbh it's kind of what I did last year anyway, just not in a machine-readable format. I don't think anyone would have a problem with this.
    
    3 votes
    
    Avocado
    August 21, 2020
    Link Parent
    
    I agree this would be a better way to disclose the raw data.
    
    3 votes
    
    9000
    August 16, 2020
    Link Parent
    
    This is a very real concern. kfwyre is not the only person who would be so identifiable; it is surprisingly easy to do for most people if you have the willpower, some basic information, and some basic statistics.
    
    My understanding is that the current state-of-the-art for dataset anonymization is differential privacy¹. This may be much too complicated for Grzmot to implement single-handedly (I don't know if there are any good libraries/programs that can do this automatically), but it would make it more tolerable to publicly release the data.
    
    We could also look into k-anonymity, as that is easier to implement, but it still may not be sufficient for a survey with only 250 responses².
    
    1: The Wikipedia article is super math-y. For an explanation targeted to laypeople, check out the minutephysics video on it.
    
    2: My source is last year's results.
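For a feel of what differential privacy means in the simplest case, here is a hedged sketch of the Laplace mechanism applied to the tallies of one single-choice question. It is an illustration only, not a vetted implementation: the function name, the option list and the epsilon value are all assumptions, and a real release would also have to budget epsilon across every question published.

```python
import random
from collections import Counter


def noisy_counts(answers, options, epsilon, rng=random):
    """Laplace mechanism for the histogram of one single-choice question.

    Each respondent lands in exactly one bucket, so adding or removing one
    person changes the histogram by at most 1. Noise drawn from a Laplace
    distribution with scale 1/epsilon is added to every bucket, including
    empty ones, so the mere presence of a bucket reveals nothing.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    counts = Counter(answers)
    noisy = {}
    for option in options:
        # The difference of two Exp(rate=epsilon) draws is Laplace(0, 1/epsilon).
        noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
        noisy[option] = counts.get(option, 0) + noise
    return noisy


# Invented answers; smaller epsilon means more noise and more privacy.
answers = ["Linux", "Linux", "Windows", "macOS", "Linux"]
print(noisy_counts(answers, ["Windows", "macOS", "Linux", "BSD"], epsilon=0.5))
```

With only a few hundred responses, noise at this scale can swamp the small buckets, which is part of why this may be more trouble than it is worth for an informal census.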
    
    3 votes
  2. [7]
    Adys
    August 16, 2020
    Link Parent
    
    You could provide an opt-out for just the raw data public release (no opt out from aggregated public release).
    
    [6]
    Grzmot (OP)
    August 16, 2020
    Link Parent
    
    But then everyone else's graphs/data/analysis would be scuffed, as parts would be missing.
    
    4 votes
    
    [2]
    Adys
    August 16, 2020
    Link Parent
    
    Yeah, it's a workaround. To be clear I'd be in favour of just having the data fully dumped in public, with a big disclaimer you'll be doing that as the first thing the user reads.
    
    3 votes
    
    gpl
    August 16, 2020
    Link Parent
    
    I don’t think that would be functionally different as people who care about that data being released will just self-select out of completing it at all. At least with a check box you can still get those people included in the aggregated statistics.
    
    2 votes
    
    [3]
    Kuromantis
    August 16, 2020
    Link Parent
    
    Perhaps we could just ask decently vague questions and have more specific answers be optional? @Algernon_Asimov was the one who noped out because he wanted an age range instead of an exact age so asking a few questions of varying specificity like so:
    
    "What's your age?"
    
    By decile (10-20 to 90-100)
    By quintile (10-15 to 95-100)
    By year (13/TOS/legal minimum to 100)
    
    And making the top one obligatory and the other 2 optional (or including an "answered above" option) might be a solution, since it lets you pick how specific you're gonna be about yourself.
    
    I think this can be applied to a decent number of questions (e.g. relationships can probably be simplified), but I'm not sure how many.
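The tiered age question could be sketched like this. The function names and the `detail` levels are hypothetical, and the ranges are non-overlapping (20-29, 30-39) rather than the overlapping ones written above, so that every age falls into exactly one bucket.

```python
def age_range(age, width):
    """Return the `width`-year range containing `age`, e.g. 27 -> "20-29"."""
    if age < 0 or width <= 0:
        raise ValueError("age must be >= 0 and width must be > 0")
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def age_answers(age, detail="decade"):
    """Build the stored answers for one respondent.

    detail: "decade" (the mandatory minimum), "five_year", or "exact".
    Each level includes the coarser levels above it.
    """
    if detail not in ("decade", "five_year", "exact"):
        raise ValueError(f"unknown detail level: {detail}")
    answers = {"decade": age_range(age, 10)}
    if detail in ("five_year", "exact"):
        answers["five_year"] = age_range(age, 5)
    if detail == "exact":
        answers["exact"] = str(age)
    return answers


print(age_answers(27))
print(age_answers(27, detail="exact"))
```

Storing the coarser levels alongside the finer ones keeps the mandatory chart complete even when most people skip the optional questions.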
    
    3 votes
    
    joplin
    August 16, 2020
    Link Parent
    
    That's better, but if I recall there were some people who were the only ones in their decile. And it's stuff like that which shows how easy it is to correlate supposedly anonymized data.
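One common mitigation for lone respondents in a bucket is small-cell suppression: fold any bucket with fewer than some threshold of people into a catch-all before publishing. A minimal sketch follows, with a made-up threshold, label and counts. It is not a complete fix: if only one bucket is folded and the full option list is known, the hidden bucket can still be worked out.

```python
def suppress_small_cells(counts, k=5, label="other (suppressed)"):
    """Merge every bucket with fewer than `k` respondents into one catch-all."""
    published = {}
    hidden_total = 0
    for option, count in counts.items():
        if count < k:
            hidden_total += count
        else:
            published[option] = count
    if hidden_total:
        published[label] = hidden_total
    return published


# Invented age tallies with two very small buckets.
age_counts = {"20-29": 120, "30-39": 90, "60-69": 1, "70-79": 2}
print(suppress_small_cells(age_counts, k=5))
```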
    
    3 votes
    
    j3n
    August 17, 2020
    Link Parent
    
    There is no "just" when it comes to this stuff. It's way harder than anyone intuitively thinks it is. I didn't participate last year and I have no interest in participating this year if releasing individual responses is an option, even if there's an opt-out.
    
    1 vote
2. Eric_the_Cerise
  August 16, 2020
  Link Parent
  
  > the other question asking about mobile operating systems, please make each of the choices a checkbox
  
  Consider this for each census question; many questions that are "choose one" might benefit from using a "choose all that apply" format.
  
  5 votes
3. Tardigrade
  August 16, 2020
  Link Parent
  
  As an addition to this, would asking about operating systems in more depth just be a mess with the small number of responses? As in, asking "Windows", "macOS", "Linux", or "BSD" and then asking, with a freeform text box, which actual operating system: "Windows 7", "Temple OS", or "Ubuntu".
  
  4 votes
[4]

Comment deleted by author
Link
1. Grzmot (OP)
  August 16, 2020
  Link Parent
  
  > I'd greatly appreciate a text-box option for the question on gender
  
  Done and dusted.
  
  8 votes
2. [2]
  nacho
  August 16, 2020
  Link Parent
  
  I think using a set of umbrella terms is better. Otherwise this group quickly just becomes "other answers __%" rather than counting these actual identities.
  
  This holds true for all categories: If there are set options for the most common political views, and you choose a write-in option rather than a somewhat less precise and more general term supplied, that's the most effective way to have your view under-counted and therefore underestimated.
  
  8 votes
  1. Grzmot (OP)
    August 16, 2020
    Link Parent
    
    That's kind of the problem with aggregating data: no matter what you do, you'll lose some detail, and people will have to deal with the fact that they couldn't express how they feel about something in the way they wanted to. That's especially true for something as touchy as gender, where the range of identities is far and wide, especially on a left-leaning internet forum.
    
    If it helps, I don't think adding a free-form field for gender is going to hurt much, as on last year's census only 4% of all people said they were non-binary. You can have whatever discussion about that you want, but they are a minority within a minority, and I don't think that in this case they would dilute the data as much as, for example, adding a free-form field for political conviction would.
    
    6 votes
[2]
Omnicrola
August 15, 2020
Link
It might be interesting to set up a preliminary survey that's more free-form, to gather potential answers and see what people are interested in.

Something like:
1. list all the options you would expect to see available in a question about gender
2. list the keywords that you think are useful when describing someone's political views
Then maybe share it here as a word cloud and use that data as a starting point to build census questions.
9 votes
1. Grzmot (OP)
  August 16, 2020
  Link Parent
  
  Well, the gender question has a freeform answer field now, so anyone can enter whatever they want. If you want your gender to be Communism, you can. So point 1 would be kind of moot.
  
  However the second point would definitely be interesting. Last time we did it with an 8values test, but what about letting people enter their political position in a free-form text field? It's going to be a nightmare to analyse and of course you have people often describing themselves as something while actually being something else based on their political convictions.
  
  6 votes
[3]
MonkeyPants
August 16, 2020
Link

I'm kind of curious, for those that are mods or ex-mods from reddit, the approximate number of subscribers they moderate.

I keep hearing how many mods/ex-mods there are here, just curious I guess.

8 votes
1. [2]
  AugustusFerdinand
  August 16, 2020
  Link Parent
  
  Hmm... How to word this question would be difficult.
  
  I currently mod several subs on reddit, most of them "small" (largest is 500k subs) compared to the defaults (even if there aren't technically defaults any longer), but I was a mod of a couple of defaults at one time along with some other non-default massive-for-the-time subs.
  
  5 votes
  1. Grzmot (OP)
    August 16, 2020
    Link Parent
    
    What about "If you were a moderator on Reddit, what was the peak number of subscribers you moderated?"
    
    The question of course is, what do we do with that data? Can you have a discussion about it? Or would that maybe be best suited for a separate Tildes thread to discuss that specifically?
    
    4 votes
sron
August 16, 2020
Link

Maybe ask people who use Tildes on mobile whether they use it with a web wrapper app like Hermit or Native Alpha, use a home screen shortcut from their browser, or just use a browser. Nothing important, just potentially interesting. Maybe.

5 votes
skybrian
August 16, 2020
Link

I recommend going through the questions and putting in an "It's complicated" option. (Not just gender.) Questions are often more ambiguous than they seem to the person who wrote them, and outliers are interesting.

(On the other hand, outliers are also identifying, so not everyone will want to explain further.)

4 votes
[2]
SheepWolf
August 21, 2020
Link

Not a question for the survey/census itself, but when posting results, would it make sense to have 2 threads (main and freeform/free-form)? One thread to act as the main topic for the census results with a link to the second topic which more specifically addresses the free-form questions.

It sounds like this time around there might be more free-form questions, and it might be easier for comments to discuss them separately from discussion about the census in general. Are stickied comments a thing in progress on Tildes?

Looking at last year's results, I think with just the 3 pastebins about Tildes, the answers seemed overwhelming (although many of the answers were kind of similar), so ideally there is a better way of handling that. Might be a task to delegate. I'm not entirely sure what the best way to do this sort of thing is.

As an aside, I imagine making nice charts (pie/bar/whatever) is difficult with the amount of different answers?

4 votes
1. Grzmot (OP)
  August 21, 2020
  Link Parent
  
  Depends on the amount of replies I'll get. I don't think sticky replies are a thing, maybe you're right and separate threads are the way to go.
  
  I'm still on the fence about more freeform replies. On one hand, they give you a way to precisely say what you want. On the other hand, if I have to attempt to extract the core meaning from such a reply to create a graph, the advantage of the freeform reply is lost anyway. Though what I might do this year is offer the most popular answers from last year's freeform replies as options to select.
  
  5 votes
[2]
Gaywallet
August 21, 2020
Link

Do you have a list of all questions you are planning to ask this year? It would be helpful to have in order to provide any additional feedback.

4 votes
1. Grzmot (OP)
  August 21, 2020
  Link Parent
  
  I have not yet formulated them; I was meaning to make a second post with the (planned) 2020 form to garner additional feedback. This was more of a preliminary round to see if there were general issues to address.
  
  5 votes
[4]
krg
August 16, 2020
Link

I know personal politics are derived from values, but I think questioning that emphasizes what people value in a way that doesn't seem to directly reference politics is more interesting. With correlations to be made afterward, I guess...

Oh, and I think knowing how many generations a family has been in the country they reside in (on top of ethnic background) is useful.

3 votes
1. [2]
  Sand
  August 16, 2020
  Link Parent
  
  > Oh, and I think knowing how many generations a family has been in the country they reside in is useful.
  
  I doubt most people will know that offhand.
  
  4 votes
  1. krg
    August 16, 2020
    Link Parent
    
    Certainly. But, if known, I think it adds to the "picture", so to speak. And it may prompt some folk to look into their family history, which can be enlightening. And scary...
    
    1 vote
2. j3n
  August 17, 2020
  Link Parent
  
  > Oh, and I think knowing how many generations a family has been in the country they reside in (on top of ethnic background) is useful.
  
  Out of curiosity, how do you answer this question? Just looking at my 4 grandparents, one was an immigrant, one has at least some ancestors going back several hundred years in the country, and the remaining two I have no idea. How many generations has my family been in the country?
  
  2 votes
[9]

Comment removed by site admin
Link
1. [8]
  Grzmot (OP)
  August 16, 2020
  Link Parent
  
  > I was really disappointed that only 250 people answered since I was expecting a few thousand (10 or 20% of the site). I guess maybe it reflects the 90-9-1 thing.
  
  250 was pretty low, yeah. I don't know how many accounts have been created, but tbh when writing comments I always see the same people, so I don't think the active userbase of Tildes is very big.
  
  > Given the anonymization thing, I would be interested in seeing some social statistics like romantic relationships (currently with SO, [long or short term variants] [simplified] , no SO, had a SO, widowed?)
  >
  > Also goes for platonic relationships (has friends, had friends, no friends, fill all that apply probably)
  >
  > And if we're being ballsy, maybe sexual relationships too (sex, [by frequency], no sex, FWB, paid a sex worker, sex doll, etc.)
  
  I think that's fairly interesting, but I'm wary of freeform questions because they are hard to aggregate besides basically reading them all and giving you a summary. I did release the feedback section, but I think it's just very hard to discuss these things, because for me personally the workload increases immensely, and if I just release them all, no one will discuss it because no one wants to read a few hundred responses just to discuss other people's love life.
  
  7 votes
  1. [4]
    rish
    August 16, 2020
    Link Parent
    
    > And if we're being ballsy, maybe sexual relationships too (sex, [by frequency], no sex, FWB, paid a sex worker, sex doll, etc.)
    
    Please don't add these questions, please.
    
    11 votes
    
    [3]
    Sand
    August 16, 2020
    Link Parent
    
    I agree, please don't.
    
    6 votes
    
    [2]
    Kuromantis
    August 16, 2020
    Link Parent
    
    Fair enough.
    
    I guess it's easier to request something when it doesn't apply to you and you don't have a lot of stuff to share about yourself in general.
    
    I'm assuming sparing the details/context and only asking a yes or no question doesn't count, right?
    
    [2]
    
    Comment deleted by author
    Link Parent
    
    Kuromantis
    August 17, 2020 (edited August 23, 2020)
    Link Parent
    
    A'ight.
    
    I guess I did say "and if we're feeling ballsy" in my original comment as a way to acknowledge that's kind of extreme, and the "doesn't" in "only asking a yes or no question doesn't count" shows that I'm asking for clarification rather than asking a pretty tone-deaf question.
    
    2 votes
  2. [3]
    Kuromantis
    August 16, 2020 (edited August 17, 2020)
    Link Parent
    
    > Given the anonymization thing, I would be interested in seeing some social statistics [...]
    
    > I'm wary of freeform questions because they are hard to aggregate besides basically reading them all and giving you a summary.
    
    Huh? From what I can tell, anonymization doesn't apply just to the freeform questions, it applies to the entire data set, right? The only question I said would be freeform was the hobby question, and for the other ones I even gave a summary of the options that I think would make sense, which you quoted?
    
    EDIT: Oh yeah, and please use a color palette when you show people's location.
    
    [2]
    gpl
    August 16, 2020
    Link Parent
    
    I think @Grzmot is just saying that freeform answers make data analysis hard, because people word things differently. If there is a question “What hobbies do you have?”, three different users could answer “Music”, “playing an instrument”, and “guitar”. Those answers should all ostensibly be included under one category when presenting final results but that is hard to do programmatically because it’s not simple to write a program that recognizes those answers as different forms of the same general hobby. So the only thing to do is manually go through and categorize or summarize things which is subjective and time consuming.
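As a small illustration of why this is awkward to automate, here is a naive keyword-based categorizer. The keyword table is invented and would have to be maintained by hand, which is the subjective, time-consuming part; anything it does not recognize is set aside for manual review rather than guessed at.

```python
from collections import Counter

# Hand-written, hypothetical mapping from keywords to broad categories.
CATEGORY_BY_KEYWORD = {
    "music": "Music",
    "instrument": "Music",
    "guitar": "Music",
    "hiking": "Outdoors",
    "camping": "Outdoors",
}


def categorize(answer):
    """Return the first category whose keyword appears in the answer."""
    text = answer.lower()
    for keyword, category in CATEGORY_BY_KEYWORD.items():
        if keyword in text:
            return category
    return "Uncategorized"


answers = ["Music", "playing an instrument", "guitar", "birdwatching"]
print(Counter(categorize(answer) for answer in answers))
```

Substring matching like this misfires easily (an answer such as "I hate hiking" would still count as Outdoors), so the tallies are only as good as the table and still need a human to check them.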
    
    5 votes
    
    Kuromantis
    August 17, 2020 (edited August 17, 2020)
    Link Parent
    
    Fair enough. This question has come around a few times and the threads here have been very long.
    
    The thing is, it seems really hard to sum up something like hobbies into a 30-question (upper limit) or so format, hence why I think it should be made free-form. Maybe we might want to put that through the process @Omnicrola described here?
    
    Because I can't come up with a characterization of hobbies off the top of my head.