43 votes

Daily Tildes discussion - Haunted by Data

Bit of a non-standard daily discussion again, but today I thought I'd post another inspiration for some of the decisions made for Tildes. I did this a couple of weeks ago with Clay Shirky's "A Group is Its Own Worst Enemy" as well, if you missed that one.

The one I'm going to post today is a talk by Maciej Ceglowski (who runs the minimal bookmarking service Pinboard). I linked it in a discussion here related to privacy recently, because I think it's a great talk that goes over some of the dangers of tech companies casually collecting so much data on their users:

Maciej Ceglowski - Haunted by Data

That's a link to his slides and a transcript, but a video of him actually doing the talk is also available on YouTube here (20 mins long) if you'd like to watch/listen.

Let me know if you have any thoughts about privacy topics (ones covered in the talk or otherwise), or questions about my approach towards privacy/data-collection on Tildes.

25 comments

  1. [15]
    TrialAndFailure Link
    I'm just glad in general that you seem dedicated to collecting very little data, especially given how most large social media companies make that their entire mode of profitization.

    27 votes
    1. [14]
      Deimos Link Parent
      Yeah, I think it's one of the attributes of a site where in most situations, "what the users want" and "what the company wants" are in direct opposition. Users would almost always prefer for the site to collect as little information as it possibly can, but the companies want to collect and keep all of it, even if they have no use for it and no idea why they'd need it.

      So far most people don't really care (or even think about this) much, but I think at some point, it's inevitable that some sort of massive, embarrassing data leak happens that will get some huge public backlash. Imagine all private messages from a popular site being released, or everyone's entire voting history from reddit, or something similarly disastrous.

      14 votes
      1. [2]
        cfabbro Link Parent
        it's inevitable that some sort of massive, embarrassing data leak happens that will get some huge public backlash

        The Ashley Madison data breach and Adult FriendFinder security breach come to mind there. The recent Equifax data breach was pretty insane too.

        And yet the public at large still doesn't seem to really care about taking their online privacy seriously. "If you have nothing to hide you have nothing to fear" is still parroted ad nauseam by people who don't understand how insanely naive an attitude that is.

        I think "what people care about" should largely just be ignored when it comes to privacy, and regulators (like the EU did with GDPR) and individual site admins (like you are with ~) need to simply take it upon themselves to ensure that users' privacy is protected regardless of general public sentiment regarding it.

        13 votes
        1. Gaywallet Link Parent
          What I would prefer is for stricter data security standards to be enforced.

          In the health world, we have security standards for how we are allowed to store and send health information. If we start creating standards for how to store data and writing them into law (as with GDPR), we can turn this into a non-problem altogether. As a bonus, we can fine companies that don't adhere, which both creates revenue and gives companies a monetary reason to adhere to the standards.

          The problem I've seen with the GDPR is that it's far too open. It forces companies to actually have an IT security team, but it doesn't force them not to store data unencrypted or in plain text. In the health world, you're not allowed to do that. That's what we need to push for: stronger required standards on basic IT security, such as long passwords, nothing stored in plain text, basic encryption, and secure connections. We can even leave it open like the GDPR did and not specify a type of encryption or connection.

          5 votes
      2. [3]
        annadane Link Parent
        I just can't stand the dishonesty. These websites endlessly preach that privacy is a virtue and the power is in the hands of the users, then they directly stab you in the back. And because there's generally a single "feedback" link, it's completely opaque, you can't contact anyone specific. There's no accountability. You have no idea who made the decision to screw you and everybody else.

        "We're listening to your feedback" is generally the biggest lie in the universe. And no one can nail them on it. And they apparently don't even feel bad about it. How do you justify your job working for Facebook?

        7 votes
        1. Deimos Link Parent
          I think the justifications mostly come from a combination of the quote, "It is difficult to get a man to understand something, when his salary depends upon his not understanding it" and the median Facebook salary being over $240,000.

          9 votes
        2. SourceContribute Link Parent
          I just can't stand the dishonesty. These websites endlessly preach that privacy is a virtue and the power is in the hands of the users, then they directly stab you in the back. And because there's generally a single "feedback" link, it's completely opaque, you can't contact anyone specific. There's no accountability. You have no idea who made the decision to screw you and everybody else.

          And this is why co-operatives make sense; they have to act in the best interest of the users.

          2 votes
      3. [2]
        Eva Link Parent
        Wasn't public voting history available by default on reddit for a few years? I remember having to turn mine off a few times.

        5 votes
          Deimos (edited) Link Parent
          I don't know if it was ever public by default, but you can still choose to display it, yeah. It's also only for submissions though, and doesn't include all your votes on comments.

          6 votes
      4. [6]
        rib Link Parent
        While I do trust that you're not needlessly collecting data, aside from merely trusting you're practicing what you preach, is there any way we can know you're not collecting data?

        I get that there's no advertising or profit-driven motive to incentivize data collection, but that's not the only reason to collect data. Obviously open-source code provides some of the proof.

        The only way I can think of to provide proof of this is contracting randomized auditing by a reputable external agency.

        5 votes
        1. [2]
          Deimos Link Parent
          I don't think there's ever any way to truly prove it. You could do something like an audit where some third party verifies that the code matches the open-source code and there isn't extra data-collecting code running, and the database doesn't have extra data in it, and so on. But then for extremely skeptical people you've just moved things one level deeper—if the audit was scheduled, how do we know that I didn't just swap out the code on the server and hide the data before the audit?

          Now you have to do random, unscheduled audits, but what if I'm just doing something sneaky and not truly giving them access to the real code/data, but a fake, "clean" version that looks legitimate? What if there's something secret running in the background that the auditors can't see but is sending the data off to a different server?

          10 votes
          1. Natanael Link Parent
            Client side encryption protocols like differential privacy

            1 vote
        2. [3]
          Natanael Link Parent
          Client side encryption protocols, stuff like differential privacy. Mixnets and other anonymization techniques. @deimos

          1 vote
          1. [2]
            rib Link Parent
            Do you know of any real world applications implemented in a website?

            2 votes
            1. Natanael Link Parent
              Apple has a limited usage of differential privacy. Don't remember exactly what for.

              2 votes
  2. [4]
    ruspaceni Link
    That talk has a lot to chew on, but the first thing I wondered was how he feels about the Cambridge Analytica stuff coming to light.

    His point about the adversarial trucker relationship is interesting but I'm not entirely sure I grasp it. Is it along the same lines as 'when a measure becomes a target, it's no longer a good measure'?

    Also, I'm glad he didn't side with "If you've got nothing to hide..." and instead goes for "Just don't collect it", and you seem to be more than mindful when it comes to privacy, so I'm at ease. That said, I wonder if there's any reasonable way of cementing the policy in place. Was reddit always a "bastion of free speech"? It would be good to make sure that a future Tildes doesn't have a change of heart.

    He's also right about the radium underpants bit too. I fear a lot of the tech around us is actively toxic, in those "doesn't look like much, but fucks you up so deep that even your DNA changes" kind of ways. It's like some wild west gold rush on some newfound island. But in the process of chasing nuggets, we've trampled each other and eroded the land. And of course there's always kids turning into 12 year olds who will naturally make every bad decision they possibly can. It's so easy for someone to fall into one of those feedback loops and never really even notice.

    8 votes
    1. Deimos Link Parent
      That said, I wonder if there's any reasonable way of cementing the policy in place.

      I'm not sure if there's any way to truly ensure it can't change, but the privacy policy is fairly explicit (with the edit history public), and once the code is open-source people will also be able to see if anything ever changes related to privacy. Those don't really prevent starting to collect more data, but they make it almost impossible to do it without the users knowing (and most likely objecting loudly to it).

      8 votes
    2. BuckeyeSundae Link Parent
      I think the truck example was interesting in that:

      1. The measurement would be used to try to combat driver fatigue and promote safer driving.
      2. The measurement is also meant to show, in the aggregate, "accurate" driver behavior.
      3. Drivers would still be responsible for the longstanding, more direct metric of "amount of time it takes to get from point a to point b."

      All of that together would probably mean that the direct effect of the battle between getting to a place as fast as possible and forcing a driver to sleep is an increase in distracted, fatigued driving, not safer driving and not accurate information either. So it isn't just that when a measure becomes a target it's no longer a good measure, but also that the knock-on effects of even trying could get you to a much worse place than before you started.

      6 votes
    3. pseudolobster Link Parent
      His point about the adversarial trucker relationship is interesting but I'm not entirely sure I grasp it. Is it along the same lines as 'when a measure becomes a target, it's no longer a good measure'?

      I think it's more along the lines of the Observer Effect in physics, where you're changing the outcome by measuring something. You're not necessarily getting a closer depiction of reality by collecting more data, which runs contrary to the cargo cult mentality of "let's just collect all the data we possibly can, more is better". The user, knowing his data is being collected, is more likely to take counter-measures to avoid it.

      That's what I got from it anyway.

      4 votes
  3. [2]
    Gyrfalcon Link
    I think to me the biggest thing is just being upfront about what data is collected and how, and for how long, it is stored. If I start to register for a site, and it tells me in plain language "Everything you do here will be recorded forever for our use," then I can make a decision. I can decide not to use the service, if that bothers me, or to use the service as much as I am comfortable with based on what is collected. The real problem is not always that data is collected and stored indefinitely; the problem is that there is no way for anyone as a user to know what is actually going on. If people know exactly what's happening, the data problem will begin to sort itself out.

    6 votes
    1. annadane Link Parent
      And also they can't just change the TOS overnight to collect more data. "I didn't agree to this data collection!" "Ah, you're referring to the 2009 TOS. In 2018..."

      4 votes
  4. whyarentihigh Link
    I don't have any criticisms, just wanted to say that I like where Tildes is going in this regard. Obviously we can only take your word for it and for what you're doing, but you seem like a trustworthy kind of person.

    5 votes
  5. [3]
    Tenar Link
    I quite like his talks, I've been linked a few of them (the website obesity crisis post is beautiful)… and it seems like his way of operating is much like yours (wrt pinboard, at least): you know what you're providing, but you also know that you need money, so you work around that, by either donations or subscriptions.

    Anyways, thanks for thinking of our privacy. I've got two questions: (1) the privacy policy mentions "If you communicate with us through email, we collect your email address and the contents of the messages." but the whole recovery bit is big on not saving the email address, if possible. So if I recover my account, it will still save my email address, right?

    and (2) I don't know how non-profits work in Canada, but is there a guarantee you can't/won't sell our data in the future, or sell ~ to a company that will?

    4 votes
    1. [2]
      Deimos Link Parent
      (1) the privacy policy mentions "If you communicate with us through email, we collect your email address and the contents of the messages." but the whole recovery bit is big on not saving the email address, if possible. So if I recover my account, it will still save my email address, right?

      Yeah, right now I'm just not doing anything in particular with email (including password reset requests): they go into my email inbox (which is through FastMail), and I'm not specifically deleting anything. I should look into it more and see if it's possible to automatically delete certain types of messages, since there's really no reason that I need to keep most of them. Really, password reset emails could be deleted immediately after being handled; there's no reason to retain them at all.

      (2) I don't know how non-profits work in Canada, but is there a guarantee you can't/won't sell our data in the future, or sell ~ to a company that will?

      I don't know if there's a way to truly guarantee it. There are a few barriers to being able to acquire a non-profit, and if I've been deleting the data all along I don't think a company looking for a big cache of user data would be particularly interested anyway, but I'm not sure. It might be worth looking into whether I can do certain things to the corporation's articles/bylaws/etc. to make it more difficult or impossible.

      4 votes
      1. Tenar Link Parent
        Thanks again for all the answers!

        2 votes