27 votes

Netflix, YouTube, Amazon and Apple accused of GDPR breach

43 comments

  1. [38]
    Octofox

    Similarly Google-owned YouTube also provided files that were mostly gibberish to the average person, including in esoteric file formats like JSON, he said.

    Does anyone know what the requirements for data export are? The main use case for it is to allow other websites to build importers so users can freely move between platforms and JSON is the perfect format for that.

    If you send the JSON pretty-printed, it's about as human-readable as you can get without building a whole program on top of it to display the data, which is pretty much what you already have on the website.
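
    As a minimal sketch of what "pretty printed" means here, the snippet below re-serializes a minified record with indentation using Python's standard `json` module. The record and its field names are made up for illustration and are not taken from any real export.

```python
import json

# Hypothetical minified export record; the field names are illustrative only.
minified = '{"username":"foobar","comments":[{"text":"some text","created_on":"2018-02-03"}]}'


def pretty(raw: str) -> str:
    """Re-serialize a JSON string with 2-space indentation and sorted keys."""
    return json.dumps(json.loads(raw), indent=2, sort_keys=True, ensure_ascii=False)


print(pretty(minified))
```

    The data is unchanged; only the whitespace differs, so the same file still works for an importer.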

    16 votes
    1. [22]
      vakieh

      Yeah, 'esoteric' is about the opposite of what I would call JSON. Do they want it written out in formal English paragraphs in a typeset Word document?

      One day our governments will be tech natives... sometimes I hope the oldies start dying faster.

      19 votes
      1. [5]
        Duncan

        JSON is esoteric to the "average" person in terms of "loading their data into Excel".

        Let's say you export your Fitbit steps for the last 2 years - you'd like to be able to get a nice graph of your monthly progress, and if the data were exported as CSV then the average person might have a chance to do it.

        But honestly, if they are not programmers or don't want to download and run converters then you cannot really argue that JSON is a great format for the average user.

        Personally, I prefer JSON - but that's because I understand it. None of my relatives would know what to do with it though.
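
        To give a sense of the "converter" step mentioned above, here is a rough sketch that turns a JSON step log into a monthly CSV that a spreadsheet can open. The record shape (`date`, `steps`) is an assumption made for the example; a real Fitbit export may be structured differently.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical export: one record per day. The shape is assumed, not Fitbit's actual format.
export = json.loads("""
[
  {"date": "2018-01-30", "steps": 8000},
  {"date": "2018-01-31", "steps": 9500},
  {"date": "2018-02-01", "steps": 7000}
]
""")


def monthly_totals(records):
    """Sum steps per YYYY-MM month and return a sorted list of (month, total)."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec["date"][:7]] += rec["steps"]
    return sorted(totals.items())


def to_csv(rows, header):
    """Write a header and rows to a CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


print(to_csv(monthly_totals(export), ["month", "steps"]))
```

        It is only a few lines, but it still assumes someone willing to run a script, which is the gap being described.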

        20 votes
        1. [2]
          Greg

          I honestly can't think of a better format for the job, though - this is automatically generated and logged data, it's never going to be particularly friendly for humans to read or interpret. Monthly progress logs from Fitbit might lend themselves well to CSV, but a dump from their database of everything they have on you is far easier to handle in JSON.

          Opening, reading, and interpreting large volumes of programmatically generated data is a specialist task whichever way you slice it; beyond the occasional ctrl-f on a huge block of text (at which point format is almost irrelevant), you'll need a custom written reader or importer for each source site. JSON allows that easily, and in my opinion complies with both the letter and spirit of the law. If they were using proprietary formats, weird binaries, or images of text (all of which I can imagine some companies trying to get away with), then I'd start getting annoyed.

          12 votes
          1. frickindeal

            How would an average non-tech person even open a JSON file? I'd probably try to open it in Notepad.

            4 votes
        2. [2]
          vakieh

          CSV would fail miserably as well, because the data will always be hella gappy. Data giants make pictures out of 0.01% data completeness states.

          2 votes
          1. Duncan

            This is often true - I replied in another comment that I would prefer CSV if it were done properly, but it isn't, so I download as JSON.

      2. Greg

        Even that's only half the battle. Native users of tech will at least offer some protection from arguments like "An Internet was sent by my staff at 10 o'clock in the morning on Friday. I got it yesterday [Tuesday].", but users aren't necessarily makers.

        Especially as the primary form of tech moves to easy-to-use, proprietary, walled-garden devices and ecosystems, the knowledge gap between those who build the systems and those who use them grows.

        I guess the silver lining is that the more ubiquitous technology is, the more jobs there are for developers and IT staff - meaning more trained people out there in the world who might catch these mistakes before they cause actual harm.

        7 votes
      3. [4]
        Comment deleted by author
        1. [3]
          vakieh

          Who is paying for that?

          1. [2]
            Ludo

            The companies collecting the data. If they don't want to, they could simply stop collecting data.

            4 votes
            1. vakieh

              This is not close to realistic. No company has the capacity to do this for more than a handful of users. You would simply see the EU geoblocked from every service with exposure to EU courts until the rioting saw the law overturned.

      4. [8]
        Octofox

        I could agree if it was all minified on a single line but JSON can be very readable.

        This is fine though:

        {
             "username": "foobar",
             "comment": "some text",
             "created_on": "2018-02-03"
        }
        

        Anyone could read that.

        6 votes
        1. [2]
          Duncan

          Yes, I agree that anyone can read it (assuming they know enough to open the JSON file they double-clicked with a text editor app and not something else), but having a normal user import that into another format [which doesn't explicitly import JSON] is quite difficult.

          5 votes
          1. Octofox

            The user doesn't do this. Competing websites build tools to import the data allowing them to easily move new customers over.

        2. [5]
          tumbzilla

          I think the main complaint about JSON, and also the thing that makes it such a useful serialization format, is the fact that you can have pretty complex nested structures in JSON.

          Even basic formatting, such as tab-indenting to show the nesting level of a field, isn't obvious to someone who is tech-illiterate.
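
          One common way to make nesting easier to scan is to flatten it into dotted paths, one value per line. The sketch below is a generic illustration with a hypothetical record, not something any of the companies in the article are known to provide.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into one dict keyed by dotted paths.

    Empty dicts and lists contribute no entries.
    """
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}{key}."))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}{index}."))
    else:
        # Drop the trailing dot that the recursion appended to the path.
        flat[prefix[:-1]] = value
    return flat


# Hypothetical nested record; the field names are illustrative only.
record = {
    "user": {"name": "foobar", "devices": [{"os": "android"}, {"os": "ios"}]},
    "created_on": "2018-02-03",
}

for path, value in flatten(record).items():
    print(f"{path} = {value}")
```

          Flattening trades structure for readability: it is easier to skim or paste into a spreadsheet, but an importer would usually want the original nested form.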

          4 votes
          1. [4]
            Greg

            Genuine question, and I extend this to @Duncan and @nsz as well: which format would you prefer?

            Data is never going to be accessible as prose - at least using a well known and (more or less) self documenting machine readable format means that any developer can write and publish their own take on a UI to browse, interpret, or import the information. If the company that controls the data made the UI, then it's actually a step harder to get out of their systems (and/or to reinterpret into a different interface).

            4 votes
            1. Duncan
              (edited )

              In an ideal world I would prefer CSV exports - but this only works if the export the company provides:

              1. is a proper CSV export that handles quotes, line breaks, etc.
              2. has the information split into separate CSV files where needed, for example:
              sales.csv = date, customer_id, product_id, quantity
              customer.csv = name, address, blah
              product.csv = product_id, name, details, cost, sell_price
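
              For the first point, a "proper" export mostly means using a real CSV writer rather than joining strings with commas. The sketch below uses Python's standard `csv` module and checks that fields containing commas, quotes, and line breaks survive a round trip. The product rows are invented for the example, and the column names are borrowed loosely from the `product.csv` layout above.

```python
import csv
import io

# Hypothetical product rows containing the characters that break naive exports.
rows = [
    {"product_id": "1", "name": 'Widget, "deluxe"', "details": "line one\nline two"},
    {"product_id": "2", "name": "Plain widget", "details": ""},
]


def export_csv(records, fieldnames):
    """Serialize records to CSV text, quoting fields only where needed."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=fieldnames, quoting=csv.QUOTE_MINIMAL)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def import_csv(text):
    """Parse CSV text back into a list of dicts keyed by the header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text, newline=""))]


text = export_csv(rows, ["product_id", "name", "details"])
print(text)
print(import_csv(text) == rows)
```

              An export that cannot pass this kind of round trip is the case where falling back to JSON makes sense.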
              

              This happens often but not always, so in reality I choose JSON formats as they are usually better than the other formats the company provides. I wrote a blog post about this, "How to Download your personal data from the cloud", which shows that Facebook's HTML export gives you much less than their JSON export. In that case it is best to download both versions: HTML to browse through and Ctrl+F, and JSON if you want to import it elsewhere.

              2 votes
            2. [2]
              nsz
              (edited )

              I don't have anything against JASON or any format in particular, I just think Apple, Google, or whoever should supply by default an interpreter or UI for that data: something to present it in a human-readable way.

              I had some questions about battery and sleep timers on my Windows 10 laptop. There is a great tool (built into Windows) that runs diagnostics and then generates an HTML report, which opens in any browser and is actually useful.

              1 vote
              1. Greg

                That seems fair. I guess the main issue there would be the difficulty of clearly legislating what's compliant and what isn't when it comes to UI, whereas machine readable format is more definitive. That said, it would certainly be nice to have as an addition (and as others have pointed out, some services do provide it).

      5. [3]
        nsz

        Should you have to be a mechanic to own a car? There is no reason the same idea should apply to computers. And this generation of digital natives … does not exist; they will be trying to swipe, tap and open an app store - accustomed to using the tools on offer, not developing their own. And there is nothing wrong with that; it seems like a terrible waste of energy to force everyone to understand a JASON file or any other obscure format, when these enormous companies with vast resources have a very clear understanding of what it means to be user-friendly and accessible - how else would we find that purchase button?

        It's clearly a lack of effort from these companies, and they should be called out on it. Why defend them? To what end?

        5 votes
        1. onyxleopard
          • Exemplary

          Should you have to be a mechanic to own a car?

          No, but if you are willing to treat a car like a black box, you have to be willing to rely on a mechanic who doesn’t treat the car like a black box to maintain and repair it, or simply not use a car at all.

          If you’re going to treat the applications/services you use like black boxes, then you have to be willing to rely on the maintainers and operators of those applications/services to handle your data in a format that is suitable for those applications/services to operate.

          FYI, the file format is JSON (not JASON), an initialism for JavaScript Object Notation. It is absolutely not an esoteric or obscure file format and is used to serialize vast amounts of data on a daily basis - far more data is stored as JSON than in Excel or any user-facing application that has ever existed.

          When these enormous companies with vast resources have a very clear understanding of what it means to be user-friendly and accessible - how else would we find that purchase button?

          The whole reason that internet-based applications and services are able to be user-friendly and accessible is due to a lot of stuff that is not user-facing. As I said earlier, it’s OK if you don’t want to familiarize yourself with the technologies backing the stuff you use. But, you can’t proclaim that they are purposefully obstructing you from understanding things that you expressly refuse to take an interest in. You can’t remain willfully ignorant, yet simultaneously demand operational transparency. If you required a human to go in and parse every one of your transactions on Amazon and write up a report in natural language prose, it would take years for your packages to arrive. It’s infeasible to have tech-illiterate people stipulate how technology should work.

          9 votes
        2. vakieh

          Should you have to be a mechanic to own a car?

          No, but you should be a driver before you give input on what rules should be enforced on driving one.

      6. [2]
        Comment deleted by author
        1. Greg

          In my experience it's a mixed bag - I posted a little further up with a similar point about some younger users not knowing the underpinnings of the tech, but I've also had plenty of conversations with older people who follow the classic Reddit "not a computer person" format and just will not read text or follow instructions. I've also had very good experiences with both older and younger people who are willing to learn and grateful for help - the older generally starting from "what does this do, and how?", the younger with "how does it do the things I'm already doing?".

          Some people make an effort to understand, but many don't - from what I've seen, that holds true across the generations.

          4 votes
    2. [14]
      nsz

      The main use case for it is to allow other websites to build importers (...)

      How do you figure that? Just browsing the Wikipedia article on GDPR it seems like the goal is to protect user data by default and allow users to control and understand the data companies have on them. Making an effort to present this data seems like an obvious requirement.

      5 votes
      1. [7]
        Adys

        GDPR IIRC does specifically talk about the exportability of data, and ensuring its programmatic-readability.

        In other words, "paragraphs of text in MS word" would 100% not fly.

        Furthermore, Google Takeout does include a UI which presents a bunch of the data contained in the export.

        9 votes
        1. [6]
          nsz

          Right that's exactly what all these companies mentioned in the article are lacking, an interpreter for the data and that's what they are getting called out on.

          2 votes
          1. [2]
            Comment deleted by author
            1. nsz

              Thanks for these links, and for bringing the discussion back. I sort of started to argue against a position no one was really holding.

              4 votes
          2. [4]
            Deimos

            The GDPR specifically exempts companies from needing to maintain interpreters for the data. From Recital 68 - Right of data portability:

            1. The data subject’s right to transmit or receive personal data concerning him or her should not create an obligation for the controllers to adopt or maintain processing systems which are technically compatible.

            They are required to provide you the data in an acceptable format, but they have no obligation to help you interpret it. It would certainly be nice if they did, but they're not required to.

            9 votes
            1. [3]
              nsz

              So the 'Data was Intelligible' column would refer to machine readability? or organisation? Either way looks like I've got to eat my hat.

              3 votes
              1. [2]
                Deimos

                It's hard to say exactly what that means, but as a guess/example of what sort of thing might be happening, say that you requested your data from Tildes. One of the files I send you has your comment votes in it, and it includes that you voted on "comment 115221". That's the ID of my comment above that you were replying to, so it's technically correct that I provided you data about what comments you had voted on, but it's not really intelligible since the comment ID is pretty meaningless.

                I expect that's what a lot of companies are doing, just dumping raw data like "comment IDs you liked" out, which isn't really very useful or intelligible to the recipients because it relies on other data that isn't included.
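
                As a rough illustration of the difference, the sketch below joins a list of voted comment IDs against the comments they refer to. Everything here is hypothetical: the field names, the second ID, and the export layout are invented for the example and do not describe how Tildes or any other site actually exports data.

```python
# Hypothetical export fragments; the layout and field names are illustrative only.
votes = [{"comment_id": 115221}, {"comment_id": 999999}]  # 999999 is a made-up ID
comments = {
    115221: {
        "author": "Deimos",
        "excerpt": "The GDPR specifically exempts companies from needing to maintain interpreters...",
    },
}


def resolve_votes(votes, comments):
    """Attach author and excerpt to each vote when the referenced comment is available."""
    resolved = []
    for vote in votes:
        comment = comments.get(vote["comment_id"])
        resolved.append({
            "comment_id": vote["comment_id"],
            "author": comment["author"] if comment else None,
            "excerpt": comment["excerpt"] if comment else None,
        })
    return resolved


for row in resolve_votes(votes, comments):
    print(row)
```

                The first vote becomes meaningful once the referenced comment is included; the second stays an opaque number, which is the situation a raw dump leaves the recipient in.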

                6 votes
                1. nsz

                  Ah yeah that makes sense, thanks for that.

                  2 votes
      2. [3]
        Greg

        I'd much prefer to have the data in an open format for anyone to build an interpreter for, rather than have to rely on an inherently biased UI created by the controlling company.

        5 votes
        1. unknown user

          AFAIK Firefox can display JSON in a nice format. But maybe this will be a new space for apps that can tell you stats about your takeouts and use heuristics to find bad patterns and privacy breaches.

          4 votes
        2. nsz

          I don't see how these two goals are mutually exclusive. Company X can provide the data in an open format as well as a default UI to view it in.

          2 votes
      3. [2]
        Soptik

        And Google did it, see your activity summary - all the actions you want to know about are there. The data dump is meant as a data dump so you can easily switch platforms; it's not meant to be viewed by casual users - that's what the activity page is for.

        1 vote
        1. nsz

          I found the relevant Wikipedia snippet.

          It talks about giving access rights to the individual user as well as the ability to transfer the data.

          The right of access (Article 15) is a data subject right.[17] It gives citizens the right to access their personal data and information about how this personal data is being processed. (…)

          A data subject must be able to transfer personal data from one electronic processing system to and into another, without being prevented from doing so by the data controller. (…)

          1 vote
      4. Octofox

        The GDPR as a whole is about protecting user data, but the export part is called the "data portability" part, which allows you to easily leave one bad service and take your data over to an alternative.

        1 vote
    3. Deimos

      The relevant GDPR section is Article 20 - "Right to data portability", which says:

      The data subject shall have the right to receive the personal data concerning him or her, which he or she has provided to a controller, in a structured, commonly used and machine-readable format [...]

      And Recital 68 - "Right of data portability" has some relevant info as well:

      1. To further strengthen the control over his or her own data, where the processing of personal data is carried out by automated means, the data subject should also be allowed to receive personal data concerning him or her which he or she has provided to a controller in a structured, commonly used, machine-readable and interoperable format, and to transmit it to another controller.

      2. Data controllers should be encouraged to develop interoperable formats that enable data portability.

      JSON is definitely structured, commonly used, machine-readable, and interoperable, so it's a reasonable choice of format.

      4 votes
  2. [4]
    patience_limited

    Again, losing the forest for the trees here. We might be able to read and interpret the file, but the content doesn't comply with GDPR requirements.

    The problem is, a log file in whatever format isn't going to provide you with user-friendly information about the entities accessing your data. An IP address doesn't map to a unique commercial enterprise.

    Presuming that those enterprises might be data brokerages, there's nothing to tell you how they might be collecting and disseminating your data further.

    13 votes
    1. unknown user

      I think the key quote in the article is (cc @Octofox):

      When tested, none of these systems provided the user with all relevant data. In most cases, users only got the raw data, but, for example, no information about who this data were shared with. (emphasis mine)

      So these data are gibberish in that sense, not the format. If the data was never ever shared, this entire thing would be more of a data security movement rather than a quest for privacy. These companies are trying to hide that behind the data they are providing. So patience_limited is right that we're made to miss the forest for the trees.

      13 votes
    2. [2]
      Deimos

      Another issue that I've seen brought up a few times is that the companies will send you the raw data, but not any information about things like machine-learning systems that they've fed it into, and the implications that they've generated from those, based on your data.

      For example, Netflix will send you a list of what you've watched and your thumbs-up/thumbs-downs, but not any information about how they're using that data to infer what types of shows/movies you like or dislike, the demographic(s) they believe you fit into, and so on.

      It's a bit of a weird situation, because it's not really data that you gave them, but they created it based on what you did give them, and then they're using it as data about you.

      8 votes
      1. [2]
        Comment deleted by author
        1. Deimos

          I think it's just not necessarily clear exactly what's covered. Continuing the Netflix example, they may not even have the data in a format that's as interpretable as "we think they like martial-arts movies set in modern times". It might be basically a machine-learning model that just gets applied to each show and generates a "there's an X% chance they'll like this" result without any ability to explain why, but then what can they provide you? A blob of binary data that represents "your preferences" for their recommendation system won't do any good without all of the other infrastructure and data around it.

          So it's just kind of a weird situation overall, and I think we probably won't necessarily know how it's supposed to be handled until some precedents get set through some of these complaints about companies not complying.

          5 votes
  3. patience_limited

    My domain of knowledge is based on U.S. HIPAA, not GDPR, and they cover very different data sets and rights.

    I'm not really in a position to opine in detail here, but I picked my spouse's brains, since his company provides web middleware for multinational clients.

    The middleware doesn't store any collected data persistently. But the clients' sites can deploy all kinds of cookies, web beacons and other tools that disseminate personally identifiable data to third parties - Facebook, Twitter, and other visible sharing buttons, Google Analytics, and so on.

    The clients are subject to GDPR requirements, but those third-party components don't necessarily allow the clients to provide all the data GDPR requires about what those tools are collecting. The general public may think of "Company X", whose website they've visited, as the custodian of the data, but this simply isn't the case. Again, not an expert, but this seems to be a blind spot in the GDPR requirements.

    Short of requiring everyone who holds personally identifiable data to register their holdings publicly, in a manner that the general public can search, it's still a black box as to who's doing what with their knowledge of you.

    3 votes