The experience of installing, setting up and using the operating systems is an exercise in finding novel ways to decline things or simply capitulating to see if it makes the nagging go away.
I just had an infuriating one of my own. My stereo has an option to opt out of data collection, but it's worded as 'Rejecting the privacy policy.' It will nag you on every restart, even if you accept and then turn it off manually later.
All client-side data collection should be opt-in only, with no prompts to do so unless an account is logged into. If that's not deemed acceptable, at the very least user selections to opt out should be persisted just as much as the opt-in.
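To make the point about persisting opt-outs concrete, here is a minimal Python sketch, not anything a real product ships. `ConsentStore`, the `collect_data` key and the JSON file are all made-up names. The only idea it shows is that a "no" is written to disk the same way a "yes" is, and that the prompt appears only while no answer has been recorded.

```python
import json
from pathlib import Path
from typing import Optional


class ConsentStore:
    """Hypothetical store that remembers the user's choice, whichever way it went."""

    def __init__(self, path: Path):
        self.path = path

    def load(self) -> Optional[bool]:
        # None means the user has never answered the question.
        if not self.path.exists():
            return None
        return json.loads(self.path.read_text())["collect_data"]

    def save(self, collect_data: bool) -> None:
        # An opt-out is written to disk exactly like an opt-in.
        self.path.write_text(json.dumps({"collect_data": collect_data}))

    def should_prompt(self) -> bool:
        # Ask only while no decision has been recorded.
        return self.load() is None

    def may_collect(self) -> bool:
        # Undecided counts as "no": collection is opt-in.
        return self.load() is True
```

With this shape, a restart after the user declined finds a stored `False`, so there is nothing left to nag about.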
Agree completely on the last part, but I wonder how to properly balance this with the need for not-so-sensitive not-so-identifying information for telemetry purposes. If I have my own hard- and software out there, I might want to receive logs from rare crashes, and maybe I am not able to reliably scrub identifying info from those logs, or maybe the crash logs are practically unidentifying. Should I hope that the user goes into the settings and enables telemetry?
I don't know why your comment has unearthed some deep primal rage within me, but I'm going to go on a super short rant. Up front: none of this anger is directed at you, and this comment should be marked as noise.
My last job was very very closely tied to customer support. I was the only engineer on a cash-cow project, and had one customer support person dedicated to fielding issues with that product. Dealing with crashes or bugs from customers without telemetry is the fucking woooooorrrsssssstttttt. People come to you angry and disrespectful as all hell, and then even when you point them to the exact location of the log files and ask them to send them to you, it still takes 5 emails of "We're working on it but if you'd send us those logs we asked for it would help us a lot" to get them to send them over, and the whole time they're complaining about how long it is taking to fix while ignoring every question and request you send them. At one point we got fed up and started adding telemetry. We hated the idea, but it was better than the goddamn pulling teeth of getting logs from people. I got so burnt out from dealing with customers I don't think I'll ever work another job where I interact with customers unless I have a way to circumvent them to get the data I need. I don't think that is the reason or way most companies are using telemetry. This is very unrelated to your post so go ahead and mark this as noise, but something about the logging statement has really pointed out some unresolved issues I might need to bring up with my therapist this week....
Yea, most end users don't submit logs, so sure, have detailed local telemetry logging enabled (heck, I would love this as an end user), but don't upload it automatically without opt-in. Lower the priority of any ticket/issue that doesn't have that log.
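A rough Python sketch of that split, assuming a plain text log file and a caller-supplied `send` function standing in for whatever transport you'd really use. `LocalTelemetry` and `ticket_priority` are hypothetical names, not an existing library.

```python
from pathlib import Path
from typing import Callable


class LocalTelemetry:
    """Always logs locally; uploads only after an explicit opt-in."""

    def __init__(self, log_path: Path, upload_opt_in: bool = False):
        self.log_path = log_path
        self.upload_opt_in = upload_opt_in  # stays off unless the user turns it on

    def record(self, event: str) -> None:
        # Detailed logging stays on the user's own machine.
        with self.log_path.open("a", encoding="utf-8") as f:
            f.write(event + "\n")

    def upload(self, send: Callable[[str], None]) -> bool:
        # `send` is a hypothetical transport, e.g. an HTTPS POST.
        if not self.upload_opt_in or not self.log_path.exists():
            return False
        send(self.log_path.read_text(encoding="utf-8"))
        return True


def ticket_priority(has_log: bool) -> str:
    # Tickets that arrive without a log are handled at lower priority.
    return "normal" if has_log else "low"
```

The log is there for the user to read or attach by hand either way; nothing leaves the machine until `upload_opt_in` is true.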
Somewhat tangential rant to your rant as well:
"We're working on it but if you'd send us those logs we asked for it would help us a lot" to get them to send them over, and the whole time they're complaining about how long it is taking to fix while ignoring every question and request you send them.
I've had to interact with vendors professionally, and the number of them that don't even glance at the ticket or provided logs before saying 'please provide logs' and bouncing the ticket back is too damn high. Then we do as they say, then they dodge and ignore every request until we involve a VP who escalates to higher levels of their org on the business side, which finally ends up getting the ticket routed appropriately and solved in < 8 hours with no new information provided.
Yeah that's fair, I suppose it is easy for me to be frustrated because I know my team wasn't doing that. I'm best friends with that person so I knew that they only asked for things that were necessary, but our small 30-something person company isn't indicative of how the rest of the industry acts. Especially at a customer support level.
I hear you on the information being useful. Did you ever think of having a feature in the app where the user could press a button and it would send you the logs instead of collecting telemetry continuously? (Sorry if that’s not what you were doing, but it sounded like it from your description.)
I have dealt with logs that showed where a crash occurred, but it was useless without one more bit of info. When a crash occurred, we let the user enter a brief description of what they were doing. About 1 in 100 had a useful description that helped us figure out the problem. The other 99 usually had swear-filled rants if they had anything at all. It’s certainly a hard problem but there are less invasive ways to solve it than always collecting info from all users without their consent, which is what most businesses end up doing.
So it's been a number of years, but when I left, what we had landed on was a multi-pronged approach:
The most minimal of minimal telemetry auto-reporting to us. EDIT: I should note that when I worked there the team size meant I was on all sales calls as the technical rep, so I made a point to explain to all customers what telemetry we collected, why we collected it, and how to opt out in the app.
When a crash occurs, pop up a dialog that says "Hey, looks like something went wrong. Do you want us to auto-generate a support ticket for you? You can do this with or without sending your log file to us, but sending the logs will greatly speed up how quickly we can resolve your issue" and put a checkbox below that lets users decide whether they want us to have their logs or not (it used to not have the checkbox, but people got pissed about that, so we made it optional).
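The crash-dialog part could look roughly like this in Python. This is a sketch of the idea only, not the plugin's actual code: `SupportTicket` and `build_ticket` are invented names, and the checkbox state is passed in as a plain boolean.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SupportTicket:
    summary: str
    log_text: Optional[str] = None  # None when the user declined to share logs


def build_ticket(summary: str, log_text: str, attach_logs: bool) -> SupportTicket:
    # The ticket is created either way; the log travels only if the box was checked.
    return SupportTicket(summary, log_text if attach_logs else None)
```

Keeping the decision in one function means there is a single place where log contents can end up in an outgoing ticket.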
The two extra little wrinkles that make this extra tricky are:
The product I worked on was an Excel plugin for doing NLP. Which means sometimes when things shit the bed it's because of us, but sometimes it is Microsoft, and users aren't always the best at determining which is which.
There isn't a good way to auto-send logs to yourself on bugs that don't cause crashes, like starting a job to get sentiment analysis of 10 million tweets but the job hangs indefinitely. Without getting super deep into implementation details, there wasn't a good way to figure out when these bugs occurred from the program's perspective, so we were 100% reliant on customers reporting to us, and no matter how many places we put an "attach log files" checkbox to do the work for them, very few people click that checkbox.
Often it’s the case where some Very Important User used the system and it failed, then someone else goes to report it to support but neither the reporter nor we can replicate the issue. Sometimes we have resorted to screen sharing sessions and found it was the user's anti-data-leak software ruining things.
Automatic error reporting would save so much time and frustration in these situations as we don’t have to involve the user at all when investigating.
I was mostly thinking of websites where the options are 'Click the obvious button to opt-in' or 'Click the non-obvious button to fill out a form to partially opt-out.'
It's their abuse of the popover which trains users to auto-click the opt-in, making it more of an opt-out formality than a proper opt-in. I guess what's needed is a standardized popup which just says 'May we track everything you do on this website?' with an option to say 'Never grant this permission.' Do Not Track was supposed to do this. Look how well that ended.
Should I hope that the user goes into the settings and enables telemetry?
Ultimately, yes if you want it to be automatic, though I suppose one un-checked box to opt in on install would be ok. I also approve of the classic "This program crashed, click this button to send detailed information about what you were doing to the developer so they can try to fix it".
Major parts of consent are that you understand what you are agreeing to, and that no means no. Most websites and software do not give the slightest respect to either of these, especially when it is well known that users do not read anything that impedes their desired usage. For a user to consent, that consent must be given in a non-blocking manner.
I think I agree with you, though I'm not sure. Here's my thought: Both options of the consent form should be treated at least equally. Putting one in a graphically suggestive form, "forgetting" one setting but not the other, making rejection harder than consent; all of these are invalid in my book.
What is valid consent (though maybe shitty UI design, depending on how critical your telemetry is) in my book though, is a simple, blocking form. One button to reject all, one to accept all, one for detailed settings if need be. Buttons are clearly labeled, do what they say, and none is given preferential treatment through e.g. UI design. Info is provided on what data is sent and stored, whether it's anonymous and whether it might be deanonymizable. If we can't have that, we might be going a bit too far here in my opinion. Consider for example the many websites that we expect to store some amount of personal data.
I'd say yes, we're mostly on the same page.
However, the reason I say non-blocking is because users don't read if it's interfering with what they're doing.
I've personally fielded support calls like this:
User: I can't log in, is the app broken?
Me: What does the error say?
User: What error?
This is followed by a remote session, where the user clicks through the prompt almost instinctively. The user was receiving a popup that their password was wrong, and that they would be locked out after 5 attempts. They had already locked their account before even calling.
You could put a random popup on a desktop app that says 'By clicking OK, you consent for this app to send every password you copy or type to me, so that I can extract $50 from your bank account whenever I want' and I'll bet you a nickel you get at least a 40% clickthrough rate.
If the button says OK or Accept, and they were trying to do something, they will click that button and not read another word.
Yeeeah, I can certainly see that. Maybe the key word should be "consent" and "decline" here? I suspect GDPR cookie popups have had a good bit of training effect. That way, the default option would at least be safe.
Also a bit of a "fuck you if you don't read what you're accepting". I can appreciate the art of building software such that functionally brain dead people can operate it, but I don't have to compromise other needs to ensure that they will be protected from their own stupidity. So if someone ends up clicking consent before reading, too bad.
I don't like the no-reading thing either. But people are creatures of habit. So 'boy who cried wolf' syndrome for things like permissions and security alerts means the majority just get trained to click 'allow all' no matter what. And when 90% or more of all agreements require a few years of law school to decipher and the choice is always 'accept or no functionality for you,' it's unsurprising that nobody reads.
And yes, it's annoying to need to work around these problems, but being aware of them is what makes me take the pro-consumer stance by default. If it is well known that 90% of the population will just accept the default option, the default option should be the actual better option for the consumer.
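I wonder how to properly balance this with the need for not-so-sensitive not-so-identifying information for telemetry purposes.
The problem is that it’s really difficult to determine whether or not something is identifying. For example, there was a case recently where taxis in New York had their route information made publicly available. It contained no information about who was picked up or what they were doing, but it didn’t take long for internet sleuths to show that you could pretty easily identify celebrities, how well they tipped, and where they were going.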
On macOS if you look at a crash log, you can see that even with your username removed, it contains a list of every running application on your computer and what libraries they have loaded at the time any app crashed. In my case, I often run my own programs that are unique to me and my machine. And just like browser fingerprinting, the collection of loaded libraries on my machine often includes plugins that can probably uniquely identify me.
It is very difficult to anonymize data properly. (I would argue it’s probably impossible.)
Yes, it's very difficult to do it in a way that makes it safe to make anonymized data public.
I think it's a good safeguard for internal data though. It guards against mistakes where some people don't understand company policy.
You might compare with password storage. Passwords are typically hashed and kept private. One or the other isn't enough.
Absolutely. I was thinking about that while posting. Properly anonymizing things is hard, and things as simple as your hardware spec can give a lot of information and/or decrease the number of possible subjects of that piece of info by a factor of 100 or more.
So I completely agree. But I also think it's possible to make telemetry for technical purposes reasonably close to properly anonymous. And there are theoretical considerations for proper anonymization that involve statistics and information theory and should (if you model your assumptions properly) give properly anonymous results. These are, however, quite restrictive in the sense that even things that no reasonable person (or 4chan sleuth) would find deanonymizable, are considered not anonymous enough.
To give a naive shot at the macOS case, without having read the literature I mentioned in passing above, you'd ask for every library and app loaded "how much information does this give to an attacker?". At the least, a boolean feature will cut your subject list in half (at worst, it'll rule in only one individual because it's unique to that person). You'll quickly realize that after only a few dozen boolean features, you've limited the subject list to a single person, even starting from the world population.
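A back-of-the-envelope version of that estimate in Python, as a sketch only. It assumes a world population of roughly 8 billion and, unrealistically, that the features are statistically independent; `bits_to_identify`, `surprisal_bits` and `remaining_candidates` are names invented here. A trait shared by half of all people reveals one bit, so about 33 such traits already exceed the roughly 32.9 bits needed to single out one person.

```python
import math
from typing import Iterable

WORLD_POPULATION = 8_000_000_000  # rough figure, an assumption


def bits_to_identify(population: int) -> float:
    """Bits of information needed to single out one member of the population."""
    return math.log2(population)


def surprisal_bits(share: float) -> float:
    """Bits revealed by observing a trait that a `share` of people have."""
    return -math.log2(share)


def remaining_candidates(population: int, shares: Iterable[float]) -> float:
    """Expected candidates left after observing traits with the given shares.

    Assumes the traits are independent, which real crash-log contents are not.
    """
    remaining = float(population)
    for share in shares:
        remaining *= share
    return remaining


if __name__ == "__main__":
    print(f"bits needed: {bits_to_identify(WORLD_POPULATION):.1f}")
    print(f"a trait 1 in 100 people have reveals {surprisal_bits(0.01):.2f} bits")
    print(f"left after 33 coin-flip traits: "
          f"{remaining_candidates(WORLD_POPULATION, [0.5] * 33):.2f}")
```

Rare traits do far more damage than coin-flip ones: something only 1 in 100 people have is worth more than six evenly split features on its own.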
FWIW, the taxi database smells immediately vulnerable to me. It's got someone's address right in there, basically.
Definitely agree on all points with the author.
What I found most interesting is the live reader count metric on the top right of the article page. When you click on "magic enabled", it displays small shapes that represent how far along other readers are through the article. Never seen something like this before!
That sounds like a really reasonable way to do it! Good on your company for being so up front about it.
That was a pleasure to read.