Facebook charged with misleading users on health data visibility
8 votes -
Data privacy bill unites Charles Koch and Big Tech
6 votes -
Why humanitarians are worried about Palantir’s new partnership with the UN
8 votes -
Even years later, Twitter doesn’t delete your direct messages
4 votes -
Millennial life: How young adulthood today compares with prior generations
10 votes -
Telcos sold highly sensitive customer GPS data
4 votes -
Millions are on the move in China, and Big Data is watching
9 votes -
How ontologies help data science make sense of disparate data
3 votes -
Data on discrimination
5 votes -
I tried to block Amazon from my life. It was impossible
13 votes -
What cities are getting wrong about public transportation
7 votes -
VOIPO.com data leak
7 votes -
Pew study: 74% of Facebook users did not know Facebook was maintaining a list of their interests/traits, 51% were uncomfortable with it, and 27% felt the list was inaccurate
21 votes -
Transparency-seeking OPEN Government Data Act signed into law
7 votes -
I made a program that creates the colour palette of a film
I originally saw posts on Reddit that extracted the average colour of each frame from a film and put them together to make a colour palette for that film; the original creator has a site called The Colors of Motion. I thought it would be cool to try to create a simple PowerShell script that does the same thing.
Here are a few examples:
Finding Nemo: https://i.imgur.com/8YwOlwK.png
The Bee Movie: https://i.imgur.com/umbd3co.png
Harry Potter and the Philosopher's Stone: https://i.imgur.com/6rsbv0M.png
I've hosted my code on GitHub, so if anyone wants to use my PowerShell script or suggest some ways to improve it, feel free. You can use pretty much any video file as input, as it uses ffmpeg to extract the frames.
GitHub link: https://github.com/ArkadiusBear/FilmStrip
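This isn't the author's PowerShell script, but a rough Python sketch of the same approach, assuming ffmpeg is on your PATH; the function names and sampling parameters here are my own:

```python
import subprocess

def average_color(frame, width, height):
    """Mean RGB of one raw rgb24 frame (bytes of length width*height*3)."""
    n = width * height
    r = sum(frame[0::3]) // n   # every 3rd byte starting at 0 is red
    g = sum(frame[1::3]) // n   # ... at 1 is green
    b = sum(frame[2::3]) // n   # ... at 2 is blue
    return (r, g, b)

def palette(video_path, width=320, height=180, fps=1):
    """One averaged colour per sampled frame, using ffmpeg to decode."""
    cmd = ["ffmpeg", "-i", video_path,
           "-vf", f"fps={fps},scale={width}:{height}",
           "-f", "rawvideo", "-pix_fmt", "rgb24", "-"]
    raw = subprocess.run(cmd, capture_output=True, check=True).stdout
    size = width * height * 3
    return [average_color(raw[i:i + size], width, height)
            for i in range(0, len(raw) - size + 1, size)]
```

Each tuple in the returned list is one stripe of the palette; drawing them side by side reproduces the strip images linked above.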
17 votes -
Open standards may finally give patients control of their data and care via Electronic Health Records
6 votes -
How Google tracks your personal information
7 votes -
At Blind, a security lapse revealed private complaints from Silicon Valley employees
13 votes -
Amazon sends 1,700 Alexa voice recordings to a random person
17 votes -
Facebook says new bug allowed apps access to private photos of up to 6.8m users
33 votes -
Remember backing up to diskettes? I’m sorry. I do, too.
11 votes -
"Mischievous responders" have been tainting the data about health disparities between LGBT youth and their peers
13 votes -
Google CEO Sundar Pichai testifies before the House Judiciary Committee on Data Collection
15 votes -
Your apps know where you were last night, and they’re not keeping it secret
23 votes -
Marriott admits hackers stole data on 500 million guests; passports and credit card info included
21 votes -
Where would a beginner start with data compression? What are some good books for it?
Mostly the title. I have experience with Python, and I was thinking of learning more about data compression. How should I proceed? And what are some good books I could read, covering both the specifics and the general theory of data compression, data management, and data in general?
15 votes -
Amazon admits it exposed customer email addresses, but refuses to give details
14 votes -
Unsecured database of millions of SMS text messages exposed password resets and two-factor codes
19 votes -
DeepMind’s move to transfer health unit to Google stirs data fears
11 votes -
Period-tracking apps are not for women
28 votes -
Fallout 76 bug accidentally deletes entire 50GB beta
18 votes -
Tim Cook's keynote address at the 40th International Conference of Data Protection and Privacy Commissioners
8 votes -
What are the best practices regarding personal files and encryption?
Over the past year I have done a lot to shore up my digital privacy and security. One of the last tasks I have to tackle is locking down the many personal files I have on my computer that have potentially compromising information in them (e.g. bank statements). Right now they are simply sitting on my hard drive, unencrypted. Theft of my device or a breach in access through the network would allow a frightening level of access to many of my records.
As such, what are my options for keeping certain files behind an encryption "shield"? Also, what are the potential tradeoffs for doing so? In researching the topic online I've read plenty of horror stories about people losing archives or whole drives due to encryption-related errors/mistakes. How can I protect against this scenario? Losing the files would be almost as bad as having them compromised!
I'm running Linux, but I'm far from tech-savvy, so I would either need a solution to be straightforward or I'd have to learn a lot to make sense of a more complicated solution. I'm willing to learn mainly because it's not an option for me to continue with my current, insecure setup. I do use a cloud-based password manager that allows for uploading of files, and I trust it enough with my passwords that I would trust it with my files, though I would like to avoid that situation if possible.
With all this in mind, what's a good solution for me to protect my personal files?
26 votes -
XML Data Munging Problem
Here’s a problem I had to solve at work this week that I enjoyed solving. I think it’s a good programming challenge that will test if you really grok XML.
Your input is some XML such as this:
<DOC>
  <TEXT PARTNO="000">
    <TAG ID="3">This</TAG> is <TAG ID="0">some *JUNK* data</TAG> .
  </TEXT>
  <TEXT PARTNO="001">
    *FOO* Sometimes <TAG ID="1">tags in <TAG ID="0">the data</TAG> are nested</TAG> .
  </TEXT>
  <TEXT PARTNO="002">
    In addition to <TAG ID="1">nested tags</TAG> , sometimes there is also <TAG ID="2">junk</TAG> we need to ignore .
  </TEXT>
  <TEXT PARTNO="003">
    *BAR*-1 <TAG ID="2">Junk</TAG> is marked by uppercase characters between asterisks and can also optionally be followed by a dash and then one or more digits . *JUNK*-123
  </TEXT>
  <TEXT PARTNO="004">
    Note that <TAG ID="4">*this*</TAG> is just emphasized . It's not <TAG ID="2">junk</TAG> !
  </TEXT>
</DOC>
The above XML has so-called in-line textual annotations, because the <TAG> elements are embedded within the document text itself.
Your goal is to convert the in-line XML annotations to so-called stand-off annotations, where the text is separated from the annotations and the annotations refer to the text by slicing into it as a character array with starting and ending character offsets. While in-line annotations are more human-readable, stand-off annotations are equally machine-readable, and stand-off annotations can be modified without changing the document content itself (the text is immutable).
The challenge, then, is to convert to a stand-off JSON format that includes the plain-text of the document and the XML tag annotations grouped by their tag element IDs. In order to preserve the annotation information from the original XML, you must keep track of each <TAG>'s starting and ending character offset within the plain-text of the document. The plain-text is defined as the character data in the XML document, ignoring any junk. We'll define junk as one or more uppercase ASCII characters [A-Z]+ between two * characters, optionally followed by a dash - and then one or more digits [0-9]+.
Here is the desired JSON output for the above example to test your solution:
{
  "data": "\nThis is some data .\n\n\nSometimes tags in the data are nested .\n\n\nIn addition to nested tags , sometimes there is also junk we need to ignore .\n\nJunk is marked by uppercase characters between asterisks and can also optionally be followed by a dash and then one or more digits . \n\nNote that *this* is just emphasized . It's not junk !\n\n",
  "entities": [
    {
      "id": 0,
      "mentions": [
        { "start": 9, "end": 18, "id": 0, "text": "some data" },
        { "start": 41, "end": 49, "id": 0, "text": "the data" }
      ]
    },
    {
      "id": 1,
      "mentions": [
        { "start": 33, "end": 60, "id": 1, "text": "tags in the data are nested" },
        { "start": 80, "end": 91, "id": 1, "text": "nested tags" }
      ]
    },
    {
      "id": 2,
      "mentions": [
        { "start": 118, "end": 122, "id": 2, "text": "junk" },
        { "start": 144, "end": 148, "id": 2, "text": "Junk" },
        { "start": 326, "end": 330, "id": 2, "text": "junk" }
      ]
    },
    {
      "id": 3,
      "mentions": [
        { "start": 1, "end": 5, "id": 3, "text": "This" }
      ]
    },
    {
      "id": 4,
      "mentions": [
        { "start": 289, "end": 295, "id": 4, "text": "*this*" }
      ]
    }
  ]
}
Python 3 solution here.
If you need a hint, see if you can find an event-based XML parser (or if you’re feeling really motivated, write your own).
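To illustrate the hint, here is my own rough sketch of an event-based approach in Python, using the standard library's expat bindings (this is not the linked solution, and it glosses over whitespace details around removed junk):

```python
import re
from xml.parsers import expat

# Junk: *UPPERCASE*, optionally followed by -digits, e.g. *JUNK* or *BAR*-1.
JUNK = re.compile(r"\*[A-Z]+\*(?:-[0-9]+)?")

def standoff(xml_text):
    buf = []          # plain-text pieces accumulated so far
    length = 0        # running character offset into the plain text
    open_tags = []    # stack of (tag id, start offset) for nested <TAG>s
    entities = {}     # tag id -> list of mention dicts

    def start(name, attrs):
        if name == "TAG":
            open_tags.append((int(attrs["ID"]), length))

    def end(name):
        if name == "TAG":
            tag_id, start_off = open_tags.pop()
            text = "".join(buf)[start_off:length]
            entities.setdefault(tag_id, []).append(
                {"start": start_off, "end": length, "id": tag_id, "text": text})

    def chars(data):
        nonlocal length
        clean = JUNK.sub("", data)   # drop junk before counting offsets
        buf.append(clean)
        length += len(clean)

    p = expat.ParserCreate()
    p.buffer_text = True             # coalesce adjacent character data
    p.StartElementHandler = start
    p.EndElementHandler = end
    p.CharacterDataHandler = chars
    p.Parse(xml_text, True)

    return {
        "data": "".join(buf),
        "entities": [{"id": i, "mentions": m}
                     for i, m in sorted(entities.items())],
    }
```

Because the handlers fire in document order, nesting falls out of a simple stack, and junk removal happens before any offsets are counted.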
4 votes -
Twitter makes datasets available containing accounts, tweets, and media from accounts associated with influence campaigns from the IRA and Iran
8 votes -
UK Biobank data on 500,000 people paves way to precision medicine
8 votes -
Advance (Reddit and Condé Nast owner) acquires majority stake in games and E-sports analytics firm Newzoo
6 votes -
Microsoft re-releases Windows 10 October 2018 update with explanation of data loss bug
23 votes -
Facebook Isn’t Sorry — It Just Wants Your Data
15 votes -
Data for good: Wonderful ways that big data is making the world better
7 votes -
Alphabet to shut Google+ social site after user data exposed
18 votes -
Did Facebook learn anything from the Cambridge Analytica debacle? An even bigger data breach suggests it didn’t.
14 votes -
What does big data look like when cross-referenced?
Google knows a lot about its users. Facebook knows a lot about its users. FitBit knows a lot about its users. And so on.
But what happens when these companies all sell their data sets to one another? It'd be pretty trivial to link even anonymized users from set to set by looking for specific features. If I went for a run, for example, Google tracked my location, FitBit tracked my heart rate, and Facebook tracked my status about my new best mile time. Thus, Google can narrow down who I am in the other sets using pre-existing information that coincides with theirs. With enough overlap they can figure out exactly who I am fairly easily. Furthermore, each additional layer of data makes this discovery process even easier for new data sets, as it gives more opportunities to confirm or rule out concurrent info. So when, say, Credit Karma, Comcast, and Amazon's data enter the fray, my online identity stops looking like an individual egg in each basket and starts looking like a whole lot of eggs all in one. And they can do this across millions or billions of users, not just me!
I don't know for certain that this is a thing that happens, but... I have to assume it definitely is happening, right? How could it not? With how valuable data is and how loose protections are, this seems like a logical and potentially very lucrative step.
Right now, is there an aggregate version of "me" that exists in a data store somewhere that is a more comprehensive and accurate picture than my own self-image? After all, my memory and perception are imperfect and biased, but data stores aren't.
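The linkage mechanism described above can be sketched in a few lines of Python. Everything here is made up for illustration: two "anonymized" logs that share no names, only a quasi-identifier (a time bucket plus a coarse location cell), which is enough to join them:

```python
# Hypothetical "anonymized" datasets: neither carries a name, but both
# happened to record the same underlying event (a morning run).
location_log = [  # e.g. from a maps provider
    {"user": "g-123", "when": "2019-01-20T07:05", "cell": (40.74, -74.00)},
    {"user": "g-456", "when": "2019-01-20T07:05", "cell": (34.05, -118.24)},
]
fitness_log = [  # e.g. from a fitness tracker
    {"user": "f-999", "when": "2019-01-20T07:05", "cell": (40.74, -74.00)},
]

def link(a, b):
    """Map each user id in b to a user id in a that shares a quasi-identifier."""
    index = {(r["when"], r["cell"]): r["user"] for r in a}
    return {r["user"]: index[(r["when"], r["cell"])]
            for r in b if (r["when"], r["cell"]) in index}

print(link(location_log, fitness_log))  # {'f-999': 'g-123'}
```

With real data the join keys are fuzzier (time windows, rounded coordinates, purchase patterns), but each additional dataset adds more keys to match on, which is exactly the compounding effect described above.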
6 votes -
Data Factories
6 votes -
No cash needed at this cafe. Students pay the tab with their personal data.
31 votes -
The Opportunity Atlas
5 votes -
iPhone iOS passcode bypass hack exposes contacts, photos
8 votes -
China's Social Credit system: The first modern digital dictatorship
8 votes -
Introduce a Stats page?
Would love to see different live data on things like user counts, post/comment frequency, etc.
12 votes