Activity

Votes

Comments

New

All activity

Showing only topics with the tag "data". Back to normal view

Prev Next

Your apps know where you were last night, and they’re not keeping it secret
~tech
- privacy
Article 2735 words
5 comments

The New York Times

December 10, 2018

23 votes
Marriott admits hackers stole data on 500 million guests; passports and credit card info included
~tech
- privacy
- security.cyber
Article 585 words
10 comments

Forbes

November 30, 2018

21 votes
Where would a beginner start with data compression? What are some good books for it?

~comp Ask (advice)

Mostly the title. I have experience with Python, and I was thinking of learning more about data compression. How should I proceed? And what are some good books I could read, both about specifics...

Mostly the title. I have experience with Python, and I was thinking of learning more about data compression. How should I proceed? And what are some good books I could read, both about specifics and abstracts of data compression, data management, data in general.

10 comments

Wolf

November 27, 2018

15 votes
Amazon admits it exposed customer email addresses, but refuses to give details
~tech
- amazon
- privacy
- security
Article 601 words
1 comment

TechCrunch

November 21, 2018

14 votes
Unsecured database of millions of SMS text messages exposed password resets and two-factor codes
~tech
- security
Article 726 words
4 comments

TechCrunch

November 16, 2018

19 votes
DeepMind’s move to transfer health unit to Google stirs data fears
~health
- healthcare
Article 669 words
4 comments

ft.com

November 14, 2018

11 votes
Period-tracking apps are not for women

~health Article 3534 words

26 comments

Vox

November 14, 2018

28 votes
Fallout 76 bug accidentally deletes entire 50GB beta

~games Article 151 words

6 comments

Kotaku

October 31, 2018

18 votes
Tim Cook's keynote address at the 40th International Conference of Data Protection and Privacy Commissioners
~tech
- privacy
Video 22:12
1 comment

YouTube: European Data Protection Supervisor

October 24, 2018

8 votes
What are the best practices regarding personal files and encryption?
~tech
- privacy
- linux
- security
Ask
Over the past year I have done a lot to shore up my digital privacy and security. One of the last tasks I have to tackle is locking down the many personal files I have on my computer that have...

Over the past year I have done a lot to shore up my digital privacy and security. One of the last tasks I have to tackle is locking down the many personal files I have on my computer that have potentially compromising information in them (e.g. bank statements). Right now they are simply sitting on my hard drive, unencrypted. Theft of my device or a breach in access through the network would allow a frightening level of access to many of my records.

As such, what are my options for keeping certain files behind an encryption "shield"? Also, what are the potential tradeoffs for doing so? In researching the topic online I've read plenty of horror stories about people losing archives or whole drives due to encryption-related errors/mistakes. How can I protect against this scenario? Losing the files would be almost as bad as having them compromised!

I'm running Linux, but I'm far from tech-savvy, so I would either need a solution to be straightforward or I'd have to learn a lot to make sense of a more complicated solution. I'm willing to learn mainly because it's not an option for me to continue with my current, insecure setup. I do use a cloud-based password manager that allows for uploading of files, and I trust it enough with my passwords that I would trust it with my files, though I would like to avoid that situation if possible.

With all this in mind, what's a good solution for me to protect my personal files?

11 comments

kfwyre

October 2, 2018

26 votes

XML Data Munging Problem

~comp

programming

Text 568 words

Here’s a problem I had to solve at work this week that I enjoyed solving. I think it’s a good programming challenge that will test if you really grok XML. Your input is some XML such as this:...

Here’s a problem I had to solve at work this week that I enjoyed solving. I think it’s a good programming challenge that will test if you really grok XML.

Your input is some XML such as this:

<DOC>
<TEXT PARTNO="000">
<TAG ID="3">This</TAG> is <TAG ID="0">some *JUNK* data</TAG> .
</TEXT>
<TEXT PARTNO="001">
*FOO* Sometimes <TAG ID="1">tags in <TAG ID="0">the data</TAG> are nested</TAG> .
</TEXT>
<TEXT PARTNO="002">
In addition to <TAG ID="1">nested tags</TAG> , sometimes there is also <TAG ID="2">junk</TAG> we need to ignore .
</TEXT>
<TEXT PARTNO="003">*BAR*-1
<TAG ID="2">Junk</TAG> is marked by uppercase characters between asterisks and can also optionally be followed by a dash and then one or more digits . *JUNK*-123
</TEXT>
<TEXT PARTNO="004">
Note that <TAG ID="4">*this*</TAG> is just emphasized . It's not <TAG ID="2">junk</TAG> !
</TEXT>
</DOC>

The above XML has so-called in-line textual annotations because the XML <TAG> elements are embedded within the document text itself.

Your goal is to convert the in-line XML annotations to so-called stand-off annotations where the text is separated from the annotations and the annotations refer to the text via slicing into the text as a character array with starting and ending character offsets. While in-line annotations are more human-readable, stand-off annotations are equally machine-readable, and stand-off annotations can be modified without changing the document content itself (the text is immutable).

The challenge, then, is to convert to a stand-off JSON format that includes the plain-text of the document and the XML tag annotations grouped by their tag element IDs. In order to preserve the annotation information from the original XML, you must keep track of each <TAG>’s starting and ending character offset within the plain-text of the document. The plain-text is defined as the character data in the XML document ignoring any junk. We’ll define junk as one or more uppercase ASCII characters [A-Z]+ between two *, and optionally a trailing dash - followed by any number of digits [0-9]+.

Here is the desired JSON output for the above example to test your solution:

{
  "data": "\nThis is some data .\n\n\nSometimes tags in the data are nested .\n\n\nIn addition to nested tags , sometimes there is also junk we need to ignore .\n\nJunk is marked by uppercase characters between asterisks and can also optionally be followed by a dash and then one or more digits . \n\nNote that *this* is just emphasized . It's not junk !\n\n",
  "entities": [
    {
      "id": 0,
      "mentions": [
        {
          "start": 9,
          "end": 18,
          "id": 0,
          "text": "some data"
        },
        {
          "start": 41,
          "end": 49,
          "id": 0,
          "text": "the data"
        }
      ]
    },
    {
      "id": 1,
      "mentions": [
        {
          "start": 33,
          "end": 60,
          "id": 1,
          "text": "tags in the data are nested"
        },
        {
          "start": 80,
          "end": 91,
          "id": 1,
          "text": "nested tags"
        }
      ]
    },
    {
      "id": 2,
      "mentions": [
        {
          "start": 118,
          "end": 122,
          "id": 2,
          "text": "junk"
        },
        {
          "start": 144,
          "end": 148,
          "id": 2,
          "text": "Junk"
        },
        {
          "start": 326,
          "end": 330,
          "id": 2,
          "text": "junk"
        }
      ]
    },
    {
      "id": 3,
      "mentions": [
        {
          "start": 1,
          "end": 5,
          "id": 3,
          "text": "This"
        }
      ]
    },
    {
      "id": 4,
      "mentions": [
        {
          "start": 289,
          "end": 295,
          "id": 4,
          "text": "*this*"
        }
      ]
    }
  ]
}

Python 3 solution here.

If you need a hint, see if you can find an event-based XML parser (or if you’re feeling really motivated, write your own).

4 votes

Twitter makes datasets available containing accounts, tweets, and media from accounts associated with influence campaigns from the IRA and Iran
~tech
- social media
Tweet
0 comments

Twitter

October 17, 2018

8 votes
UK Biobank data on 500,000 people paves way to precision medicine
~health
- healthcare
- medicine
Article 751 words
3 comments

Nature

October 15, 2018

8 votes
Advance (Reddit and Condé Nast owner) acquires majority stake in games and E-sports analytics firm Newzoo

~games Article 1027 words

1 comment

newzoo.com

October 14, 2018

6 votes
Microsoft re-releases Windows 10 October 2018 update with explanation of data loss bug
~tech
- microsoft
Article 1123 words
6 comments

windows.com

October 10, 2018

23 votes
Facebook Isn’t Sorry — It Just Wants Your Data
~tech
- facebook
- privacy
Article 1262 words
1 comment

BuzzFeed News

October 9, 2018

15 votes
Data for good: Wonderful ways that big data is making the world better

~tech Article 2210 words

8 comments

Hewlett Packard Enterprise

October 3, 2018

7 votes
Alphabet to shut Google+ social site after user data exposed
~tech
- social media
- google.plus
Article 729 words
4 comments

Reuters

October 9, 2018

18 votes
Did Facebook learn anything from the Cambridge Analytica debacle? An even bigger data breach suggests it didn’t.
~tech
Article 785 words
1 comment

The New York Times

October 7, 2018

14 votes
What does big data look like when cross-referenced?
~tech
- privacy
Ask
Google knows a lot about its users. Facebook knows a lot about its users. FitBit knows a lot about its users. And so on. But what happens when these companies all sell their data sets to one...

Google knows a lot about its users. Facebook knows a lot about its users. FitBit knows a lot about its users. And so on.

But what happens when these companies all sell their data sets to one another? It'd be pretty trivial to link even anonymized users from set to set by looking for specific features. If I went for a run, Google tracked my location, FitBit tracked my heart rate, and Facebook tracked my status about my new best mile time, for example. Thus, Google can narrow down who I am in the other sets using pre-existing information that coincides with theirs. With enough overlap they can figure out exactly who I am fairly easily. Furthermore, each additional layer of data makes this discovery process from new data sets even easier, as it gives more opportunities to confirm or rule out concurrent info. So then when, say, Credit Karma, Comcast, and Amazon's data enter the fray, my online identity stops looking like an individual egg in each different basket but a whole lot of eggs in all in one. And they can do this across millions/billions of users--not just me!

I don't know for certain that this is a thing that happens, but... I have to assume it definitely is happening, right? How could it not? With how valuable data is and how loose protections are, this seems like a logical and potentially very lucrative step.

Right now, is there an aggregate version of "me" that exists in a data store somewhere that is a more comprehensive and accurate picture than my own self-image? After all, my memory and perception are imperfect and biased, but data stores aren't.

4 comments

kfwyre

October 3, 2018

6 votes
Data Factories
~tech
- facebook
- google
- privacy
Article 2371 words
0 comments

Stratechery

October 2, 2018

6 votes
No cash needed at this cafe. Students pay the tab with their personal data.
~tech
- privacy
Article 847 words
18 comments

NPR

September 29, 2018

31 votes
The Opportunity Atlas

~life Link

0 comments

opportunityatlas.org

October 1, 2018

5 votes
iPhone iOS passcode bypass hack exposes contacts, photos
~tech
- smartphones
- ios
- security
Link
1 comment

threatpost.com

September 29, 2018

8 votes
China's Social Credit system: The first modern digital dictatorship

~tech Article 1554 words, published Sep 17 2018

1 comment

ABC

September 27, 2018

8 votes
Introduce a Stats page?
~tildes
- suggestions
Text 16 words
Would love to see different live data on things like user counts, post/comment frequency, etc

1 comment

meghan

September 26, 2018

12 votes
Introducing Firefox Monitor, helping people take control after a data breach
~tech
- browsers
- security
Article 466 words
5 comments

Mozilla

September 25, 2018

24 votes
NCIX data breach - The WAN Show Sept 21, 2018

~tech Video 1:00:43

6 comments

YouTube: Linus Tech Tips

September 22, 2018

7 votes
A life insurance company wants to track your fitness data
~tech
- privacy
Article 1085 words
1 comment

Vox

September 21, 2018

10 votes
NCIX Data Breach - after bankruptcy, terabytes of unencrypted customer/company data have been sold to multiple buyers
~tech
- security
- privacy
Link
1 comment

privacyfly.com

September 20, 2018

20 votes
How Game Apps That Captivate Kids Have Been Collecting Their Data
~tech
- privacy
Article 1944 words
0 comments

The New York Times

September 13, 2018

11 votes
A call for principle-based international agreements to govern law enforcement access to data
~tech
- microsoft
- privacy
Article 618 words
2 comments

microsoft.com

September 11, 2018

7 votes
A year later, Equifax lost your data but faced little fallout
~tech
- privacy
Article 1049 words
2 comments

TechCrunch

September 8, 2018

17 votes
Who controls your data? Nine reporters in London, Paris, New York & San Francisco filed more than 150 requests for personal data to 30+ popular tech companies
~tech
- privacy
Article 3731 words
0 comments

Engadget

September 8, 2018

8 votes
Fitbit's 150 billion hours of heart data reveal secrets about health

~health Article 2247 words, published Aug 27 2018

0 comments

Yahoo

September 5, 2018

11 votes
Should Grindr users worry about what China will do with their data?
~tech
- privacy
- security
Article 1202 words, published Aug 31 2018
1 comment

The Conversation

September 3, 2018

16 votes
Google and Mastercard cut a secret ad deal to track retail sales
~tech
- google
- privacy
Article 92 words
29 comments

Bloomberg

August 31, 2018

26 votes
How do you back up your data?

~talk Ask (survey)

...you do back up your data, don't you?

43 comments

geosmin

August 21, 2018

36 votes
The tech industry is lobbying for federal data & privacy regulation that is friendly to the tech industry, but hostile to users' interests
~tech
- privacy
Article 1471 words
4 comments

The New York Times

August 27, 2018

11 votes
Venmo's public API exposes millions of transactions, startling users
~tech
- privacy
- security
Article 747 words, published Jul 18 2018
1 comment

programmableweb.com

August 27, 2018

10 votes
Verizon throttled fire department’s “unlimited” data during California wildfire

~tech Article 1840 words

7 comments

Ars Technica

August 24, 2018

17 votes
California wildfires: Verizon throttled data during crisis

~tech Article 652 words

9 comments

BBC

August 22, 2018

24 votes
The Data Detox Kit- An 8 day challenge to clean up your online data.
~tech
- privacy
Link
2 comments

myshadow.org

August 7, 2018

16 votes
Health effects of overweight and obesity in 195 countries over twenty-five years
~health
- medicine
Link
0 comments

nejm.org

August 16, 2018

4 votes
CCleaner provokes fury over Active Monitoring, user data collection
~tech
- privacy
Article 751 words
15 comments

ZDNet

August 3, 2018

28 votes
Facebook in talks with banks to add your financial information to Messenger
~tech
Article 262 words
31 comments

CNBC

August 6, 2018

18 votes
538 shares largest dataset of Russian troll tweets, compiled by two professors at Clemson University
~tech
- social media
Article 1373 words
10 comments

FiveThirtyEight

July 31, 2018

17 votes
Training frequency for strength development: What the data say
~health
- fitness
Article 5714 words
2 comments

strongerbyscience.com

July 31, 2018

4 votes
The tragedy of the data commons
~tech
- privacy
Article 1458 words
2 comments

battellemedia.com

July 23, 2018

3 votes
Google, Facebook, Microsoft, and Twitter partner for ambitious new data project
~tech
Article 952 words
2 comments

The Verge

July 21, 2018

7 votes

Prev Next