One thing they mention in the article but don't do a great job focusing on is that a typical track that feeds into data science starts with being an analyst. Business intelligence is mentioned, but really almost any analyst role that touches data qualifies: reporting analyst, financial analyst, marketing analyst, etc. Even if an organization doesn't officially hire data scientists, these jobs will get you experience with pulling, organizing, transforming, and cleaning data, which realistically is 80-90% of the work of a data scientist.
While having some basic knowledge of statistics is a requirement, I disagree with the need to understand matrix algebra or really to have a deeper understanding of regression. Most tools nowadays, especially data mining tools, will take a data set, run it through a bunch of different algorithms, and spit out some key values for you (probability value, correlation value, etc.). Having a basic understanding of how to interpret these values to judge best fit will work 95% of the time to get you a good predictive algorithm. Having the next-level knowledge to know when to disagree and choose another algorithm helps, but realistically is not that important, especially if there are other, more statistically minded people in the organization. I also expect that with time these data mining tools will get more sophisticated, and the next-level statistical knowledge for outlier cases will become less and less important.
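For anyone who hasn't used one of these tools, the "run it through a bunch of algorithms and spit out key values" step can be sketched roughly as below. This is only a minimal illustration under stated assumptions, not how any particular product works: the data is made up, the two candidate "algorithms" (a mean baseline and a one-variable least-squares line) and the helper names are hypothetical, and R^2 stands in for whatever fit statistic a real tool reports.

```python
# Hypothetical toy data (made up for illustration): x = ad spend, y = sales.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]


def mean(values):
    return sum(values) / len(values)


def r_squared(actual, predicted):
    """Share of the variance in `actual` that `predicted` explains."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def fit_mean_baseline(xs, ys):
    """Candidate 1: always predict the average of y, ignoring x."""
    avg = mean(ys)
    return lambda x: avg


def fit_linear(xs, ys):
    """Candidate 2: ordinary least squares for y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def compare_models(xs, ys, fitters):
    """Fit every candidate and return {name: in-sample R^2}."""
    scores = {}
    for name, fit in fitters.items():
        model = fit(xs, ys)
        scores[name] = r_squared(ys, [model(x) for x in xs])
    return scores


scores = compare_models(
    xs, ys, {"mean_baseline": fit_mean_baseline, "linear": fit_linear}
)
best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: R^2 = {score:.3f}")
print("best fit:", best)
```

Note that this scores each model on the same data it was fitted on, which flatters more flexible models. Knowing to check against held-out data instead is a small example of the "next-level knowledge" being debated here.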
Not having that knowledge will see you replaced by AI within 10 years.
As someone who works in health care, I heavily disagree. In finance, maybe. In other sectors some of these jobs will be replaced, but the reality is that making decisions about data cleanliness and data relevance is simply not something that AI will be able to do. AI will be able to assist someone in making these decisions (and therefore there will be fewer jobs available), but human interpretation will be necessary for some time.
As a practising data scientist, I can tell you that while human interpretation is and will remain VERY necessary, the required level of skill in that interpretation keeps going up. What I'm saying here is that "having a basic understanding of how to interpret these values to understand best fit" is fine for a job right now, but won't be enough for the next 10 years. The current state of the art in data science is about being able to explain things in a human-readable way. So instead of a graph, say what that graph means in words.
From there it goes in two directions: data science and domain expertise. Data science is all about the "next level knowledge to know when to disagree and choose another algorithm". Domain expertise isn't data science at all, but rather is taking the outputs of data science and understanding them in the context of a given field. In health care that would be a doctor or medical PhD.
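The "say what the graph means in words" idea can be illustrated with a very small sketch. Everything here is an assumption for illustration: the `describe_trend` helper, the 2% "flat" threshold, and the readmission figures are all made up, and a real summary would need far more care about context and statistical noise.

```python
def describe_trend(label, values, flat_threshold=0.02):
    """Return a one-sentence summary of how `values` changed from first to last."""
    first, last = values[0], values[-1]
    change = (last - first) / abs(first)  # relative change; assumes first != 0
    if abs(change) < flat_threshold:
        return f"{label} stayed roughly flat over the period."
    direction = "rose" if change > 0 else "fell"
    return f"{label} {direction} by about {abs(change):.0%} over the period."


# Hypothetical monthly counts, invented for this example.
summary = describe_trend("Monthly readmissions", [120, 115, 108, 101, 96])
print(summary)
```

The sentence is what a decision-maker reads instead of the line chart. Deciding whether a drop like that is meaningful is still the human part.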
I've worked on predictive models in health care. Whether you consider that a data scientist or not is up to you.
As for a basic understanding being "fine for a job right now": in health care it's more than fine. It's actually somewhat difficult to even find, because there's a necessary amount of domain expertise that even the tech-heavy people need.
As for domain expertise in health care being "a doctor or medical PhD": this is not how it works in health care at all, and it likely never will. The time of individuals like this is quite valuable, and it's extremely rare for them to have "extra" time outside of their practice for needs such as data. In fact, the only docs I've ever encountered who have the majority of their time available for this are typically very high in an organization (such as the CMIO, CNIO, CMO, CNO), and their time goes to really high-level interpretation, split with their business responsibilities. The reality is that they rely on people who mostly have zero clinical knowledge but can translate between technical and clinical to inform them of what's going on, so that they can help to make decisions. Because of this, domain knowledge is a tricky thing in health care.
You typically have two camps of domain knowledge in health care: tech turned health and health turned tech. Tech turned health is your typical CS student who's found their way into health care. They almost always have zero health knowledge from the start, and need to take some basic courses in medical terminology (or eventually pick it up on the job) so that they can help translate tech speak to health speak and vice versa. Health turned tech are almost always nurses, but occasionally you'll see some particularly tech-inclined docs (and researchers) take some classes on basic CS principles and databases. Usually health turned tech are still practicing clinicians and can only devote a small portion of their time to essentially being business analysts.
Typically speaking, the larger and longer-term projects involve teams of tech turned health individuals on the IT side interfacing with health turned tech individuals on the business side.
The real diamond in the rough in health care is someone who has a few years of clinical training in addition to a few years of tech training. These individuals will almost always be employed, even if neither their tech nor clinical expertise is very deep, simply because they are so rare.
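"Most tools nowadays, especially data mining tools, will take a data set and run it through a bunch of different algorithms and spit out some key values for you (probability value, correlation value, etc.)." What is your favorite tool for this?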
I have limited exposure, so take my recommendations with a grain of salt, but I like RapidMiner on the open source side and SPSS and SAS on the closed source side. I've heard good things about KNIME (open source) as well but have never actually used it. Same with Sisense (closed source): good things, but no experience with it.