With the ever-increasing volume, variety, and velocity of available data, scientific disciplines have provided us with advanced mathematical tools, processes, and algorithms that let us use this data in meaningful ways. Data science (DS), machine learning (ML), and artificial intelligence (AI) are three such disciplines. A question that comes up frequently in data-related discussions is: what is the difference between DS, ML, and AI, and can they be compared? Depending on who you talk to, how many years of experience they have, and what projects they have worked on, you may get widely different answers. In this post, I will attempt to answer the question based on my research, academic, and industry experience, and on having facilitated numerous conversations on the topic. It is still, however, one person’s opinion and should be considered as such. This write-up intends to provide a conceptual differentiation between the three fields; hence generalizations are made, and there will certainly be edge cases.
If you consider DS, ML, or AI as a set of tools and technologies, it is nearly impossible to distinguish them well. They overlap; however, they are not proper subsets of each other. For example, someone using a clustering algorithm may be doing DS, ML, or AI work, or a combination such as ML+DS, DS+AI, ML+AI, or all three! I would like you to consider an alternative way of defining these areas: separate them from the tools and technologies, and tie them to the end goal instead.
Even though they may employ overlapping skill sets, tools, technologies, and algorithms, DS, ML, and AI can be differentiated by their focus on achieving different end goals.
Here are the generalized focus areas:
Data science is about using data to provide value (money, growth, reputation, etc.) to an organization.
Machine learning is about using data to make optimized inferences and predictions.
Artificial intelligence is about using data to impart human-like decision making to machines.
With these definitions, it is easy to see how these fields overlap. For example, making human-like decisions may involve making better inferences, among other things. Providing value to an organization may involve creating digital agents with human-like decision making. Similarly, while creating learning models to make better predictions, one may want to work on metrics that will provide the most value to the organization. As you can imagine, the lines between these three disciplines blur, and we often use one in service of another. It is really the “why” behind what you are doing with the data that helps determine whether your current work should be classified under data science, machine learning, or artificial intelligence. Another point to keep in mind is that there is almost always a human agent involved in data science. You may hear “this computer is using machine learning to better identify pictures” or “this digital assistant is exhibiting artificial intelligence,” but you will not hear “this machine is doing data science using clustering.” Data science is almost always done by a human being.
Below we consider a simplified example to bring the concepts together.
Consider a fictional health care facility that is exploring the creation of assistive robots for elderly patients. The robot’s task is to support an elderly patient while walking when human care is not available. The robot needs to know when the person is getting up so that it can spring into action. This determination can be made by observing hand and leg movements. The health care facility may outsource this project to another company, asking them to devise an algorithm (model) that can accurately predict a person’s intent to stand. This could be done by training on images and videos to predict which hand and leg movements indicate that the person is getting up. This is a machine learning project.
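To make the "intent to stand" prediction a little more concrete, here is a minimal sketch. It is not what such a company would actually build: a real system would learn from images and video, whereas this toy assumes each observation has already been reduced to two hypothetical numeric features (`hand_pressure` and `knee_angle_change`, both invented for illustration), and it uses a simple nearest-centroid classifier. All data values are made up.

```python
from statistics import mean

# Hypothetical, made-up training data:
# (hand_pressure, knee_angle_change) -> observed outcome
TRAINING = [
    ((0.9, 0.8), "standing_up"),
    ((0.8, 0.7), "standing_up"),
    ((0.7, 0.9), "standing_up"),
    ((0.1, 0.2), "staying_seated"),
    ((0.2, 0.1), "staying_seated"),
    ((0.3, 0.2), "staying_seated"),
]


def fit_centroids(samples):
    """Average the feature vectors of each label into one centroid per label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


centroids = fit_centroids(TRAINING)
print(predict(centroids, (0.85, 0.75)))  # a movement resembling the "standing up" examples
print(predict(centroids, (0.15, 0.25)))  # a movement resembling the "staying seated" examples
```

The goal here is purely predictive accuracy on one narrow question, which is what places the work under machine learning in this framing.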
Once the person is up, the robot’s task is to assist them in walking. What is the best way to help? Well, what would a well-trained human caregiver do in this situation? They would likely step closer to the person and offer one or both arms or hands to lean on, depending on how much assistance the patient needs to walk. The caregiver would also use a gentle grip for a frail person, and a firmer grip, with feet planted firmly on the ground, to support an obese person. Enabling the robots to mimic the behavior of a well-trained human caregiver is the realm of artificial intelligence.
Now suppose the health care facility would like to determine whether to continue investing in this project. This determination can be made by collecting data from various sources, such as the rate of fall injuries among the elderly, human caregiver working hours and wages, the reduction in fall rate from using the new robots, the cost to train the robots, the technology adoption rate, the savings in medical expenses due to reduced injuries, etc. Once the data is integrated, modeled, and analyzed, several recommendations can be made to the health care facility, e.g., the assistive robots are resulting in 80% (made-up number) fewer falls, and the facility can recover its investment in 5 years (made-up number). This process, which starts with data and ends with valuable insights for decision makers, is data science.
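The arithmetic behind recommendations like these can be sketched in a few lines. This is only an illustration under simplifying assumptions: a simple payback calculation with no discounting, and input figures that are entirely hypothetical, chosen only so that the results line up with the made-up 80% and 5-year numbers above. The function and parameter names are mine.

```python
def fall_reduction(falls_before, falls_after):
    """Fractional reduction in falls after introducing the robots."""
    return (falls_before - falls_after) / falls_before


def payback_years(upfront_cost, annual_savings, annual_running_cost):
    """Years needed to recover the upfront cost from net annual savings.

    Returns None when the project never pays for itself.
    """
    net_annual = annual_savings - annual_running_cost
    if net_annual <= 0:
        return None
    return upfront_cost / net_annual


# Hypothetical inputs (all made up for illustration).
reduction = fall_reduction(falls_before=50, falls_after=10)
years = payback_years(
    upfront_cost=500_000,        # building and training the robots
    annual_savings=150_000,      # avoided medical and caregiver costs
    annual_running_cost=50_000,  # maintenance and operation
)

print(f"Fall reduction: {reduction:.0%}")
print(f"Payback period: {years:.1f} years")
```

In practice, most of the data science effort goes into obtaining trustworthy values for these inputs; the final calculation is the easy part.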
I hope the next time you look at these terms, you will look deeper than the tools, technologies, and algorithms they use. Tools, technologies, and algorithms evolve with time; the intent persists.
I am grateful to Anna Sheets, Ken Hu, Liz Widman, Mark Plutowski, Chris Frame, Rahul Ahuja, Lauren Ford, and Zach Gongaware for their valuable feedback on this piece.
Thank you, Hannah Gambino, for the illustrations.
The first version of this article appeared on Medium. It has been published on the Doximity Technology Blog with the author’s permission.