As with any fairly new term that is closely related to several others, data science means different things to different people. Experts and institutions use broader or narrower definitions and emphasize different aspects of data science. At ASU, data science is an interdisciplinary blend of mathematics, statistics and computer science that applies scientific methods to extract information and provide insight from (often large and noisy) data. An estimated shortage of up to 190,000 data analysts in the U.S. is creating high demand for data scientists with the know-how to use data to make effective decisions.
In a broader sense, data science also includes capturing, preparing and exploring data (gaining first insights, visualizing) before applying mathematical and statistical methods to those data.
The relationship between math, statistics and computer science existed decades before “data science” became a buzzword. Modern technology and increased computing power enabled these three disciplines to use methods that had previously been only theorized and could not be applied broadly because of the limitations of earlier computers.
More powerful computers made it possible to automate the collection of huge amounts of data: big data too large for humans to grasp unaided. With this added computing power, the three disciplines developed new, more computationally demanding methods and became even more tightly blended. According to some experts, a new discipline emerged: data science.
How do companies define Data Science?
Search the words 'data science' and see how some of the top tech companies define the term.
IBM: Data science combines math and statistics, specialized programming, advanced analytics, artificial intelligence (AI), and machine learning with specific subject matter expertise to uncover actionable insights hidden in an organization’s data. These insights can be used to guide decision making and strategic planning.
Amazon: Data science is the study of data to extract meaningful insights for business. It is a multidisciplinary approach that combines principles and practices from the fields of mathematics, statistics, artificial intelligence, and computer engineering to analyze large amounts of data. This analysis helps data scientists to ask and answer questions like what happened, why it happened, what will happen, and what can be done with the results.
Oracle: A data scientist’s duties can include developing strategies for analyzing data, preparing data for analysis, exploring, analyzing, and visualizing data, building models with data using programming languages, such as Python and R, and deploying models into applications.
Misconceptions
- The Data Science program is neither a pure statistics program nor a business analytics program.
- The program's topics go beyond observing data; they also cover using data to make predictions through mathematical and statistical modeling.
  - Reporting baseball statistics is not data science.
  - Using historical trends in large data sets (training data) to make predictions is more in line with data science than reporting sports statistics.
- Plotting and visualizing data is an important part of data science and exploratory data analysis (EDA), which we focus on in classes like DAT 301 Exploring Data in R and Python. However, other important aspects of data science require some math and statistics. The ultimate goal of data science is to analyze collected data in order to make predictions about the future. For that, we need mathematical and statistical models, and we often use programming tools to deploy them. So, a data scientist should have a good background in mathematics and statistics, as well as good programming skills.
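The difference between reporting statistics and making predictions can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the season and win numbers are invented, the function names `describe`, `fit_line` and `predict` are hypothetical, and a straight-line least squares fit is only one of the simplest possible models, not the method taught in any particular course.

```python
from statistics import mean

# Hypothetical training data (invented numbers): wins in five past seasons.
seasons = [1, 2, 3, 4, 5]
wins = [70, 74, 78, 82, 86]


def describe(values):
    """Reporting: summarize what has already happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_line(xs, ys):
    """Modeling: ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(x, slope, intercept):
    """Use the fitted model to estimate a value that was not observed."""
    return slope * x + intercept


summary = describe(wins)                    # describes the past only
slope, intercept = fit_line(seasons, wins)  # learns a trend from the past
forecast = predict(6, slope, intercept)     # estimates an unseen season

print("Summary of past seasons:", summary)
print("Forecast for season 6:", forecast)
```

The `describe` step only restates the collected data, while `fit_line` and `predict` use the same data to say something about a season that has not happened yet. Real projects would typically involve far larger data sets, richer models and a check of how well the predictions hold up on data the model has not seen.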