What is data science?
The science of data promises to change the world, but many of us do not know what data science is. This segment gives you our answer.
Data science is the science of data analytics in the context of scientific inquiry. The data scientist first asks scientific questions, then gathers data relevant to those questions, and finally applies a variety of computational processes and statistical and probabilistic theories to derive answers. The science involves more than simply analyzing data, because a large body of theory and process can be brought to bear on that data. Theory in data science can range from simple statistical categorizations to elaborate mathematical models. Processes can range from simple manipulations of structured or unstructured data to much more complex algorithmic applications, such as machine learning and other predictive models.
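As a minimal sketch of that question, data, method, answer workflow, the Python below takes a price-and-revenue question and answers it with a simple least-squares line. The price and revenue figures are invented for illustration, and `least_squares` is a hypothetical helper written for this example; a real pricing analysis would need far more data and would have to account for other factors that move revenue.

```python
from statistics import mean

# 1. Question (hypothetical): how does revenue move as price changes?
# 2. Data relevant to the question (invented, illustrative observations).
prices = [8.0, 9.0, 10.0, 11.0, 12.0]
revenues = [1200.0, 1260.0, 1300.0, 1320.0, 1310.0]


# 3. Method: fit an ordinary least-squares line, revenue = intercept + slope * price.
def least_squares(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = least_squares(prices, revenues)

# 4. Answer, stated in terms of the original question.
print(f"Each $1 price increase is associated with about {slope:.1f} more in revenue")
```

A straight line is the simplest possible model here; the same four steps apply when the method is swapped for a richer statistical or machine-learning model.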
What kind of question do data scientists ask?
The questions data scientists ask are not new. What will be the effect of a change in prices on revenue? What prices by region will optimally move the most old inventory before the target date? Which public policy has proven more popular in which regions? Which has been the most effective? What will be the effect of an increase in wages on output?
What changes daily is the amount of available data and the number of theories and processes that can be applied to it.
How big is big?
Big data, or the application of data science to data sets with millions or even billions of observations or records, is the newest and fastest-growing field in data science. In big data, new data sets are growing at an unprecedented rate, while newer and smarter proprietary algorithms are constantly evolving to give organizations the edge they need to prosper in competitive markets. To visualize the 'big' in big data, consider Eric Schmidt's estimate that 'something like 5 exabytes of data,' or about 0.66 gigabytes per person, is generated by people and machines globally every two days. By this estimate, the Canadian population of 36 million, roughly 0.5% of the world's population, is on average responsible for about 12 million gigabytes (roughly 12 petabytes) of new data per day. That's a lot of data.
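The per-person and per-country figures follow from Schmidt's estimate with a few lines of arithmetic. The sketch below assumes decimal units (1 EB = 10^18 bytes, 1 GB = 10^9 bytes) and takes the 0.66 GB per person and the 36 million population directly from the estimate above; the implied world population is derived from those two numbers, not from census data.

```python
# Decimal units assumed: 1 exabyte = 10**18 bytes, 1 gigabyte = 10**9 bytes.
GB_PER_EXABYTE = 10**9

data_per_two_days_gb = 5 * GB_PER_EXABYTE  # "something like 5 exabytes" every two days
per_person_per_two_days_gb = 0.66          # per-person share quoted with the estimate
canada_population = 36_000_000

# World population implied by dividing the total by the per-person share.
world_population_implied = data_per_two_days_gb / per_person_per_two_days_gb

# Canada's fraction of that implied world population.
canada_share = canada_population / world_population_implied

# Convert the two-day per-person figure to a daily national total.
canada_per_day_gb = canada_population * per_person_per_two_days_gb / 2

print(f"Implied world population: {world_population_implied / 1e9:.2f} billion")
print(f"Canada's share: {canada_share:.2%}")
print(f"Canada: ~{canada_per_day_gb / 1e6:.1f} million GB of new data per day")
```

Halving the two-day figure is the step that turns 0.66 GB per person into roughly 0.33 GB per person per day, which is why the Canadian daily total comes out near 12 million gigabytes.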