Tuesday, October 23, 2012

Data-centric science - My initial thoughts

"The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years... But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete."

This quote is from a 2008 article in Wired Magazine called "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete" by Chris Anderson. In this article, Anderson addresses our ability to solve scientific problems by looking for correlations in data without the need to form models. This ability has been enabled by the huge amount of searchable data that the internet has generated over the past two decades, which has led us into the so-called Petabyte Age.

This approach, sometimes referred to as analytics, has been successfully employed to translate between written languages, sequence genomes, match advertising outlets to customers, and provide better healthcare to people. Now, Anderson argues, it may be applied to problems across the full range of sciences. This is a welcome evolution, partly because many fields now possess too many theories and lack the experiments to confirm or refute their predictions. Take particle physics or molecular biology, for example. There are arguably more theories and models of these systems now than ever before, and many of them cannot be verified. A data-centric approach could solve this problem.

This is a very interesting idea and I've been thinking about it for a few weeks now. I think that, to make any sense of it, I need to address several issues and assumptions. Questions to consider include:
  • What is a model? When is it useful and when is it not?
  • Are only certain fields of science able to benefit from a data-centric approach?
  • What is the human component of research? How would it change if this approach were implemented?
  • What has already been done to solve scientific problems with data-driven solutions?
  • What are the philosophical implications of changing our idea of science? The scientific method has existed in some form or another for more than 2000 years (I'm referring all the way back to Aristotle, even if his ideas contained flaws). A significant change to the scientific method, especially given its importance to modern society, could have major sociological consequences.
I'll consider these questions in future posts.