Thursday, February 7, 2013

The debate between data-centric science and hypothesis testing in soil microbiology

I've recently explored the topic of data-centric science, i.e. the art of answering scientific questions by mining data instead of generating hypotheses and testing them. I was therefore very interested in an article in this week's issue of Nature entitled "Microbiology: The life beneath our feet." The article was written by two scientists who study the relationship between the microbial content of soil and its surrounding environment. One of them, Janet K. Jansson, promotes data mining of "omic" studies, while the other, James I. Prosser, argues in favor of hypothesis-driven experiments. "Omic" studies is jargon for the practice of identifying microbial species through the detection of DNA (genomics), RNA (transcriptomics), proteins (proteomics), and metabolites (metabolomics).

Dr. Jansson argues that data mining reveals new species of microbes and provides a sufficient base from which to carry out further experiments and hypothesis tests. She claims that the primary critique of omic studies (that they provide only descriptive data) is weak, since we simply don't know enough about the different species of microbes to begin with. Data mining of omic data fills those gaps in our knowledge. Finally, she provides several examples from around the world where omic data mining has led to new discoveries and a better understanding of soil microbes.

Dr. Prosser, on the other hand, argues that hypothesis-driven research offers greater value, since it provides new concepts and logical frameworks for understanding the microbe-environment relationship. One quote from him that I particularly liked was the following:
Hypotheses lack value, however, if they are based solely on observations, or if they are relevant only to the data used to construct them. They are worthwhile if they incorporate novel ideas and flashes of inspiration; they can propose (ideally universal) explanations and mechanisms; and they generate predictions that can be tested by experimentation. It is this process, and not the initial observations, that truly increases understanding. Hypothesis-driven research can thus provide counter-observational, non-intuitive predictions and conceptual frameworks, and can indicate which techniques are, and are not, needed to test them.
He further argues that the information obtained from data mining can generate new hypotheses but cannot be used to test them, an argument I had never considered before but believe to be true.

One final point worth noting is his statement that "In practice, purely descriptive studies of microbial communities are rare." I believe he is arguing that, while both approaches have value, the payoff from hypothesis testing is much greater, and hypothesis testing should therefore receive more resources.

I am quite pleased to see someone argue against data mining, if for no other reason than to balance out the debate. However, since I believe that scientific manpower is growing faster than the number of hypotheses that can be generated, I see no reason to abandon data mining as a tool in this regard.