Hypotheses or data first?

The recent issue of Nature had an interesting point-counterpoint series of two articles reflecting on 10 years of the Human Genome Project. The first article argues that hypotheses should come first, because little progress has been made in the last 10 years despite the abundance of data from these different genome projects. The second article argues the exact opposite: that this data-driven approach has resulted in a series of breakthroughs that would not have been possible with a hypothesis-driven approach.
This discussion is still very relevant today, even in our department, because it mirrors some of the interactions and discussion points between I.B. scientists in the science complex and in the Biodiversity institute. Here are some of my thoughts on these two articles and this discussion:
- Of course data come first; hypotheses do not develop in a vacuum. All these modern tools are just an extension of old-school natural history, something that I think we do not include enough in our university education.
- But the money quote is actually in the second article:

Without comprehensive cancer genome data sets it will be difficult to distinguish signal from noise.
This implicitly says that hypotheses are necessary: without them, how can you decide what counts as a comprehensive (genome) data set, and how will you define “signal”? Golub thus actually acknowledges that there is no such thing as useful purely data-driven research!
- The problem with both articles is that, being opinion pieces, they do not cite references to back up their claims, which makes evaluating these opinions more difficult.

Karl Cottenie
Associate Professor in Community Ecology

I am a community ecologist with a broad interest in data analysis.