A Few Arguments for Open Science

In science, I am sometimes challenged to justify my affection for open source software and open access publishing. The warm, fuzzy feeling of community ownership and control is not sufficient to convince people to switch to open source models. Here I reiterate some arguments that open science is morally and ethically favorable.

Scientific knowledge, especially publicly funded knowledge, should be returned freely to society. Sadly, current solutions for allocating funding conflict with this ideal. To fund research, we enforce artificial scarcity using the concept of 'Intellectual Property'. Artificial scarcity allows us to adjust the value of knowledge to match the costs associated with its creation. Intellectual property concerns are similar in academia, industry, and the arts, but I restrict my reflection to the area that most affects me: science.

Academia runs on reputation, and requires accurate attribution of intellectual property. This discourages behaviors that benefit science at large. Collaboration is avoided out of fear that it could cause one to lose ownership of ideas. Data are not shared if there is a possibility that said data could yield more publications. Useful intermediate results that are too small to publish are kept secret. Techniques may be kept as 'trade secrets'. These behaviors are bad for science, but academics are pressured into them to survive. The same can be said of all industries: protecting a product ensures that the market rewards those who make the original investment. But this problem is not unique to free markets. All economic models must solve the problem of funding the most capable individuals.

Can open science resolve this conflict? Not yet, but it suggests a few immediately useful steps. Resource allocation is simplified when resources are abundant.

  • Open science lowers the cost of research by reducing artificial scarcity.
  • Open access publishing frees researchers from large, expensive, library subscriptions. 
  • Open source scientific software frees researchers from expensive proprietary software. 
  • Open source operating systems and software reduce the cost of computing.
  • Open source hardware and small scale fabrication technologies reduce material costs.

When grants no longer support library subscriptions and MATLAB licenses, the same amount of money can support more researchers, and more research. For theorists, this can mean reducing the cost of research to the cost of living, so that university affiliation is no longer a feudal necessity. With these simple steps, we can confine competition for resources to covering the cost of living and the physical material costs of experiments.

The remaining costs of competition can be lessened by following an open science etiquette. Share data freely. Let people know what you're working on, if not the details. Publish often so that useful results are not withheld. If you find someone working on a similar problem, collaborate and discuss how to redirect efforts to reduce overlap. Clearly state the nature of contributions to a manuscript. Give attribution liberally, and acknowledge sources of inspiration explicitly. Take advantage of self-publishing and non-peer-reviewed repositories like arXiv, which preserve attribution even though they do not provide a stamp of legitimacy to the research. Encourage people to build upon your work, and accept that they have many of the same ideas, goals, and competencies as you. When someone does an experiment you had planned, be excited that you get to see the results earlier, and talk to the authors to look for possible collaborations and ways to reduce future overlap. There are many interesting problems to solve before we can eliminate anti-collaborative pressures in science.

Open science may be a social good, but can it also be a selfish good? Publishing open access can lead to broader dissemination of your results, bolstering your reputation. Releasing code under open licenses provides an opportunity to demonstrate competence, and demonstrates your value to the scientific community. Using open source languages for scientific computing makes your code available to students, educators, and outsiders, who would otherwise be unable to use it. Source code licenses that preserve attribution can improve reputation and awareness of your work. Using open document formats for collaboration creates positive feedback to improve free productivity tools, which will lead to lower costs in the long run.

There is much more to be said on this topic, here and elsewhere. Democratization of science is something that I think about daily and is an ideal that I strive to uphold in my own research as much as I am able.


  1. Hear Hear!

    As the resident science policy wonk (as opposed to actual scientist) around here, I feel like I should add my two cents. I absolutely agree with the sentiment of this post. Science is useful only to the extent that it can be taken up by others, and lowering arbitrary access barriers like journal subscriptions and software licenses is a key step to expanding the community of scientists.

    One reason why Open Science is discouraged, and which is alluded to in this post, is the priority race. There are no Nobel Prizes for being the second discoverer. But a more significant force against Open Science is the false idea that scientific material is fragile and must be protected from the untrained.

    This is true in some circumstances; I've been reading a lot of STS studies of neuroendocrinology, and a few grams of a substance purified from 50,000 cow brains is genuinely rare and expensive, and the team that purifies it is justified in not sending samples to anybody who asks. But this is an edge case, and quality scientific material is increasingly available.

    It's an open secret that journal articles are almost never reproducible: too much tacit knowledge goes into the construction of a scientific claim to be described in a 10-page article. And while the apprenticeship model of PhD programs is one way to transmit that tacit knowledge, the Open Source community has developed another model based on continuous public discourse.

  2. There were several good articles on this very topic in the recent Nature (ironically, some of these may be behind paywalls?).

    Also note: we actually have largely reproducible results from a lot of the primate motor work. Some of the more "fancy" analyses become hard to reproduce, but, overall, the effects appear to be robust and reproducible. The main reason that this is possible in our field is that, once you have an array working in the brain, you can very quickly duplicate other experiments. For animal work, you usually are not allowed to simply reproduce a previous experiment for ethical reasons -- but with the chronically implanted arrays, we can reproduce earlier results as a byproduct while still testing new results and causing no additional harm to animals.

  3. Mathematics, I think, has a significant advantage in this direction over laboratory science. The closest thing to the "open science" ideal that I am aware of is the Polymath projects run by Tim Gowers (http://polymathprojects.org/). A close second is MathOverflow. In both cases, mathematicians can massively collaborate on problems of importance, in a context in which participating is generally viewed as its own reward. In the MathOverflow context, an individual mathematician may ask a "small" technical question, or ask for something like a pointer to a survey, or a literature reference for a proof he has heard of but can no longer place. The Polymath projects are more ambitious; the organizers try to pick a "medium-sized" question, large enough to be significant, small enough that it might be solved quickly by the group, and hopefully with enough early branchings of the plausible lines of investigation that massive parallelism will be helpful. They've gotten better at this with several tries.

    I haven't participated in any of these, although it would probably be a good learning experience, and I know people who have tried to participate. One of the problems is that you need significant maturity to really get a lot out of it, although the organizers try to pick problems that do not require a large amount of background material. A different problem is that, without a clear guarantee of how an individual would be credited and would benefit, it could be too great a risk for a young person early in their career to take. Probably the lack of young risk takers would significantly alter the character of the investigation as well.

    I'm not sure that it's feasible that real science could ever quite have that same feeling of Caltech undergrads hammering away on their problem sets together. But I think if you could think of small steps to help make this kind of thing more workable for young mathematics researchers, it would also yield dividends for open science generally. I don't doubt that this kind of thing will become more and more mainstream with time, but academic inertia can be quite significant.