Repeatability vs Reproducibility
Research is most impactful when it is reproducible
Science is how we communicate our understanding of the world around us. Rigorous science (good communication) includes verifying that our observations and findings are real. We usually assess this through repeatability: performing each experiment several times to confirm that it gives the same result. However, researchers are now advocating that results should also be reproducible.
In 2016, a frustrated scientist proposed that research is going through a “reproducibility crisis.” Fundamentally, if a scientific result is true, it should hold up when tested by another group of scientists. However, this is not always the case: recent studies have shown that staggeringly few published results can be reproduced. This is especially damaging because it suggests that some scientific claims are misinterpreted or overly broad. Further, scientists who try to build on previous results waste precious time and money, only to find that those results cannot be reproduced.
The reproducibility crisis quickly became a high-profile topic and served as a call to action. Scientists in the biomanufacturing and pharmaceutical industries had already complained that they could not get the same results as the academic labs that published the work. In a study of cancer therapeutics, Begley and Ellis of the drug company Amgen found that only 11% of 53 cancer research studies could be reproduced. In another study, Bayer scientists tried to reproduce 67 studies, primarily in cancer research. Shockingly, 75–80% of them could not be reproduced. These attempts became prohibitively costly and time-consuming, and in the end Bayer terminated most of these projects.
Here, we will define repeatability vs reproducibility, discuss why these problems arise in science, and list strategies for making your research more reproducible.
What repeatability is:
A result is repeatable if doing the same experiment over and over again produces the same answer. The mathematical language that describes repeatability is statistics, which lets us ask how consistent the results are when the same experiment is repeated 3 times, 10 times, or 100 times. The more often an experiment produces the same result, the more repeatable it is, and the less likely that result is to be a fluke.
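To make this concrete, here is a minimal Python sketch that summarizes a set of replicate measurements. The readings are hypothetical, and the choice of summaries is our own illustration rather than a prescribed repeatability test. The sample standard deviation shows how much the replicates disagree with one another, the coefficient of variation expresses that spread relative to the mean, and the standard error of the mean shows how precisely the average is pinned down, which tightens as replicates are added.

```python
import math
import statistics


def replicate_summary(values):
    """Summarize repeated measurements of the same quantity.

    Returns the mean, the sample standard deviation (spread between
    replicates), the coefficient of variation (spread relative to the mean),
    and the standard error of the mean (uncertainty in the mean itself).
    """
    if len(values) < 2:
        raise ValueError("need at least two replicates to estimate spread")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    cv = sd / mean if mean != 0 else math.inf
    sem = sd / math.sqrt(len(values))
    return {"n": len(values), "mean": mean, "sd": sd, "cv": cv, "sem": sem}


# Hypothetical replicate measurements of the same sample (e.g. optical density).
few = [0.82, 0.91, 0.78]
many = [0.82, 0.91, 0.78, 0.85, 0.88, 0.80, 0.84, 0.86, 0.83, 0.87]

for label, data in [("3 replicates", few), ("10 replicates", many)]:
    s = replicate_summary(data)
    print(f"{label}: mean={s['mean']:.3f} sd={s['sd']:.3f} "
          f"cv={s['cv']:.1%} sem={s['sem']:.3f}")
```

With these hypothetical readings, the standard error for ten replicates comes out smaller than for three. That reflects the extra replicates, not any change in the underlying measurement.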
Scientists commonly use a statistical P value to judge whether a result is likely to be a fluke. A P value is the probability of getting a result at least as extreme as the one observed if there were actually no real effect (the “null hypothesis”). Low P values therefore suggest that a result is unlikely to be due to chance alone. For example, a P value of 0.01 means that if there were truly no effect, results at least as extreme as this one would turn up in only about 1 out of 100 repetitions of the experiment. By convention, scientists treat results with a P value smaller than 0.05 as statistically significant. This threshold is often described as being “95% confident” in the result, but that is a misreading: a P value does not give the probability that a finding is true. Whether P values are appropriate and meaningful at all is debated, as more than 800 scientists recently argued. Scientists and statisticians alike are pushing to move beyond using P values to interpret significance.
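One way to see what a P value measures is a permutation test, sketched here in Python. This is a swapped-in illustration, not the method any particular study used (t-tests are more common in practice), and the control and treated measurements are hypothetical. Under the null hypothesis that the treatment has no effect, the “control” and “treated” labels are interchangeable, so the test relabels the pooled data in every possible way and reports the fraction of relabelings whose difference in means is at least as large as the one actually observed.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(control, treated):
    """Two-sided exact permutation test for a difference in means.

    Null hypothesis: the treatment has no effect, so group labels are
    arbitrary. Every way of assigning len(treated) of the pooled values to
    the "treated" group is tried, and we count how often the absolute
    difference in means is at least as large as the observed one.
    """
    pooled = list(control) + list(treated)
    n_treated = len(treated)
    observed = abs(mean(treated) - mean(control))

    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_treated):
        chosen = set(idx)
        t = [pooled[i] for i in idx]
        c = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(mean(t) - mean(c))
        # Small tolerance so floating-point rounding does not drop exact ties.
        if diff >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical measurements from one lab's experiment.
control = [4.1, 3.8, 4.4, 4.0, 3.9]
treated = [5.0, 4.7, 5.3, 4.9, 5.1]

p = permutation_p_value(control, treated)
print(f"two-sided p = {p:.4f}")  # two-sided p = 0.0079
```

Here every treated value exceeds every control value, so only 2 of the 252 possible relabelings are that extreme, giving p ≈ 0.008. If the treatment did nothing, a difference this large would arise from random labeling less than 1% of the time; that is not the same as a 99.2% chance that the effect is real. Enumerating every relabeling is only practical for small groups; larger datasets typically use a random sample of relabelings instead.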
What repeatability isn’t:
Say, for example, researchers in a lab repeat an experiment enough times to get a credible P value. This repeatable result is most likely not a fluke. However, that does not mean the result holds outside their lab. What if other labs follow exactly the same methods but cannot get the same result? A different metric, reproducibility, describes this.
What reproducibility is:
A result is reproducible if other labs can follow the same experimental protocol and get the same results. The longstanding expectation for scientific results is that they be repeatable. However, there is growing concern about results that are repeatable but cannot be reproduced by other labs. Of course, results are much more impactful when they have been reproduced by other labs, researchers, collaborators, and drug companies. Reproducibility is a more stringent standard than repeatability because it rules out factors specific to one lab, such as its equipment, reagents, cell lines, or personnel, as the explanation for a result. However, several factors keep labs from verifying reproducibility. Most prominently, testing reproducibility costs labs time and money, and projects on new findings are more alluring, more exciting, and, most importantly, more fundable.
This issue has become what some researchers call the reproducibility crisis. As stated above, a staggering fraction of published research is not reproducible. This wastes money and time, a problem compounded by the pressure on scientists to publish hot results as fast as possible. Now, scientists are debating whether our scientific standards should include reproducibility.
How can research be repeatable but not reproducible?
In this digital age, data are growing in volume and being generated ever faster. Several factors can make research repeatable but not reproducible, as also described in a Scientific American article.
Fields with the least reproducibility, such as biology and medicine, tend to have higher natural variation. Life is incredibly complex: factors such as cell health, microorganism strain, or genetics can change how a sample responds to a treatment or condition. Sometimes a lab does not realize how lab-specific factors affect its results. When a lab fails to notice, document, and report these factors, other labs cannot account for them, and the results may not be reproducible.
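As a toy illustration of this problem, the Python sketch below simulates two labs running the same protocol ten times each. All numbers are hypothetical, and the function name run_lab and its parameters are our own invention. The only difference between the labs is a hidden offset in lab A, standing in for an unrecorded lab-specific factor such as a particular strain. Real biological variation is rarely this simple.

```python
import random
import statistics


def run_lab(effect, lab_offset, n_replicates, noise_sd, seed):
    """Simulate one lab repeating the same experiment n_replicates times.

    effect      -- the response every lab should see (hypothetical)
    lab_offset  -- a hidden, lab-specific contribution that nobody recorded
    noise_sd    -- run-to-run measurement noise within the lab
    seed        -- fixes the random noise so the run is repeatable
    """
    rng = random.Random(seed)
    return [effect + lab_offset + rng.gauss(0.0, noise_sd)
            for _ in range(n_replicates)]


# Both labs follow the same protocol, but lab A's undocumented
# factor adds +2.0 to every measured response.
lab_a = run_lab(effect=1.0, lab_offset=2.0, n_replicates=10, noise_sd=0.1, seed=1)
lab_b = run_lab(effect=1.0, lab_offset=0.0, n_replicates=10, noise_sd=0.1, seed=2)

for name, data in [("Lab A", lab_a), ("Lab B", lab_b)]:
    print(f"{name}: mean={statistics.mean(data):.2f} "
          f"sd={statistics.stdev(data):.2f}")
```

Each lab's replicates should cluster tightly, with a standard deviation near the 0.1 noise level, so each lab would judge its own result repeatable. Yet the two lab means should sit roughly 2.0 apart, so the result does not reproduce. Only a record of the hidden factor (here, lab_offset) would let another lab explain the discrepancy.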
To address these problems, many journals are making their requirements for reporting statistics, methods and protocols increasingly strict. Also, companies that distribute cell lines, such as ATCC, are implementing cell authentication methods to test and verify the exact characteristics of the strains they sell. Find out more about how to make your research more reproducible by using an improved lab notebook, a laboratory information management system (LIMS), or automation, such as through the GenoFAB platform.