How to increase research reproducibility
What is reproducibility? And why you should care
Having a ‘eureka!’ moment in research is good. Even better? When published results prove reliable and repeatable. Though perhaps less exciting, this step is essential for generating successful science. Nobody wants to fail to reproduce the results in published papers. Read on for tips and tricks to increase the reliability of your results.
Nothing is more frustrating to scientists than to do the same experiment twice and get different outcomes. A scientific result that can’t be repeated can’t be trusted, even if the result is exciting. Further, sometimes data is repeatable (repeating the experiment in the same lab gives the same result) but not reproducible by other labs. This, too, is extremely problematic. Results that only work for one lab lose their impact and credibility. Over the past few years, this phenomenon has turned into a discussion point among scientists, who are calling it a “reproducibility crisis”.
Reproducibility is all about being transparent about exactly what was done in an experiment and what the results were. To achieve this, make methods and protocols descriptive and complete. There are several steps scientists can take to improve the repeatability and reproducibility of their data.
8 steps to make your lab research more reproducible
1. Automate data analysis
How data is analyzed can greatly impact values from a data set. What are the chosen thresholds and cutoffs? How was the data processed? What was the order of data analysis steps performed? For complicated analyses, these steps can affect the results.
To circumvent this problem, automate as much of the data analysis as you can. Use quantitative measurements or analyses over qualitative ones whenever possible. Avoid any steps that involve manually processing the data. Examples of this include setting parameters cell-by-cell, drawing boundaries by hand, or judging by eye.
Write coding scripts and macros for processing data to avoid these problems. This standardizes how relevant information is extracted from a dataset, processed, and exported in a standard format. Write README.txt files to store all data analysis parameters and outputs, including file locations and timestamps. Also, document your code well enough for someone else to understand and use.
Lastly, use version control to track any changes to your data analysis automation. If you ever modify a script used for a repeated analysis, rerun all previously analyzed data through the updated version so that every result comes from the same code.
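As a rough illustration of this step, the Python sketch below applies one fixed threshold to every data set and stores the parameters, a timestamp, and a checksum of the input alongside the results. The names (`PARAMS`, `analyze`, `run_record`), the threshold value, and the example measurements are all hypothetical and chosen only for illustration; a real analysis would substitute its own processing steps.

```python
import hashlib
import json
import statistics
from datetime import datetime, timezone

# Hypothetical analysis parameters, defined once so that every data set
# is processed with exactly the same settings.
PARAMS = {"threshold": 0.5, "script_version": "1.0.0"}


def analyze(values, params=PARAMS):
    """Keep values at or above the threshold and summarize them."""
    kept = [v for v in values if v >= params["threshold"]]
    return {
        "n_total": len(values),
        "n_kept": len(kept),
        "mean_kept": statistics.mean(kept) if kept else None,
    }


def run_record(values, params=PARAMS):
    """Bundle the results with what is needed to repeat the analysis."""
    raw = json.dumps(values).encode("utf-8")
    return {
        "parameters": params,
        "input_sha256": hashlib.sha256(raw).hexdigest(),
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "results": analyze(values, params),
    }


if __name__ == "__main__":
    measurements = [0.2, 0.7, 0.9, 0.4, 1.1]  # made-up example data
    print(json.dumps(run_record(measurements), indent=2))
```

Saving the printed record next to the output files gives you a written trail of which settings produced which numbers, without relying on memory.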
2. After automating data analysis, publish all code (public access)
Publishing all code, scripts, and macros used to analyze and process data is important because it allows someone else to inspect precisely how results were obtained. Further, they can download your code and use it on their data to see if they can get the same results.
Make sure to annotate your code well enough that someone else could run it. Consider writing a usage file when your code is quite complicated or difficult to use. Then, publish your code in a public repository such as Bitbucket or GitHub, or on your lab’s website. A container image (for example, one shared through Docker Hub) can accompany the code to capture the software environment it runs in. Include a link to your code in your publication. GitHub is a very popular code repository to use because it is built on Git, a version control system.
3. Publish all data (public access)
Since data processing can affect results, it is becoming increasingly standard procedure to publish all data for public access. This includes showing whole gel images in the supplement of a paper. Another example is uploading a repository of all microscope images. Also, make all raw sequencing data public—not just the graphs and figures.
Several data repository websites exist for this exact purpose. Examples include Figshare, Dryad, and re3data (for data sets and images), GenBank (for nucleic acid sequences), and GEO (for raw high-throughput sequencing data). Find out which data repository your desired journal recommends through their website or see this list.
4. Standardize and document experimental protocols
Small differences in a procedure can cause dramatic changes in results. Standardize protocols for all experiments. To accomplish this, I find it helpful to print out a copy of my procedure from my Electronic Lab Notebook (ELN), mark any changes or notes, and update my ELN entry when I’m done. Make sure to carefully record all intermediate steps in your lab notebook. Also, when using a specific time-frame or concentration, it may be helpful to calibrate the results at different times or concentrations. This way, you can anticipate how changes in these parameters can impact results.
5. Track samples and reagents
Where did the reagents come from? Were they from different lots? Who created a stock solution? Was it you? Or your lab tech? How old is the stock? Could it be expired? These small details can contribute to differences in results. Carefully labeling and tracking these factors can help you improve reproducibility of results.
Take very detailed notes on everything that goes into reactions. You should be able to know which master stocks, working stocks, and chemicals/reagents were used in each experiment. If you have varying results, knowing the exact components that went into a reaction will help with troubleshooting what went wrong.
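One way to make these notes searchable is to store them as structured records. The Python sketch below is only an illustration under assumed names: `Reagent`, `Experiment`, `expired_reagents`, and `lot_differences` are hypothetical, as are the fields chosen. It records which lots went into each experiment, flags stocks that had expired on the day of the run, and lists the reagents whose lots differ between two experiments.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Reagent:
    name: str
    supplier: str
    lot: str
    prepared_by: str
    expires: date


@dataclass
class Experiment:
    experiment_id: str
    run_date: date
    reagents: list = field(default_factory=list)

    def expired_reagents(self):
        """Return reagents that were past expiry on the day of the run."""
        return [r for r in self.reagents if r.expires < self.run_date]


def lot_differences(a, b):
    """Map reagent name -> (lot in a, lot in b) wherever the lots differ."""
    lots_a = {r.name: r.lot for r in a.reagents}
    lots_b = {r.name: r.lot for r in b.reagents}
    return {
        name: (lots_a.get(name), lots_b.get(name))
        for name in sorted(lots_a.keys() | lots_b.keys())
        if lots_a.get(name) != lots_b.get(name)
    }
```

If two runs of the same protocol disagree, calling `lot_differences` on them is a quick first check on whether a change of lot or stock solution could be involved.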
Additionally, changes in cell lines can contribute to variation of results. For mammalian cell lines, verify that they have the correct genetics (this can be done through companies like ATCC). Also, confirm that cell cultures are growing in the optimal media and are not contaminated. Contaminating microorganisms like mycoplasma can grow in your mammalian cell cultures unnoticed. Kits are available to test for various types of contamination.
6. Disclose negative or convoluted results
Though negative results are often low-impact, they are still important. Hypotheses aren’t always supported by data. Unfortunately, sometimes results are too complicated or convoluted to be easily interpreted. This doesn’t mean the results are wrong, though. It is important to publish positive and negative results.
Formalize how you will do a protocol, generate and interpret results, and statistically analyze the significance before beginning an experiment. If you get results that are negative or complicated, don’t ignore them. Instead, either do control experiments to figure out why an outcome happened or acknowledge the results in your publication.
If the results are difficult to interpret, that’s OK. In fact, the discovery of Green Fluorescent Protein (GFP) was first published as a footnote, “by the way, we saw this weird result,” in a paper, only to later earn Shimomura a Nobel Prize.
7. Increase transparency of data and statistics
Be as forthright as possible with your data at both the methods level and the data visualization level. Your graphs and plots are an essential communication tool and are most effective when they are as transparent as possible. Write clear explanations of what is visualized and how. An increasingly popular way to do this is by showing each data point, not just an average in a bar chart, box and whisker plot, or line plot, as described here. I accomplish this by generating a scatter plot with each individual data point, generating a box and whisker plot or bar chart, and then overlaying the two.
Another important aspect of data transparency is describing how the statistics were calculated. First, ensure that you are using the correct statistical test. Next, keep in mind that reporting only a P value doesn’t describe how or why the data is significant. Since P values can vary dramatically depending on the test used, state which statistical test generated the P value, as discussed in detail here.
Lastly, make sure that the data and statistics make sense. I commonly see scientists claim a result because a statistical test came out technically significant. However, what if the significant difference is too small to really be important or relevant? A hypothetical example: scientists find that taking a high dose of a drug significantly reduces your propensity for getting cancer, but only by 0.0001%. Avoid this problem by reporting a measure that takes into account the magnitude of the difference, such as effect size. Don’t oversell data; be transparent about what the results actually mean.
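To make the idea of effect size concrete, here is a minimal Python sketch of one common measure, Cohen's d, which divides the difference between two group means by their pooled standard deviation. The function name `cohens_d` and the example numbers are illustrative assumptions, and other effect size measures may suit your data better.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


if __name__ == "__main__":
    treated = [2, 4, 6]  # made-up example data
    control = [1, 3, 5]
    print(cohens_d(treated, control))
```

Reporting a value like this next to the P value tells readers how large the difference is relative to the spread of the data, not just whether it passed a significance cutoff.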
8. Remove or reduce sources of bias
Humans are inherently biased; we can’t help it. Even scientists are. But we can take precautions to remove bias from our data analysis. Because bias can distort results, it leads to reproducibility issues. We should do everything we can to remove and reduce bias.
Decide on thresholds and parameters for data collection in advance, so as not to accidentally cherry pick results. Whenever possible, blind data collection and analysis. Lastly, automate data collection or analysis (see step 1) to remove human bias and error.
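Blinding can be as simple as replacing informative sample names with neutral codes before analysis. The Python sketch below shows one possible approach under assumed names: the function `blind_labels`, the `S001`-style code format, and the sample IDs are all hypothetical. The idea is that the person analyzing the data sees only the codes, while a colleague holds the key until the analysis is finished.

```python
import random


def blind_labels(sample_ids, seed):
    """Assign each sample a neutral code in shuffled order.

    Returns (coded_ids, key): coded_ids is what the analyst sees, and
    key maps each code back to the original sample ID.
    """
    rng = random.Random(seed)  # seeded so the same key can be regenerated
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    key = {f"S{i:03d}": original for i, original in enumerate(shuffled, start=1)}
    return list(key), key


if __name__ == "__main__":
    ids = ["control_1", "control_2", "treated_1", "treated_2"]  # made-up IDs
    codes, key = blind_labels(ids, seed=42)
    print(codes)  # share these with the analyst; store the key separately
```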