Lean Data Collection and Analysis

There’s a tendency in some research organizations to treat data collection and data analysis as activities belonging to separate domains. A lean data collection plan brings the two together.

Data collection, the thinking goes, comes first, since analytical insights can only be derived after experiments have generated data. Accordingly, any value an organization can obtain from data analysis depends on its ability to generate and collect data. And, following from that, it’s only worthwhile to invest in “data analytics” once data generation is up and running reliably.

The perspective is simple common sense. It’s also fraught with danger.

That’s because, while data analysis is inherently dependent on data collection, data collection is also dependent upon data analysis, and both are dependent upon the project’s objectives. The reality is that data collection is not an independent variable in the way it is often perceived to be. Instead, these phases of research are integrated.

What does that look like – and what does it mean for research processes?

A series of three questions can help us to envision a better approach.

1. What are your scientific and/or business objectives?

Research is always driven by the need to reach scientific or business objectives. These, really, are the closest things to independent variables within the research process; objectives will influence approaches to data generation, collection, and analysis, while these subsequent activities likely won’t impact research objectives.

Properly formulated, a scientific objective will lead to a question that can be answered by data.

2. What data should or could be collected to answer this question?

Here, the goal is to specify a dataset with enough statistical power to detect the effects that matter, while keeping data collection practical. This step is crucial and often underestimated; deciding what data to collect is key to avoiding the all-too-common mistake of collecting data that simply do not contain the answer to the question asked.
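To make “statistical power” concrete, here is a minimal sketch, assuming the simplest common case: comparing the mean of one measurement between two groups with a two-sided test. It uses the standard normal-approximation formula for the number of samples per group, n = 2(z₁₋α/₂ + z₁₋β)² / d², where d is the standardized effect size (difference in means divided by the standard deviation). The function name samples_per_group and the defaults of α = 0.05 and 80% power are illustrative choices, not a prescribed method; paired designs, count data, or more than two groups need a different calculation.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2,
    where d is the standardized effect size. The exact t-test answer is
    slightly larger when n is small.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()                   # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Compare how many samples are needed for small to large effects.
for d in (0.2, 0.5, 0.8, 1.2):
    print(f"d = {d}: about {samples_per_group(d)} samples per group")
```

Running it prints 393, 63, 25 and 11 samples per group for d = 0.2, 0.5, 0.8 and 1.2. Because n scales with 1/d², halving the effect size you need to detect roughly quadruples the number of samples, which is exactly the kind of trade-off worth settling before any data are collected.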

In particular, it is important to keep in mind that many areas of biology are data-poor. That may sound paradoxical in the age of the $1000 genome and Big Data everything. Yes, some biologists generate petabytes of data, but not all biological specialties have the same data generation capabilities. And data volume must be weighed against the complexity of the question asked. It’s fair to say that collecting a data point in life science is orders of magnitude more expensive than collecting a data point in marketing or the other industries where Big Data approaches have been so successful.

In biology, sample sizes tend to be smaller than we would like, the number of variables that can be observed tends to be limited, and data points tend to be expensive and not always of very good quality.

With this context in mind, it’s important to note that operational constraints on data acquisition are likely to dictate which analytical methods can be used. Properly designed datasets make analysis more viable, and analytical methods can in turn inform data collection.

3. What data analysis methods will be applicable to the specified dataset?

In regard to this third question, it’s important to note that the scientific or business problem being solved largely determines the types of data analysis that are available. In other words, the answer to the first question will impact the answer to the third question, and answers to questions two and three will likely impact each other as well.

The three questions are tightly related – and, consequently, should not be addressed in isolation.

How Integrated Experiment Design Impacts Our Approach

The major takeaway is this: because all three considerations are tightly integrated, the design of a research project should span all three phases rather than being confined to any single one.

Put more simply: it’s less effective to outsource data analysis as a standalone service than it is to pursue a consultative engagement across a project’s lifecycle.

It is more common than one might think for scientists to collect data simply because they can, and scientists feel good when they collect data. That’s part of the job description, right? However, being able to do something does not make it the best use of limited resources. After collecting the data, they turn to someone with quantitative skills to get answers to ill-formulated questions from poorly designed datasets. In this context, there is only so much the consultant can do if the answer to the question is simply not in the data.

As we’ve seen, data analysis is best provided through integrated engagement in the entire research process. The best consultants aren’t simply statistical analysts brought in to review existing datasets; they’re experts with insight into every part of the experiment design process. They need to understand the question, the operational constraints, and the analytical methods.

Working with consultants in this way can help to reduce the risk of inefficient data collection. It promotes a leaner approach in which a project goes through several iterations of a data collection and data analysis cycle, aimed at optimizing the fit between the questions, the data collection strategy, and the data analysis method.
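One way to picture this cycle is a minimal sketch, assuming the goal is to estimate a single mean to a chosen precision: collect a small batch, analyze everything gathered so far, and stop once the 95% confidence interval half-width reaches the target or a sample budget is used up. The helper collect_batch is hypothetical and simply simulates measurements; in a real project each call would be a round of experiments. The names lean_estimate, target_half_width and max_samples are illustrative, and the interval uses a normal approximation rather than the t distribution for brevity.

```python
import random
from statistics import NormalDist, mean, stdev


def collect_batch(rng: random.Random, size: int) -> list[float]:
    """Hypothetical stand-in for one round of experiments (simulated measurements)."""
    return [rng.gauss(10.0, 2.0) for _ in range(size)]


def lean_estimate(target_half_width: float, batch_size: int = 8,
                  max_samples: int = 200, confidence: float = 0.95,
                  seed: int = 0) -> tuple[float, float, int]:
    """Alternate collection and analysis until the estimate is precise enough.

    Returns (estimated mean, CI half-width, number of samples collected).
    """
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2 to estimate spread")
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    data: list[float] = []
    half_width = float("inf")
    # Never exceed the sample budget.
    while len(data) + batch_size <= max_samples:
        data.extend(collect_batch(rng, batch_size))      # collect
        half_width = z * stdev(data) / len(data) ** 0.5  # analyze
        if half_width <= target_half_width:              # decide: precise enough
            break
    estimate = mean(data) if data else float("nan")
    return estimate, half_width, len(data)


est, hw, n = lean_estimate(target_half_width=0.5)
print(f"mean ≈ {est:.2f} ± {hw:.2f} after {n} samples")
```

Stopping on a precision target, rather than on a p-value dropping below 0.05, is a deliberate choice: repeatedly testing for significance and stopping at the first “hit” inflates the false-positive rate unless the stopping rule is planned for in advance.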

Ready to Empower Your Experiments with an Integrated Approach?

Don’t separate data analysis from data collection. If you’re ready to empower your research by taking an integrated approach, let’s talk.

At GenoFAB, we equip research teams with technology solutions and computational services customized to streamline flows of scientific data. This infrastructure supports the adoption of lean management methods and helps life science organizations reach scientific milestones on schedule and within budget, so that R&D investments don’t go to waste.

And it ensures that the considerations of data analysis and data generation are addressed in an integrated manner that creates more effective experiments.

Want to improve a specific project? Schedule a consultation today.