Passive data collection leads to a number of problems in statistical
modeling. Observed changes in a response variable may be correlated
with, but not caused by, observed changes in individual *factors* (process
variables). Simultaneous changes in multiple factors may produce interactions
that are difficult to separate into individual effects. Observations
may be dependent even though the model of the data assumes they are independent.

Designed experiments address these problems. In a designed experiment, the data-producing process is actively manipulated to improve the quality of information and to eliminate redundant data. A common goal of all experimental designs is to collect data as parsimoniously as possible while providing sufficient information to accurately estimate model parameters.

For example, a simple model of a response *y* in
an experiment with two controlled factors *x*_{1} and *x*_{2} might
look like this:

$$y={\beta}_{0}+{\beta}_{1}{x}_{1}+{\beta}_{2}{x}_{2}+{\beta}_{3}{x}_{1}{x}_{2}+\epsilon $$

Here *ε* includes both experimental error
and the effects of any uncontrolled factors in the experiment. The
terms *β*_{1}*x*_{1} and *β*_{2}*x*_{2} are *main effects* and
the term *β*_{3}*x*_{1}*x*_{2} is
a two-way *interaction effect*.
A designed experiment would systematically manipulate *x*_{1} and *x*_{2} while
measuring *y*, with the objective of accurately estimating *β*_{0}, *β*_{1}, *β*_{2},
and *β*_{3}.
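
As a minimal sketch of this idea, the following Python example builds a two-level full factorial design for the two factors and recovers the four coefficients by least squares. The function names, the coded factor levels of -1 and +1, and the noise-free responses generated from made-up coefficients are all assumptions chosen for illustration; they are not part of any particular toolbox. Because the coded design is balanced, the columns of the model matrix are mutually orthogonal, so each least-squares estimate reduces to the dot product of a column with the responses, divided by the number of runs.

```python
from itertools import product


def full_factorial_2level(k):
    """Return all 2**k runs with each factor at coded levels -1 and +1."""
    return [list(run) for run in product((-1, 1), repeat=k)]


def fit_two_factor_model(runs, y):
    """Estimate (b0, b1, b2, b3) in y = b0 + b1*x1 + b2*x2 + b3*x1*x2.

    Assumes a balanced +/-1 design, so the model columns are orthogonal
    and each least-squares estimate is (column . y) / n.
    """
    columns = [
        [1 for _ in runs],              # intercept
        [x1 for x1, _ in runs],         # main effect of x1
        [x2 for _, x2 in runs],         # main effect of x2
        [x1 * x2 for x1, x2 in runs],   # two-way interaction
    ]
    n = len(runs)
    return [sum(c * yi for c, yi in zip(col, y)) / n for col in columns]


# Four runs: every combination of low/high settings for x1 and x2.
runs = full_factorial_2level(2)

# Hypothetical coefficients, used only to simulate noise-free responses.
true_beta = [10.0, 2.0, -3.0, 0.5]
y = [
    true_beta[0] + true_beta[1] * x1 + true_beta[2] * x2 + true_beta[3] * x1 * x2
    for x1, x2 in runs
]

estimates = fit_two_factor_model(runs, y)
print(estimates)
```

With only four runs, the design supplies exactly as many observations as there are parameters, so the simulated coefficients are recovered but no degrees of freedom remain for estimating the error *ε*. In practice, replicated runs would be needed for that.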
