Pseudoreplication


I am not a statistician, but you may find the following explanation of pseudoreplication mostly correct and useful.

Assume that you want to determine which of two potato varieties (A and B) produces higher yields. You divide your experimental field in half, and you plant variety A on the northern half and variety B on the southern half. Each variety is represented by 100 plants. At harvest, you determine that yield was greater for B than for A.

In the manuscript you prepare, you indicate that there were two treatments (two varieties), each of which was represented by 100 replicates. The ANOVA (analysis of variance) indicates that yield was greater for B than for A. You conclude that growers in your area should plant variety B.

You submit the paper to a journal, and it is rejected because you used pseudoreplication rather than true replication. This makes you unhappy.

Meanwhile, a soil scientist conducts a study in the same field and determines that the northern half of the experimental field drains poorly. This reminds you that variety A, which was planted in the northern half of the field and which yielded poorly, suffered from substantial root rot. Perhaps B would have also yielded poorly on the northern half of the field, but you cannot know this because your experiment was not replicated. It was pseudoreplicated.
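
To see why the 100 plants per variety are not independent replicates, the sketch below simulates the split-field layout with invented numbers. The two varieties are given identical true yields, and only the drainage of the northern half differs, so any difference the comparison reports comes from location, not from variety. The yield values, the drainage penalty, and the function names are hypothetical.

```python
import random
import statistics

# Hypothetical numbers for illustration only: both varieties have the SAME
# true mean yield, but plants on the poorly drained northern half lose yield.
TRUE_MEAN_KG = 2.0      # assumed true yield per plant, identical for A and B
DRAINAGE_PENALTY = 0.5  # assumed yield loss on the northern half
NOISE_SD = 0.3          # assumed plant-to-plant variation

def plant_yield(rng, north):
    """Yield of one plant; the variety itself has no effect at all."""
    penalty = DRAINAGE_PENALTY if north else 0.0
    return TRUE_MEAN_KG - penalty + rng.gauss(0.0, NOISE_SD)

def split_field_trial(seed=1, n=100):
    """Variety A on the northern half, B on the southern half, no randomization."""
    rng = random.Random(seed)
    a = [plant_yield(rng, north=True) for _ in range(n)]
    b = [plant_yield(rng, north=False) for _ in range(n)]
    return statistics.mean(a), statistics.mean(b)

mean_a, mean_b = split_field_trial()
print(f"A (north): {mean_a:.2f} kg/plant, B (south): {mean_b:.2f} kg/plant")
```

Because every A plant shares the northern half's drainage, adding more plants only makes the drainage effect look more precise; it cannot separate variety from location.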

You do not understand the difference between replication and pseudoreplication, and you conclude that you were unlucky. How could you have known that drainage differed in the northern vs. southern half of the field? You are not omniscient.

But rather than being unlucky, you were ignorant. Any researcher who has critically examined experiments knows that all experiments have unknown, uncontrolled sources of variance. In your case, one unknown source of variance was a difference in drainage. There were probably others. To deal with unknown, uncontrolled sources of variance, researchers randomly assign treatments to replicate plots or positions (or randomize within blocks) in experimental fields, on greenhouse benches, on laboratory benches, and even among growth chambers. In this way, any unknown source of variance is unlikely to affect one treatment systematically more than another.
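
A minimal sketch of randomizing within blocks, assuming the field is divided into blocks that each hold one plot of every variety (a randomized complete block design). If the blocks run from the northern to the southern edge, each variety ends up in both the poorly drained and the well-drained parts of the field. The function name and the four-block layout are illustrative choices, not a prescribed protocol.

```python
import random

def randomize_within_blocks(treatments, n_blocks, seed=None):
    """Return a layout: for each block, the treatments in a random plot order.

    Each block contains every treatment exactly once (randomized complete block).
    """
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)  # copy so the caller's list is untouched
        rng.shuffle(order)        # independent random order in each block
        layout.append(order)
    return layout

# Hypothetical example: 4 blocks running from the northern to the southern edge.
for block_number, plots in enumerate(randomize_within_blocks(["A", "B"], 4, seed=42), start=1):
    print(f"Block {block_number}: {plots}")
```

Blocking controls the known gradient (north vs. south); the random order within each block protects against the unknown ones.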

In performing ANOVAs, researchers use replication to estimate variance. They determine how much of the total variance in some variable (like yield) can be attributed to independent variables (treatment effects) and how much cannot (the error, or residual, variance measured among replicates of the same treatment). If the variance attributable to treatments is large relative to the error variance, as judged by an F test, the researcher infers that the treatment effects were statistically significant.
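
The partition described above can be written out for a one-way ANOVA. The sketch computes the F ratio: the treatment (between-group) mean square divided by the error (within-group) mean square. The error term is estimated from the replicates, which is why it is a valid yardstick only when those replicates are independent experimental units. The function name and toy data are illustrative; in practice a statistics package would also report the p-value from the F distribution.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_obs = [x for g in groups for x in g]
    grand_mean = sum(all_obs) / len(all_obs)
    # Between-group sum of squares: variation explained by the treatments.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: variation among replicates of the same treatment.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)
    # F = treatment mean square / error mean square.
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

f, df_b, df_w = one_way_anova_f([[4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print(f"F({df_b}, {df_w}) = {f:.1f}")  # F(1, 4) = 13.5
```

If the "replicates" all share one location, the within-group term reflects only variation within that location, so it underestimates the variation among true experimental units and F is inflated.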

This analysis of variance assumes that uncontrolled sources of variance (like differences in drainage) are randomly distributed among the treatments, i.e., that all treatments have an equal probability of being affected by unknown, uncontrolled sources of variance.

Most researchers attempt to randomize replicates, but such randomization can be difficult or impossible for large-scale field experiments. For example, assume that you want to determine the difference in carbon storage in pine forests vs. oak forests. Your institute owns one 50-ha pine forest and one 50-ha oak forest. You collect 20 soil samples from each forest and measure carbon content. An ANOVA treating the 20 samples from each forest as replicates indicates that the carbon content is greater in the pine than in the oak forest. You conclude that carbon content is greater in pine than in oak forests, and your conclusion is unwarranted because you used pseudoreplication rather than true replication: you have sampled one forest of each type, not many. To obtain true replication, you would have to sample many different oak and pine forests, or you would have to plant a forest that contained randomly distributed replicates of oak and pine trees. Obtaining true replication can be difficult or even impossible.

How should you handle pseudoreplication?

1. Avoid it. Understand what it is and only use it if there is no choice.
2. Recognize it as an important limitation in your research. In your paper, state that the comparisons were based on pseudoreplicates rather than on true replicates and that this forces you to be conservative in interpreting the results. At least some reviewers and readers will accept pseudoreplication if it is explicitly recognized and if obtaining true replication is difficult.

Here are some other examples of pseudoreplication:
treating multiple leaves from the same plant as replicates;
treating multiple plants from the same pot or flat as replicates;
treating multiple samples from the same plot as replicates.
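
One common remedy for the cases listed above, sketched here under the assumption that each pot (or plant, or plot) is the unit to which a treatment was assigned, is to average the subsamples within each unit and analyze the unit means. The data and names are hypothetical. Averaging does not create replication; it keeps the sample size honest: two pots per treatment remain two replicates, however many plants were measured.

```python
from statistics import mean

def unit_means(samples_by_unit):
    """Collapse subsamples (leaves, plants in a pot, cores in a plot) to one
    value per experimental unit, so each unit counts once in the analysis."""
    return {unit: mean(values) for unit, values in samples_by_unit.items()}

# Hypothetical data: 2 pots per treatment, 3 plants measured in each pot.
heights = {
    ("A", "pot1"): [10.0, 11.0, 12.0],
    ("A", "pot2"): [9.0, 10.0, 11.0],
    ("B", "pot3"): [13.0, 14.0, 15.0],
    ("B", "pot4"): [12.0, 13.0, 14.0],
}

by_pot = unit_means(heights)
n_a = sum(1 for treatment, _ in by_pot if treatment == "A")
print(by_pot)
print("true replicates of A:", n_a)  # 2 pots, not 6 plants
```

Nested or mixed-model analyses, with the experimental unit as a random factor, are an alternative that keeps the subsample information while still testing treatments against unit-to-unit variation.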

Am I a statistician? No. Do I know what I am talking about? Maybe.