Experiments are not valid.

May 29th, 2007

(Photo: Dr. Donald Campbell)

Just as beauty is in the eye of the beholder, so is experimental validity in the mind of the inferrer.*

If this sounds like nonsense, I heartily recommend the following lit review by Albright and Malloy.**

Albright, L., & Malloy, T. E. (2000). Experimental validity: Brunswik, Campbell, Cronbach, and enduring issues. Review of General Psychology, 4(4), 337-353.

Many psychologists are trained, starting as undergrads, to think of “validity” as a property of experiments that must be protected, via randomization, from a constant barrage of “threats.” That view, like most things we learn as undergrads, is useful but simplistic. (An experiment is a historical event; how can an event be invalid? And what, exactly, have we chosen to randomize?)

When the goal is to explore uncharted territory, where causal structures are murky and randomization is impossible, it becomes clear that there’s more to validity than just independent variables, randomization tables, and threat checklists.

Okay, I’m being diplomatic. Albright and Malloy are more direct. “Most social scientists do not understand internal validity,” they say in their abstract.

The dominant view of validity in psychology comes from the pioneering theoretical work of Donald Campbell, pictured above, and his colleagues (Stanley, Cook, Shadish, etc.), some of the brightest bulbs psychology has ever produced. For decades, Campbell engaged in spirited debates, most notably with Lee Cronbach, about what it really means to draw conclusions from empirical evidence.

Over the course of this introspection, Campbell made decisions about which types of inferences social scientists were likely to find most compelling, and how experiments might be designed to support those conclusions. He was quite plainspoken about his assumptions; in fact, his papers carefully explain how his emphasis on “internal validity” is based on his philosophy of social change and scientific responsibility.

Unfortunately, there are plenty of social scientists who, rather than recognizing Campbell as a great philosopher of science with a distinct viewpoint, take his heuristics as dogma. Then they make silly statements like:

  1. “Random assignment is necessary for a valid study.”
  2. “Experiments are more valid because we manipulate the independent variable.”
  3. “It’s just a case study, so it’s not very valid.”
  4. “We don’t have to worry about validity because we sampled every member of the population.”
  5. “It’s an (artificial lab / uncontrolled field) study, so the results aren’t valid.”
  6. “We’re applied, so we don’t have time to do a valid study.”
  7. “(Our results / Your results don’t) matter because the study (was / was not) statistically significant.”

———–

* This is not to take the extreme social constructionist position that we should all give up science and bathe ourselves in patchouli oil because there’s no such thing as a meaningful, well-crafted experiment. I like experiments. My point is that many social scientists treat validity as an objective property of a study, when it’s actually an expert judgement call about the fit between methodology and one out of an infinite universe of inferences. Technically, experiments aren’t valid. Inferences are.

** If it still sounds like nonsense afterwards, consider the source.

Okay, here’s why they’re silly:

  1. First off, experiments are the worst possible way of learning about new phenomena; you can’t manipulate something until after you’ve already studied it. Secondly, only a few experimenters make generalizations after having randomly sampled (let alone assigned) all relevant variables. And most of them are lying. When did you last see a study that manipulated the independent variable AND all of the other variables that might interact with it? It would be impossible to do, so when we conduct an experiment, we’re really just taking a snapshot of our construct and making inferences about how it really works. Observational and experimental studies produce different kinds of data that are ultimately subjected to the same frail human judgment processes.
  2. Similar story here. Validity is more complicated than just a peeing contest between experimental and observational studies. The benefits gained by isolating an independent variable have to be weighed against the costs of decreasing sensitivity to the effects of other, potentially more important variables. (Sometimes the forest matters more than a single tree.) That’s why, in theory, we shouldn’t draw broad conclusions from experiments without sufficient replication. How many replications? Well, it’s not an exact science. I mean, science isn’t exact. No, what I really mean is that science isn’t perfectly objective. That’s the beauty of it: it guesses and then self-corrects.
  3. All historical events are valid, provided they actually happened. The question is whether we can make a reasonable inference as to how they happened. A follow-up experiment can often help, but sometimes case studies are the only way to study really interesting freak events. No lab can ever repeat measures as often (or cruelly) as nature.
  4. Sampling every member of the population is awesome. But the units are only one component of a study. Did you sample every setting, treatment, and type of observation? For certain conclusions, these matter even more than the population.
  5. Again, it’s all about the fit between method and conclusion. Besides, for psychologists, it’s pretty rare to be in a situation where the phenomena are interesting only in the lab or only in the field.
  6. And this one’s just lazy. Limited resources should spur creativity, or at least result in more careful inferences.
  7. Statistical significance is not a substitute for judgment. It absolutely does not guarantee internal or external validity, causation, generalizability, value, clinical significance, etc. Instead, it is a tool that tells us how likely a result at least as extreme as the observed one would be if chance alone were at work, given certain assumptions. Many statistically significant results are entirely pointless. Many studies that are not statistically significant are totally meaningful. Only by understanding a study’s methodology and larger context can we apply statistical methods usefully.
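To make point 7 concrete, here’s a minimal Python sketch (an illustration, not something from Albright and Malloy). It assumes a simple comparison of two equal-sized groups, each with a known standard deviation of 1, tested with a two-sided z-test; the function name `two_sided_p` is hypothetical. The point: a trivially small difference becomes “significant” once the sample is huge, while a sizable difference in a tiny sample does not.

```python
import math


def two_sided_p(d: float, n_per_group: int) -> float:
    """Two-sided p-value for an observed standardized mean difference d.

    Simplifying assumptions (for illustration only): two independent groups
    of equal size, each with a known standard deviation of 1, compared with
    a z-test.
    """
    # Standard error of the difference between two group means (SD = 1 each).
    se = math.sqrt(2.0 / n_per_group)
    z = d / se
    # Probability of a |Z| at least this large if there were no true difference.
    return math.erfc(abs(z) / math.sqrt(2.0))


# A negligible difference (0.01 SD) measured on an enormous sample...
tiny_effect = two_sided_p(d=0.01, n_per_group=200_000)
# ...versus a substantial difference (0.5 SD) measured on a small sample.
big_effect = two_sided_p(d=0.5, n_per_group=10)

print(f"d = 0.01, n = 200,000 per group: p = {tiny_effect:.4f}")
print(f"d = 0.50, n = 10 per group:      p = {big_effect:.4f}")
```

The first p-value falls well below .05 and the second well above it, even though the second difference is fifty times larger. The effect sizes and sample sizes here are arbitrary choices for illustration, not data from any study.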

Think I’m dead wrong? Leave me a note explaining why.
