
My new story on psychology’s problem with replications

I have a new feature out in Nature looking at two big problems within the field of psychology. First, the field is almost entirely dominated by positive results, while negative ones languish unpublished in personal file drawers. Second, there are few incentives to replicate old results and negative replication attempts face a lengthy gauntlet of obstacles. In the story, I look at why these problems exist and why some psychologists are starting to take them very seriously.

The piece has its origins in an incident that regular readers will already know. In January, Stephane Doyen and colleagues had unsuccessfully tried to repeat a classic experiment where people walk more slowly down a corridor after being unconsciously primed with age-related words. I wrote about their research. Two months later, the man behind the original study – John Bargh of Yale University – wrote a scathing attack on Doyen’s team, me, and the journal that published the study. I responded.

The ensuing discussion opened my eyes to an undercurrent of unrest. Many psychologists came out of the woodwork to mention experiments that were hard to replicate, common practices that they deemed to be dodgy, and a growing willingness to turn a critical eye upon their own field. For every comment that appeared on the blog and Twitter, I’ve got another that was sent confidentially to me via email. This was clearly something worth writing about.

From the outset, I was very clear that this was not going to be a hit piece against any individual researcher or against psychology as a field, and I’d like to think that the result is a fair assessment. I look at issues such as: conceptual replication (where different but thematically similar experiments are said to replicate each other); the difficulties of repeating studies of abstract concepts; the role of almost magical ‘knacks’; and how tiny, innocuously-made choices can pull statistically significant results from the ether. I also look at the extent to which these problems are unique to psychology, compared to other sciences.

And perhaps most importantly, I look at the triggers that have provoked many psychologists into recognising and discussing these problems, from controversial papers about psychic abilities to high-profile frauds like Diederik Stapel. It is not my contention that such misconduct is common but I think that the field’s cultural norms have created an environment where Stapels could go unnoticed.

But enough background. Do read the piece, and let me know what you think. I might post a bit more about it, depending on reactions.

And finally, many thanks to Brendan Maher for seeing the potential in the story and editing it into shape, and Helen Pearson for taking it over the finishing line.


I’m going to collect some other good resources on this issue:


19 thoughts on “My new story on psychology’s problem with replications”

  1. A thoughtful piece, Ed. Got me thinking about replicability – if it is so hard to replicate all the parameters (down to the colour of the room) of a psychology experiment of this kind, maybe we do have to do B in order to show A is true (or not).

    I mean, rather than trying to repeat A (in the knowledge that a negative finding proves nothing outright anyway), perhaps other groups should build a new hypothesis: “If A is true, we would expect to observe X, so let’s do B to test that”. This approach, rather than being limited to supporting or not supporting some suspiciously convenient or attention-grabbing results, would have the potential to further the field if A is true and might lead others to new, non-A-dependent hypotheses should X turn out not to be observed. Feels more constructive than straight replication, which carries all the problems you identify in your piece….

  2. If the dodgy psychologists had been working on the SETI project, the existence of alien life would be irrefutable from the “Wow!” signal, and anybody not finding their own Wow! signal would simply have designed their experiment wrong.


  3. A lovely piece.

    To be honest, I almost feel uncomfortable calling psychology a science right now. And the weirdest thing is that the tools to change that are right there. Papers should be pre-registered for publishing (including experimental design etc.), as Wagenmakers says, and studies should be considered unreliable unless replicated. It’s a disgrace that journals won’t accept replications, and it should be standard practice to regard publishing a paper as the starting point rather than all there is to say on a particular subject.

  4. This is an obvious generalization, especially Iota’s comment. The problem that the author describes is real, if and where it exists, but I’d bet it’s not limited to Psychology, doesn’t extend to all classical problems, and clearly not to all of Psychology.

    Though I also agree that negative replication results have a really hard time getting published, I wouldn’t say that they “languish unpublished in personal file drawers”; some do get published, and that’s obvious from doing a literature review in almost any field of Scientific Psychology. I also must add that most replications are not replications per se, as they generally introduce changes in the conditions, either accidentally or in order to test something else. What it means is: keep replicating.

    In the end, I’d ask the author to do some real research on the case that could show how big the problem is (especially by sending it to the APA and Experimental Psychology Associations). As I’m sure you know, the author’s statements don’t have much to back them up other than that they ring true to most scientists’ ears. The fact that you’ve heard about the experience of many Psychologists who can’t get published is of course a question of sampling: how do you know how many Psychologists in general can’t get published? How many have non-replications and do get published? Well, as I hope everyone reading this knows, you need the full table to be able to draw the conclusions the author is defending: http://en.wikipedia.org/wiki/Contingency_table

  5. Thanks for the comments, folks. I stress that the point of the piece is *not* “psychology is rubbish” and it should not be interpreted as such. It’s that the field has problems – many shared, some specific – and that people are recognising them. Before slating psychologists in general, do note that every single person I interviewed – including all the ones calling for change – is a psychologist! Let’s not tar everyone with the same brush.

    And @Cristina – “the author” suggests that you actually *read the piece* which includes (a) what data there is on the scale of the problem, (b) acknowledgments of what we still don’t know, (c) details of attempts to get more data.

  6. Interesting stuff Ed, and nicely written! Bit of a step into controversy, hope it generates discussion rather than heat.

    I find the proportion of significant publications in psychology a bit of a smoking gun; you’d have to twist a long way to argue that the results reflect reality. Maybe psychologists are so damned good they never test an untrue hypothesis 😉 Depressingly, my field (genetics) also has high rates of ‘positive’ publications – I suspect for different reasons though.

    I did groan aloud at the paragraph about psychology experiments being like theatre direction, the idea that there is a knack to getting positive results, and the suggestion that there is ‘secret knowledge’ in these experiments that can be difficult to impart. If these suggestions hadn’t come from quotes, I’d have suggested this was a most crude and cruel mischaracterisation of the field (NB: not for one moment do I think all psychologists think this way).

  7. “the field is almost entirely dominated by positive results, while negative ones languish unpublished in personal file drawers”.
    Not just psychology, surely?

  8. @Eleanor – I admit that I was quite shocked by that too.

    @Anne – “Not just psychology, surely” – Indeed not, as I point out repeatedly in the piece.

  9. @Michael – That logic works in theory, but in practice, it only produces a solid body of evidence if each individual strand is strong. And if you factor in publication bias, the drive for flashy new results, the “tricks” that get used, and so on… you have the potential for individually weak results to back each other up. Bottom line: you need to do both. Extend *and* replicate.

  10. It sure isn’t just psychology. “The statistical practice of most physicists, not to mention other scientists, is crude and often seriously flawed.” [ http://geocalc.clas.asu.edu/html/Inferential.html ] All of science has been harmed by the early-20th-century replacement of its rational inferential foundations with orthodox inference, which is “irrational on a key intuitive notion of rationality”. Unfortunately, as I think this article ( http://www.thepsychologist.org.uk/archive/archive_home.cfm?volumeID=25&editionID=213&ArticleID=2059 ) clearly illustrates, the half-baked ‘Bayesian’ (as opposed to ‘Jaynesian’) approach isn’t an effective remedy.

  11. Hello Ed,
    first, thanks so much for all your efforts in creating this site.

    In the area of medical clinical trials there is now a requirement to document all results – positive, negative or even nil.
    E.g. down here, the “Australian New Zealand Clinical Trials Registry”, and in the US, the government’s “Clinical Trials” registry.

    Are some psych experiments akin to these trials?
    Then, when publishing papers, you would include a reference to this type of database.

    This might avoid the scenario of airlines employing psychics to predict crashes instead of engineers working to avoid them?

  12. We’re addressing some of these problems in the study of Human-Computer Interaction too – see http://repliCHI.org – we’re trying to push our conference (the main publication venue in our field) to focus more on this topic. I’m going to read your paper/links in earnest.


  13. @Ed Yong: I’m sorry I got lost on the sensationalist words that were used. Otherwise, I think your point is valid and that it should be pushed further, as I stated already. Good luck!

    @Larry Moran: this is a result of how scientific (and pseudo-scientific) information disseminates (or doesn’t). You wouldn’t see much bad Psychology mixed in with the good in those few amazing Psychology journals that keep their standards. Some bad stuff gets published from time to time and sometimes good work is withheld, but I still must reinforce the idea that this is not a systemic error of Psychology, and there is no a priori reason for it to be bigger than in any other field.

    As far as Evolutionary Psychology is concerned, people who are interested enough in it to read it can probably realise by themselves that its articles are actually damned stupid. Though I agree with you that at least some shouldn’t be published in the first place. Have you read an Evolutionary Psychology theory on homosexuality?

  14. What I am missing from the hype surrounding replication at the moment is a clear argument for its special status in science.

    After all, we are not testing tens or even hundreds of people for fun. We don’t run statistical analyses (classical, Bayesian or other) because we don’t know what else to do.

    I do believe replication is special, because it is the only approach which can overcome the human factor, i.e. the subjective side of science. One can argue for a long time about whether Daryl Bem analysed his data correctly or not. But one cannot argue about the value of a straight replication.

    To put it shortly:
    If science is to be objective, it must be replicable.

    So, I don’t see your piece as a story about Psychology (being a Psychologist, I am biased, of course), the value of alternative data analyses, or personal failures (à la good Psychologists vs. bad ones). I think it is just a case study of a field which is rediscovering a fundamental approach in science.

  15. “If science is to be objective, it must be replicable.”

    But of course the reverse implication doesn’t hold: it’s necessary that an experiment be replicable, yes, but replication isn’t sufficient to overcome “the human factor” [cf. Millikan: http://en.wikipedia.org/wiki/Oil_drop_experiment#Millikan.27s_experiment_as_an_example_of_psychological_effects_in_scientific_methodology ]. I don’t think replication is of itself the fundamental approach which characterises good, objective science. Having a Feynmannian integrity motivating the replication is what really matters.

    Bem’s analysis was “not even incorrect”. The ‘Bayesian’ critiques also conspicuously failed to identify, and point out the consequences of, what was by far the most serious error: the fatal flaw that makes such experiments /in principle/ incapable of generating results interpretable as evidence for retrocausal phenomena (even if retrocausal phenomena exist)¹. As Sean Carroll has written in a post on his Cosmic Variance blog:

    “If parapsychologists followed the methodology of scientific inquiry, they would look [at] what we know about the laws of physics, realize that their purported subject of study had already been ruled out, and within thirty seconds would declare themselves finished. Anything else is pseudoscience, just as surely as contemporary investigation into astrology, phrenology, or Ptolemaic cosmology. Science is defined by its methods, but it also gets results; and to ignore those results is to violate those methods.”

    Psi retrocausality pseudo-experiments are spectacular violations. Medical science has its clinical trials of homeopathy and ‘energy medicines’. Such things are futile cargo cult science – grotesque parodies of genuine science. The motivation of those attempting replications of Bem is good but replication isn’t the appropriate tool in the science toolbox for dealing with that sort of stuff.

    ¹ http://www-biba.inrialpes.fr/Jaynes/cc05e.pdf
