I am writing about a crisis in social psychology.
But, before I address the crisis, some background: Recently, a lot of press has been given to the Science article reporting that many psychological effects, in particular social psychological effects, did not replicate in a large-scale replication project. Since deciding what replicated and what did not isn't as easy as it sounds (does it need to be significant to be successful? In the right direction? Consistent with the theory?), the exact percentage isn't clear, but, to be safe, let's say over 50%. In other words, over 50% of studies didn't replicate when another researcher (not the one who published the original paper) tried to replicate them. That sounds really bad. But it isn't the crisis I want to write about.
Let me go back a little further. A few years back, a giant jerk of a man made up a bunch of data and then had the nerve to write a book about it. He was, unfortunately for us, a social psychologist. Around the same time, another psychologist published a paper with some studies that seemed to support the existence of ESP. He is also a social psychologist. Both of those things got some people pretty worked up. Also around the same time, the explosion of the internet and so many websites led to, if possible, an even greater reliance on fresh and sexy new stories every day. With so many websites looking for fresh clickbait, headlines (and even stories) seem to value shock appeal over truth. In other words, a headline like "Psychologists turn people into wizards!" is more likely to get clicks than "Psychologists make a small, incremental change in their knowledge of how narratives affect identity."
Because that is what each study, if we are lucky, provides us with: small, incremental change. Each study provides evidence in support of something. That is all — just evidence, not fact. It sounds much less impressive, but it is the truth. Our research builds on other research, each study suggests something, and if enough studies agree, then we can be more and more certain of something. Of course, we can never be 100% sure. Sometimes something crazy comes around and knocks over all our assumptions. That can be exciting: scary and bizarre, but exciting. But mostly we plod along. People find some papers that support a theory and then others do not, and we slowly come to build support for something. Or, support for something falters, and we have to admit that our theories are incorrect. That can be hard.
So let's get back to over 50% of studies not replicating. It is not surprising that some of our studies don't replicate. We wouldn't expect them all to. But why so many? There are people with much more statistical knowledge than me arguing for all kinds of complicated reasons why studies should (or should not) replicate and why we should (or should not) be concerned about that. I can't do that. But I can give some ideas of my own.
First, as I already mentioned, it is ok (and even good) that some studies are not supported. Science grows from study to study. In some cases, the effects were simply due to chance and are not real. In other words, they didn't replicate because the original experimenters found something that happened just by chance (like how you can flip a coin and get heads 10 times in a row just by chance). The same thing can happen when you run a study.
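To see how easily chance alone can do this, here is a minimal simulation sketch (in Python, with made-up sample sizes; nothing below comes from the actual replication project): if you run many experiments in which the true effect is exactly zero, roughly 1 in 20 will still cross the p < .05 line.

```python
# A toy illustration, not anyone's actual study: simulate experiments where
# the true effect is zero and count how often chance alone yields p < .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 10_000
false_positives = 0

for _ in range(n_experiments):
    # Both groups are drawn from the SAME population, so any observed
    # "effect" is pure noise.
    control = rng.normal(loc=0.0, scale=1.0, size=30)
    treatment = rng.normal(loc=0.0, scale=1.0, size=30)
    if stats.ttest_ind(control, treatment).pvalue < 0.05:
        false_positives += 1

print(f"'Significant' null effects: {false_positives / n_experiments:.1%}")
# Prints roughly 5.0% -- the same logic as the coin: 10 heads in a row has
# probability 0.5 ** 10 (about 0.1%), so flip often enough and it happens.
```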
Second, when we run studies to see if we have support for our ideas, we tend to be very careful in how we run things. We make sure everything is exactly the same for every participant (except for our independent variable, of course). We make sure our research assistants (RAs) are carefully trained, that they report anything odd at all, and that they keep the lab quiet around them. We give them careful scripts to follow and make sure they don't even wear shirts with things written on them. We do this because we are highly motivated to make our studies work and we know that small things can alter how participants respond. If we weren't so interested in finding our results, it would be much easier to run a lab. Because I am motivated to find my effects, I work hard to minimize all the extraneous variables. But what if I were motivated, instead, NOT to find effects? Unfortunately, this is often the case with replications, because a failure to replicate gets much more attention than a successful replication. It is sexier because it calls something into question and raises the intriguing possibility that someone did something wrong. Unfortunately, failing to find an effect is way easier than finding an effect. Even if you follow the letter of someone else's study, you no longer have the motivation to do all those special things to minimize outside influence. Yes, psychology is a science, but there is also an art to running an elegant experiment with a great chance of getting what you predict. If you aren't interested in actually getting that result, then the art of it is gone and you reduce your chances.
A third reason that the studies may not have replicated is that instead of pretesting materials that would work in their populations, replicators used the exact same materials. This may make sense in other sciences, because chemical compounds and particles are going to be exactly the same in every lab across the world. But people are not going to be the same across the world, or over time, or even from town to town and school to school. We often assume that general processes are going to be similar for most people. For example, we assume that people, in general, will evaluate their ingroups (groups to which they belong) more positively than their outgroups (those other scary groups). However, if one wanted to research that in a particular context, as any stigma or stereotyping researcher will tell you, one first needs *a lot* of pretesting to learn which outgroups are relevant, which ingroups are relevant, what the status of each group is, and what the stereotypes are. If, instead, one just used the materials from one sample to test another, it might look as though only one group of people in the world is prejudiced, or worse, like the people who first discovered prejudice cooked their findings. For example, although I think we would agree that (almost) all people are susceptible to prejudice, only a portion of them hate people from red states (or blue states, or blue-eyed people, etc.). So a test of prejudice in one sample would not look the same as a test in another — even though both samples are prejudiced.
A fourth reason that the studies may not have replicated is that there could be a moderator that we don't know about. A moderator is a variable that makes an effect appear in some conditions but not others. For example, when people are feeling bad about themselves, they are more prejudiced toward others. Thus, feeling bad about oneself is a moderator of prejudice. If people ran a study looking at prejudice in the few days after the school's football team suffered a crushing defeat, prejudice effects would be higher. If someone tried to replicate it and ran the study at another school a week after their football team won, prejudice scores would be really low or non-existent. A replication like that would make it appear that the first effect was not real, even though the effect was simply moderated by self-esteem.
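If it helps to see the logic, here is a toy sketch of that football scenario (simulated data with invented effect sizes; purely illustrative): the manipulation "works" only when the hypothetical moderator, low self-esteem, is switched on, so a faithful replication run under the wrong conditions comes up empty-handed.

```python
# Invented numbers for illustration: an effect that exists only when the
# moderator (low self-esteem) is present.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def run_study(low_self_esteem: bool, n: int = 50) -> float:
    """Return the p-value from comparing a manipulation group to a control."""
    effect = 0.8 if low_self_esteem else 0.0  # moderator switches the effect on or off
    control = rng.normal(0.0, 1.0, n)
    manipulation = rng.normal(effect, 1.0, n)
    return stats.ttest_ind(control, manipulation).pvalue

# Original lab: run just after a crushing football defeat (low self-esteem).
print(f"original study p = {run_study(low_self_esteem=True):.4f}")   # typically < .05
# Replication lab: same procedure, but run after a big win, so the
# moderator is in the wrong state and the effect "vanishes".
print(f"replication p    = {run_study(low_self_esteem=False):.4f}")  # typically > .05
```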
A fifth reason why the effects would not replicate is that some effects may be specific to some culture or people. Although in social psychology we tend to gloss over cultural differences (for the most part) and assume that basic psychological processes are similar everywhere, our colleagues in anthropology and sociology think quite differently. And it might be that they are sometimes correct. Perhaps some of our findings are true in Tulsa but not in Taiwan. To be sure, there are social psychologists doing amazing work looking at cultural differences, but they are the exception and not the norm. Perhaps that is something we should think about changing.
Finally, a sixth reason why effects might not replicate is that people are doing sloppy and possibly unethical things in their labs, and that is leading to false results. There are some things people in the field have been doing over the years — things that aren't cheating, manipulating, or making up data, but things that may help get to that statistical holy grail, the valued and worshiped p < .05, such as cherry-picking variables and not including all information in a write-up. There has been a lot of discussion in social psychology about those things recently and a large movement toward discouraging those activities and encouraging openness in data sharing that would make them impossible. In my opinion, that is a really useful and important step toward improving science; as long as we match our desire for more openness with an acceptance that data on human beings does not need to be perfect to be publishable, I am all in favor.
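And to show why those practices matter, one more hedged sketch (simulated data, with assumed numbers of my own choosing): if a researcher measures five outcome variables, all truly null, and reports whichever one happens to reach significance, the effective false positive rate is no longer 5%.

```python
# A sketch of analytic flexibility: five dependent variables, all truly null,
# but only the "best" one gets reported.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_experiments, n_dvs = 10_000, 5
lucky = 0

for _ in range(n_experiments):
    for _ in range(n_dvs):
        control = rng.normal(0.0, 1.0, 30)
        treatment = rng.normal(0.0, 1.0, 30)
        if stats.ttest_ind(control, treatment).pvalue < 0.05:
            lucky += 1  # at least one DV "worked" -- report that one
            break

print(f"Experiments with a reportable p < .05: {lucky / n_experiments:.1%}")
# Roughly 1 - 0.95 ** 5, i.e. about 23% -- far above the nominal 5%.
```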
So there are many reasons why these studies might not have replicated. Just among the few I mentioned here: it could be that some effects simply were not true and were found by chance; it could be that some replications weren't quite as careful and loving as the originals in eliminating noise; it could be that the materials used were not appropriate to the sample; it could be that moderators exist that we were not aware of; it could be that the effects are culturally specific; and it could be that researchers used sloppy practices. Most likely it is some combination of these factors and not any one reason alone.
To be clear, the authors of the Science paper describing the failure to replicate the studies were careful to state that there were many possible explanations for that failure. But the media, being the clickbait lovers that they are, have not been careful. Instead we get stories about how unreliable psychological findings are. We get stories about the end of social psychology. We get blogs gleefully announcing the need to disregard an entire discipline. We get an excuse for people to discard research findings that are inconvenient to them. And we hear mumblings of people arguing for less funding for the social sciences and less hiring of social psychologists.
Why does this matter? Because I strongly believe that social psychology has a lot to add to what we know about the world. I strongly believe that social psychology can contribute toward making the world a better place. And I strongly believe that much of what we have found is real and important and exciting.
For example, take the prejudice example I used earlier. A growing and exciting literature in social psychology suggests that prejudice can operate in subtle yet impactful ways. It suggests that even those of us who mean well sometimes treat others unequally. It suggests that when we see a flash of silver in someone's hand, we are more likely to think it is a gun when that person is black. And it suggests that there are ways to combat prejudice. In other words, social psychology has a voice to add to one of the most important issues gripping, and dividing, America right now.
Are there people making up data and doing other truly unethical things in social psychology? Probably. But there are people doing unethical things in every profession in the world. And as with most of those other professions, the truly unethical are a tiny minority, and setting up rules to govern everyone else isn't going to change their behavior. Instead, the mistrust and suspicion will only hurt everyone else, creating an atmosphere where people are willing to destroy the reputations of other researchers without hard evidence. Indeed, it already feels like open season on researchers out there, with accusations of fraud and carelessness everywhere one looks.
So, for me, this is the crisis in social psychology right now. It isn't a crisis of a field filled with false findings. It is a crisis that has been building: a few bad apples, controversial findings, and inaccurate press clippings led to a replication movement, which led, I am afraid, to public mistrust of a disturbing magnitude. The ramifications of that mistrust worry me a great deal because I have confidence in my field. More than that, I love my field. I love what we study. I love the potential of our science to help humans understand love and prejudice and the little moments that make life meaningful and special: the social moments. I love the way we disagree. I love the way our science moves and builds in fits and starts and reboots and tiny, incremental, elegant steps. I love what I do and I really, really believe in it and believe that it is important. So the crisis in social psychology that worries me is the crisis in confidence — our confidence in our data, our confidence in one another, and the world's confidence in us. There are many reasons why replications fail, and ignoring those reasons will only lead to failure for us all.