I used the concept of peer review heavily when I wrote my award-winning science textbooks. Even though I have a PhD in nuclear chemistry, I don’t know everything there is to know about chemistry. Thus, in order to ensure that my chemistry text was accurate, I had other PhD chemists (and one high school teacher) review the book to catch errors so that I could correct them. As I started writing textbooks that were further and further from my field of expertise, I had to rely on peer review more heavily.
While the concept of peer review is an excellent one, the execution of it in modern science has been questioned in many different ways. Some scientists think that peer review tends to enforce orthodoxy, making it very difficult for new and revolutionary ideas to be published. Others see peer review as a way for the reviewers to keep people they don’t like from getting published. Others say that is a way for reviewers to punish their rivals.
Over the years, several studies have tried to address the validity of the peer-review process, and unfortunately, the results have not been very good.
One of the most direct studies of the efficacy of peer review in the medical literature was done by Fiona Godlee and her colleagues in 1998. It tried to determine whether leaving the authors’ names off a paper, or requiring reviewers to sign their reports, would make a difference, and it found that neither step seemed to matter. However, what I found interesting about the study were its details.
Essentially, the authors added “8 areas of weakness” to a study that had already been accepted for publication. In other words, they took a study deemed to be good and added what amounts to 8 errors that the peer-review process should catch. However, the 221 reviewers who returned reports found, on average, only two of the errors. Worse yet, only 10% of the reviewers caught more than 4 errors, and 16% of the reviewers didn’t catch any of the errors at all.1 My conclusion is that, at least in this case, the peer-review process failed miserably.
A couple of years later, Peter M. Rothwell and Christopher N. Martyn decided to study whether reviewers tend to agree with one another on a given paper. They looked at a clinical neuroscience journal as well as a clinical neuroscience conference. To get a paper accepted by either of these outlets, it had to be reviewed by multiple reviewers, and the ratings given by those reviewers had to meet certain criteria. The study focused on how well the multiple reviewers agreed on a given paper. You would expect that excellent papers would be recognized as such by all reviewers, as would very poor papers. Disagreement should come mostly on papers that were mediocre. However, the study found that the reviewers rarely agreed. Indeed, what agreement did exist was only slightly more than you would expect from chance alone.2 From this study, I conclude that either most of the papers submitted were mediocre (which is hard to believe), or the peer-review process once again didn’t work very well.
What led me to do this brief review is a paper that was published just recently. Rather than testing the actual peer-review process for a given journal or a given subject, the authors produced a mathematical model to determine how important the reviewers themselves are in the process. Now you might think this is an odd question, since without peer reviewers, the peer-review process can’t exist at all. However, the question is really fundamental, because peer reviewers are human. They make mistakes, they can have bad days, and they can sometimes be lazy. I have been a peer reviewer for many scientific journals, and I can attest that sometimes I did not do my best when it came to reviewing a paper. So the question these authors tried to address is this: how good do the reviewers have to be for the peer-review process to work well?
The authors came up with a rather ingenious model to answer this question. They started with a bunch of hypothetical scientists who both did research and acted as peer reviewers. For their role as peer reviewers, the model assigned them to one of five categories:
1. Correct reviewers who will always accept the good and reject the bad
2. Altruistic reviewers who accept all papers regardless of quality
3. Misanthropic reviewers who reject all papers regardless of quality
4. Rational reviewers who reject papers that might cast doubt on their own work
5. Random reviewers who accept or reject papers at random because they really aren’t qualified or don’t put forth the proper effort
For their role as scientists, the model assigned the same people a level of quality that was independent of the peer-review category they were assigned. The quality level followed a Gaussian distribution, which means that most of the reviewers were nearly average in their quality of research, a few were excellent, and a few were poor. The model then assumed that these scientists would produce papers that reflected their quality level. In other words, the excellent scientists would produce papers that should always be published, and the poor scientists produced papers that should never be published.
So…think about what the model has. It has a group of people who all function in two roles: scientist and peer reviewer. The model assumes that each scientist produces papers for publication, and each paper is sent to two people chosen at random from the other scientists in the model. Those two act as peer reviewers for the paper. The model says the paper will get published if both reviewers approve it, and it has a 50% chance of getting published if only one reviewer approves it. In the end, then, we have hypothetical scientists producing papers that are reviewed by hypothetical reviewers for a hypothetical journal.
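The mechanics described above are simple enough to sketch in code. What follows is a minimal simulation of the model as I have described it, not the authors’ actual implementation; in particular, the Gaussian parameters, the quality threshold that separates “good” papers from “bad” ones, and the rule that a “rational” reviewer rejects any paper better than his own work are my assumptions.

```python
import random

CATEGORIES = ["correct", "altruistic", "misanthropic", "rational", "random"]

def make_scientists(n, category_weights, rng):
    # Each scientist has a research quality drawn from a Gaussian
    # and a reviewer category assigned independently of that quality.
    return [
        {
            "quality": rng.gauss(0.5, 0.15),
            "category": rng.choices(CATEGORIES, weights=category_weights)[0],
        }
        for _ in range(n)
    ]

def review(reviewer, paper_quality, threshold, rng):
    # Return True if this reviewer accepts the paper.
    cat = reviewer["category"]
    if cat == "correct":
        return paper_quality >= threshold   # accept the good, reject the bad
    if cat == "altruistic":
        return True                         # accept everything
    if cat == "misanthropic":
        return False                        # reject everything
    if cat == "rational":
        # Assumed rule: reject work that outshines the reviewer's own.
        return paper_quality <= reviewer["quality"]
    return rng.random() < 0.5               # "random" reviewer

def run(n=1000, category_weights=(0.9, 0.025, 0.025, 0.025, 0.025),
        threshold=0.5, seed=0):
    # Returns the mean quality of the papers that end up published.
    rng = random.Random(seed)
    scientists = make_scientists(n, category_weights, rng)
    published = []
    for i, author in enumerate(scientists):
        paper_quality = author["quality"]   # papers mirror author quality
        others = scientists[:i] + scientists[i + 1:]
        reviewers = rng.sample(others, 2)   # two peers chosen at random
        approvals = sum(review(r, paper_quality, threshold, rng)
                        for r in reviewers)
        # Published if both approve; coin flip if only one approves.
        if approvals == 2 or (approvals == 1 and rng.random() < 0.5):
            published.append(paper_quality)
    return sum(published) / len(published) if published else float("nan")

print(round(run(), 3))
```

With all reviewers in the “correct” category, only above-threshold papers get published, so the mean quality of the published literature rises well above the population average; as non-correct categories are mixed in, that advantage erodes, which is the effect the authors quantified.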
What did the model find? Surprisingly, it found that when even 10% of the scientists are put in categories other than “correct,” the quality of the papers that end up getting published drops noticeably. In addition, if you spread the reviewers evenly among “rational,” “random,” and “correct,” the review process is no better than choosing papers at random. Thus, this study shows that even a few “bad apples” in the peer-review process can significantly harm the quality of published science, and unless a large fraction of the peer reviewers behave correctly, the value of peer review is zero.3
Based on these and other studies, then, it is not clear that peer review does anything to increase the quality of published science. Does that mean we should get rid of peer review? I think not. Peer review certainly has its place, and I would not want to see it go away. However, I do think that studies such as the ones mentioned in this post tell us that being published in a peer-reviewed journal doesn’t mean as much as some would have you think. Indeed, I have read non-peer-reviewed blogs that contain better science than some peer-reviewed journals.
1. Fiona Godlee, Catharine R. Gale, and Christopher N. Martyn, “Effect on the Quality of Peer Review of Blinding Reviewers and Asking Them to Sign Their Reports,” Journal of the American Medical Association 280:237-240, 1998. (Available online)