Peer review is one of the most important concepts in scientific publishing. When I submit an article to a serious scientific journal, it is generally reviewed by experts in the relevant subject, who help determine whether it is worthy of publication. These people are called “peer reviewers,” and it’s their job to determine whether the paper contains errors, whether its conclusions are reasonable given the bulk of the data, whether the conclusions and/or data are novel or interesting enough to merit publication, and so on. In short, the peer-review process is supposed to ensure that only “quality” articles get published in the scientific literature.
I used the concept of peer review heavily when I wrote my award-winning science textbooks. Even though I have a PhD in nuclear chemistry, I don’t know everything there is to know about chemistry. Thus, in order to ensure that my chemistry text was accurate, I had other PhD chemists (and one high school teacher) review the book to catch errors so that I could correct them. As I started writing textbooks that were further and further from my field of expertise, I had to rely on peer review more heavily.
While the concept of peer review is an excellent one, its execution in modern science has been questioned in many ways. Some scientists think that peer review tends to enforce orthodoxy, making it very difficult for new and revolutionary ideas to be published. Others see peer review as a way for reviewers to keep people they don’t like from getting published. Still others say it is a way for reviewers to punish their rivals.
Over the years, several studies have tried to address the validity of the peer-review process, and unfortunately, the results have not been very good.
One of the most direct studies of the efficacy of peer review in the medical literature was done by Fiona Godlee and her colleagues in 1998. The study tried to determine whether leaving the authors’ names off a manuscript and requiring reviewers to sign their reports would make a difference. Such steps seemed to make no difference. What I found interesting, however, were the study’s details.
Essentially, the authors added “8 areas of weakness” to a study that had already been accepted for publication. In other words, they took a paper already judged to be good and inserted what amounts to 8 errors that the peer-review process should catch. However, the 221 reviewers who returned reports found, on average, only two of those errors. Worse yet, only 10% of the reviewers caught more than 4 errors, and 16% didn’t catch any of them.1 My conclusion is that, at least in this case, the peer-review process failed miserably.
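As a rough check on those numbers: two of eight errors on average is a 25% detection rate. Purely as an illustrative assumption (not something Godlee and colleagues modeled), suppose every reviewer caught each error independently with that same 25% probability. The binomial distribution then gives the share of reviewers expected to catch none of the errors, or more than four. The constants `N_ERRORS` and `P_CATCH` and the function `prob_caught` are hypothetical names for this sketch.

```python
from math import comb

N_ERRORS = 8       # weaknesses inserted into the accepted paper
P_CATCH = 2 / 8    # average detection rate implied by "two of eight"


def prob_caught(k, n=N_ERRORS, p=P_CATCH):
    """Binomial probability that one reviewer catches exactly k of n errors,
    assuming each error is caught independently with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_none = prob_caught(0)
p_more_than_4 = sum(prob_caught(k) for k in range(5, N_ERRORS + 1))

print(f"caught none:        {p_none:.1%}")         # 10.0% (observed: 16%)
print(f"caught more than 4: {p_more_than_4:.1%}")  # 2.7% (observed: 10%)
```

The observed figures are larger at both ends than this uniform-reviewer assumption predicts, which suggests the reviewers differed considerably in how thoroughly they worked.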
A couple of years later, Peter M. Rothwell and Christopher N. Martyn studied whether reviewers tended to agree with one another on a given paper. They looked at a clinical neuroscience journal as well as a clinical neuroscience conference. For a paper to be accepted by either outlet, it had to be reviewed by multiple reviewers, and their ratings had to meet certain criteria. The study focused on how well those reviewers agreed on a given paper. You would expect all reviewers to recognize excellent papers as excellent and very poor papers as poor; disagreement should come mostly on mediocre papers. However, the study found that the reviewers rarely agreed. Indeed, what agreement did exist was only slightly more than you would expect from chance alone.2 From this study, I conclude that either most of the submitted papers were mediocre (which is hard to believe), or the peer-review process once again didn’t work very well.
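Agreement “beyond chance” is commonly measured with a statistic such as Cohen’s kappa, which compares how often two reviewers actually agree with how often they would agree if each simply accepted papers at his own overall rate, independently of the other. A kappa of 0 means chance-level agreement and 1 means perfect agreement. Which statistic Rothwell and Martyn used is not stated here, so this is a generic sketch with made-up counts; the function name `cohens_kappa` is hypothetical.

```python
def cohens_kappa(both_accept, only_a, only_b, both_reject):
    """Cohen's kappa for two reviewers making accept/reject calls.

    only_a: papers reviewer A accepted but reviewer B rejected.
    only_b: papers reviewer B accepted but reviewer A rejected.
    """
    n = both_accept + only_a + only_b + both_reject
    observed = (both_accept + both_reject) / n
    a_accept = (both_accept + only_a) / n
    b_accept = (both_accept + only_b) / n
    # Agreement expected if the reviewers decided independently,
    # each at his own overall acceptance rate.
    expected = a_accept * b_accept + (1 - a_accept) * (1 - b_accept)
    return (observed - expected) / (1 - expected)


# Hypothetical counts for 100 papers (NOT data from the study):
# the reviewers agree on 60 of them, which sounds respectable...
k = cohens_kappa(both_accept=30, only_a=20, only_b=20, both_reject=30)
print(f"kappa = {k:.2f}")  # prints: kappa = 0.20
```

With both reviewers accepting half the papers, coin-flip reviewers would already agree 50% of the time, so 60% raw agreement is only a small step above chance.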
What led me to do this brief review is a new paper that was just published. Rather than testing the actual peer-review process for a given journal or subject, the authors built a mathematical model to determine how important the reviewers themselves are to the process. Now you might think this is an odd question, since without peer reviewers, peer review can’t exist at all. However, the question is fundamental, because peer reviewers are human. They make mistakes, they have bad days, they can sometimes be lazy, etc. I have been a peer reviewer for many scientific journals, and I can attest that sometimes I did not do my best when reviewing a paper. So the question these authors tried to address is this: how good do the reviewers have to be for the peer-review process to work well?
The authors came up with a rather ingenious model to answer this question. They started with a bunch of hypothetical scientists who both did research and acted as peer reviewers. For their role as peer reviewers, the model assigned them to one of five categories:
1. Correct reviewers who always accept good papers and reject bad ones
2. Altruistic reviewers who accept all papers regardless of quality
3. Misanthropic reviewers who reject all papers regardless of quality
4. Rational reviewers who reject papers that might cast doubt on their own work
5. Random reviewers who accept papers at random because they aren’t qualified or don’t put forth the proper effort
For their role as scientists, the model assigned the same people a level of quality that was independent of their peer-review category. The quality levels followed a Gaussian distribution, which means that most of the scientists were about average in the quality of their research, a few were excellent, and a few were poor. The model then assumed that these scientists would produce papers reflecting their quality level. In other words, the excellent scientists would produce papers that should always be published, and the poor scientists would produce papers that should never be published.
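To put rough numbers on “most,” “a few,” and “a few” for a Gaussian distribution, the sketch below uses Python’s standard `statistics.NormalDist`. The cutoffs are illustrative assumptions, not values from the paper: “near average” means within one standard deviation of the mean, and “excellent” or “poor” means more than two standard deviations above or below it.

```python
from statistics import NormalDist

# Research quality in standard units (mean 0, standard deviation 1).
quality = NormalDist(mu=0.0, sigma=1.0)

near_average = quality.cdf(1) - quality.cdf(-1)  # within one SD of the mean
excellent = 1 - quality.cdf(2)                   # more than two SDs above
poor = quality.cdf(-2)                           # more than two SDs below

print(f"near average: {near_average:.1%}")  # 68.3%
print(f"excellent:    {excellent:.1%}")     # 2.3%
print(f"poor:         {poor:.1%}")          # 2.3%
```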
So…think about what the model has. It has a group of people who each function in two roles: scientist and peer reviewer. The model assumes that every scientist produces papers for publication, and each paper is sent to two people chosen at random from the other scientists in the model, who act as its peer reviewers. The paper is published if both reviewers approve it, and it has a 50% chance of being published if only one reviewer approves it. In the end, then, we have hypothetical scientists producing papers that are reviewed by hypothetical reviewers for a hypothetical journal.
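To make the setup concrete, here is a minimal Python sketch of a model along the lines described above. It is a toy reconstruction, not Thurner and Hanel’s code, and several details are assumptions: qualities are standard-normal, a paper “deserves” publication if its quality is above a hypothetical `THRESHOLD` of 0, each paper’s quality equals its author’s, a “random” reviewer flips a fair coin, and a “rational” reviewer rejects any paper better than his own work but otherwise judges like a correct reviewer. The names `review` and `simulate` are hypothetical. Because of these simplifications, it should not be expected to reproduce the paper’s quantitative results.

```python
import random

# Hypothetical cutoff: a paper "deserves" publication if its quality is
# above this value. Qualities are drawn from a standard normal distribution.
THRESHOLD = 0.0


def review(category, own_quality, paper_quality, rng):
    """Return True if a reviewer of the given category approves the paper."""
    if category == "correct":
        return paper_quality > THRESHOLD
    if category == "altruistic":
        return True
    if category == "misanthropic":
        return False
    if category == "rational":
        # Assumption: reject anything better than the reviewer's own work,
        # otherwise judge the paper the way a correct reviewer would.
        return THRESHOLD < paper_quality <= own_quality
    if category == "random":
        # Assumption: a coin flip, independent of the paper's quality.
        return rng.random() < 0.5
    raise ValueError(f"unknown reviewer category: {category}")


def simulate(mix, n_scientists=1000, n_rounds=10, one_vote_prob=0.5, seed=1):
    """Return the average quality of the papers that get published.

    mix maps reviewer categories to the fraction of scientists in each.
    one_vote_prob is the chance of publication when exactly one of the
    two reviewers approves.
    """
    rng = random.Random(seed)
    # Research quality is Gaussian and independent of reviewer category.
    quality = [rng.gauss(0.0, 1.0) for _ in range(n_scientists)]
    category = rng.choices(list(mix), weights=list(mix.values()), k=n_scientists)

    published = []
    for _ in range(n_rounds):
        for author in range(n_scientists):
            paper = quality[author]  # assumption: papers match their author
            # Pick two distinct reviewers from everyone except the author.
            picks = rng.sample(range(n_scientists - 1), 2)
            reviewers = [p if p < author else p + 1 for p in picks]
            votes = sum(review(category[r], quality[r], paper, rng) for r in reviewers)
            if votes == 2 or (votes == 1 and rng.random() < one_vote_prob):
                published.append(paper)
    return sum(published) / len(published) if published else float("nan")


if __name__ == "__main__":
    runs = {
        "all correct": {"correct": 1.0},
        "10% random": {"correct": 0.9, "random": 0.1},
        "publish everything": {"altruistic": 1.0},
    }
    for label, mix in runs.items():
        print(f"{label:>20}: mean published quality {simulate(mix):.2f}")
```

The “publish everything” run serves as a no-review baseline, since its average equals that of all papers. The `one_vote_prob` parameter encodes the 50% rule for papers approved by only one reviewer; setting it to 0 gives the stricter rule Arthur asks about in the comments, at least within this toy model.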
What did the model find? Surprisingly, when even 10% of the scientists were placed in categories other than “correct,” the quality of the published papers dropped noticeably. In addition, when the reviewers were spread evenly among the “rational,” “random,” and “correct” categories, the review process was no better than choosing papers at random. Thus, this study shows that even a few “bad apples” in the peer-review process can significantly harm the quality of published science, and unless a large fraction of the peer reviewers behave correctly, the value of peer review is zero.3
Based on these and other studies, then, it is not clear that peer review does anything to increase the quality of published science. Does that mean we should get rid of peer review? I think not. Peer review certainly has its place, and I would not want to see it go away. However, I do think that studies such as the ones mentioned in this post tell us that being published in a peer-reviewed journal doesn’t mean as much as some would have you think. Indeed, I have read non-peer-reviewed blogs that contain better science than some peer-reviewed journals.
REFERENCES
1. Fiona Godlee, Catharine R. Gale, and Christopher N. Martyn, “Effect on the Quality of Peer Review of Blinding Reviewers and Asking Them to Sign Their Reports,” Journal of the American Medical Association 280:237-240, 1998.
2. Peter M. Rothwell and Christopher N. Martyn, “Reproducibility of Peer Review in Clinical Neuroscience,” Brain 123:1964-1969, 2000.
3. Stefan Thurner and Rudolf Hanel, “Peer-Review in a World with Rational Scientists: Toward Selection of the Average,” arXiv, 2010.
This isn’t peer review, but rather grading by instructors at my college: I find that only in courses involving mathematics can the grading of my work be determined precisely. In courses such as “Introduction to the Humanities” and “Human Uses of the Environment,” the grades might as well be completely arbitrary.
Good point, Ben. Actually, peer review is like grading, but the grade is given by a peer. The studies seem to indicate it can be as arbitrary as a humanities grade.
“The model says the paper will get published if both reviewers approve it, and it has a 50% chance of getting published if only one reviewer approves it.”
This is not the way many journals work. On the contrary, papers accepted by only one of their two (or three) reviewers are usually declined.
How might the results change if the stated model is made more accurate?
Great question, Arthur. I am not sure how it would affect the model. I will try to contact one of the authors to see if they ran that scenario.
Arthur, I wrote to one of the authors, asking whether they had run the model assuming that only papers approved by both reviewers would be published. He replied:
Dr. Wile-
Could you respond to the claims being made about the new “Goldilocks” planet that astronomers have found? What does the actual science justify at this point?
Hi Marshall. Thanks for your question. For those who do not know, a “Goldilocks” planet is one that is thought to be “just right” for life. To support life, a planet must be the right distance from its star, so that it is neither too warm nor too cold. Most think it must also be a rocky planet at least somewhat similar to Earth, so that it can have an atmosphere, a magnetic field, and the other things necessary for life. Gliese 581 is a red dwarf, a star that is relatively small and cool. At least six planets orbit it, and the sixth one, called Gliese 581 g, is thought to have 3 to 4 times the mass of Earth, to be rocky, and to be at the right distance from the star for parts of it to have temperatures suitable for life. Thus, it is being called a “Goldilocks” planet.
The problem is that the data are VERY new (Keck reported the observations on September 29, 2010), and they haven’t yet been analyzed by many other astronomers. That can be a problem. Back in 1996, another planet, 70 Virginis b, was discovered, and the initial analysis of the data indicated that it was a “Goldilocks” planet. Later analysis, however, seems to indicate that it is too warm.
So the first question is whether the planet really is at the right distance from its star to produce the correct temperature for life. Even if it turns out that it is, there are still lots of other issues. For example, it is currently thought that Gliese 581 g is tidally locked, which means the same side of the planet always faces its star. That could be a problem for life, since the dark side would be ridiculously cold, and much of the side facing the star would be ridiculously hot. There would essentially be only a “band” of the planet with the right temperature for life. It is hard to understand how weather would work on a planet like that, and it is not clear whether any planet with just a “band” of habitable area could really support life. Speaking of weather, we don’t even know whether it has an atmosphere. Its mass indicates it could hold one, but we don’t know that it does. Also, the atmosphere could be like that of Venus, which would make the planet uninhabitable. In addition, we have no idea whether it has a magnetic field, which is also crucial.
The bottom line, then, is that currently, it is THOUGHT that the planet is at the right distance from its star to support life. Astronomers have been wrong about that before, however. Even if they are right, we don’t know about ANY of the other aspects that are necessary for life, and at minimum, the fact that it is thought to be tidally locked is a problem.