Abstract
Summary: After 30 years of practicing peer review and 15 years of studying it experimentally, I’m unconvinced of its value. Its downside is much more obvious to me than its upside, and the evidence we have on peer review tends to support that jaundiced view. Yet peer review remains sacred, worshiped by scientists and central to the processes of science — awarding grants, publishing, and dishing out prizes. It would be a bold funding body or journal that abandoned peer review, but could we at least do better? I want here to explore peer review — from a rather personal point of view — and ask questions about what would be the best system for the Journal of Participatory Medicine.
Sidebars:
“Reputation Systems: A New Vision for Publishing and Peer Review” by Peter Frishauf
and “Peer Review and Reputation Systems: A Discussion” with Peter Frishauf, Richard Smith, Liz Wager, Alex Jadad and Thomas (Bo) Adler
Keywords: Participatory medicine.
Citation: Smith RW. In search of an optimal peer review system. J Participat Med. 2009(Oct);1(1):e13.
Published: October 21, 2009.
Competing Interests: The author has declared that no competing interests exist.
After 30 years of practicing peer review and 15 years of studying it experimentally, I’m unconvinced of its value. Its downside is much more obvious to me than its upside, and the evidence we have on peer review tends to support that jaundiced view. Yet peer review remains sacred, worshiped by scientists, and central to the processes of science—awarding grants, publishing, and dishing out prizes. It would be a bold funding body or journal that abandoned peer review, but could we at least do better? I want here to explore peer review—from a rather personal point of view—and ask questions about what would be the best system for the Journal of Participatory Medicine.
The Misery of Peer Review
Let me begin with my immediate frustrations. For 25 of the past 30 years I’ve been editor (of the BMJ), and now I’m a reviewer and an author. I have just completed a review for the BMJ—of a paper that had interesting new data on an important topic but doubtful methods. Despite my scepticism about peer review, I usually accept requests to review, although I always wonder why. It’s time-consuming and unpaid and usually my comments disappear into the void. The BMJ, as I expected, rejected the article, primarily on the grounds that the paper wasn’t right for its audience. Like most major journals, the BMJ rejects over 90% of the studies it receives, many of them after hours of scrutiny and comment by reviewers. The reviewers’ time is largely wasted because many authors, recognizing the arbitrariness of the "publishing game," simply send the paper elsewhere without revision.
I think that it would make much more sense simply to publish the paper—on a university website or in an electronic journal with a low threshold—with my comments and those of the other reviewer, and let the world decide what it thinks. That is, in effect, what happens anyway: many peer-reviewed papers disappear without trace after publication, some are torn to pieces, and a few flourish and are absorbed into the body of science. The paper rejected by the BMJ, which may well not surface for another year, contained data that would fascinate some and inform a current and important debate. I can’t see that any harm would result from it being available to all.
At the moment, I’m also waiting for an opinion on a paper that tells a complicated story of what we see as scientific misconduct on the part of a publisher. Four of us on three continents wrote the paper rapidly because it suddenly became topical after a major news story. We asked the BMJ to fast track the paper, and it was rapidly rejected with some thoughtful reviews. We did, unusually, revise the paper and submit it to another journal with a request for rapid review. That was about two months ago, and the only thing we’ve heard has been from a reviewer, who happens both to be a friend and to have written a review on our paper for the BMJ. He wanted to know if he could simply send the same review, but I told him that—perhaps unfortunately for him—we had revised the paper in the light of his opinion. So he’ll have to review it again. Our chances of getting published in the second journal are perhaps 30%. If the paper is rejected, we’ll either get fed up and abandon it or continue our way down the food chain—because you can get virtually anything published if you persist long enough.
Again, I think that much would be gained and really nothing lost if our paper was simply posted on a website with the reviewers’ comments attached.
Peer Review Is a Deeply Flawed System
The Sixth International Congress on Peer Review and Biomedical Publication was held in September 2009, and dozens of scientific studies on the subject were presented. The First Congress, in Chicago in 1989, consisted largely of opinion rather than new data—but that was about when the study of peer review began. Until then, peer review was unstudied despite being at the core of how science is conducted. Sadly, in my experience, most scientific editors know little about the now large body of evidence on peer review. So, paradoxically, the process at the core of science rests on faith rather than experimental evidence.
If editors were to examine this body of literature, they would discover that evidence on the upside of peer review is sparse, while evidence on the downside is abundant. We struggle to find convincing evidence of its benefit, but we know that it is slow, expensive, largely a lottery, poor at detecting error, ineffective at diagnosing fraud, biased, and prone to abuse.[1][2][3] Sadly we also know—from hundreds of systematic reviews of different subjects and from studies of the methodological and statistical standards of published papers—that most of what appears in peer-reviewed journals is scientifically weak.[4][5][6][7]
The evidence on peer review has been gathered together in a book specifically on peer review,[1] and I have summarized the evidence on its many problems in a book and an article.[2][3] Let me quote here just three studies.
Two of them are Cochrane reviews of the evidence to support peer review in both scientific publishing and grant giving.[8][9] Cochrane reviews, as most readers probably know, are widely regarded as the highest-quality systematic reviews. The paper on peer review in journals concludes: "At present, little empirical evidence is available to support the use of editorial peer review as a mechanism to ensure quality of biomedical research."[8] And the review on grant giving: "There is little empirical evidence on the effects of grant-giving peer review. No studies assessing the impact of peer review on the quality of funded research are presently available."[9] Both reviews point out that "absence of evidence" is not the same as "evidence of ineffectiveness," and the first paper says that studying peer review is methodologically difficult.[8] So there may be benefit in peer review, as most scientists believe, but we haven’t yet been able to show it convincingly in empirical studies.
One way that we studied peer review at the BMJ was by inserting deliberate errors into short papers and then asking reviewers to review the papers without telling them that they contained the inserted errors.[10] These studies consistently showed that reviewers spotted only a minority of the errors and that many reviewers spotted none. The table shows how few of 607 reviewers spotted the nine major and five minor errors that had been inserted into papers describing randomized trials, which are arguably one of the easiest types of study to review because the expected standards are so explicit.
Table: Proportion of reviewers (%) identifying each error, by group, for the three papers.[10]

| Error | Paper 1 | Paper 2 | Paper 3 |
|---|---|---|---|
| | Control | Self-taught | Face-to-face |
| **MAJOR ERRORS** | | | |
| Poor justification for study | 31 | 36 | 36 |
| Biased randomization procedure | 49 | 58 | 53 |
| No sample size calculation | 21 | 24 | 21 |
| Unknown reliability and validity of outcome measure | 13 | 19 | 21 |
| Failure to analyze the data on an intention-to-treat basis | 22 | 18 | 22 |
| Poor response rate | 34 | 36 | 37 |
| Unjustified conclusions | 43 | 40 | 41 |
| Discrepancy between abstract & results | 23 | 25 | 28 |
| Inconsistent denominator | 38 | 45 | 53 |
| **MINOR ERRORS** | | | |
| No ethics approval | 18 | 14 | 14 |
| No explanations for ineligible or non-randomized cases | 50 | 48 | 58 |
| Inconsistency between text & tables | 5 | 2 | 2 |
| Word reversal | 13 | 9 | 10 |
| No mention of Hawthorne effect | 21 | 12 | 19 |
Improving Peer Review
As people have understood better the many defects of peer review, they have tried ways to improve it. One of the first developments was to try blinding reviewers to the identity of authors. A randomized trial conducted by Bob and Suzanne Fletcher with others before they became editors of the Annals of Internal Medicine, using the quality of the opinion as the outcome measure, showed that blinding did improve quality.[11] But then two much bigger trials found no evidence of improvement, and about 10% to 20% of reviewers could anyway identify the authors.[12][13]
We conducted one of those trials at the BMJ and then decided that we would try the opposite—allowing the authors to know the identity of the reviewers. In a large trial of open peer review, we found no difference in the quality of the reviewers’ opinions.[14] At this point we introduced open peer review—with the authors but not the readers knowing the identity of the reviewers—on ethical grounds, arguing that reviewers should be accountable for their judgments and receive credit. A judgment by an unknown judge seemed totalitarian.
Interestingly, when we asked a sample of reviewers whether they would review openly, about half said yes and half said no. When we conducted the trial, very few people declined to review openly, and when we introduced the policy, only a handful of reviewers in a database of around 5,000 refused to sign their reviews. As a professor of economics said to me recently: "Economists have no interest in people’s views but only in what they do."
Our next step was to conduct a trial of telling reviewers that if the papers they reviewed were published then all the background peer review material—including reviewers’ and editors’ comments and authors’ responses—would be published. We completed the trial approximately five years ago, but the results have not yet been published, partly, ironically, because of delays in peer review. The trial did not find any meaningful difference in the quality of opinions. We’ve also tried training reviewers—but again, this had little impact on the quality of their reviews, perhaps because we were trying to teach "old dogs new tricks" or because the dose of training was inadequate.[15]
The plan at that stage at the BMJ was to open up the whole process—placing submitted papers online, asking reviewers to comment, and allowing anybody, but particularly authors, to comment as the process proceeded. Peer review would thus be transformed from a black box into an open scientific discourse.
This development hasn’t happened, and it seems to many an impossibly radical step. The main objection is that "low quality, possibly dangerous" material will be released. My response is that this happens already. We know that lots of poor quality research appears because of the hopelessness of peer review, because papers are regularly presented at major conferences with virtually no peer review, and because many researchers present the results of their studies to the press. In the latter two cases there is usually no detailed description of methods and results, meaning that it’s impossible for even the well-informed to evaluate the study. With the system we proposed, the full data would be available.
Acquiring Better Evidence on Peer Review
Most scientists continue to believe in peer review despite the lack of evidence to support it. This is partly because most are unaware of the evidence, but some think that we are simply not studying peer review in the right way. Could it be that more sophisticated or different methods could show the utility of peer review? This question was raised at the end of the Fourth Congress on Peer Review and Biomedical Publication, but there has been little progress with methods since. Most of those who have studied peer review have come from epidemiological and statistical backgrounds. Perhaps social scientists using qualitative methods could find evidence to support the belief of scientists that peer review is beneficial.
"The Job of the Many Not the Few"
Web 2.0, the social web, may hold the key to the future of peer review. Peer review will become the job of the many rather than the few, and we know that the many can solve problems better than the few[16]—that must, indeed, be part of the philosophy of participatory medicine. We need to move, says Charles Leadbeater, one of the gurus of the Web, from "I think" to "we think" and find ways to harvest the thinking of thousands.[17] Instead of filtering and then publishing because publishing is expensive, we can now, says Clay Shirky, another thought leader on social media, "publish and then filter."[18] It is this radical thinking that has created the magnificence of Wikipedia.
Peter Frishauf, founder of Medscape and another forward thinker, has suggested that we can use reputation systems—rather as eBay does—to filter material.[19] In a sense, this happens already: after publication a process ensues whereby most studies disappear but a few flourish and have consequences in the real world. It must in some way be the trusted, those with reputations, who drive this process.
Might we be able to automate the process, asking scientists and others to score studies—so allowing the most important to rise to the top? This has been tried—for example, by the Public Library of Science (PLoS)—but so far scientists seem reluctant to score studies. PLoS has, however, recently added "article metrics" to all its studies. These metrics include article usage statistics (in graphic form), citations from the scholarly literature, social bookmarks, comments left by readers, notes left within articles, blog posts, and ratings. When combined, these metrics should give a clear picture of which studies are the most important. Article metrics can be useful, however, only after publication.
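To make the idea of combining article-level metrics concrete, here is a minimal sketch of how usage, citations, bookmarks, comments, and ratings might be folded into a single composite score for ranking. The metric names follow the list above, but the weights, the logarithmic scaling, and the example figures are illustrative assumptions, not PLoS's actual method.

```python
import math

def composite_score(metrics, weights=None):
    """Combine raw article-level counts into one number.

    metrics: dict of raw counts, e.g. usage, citations, bookmarks, comments, ratings.
    The weights below are assumed for illustration only.
    """
    weights = weights or {
        "usage": 0.2,
        "citations": 0.4,
        "bookmarks": 0.1,
        "comments": 0.1,
        "ratings": 0.2,
    }
    # Log-scale each count so a few heavily used articles do not swamp the rest.
    return sum(w * math.log1p(metrics.get(name, 0)) for name, w in weights.items())

# Hypothetical figures for two papers.
articles = {
    "paper-A": {"usage": 5400, "citations": 12, "bookmarks": 30, "comments": 4, "ratings": 7},
    "paper-B": {"usage": 800, "citations": 2, "bookmarks": 3, "comments": 0, "ratings": 1},
}

# Rank papers by composite score, highest first.
ranked = sorted(articles, key=lambda name: composite_score(articles[name]), reverse=True)
print(ranked)  # ['paper-A', 'paper-B']
```

Any such weighting would itself need validation, which is one reason these scores are only meaningful after publication, as noted above.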
And could we formalize and give validity to the process of attributing reputation? Currently, reputations are won dubiously and, as Mark Twain said, "Once you have a reputation for being an early riser, you can sleep in ’til noon every day." B. Thomas Adler and Luca de Alfaro have described a system whereby a reputation is built mathematically for Wikipedia contributors: they gain points for contributions that persist and lose points for contributions that are rapidly removed.[20] Could we find an equivalent for peer review? (See the Frishauf sidebar to this article.)
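As a rough illustration of the kind of mechanism Adler and de Alfaro describe, the sketch below credits a contributor when a contribution survives later scrutiny and debits them when it is quickly removed. The scoring constants and the simple "survived or not" test are assumptions for illustration; the published Wikipedia system is considerably more sophisticated.

```python
from collections import defaultdict

# Reputation ledger: contributor name -> accumulated score.
reputation = defaultdict(float)

def record_contribution(person, survived_later_scrutiny):
    """Credit a contribution that persists; debit one that is quickly removed."""
    if survived_later_scrutiny:
        reputation[person] += 1.0   # contribution persisted: positive signal
    else:
        reputation[person] -= 0.5   # contribution removed quickly: negative signal

# Example: one of a reviewer's criticisms is upheld by the editors, another is discarded.
record_contribution("reviewer_17", survived_later_scrutiny=True)
record_contribution("reviewer_17", survived_later_scrutiny=False)
print(reputation["reviewer_17"])  # 0.5
```

Translated to peer review, "survival" might mean that a reviewer's criticism is upheld by editors or acted on by authors; defining and validating that mapping is exactly the open question.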
Conclusion
We already have some provisional answers to these questions that have come from a discussion that is available as a podcast in this journal (see “Peer Review and Reputation Systems: A Discussion”). The overall conclusion was that the present system of peer review is badly broken and that something new is needed. Most of those in the discussion would favor moving to a system of "publish and let the world decide," preferably with systems of reputation and article metrics. We are at an early stage with these systems, and there was agreement that we should experiment—recognizing that experimentation inevitably means some "failures." "In all science, error precedes the truth, and it is better it should go first than last," said Hugh Walpole.
It does, however, feel very bold for editors to abandon prepublication peer review—like walking into the street naked. But if the emperor has no clothes, what’s to be lost? Nothing, and much is to be gained.
References
1. Godlee F, Jefferson T. Peer Review in Health Sciences. 2nd ed. London: BMJ Books; 2003.
2. Smith R. Peer review: a flawed process at the heart of science and journals. J R Soc Med. 2006;99:178-182.
3. Smith R. The Trouble With Medical Journals. London: RSM Press; 2006.
4. Altman DG. Poor-quality medical research: what can journals do? JAMA. 2002;287:2765-2767.
5. Altman DG. Statistics in medical journals. Stat Med. 1982;1:59-71.
6. Andersen B. Methodological Errors in Medical Research. Oxford: Blackwell; 1990.
7. Altman DG. The scandal of poor medical research. BMJ. 1994;308:283-284.
8. Jefferson T, Rudin M, Brodney Folse S, Davidoff F. Editorial peer review for improving the quality of reports of biomedical studies. Cochrane Database of Systematic Reviews 2007, Issue 1. Art. No.: MR000016. DOI: 10.1002/14651858.MR000016.pub3.
9. Demicheli V, Di Pietrantonj C. Peer review for improving the quality of grant applications. Cochrane Database of Systematic Reviews 2007, Issue 1. Art. No.: MR000003. DOI: 10.1002/14651858.MR000003.pub2.
10. Schroter S, Black N, Evans S, Godlee F, Osorio L, Smith R. What errors do peer reviewers detect, and does training improve their ability to detect them? J R Soc Med. 2008;101:507-514.
11. McNutt RA, Evans AT, Fletcher RH, Fletcher SW. The effects of blinding on the quality of peer review: a randomized trial. JAMA. 1990;263:1371-1376.
12. Justice AC, Cho MK, Winker MA, Berlin JA, Rennie D; the PEER Investigators. Does masking author identity improve peer review quality? A randomized controlled trial. JAMA. 1998;280:240-242.
13. van Rooyen S, Godlee F, Evans S, Smith R, Black N. Effect of blinding and unmasking on the quality of peer review: a randomized trial. JAMA. 1998;280:234-237.
14. van Rooyen S, Godlee F, Evans S, Black N, Smith R. Effect of open peer review on quality of reviews and on reviewers’ recommendations: a randomised trial. BMJ. 1999;318:23-27.
15. Schroter S, Black N, Evans S, et al. Effects of training on the quality of peer review: a randomised controlled trial. BMJ. 2004;328:657-658.
16. Surowiecki J. The Wisdom of Crowds: Why the Many Are Smarter Than the Few. London: Abacus; 2005.
17. Leadbeater C. We-Think: Mass Innovation, Not Mass Production. London: Profile; 2008.
18. Shirky C. Here Comes Everybody: How Change Happens When People Come Together. London: Penguin; 2009.
19. Frishauf P. The end of peer review and traditional publishing as we know it. Available at: http://www.medscape.com/viewarticle/583316. Accessed October 3, 2009 (free registration required).
20. Adler BT, de Alfaro L. A content-driven reputation system for the Wikipedia. In: WWW 2007, Proceedings of the 16th International World Wide Web Conference. ACM Press; 2007. Available at: http://users.soe.ucsc.edu/~luca/papers/07/wikiwww2007.html. Accessed October 14, 2009.
Open Questions
If you have read this far, you might be convinced that peer review is a flawed system but perhaps think it the least bad system we have for deciding what to publish. The last few paragraphs perhaps show that we don’t yet have a clearly articulated alternative to peer review; but this is your chance to "join the revolution" and, together with the editors, devise a better system for this journal. You might start by venturing thoughts, preferably based on evidence, on the following questions:
- Should JoPM have a peer review system that involves external peers, or should the editors just decide for themselves?
- Should JoPM adopt a traditional "closed" system of peer review, whereby neither authors nor readers know the identity of reviewers?
- Should peer review be "light" (considering perhaps not whether a paper is original or important but simply whether the conclusions don’t run ahead of the methods and results) or "heavy"?
- Should peer review continue to play a role in strengthening the author’s discourse, as opposed to recommending publication?
- Should reviewers be blinded to the identity of authors?
- Should authors know the names of reviewers?
- Should readers also know the names of reviewers?
- Should all of the comments of reviewers and editors be published at the same time as the papers?
- Should papers be put online as soon as they are submitted, with reviewers and editors asked to place their comments online as they are completed?
- With the above system, should anybody be able to comment at any time?
- Once papers are published, should there be some sort of scoring system that allows some to emerge as more important than others? If so, how should the scoring work?
- Should JoPM try to develop a validated reputation score for readers, reviewers, and authors? If it can be developed, should there be a way of weighting the scoring of papers according to the reputation of the scorers—so perhaps hastening the highlighting of important papers?
Copyright: © 2009 Richard Smith. Published here under license by The Journal of Participatory Medicine. Copyright for this article is retained by the author(s), with first publication rights granted to the Journal of Participatory Medicine. All journal content, except where otherwise noted, is licensed under a Creative Commons Attribution 3.0 License. By virtue of their appearance in this open-access journal, articles are free to use, with proper attribution, in educational and other non-commercial settings.
Comments
Much is made these days (over and over and over!) about the supposed horrors of peer review: it is said to be inefficient, biased, and to fail to serve the role it was intended to serve, of producing a high-quality archival literature. Although I think that everyone would agree that the peer review system could be improved, especially in terms of its speed, I feel that these discussions miss what seems to me to be the most important value of peer review: its role in the education of scientists, and specifically in highly efficient, precisely targeted, and secure narrow-band communication among scientists. Having received hundreds of peer reviews of my work, and certainly my fair share of negative reviews, I can think of only one review I received that was not useful to me in improving my paper and, more importantly, in teaching me something important and central to my field — usually many things! We say things in our manuscripts, and in our reviews of others’ manuscripts, that often probably ought not to reach the light of day. Reviewers catch these for us, and teach us why these are not the right things to say. On the other side, we say things in the reviews we write that we would never publish, because we are speaking to particular scientists (the authors of the paper), basically teaching them what we think they ought to know. Nowhere else, once scientists leave their final postdoc, is there a similar opportunity for direct continuing education. Peer review is only in part a filtering system — and to my mind that is a relatively small part of its value. It is, in addition and more importantly, a highly efficient and secure system of targeted peer education. To ignore this function in changing the way peer review works is, to my mind, to endanger one of the pillars of scientific communication.
In the complexity of medicine there should be more attention to ideas and thoughts. Peer review helped me a lot in learning to write things down more precisely and with the reader more in mind.
I agree that the accent in medicine is too often on pharmacological interventions. The diagnostic process is much more interesting to me as a GP, but at the same time more difficult to investigate.
The editors of journals should keep in touch with the work floor.
Selection of what is important to read is a good thing and saves valuable time for the practitioner.
Dr. Richard Smith is synonymous with science, integrity, and intellect. His vast experience with the BMJ and other journals could have helped him understand that there needs to be a scientific review system to assess and select papers for publication. It is important to maintain the standard and quality of the journal. A review system is also very valuable for authors: it opens up the author’s mind and acts as a barometer of the scientific content and relevance of the article being submitted for publication. The BMJ has a peer review system of a very high standard and therefore needs no recommendation to replace or reorganize it. Yet such introspection mirrors the status of journal publication and the need to find ways to improve the quality of what journals publish. It is also imperative for leading journals to understand that there are many authors from developing countries who wish their articles to be considered for publication. Such consideration need not mean compromising on quality, but rather valuing scientific content or hypotheses that may prove original, even though limited funding and facilities made a very exhaustive study impossible for the researcher.
Creationists are using Dr. Smith’s work as a reason to condemn evolution and in support of creationism. I’m a layman on this subject, but it seems to me that if you open up university websites to publish anything, the truth will be buried under an avalanche of nonsense: nonsense created simply to confuse and mislead people. How long before oil companies bombard all these websites with bogus studies on climate change? With unlimited resources behind them, scientific fact would cease to exist, crushed by every special interest group vying for votes or profit.
Jeff Shrager showed me this paper, and I was fascinated reading it, since I have encountered several of the ills described in the paper first hand. I have formed the opinion that the publication system needs to become more transparent.
I recently had some experience with this sort of non-blind public review process and wish to share it in this forum to help complete the picture.
The first time I encountered public non-blind review was in 2012, when the SciPy conference organizers asked for reviews that would be made public for SciPy 2011 papers. This was a refreshing change, and I was excited to participate in such a review. The SciPy organizers chose not to continue on this path despite my arguments, and around that time I decided to pursue it on my own. When asked to review papers, I answered editors that I would only conduct public non-blind review. This contradicted most journal policies, yet eventually opportunities came along where editors agreed that reviews would become public.
I opened a web portal where reviews can be posted publicly:
https://groups.google.com/forum/?fromgroups#!forum/public-scientific-reviews
I posted my reviews there as an invited reviewer. Recently I was invited to organize a track in SummerSim, one of the SCS conferences that are peer reviewed. I agreed once the organizing committee allowed me to use public non-blind review as the track’s review process.
For several months I had to solicit papers and invite reviewers and manage their reviews. You can find those reviews online in the above portal.
My first impression is that the scientific narrative improved. No paper was rejected outright; instead, the lowest recommendation on the accept/reject question was “borderline.” This is not a representative sample, since only 7 papers were evaluated by 2-3 reviewers each. Yet compared to the blind review processes I have encountered, the reviewers seemed more conscientious and tried to improve the publications rather than to filter them. Note, however, that they were instructed to do this in my review instructions, so this result is perhaps a self-fulfilling prophecy rather than an unbiased observation. Nevertheless, the reviews brought important improvements to the submitted papers, which the authors seemed to welcome. Moreover, the SummerSim organizers agreed to make changes in the computer system to support public non-blind review in the future.
It is hard to say how much the decision to use public non-blind review affected authors’ and reviewers’ willingness to join the process, since it varied among papers. However, there was one extreme case in which more than 40 invitations were sent to people identified as experts, yet only one reviewer other than myself was found. In my past experience, that number of invitations is more than sufficient to find 3 or more reviewers, so the unusually high number is reported here. Richard Smith mentions in this paper the fear that a low response rate from reviewers might be a hurdle for non-blind public review. However, most reviewers who answered the review request and gave a reason cited time pressures or other commitments, and many suggested alternative reviewers. No potential reviewer specifically said they were declining because the review would be published with their name signed to it. Moreover, recruiting reviewers for the other papers was an easier task.
At the end of the review process, the authors of the papers were asked to join the review process as reviewers, and one author accepted. Again, a relatively low participation rate. This low rate of reviews per invitation implies that reviewers need additional incentives to keep the peer review system intact.
If not enough of us are reading what we write, then the review system is not sustainable. It can easily be compared to a pyramid scheme in which some lose. If each academic is an author, and each paper needs at least two reviewers, with a coordinator who has to contact 10 or more experts to obtain those reviews, some papers will not be published simply because of time constraints. How many good ideas do we lose that way?
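To make this concern concrete, here is a rough back-of-envelope sketch. The two-reviews-per-paper and roughly-ten-invitations figures come from the comment above; the community size, submission volume, and per-reviewer capacity are purely assumed numbers for illustration.

```python
# Back-of-envelope sketch of reviewer supply vs. demand.
# Figures from the comment above: each paper needs >= 2 reviews, and a
# coordinator may have to contact ~10 experts to secure them.
# The community size, papers per author, and reviews each person is willing
# to do per year are assumptions, not data.

authors = 1000                  # active academics, each submitting papers (assumed)
papers_per_author = 2           # submissions per year (assumed)
reviews_per_paper = 2           # from the comment
invitations_per_review = 5      # ~10 invitations to secure 2 reviews (from the comment)
reviews_offered_per_person = 3  # reviews each academic is willing to do per year (assumed)

papers = authors * papers_per_author
reviews_needed = papers * reviews_per_paper
invitations_needed = reviews_needed * invitations_per_review
reviews_available = authors * reviews_offered_per_person

print(f"Papers submitted:        {papers}")
print(f"Reviews needed:          {reviews_needed}")
print(f"Invitations to send:     {invitations_needed}")
print(f"Reviews available:       {reviews_available}")
print(f"Shortfall (if positive): {reviews_needed - reviews_available}")
```

Under these assumed numbers, demand for reviews outstrips supply, which is the commenter's point: without stronger incentives or a different model, some papers simply will not get reviewed.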
Moreover, since we have advanced editing tools these days, authors can produce more text at higher rates, so reviewers have to read more text. I know an older professor who told me his Ph.D. thesis was 19 pages long and took real effort to produce. I wish papers today were as short and as easy to generate. Today’s texts are much longer, more diverse, and more complex, often requiring the collaboration of many people. Do reviewers still have sufficient time and expertise to do a proper review?
It seems that technology is pushing us down the path of post-publication review. Richard Smith has pointed this out in this paper, and we may need to adapt to this new reality. Exercises like the public non-blind review I describe here should help us adapt. We need to learn how to behave in an environment that fits our technological progress, and we need to give reviewers incentives to perform proper reviews. A first step is recognition. Another possible step is promotion credit for review services rendered.