Summary: Peer review as we know it today is broken. A better way may be to create an online reputation system that rates the quality of an author’s, editor’s, or reviewer’s online contributions, and to apply such a system to pre-publication and open post-publication peer review. While no such reputation system exists at present, computer scientists have experimented with methods that accurately predict the quality of contributions to Wikipedia, where every word of every contribution can be traced and analyzed. Rather than tinker with fixes to traditional peer review, the author calls for the convening of a conference to reinvent peer review, using work on online reputation systems as a starting point.
Sidebar to “In Search of an Optimal Peer Review System” by Richard Smith.
Keywords: Participatory medicine, peer review, reputation systems, Wikipedia, STM publishing.
Citation: Frishauf P. Reputation systems: a new vision for publishing and peer review. J Participat Med. 2009(Oct);1(1):e13a.
Published: October 21, 2009.
Competing Interests: The author has no conflicts of interest to report with respect to this article.
If ever there was a case of becoming a vegetarian after working in the slaughterhouse, it is that of Richard Smith. Better than anyone, Smith uses evidence and experience to demolish any confidence one might still have in traditional medical peer review.
Is there a better way? There is, but it won’t be easy to build.
Let’s start with a basic premise: every serious reader wants to trust that the science and clinical medicine they are accessing is as correct and current as possible and that biases and conflicts of authors and reviewers are disclosed. A few conditions need to be met for this trust to be earned:
- Editors and reviewers should be selected for expertise and trustworthiness, not credentials. Academics are sensitive about this, as academic degrees have traditionally been used to demonstrate credentials. But many experts are self-taught. Credentials are one marker of expertise—most PharmDs, for example, know more about drug effects than most physicians. Other markers of knowledge, understanding, and wisdom may include clinical and scientific experience, creativity, skepticism, intuition, intelligence, and common sense.
- There are degrees of expertise: I have some knowledge of New York City architectural history; I am a novice expert. I have no expertise in neurosurgery, but quite a bit when it comes to understanding how people access and use medical information (and there are no degrees for that – medical informatics barely counts, and would have been useless to me when I founded Medscape).
- We need a trustmark for every editor, author, and reviewer based on expertise in the subject matter under consideration, and on their actual work as a reviewer. The trustmark must include a method of verifying identity, so that the trust can, at some point, be verifiably tied to the human or system that holds the expertise.
- Review must be open. While a trusted reviewer may need to remain anonymous, the reviewer’s level of trust, and a secure way to communicate with the reviewer, must be part of the system.
My thinking about peer review changed in 2002, when I discovered Wikipedia. Initially the concept of “an encyclopedia anyone could edit” was offensive to every neuron in my brain: my peer-reviewed publications and the Medscape website defended truth using traditional peer review (blind the author and find three experts). As publishers, we are entrusted to separate the wheat from the chaff and only let quality content see the light of day. Yet reading articles in Wikipedia, I was struck by how many were excellent. Today, a number of Wikipedia articles are state-of-the-art in knowledge on a subject.
I struggled to understand why Wikipedia is often so good (short answer: it is not anarchy; there are many human and computer-assisted quality controls). Then in October 2003, an issue of Esther Dyson’s Release 1.0 newsletter, edited by Jeff Ubois, landed in my mailbox. The entire issue covered “Online Reputation Systems.”
Ubois analyzed how process and technology can assess trust. On eBay, every buyer and seller rates the quality of his or her experience in every transaction. Millions of people learned how to stick to eBay’s rules, avoid the fraudsters, and trust only buyers and sellers with the best ratings. Thousands of other rating methodologies sprouted online, from web-enabled Zagat surveys to RateMyTeacher.com and even TheEroticReview.com (for rating sex workers). Microsoft was investing millions in a research project called NetScan to see whether content (text) on the Web and within email systems could be mined to assess its reliability and the associated reputation of its authors.
The very concept of a reputation system seems foreign to medicine and health. But if we can create a reputation system for money, why not medicine? What, in essence, is peer review other than a trust in the judgment—the reputation—of authors, editors, and reviewers to look out for us? That we make up the rules for peer review as we go along, that the results may be invalid, all this somehow doesn’t seem to matter much. Hey, we’re peer-reviewed. We’re trying!
It turns out that if you follow Ubois’ line of thinking and look outside the fields of STM (scientific, technical, and medical) publishing, there’s a robust world of scientists and scholars looking to bring a measure of objectivity to rating the reliability of content, using efforts that go far beyond the largely human-powered effort reviewed by Ubois in 2003. Not surprisingly, much of the work comes out of computer science departments.
One important paper, published in 2007, was presented at the International World Wide Web Conference, a leading forum for discussing standards and innovation for the Web that is organized by the International World Wide Web Conference Committee (IW3C2). “A Content-Driven Reputation System for the Wikipedia” proposed a concept radical to most medical writers, editors, and publishers. Adler and Alfaro, two computer scientists, suggested that the value of an editor’s contributions could be measured by whether changes introduced by an edit persist over time. They found that “short-lived edits” and “short-lived text” (changes that were at least 80% undone within a few subsequent edits) could be used to compute reputations for authors, and that those reputations predicted well whether an author’s text would persist:
In our system, authors gain reputation when the edits they perform to Wikipedia articles are preserved by subsequent authors, and they lose reputation when their edits are rolled back or undone in short order. Thus, author reputation is computed solely on the basis of content evolution; user-to-user comments or ratings are not used. The author reputation we compute could be used to tag new contributions from low-reputation authors, or it could be used to allow only authors with high reputation to contribute to controversial or critical pages. A reputation system for the Wikipedia could also provide an incentive for high-quality contributions. We have implemented the proposed system, and we have used it to analyze the entire Italian and French Wikipedias, consisting of a total of 691,551 pages and 5,587,523 revisions. Our results show that our notion of reputation has good predictive value: changes performed by low-reputation authors have a significantly larger than average probability of having poor quality, as judged by human observers, and of being later undone, as measured by our algorithms.
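To make the idea concrete, here is a minimal sketch in Python of a content-driven reputation score. It is not the algorithm Adler and Alfaro implemented; it is a toy under stated assumptions: revisions are plain strings, an edit is judged only by the words it adds, the "few subsequent edits" are fixed at two (the hypothetical `horizon` parameter), and every judged edit moves reputation by exactly one point. The function and parameter names are invented for illustration.

```python
from collections import Counter, defaultdict


def added_words(before, after):
    """Multiset of words present in `after` but not in `before`."""
    return Counter(after.split()) - Counter(before.split())


def undone_fraction(added, later_text):
    """Share of the added words that no longer appear in a later revision."""
    total = sum(added.values())
    kept = sum((added & Counter(later_text.split())).values())
    return (total - kept) / total


def compute_reputation(revisions, horizon=2, undone_threshold=0.8):
    """Score authors by whether the words they add survive later edits.

    revisions: list of (author, text) pairs in chronological order.
    An edit is judged against the revision `horizon` steps later
    (an assumed stand-in for "a few subsequent edits"). If at least
    `undone_threshold` of the added words are gone by then, the edit
    counts as short-lived and costs the author one point; otherwise
    the author gains one point.
    """
    reputation = defaultdict(int)
    previous_text = ""
    for i, (author, text) in enumerate(revisions):
        added = added_words(previous_text, text)
        previous_text = text
        j = i + horizon
        if not added or j >= len(revisions):
            continue  # nothing was added, or the edit is too recent to judge
        if undone_fraction(added, revisions[j][1]) >= undone_threshold:
            reputation[author] -= 1  # short-lived edit
        else:
            reputation[author] += 1  # edit preserved by later authors
    return dict(reputation)


history = [
    ("alice", "peer review is flawed"),
    ("bob", "peer review is flawed BUY CHEAP PILLS"),
    ("carol", "peer review is flawed and slow"),
    ("dave", "peer review is flawed and slow to change"),
]
print(compute_reputation(history))  # {'alice': 1, 'bob': -1}
```

Note that no one rates anyone: Bob loses reputation only because later authors removed his words, and Carol and Dave are not yet scored because their edits are too recent to judge.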
We know a computer can usually beat a human in reading an EKG, and win matches against chess champions. Can a computer also outperform a human peer-reviewer trying to determine the accuracy of an article?
Picking up from the work of Adler and Alfaro, a group of graduate students and professors at the Department of Computer Science and Engineering, University of Minnesota, took the concept one step further. “Creating, Destroying, and Restoring Value in Wikipedia” introduced the notion of the impact of an edit, measured by the number of times the edited version is viewed:
Using several datasets, including recent logs of all article views, we show that frequent editors dominate what people see when they visit Wikipedia, and that this domination is increasing. Similarly, using the same impact measure, we show that the probability of a typical article view being damaged is small but increasing, and we present empirically grounded classes of damage.
The essence of the University of Minnesota work was to measure how many people are affected by a change to an article:
…we use a more general notion of persistence than Adler and Alfaro, measuring how words persist over time rather than just detecting short-lived changes. Second, we compute how much each word is viewed over time. There is no real value in content that no one views, even if there is a lot of it; conversely, content that is viewed frequently has high value, regardless of how much of it there is. Thus, our metric matches the notion of the value of content in Wikipedia better than previous metrics.
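A rough sketch of this view-weighted measure, again in Python, might look like the following. It is a simplification and not the Minnesota group's implementation: it tracks distinct words rather than individual word instances, it assumes each revision comes with a count of how often it was viewed while it was the current version, and it credits a word that is removed and later restored to whoever restored it. All names are hypothetical.

```python
from collections import defaultdict


def persistent_word_views(revisions):
    """Credit each author with the views received by words they introduced.

    revisions: list of (author, text, views) in chronological order, where
    `views` is the assumed number of times that revision was viewed while
    it was the current version of the article.
    """
    owner = {}                 # word -> author who introduced it
    score = defaultdict(int)   # author -> accumulated word views
    for author, text, views in revisions:
        words = set(text.split())
        # Words deleted in this revision stop earning views.
        owner = {w: a for w, a in owner.items() if w in words}
        for w in words:
            owner.setdefault(w, author)  # new words belong to this author
            score[owner[w]] += views     # each surviving word earns the views
    return dict(score)


article = [
    ("alice", "aspirin reduces fever", 100),
    ("bob", "aspirin reduces fever and pain", 50),
    ("carol", "aspirin reduces pain", 10),
]
print(persistent_word_views(article))  # {'alice': 470, 'bob': 110}
```

Alice's three words were seen 100 times, then 50 more, and two of them another 10 times, so she earns 470 word views. Carol only deleted text, so she earns nothing under this measure.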
Consider the difference between a system that declares, “There is no real value in content that no one views,” and what we have in STM publishing today: the “impact factor,” which declares a journal more important the more it is cited. I wonder if that is not more of a measure of cronyism than the merit of a finding or observation.
By comparison with computer science and several other scientific fields, there is little innovation coming from the field of clinical medicine in testing the notion that there might be a better way than traditional peer review to assess quality in medical journals. In biosciences, there is innovation in genetics and synthetic biology (OpenWetWare, developed at MIT, being an excellent example).
The scientific method relies on the publication of an observation in a peer-reviewed publication. Thus peer review is at the heart of the claim that science protects the public interest. Yet in medicine we have ample evidence that the way peer review is conducted in the major STM journals today is inefficient and unreliable. Why cling to it? Is it because academia’s tenure system of “publish or perish” only recognizes publication through this obsolete system? Academic achievement is a legitimate marker of expertise that protects a public trust only when it is based on a system that can be objectively measured, not passed on by a member of the club. (Publish or perish also hurts patients who seek to be treated by academic superstars and don’t realize that academic status is based on publishing volume, not clinical skill. A reputation system for clinical skill that looks at data outside of publishing activity, such as case volume in a disease and outcomes, could address this. Today, some academic superstars are entirely devoted to research and publishing and don’t see patients.)
My hope is to instigate change by organizing a conference, together with Richard Smith, dedicated to creating a reputation system for medical publishing. The Journal of Participatory Medicine could publish a special issue of the proceedings to disseminate and promote the conclusions. Here’s what I would like to see come of it:
- A system that could create a reputation for a person, group, or institution that could travel with them around the Web, perhaps as an extension to OpenID (the open-source system of creating an Internet identity endorsed by Google, Microsoft, Oracle, and many others).
- A reputation that would be subject-matter-specific and could grow or shrink over time.
- A system that would include a way to engage in actual/open discussion between authors and reviewers to clarify issues. Google’s Wave technology provides one model for such collaboration.
- A rethinking of the very structure and meaning of a journal “article,” and of what we even mean by an “author” and “reviewer.” If the popularity of Wikipedia and thousands of its progeny has taught us anything, it is the benefit of being able to read a single “living” article that is constantly being curated by thousands of people who care about the subject matter.
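As a starting point for discussion, the first two wishes above (a portable reputation that is subject-specific and can grow or shrink over time) could be sketched as a small data structure. Everything here is an assumption made for illustration: the identity strings, the subject labels, the point values, and the choice of exponential decay with a configurable half-life so that old contributions count for less unless they are renewed.

```python
from dataclasses import dataclass, field


@dataclass
class ReputationLedger:
    """Per-identity, per-subject reputation that fades unless renewed."""

    half_life_days: float = 365.0  # assumed: a score halves after this long
    # (identity, subject) -> (score, day of last update)
    _scores: dict = field(default_factory=dict)

    def _decayed(self, key, day):
        """Score for `key` as of `day`, after applying exponential decay."""
        score, last_day = self._scores.get(key, (0.0, day))
        return score * 0.5 ** ((day - last_day) / self.half_life_days)

    def record(self, identity, subject, delta, day):
        """Add (or, with a negative delta, subtract) reputation on a given day."""
        key = (identity, subject)
        self._scores[key] = (self._decayed(key, day) + delta, day)

    def score(self, identity, subject, day):
        """Current reputation of an identity in one subject area."""
        return self._decayed((identity, subject), day)


ledger = ReputationLedger(half_life_days=100)
ledger.record("openid:alice", "cardiology", 8.0, day=0)
print(ledger.score("openid:alice", "cardiology", day=100))    # 4.0
print(ledger.score("openid:alice", "neurosurgery", day=100))  # 0.0
```

The second line of output is the point of the design: reputation earned in cardiology says nothing about neurosurgery.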
Andrew Grove, the Intel Corporation computer scientist, echoes Richard Smith’s and my views when he likens traditional peer review systems to medieval guilds. He shares our aspiration for a “cultural revolution” in publishing to reinvent peer review. Academic medicine, the National Library of Medicine, and other stewards of the public trust could immeasurably increase their service by discarding the obsolete and discredited methods of peer review and opening their minds to considering something better.
- Ubois J. Online reputation systems. In: Dyson E, ed. Release 1.0. 2003;21:1-33. Available at: http://cdn.oreilly.com/radar/r1/10-03.pdf. Accessed October 10, 2009.
- Adler TB, Alfaro L. A content-driven reputation system for the Wikipedia. ACM 978-1-59593-654-7/07/0005. Available at: http://users.soe.ucsc.edu/~luca/papers/07/wikiwww2007.pdf. Accessed October 17, 2009.
Based on its research, the group has released a WikiTrust extension for the Firefox web browser, now available in beta at https://addons.mozilla.org/en-US/firefox/addon/11087 (accessed October 2, 2009). A review of the extension may be found in Wired magazine at http://www.wired.com/wiredscience/2009/08/wikitrust/ (accessed October 2, 2009).
- Priedhorsky R, Chen J, Lam SK, et al. Creating, destroying, and restoring value in Wikipedia. ACM 978-1-59593-845-9/07/0011. Available at: http://www.cs.umn.edu/~reid/papers/group282-priedhorsky.pdf. Accessed August 12, 2009.
- TB Adler, the principal author of the second reference in this article, explained in private correspondence that, unlike the Minnesota method, the UC Santa Cruz method used all edits to compute reputation. One interesting result was that short-lived edits were a good predictor of future short-lived edits.
- See http://wave.google.com/help/wave/about.html#video for a demo of Google Wave. Accessed October 3, 2009.
- Begley S. A research revolution. Newsweek Web Exclusive. November 4, 2007. Available at: http://www.newsweek.com/id/68221. Accessed August 12, 2009.
- If you were to create an online reputation system for authors, reviewers, and editors, how would you do it?
- In your opinion, is the overall state of peer review as practiced in STM journals excellent, adequate, or poor?
Acknowledgements: I would like to thank Bo Adler at UC Santa Cruz; Reid Priedhorsky at the University of Minnesota; computer researcher and writer Jeff Ubois; and KC Rice, David Kroll, PhD, Brent Menninger, MD, Holly Miller, MD, and John Haughton, MD, for early comments on this article. “Open” peer reviewer Roni Zeiger, MD, provided additional valuable insights that resulted in several improvements.
Copyright: © 2009 Peter Frishauf. Published here under license by The Journal of Participatory Medicine. Copyright for this article is retained by the author(s), with first publication rights granted to the Journal of Participatory Medicine. All journal content, except where otherwise noted, is licensed under a Creative Commons Attribution 3.0 License. By virtue of their appearance in this open-access journal, articles are free to use, with proper attribution, in educational and other non-commercial settings.