Peer Review: A Flawed Process at the Heart of Science and Journals

Richard Smith, former editor of the BMJ

June 6, 2020

The following excerpt (Chapter 7: Peer Review: A Flawed Process at the Heart of Science and Journals) is reprinted with permission from The Trouble With Medical Journals (Taylor & Francis, 2006).

Peer review is at the heart of the processes of not just medical journals but of all science. It is the method by which grants are allocated, papers published, academics promoted and Nobel prizes won. Yet it is hard to define. It has until recently been unstudied. And its defects are easier to identify than its attributes. Yet it shows no sign of going away. Famously, it is compared with democracy: a system full of problems but the least worst we have.

When something is peer reviewed it is in some sense blessed. Even journalists recognize this. When the BMJ published the highly controversial paper that I mentioned in chapter 2 that argued that a new ‘disease’, female sexual dysfunction, was in some ways being created by pharmaceutical companies, a friend who is a journalist was very excited, not least because reporting it gave him a chance to get sex onto the front page of a highly respectable but somewhat priggish newspaper (the Financial Times). ‘But,’ the news editor wanted to know, ‘was this paper peer reviewed?’ The implication was that if it had been it was good enough for the front page and if it hadn’t been it wasn’t. Well, had it been? I’d read it much more carefully than I read many papers and had asked the author, the journalist Ray Moynihan, to revise the paper and produce more evidence. But this wasn’t peer review, even though I’m a peer of the author and reviewed the paper. Or was it? (I told my friend that it hadn’t been peer reviewed, but it was too late to pull the story from the front page.)

My point is that peer review is impossible to define in operational terms (an operational definition is one whereby if 50 of us looked at the same process we could all agree most of the time whether or not it was peer reviewed). Peer review is thus like poetry, love or justice. But it’s something to do with a grant application or a paper being scrutinized by a third party who is neither the author nor the person making a judgement on whether a grant should be given or a paper published. But who is a peer? Somebody doing exactly the same kind of research? (In which case he or she is probably a direct competitor.) Somebody in the same discipline? Somebody who is an expert on methodology? And what is review? Somebody saying ‘The paper looks all right to me’, which is sadly what peer review sometimes seems to be. Or somebody poring over the paper, asking for raw data, repeating analyses, checking all the references and making detailed suggestions for improvement? Such a review is vanishingly rare.

What is clear is that the forms of peer review are protean. Probably the systems of every journal and every grant giving body are different in at least some detail, and some systems are very different. There may even be some journals using the following classic system. The editor looks at the title of the paper and sends it to two friends who the editor thinks know something about the subject. If both advise publication the editor sends it to the printers. If both advise against publication the editor rejects the paper. If the reviewers disagree the editor sends it to a third reviewer and does whatever he or she advises. This pastiche, which is not far from systems I’ve seen used, is little better than tossing a coin, because the level of agreement between reviewers on whether or not a paper should be published is no better than you’d expect by chance (121).

That’s why Robbie Fox, the great 20th century editor of the Lancet who was no admirer of peer review, wondered whether or not anybody would notice if he were to swap the piles marked ‘publish’ and ‘reject’. He also joked that the Lancet had a system of throwing a pile of papers down the stairs and publishing those that reached the bottom. I was challenged by two of the cleverest researchers in Britain to publish an issue of the BMJ composed only of papers that had failed peer review and to see if anybody noticed. I wrote back ‘How do you know I haven’t already done it?’

I think that I ought to describe at least one peer review system and I hope you will forgive me if I describe the system the BMJ had when I left. (It’s since changed.) The old BMJ system is obviously the one I know best, but it had features that made it unusual. I’ll describe what they were as we go.

One thing that is currently unusual about the BMJ peer review process, but will not be for much longer, is that it’s conducted on the World Wide Web. Soon, I suspect, all peer review will be conducted on the web. So far the BMJ has simply transferred a traditional paper-based system to the web, but even the ‘simple transfer’ may have had unexpectedly profound effects. For example, I found at the end of my time at the BMJ that I didn’t know many of the people I was asking to review, whereas a few years before I knew most of them. As one editor put it to me, ‘The old guys are gone. They don’t like computers.’ In the longer term many things will be possible with a web-based system that would be impossible with a paper-based system: for example, conducting the whole process in real time in full public view. Already authors can interrogate the system to find where their paper is in the process.

The BMJ has about 6000 papers submitted a year. In the end it publishes about 8%. Most major general journals have a similarly low acceptance rate. Smaller and specialist journals accept many more, and some journals, I suspect, publish almost everything they receive. Paradoxically, it always seems to me, editors boast about their rejection rates. The editor with the highest rejection rate is like the banker with the tallest skyscraper. Where else do people boast about rejection?

Editorial folklore says that if you persist long enough you can get anything published, no matter how terrible. Stephen Lock, my predecessor, followed up studies rejected by the BMJ and found that most were published somewhere (121). He found too that most authors had ignored any changes suggested by reviewers. My friend Drummond Rennie, deputy editor of JAMA, has produced one of the greatest (and certainly longest) sentences of medical journalism to illustrate that any paper can get published:

‘There seems to be no study too fragmented, no hypothesis too trivial, no literature citation too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, too obscure, and too contradictory, no analysis too self-serving, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print.’ (122)

(In admiration of such a sentence one of his readers sent him the whole sentence embroidered.)

There is a ‘food chain’ with medical journals down which rejected studies pass. Unfortunately the BMJ is not at the top, but fortunately it’s a long way from the bottom. This is clearly a very inefficient system. Studies may be reviewed many times before they are published. Similarly grant applications may be reviewed many times before they are funded or, worse, rejected. The total time spent reviewing must be huge and must be increasing as the rates at which grants are funded and papers published decline. By 2020, one wag calculated, academics will do nothing but peer review.

The BMJ triaged the studies submitted to it, rejecting two-thirds without sending them out for external peer review. We had around a dozen different editors doing this and had a system of duty editors. Our aim was to make an initial decision on a study the day it arrived. We were looking for studies that had an important message for clinical medicine or public health and were original, relevant for our audience and valid — meaning that their conclusions were supported by their evidence and data. I posted an account of how the BMJ triaged studies on its website. We began by asking whether the study had anything interesting and new to say. If not, we rejected it. We then had many methodological signals that caused us to reject, including, for example, any survey that had a response rate below 50% (with very rare exceptions) or any study advocating a treatment that was not a randomized trial unless the paper gave compelling reasons why a randomized study was impossible. Sometimes an editor rejected the study alone, but more often two editors looked at a study before rejecting it.
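The triage signals just described can be sketched as a small decision function. This is only an illustration of the rules as stated above, not the BMJ's actual procedure: the `Submission` fields and the `triage` function are hypothetical names, the judgement of what is 'interesting and new' is reduced to a flag, and the 'very rare exceptions' to the survey rule are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Submission:
    """Hypothetical summary of a submitted study (all field names are illustrative)."""
    interesting_and_new: bool
    is_survey: bool = False
    response_rate: Optional[float] = None  # fraction between 0 and 1, surveys only
    advocates_treatment: bool = False
    is_randomized_trial: bool = False
    compelling_reason_no_trial: bool = False


def triage(s: Submission) -> str:
    """Apply the triage signals named in the text, in the order the text gives them."""
    # First question: does the study have anything interesting and new to say?
    if not s.interesting_and_new:
        return "reject: nothing interesting and new"
    # Methodological signal: surveys with a response rate below 50%.
    if s.is_survey and s.response_rate is not None and s.response_rate < 0.50:
        return "reject: survey response rate below 50%"
    # Methodological signal: a treatment is advocated without a randomized trial,
    # and no compelling reason is given for why a trial was impossible.
    if (s.advocates_treatment and not s.is_randomized_trial
            and not s.compelling_reason_no_trial):
        return "reject: treatment advocated without a randomized trial"
    return "send for external review"


# Invented example: a survey with a 42% response rate is rejected at triage.
print(triage(Submission(interesting_and_new=True, is_survey=True, response_rate=0.42)))
```

The real process involved many more signals and the judgement of one or two editors; the sketch shows only the shape of the logic.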

This system may have seemed brutal (and certainly some authors felt so), but the editorial team at that time thought that it had an unanswerable logic. We would waste everybody’s time, including the authors’, by sending out for review a study that had no chance of making it into the BMJ. Authors could send their rejected study to a journal that might publish it without losing a lot of time, and the BMJ could make sure that it used the highly valuable time of reviewers on studies that it might publish and that could be improved by their attention. It was sensible for the BMJ to spend its resources on papers it was going to publish, not on those it was not. Some will argue that journals ought to be helping authors improve their papers even if they are not going to publish them, and sometimes at the BMJ we did that when we wanted to encourage new types of research. We did it many years ago with research in primary care (general practice), and we did it with research on quality improvement, information and learning.

The two-thirds of studies that were rejected without external review were rejected with a ticksheet that included a list of about 40 common reasons for the BMJ rejecting studies — the editors ticked the reasons for rejecting that particular study. Authors would naturally prefer a personal letter explaining why the BMJ had rejected their study, but this would not be a good way for the journal to use its resources. We thought that the ticksheet was better than just a rejection slip and our research among authors showed that they, grudgingly, agreed. Unsurprisingly, they didn’t like rejection in any form.

The one-third of studies that remained after triage were sent to an external reviewer. Usually the BMJ had just one reviewer for a paper, although it used two when the editors and the reviewers were learning about a new methodology. So we always sent a paper that used economic methods to an economist as well as to a clinical reviewer, whereas we thought that most of our reviewers were familiar with methods like randomized trials. Using a single reviewer was unusual. Most journals use at least two and some use as many as a dozen.

When I first arrived at the BMJ in 1979 we probably didn’t use more than a hundred reviewers altogether, and there was no database of reviewers. Initially I would have to ask the senior editors to suggest somebody. We had perhaps two reviewers for the whole of rheumatology and almost all the reviewers were from Britain and over 50. Such a system could have advantages in that you knew and trusted your reviewers, but it had the obvious disadvantage that one or two individuals had undue influence over what was published.

The BMJ when I left had a database that included several thousand reviewers from all over the world, most of whom the editors didn’t know. Unusually we asked people to volunteer to be reviewers and in this way we have found many hundreds of excellent reviewers. One result that emerges from all studies on who make the best reviewers is that those under 40 do better than those over 40, even after allowing for the fact that younger people often have more time (73). We graded the review produced by reviewers on a four-point scale and generally used only once somebody who got the lowest score. Increasingly we used only people who had repeatedly scored highly for their reviews. We also had data on the time people took to review and we tended to discard those who were always slow.

Our database had detailed information on the particular interests of reviewers. So we would not send a study on rheumatoid arthritis to any rheumatologist but to one who had a particular interest in rheumatoid arthritis, and we would often make a still closer link between the content of the study and the interests of the reviewer.

It’s a hard and largely thankless task being a reviewer. To review a paper properly takes several hours and the mean time that BMJ reviewers spend on a study is over two hours. Some spend as many as 20 hours. The BMJ is unusual in paying reviewers, but we paid only £50 to people whose market rate would be more than £100 an hour. Furthermore, most peer review systems are closed in that the names of the reviewers are known only to the editors. So most reviewers receive no public or academic credit for their work. The BMJ is again unusual in having a system where the authors know the names of the reviewers but, at the moment, readers do not.

A good review will comment on the originality of the work (preferably with references to related work), discuss the importance of the question being asked, give a detailed account of the strengths and weaknesses of the study, comment on the presentation of the paper and give constructive comments on how it might be improved. Published accounts of the quality of reviews show that perhaps one-fifth are outstanding, one-fifth of little or no use and the rest somewhere in the middle. My impression, not supported by strong data, is that the quality of reviews for medical journals is improving — not least as increasing numbers of doctors are trained in the critical appraisal of studies, statistics and clinical epidemiology.

I didn’t when I left the BMJ very often see the review that said: ‘I’m scanning this at the airport and it looks excellent to me — but then it would be because Joe is a first class researcher.’ Nor did I see: ‘I wouldn’t touch this with a bargepole. That lot at St Domino’s can’t be trusted.’ It would be unwise to sign your name to such reviews, but I saw such reviews in the bad old days — that is, 10 years ago.

If a BMJ reviewer gave cogent reasons why a study should not be published, then we rejected the study, sending the authors a copy of the reviewers’ comments. It wasn’t long ago that many journals and grant giving bodies did not send back the reviewers’ comments. They would simply reject without explaining why, fearing that sending reasons would only encourage authors to disagree and appeal — so creating more work.

If the reviewer thought that the paper might be worth publishing or the BMJ editor thought it might be, perhaps because the reviewer was uncertain or hadn’t given a good opinion, then editors discussed the paper at a weekly meeting. Sometimes the paper was rejected at that point or, if it was still thought publishable, it was sent to one of our pretentiously and ominously named ‘hanging committees’ (so-called after a committee of the Royal Academy which decides which pictures will be hung in the summer exhibition).

The hanging committee made the final decision on the paper and comprised one or sometimes two editors, two people from a pool comprised mainly of practising doctors (we called them ‘hangers’) and a statistician. The committee never had more than a dozen papers to consider and everybody was supposed to have read every word of every paper together with the comments of all the reviewers. The committee decided which papers to publish and in particular it gave detailed suggestions on how the papers that were going to be published could be improved. As I will discuss below, the evidence on peer review suggests that the benefit of peer review lies less in deciding which papers to publish or grant proposals to fund and more in improving the papers that are published or the proposals that are funded.

The BMJ’s hanging committee was unusual in at least four ways. First, many (probably most) journals simply have one or two editors make decisions without needing to meet. Second, it is unusual to have statisticians and clinicians discuss studies together. Usually, the statistician produces a written report. But we thought that the discussion among clinicians, editors and statisticians was one of the greatest benefits of the process, both for each to understand the others’ worlds and for learning. Third, even when journals do have a meeting to decide which papers to publish it is unusual for more than one or two people to have read every word. Usually somebody simply presents the paper to the group. Fourth, the BMJ’s decisions on which papers to publish were made not by editors but by ‘outsiders’, albeit people who came regularly to the journal and were paid by the journal. The idea was that these hangers not only were good at making decisions on the validity of studies but also represented the readers of the BMJ. They were, as I’ve said, mostly practising doctors, in touch with the realities and brutalities of daily practice in a way that BMJ editors were not. (Ironically the most common criticism of the BMJ was that it didn’t publish enough for hospital doctors and yet most of our hangers were hospital doctors.)

Arguments against hanging committees were that they were expensive, inevitably slowed decision-making and were inconsistent, even quixotic, in their decision-making, not least because the membership constantly varied. There was also something odd about editors abdicating their responsibility for the content of the journal. For whatever reason, Fiona Godlee, my successor as editor of the BMJ, decided to end the system of hanging committees.

Very few papers were published in the BMJ without extensive revision and a common decision of the hanging committee was to reject a paper but offer to see it again if it was heavily revised. We allowed appeals against our decision to reject and many appeals succeeded. We did not, however, allow second appeals. We realized that almost all such appeals ended in tears. Either the authors were fed up with rejection after a long process or we were unhappy because we’d agreed to publish a paper that we didn’t really like. If we were the only journal in the world this would have been unacceptable, but luckily for everybody we weren’t.

If we rejected a paper without sending it out for external review then we aimed to do so within two weeks — and we mostly succeeded. If the paper was sent out for review, then we aimed for a decision within eight weeks. We met this target about three-quarters of the time, but I have to confess that there were papers where the process became inordinately complex and prolonged. Many might think the system horribly slow, but the BMJ had a reputation for being fairly fast compared with many other journals. Nevertheless, I know of at least one journal — Cardiovascular Research — that with the arrival of a new editor went from taking months to make decisions to making every decision within three weeks. Old systems can be reinvented.

Like various other journals the BMJ had a fast track system, whereby we made a decision and published within four weeks — compared with the usual two months to make a decision and another three months to publish. There is something silly about fast tracking in that the time between a study being conceived and submitted is usually several years and then the time between publication of a paper and doctors changing their practice is also years. So why bother with shortening this process of nearly a decade by four months? The answer is in an attempt to attract the best studies. I don’t think that the BMJ succeeded, but it has seemed to work better for the Lancet, which did it first.

Chris Martyn, an associate editor of the BMJ and the editor of the Quarterly Journal of Medicine (which to everybody’s delight is published monthly), has written a witty essay arguing the case for slow tracking (123). The short-term gain from fast tracking is quickly obliterated as everybody does it, and most human activities (particularly cooking and making love) benefit from slowness not speed. The same might well be true of peer review, and the most recent example of a spectacular failure of peer review, Science’s publication of the fraudulent stem cell research study (124), was associated with the journal reviewing the paper in half the time it usually took (125).

I ought to make clear that I do not regard publication as the end of the peer review process. Rather it is part of the process. Many studies that have made it through the peer review process to publication are demolished once exposed to hundreds of thousands of readers.

As part of an attempt to improve the BMJ’s system of peer review we drew a diagram of the process. The resulting flow chart was some five feet long and full of loops and swirls. What I have described above is a simplified version of the process. We imagined that some part of the process must be redundant but, even with the help of outside consultants, we couldn’t identify a piece that we could delete. Can it really be that this degree of elaboration is justified?

But does peer review ‘work’ at all? A systematic review of all the available evidence on peer review concluded that ‘the practice of peer review is based on faith in its effects, rather than on facts’ (126). But the answer to the question on whether or not peer review works depends on the question, ‘What is peer review for?’

One answer is that it’s a method to select the best grant applications for funding and the best papers to publish in a journal. It’s hard to test this aim because there is no agreed definition of what constitutes a good paper or a good research proposal. Plus what is peer review to be tested against? Chance, or a much simpler process? Stephen Lock when editor of the BMJ conducted a study in which he decided alone which of a consecutive series of papers submitted to the journal he would publish. He then let the papers go through the usual process. There was little difference between the papers he chose and those selected after the full process of peer review (121). This small study suggests that perhaps you don’t need an elaborate process. Maybe a lone editor, thoroughly familiar with what the journal wants and knowledgeable about research methods, would be enough. But it would be a bold journal that stepped aside from the sacred path of peer review.

Another answer to the question of what peer review is for is that it is to improve the quality of papers that are published or research proposals that are funded. The systematic review found little evidence to support this, but again such studies are hampered by the lack of an agreed definition of a good study or a good research proposal.

Peer review might also be useful for detecting errors or fraud. At the BMJ we did several studies where we inserted major errors into papers that we then sent to many reviewers (127, 128). Nobody ever spotted all of the errors. Some reviewers didn’t spot any, and most reviewers spotted only about one-quarter. Peer review sometimes picks up fraud by chance, but generally it is not a reliable method for detecting fraud because it works on trust. A major question, which I will return to, is whether or not peer review and journals should cease to work on trust.

So we have little evidence on the effectiveness of peer review, but we have considerable evidence on its defects. In addition to being poor at detecting gross defects and almost useless for detecting fraud, it is slow, expensive, profligate of academic time, highly subjective, something of a lottery, prone to bias and easily abused.

The slowness of the BMJ’s system I’ve already described, and many journals and bodies conducting peer review processes are much slower. Many journals, even in the age of the internet, take more than a year to review and publish a paper. It’s hard to get good data on the cost of peer review, particularly because reviewers are often not paid (the same, come to that, is true of many editors). Yet there is a substantial ‘opportunity cost’, as economists call it, in that the time spent reviewing could be spent doing something more productive, like original research. (Some high-quality researchers recognize this problem and refuse to review.) I estimate that the cost of peer review per paper for the BMJ (remembering that the journal rejected 60% without external review) was of the order of £100, whereas the cost of a paper that made it right through the system was closer to £1000.

The cost of peer review has become important because of the open access movement, which hopes to make research freely available to everybody. With the current publishing model peer review is usually ‘free’ to authors, and publishers make their money by charging institutions to access the material. One open access model is that authors will pay for peer review and the cost of posting their article on a website. So those offering or proposing this system have had to come up with a figure — which is currently between $500 and $2500 per article. Those promoting the open access system calculate that at the moment the academic community pays about $5000 for access to a peer reviewed paper. (The $5000 is obviously paying for much more than peer review: it includes other editorial costs, distribution costs [expensive with paper] and a big chunk of profit for the publisher.) So there may be substantial financial gains to be had by academics if the model for publishing science changes.

There is an obvious irony in people charging for a process that is not proved to be effective, but that’s how much the scientific community values its faith in peer review. People have a great many fantasies about peer review, and one of the most powerful is that it’s a highly objective, reliable and consistent process. I regularly received letters from authors who were upset that the BMJ had rejected their paper and then published what they thought to be a much inferior paper on the same subject. Always they saw something underhand. They found it hard to accept that peer review is a subjective and therefore inconsistent process. But it is probably unreasonable to expect it to be objective and consistent. If I ask people to rank painters like Titian, Tintoretto, Bellini, Carpaccio and Veronese I would never expect them to come up with the same order. A scientific study submitted to a medical journal may not be as complex a work as a Tintoretto altarpiece, but it is complex. Inevitably people will take different views on its strengths, weaknesses and importance.

So the evidence is that if reviewers are asked to give an opinion on whether or not a paper should be published, they agree only slightly more than they would be expected to agree by chance. (I’m conscious that this evidence conflicts with the study of Stephen Lock showing that he alone and the whole BMJ peer review process tended to reach the same decision on which papers should be published. The explanation may be that, because he was the editor who had designed the BMJ process and appointed the editors and reviewers, it wasn’t surprising that they were fashioned in his image and made similar decisions.)
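To make ‘agreement expected by chance’ concrete, here is a minimal sketch using Cohen's kappa, one common statistic for comparing two raters' agreement with what chance alone would produce. The text does not say which measure the cited studies used, so treat the choice of kappa as an assumption; the function name and the reviewer recommendations below are invented for illustration.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items.

    0 means agreement no better than chance; 1 means perfect agreement.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Observed agreement: share of papers on which both reviewers say the same thing.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: probability that two independent raters with these
    # individual publish/reject rates would happen to coincide.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented recommendations on eight papers (not data from any study).
reviewer_a = ["publish", "publish", "reject", "reject",
              "publish", "reject", "publish", "reject"]
reviewer_b = ["publish", "reject", "reject", "publish",
              "publish", "reject", "reject", "publish"]
kappa = cohen_kappa(reviewer_a, reviewer_b)
print(kappa)
```

In this invented example the two reviewers agree on half the papers, which is exactly what their individual publish and reject rates would produce by chance, so kappa comes out at zero.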

Sometimes the inconsistency can be laughable. Here is an example of two reviewers commenting on the same paper.

Reviewer A: ‘I found this paper an extremely muddled paper with a large number of deficits.’

Reviewer B: ‘It is written in a clear style and would be understood by any reader.’

This — perhaps inevitable — inconsistency can make peer review something of a lottery. You submit a study to a journal. It enters a system that is effectively a black box, and then a more or less sensible answer comes out at the other end. The black box is like the roulette wheel, and the prizes and the losses can be big. For an academic, publication in a major journal like Nature or Cell is to win the jackpot.

The evidence on whether there is bias in peer review against certain sorts of authors is conflicting, but there is strong evidence of bias against women in the process of awarding grants. The most famous piece of evidence on bias against authors comes from a study by Peters and Ceci (129). They took 12 studies that came from prestigious institutions and had already been published in psychology journals. They retyped the papers, made minor changes to the titles, abstracts and introductions, but changed the authors’ names and institutions. They invented institutions with names like the Tri-Valley Center for Human Potential. The papers were then resubmitted to the journals that had first published them. In only three cases did the journals realize that they had already published the paper, and eight of the remaining nine were rejected — not because of lack of originality but because of poor quality. Peters and Ceci concluded that this was evidence of bias against authors from less prestigious institutions.

This is known as the Matthew effect: ‘To those who have, shall be given; to those who have not shall be taken away even the little that they have.’ I remember feeling the effect strongly when as a young editor I had to consider a paper submitted to the BMJ by Karl Popper (130). I was unimpressed and thought we should reject the paper. But we couldn’t. The power of the name was too strong. So we published, and time has shown we were right to do so. The paper argued that we should pay much more attention to error in medicine, about 20 years before many papers appeared arguing the same.

The editorial peer review process has been strongly biased against ‘negative studies’ — studies that find an intervention doesn’t work. It’s also clear that authors often don’t even bother to write up such studies. This matters because it biases the information base of medicine. It’s easy to see why journals would be biased against negative studies. Journalistic values come into play. Who wants to read that a new treatment doesn’t work? That’s boring.

We became very conscious of this bias at the BMJ, and we always tried to concentrate not on the results of a study we were considering but on the question it was asking. If the question is important and the answer valid, then it mustn’t matter whether the answer is positive or negative. I fear, however, that bias is not so easily abolished and persists.

The Lancet has tried to get round the problem by agreeing to consider the protocols (plans) for studies yet to be done (131). If it thinks the protocol sound and if the protocol is followed, then the Lancet will publish the final results regardless of whether they are positive or negative. Such a system also has the advantage of stopping resources being spent on poor studies. The main disadvantage is that it increases the sum of peer reviewing — because most protocols will need to be reviewed in order to get funding to perform the study.

There are several ways to abuse the process of peer review. You can steal ideas and present them as your own or produce an unjustly harsh review to block or at least slow down the publication of the ideas of a competitor. These have all happened. Drummond Rennie tells the story of a paper he sent, when deputy editor of the New England Journal of Medicine, for review to Vijay Soman (132). Having produced a critical review of the paper, Soman copied some of the paragraphs and submitted it to another journal, the American Journal of Medicine. This journal, by coincidence, sent it for review to the boss of the author of the plagiarized paper. She realized that she had been plagiarized and objected strongly. She threatened to denounce Soman but was advised against it. Eventually, however, Soman was discovered to have invented data and patients and left the country. Rennie learnt a lesson that he never subsequently forgot but which medical authorities seem reluctant to accept: those who behave dishonestly in one way are likely to do so in other ways as well.

The most important question with peer review is not whether to abandon it but how to improve it. Many ideas have been advanced to do so, and an increasing number have been tested experimentally. The options include standardizing procedures, opening up the process, blinding reviewers to the identity of authors, reviewing protocols, training reviewers, being more rigorous in selecting and deselecting reviewers, using electronic review, rewarding reviewers, providing detailed feedback to reviewers, using more checklists or creating professional review agencies.

I hope that it won’t seem too indulgent if I describe the far from finished journey of the BMJ to try and improve peer review. We tried as we went to conduct experiments rather than simply introduce changes.

The most important step on the journey was realizing that peer review could be studied just like anything else. This was the idea of Stephen Lock, together with Drummond Rennie and John Bailar. At the time it was a radical idea, and still seems radical to some — rather like conducting experiments with God or love.

The next important step was prompted by hearing the results of a randomized trial that showed that blinding reviewers to the identity of authors improved the quality of reviews (as measured by a validated instrument) (133). This trial, which was conducted by Bob McNutt, A T Evans, and Bob and Suzanne Fletcher, was important not only for its results but because it provided an experimental design for investigating peer review. Studies where you intervene and experiment allow more confident conclusions than studies where you observe without intervening.

This trial was repeated on a larger scale by the BMJ and by a group in the United States who conducted the study in many different journals (134, 135). Neither study found that blinding reviewers improved the quality of reviews. These studies also showed that such blinding is difficult to achieve (because many studies include internal clues on authorship) and that reviewers could identify the authors in about one-quarter to one-third of cases. But even when the results were analysed by looking at only those cases where blinding was successful, there was no evidence of improved quality of the review.

At this point we at the BMJ thought that we would change direction dramatically and begin to open up the process. In the early 1990s the journal Cardiovascular Research published a paper arguing for open peer review and invited commentaries from several editors, including Stephen Lock, Drummond Rennie and me (136, 137).

Interestingly, we all concluded that the case for open review was strong. The main arguments for open review are justice, accountability and credit. Judgement by an invisible judge is ominous, and yet that is what happens with closed review. Reviewers should be fully accountable for what they say, but they should also receive credit. Both accountability and credit are severely limited when reviewers are known only to editors. Open reviewing should also reduce abuses of the system and make it more polite and constructive. Plus why should the burden of proof rest with those who want to open up the system rather than those who want to keep it closed? We live in a world where, whether we like it or not, what is not open is assumed to be biased, corrupt or incompetent until proved otherwise.

The main argument against open review is that it will make it difficult for junior researchers (often the best reviewers) to criticize the work of senior researchers. It might also lead to reviewers holding back their criticisms or create resentment and animosity when they let rip. Another argument is the classic ‘if it ain’t broke don’t fix it’, but I hope that most readers will agree that peer review is sufficiently broke for us to need to look for methods of improvement.

At the BMJ we began our pursuit of greater openness by conducting a randomized trial of open review (meaning that the authors but not the readers knew the identity of the reviewers) against traditional review (138). It had no effect on the quality of reviewers’ opinions. They were neither better nor worse. Because of the strong ethical arguments in favour of open review we thus went ahead and introduced a system of authors knowing the names of reviewers.

Other editors often asked about our experience, and my simple answer was, ‘the earth hasn’t moved’. Most reviewers were happy to put their names to reviews, and few problems have arisen after three years of open review. Before we introduced the system we were challenged by Simon Wessely, a professor of psychiatry and a researcher who has conducted research into peer review, who argued that we were irresponsible in that we were introducing a new ‘drug’ without any system of monitoring for adverse effects. We thus copied Britain’s ‘yellow card scheme’ that asks every doctor to notify the authorities of possible adverse effects of drugs. All BMJ reviewers used to receive yellow forms asking them to notify us of any problems that they experienced through reviewing openly. We had a few yellow forms returned, but none told us of a serious problem. The most common occurrence (not really a problem) was that we discovered that reviewers had conflicts of interest that they hadn’t declared. Most journals have not, however, adopted open peer review.

Our next step was to conduct a trial of our current open system against a system whereby every document associated with peer review, together with the names of everybody involved, was posted on the BMJ’s website when the paper was published. Once again this intervention had no effect on the quality of the opinion. We thus planned to make posting peer review documents the next stage in opening up our peer review process, but that hasn’t yet happened, partly because the results of the trial have not yet been published and partly because this step required various technical developments.

The final step was, in my mind, to open up the whole process and conduct it in real time on the web in front of the eyes of anybody interested. Peer review would then be transformed from a black box into an open scientific discourse. Often I found the discourse around a study was a lot more interesting than the study itself. Now that I’ve left I’m not sure if this system will be introduced.

The BMJ also experimented with another possible way to improve peer review: training reviewers (128). It is perhaps extraordinary that there has been no formal training for such an important job. Reviewers learn either by trial and error (without, it has to be said, very good feedback) or by working with an experienced reviewer (who might unfortunately be experienced but not very good).

Our randomized trial of training reviewers had three arms: one group got nothing; one group had a day’s face to face training plus a CD-ROM of the training; and the third group got just the CD-ROM. The face to face training comprised half a day from editors on what we wanted from reviewers and half a day on critically appraising randomized trials. Our trial dealt only with randomized trials and we admitted to the study only reviewers from Britain who had reviewed for the BMJ in the past year. All the reviewers were sent the same study to review before we did anything. The study had deliberate errors inserted. Everybody then received further papers with errors one month and six months after the training. We used our standardized instrument to measure the quality of the reviews and we counted the number of errors spotted.

The overall result was that training made little difference. The groups that had training did show some evidence of improvement relative to those who had no training, but we didn’t think that the difference was big enough to be meaningful. We can’t conclude from this that longer or better training would not be helpful. A problem with our study was that most of the reviewers had been reviewing for a long time. ‘Old dogs cannot be taught new tricks’, but the possibility remains that younger ones could. Most of those who had the face to face training both liked it and thought that it would improve their ability to review. The BMJ thus does offer such training, thinking of it more as a reward than something that will improve the quality of review.

One difficult question is whether or not peer review should continue to operate on trust. Some have made small steps beyond trust into the world of audit. The Food and Drug Administration in the United States reserves the right to go and look at the records and raw data of those who produce studies that are used in applications for new drugs to receive licences. Sometimes it does so. Some journals, including the BMJ, make it a condition of submission that the editors can ask for the raw data behind a study. We did so once or twice, only to discover that reviewing raw data is difficult, expensive and time consuming. I cannot see journals moving beyond trust in any major way unless the whole scientific enterprise were to move in that direction.

So peer review is a flawed process full of easily identified defects with little evidence that it works. Nevertheless, it is likely to remain central to science and journals because there is no obvious alternative and scientists and editors have a continuing belief in peer review. How odd that science should be rooted in belief.

References

Wakefield AJ, Murch SH, Anthony A et al. Ileal-lymphoid-nodular hyperplasia, non-specific colitis and pervasive developmental disorder in children. Lancet 1998;351:637-41.
Laumann E, Paik A, Rosen R. Sexual dysfunction in the United States: prevalence and predictors. JAMA 1999;281:537-44 (published erratum appears in JAMA 1999;281:1174).
Moynihan R. The making of a disease: female sexual dysfunction. BMJ 2003;326:45-7.
Hudson Jones A, McLellan F. Ethical issues in biomedical publication. Baltimore: Johns Hopkins University Press, 2000.
Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical epidemiology: a basic science for clinical medicine. London: Little, Brown, 1991.
Haynes RB. Where’s the meat in clinical journals? ACP Journal Club 1993;119:A23-4.
Altman DG. The scandal of poor medical research. BMJ 1994;308:283-4.
Shaughnessy AF, Slawson DC, Bennett JH. Becoming an information master: a guidebook to the medical information jungle. J Fam Pract 1994;39:489-99.
Bartrip P. Mirror of medicine: a history of the BMJ. Oxford: British Medical Journal and Oxford University Press, 1990.
Chen RT, DeStefano F. Vaccine adverse events: causal or coincidental? Lancet 1998;351:611-12.
Pobel D, Vial JF. Case-control study of leukaemia among young people near La Hague nuclear reprocessing plant: the environmental hypothesis revisited. BMJ 1997;314:101.
Horton R. A statement by the editors of the Lancet. Lancet 2004;363:820-1.
Murch SH, Anthony A, Casson DH et al. Retraction of an interpretation. Lancet 2004;363:750.
Smith R. The discomfort of patient power. BMJ 2002;324:497-8.
Antithrombotic Trialists’ Collaboration. Collaborative meta-analysis of randomised trials of antiplatelet therapy for prevention of death, myocardial infarction and stroke in high risk patients. BMJ 2002;324:71-86.
Cleland JGF. For debate: Preventing atherosclerotic events with aspirin. BMJ 2002;324:103-5.
Bagenal FS, Easton DF, Harris E et al. Survival of patients with breast cancer attending Bristol Cancer Help Centre. Lancet 1990;336:606-10.
Fox R. Quoted in: Smith R. Charity Commission censures British cancer charities. BMJ 1994;308:155-6.
Richards T. Death from complementary medicine. BMJ 1990;301:510.
Goodare H. The scandal of poor medical research: sloppy use of literature often to blame. BMJ 1994;308:593.
Bodmer W. Bristol Cancer Help Centre. Lancet 1990;336:1188.
Budd JM, Sievert ME, Schultz TR. Phenomena of retraction. Reasons for retraction and citations to the publications. JAMA 1998;280:296-7.
McVie G. Quoted in: Smith R. Charity Commission censures British cancer charities. BMJ 1994;308:155-6.
Smith R. Charity Commission censures British cancer charities. BMJ 1994;308:155-6.
Feachem RGA, Sekhri NK, White KL. Getting more for their dollar: a comparison of the NHS with California’s Kaiser Permanente. BMJ 2002;324:135-41.
Himmelstein DU, Woolhandler S, David DS et al. Getting more for their dollar: Kaiser v the NHS. BMJ 2002;324:1332.
Talbot-Smith A, Gnani S, Pollock A, Pereira Gray D. Questioning the claims from Kaiser. Br J Gen Pract 2004;54:415-21.
Ham C, York N, Sutch S, Shaw A. Hospital bed utilisation in the NHS, Kaiser Permanente, and the US Medicare programme: analysis of routine data. BMJ 2003;327:1257-61.
Sanders SA, Reinisch JM. Would you say you ‘had sex’ if…? JAMA 1999;281:275-7.
Anonymous. It’s over, Debbie. JAMA 1988;259:272.
Lundberg G. ‘It’s over, Debbie,’ and the euthanasia debate. JAMA 1988;259:2142-3.
Smith R. Euthanasia: time for a royal commission. BMJ 1992;305:728-9.
Doyal L, Doyal L. Why active euthanasia and physician assisted suicide should be legalised. BMJ 2001;323:1079-80.
Emanuel EJ. Euthanasia: where The Netherlands leads will the world follow? BMJ 2001;322:1376-7.
Angell M. The Supreme Court and physician-assisted suicide: the ultimate right. N Eng J Med 1997;336:50-3.
Marshall VM. It’s almost over: more letters on Debbie. JAMA 1988;260:787.
Smith R. Cheating at medical school. BMJ 2000;321:398.
Davies S. Cheating at medical school. Summary of rapid responses. BMJ 2001;322:299.
Ewen SWB, Pusztai A. Effects of diets containing genetically modified potatoes expressing Galanthus nivalis lectin on rat small intestine. Lancet 1999;354:1353-4.
Horton R. Genetically modified foods: ‘absurd’ concern or welcome dialogue? Lancet 1999;354:1314-15.
Kuiper HA, Noteborn HPJM, Peijnenburg AACM. Adequacy of methods for testing the safety of genetically modified foods. Lancet 1999;354:1315.
Bombardier C, Laine L, Reicin A et al. Comparison of upper gastrointestinal toxicity of rofecoxib and naproxen in patients with rheumatoid arthritis. N Eng J Med 2000;343:1520-8.
Curfman GD, Morrissey S, Drazen JM. Expression of concern: Bombardier et al., ‘Comparison of Upper Gastrointestinal Toxicity of Rofecoxib and Naproxen in Patients with Rheumatoid Arthritis.’ N Eng J Med 2000;343:1520-8. N Eng J Med 2005;353:2813-4.
Curfman GD, Morrissey S, Drazen JM. Expression of concern reaffirmed. N Eng J Med 2006;354:1193.
Laumann E, Paik A, Rosen R. Sexual dysfunction in the United States: prevalence and predictors. JAMA 1999;281:537-44 (published erratum appears in JAMA 1999;281:1174).
Smith R. In search of ‘non-disease.’ BMJ 2002;324:883-5.
Hughes C. BMJ admits ‘lapses’ after article wiped £30m off Scotia shares. Independent 10 June 2000.
Hettiaratchy S, Clarke J, Taubel J, Besa C. Burns after photodynamic therapy. BMJ 2000;320:1245.
Bryce A. Burns after photodynamic therapy. Drug point gives misleading impression of incidence of burns with temoporfin (Foscan). BMJ 2000;320:1731.
Richmond C. David Horrobin. BMJ 2003;326:885.
Enstrom JE, Kabat GC. Environmental tobacco smoke and tobacco related mortality in a prospective study of Californians, 1960-98. BMJ 2003;326:1057-60.
Roberts J, Smith R. Publishing research supported by the tobacco industry. BMJ 1996;312:133-4.
Lefanu WR. British periodicals of medicine 1640-1899. London: Wellcome Unit for the History of Medicine, 1984.
Squire Sprigge S. The life and times of Thomas Wakley. London: Longmans, 1897.
Bartrip PWJ. Themselves writ large: the BMA 1832-1966. London: BMJ Books, 1996.
Delamothe T. How political should a general medical journal be? BMJ 2002;325:1431-2.
Gedalia A. Political motivation of a medical journal [electronic response to Halileh and Hartling. Israeli-Palestinian conflict]. BMJ 2002. http://bmj.com/cgi/eletters/324/7333/361#20289 (accessed 10 Dec 2002).
Marchetti P. How political should a general medical journal be? Medical journal is no place for politics. BMJ 2003;326:1431-32.
Roberts I. The second gasoline war and how we can prevent the third. BMJ 2003;326:171.
Roberts IG. How political should a general medical journal be? Medical journals may have had role in justifying war. BMJ 2003;326:820.
Institute of Medicine. Crossing the quality chasm. A new health system for the 21st century. Washington: National Academy Press, 2001.
Oxman AD, Thomson MA, Davis DA, Haynes RB. No magic bullets: a systematic review of 102 trials of interventions to improve professional practice. Can Med Assoc J 1995;153:1423-31.
Grimshaw JM, Russell IT. Effect of clinical guidelines on medical practice: a systematic review of rigorous evaluations. Lancet 1993;342:1317-22.
Grol R. Beliefs and evidence in changing clinical practice. BMJ 1997;315:418-21.
Smith R. What clinical information do doctors need? BMJ 1996;313:1062-8.
Godlee F, Smith R, Goldman D. Clinical evidence. BMJ 1999;318:1570-1.
Smith R. The BMJ: moving on. BMJ 2002;324:5-6.
Milton J. Areopagitica. World Wide Web: Amazon Press (digital download), 2003.
Coulter A. The autonomous patient: ending paternalism in medical care. London: Stationery Office Books, 2002.
Muir Gray JA. The resourceful patient. Oxford: Rosetta Press, 2001.
World Health Organization. Macroeconomics and health: investing in health for economic development. Report of the commission on macroeconomics and health. Geneva: WHO, 2001.
Mullner M, Groves T. Making research papers in the BMJ more accessible. BMJ 2002;325:456.
Godlee F, Jefferson T, eds. Peer review in health sciences, 2nd edn. London: BMJ Books, 2003.
Relman AS. Dealing with conflicts of interest. N Eng J Med 1984;310:1182-3.
Hall D. Child protection: lessons from Victoria Climbié. BMJ 2003;326:293-4.
McCombs ME, Shaw DL. The agenda setting function of mass media. Public Opin Q 1972;36:176-87.
McCombs ME, Shaw DL. The evolution of agenda-setting research: twenty five years in the marketplace of ideas. J Commun 1993;43:58-67.
Edelstein L. The Hippocratic oath: text, translation, and interpretation. Baltimore: Johns Hopkins Press, 1943.
www.pbs.org/wgbh/nova/doctors/oath_modern.html (accessed 8 June 2003).
Weatherall DJ. The inhumanity of medicine. BMJ 1994;309:1671-2.
Smith R. Publishing information about patients. BMJ 1995;311:1240-1.
Smith R. Informed consent: edging forwards (and backwards). BMJ 1998;316:949-51.
Calman K. The profession of medicine. BMJ 1994;309:1140-3.
Smith R. Medicine’s core values. BMJ 1994;309:1247-8.
Smith R. Misconduct in research: editors respond. BMJ 1997;315:201-2.
McCall Smith A, Tonks A, Smith R. An ethics committee for the BMJ. BMJ 2000;321:720.
Smith R. Medical editor lambasts journals and editors. BMJ 2001;323:651.
Smith R, Rennie D. And now, evidence based editing. BMJ 1995;311:826.
Weeks WB, Wallace AE. Readability of British and American medical prose at the start of the 21st century. BMJ 2002;325:1451-2.
O’Donnell M. Evidence-based illiteracy: time to rescue ‘the literature’. Lancet 2000;355:489-91.
O’Donnell M. The toxic effect of language on medicine. J R Coll Physicians Lond 1995;29:525-9.
Berwick D, Davidoff F, Hiatt H, Smith R. Refining and implementing the Tavistock principles for everybody in health care. BMJ 2001;323:616-20.
Gaylin W. Faulty diagnosis. Why Clinton’s health-care plan won’t cure what ails us. Harpers 1993;October:57-64.
Davidoff F, Reinecke RD. The 28th Amendment. Ann Intern Med 1999;130:692-4.
Davies S. Obituary for David Horrobin: summary of rapid responses. BMJ 2003;326:1089.
Butler D. Medical journal under attack as dissenters seize AIDS platform. Nature 2003;426:215.
Smith R. Milton and Galileo would back BMJ on free speech. Nature 2004;427:287.
Carr EH. What is history? Harmondsworth: Penguin, 1990.
Popper K. The logic of scientific discovery. London: Routledge, 2002.
Kuhn T. The structure of scientific revolutions. London: Routledge, 1996.
www.guardian.co.uk/newsroom/story/0,11718,850815,00.html (accessed 14 June 2003).
Davies S, Delamothe T. Revitalising rapid responses. BMJ 2005;330:1284.
Morton V, Torgerson DJ. Effect of regression to the mean on decision making in health care. BMJ 2003;326:1083-4.
Horton R. Surgical research or comic opera: questions, but few answers. Lancet 1996;347:984-5.
Pitches D, Burls A, Fry-Smith A. How to make a silk purse from a sow’s ear — a comprehensive review of strategies to optimise data for corrupt managers and incompetent clinicians. BMJ 2003;327:1436-9.
Poloniecki J. Half of all doctors are below average. BMJ 1998;316:1734-6.
Writing group for the Women’s Health Initiative Investigators. Risks and benefits of estrogen plus progestin in healthy postmenopausal women. JAMA 2002;288:321-33.
Shumaker SA, Legault C, Thal L et al. Estrogen plus progestin and the incidence of dementia and mild cognitive impairment in postmenopausal women: the Women’s Health Initiative Memory Study: a randomized controlled trial. JAMA 2003;289:2651-62.
Yusuf S, Collins R, Peto R. Why do we need some large, simple randomized trials? Stat Med 1984;3:409-22.
Leibovici L. Effects of remote, retroactive intercessory prayer on outcomes in patients with bloodstream infection: randomised controlled trial. BMJ 2001;323:1450-1.
Haynes RB, McKibbon A, Kanani R. Systematic review of randomised trials of interventions to assist patients to follow prescriptions for medications. Lancet 1996;348:383-6.
Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 1995;273:408-12.
Altman DG, Schulz KF, Moher D et al., for the CONSORT Group. The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med 2001;134:663-94.
Moher D, Jones A, Lepage L; CONSORT Group (Consolidated Standards for Reporting of Trials). Use of the CONSORT statement and quality of reports of randomized trials: a comparative before-and-after evaluation. JAMA 2001;285:1992-5.
Garattini S, Bertele V, Li Bassi L. How can research ethics committees protect patients better? BMJ 2003;326:1199-201.
Sackett DL, Oxman AD. HARLOT plc: an amalgamation of the world’s two oldest professions. BMJ 2003;327:1442-5.
Ioannidis JPA. Why most published research findings are false. PLoS Med 2005;2:e124.
Greenhalgh T. How to read a paper. London: BMJ Books, 1997.
Sterne JAC, Davey Smith G. Sifting the evidence: what’s wrong with significance tests? BMJ 2001;322:226-31.
Le Fanu J. The rise and fall of modern medicine. New York: Little, Brown, 1999.
Lock S. A difficult balance: editorial peer review in medicine. London: Nuffield Provincial Hospitals Trust, 1985.
Rennie D. Guarding the guardians: a conference on editorial peer review. JAMA 1986;256:2391-2.
Martyn C. Slow tracking for BMJ papers. BMJ 2005;331:1551-2.
Hwang WS, Roh SI, Lee BC et al. Patient-specific embryonic stem cells derived from human SCNT blastocysts. Science 2005;308:1777-83.
Normile D, Vogel G, Holden C. Stem cells: cloning researcher says work is flawed but claims results stand. Science 2005;310:1886-7.
Jefferson T, Alderson P, Wager E, Davidoff F. Effects of editorial peer review: a systematic review. JAMA 2002;287:2784-6.
Godlee F, Gale CR, Martyn CN. Effect on the quality of peer review of blinding reviewers and asking them to sign their reports: a randomized controlled trial. JAMA 1998;280:237-40.
Schroter S, Black N, Evans S et al. Effects of training on quality of peer review: randomised controlled trial. BMJ 2004;328:673.
Peters D, Ceci S. Peer-review practices of psychological journals: the fate of submitted articles, submitted again. Behav Brain Sci 1982;5:187-255.
McIntyre N, Popper K. The critical attitude in medicine: the need for a new ethics. BMJ 1983;287:1919-23.
Horton R. Pardonable revisions and protocol reviews. Lancet 1997;349:6.
Rennie D. Misconduct and journal peer review. In: Godlee F, Jefferson T, eds. Peer review in health sciences. London: BMJ Books, 1999.
McNutt RA, Evans AT, Fletcher RH, Fletcher SW. The effects of blinding on the quality of peer review. A randomized trial. JAMA 1990;263:1371-6.
Justice AC, Cho MK, Winker MA, Berlin JA, Rennie D, the PEER investigators. Does masking author identity improve peer review quality: a randomized controlled trial. JAMA 1998;280:240-2.
van Rooyen S, Godlee F, Evans S et al. Effect of blinding and unmasking on the quality of peer review: a randomized trial. JAMA 1998;280:234-7.
Fabiato A. Anonymity of reviewers. Cardiovasc Res 1994;28:1134-9.
Fletcher RH, Fletcher SW, Fox R et al. Anonymity of reviewers. Cardiovasc Res 1994;28:1340-5.
van Rooyen S, Godlee F, Evans S et al. Effect of open peer review on quality of reviews and on reviewers’ recommendations: a randomised trial. BMJ 1999;318:23-7.
