A recent Economist cover story declared that laboratory science is all but broken. But by ignoring recent positive strides in research, we risk throwing the baby out with the bathwater.
This fall, a cover story in The Economist boldly asserted that scientific research is all but broken. Its main assertions – that negative findings go underreported, study replication is rare, and dubious findings go unchallenged – are significant and damning. Similar criticisms can be leveled at medical research, and indeed have been written about in these very pages concerning the research flaws surrounding the use of thrombolytics in stroke. But there is more to the story. We’ll take a close look at the flaws in the medical research system – including a couple of items missed by The Economist – while also highlighting the legitimate challenges at play and the groups taking steps to improve the research establishment.
Publication Bias
It may be surprising to many outside the research community that a large proportion of findings remain unpublished. This phenomenon, termed “publication bias,” distorts our understanding of treatment efficacy: studies with negative findings are both less likely to be published than studies with positive results and often see longer delays in publication. A Cochrane review found the odds of a study with positive results being published to be almost 4 times those of a study with negative or non-significant results. When negative trials are published, the time from study completion to publication is on average 2 years longer than for trials with positive results (Ioannidis 1998). This difference results from delays both in study submission and in the time from submission to publication.
Several factors lead to this disparity, not least of which is industry funding. Companies that develop drugs and biomedical devices may be understandably averse to publishing the results of studies that show no improvement in outcomes, or worse outcomes, when their medication or device is used. While there is a tendency to blame the editorial process, once a study has been submitted, the direction of its results does not affect the likelihood of publication (Olson 2002). Researchers themselves shoulder some of the responsibility, as studies with significant outcomes are nearly 50% more likely to be submitted for publication than studies with non-significant results (Stern 1997). Two of the primary reasons researchers cite for not publishing are the belief that the results are unimportant and the fear that the manuscript will be rejected. Researchers may also feel that publishing research that is not “innovative” and does not challenge current practice will do little to advance their careers.
Selective Reporting
Rather than forgo publication completely when no improvement in the intended outcome is observed, researchers sometimes choose to publish only those outcomes that support the therapy being studied. Such “selective reporting” is frequently, though not exclusively, observed in industry-funded research, and it leads to misconceptions about therapeutic efficacy. One study from Denmark compared 102 publications to their research protocols. The majority of these publications failed to report all of the outcomes assessed, with statistically significant outcomes more likely to be reported. More troubling still, the published primary outcome differed from that specified in the protocol in over 60% of cases.
Peer Review Problems
Also in the crosshairs is the hallowed peer review process, which was designed to ensure the validity and quality of publications and is the cornerstone of the medical literature. Studies of peer review have shown that errors are frequently overlooked during the process. In one study, a manuscript was altered and 8 “areas of weakness” were introduced; peer reviewers caught an average of only 2 of these weaknesses (Godlee 1998). In another, 9 “major” deliberate errors were placed in a manuscript; reviewers on average noted fewer than 3 of them (Schroter 2004). Unfortunately, training workshops have not been found to improve the quality of peer review (Callaham 1998, Callaham 2002), while the use of checklists has had variable results (Gardner 1990, Jefferson 1998).
Underpowered Studies
Though not mentioned in the Economist article, the performance of underpowered studies is also problematic. Without a sample size adequate to detect statistically significant differences in outcomes when they exist, or to assess diagnostic accuracy with enough precision to be clinically useful, these studies cost researchers and institutions time and money that would often be better spent elsewhere. Such research instead exposes its subjects to potentially harmful, unproven therapies or to diagnostic testing of limited clinical value (Halpern 2002). While there are situations in which small studies play an important role (e.g. interventions in rare diseases and early-phase “pilot” studies), nearly 60% of trials registered with clinicaltrials.gov between 2000 and 2010 enrolled fewer than 100 subjects. This percentage, unfortunately, did not change significantly over that period.
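To illustrate why trials of this size are so often underpowered, here is a minimal sketch (not from the article; the event rates, alpha, and power below are hypothetical choices for illustration) of a standard normal-approximation sample-size calculation for comparing two proportions.

```python
# Approximate per-arm sample size needed to detect a difference between two
# event rates, using the standard two-proportion normal-approximation formula.
# All numbers here are illustrative assumptions, not figures from the article.
from scipy.stats import norm

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Subjects needed per arm for a two-sided test of p1 vs p2."""
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the chosen alpha
    z_beta = norm.ppf(power)            # z-value for the desired power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return numerator / (p1 - p2) ** 2

# Hypothetical example: detecting a 5% absolute improvement (20% -> 15%)
print(round(n_per_arm(0.20, 0.15)))   # roughly 900 subjects per arm
```

Under these assumptions, even a modest absolute difference requires several hundred subjects per arm – far beyond the fewer than 100 total subjects enrolled by the majority of registered trials.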
Steps in the Right Direction
#1 Public Registration of Research
Despite the many problems documented here and in the Economist, changes continue to be made in the research and publication process to improve transparency and reproducibility. This begins with pre-trial registration. In 2000, the National Institutes of Health established clinicaltrials.gov to provide a venue for such reporting. The website allows authors to detail study methodology, justify the planned sample size, and identify primary and secondary outcomes prior to data collection. Pre-registering these aspects of a study allows for greater transparency when a study is changed partway through. Trial registration may also alleviate some of the burden of publication bias, as demonstrated by an article published in the Public Library of Science showing that nearly half of the trials registered at clinicaltrials.gov remained unpublished. Among the trials that had been published, complete adverse event results were available in the registry for nearly three quarters, compared with around half of the corresponding journal publications (Riveros 2013). The number of studies registered has grown since the site’s inception, and the percentage of trials registered prior to beginning enrollment has risen by 50% in recent years (Califf 2012). The International Committee of Medical Journal Editors (ICMJE), which includes among its members a number of high-impact emergency medicine journals, has called on its members to require “as a condition of consideration for publication in their journals, registration in a public trials registry.”
#2 Accurate Reporting of Findings
Equally important is the accurate reporting of research findings after study completion. Several groups promote the reproducibility of research through transparent reporting, including the CONSORT (CONsolidated Standards Of Reporting Trials) group, which seeks to improve reporting in randomized controlled trials; STROBE (STrengthening the Reporting of OBservational studies in Epidemiology); and the STARD (STandards for the Reporting of Diagnostic accuracy studies) initiative. The EQUATOR (Enhancing the QUAlity and Transparency Of health Research) Network was launched in 2008 to consolidate these and other reporting guidelines and further improve standards in research transparency. These groups produce checklists in the hope that proper reporting of research methodology and outcomes will improve our ability to appraise the medical literature. While the ICMJE recommends that journals ask authors to follow these guidelines, the onus remains on the journals and authors themselves to comply.
#3 Post-Publication Peer Review
Inadequacies in the peer review process have led to the increasing popularity of post-publication peer review. The medical community at large has become more cognizant of the limitations of published research, and skepticism has grown. The rise of social media has contributed to this skepticism, with a slew of blogs, podcasts, and websites dedicated to critically appraising the literature. This propensity for public appraisal was highlighted following the publication of the IST-3 trial last year, in which the authors claimed that “thrombolysis within 6 h improved functional outcome” following stroke. This conclusion was published in The Lancet, a reputable journal with a high impact factor, despite a lack of improvement in the trial’s primary outcome. The resulting backlash on social media was swift and harsh, as an ever-growing crowd of skeptics pointed out the apparent contradiction – evidence that where peer review fails, the Internet can provide a safety net.
In addition to informal online venues such as Twitter and Facebook, the National Center for Biotechnology Information (NCBI) has recently introduced PubMed Commons, which allows users to comment directly on any of the articles housed in its database. While still in a pilot phase, the Commons will provide a forum for public appraisal, discussion, and education, and will allow its users to share potentially valuable insights. Although unmoderated, the system does not allow anonymous commentary, pushing users to post responsibly and professionally. The major strength of the Commons is that it keeps this public peer review in one place, linked directly to the citation involved.
Conclusion
The kind of absolute “reproducibility” in research called for in the Economist is not only ambitious but likely unachievable. Two well-performed trials addressing the same clinical question will still differ somewhat in their methodology or patient population, and hence may provide different results. Even if a study is reproduced down to the smallest detail, the results will tend to differ due to the effect of random chance. It is for this reason that study results are reported with confidence intervals, which reflect the statistical uncertainty associated with those results. Even a faithful effort to replicate a study’s findings is therefore unlikely to reproduce them exactly.
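As a concrete illustration of this point, the following sketch (hypothetical numbers, not drawn from any real trial) simulates two “identical” trials of 300 patients each, sampling from the same assumed 15% true event rate; the observed rates and their 95% confidence intervals differ by chance alone.

```python
# Two simulated trials drawn from the same underlying event rate still yield
# different point estimates and confidence intervals purely by random chance.
# The event rate and trial size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(seed=1)
true_rate, n = 0.15, 300                           # assumed truth and trial size

for trial in (1, 2):
    events = rng.binomial(n, true_rate)            # simulate one trial's outcome
    p_hat = events / n                             # observed event rate
    se = (p_hat * (1 - p_hat) / n) ** 0.5          # standard error of the estimate
    lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se  # Wald 95% confidence interval
    print(f"Trial {trial}: {p_hat:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

The two confidence intervals overlap and both cover the true rate, yet neither “replicates” the other’s numerical result – which is exactly why replication should be judged by consistency within statistical uncertainty rather than by identical numbers.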
The authors at the Economist seem to hark back to a perceived scientific heyday following World War II, when research was a “rarefied pastime.” This nostalgia seems misplaced, given the exponential growth in scientific research output over the last several decades. Clinical research has improved substantially, with increasing use of sound methodology to reduce the risk of bias in trial design. Improvements in our understanding of statistical techniques have revolutionized the way we interpret research results and the manner in which we apply those results to patient care. No longer are the numerical results of a trial considered sacrosanct; instead we consider the statistical (and methodological) uncertainty associated with those results. The importance of ethical conduct in clinical research has also grown since the advent of the Institutional Review Board at US institutions in the 1960s. The Willowbrook study involving developmentally delayed children was conducted in the mid-1960s, while the Tuskegee syphilis study continued into the 1970s. In the modern era of clinical research, not only is the potential benefit of an investigational treatment to future patients considered, but also the potential harm to the study subjects themselves.
While the medical research establishment is far from perfect, changes are being made for the better. The Economist seems to call for a complete paradigm shift in the way research is performed, but let’s not throw the baby out with the bathwater just yet. The inability or failure to replicate the results of clinical trials is multifactorial and cannot be blamed entirely on drug companies and academic competition. Large, methodologically sound trials are expensive to conduct, and funding for clinical research is harder and harder to come by. Government sequestration led to a 5% cut in the NIH budget in March 2013, affecting every area of medical research. This makes substantive improvement in the studies that do get funded all the more necessary. Adherence to ICMJE recommendations regarding pre-trial registration and adequate reporting of results using validated checklists will improve the transparency and quality of manuscripts, while a decrease in the number of underpowered studies will help ensure that the research performed actually has the potential to affect clinical practice in a meaningful way.
Dr. Cohn practices emergency medicine at Washington University in St. Louis and is the director of the Washington University EM Journal Club