Dr. Esther Choo deconstructs the recent JAMA debate.
JAMA Internal Medicine published a remarkable study in December 2016 on differences between clinical outcomes associated with care by male and female physicians, which created a clearly discernible buzz in the medical world. The authors started from a provocative premise – that “career interruptions for childrearing, higher rates of part-time employment, and greater tradeoffs between home and work responsibilities may compromise the quality of care provided by female physicians.”
Thus, the stated objective of the study was to examine whether there was data-driven evidence that women physicians provided lower-quality care. Examining a nationally representative database of hospitalized Medicare fee-for-service beneficiaries, the authors found that those cared for primarily by female hospitalists had lower 30-day mortality and fewer 30-day re-hospitalizations than those cared for by male hospitalists.
How much of a buzz has this article created? As of this writing, it has been read almost 200,000 times and downloaded more than 15,400 times, and will likely be one of the most read papers in the history of JAMA and its spinoff journals. Judging by the heated online response, it also may be remembered as one of the most viciously criticized studies ever published.
Why This Study Actually Is Pretty Cool
The arguments against the JAMA Internal Medicine study are standard criticisms that would apply to almost any observational study, and the authors have addressed many of them already with much patience, eloquence, and good humor. However, I will highlight a few issues here.
First, while the study is observational, dismissing it entirely on those grounds is a little pat. In the last 20 years, we’ve come a long way in our appreciation of the value of well-designed observational studies. In fact, most of the research on which our practice is based is observational. Among observational studies, the ideal situation is a quasi-experimental design, or “natural experiment,” in which some circumstance essentially randomizes people to one condition or the other.
And in fact this is the design of the current study. I have never preferentially assigned patients to a physician of a particular gender; likewise, patients do not come into the hospital and select their own admitting hospitalist, because this is not possible. In general, they are admitted to whoever happens to be the hospitalist on call. Because this setup is so typical, the authors chose to look only at patients admitted to hospitalists, taking advantage of the fact that patients are virtually randomized even though this is not a randomized controlled trial.
Despite this cool design — which in itself should control for many variables related to the outcomes of mortality and bounce-backs — the authors also did a number of other things to try to account for the universe of potential confounders. First, they adjusted for a wide variety of patient, physician, and hospital-level characteristics, including fixed effects for any confounders at the hospital level that were not captured within the data. In other words, they used statistical methods to control for potential explanatory factors that they couldn’t measure, and some they couldn’t even identify. They also conducted a number of sensitivity analyses, including one to specifically avoid bias against male physicians, looking only at hospitals without a medical ICU, reasoning that male physicians are more likely to work as intensivists, and thus incur greater risk of taking on patients at high risk of mortality. Sensitivity analyses are a way of testing your hypothesis against a variety of assumptions. The number and detail of these sensitivity analyses tell me either that the authors were unusually meticulous or that the JAMA Internal Medicine reviewers were unusually hard on them.
Data Tells a Consistent Story
It is easy to find spurious associations in large data, as many people have pointed out. However, as a health services researcher who has spent a lot of time in front of my computer crestfallen to see my hypotheses refuted, I will say that it is also rather hard to get large data to tell a consistent story. The elderly patients of female physicians had lower mortality and readmission rates across almost all medical conditions examined. And no matter how they sliced the data, through all of the various models and meticulous sensitivity analyses, the finding was highly consistent and almost entirely in one direction, in favor of the female hospitalists.
Many people stated that the size of the difference in outcomes was clinically insignificant, and would need to be bigger to be believable and relevant. The risk difference detected was indeed small: just 0.43 percentage points for mortality, for example. But I wouldn’t expect to find a big difference.
Humans are complex beings, and although I do gender-based research, and see gender differences around every corner, if I expected gender to be the whole story or even the dominant part of the story in any clinical situation, I would be frequently and sorely disappointed. While arguably the most salient feature of an individual, gender is only one of many factors that determine how someone behaves. Others include everything from race, ethnicity, age, and training background, to the neighborhood you grew up in, the advice Uncle Morty gave when you first thought about entering medicine, and that thing that happened on the playground in fourth grade that has always kind of stayed with you. A larger difference, in fact, would have given me much more serious methodologic concerns about this study than a small, subtle difference.
Further, in population-based, public health studies, small changes are often triumphs. The only reason we ever use large databases — such as the 1.5 million observations in this study — is to measure an effect that would not hit you in the face as you walk down the street, and yet might be meaningful when scaled across a population. And this amount of difference between groups has been hailed as success for other types of health-related influences, from smoking interventions to the reductions in all-cause mortality among Medicare beneficiaries that the authors mention in their discussion. The authors drew out this point by calculating the “number needed to treat,” coming up with an NNT for having women physicians ranging from 149 to 223, depending on the individual analytic model. This calculation was a little tongue-in-cheek, of course, as physicians are not prescribed to patients as easily as an aspirin.
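For readers who want the arithmetic behind an NNT: by the usual convention it is the reciprocal of the absolute risk difference. The sketch below is only an illustration of that convention, not the authors' model-based calculation, and the function names are hypothetical. It takes the mortality risk difference of 0.43 percentage points quoted above, and also works backward from the quoted NNT range to the risk differences that range would imply.

```python
import math


def number_needed_to_treat(risk_difference: float) -> int:
    """Conventional NNT: reciprocal of the absolute risk difference,
    rounded up to a whole patient. risk_difference is a proportion
    (0.0043 means 0.43 percentage points)."""
    if risk_difference <= 0:
        raise ValueError("risk difference must be positive")
    return math.ceil(1.0 / risk_difference)


def implied_risk_difference(nnt: float) -> float:
    """Absolute risk difference (as a proportion) implied by a given NNT."""
    if nnt <= 0:
        raise ValueError("NNT must be positive")
    return 1.0 / nnt


# Mortality risk difference quoted in the text: 0.43 percentage points.
print(number_needed_to_treat(0.0043))  # 233

# Risk differences implied by the quoted NNT range, in percentage points.
for nnt in (149, 223):
    print(nnt, round(implied_risk_difference(nnt) * 100, 2))  # 0.67, then 0.45
```

Seen this way, the quoted NNT range corresponds to risk differences of roughly half a percentage point, which is the same order of magnitude as the 0.43 figure; the exact value depends on which analytic model supplies the risk difference.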
The Real Take-Home Message
As it stands, gender is not a treatment. There is no forthcoming RCT involving double-blind physician gender reassignment surgeries. Thus, conversation about the study’s clinical implications framed as a gender-divided battle of clinical superiority is unlikely to be helpful or satisfying, while being maximally divisive. Combing through hundreds of comments about the study, I was struck by how people’s immediate knee-jerk reactions seemed to be inextricably linked to their broader attitudes about gender equity in medicine: how they themselves viewed women in medicine, their interactions with male and female colleagues, and whether or not they believed that gender bias among physicians is a real phenomenon. It’s hardly surprising that by highlighting gender-based differences in performance between doctors, the study authors faced backlash, most vocally from the physician community itself. The inflammatory potential of gender here cannot be overstated. Note that research from Yale suggesting heavy doctors might be less effective than normal-weight doctors was not met with similar venom.
But although an observation about gender bias may have been the impetus for the study, and although beliefs about gender bias seemed to shape the dominant discussion about the study, I posit that the scientific value of the paper lies in a different direction: as an investigation of what gender differences can tell us about how to improve our clinical practice.
Paving the Way for Increased Personalization
Gender and biological sex are increasingly recognized as important determinants of health. Sex- and gender-balanced research is the NIH standard, and some journals, including those within our specialty, have policies requiring outcomes to be reported separately by sex or gender, given the near-universal influence these factors have on health and clinical outcomes. We no longer question that sex- or gender-specific differences exist in everything from manifestation of disease (e.g., myocardial infarction presentations in men vs. women) to responses to treatment (e.g., women’s greater risk from QT-prolonging medications).
Rather than adopting a one-size-fits-all approach, physicians are increasingly expected to tailor their care based on relevant characteristics of each patient. Moving forward, patient-specific approaches will likely expand in scope and sophistication, with categorization by sex, race and age giving way to genomic profiling. This is a natural and intuitive evolution. If we know more about our patients and how they interact with the world, how could that not translate into better care?
It is a short cognitive hop from patient-specific factors to the influence of the whole therapeutic milieu, including the characteristics of the physicians. While we’re only beginning to disentangle the complex ways in which sex and gender affect how an individual interacts with the world and the implications this can have for health, it seems eminently plausible that in some settings and for some populations, factors correlated with healthcare provider gender might translate into meaningful clinical differences. The JAMA Internal Medicine study opens the door for many questions about how some physician characteristics might be strengths in certain circumstances. Does this mean we’re going to preferentially recruit or select women into medicine or push physicians into specific fields due to gender? Of course not. But it may mean that we further explore the characteristics of care that reflect “female” and “male” tendencies in studies like this one, and begin to understand what they mean for patients.
Prior research has demonstrated gender-based differences in clinical practice, including female physicians’ greater tendency to adhere to evidence-based clinical guidelines and to discuss health prevention activities. Maybe this study will lead us to find that women in general are savvier about what constitutes an adequate home environment for elderly patients upon discharge. Maybe women spend more time or communicate in more detail, on average, with case managers and families and identify when there is a discrepancy between what the patient thinks about their adherence to daily meds and what is actually happening. Maybe we will dig deeper and find out not only what women and men do differently, but why — what combination of experiences and biologic wiring leads to that difference.
No one thinks that the pure female-ness of the physician brings magic dust into the room that wards off death. No one thinks that men never do the things that women are sometimes lauded for bringing to the table. But there may be important lessons to be learned here that one day can be transferred into novel aspects of care that can be applied to all healthcare teams. I would say the exact same thing, and apply the same clinical curiosity, if the study demonstrated better outcomes were associated with men.
“Women may…simply be better overall doctors,” stated an article in the online magazine Quartz.com. This message is completely different from what I took away from this study. Thankfully, few people, no matter where they stood on the study itself, seemed to buy this line. However, the idea that women might practice differently, that these differences might be born of their gendered experiences, and that these experiences therefore may not be simply a cross to bear, but rather an asset in clinical medicine, is fascinating and uplifting. In professional settings outside of medicine, gender diversity has been observed to strengthen the effectiveness and productivity of teams. Defining “success” in terms of hard clinical outcomes increases the urgency to unpack the elements of diversity that matter, and provides a potential reframing of the challenges many women face as they progress through their careers in medicine.
One final note: this study was conducted by an all-male team. As the social media shitstorm progressed, I couldn’t help but wonder: would a woman have communicated the study results to media outlets a little differently? Might a woman have presented it with a little more nuance, considering the social context and the emotions of those receiving the study? Would a woman on the team have made the paper more acceptable to a broad audience? Perhaps on a meta level, the lesson from this piece is that all teams and all outcomes benefit from balanced gender representation, in ways that we have only begun to understand.