The New England Journal of Medicine’s Health Policy and Reform just published an opinion piece about the first public release of online report cards regarding 221 of the 1,100 US cardiac surgery programs. The authors believe that

this event will fuel the debate regarding the risks and benefits of public reporting, including the question of whether it assists patients in discriminating among sites of care.

I hope this blog post can be a modest contribution to this debate, by raising awareness of the very real risk of a new and serious data divide, just as Susannah Fox is saying “mobile was the final front in the access revolution. It has erased the digital divide. A mobile device is the internet for many people”, adding “we may be entering a new era where access isn’t the point anymore. It’s what people are doing with the access that matters”. Thinking about access, could there be negative unintended consequences of OpenData? OpenData is Difficult! For example, do we know “how many effective users are there likely to be for such services?”

We all agree that individual health data made available without proper context can be dangerous. But could opening large clinical outcome datasets and offering open online access be just as dangerous, by helping to distort future results and doing lasting harm to health outcomes for some?

The Health 2.0 world is abuzz lately with the great potential offered by opening up aggregate public health information for outside analysis. The OpenData movement has clearly stormed in. Hackers galore are working to use this information to create the next generation of useful websites that will make health choices easier. The US and UK governments are actively promoting the public availability of large data sets in many domains where the government is involved. I haven’t heard anyone doubting the great impact these governmental initiatives will have. It seems so obvious. As advocates of patients’ direct and constant involvement in their care, we are equally excited at the potential these Open Datasets could have on health in a short period of time, just as we have been advocating for Open Access to full-text articles resulting from federally funded scientific research and for full access to our health records, including doctors’ notes. So, please do not take this post as an attack against opening any health data repository!

Let’s look at the NEJM article in some more detail. The opened dataset is composed of clinical outcome results for coronary artery bypass grafting (CABG), and the ratings are calculated from a registry developed by the Society of Thoracic Surgeons (STS) in 1989. This may be the largest dataset of clinical outcomes and associated ratings made available online. It is an important step forward in opening to the public data that was until recently hard to find or housed in closed governmental data silos. While we are still very far from “Gimme my damned data!”, opening this country-wide dataset and associated ratings is obviously a step in the right direction.

Even though some of us would expect all such data sets to be made public in the near future, the reality seems to be vastly different. One particular aspect of this first US-wide ratings release deserves attention: each program that chose to make its data public was assigned a rating of 1, 2, or 3 stars for overall performance. The performance thresholds are designed so that the programs given 1 or 3 stars are, with 99% probability, truly below or above average, respectively; these are called the outliers. Over the past 3 years, this method has identified 23 to 27% of the programs as outliers. This first public release of a large clinical results dataset, as important as it is, is imperfect. The voluntary nature of the public release creates a skewed dataset. Also noticeable are the lack of long-term outcome assessment and of individual physician ratings.
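
For readers who want to see what a 99% threshold means in practice, here is a minimal sketch. It is not the STS model: the real ratings are built from a composite of several quality measures, while this toy version looks at mortality alone and relies on a simple normal approximation. The function name star_rating, its parameters, and the sample numbers are all assumptions made up for illustration.

```python
from statistics import NormalDist


def star_rating(deaths, cases, average_rate, certainty=0.99):
    """Return 1, 2 or 3 stars for a program (illustrative, not the STS model)."""
    # Smoothed estimate of the program's mortality rate; the +0.5 / +1 terms
    # keep the estimate away from exactly 0 or 1 (an assumption of this sketch).
    rate = (deaths + 0.5) / (cases + 1)
    # Normal approximation to the uncertainty around that estimate.
    std_error = (rate * (1 - rate) / cases) ** 0.5
    # Probability that the program's true rate is below the national average.
    prob_better = NormalDist(mu=rate, sigma=std_error).cdf(average_rate)
    if prob_better >= certainty:
        return 3  # almost certainly better than average
    if 1 - prob_better >= certainty:
        return 1  # almost certainly worse than average
    return 2  # cannot be told apart from average at this level of certainty


# Hypothetical programs compared against a hypothetical 2% average mortality.
for deaths, cases in [(5, 1000), (20, 1000), (40, 1000), (0, 50)]:
    print(deaths, cases, star_rating(deaths, cases, average_rate=0.02))
```

Under these assumptions a small program tends to land on 2 stars whatever its raw numbers, simply because there is too little data to be 99% sure of anything. That is worth keeping in mind when reading the stars.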

Now, you may ask, even with these imperfections, what could be the negative impact of such an innovative program?

A tweet last week alerted me to a ground-shaking study. Mike Gurstein in “Open Data: Empowering the Empowered or Effective Data Use for Everyone?” says

this drive towards increased public transparency and allowing for enhanced data enriched citizen/public engagement in policy and other analysis and assessment is certainly a very positive outcome of public computing and online tools for data management and manipulation.

However, as with the earlier discussion concerning the “digital divide” there would, in this context, appear to be some confusion as between movements to enhance citizen “access” to data and the related issues concerning enhancing citizen “use” of this data as part, for example, of interventions concerning public policies and programs. […]

In an earlier paper dealing with the digital divide discussion I suggested the use of the concept of “effective use” to distinguish between the opportunity for digitally-enabled activity presented by ICT access, from the actual realization of those opportunities in the form of “effective use”. At that time I introduced a set of layers of requirements, which can be understood as “pre-conditions” for the realization of “effective use” of digital “access”.

Susannah Fox at the Pew Internet and Life Project relentlessly analyzes the US population’s use of the Internet. Lately she has been writing about the Chronic Divide:

U.S. adults living with chronic disease are significantly less likely than healthy adults to have access to the internet (62% vs. 81%). The internet access gap creates an online health information gap. However, lack of internet access, not lack of interest in the topic, is the primary reason for the difference. […] Living with chronic disease is also associated, once someone is online, with a greater likelihood to access user-generated health content such as blog posts, hospital reviews, doctor reviews, and podcasts. These resources allow an internet user to dive deeply into a health topic, using the internet as a communications tool, not simply an information vending machine.

The full report adds “statistically speaking, chronic disease is associated with being older, African American, less educated, and living in a lower-income household. By contrast, internet use is statistically associated with being younger, white, college-educated, and living in a higher-income household. Thus, it is not surprising that the chronically ill report lower rates of internet access.”

I cannot separate the above from Mike Gurstein’s comment:

Efforts to extend access to “data” will perhaps inevitably create a “data divide” parallel to the oft-discussed “digital divide” between those who have access to data which could have significance in their daily lives and those who don’t. Associated with this will, one can assume, be many of the same background conditions which have been identified as likely reasons for the digital divide—that is differences in income, education, literacy and so on. However, just as with the “digital divide”, these divisions don’t simply stop or be resolved with the provision of digital (or data) “access”. What is necessary as well, is that those for whom access is being provided are in a position to actually make use of the now available access (to the Internet or to data) in ways that are meaningful and beneficial for them.

The question then becomes, who is in a position to make “effective use” of this newly available data? […]

Given in fact, that these above mentioned resources are more likely to be found among those who already overall have access to and the resources for making effective use of digitally available information one could suggest that a primary impact of “open data” may be to further empower and enrich the already empowered and the well provided for rather than those most in need of the benefits of such new developments (unless of course, they have means or the luck to find benefactors such as the Cedar Grove Institute or Harvard Law School graduates willing to work pro bono or on a contingency basis).

I have had this conversation with e-patient Dave a few times. Dave is a remarkable writer and can describe his vision of patient empowerment in a masterful way. But I am convinced he doesn’t speak for the poor and downtrodden, for those who have no job, no insurance and no powerful connections. For this segment of the US population the positive impact of the wide availability of eHealth resources is less than clear. In fact I am very worried that we are fast building a nation of health data outliers, with large numbers of both super-empowered and health data virgins, not by choice. In this context, the example of empowering the empowered mentioned by Michael Gurstein resonates strongly. Tim O’Reilly after reading the account said “we need to think deeply about the future” as he was preparing to launch the GOV 2.0 event in DC!

Read on:

newly available access to digitized land ownership and title information in Bangalore was primarily being put to use by middle and upper income people and by corporations to gain ownership of land from the marginalized and the poor. […] They were able to directly translate their enhanced access to the information along with their already available access to capital and professional skills into unequal contests around land titles, court actions, offers of purchase and so on for self-benefit and to further marginalize those already marginalized. […] This is not to suggest that processes of computerization inevitably lead to such outcomes but rather to say that in the absence of efforts to equalize the playing field with respect to enabling opportunities for the use of newly available data, the end result may be increased social divides rather than reduced ones particularly with respect to the already poor and marginalized.

Does anyone believe that something different will happen with the data releases by the STS? Based on the Pew studies, it is hard to imagine that we will witness anything other than a growing data divide in health. And if that data divide really happens, the empowered will have access to the life-saving dataset and will act upon it, while many of the people suffering from chronic diseases (the same population that would benefit most from access to this information) won’t. Over time it is therefore probable that the 3-star outliers, the centers of excellence, will treat an ever-growing number of empowered patients, while the 1-star outliers, the centers with high mortality, will get worse and worse results, simply because they will treat an ever-growing number of digital outliers who have no way to obtain health information or data and apply filters.

I believe the Open Data movement, particularly as it applies to health, is an important, highly transformative and positive development. For some! What do you think? What should be done?

