Do you know what’s in your medical record? Does it contain mistakes or omissions?

The extraordinary response to our April 1 post about data transfer from PatientSite to Google Health (86 comments so far) made us realize that the time has come for patients to take responsibility for their personal medical data. Toward that end, we’ve begun writing about how to understand health data. And that starts with understanding a few basics about I.T. … information technology.

It’s become apparent that it’s not effective for us to just hope our health data systems were intelligently designed and reliably executed. (Hm, that sounds like the financial bailouts … assuming “they know what they’re doing” didn’t work out too well there, did it?)

So let’s get on with it. Prerequisite: Read the very short “The I in IT stands for Information.”



As you probably know, computers can’t actually store information (like the reality that “e-Patient Dave is a hunk”); they only store 1’s and 0’s. The process of converting real information into 1’s and 0’s is called encoding.
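If you'd like to see that with your own eyes, here's a minimal sketch in Python. It assumes the text is encoded as UTF-8, which is just one common agreement among many, and the function names `encode` and `decode` are mine, made up for illustration.

```python
def encode(text: str) -> str:
    """Turn text into a string of 1's and 0's, eight bits per byte (UTF-8 assumed)."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def decode(bits: str) -> str:
    """Reverse the process: read the bits back into text using the same agreement."""
    return bytes(int(chunk, 2) for chunk in bits.split()).decode("utf-8")


bits = encode("Dave")
print(bits)          # 01000100 01100001 01110110 01100101
print(decode(bits))  # Dave
```

The computer never stores "Dave." It stores those 32 bits, and "Dave" only comes back out because the reader uses the same rules the writer did.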

The person you see in your camera’s viewfinder is a reality whose image you want to capture (store) so you can view it (retrieve it) later. The camera creates a JPEG file containing 1’s and 0’s that encode the photo in an agreed way. When another program knows how to read the same encoding, the information has been transferred. (Experts will note that JPEG is a “lossy” format that gives up some information, to make the file smaller. That’s right, but it’s not central to this particular topic.)

It’s all about agreement: people agree on the encoding and decoding, so the original intent is preserved. Example: JPEG works as a data format because people (the Joint Photographic Experts Group, JPEG) got together and agreed how the data would be encoded and decoded.
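Here's a small sketch of what happens when that agreement breaks down. It uses text rather than photos because it's shorter to show: the writer stores a word using one encoding (UTF-8), and a reader decodes the very same bytes assuming a different one (Latin-1). The variable names are just for illustration.

```python
original = "café"

# The writer encodes the word using UTF-8.
stored = original.encode("utf-8")

# A reader who shares the agreement gets the original back.
right = stored.decode("utf-8")

# A reader who assumes a different encoding gets something else,
# and nothing in the bytes themselves warns that it's wrong.
wrong = stored.decode("latin-1")

print(right)  # café
print(wrong)  # cafÃ©
```

Same 1's and 0's, different "information." Note that the second reader gets no error message at all, just a wrong answer.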



In more complex cases, such as medical information, such groups will kick it up a notch: they agree on a vocabulary, a defined set of things you can express. Vocabularies always involve a trade-off between completeness and usability: just as with an English dictionary, you can agree on a massive vocabulary, in which it’s time-consuming to find exactly what you want but there’s lots of nuance, or you can agree on a more concise vocabulary.

Some tasks require a rich vocabulary, some don’t. Some vocabularies overlap a lot, some a little. And some people have more use for more words about a particular topic (Wikipedia). So it’s important to choose the right vocabulary for the job.

With that as background, here’s a short write-up of various vocabularies that encode information about various things related to healthcare. I am indebted to our Dr. Danny Sands, who is something of a pioneer in “informatics” (healthcare IT), for the bulk of these descriptions.


Medical data vocabularies

There is no standard “consumer” vocabulary for medical conditions; that’s a work in progress. Below is a partial list of medical vocabularies used by various professionals. These are concise descriptions; there are references where you can read more about them, if you like.

CPT (Current Procedural Terminology) is a set of codes used for procedures, not diagnoses or conditions, including intensity of outpatient visits. The AMA licenses the use of these codes.

SNOMED CT is a clinically meaningful vocabulary. It’s provided without a license fee from the National Library of Medicine to all developers in the US. It’s the closest thing we have to a universal, clinically useful problem-list vocabulary. It’s most commonly used in practice (office) EMRs, not hospitals. But it’s hard to build tools that make SNOMED easy to navigate, and not all systems use it.

BI96 is the controlled vocabulary used for the online medical records at Beth Israel Deaconess Medical Center. It is similar to SNOMED but not as sophisticated. (It was developed at the BI and sent to the National Library of Medicine to be shared as a clinically useful vocabulary.)

NDDF (aka “FDB”) is used by some systems to represent medication information. It is available at a price from First DataBank (FDB), a division of Hearst Publishing.

RxNorm is an incomplete effort by the NLM to create a drug vocabulary that is available to the public for free. It has never been robust enough for general use.

NDC (National Drug Code) is a way of identifying individual drug products, used for inventory management and drug claims.

LOINC represents clinical observations and test results. It was developed at the Regenstrief Institute.


It’s important to realize that there’s no guarantee any given reality will be encoded the same in two different vocabularies. As one of those references says,

“17.7: Comparing coding systems is not easy: Unsurprisingly, the same clinical concept might look very different when coded using different classification systems. …[their origins and histories] inevitably result in the use of different terms for similar concepts.” [emphasis added]
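To picture what that means in practice, here's a toy sketch. Everything in it is made up for illustration: the two vocabularies, the codes (`R-101`, `C-9` and so on), and the mapping between them are hypothetical and don't come from SNOMED, ICD, or any real system. It simply shows a richer vocabulary being translated into a more concise one, and which distinctions get flattened along the way.

```python
from collections import defaultdict

# A richer (hypothetical) vocabulary with room for nuance.
RICH_VOCAB = {
    "R-101": "type 2 diabetes, well controlled",
    "R-102": "type 2 diabetes, poorly controlled",
    "R-103": "type 1 diabetes",
}

# A more concise (hypothetical) vocabulary with a single catch-all term.
COARSE_VOCAB = {"C-9": "diabetes"}

# An invented mapping from the rich vocabulary to the concise one.
CROSSWALK = {"R-101": "C-9", "R-102": "C-9", "R-103": "C-9"}


def collapsed(crosswalk: dict[str, str]) -> dict[str, list[str]]:
    """Return each concise code that stands in for more than one rich code."""
    groups = defaultdict(list)
    for rich_code, coarse_code in crosswalk.items():
        groups[coarse_code].append(rich_code)
    return {code: riches for code, riches in groups.items() if len(riches) > 1}


for coarse_code, rich_codes in collapsed(CROSSWALK).items():
    print(f"{COARSE_VOCAB[coarse_code]!r} now stands in for:")
    for rich_code in rich_codes:
        print(f"  {RICH_VOCAB[rich_code]}")
```

Once three different statements have been translated into the one word "diabetes," nobody downstream can tell which of the three was originally meant.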

What’s important is agreement, and selecting the right vocabulary for the job.


And now we come to ICD.

ICD (International Classification of Diseases; Wikipedia) is the data set that my hospital selected to transmit to Google Health. Specifically they use ICD-9, the ninth edition.

ICD-9: Insurance billing codes. A much smaller vocabulary of conditions and symptoms than SNOMED.

For billing purposes this is theoretically appropriate, since insurers don’t need to know the subtle diagnostic differences that doctors need to understand. But the ICD vocabulary is weaker still, because it completely lacks many conditions, and has no way to encode that you were just checking for something rather than actually having the condition.

And if you can’t encode that fact into the system, there’s no way to “decode” it back out. Result: train wreck.

That’s what happened with my supposed “metastases to brain or spine” – they were checking for brain mets and didn’t find any, but the billing code that ended up in the system couldn’t express that. Same for my “intestinal parasitic infection.”  So then, when those billing codes are transmitted as if they were clinical reality, the result is wrong information.
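For the programmers in the audience, here's a sketch of how that kind of train wreck comes about. To be clear, this is not how my hospital's systems actually work; it's a toy model under my own assumptions. The record layouts, the function names, and the code `HYPO-001` are all hypothetical. The point is only that the billing record has no place to put "we checked and found nothing."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClinicalNote:
    condition: str
    status: str  # "confirmed" or "ruled_out"


@dataclass(frozen=True)
class BillingRecord:
    code: str  # note: there is no field for status


# Hypothetical code table; "HYPO-001" is not a real billing code.
BILLING_CODES = {"brain metastases": "HYPO-001"}
CODE_MEANINGS = {code: condition for condition, code in BILLING_CODES.items()}


def to_billing(note: ClinicalNote) -> BillingRecord:
    """Encode for billing. The status is dropped because nothing can hold it."""
    return BillingRecord(code=BILLING_CODES[note.condition])


def read_as_clinical(record: BillingRecord) -> ClinicalNote:
    """Decode a billing record as if it were clinical reality."""
    return ClinicalNote(condition=CODE_MEANINGS[record.code], status="confirmed")


what_the_doctor_meant = ClinicalNote("brain metastases", "ruled_out")
what_comes_out = read_as_clinical(to_billing(what_the_doctor_meant))

print(what_the_doctor_meant.status)  # ruled_out
print(what_comes_out.status)         # confirmed
```

Nothing in this sketch is "buggy" in the usual sense. Each step does what it was built to do, and a test that came back clean still turns into a diagnosis.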

And since the I in IT stands for information, wrong information=#fail.

Worse, in reality, billing codes are often misapplied by “coders” (clerks), to feed something into a claims payment system that it’ll accept. So the “information” placed into the system might not be something the doctor would ever have said in the first place.

Bottom line: in practice, ICD data is useless as health information.

Btw, if you look at how ICD-9 is meant to be used (to capture disease frequency statistics), and then realize how it is used (to submit charges to billing systems, in an uncontrolled environment), it’s chilling. Are all the disease statistics we hear skewed by billing clerks trying to cope with an inadequate vocabulary?

It’s been said that ICD-10 is coming and will be better. Well yeah, ICD-10 was developed in 1992, and it still isn’t used in most systems. (Yes, healthcare technology is that slow to adapt to change!)

And besides, regardless of the vocabulary, if the process for using it isn’t reliable, and data gets inserted into systems with billing intentions, and then someone reads it out thinking it’s clinical information, disasters can – and do – happen.


So, ladies and germs, that’s tonight’s lesson on how information (reality) is encoded … our second lesson on health IT.