Nov 26, 2024 Clinithink

Granularity in clinical AI: Finding the right answer requires attention to detail

The essential value of unstructured notes

To achieve the highest-value insights in healthcare AI, we must analyze the information customers care most about. This means going beyond structured electronic medical record (EMR) data and looking at free-text clinical documents, such as hand-typed notes from a provider. Because these words and numbers are unstructured data that cannot, on their own, be readily analyzed the way structured data can, free-text content tends to be misunderstood by most AI systems, if not overlooked altogether.

Maximizing the value of that data, however, means rigorously accounting for different styles of notetaking so that, in analyzing millions of such notes, you can make consistent, apples-to-apples comparisons. Without a way to make sense of what a provider writes down and correlate it with other pertinent data, the opportunity to capture vital insights is missed.

Take lung cancer, for example, which has at least 100 different subtypes, each defined by stage, tissue and tumor type, and biomarker status. The specific subtype matters greatly to both the patient and the treatment team, as this information guides major treatment decisions and impacts clinical outcomes for the patient and their family. The more complete your data, the better your chances of pinpointing the right treatment approach.

On the life sciences side, companies aim to develop precision medicines with a remarkable capability to attack and disrupt tumor cell biology without the devastating whole-body side effects that accompany traditional chemotherapy. To achieve meaningful results, these innovations target very specific subgroups of patients within the general tumor type. Looking for those patients using structured data such as the single billing code for lung cancer is grossly insufficient—you need to dig into the notes.

Getting granular with unstructured data

In both of these settings, the details that define the many different types of lung cancer, each with its own approach to management, nearly always reside in multiple physicians’ free-text notes. Within a multidisciplinary oncology team, radiologists, pulmonologists, pathologists, and oncologists all potentially contribute to the free-text record.

And here’s where the challenge grows complex. Providers often use slightly different words when referring to the same thing. This greatly complicates the goal of finding insights, because not only must we make the unstructured data machine-readable, but we must also find ways to intelligently equate what one specialist says with what another writes so their notes can be correlated.

To be consistent, then, we must also be granular. Implementing a clinical language model in any type of healthcare process requires the extraction of extremely precise detail so that synonymous terms and concepts can be correlated successfully.
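To make that kind of granular correlation concrete, here is a minimal sketch of term normalization: mapping the different words clinicians use for the same finding onto one canonical concept so notes from different specialists line up. The synonym table is purely illustrative, not a real SNOMED CT extract or Clinithink's actual mapping.

```python
# Illustrative synonym table: clinician wording -> canonical concept name.
# A production system would map to ontology codes (e.g., SNOMED CT) instead.
NORMALIZATION_TABLE = {
    "shortness of breath": "breathlessness",
    "dyspnea": "breathlessness",
    "dyspnoea": "breathlessness",
    "swelling": "edema",
    "oedema": "edema",
    "difficulty swallowing": "dysphagia",
}

def normalize(term: str) -> str:
    """Map a clinician's wording to a canonical concept name."""
    key = term.strip().lower()
    return NORMALIZATION_TABLE.get(key, key)

# A pulmonologist's "Dyspnea" and a radiologist's "shortness of breath"
# now resolve to the same concept and can be compared apples to apples:
assert normalize("Dyspnea") == normalize("shortness of breath")
```

Trivial as the lookup looks, it is the precision of the underlying vocabulary, covering every synonym, abbreviation, and regional spelling, that determines whether two specialists' notes can actually be reconciled.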

Optimizing insights with CNLP

Physicians write long lists of details in their notes every day, and each entry contains subtlety that can dramatically change the larger meaning. So, what’s the best technology for deriving maximum impact from doctors’ notes?

When analyzing the actual text in vital, unstructured clinical documents—written by a human and intended to be read by other humans, not by a machine—current LLM techniques often struggle with granular characteristics such as negation, temporal context (that is, past versus present), social context, family history, certainty, severity, and even patient identity.

A more suitable approach is needed to handle these variations, along with the clinical and temporal nuances among them. Picture these three phrases scattered across the oncology team’s notes for a single patient:

  • “the patient denied breathlessness”
  • “there was no evidence of edema”
  • “possible history of dysphagia”

Superior clinical natural language processing (CNLP) can readily recognize and associate phrases like these when it operates using domain-specific knowledge tied to ontological schemas like SNOMED CT, significantly enhancing accuracy and relevance. Alternative approaches such as LLMs, on the other hand, might well assume a breathless patient with swallowing difficulties and swollen feet, none of which is the case, because they overlook the vital subtleties of negation, uncertainty, possibility, and temporal context. As a result, the AI fails to generate the most helpful output, and vast clinical and scientific opportunity goes untapped.
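As a rough illustration of how such context can be handled, here is a simplified, NegEx-style sketch that tags a finding mention as negated, uncertain, or affirmed based on the cue words preceding it. The cue lists are hypothetical and far shorter than any real CNLP engine's rules; this is a sketch of the technique, not Clinithink's implementation.

```python
# Illustrative cue lists; real systems use much richer, curated rule sets.
NEGATION_CUES = ("denied", "denies", "no evidence of", "no history of", "without")
UNCERTAINTY_CUES = ("possible", "probable", "suspected", "cannot rule out")

def classify_finding(sentence: str, finding: str) -> str:
    """Return 'negated', 'uncertain', 'affirmed', or 'absent' for a finding."""
    text = sentence.lower()
    if finding.lower() not in text:
        return "absent"
    # Only the context *before* the mention matters in this simple scheme.
    prefix = text.split(finding.lower())[0]
    if any(cue in prefix for cue in NEGATION_CUES):
        return "negated"
    if any(cue in prefix for cue in UNCERTAINTY_CUES):
        return "uncertain"
    return "affirmed"

notes = [
    ("the patient denied breathlessness", "breathlessness"),
    ("there was no evidence of edema", "edema"),
    ("possible history of dysphagia", "dysphagia"),
]
for sentence, finding in notes:
    print(finding, "->", classify_finding(sentence, finding))
# breathlessness -> negated
# edema -> negated
# dysphagia -> uncertain
```

The same prefix rule separates “History of diabetes” from “No history of diabetes,” where a single added word flips the meaning entirely.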

The key CNLP concepts enabling these recognition and association capabilities are called transformers and attention, which we’ll cover in a later blog. Applying these concepts to tame the complexities of clinical language with CNLP allows healthcare AI to separate signal from noise and yield breakthrough insights that empower the industry.

Think of it as carefully reading the notes versus simply scanning them for words. When a clinical study wants to identify patients with diabetes in their history, an optimal AI solution must be able to abstract patients who have a “History of diabetes” from the record and ignore those with “No history of diabetes” (note the identical wording with just “no” added), a task that sounds simple but is not.

Achieving granularity at scale

To help organizations get to peak innovation, the right AI approach must also be massively scalable. Human clinical reviewers are the gold standard for analyzing a handful of charts, but real change comes when technology can incisively review millions or tens of millions of records at once. Achieve that, and wonders start to happen.

In a recent study presented at the American Society of Clinical Oncology (ASCO), an AI-based approach was successfully used to predict early lung cancer. At this level, with this approach, the impossible becomes possible when the technology supporting you is trained to understand clinical nuances that it can then parse at unprecedented speed.

No approach is infallible, but to get to true innovation, our technology choices must be directionally correct. Healthcare AI leaders are continuously improving the machine-assisted support their platforms provide, enabling important errors to be caught and fixed. Fluency with a lexicon and its myriad contextual cues tells us right away that, for example, Dr. Lye has nothing to do with caustic soda.

The next blog in this series will discuss these false positives (descriptively called errors and hallucinations) and the relative abilities of CNLP versus LLMs to readily identify and discard them en route to producing insights that shape the future of healthcare.


Learn more about responsible AI in healthcare
