Jan 17, 2020 Clinithink

Identifying patients with Non‑Alcoholic Fatty Liver Disease

Product: CLiX unlock
Client: Icahn School of Medicine at Mount Sinai, New York
Market: NAFLD

A study at the Icahn School of Medicine at Mount Sinai in New York has assessed the accuracy of Clinithink’s clinical natural language processing (CNLP) solution, CLiX unlock, for the identification of patients with Non‑Alcoholic Fatty Liver Disease (NAFLD).

It demonstrated that patient details identified by CLiX unlock could be used to assess disease progression from early NAFLD to NASH (a distinction too fine to be captured by ICD-10CM coding), and to identify cases where NAFLD was referenced but not carried forward into later notes. Some of these patients later developed NASH or cirrhosis. Because NAFLD had already been documented in the electronic health record (EHR), these cases represent critical breakdowns in the continuity of care: later physicians could have taken preventative measures had they known about the problem.

Using CLiX unlock to identify patients before they develop a more serious clinical outcome could help prevent progression of NAFLD to NASH.


It is well known that electronic health records contain both structured data (such as diagnostic codes) and unstructured data (such as clinical documentation), and that the best clinical insights are derived from analysing both. The unstructured data contains critical information that is not found in structured data and which can greatly enhance the clinical insights derived from structured data alone. But it is also well known that manually reviewing unstructured data is painstakingly labour intensive and has a high margin of error. This study at the Icahn School of Medicine at Mount Sinai in New York was therefore designed to compare the patient selection process using CLiX against that using structured data alone or simple text search. It goes on to demonstrate additional analysis opportunities enabled using CLiX encodings.

The Study

Researchers first examined the use of Clinithink’s CLiX unlock CNLP solution for patient selection in one disease model, Non‑Alcoholic Fatty Liver Disease (NAFLD), now one of the most common causes of liver failure in the US. The researchers focused specifically on determining 1) the ability to identify affected patients, 2) the ability to assess patterns of disease progression from simple NAFLD to Non‑Alcoholic Steatohepatitis (NASH), and 3) the ability to identify gaps in care related to breakdowns in communication amongst healthcare providers.


Data for the study came from the BioMe Biobank, a prospective cohort of over 40,000 ethnically diverse patients recruited from primary care and specialty clinics within the Mount Sinai Health System. Clinithink’s CLiX unlock solution was used to analyse the unstructured narrative within this data. It broke clinical notes, such as radiology reports, down into their separate language constituents, mapped those constituents to SNOMED expressions, and applied an extensive set of synonyms, syntax and semantic rules to the resulting data layer so that it represented as many identifiable clinical facts as possible.
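The general idea of turning narrative text into coded clinical facts can be illustrated with a deliberately simple sketch. This is not how CLiX unlock works internally (its pipeline is not described here); the synonym table, the concept labels, the negation cues and the function name are all hypothetical, and the labels are placeholders rather than real SNOMED identifiers.

```python
import re

# Hypothetical synonym table: surface phrase -> illustrative concept label.
# Placeholder labels only; these are NOT real SNOMED identifiers, and treating
# every mention of steatosis as NAFLD is a clinical simplification.
SYNONYMS = {
    "non-alcoholic fatty liver disease": "CONCEPT_NAFLD",
    "nafld": "CONCEPT_NAFLD",
    "hepatic steatosis": "CONCEPT_NAFLD",
    "fatty liver": "CONCEPT_NAFLD",
    "non-alcoholic steatohepatitis": "CONCEPT_NASH",
    "nash": "CONCEPT_NASH",
}

# Crude negation cues (an assumption for this sketch). A sentence containing
# any of them is skipped entirely, which is far blunter than a real CNLP engine.
NEGATION_CUES = ("no evidence of", "without", "negative for")


def extract_concepts(note: str) -> set:
    """Return the set of concept labels asserted in a free-text note."""
    concepts = set()
    # Split on whitespace that follows sentence-ending punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", note.lower()):
        if any(cue in sentence for cue in NEGATION_CUES):
            continue
        for phrase, concept in SYNONYMS.items():
            # Word boundaries stop "nash" matching inside a longer word.
            if re.search(r"\b" + re.escape(phrase) + r"\b", sentence):
                concepts.add(concept)
    return concepts


report = "Ultrasound shows hepatic steatosis. No evidence of NASH."
print(extract_concepts(report))  # {'CONCEPT_NAFLD'}
```

Even this toy version shows why synonym handling and context matter: a plain text search for "NAFLD" would miss the first sentence, and a plain text search for "NASH" would wrongly flag the second.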


Patient selection validation identified 2,281 patients with NAFLD using CLiX unlock, dramatically outperforming ICD search in sensitivity and free-text search in both sensitivity and specificity.

Further analysis demonstrated that of the 2,281 patients identified with NAFLD, 486 later progressed to NASH.

Among patients with NAFLD identified before NASH, the average progression time was 410 days.

CLiX unlock identified 619 patients where NAFLD was identified in a radiology note, but not in any following clinical notes.

Of these, 170 later progressed to NASH or cirrhosis, indicating missed opportunities where a physician could have tried to avert progression if the information had not been lost.

Table 1: Accuracy of NLP identification of NAFLD patients relative to ICD & text search


The study demonstrates that, by using Clinithink’s CLiX unlock, it is possible to identify many more NAFLD patients than by using structured data, and that it is possible to do so with much greater accuracy than by using text search. It also demonstrates how CLiX encodings can be used for custom analyses that would otherwise have been very limited. The final analysis identifies breakdowns in the continuity of care (which might be classified as preventable medical errors), where critical information identified by one doctor is lost from the knowledge chain between doctors, resulting in a suboptimal outcome for the patient. This analysis would not be feasible without the sensitivity and specificity of CLiX unlock’s findings.


Past studies have demonstrated that NLP can be used to obtain valuable data for research that can be more accurate than ICD codes. This new study supports these findings, identifying NLP (using Clinithink’s CLiX technology) as clearly superior for individual phenotype algorithms. As data volume and accuracy are critical for big data initiatives, it stands to reason that NLP-derived features will yield superior models for these endeavors.

NAFLD is poorly captured by structured data. When searching for NAFLD patients, a CNLP-based approach identified 2.5 times as many patients as ICD search, and was significantly more accurate than a rigorous text search approach.
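For readers less familiar with the two accuracy measures used in this comparison, the sketch below computes sensitivity and specificity from sets of patient identifiers. The function name and the example cohort are invented for illustration and are not figures from the study.

```python
def sensitivity_specificity(flagged: set, truly_positive: set, all_patients: set):
    """Compare a selection method's output with a reference standard.

    flagged        -- patients the method selected
    truly_positive -- patients confirmed to have the condition (e.g. by chart review)
    all_patients   -- everyone who was assessed
    """
    tp = len(flagged & truly_positive)                  # correctly selected
    fn = len(truly_positive - flagged)                  # missed cases
    fp = len(flagged - truly_positive)                  # wrongly selected
    tn = len(all_patients - flagged - truly_positive)   # correctly left out
    sensitivity = tp / (tp + fn)   # share of true cases that were found
    specificity = tn / (tn + fp)   # share of non-cases that were left out
    return sensitivity, specificity


# Illustrative cohort of ten patients (made-up data).
everyone = set(range(10))
confirmed = {0, 1, 2, 3}
selected = {0, 1, 2, 4}
sens, spec = sensitivity_specificity(selected, confirmed, everyone)
print(round(sens, 3), round(spec, 3))  # 0.75 0.833
```

A method that misses many true cases scores low on sensitivity, while a method that picks up many false matches scores low on specificity.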

Furthermore, CLiX unlock is able to recognise NAFLD patients not only when the authoring clinician specifically mentions the condition, but also when symptoms that indicate NAFLD are mentioned.

As ICD-9CM cannot differentiate between early stage NAFLD and full NASH at all, and as ICD-10CM has a concept for NASH, but not one exclusively for NAFLD, CLiX unlock was able to facilitate an analysis that would have been almost impossible with structured data. Of 147 patients identified as having NAFLD prior to the discovery of NASH, the average progression time was 410 days.
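Once each note carries a dated NAFLD or NASH finding, an average progression time is a short calculation. The sketch below is one plausible way to compute it, assuming the first dated mention of each condition per patient is already available. The function name, data layout and example dates are hypothetical and do not reproduce the study's method or data.

```python
from datetime import date
from statistics import mean


def mean_progression_days(first_nafld: dict, first_nash: dict):
    """Average days from first NAFLD mention to first NASH mention.

    Both arguments map patient_id -> date of first mention. Only patients
    whose NAFLD mention strictly precedes their NASH mention are counted.
    Returns None when no patient qualifies.
    """
    intervals = [
        (first_nash[pid] - first_nafld[pid]).days
        for pid in first_nafld.keys() & first_nash.keys()
        if first_nafld[pid] < first_nash[pid]
    ]
    return mean(intervals) if intervals else None


# Made-up example: p3 is excluded because NASH was recorded first.
nafld = {"p1": date(2015, 1, 1), "p2": date(2016, 6, 1), "p3": date(2017, 3, 1)}
nash = {"p1": date(2016, 1, 1), "p2": date(2016, 7, 1), "p3": date(2017, 1, 1)}
print(mean_progression_days(nafld, nash))  # 197.5
```

Restricting the calculation to patients with NAFLD recorded before NASH mirrors the wording above, where the average applies only to patients identified as having NAFLD prior to the discovery of NASH.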

Suspected NAFLD on imaging is often not acknowledged in subsequent clinical documentation, and many patients are later found to have more advanced liver disease. Analysis identified 619 patients where NAFLD had been identified in radiology notes but was never carried forward into the chain of progress notes. Of these patients, 170 later developed NASH or cirrhosis. Had later physicians been aware of the radiologist's earlier finding, the progression might have been mitigated, if not prevented.
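The gap-in-care analysis can be pictured as a simple query over coded notes: find patients with NAFLD in a radiology note and no NAFLD mention in any later non-radiology note. The sketch below is an assumption-laden illustration, not the study's algorithm. The `Note` structure, the note type labels and the `CONCEPT_NAFLD` label are all hypothetical.

```python
from datetime import date
from typing import NamedTuple


class Note(NamedTuple):
    patient_id: str
    written: date
    note_type: str        # hypothetical labels such as "radiology" or "progress"
    concepts: frozenset   # concept labels extracted from the note text


def find_care_gaps(notes, concept="CONCEPT_NAFLD"):
    """Patients whose radiology finding never reappears in later clinical notes."""
    # Earliest radiology note per patient that mentions the concept.
    first_radiology = {}
    for n in notes:
        if n.note_type == "radiology" and concept in n.concepts:
            prev = first_radiology.get(n.patient_id)
            if prev is None or n.written < prev:
                first_radiology[n.patient_id] = n.written

    # Patients with a later non-radiology note that repeats the concept.
    followed_up = {
        n.patient_id
        for n in notes
        if n.note_type != "radiology"
        and concept in n.concepts
        and n.patient_id in first_radiology
        and n.written > first_radiology[n.patient_id]
    }
    return set(first_radiology) - followed_up


# Made-up notes: p1's finding is carried forward, p2's is not.
nafld = frozenset({"CONCEPT_NAFLD"})
notes = [
    Note("p1", date(2015, 1, 1), "radiology", nafld),
    Note("p1", date(2015, 2, 1), "progress", nafld),
    Note("p2", date(2015, 1, 1), "radiology", nafld),
    Note("p2", date(2015, 3, 1), "progress", frozenset()),
]
print(find_care_gaps(notes))  # {'p2'}
```

A query like this depends on the finding being coded consistently in both note types, which is why the analysis relies on accurate extraction from the narrative rather than on diagnostic codes alone.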

The study, "Augmented intelligence with natural language processing applied to electronic health records for identifying patients with non-alcoholic fatty liver disease at risk for disease progression", was published in the International Journal of Medical Informatics 129 (2019) 334-341.

Authors: Tielman T. Van Vleck, Lili Chan, Steven G. Coca, Catherine K. Craven, Ron Do, Stephen B. Ellis, Joseph L. Kannry, Ruth J.F. Loos, Peter A. Bonis, Judy Cho, Girish N. Nadkarni

Contact Clinithink to find out more about CLiX and CLiX unlock:
T +44 (0) 292 125 0190 E info@clinithink.com www.clinithink.com


Published by Clinithink January 17, 2020