CNLP Technology: Build or Buy?

Posted by Jack Kowitt on 03/04/2017


Long gone are the days when IT departments consisted of a few staff members whose role was a mystery to most and who spent much of their time answering the inevitable “why doesn’t it work?” questions from colleagues. These days, IT is integrated into almost every process in just about every kind of organization. IT departments are now sophisticated, made up of highly skilled professionals and, more often than not, very busy.

Healthcare is no different: IT is lauded as the gatekeeper to better-quality, more affordable healthcare that is accessible to more people. It comes as no surprise, then, that when vetting innovative new technologies, the inevitable question is whether to buy off-the-shelf or build in-house. Clinical Natural Language Processing (CNLP) is one such technology. Its applications in healthcare are endless and its impact is powerful, so why wouldn’t organizations want to build it themselves?

DIY or don’t?

The basic checklist for deciding whether to build CNLP technology includes a rare combination of knowledge and skills spanning:

  • the practice of clinical medicine;
  • artificial intelligence algorithms;
  • linguistics; and
  • software engineering.

Even if an organization has these skills in theory, or in isolation from one another, developing CNLP algorithms that read and interpret clinicians’ notes requires detecting and structuring complex medical phrases using both clinical and linguistic knowledge, as well as an understanding of the relevant terminology or ontology used to structure CNLP output into standardized formats. Building such a system takes years of applied expert knowledge (typically doctoral-level), the curation of large volumes of training data, and a great deal of trial and error. Turning it into a highly available, high-performance, accessible product multiplies the effort considerably. This discovery often leaves organizations looking for alternatives to building everything themselves, frequently in the form of Open Source.

Keeping your options Open

There are numerous open source NLP toolkits available that may be a viable alternative to developing a solution from scratch. The fact that they are ‘free’ adds considerably to the appeal, but what you save in license fees you’ll spend on developer time and resources, with no guarantee that you’ll get it right.

The downside to Open Source NLP is that it’s often not developed continually, it’s poorly documented, and there’s typically no roadmap and no support. In addition, NLP toolkits come with very few ready-made applications or solutions, and even fewer clinically oriented ones. Even if you do find one, it’s likely to be a limited, ‘special case’ tool rather than a multipurpose one, and will require:

  • training data;
  • considerable software engineering (don’t underestimate this one! A conservative estimate is 50%+ of developer time);
  • time investment; and
  • testing and development to address deficiencies in performance and interoperability.

Taking this route essentially means building a team of clinical, technical and business professionals to make sense of these newly discovered NLP capabilities. Together, you’ll need to work out querying, analytics across documents, and integration. And while open source might be attractive in its potential, it also opens up a series of risks (including to the security of patient data) and investments that can be avoided completely by choosing the right solution partner. Moreover, the overall time investment delays the benefits that CNLP can deliver to the institution.

Easy does it

Perhaps a deciding factor when weighing up whether to build or buy a CNLP technology is how soon you want and need to access your unstructured clinical data to help solve complex healthcare challenges. If it’s sooner rather than later, then choose a solution that meets the following requirements:

  • HIPAA compliance;
  • Encryption for communication and any stored PHI; and
  • De-identified data.

Clinithink’s CLiX ENRICH is a best-of-breed CNLP technology that eliminates the risks associated with building in-house or open source. Choosing CLiX ENRICH means:

  • Lower Total Cost of Ownership: Inevitably, the cost of building, in development time and in continual maintenance and improvement, is much higher than the cost of purchasing a complete solution. When calculating overall costs, it’s imperative to take a long-term view of affordability; in most instances, choosing a strategic CNLP partner is better value for money.
  • Immediate benefits realization: With CLiX ENRICH you’re able to tackle business and clinical problems right from the start rather than losing years to development time and stalling any improvements you want to make. On the other hand, open source projects are often believed to be ‘free’ but usually have complex dependencies with varying license terms. Even if there is an option that makes economic sense in the short term, the initial savings will be spent on developer time and resources.
  • Support: CLiX ENRICH users have access to professional support 24/7. Opting to go it alone is a heavy burden that can be avoided by purchasing an off-the-shelf solution.
  • Updates: Open source CNLP projects are largely academic, and tend to stagnate in the absence of ongoing, related research. A purchased solution provides a reliable schedule of content and feature updates that are in response to user needs and market demands.

Moreover, choosing CLiX ENRICH affords you benefits that make a marked difference when you are trying to use unstructured data to solve business and clinical challenges. With CLiX ENRICH:

  • You don’t need to clean your data before you process it;
  • You can customize queries specific to your environment and set of challenges;
  • Output can be mapped to a format of your choice;
  • CLiX ENRICH is easy to install and doesn’t compromise patient privacy or data security.

Want to know more about how to avoid the pitfalls of building or using open source CNLP tech? Contact us at

About the Author

Jack Kowitt is a recognized CIO whose career has been close to the leading edge of applying technology to improve healthcare operations, both clinical and financial, and services to patients, from the earliest EHR implementations through current data analytics and natural language tools. He has held leadership roles at SUNY, Mt. Sinai (NY), Samaritan (Banner) and Parkland (Dallas).