The barriers to tapping into Big Content and how to overcome them

By Jeremy Bentley, Chief Executive of Smartlogic

2 December 2014

The healthcare sector, like every other industry, is creating not just more data, but more information than ever. Vast tracts of information on diagnoses, treatment options, new medications, trials and research, as well as general health factors, are constantly being produced by an ever-widening number of organisations.

More than 90% of the world’s data has been created since 2010, according to Science Daily, and healthcare is a huge contributor — neuroscience alone has generated nearly two million papers that are now online, a 2013 University of California Los Angeles (UCLA) study shows.

All this medical data can be hugely beneficial to healthcare providers and their patients, but pinpointing the exact information required is no simple task, demanding precision and fast data retrieval from the huge wealth that is available.

The key challenge, however, is not in the volume of information available, but in its fragmentation, and this presents a significant barrier to unlocking the value of organisations’ ‘Big Content’ — the unstructured information component of Big Data.

A survey by MindMetre Research of close to 400 senior information management professionals in the US and Europe has identified this fragmentation as the major barrier to accessing and using this valuable content, much of which is unstructured and stored in multiple locations and in different formats. Indeed, 79% of participants from the healthcare and medical industry pinpoint scattered and dispersed information as the biggest hurdle. The second most cited barrier, named by 46% of respondents, is missing or ineffective tagging, which hinders the ability to find information at speed.

There is a growing drive amongst healthcare professionals to overcome these obstacles, however, as the benefits in terms of gaining a competitive edge and providing better patient care become clearer. In fact, while the survey shows that 83% of respondents in the healthcare and medical industry believe large organisations are creating more unstructured information than ever before, it also reveals that they are beginning to understand its power and potential, with nearly 90% seeing a major advantage in being able to more quickly and easily reach this content.

So just how does one unlock this unstructured data and reap the benefits both for organisations and patients? The answer is to use content intelligence, which is all about making the mass of unstructured information within an organisation findable and actionable.

The problem is that most existing information management applications, including Microsoft SharePoint, Apache Lucene and Solr, Oracle and Google Search Appliance, don’t have the level of content intelligence built in that organisations need to unearth very specific and technical unstructured information. As a result, many healthcare and medical organisations lack three things: the ability to apply consistent metadata across disparate information sources; search functions that enable users to pinpoint very specific data; and mechanisms for managing specialised and unique vocabulary.

To address these requirements, healthcare and medical organisations need to invest in systems that enhance existing information management platforms. Such systems add the capacity to categorise or meta-tag unstructured data automatically and accurately, and the capability to create and manage vocabulary more precisely. They also leave the data in its original locations, which avoids the expense of formatting huge volumes of different types of information and absorbing them into a single hub.

Managing terminology issues is particularly crucial in a sector renowned for its complicated language, and this requires frontline medical professionals to work together to develop a common content intelligence approach. The United Kingdom’s National Health Service is already leading the way in this area, implementing content intelligence capability in its NHS Choices website — seeking in particular to help solve the major issue of ‘patient-speak’.

The NHS patient information portal reads the context of a search term, even a layman’s expression, and brings up results that address the searcher’s intended meaning, while screening out irrelevant documents — thus a patient searching ‘superbug’ will find results on MRSA infections rather than the latest global computer virus or giant insects.
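To make the idea concrete, the vocabulary-mapping step can be sketched in a few lines of Python. This is a minimal illustration only, not a description of how the NHS portal or any particular product works: the vocabulary, the sample documents and the function names are all hypothetical. It also does not attempt the harder task of reading context to tell one sense of a word from another; it simply shows how a lay expression and a clinical term can be resolved to the same managed concept.

```python
import re

# Hypothetical controlled vocabulary: preferred term -> lay or alternate expressions.
VOCABULARY = {
    "MRSA infection": ["superbug", "mrsa", "methicillin-resistant staphylococcus aureus"],
    "myocardial infarction": ["heart attack"],
}

# Hypothetical documents, left where they are and tagged by title.
DOCUMENTS = {
    "Hospital hygiene guidance": "Hand washing reduces the spread of MRSA on wards.",
    "IT security bulletin": "A new computer virus is spreading through email attachments.",
    "Cardiac care leaflet": "What to do after a heart attack.",
}


def build_index(vocabulary):
    """Map every variant (and the preferred term itself) to its preferred term."""
    index = {}
    for preferred, variants in vocabulary.items():
        for term in [preferred, *variants]:
            index[term.lower()] = preferred
    return index


def tag_text(text, index):
    """Return the sorted preferred terms whose variants appear as whole words or phrases."""
    lowered = text.lower()
    found = set()
    for term, preferred in index.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(preferred)
    return sorted(found)


def search(query, documents, index):
    """Return titles of documents that share at least one concept with the query."""
    wanted = set(tag_text(query, index))
    return [
        title
        for title, body in documents.items()
        if wanted & set(tag_text(body, index))
    ]


INDEX = build_index(VOCABULARY)

# A lay query and a clinical document meet at the same preferred term.
print(search("superbug", DOCUMENTS, INDEX))
```

Here the query ‘superbug’ and the hygiene document, which mentions only ‘MRSA’, are both tagged with the same preferred term, so the document is returned even though the two share no wording.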

This easy-to-search approach is a best practice model that we should be working towards and encouraging partner institutions to adopt. To access and share the wealth of healthcare information being produced, frontline medical professionals have to work together to develop a common vocabulary and more immediate means of finding it, whether for use in treating patients or in communicating with them.

The implementation of content intelligence solutions radically improves the healthcare industry’s ability to filter unstructured content, resulting in quick access to relevant and accurate information, and so enabling quicker medical advances, the provision of better care and an improved patient experience.