Expanding into healthcare big data? Here’s a big data design primer

Aaron Kimball co-founded WibiData in 2010. He has worked with Hadoop since 2007 and is a committer on the Apache Hadoop project.

Software applications have traditionally been seen as units of computation designed to solve a particular problem. Whether an application is a CRM tool that helps manage customer information or a complex supply-chain management system, the problem it solves is usually quite specific. Applications are also frequently designed with a relatively static set of input and output interfaces, and communication to and from the application uses specially designed (or chosen) protocols.

Applications are also designed around data. The data an application uses to solve a problem is stored on a data platform, and that underlying platform has historically been designed for optimal data storage and retrieval. Somewhere between storing and retrieving the data, the application applies computation to produce its results.

One unfortunate side effect of a design optimized for storage and retrieval is that it requires data to be structured in a predefined way, both on disk and during information design and retrieval. In the world of big data, applications must draw on rigidly structured elements, such as names, addresses, quantities, and birthdays, as well as on loose and unstructured data (http://en.wikipedia.org/wiki/Unstructured_data), such as images and free-form text.
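To make the contrast concrete, here is a minimal sketch of a single record that mixes both kinds of data. The `PatientRecord` class and its field names are hypothetical, chosen only to fit the healthcare setting; they are not taken from the article or from any particular big data platform. The typed fields can be validated and queried directly, while the loose fields are stored as-is and can only be searched by scanning.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Dict, List, Optional


@dataclass
class PatientRecord:
    # Hypothetical record type for illustration only.
    # Rigidly structured fields: fixed names and types, checkable at write time.
    name: str
    address: str
    birthday: date
    visit_count: int
    # Loose fields: no schema is imposed on their contents.
    notes: str = ""                      # free-form clinician text
    scan_image: Optional[bytes] = None   # raw image bytes, opaque to the store
    extra: Dict[str, Any] = field(default_factory=dict)  # ad hoc semi-structured data

    def __post_init__(self) -> None:
        # Only the structured part is validated; the loose part is kept as given.
        if not isinstance(self.birthday, date):
            raise TypeError("birthday must be a datetime.date")
        if self.visit_count < 0:
            raise ValueError("visit_count must be non-negative")


def born_before(records: List[PatientRecord], cutoff: date) -> List[PatientRecord]:
    # Structured query: a direct comparison on a typed field.
    return [r for r in records if r.birthday < cutoff]


def mentions(records: List[PatientRecord], term: str) -> List[PatientRecord]:
    # Unstructured query: there is no field to compare, so scan the free text.
    term = term.lower()
    return [r for r in records if term in r.notes.lower()]
```

The structured query can rely on the schema, whereas the text query has to inspect every record's notes; that asymmetry is one reason a fixed, predefined layout struggles with mixed data.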

Defining and building a big data application can be perplexing given the lack of rigidity in the underlying data. This lack of structure makes it harder to define precisely what a big data application will do, whether in its communication interfaces, in its computation on unstructured or semi-structured data, or in its communication with other applications.

While a traditional application may solve one specific problem, a big data application does not limit itself to a single targeted problem. Its objective is to provide a framework for solving many problems. A big data application manages the life cycle of data in a pragmatic and predictable way. Big data applications may include a batch or high-latency component, a low-latency (or real-time) component, or even an in-stream component. Big data applications do not replace traditional single-problem applications; they complement them.
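As a rough illustration of how batch and in-stream components can coexist over the same data, the sketch below computes per-patient event counts two ways: a batch pass that recomputes from the full history, and a streaming handler that updates its state as each event arrives. The event shape `(patient_id, event_type)` and the names `batch_counts` and `StreamingCounts` are assumptions made for this example; they do not describe WibiData's design or any specific framework's API.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical event shape: (patient_id, event_type).
Event = Tuple[str, str]


def batch_counts(history: Iterable[Event]) -> Counter:
    # Batch / high-latency path: recompute everything from the full history.
    counts: Counter = Counter()
    for patient_id, _event_type in history:
        counts[patient_id] += 1
    return counts


class StreamingCounts:
    # In-stream / low-latency path: keep running state, update per event.
    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def on_event(self, event: Event) -> int:
        patient_id, _event_type = event
        self.counts[patient_id] += 1
        # The updated value is available as soon as the event is processed.
        return self.counts[patient_id]


events = [("p1", "visit"), ("p2", "lab"), ("p1", "lab")]
stream = StreamingCounts()
for e in events:
    stream.on_event(e)
print(stream.counts == batch_counts(events))  # True
```

The design point is predictability: the streaming view answers quickly, the batch view can be rerun to correct or rebuild state, and because both apply the same logic to the same events, their results agree.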

See on medcitynews.com

See on Scoop.it – Pharmaceutical Industry digital vision
