The Challenge of Big Data

on Thursday, 25 April 2013

One of the big concerns of the moment is so-called Big Data. It is a great buzzword, but what it really means is lots and lots of data, and in the modern world we are producing data at an unprecedented rate. According to IBM, around 2.5 quintillion bytes of data are generated every day, and around 90 percent of all the data at our disposal was generated over the last two years.

Some people consider that this exponential growth in data will in itself bring unprecedented benefits, and that it will replace the need for theories and traditional scientific methods in finding answers to all of our questions. Certainly we can use all that data to make predictions about the future, and such predictions can succeed. But they can also fail.

One of the principal problems with Big Data is Big Noise. Although information is increasing exponentially, useful information is not. Most of the information is noise, and the noise is growing faster than the signal. In fact, the amount of objective truth remains more or less constant, but data overload means that there are simply too many data sets to mine and too many theories to test properly. So how do we handle all the data and use it to our benefit?

Often it is impossible to store all the data that is generated. The LHC at CERN produces far too much data to keep; instead it decides which data are important, stores those, and discards the rest. Perhaps CERN has reliable systems that allow it to do that, but for most organisations sorting the data ‘wheat’ from the data ‘chaff’ is much more difficult. The problem is further compounded by the need for regulatory data compliance: throwing away the wrong data at the wrong time can be illegal.
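As a minimal sketch of what such triage might look like, the snippet below filters a stream of records, keeping anything that passes an "importance" test while never discarding records flagged for regulatory retention. The threshold, record format and field names here are illustrative assumptions, not anyone's actual system:

```python
# Hypothetical triage of a data stream: keep "important" records,
# and never discard anything subject to regulatory retention.

def is_important(record):
    # Assumed rule: readings above a noise threshold of 0.8 are signal.
    return record["value"] > 0.8

def triage(stream):
    kept = []
    for record in stream:
        # Compliance-flagged records must be stored regardless of importance.
        if record.get("retain_for_compliance") or is_important(record):
            kept.append(record)
    return kept

readings = [
    {"value": 0.95},                                  # important: kept
    {"value": 0.10},                                  # noise: discarded
    {"value": 0.30, "retain_for_compliance": True},   # noise, but must be kept
]
print(triage(readings))
```

The hard part in practice is, of course, writing an `is_important` test you can trust; get it wrong and you have deleted your signal along with your noise.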

The first step in handling Big Data is creating a useful, searchable data archive. However, according to Mimecast, the most important property for most users is data velocity. Data velocity is not just the speed at which data arrives; it is how fast that data must be processed. Many organisations are finding that handling Big Data on site is too complex, too unreliable and far too expensive. Instead they are turning to cost-effective third-party cloud-powered file archive systems that provide secure long-term storage along with full indexing.

Of course it is a chicken-and-egg situation: Big Data demands the cloud just as the cloud demands Big Data.

2 comments on 'The Challenge of Big Data'

  1. Yes, I do agree with you, Aktar, that keeping big data is often a big problem. You have shared a helpful post, thanks.

  2. Thanks for sharing such a helpful article, Aktar. You are absolutely correct that it is like a chicken-and-egg situation; it is true that Big Data requires the cloud just as the cloud demands Big Data.