Is Big Data a Hype?

A NEW KID ON THE BLOCK

Technologies come.. technologies go.. They create ripples in the lives of software developers like me, because it’s professional hara-kiri to let a technology go by without having read about it and tried at least one prototype to see how useful it can be.

EVERYONE’S DOING BIG DATA .. WE SHOULD TOO!!

Big Data has been a buzzword for quite some time now. However, not many of us know that the ideas behind it have been around and talked about since time immemorial! The importance and challenges of managing data have always been at the forefront of the minds of business and technology executives alike. And so, there was this “one fine day” when we, the Tech Team at Financial Hospital (www.financialhospital.in), decided to sit together and evaluate where Big Data would fit into our scheme of things. Yes… you read that right.. we were trying to “fit” Big Data into our scheme of things. That was the very first mistake. The reasoning behind it was: ok, everyone’s talking about Big Data. We are not in that rat race. We should be! And so the journey towards the “Big Data Implementation in Financial Hospital” milestone began…

RESEARCH IS WHAT I’M DOING WHEN I DON’T KNOW WHAT I’M DOING

What we were talking about was totally new territory, so I started the journey by wiping my new glasses squeaky clean and reading “Designing Data-Intensive Applications” by Martin Kleppmann. I must say, it was a really good read. It first brushed up my RDBMS fundamentals, which had gathered rust in my brain over the ages.

Along with that, there was a lot of other stuff that I read, and most of it started off by asking: is your data really “Big” enough to be classified as Big Data? The sources I read quoted roughly 3 TB to 5 TB of data as the point where a Technical Architect should even start thinking about Big Data. I was like “Duh.. “. The last backup that we took was not more than 2-3 GB, about a thousand times smaller! So what was I doing here researching Big Data then?

All the Grey Cells in my brain were fighting against each other, some saying “we have to move on to Big Data” and the other group saying “just because there is a new kid on the block you don’t have to play with it!!”. I went outside my room looking a little pale, and my colleagues got worried and decided to help me with this. A lot of brainstorming sessions followed: Ashish, the Android Developer, Aakash, the Technical Architect, and me, all bundled up in the conference room and literally scratching each other’s eyes out, each one insisting that he was right. The core point of discussion now moved from “How to do Big Data” to “Is it time to move our data to Big Data, or should we wait till it is really Big?”. All the research clearly said that moving to Big Data databases while our data was not yet “Big” could in fact backfire badly, since Big Data style queries tend to perform poorly on “Small” data.

We reached a point where we realised that brainstorming alone would not tell us whether the Financial Hospital database was really ready to make its move to Big Data. This time, because our brains were not enough, we even included our Digital Marketing Head, Shivam, who actually had nothing to do with databases in his day-to-day work!! Aakash and Ashish then came up with the thought that we should consult someone who “has been there.. done that”. We left the conference room promising each other to dig into our contact lists, find out who had done this, and get some real Gyaan about how it is done.

We spoke to some really good, experienced people in the domain, and even invited a few to our brainstorming cell, the conference room. And then came along a term which fascinated me for the next few days: Polyglot Persistence, the approach of using RDBMS and Big Data technologies side by side. So, what the Gurus said was, move over only what needs to be moved to Big Data and keep the rest in the RDBMS. Wow! That sounded super-cool. Financial Hospital database implemented using Polyglot Persistence…Man!! That would place us in a different league altogether!!
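
For the curious, here is a minimal sketch of what “side by side” could look like. Everything in it is an assumption made for illustration: SQLite stands in for the RDBMS, a plain Python dict stands in for a document store, and the names (`clients`, `save_activity`, `client_profile`) are hypothetical, not anything we actually built.

```python
import json
import sqlite3

# Relational side: structured records that need constraints stay in the RDBMS.
# (SQLite in memory is only a stand-in for a real database server.)
rdbms = sqlite3.connect(":memory:")
rdbms.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Document side: a dict stands in for a schemaless NoSQL store.
document_store = {}


def save_client(client_id, name):
    """Write the structured part of a client to the relational store."""
    with rdbms:  # commits on success, rolls back on error
        rdbms.execute(
            "INSERT INTO clients (id, name) VALUES (?, ?)", (client_id, name)
        )


def save_activity(client_id, event):
    """Append a free-form event (any shape) to the document store."""
    document_store.setdefault(client_id, []).append(json.dumps(event))


def client_profile(client_id):
    """Combine both stores into one view for the application."""
    row = rdbms.execute(
        "SELECT name FROM clients WHERE id = ?", (client_id,)
    ).fetchone()
    events = [json.loads(e) for e in document_store.get(client_id, [])]
    return {"name": row[0] if row else None, "events": events}


save_client(1, "Sample Client")
save_activity(1, {"type": "login", "device": "android"})
save_activity(1, {"type": "viewed_report", "report": "portfolio"})
profile = client_profile(1)
```

The point of the sketch is only the split: the application decides per kind of data which store it belongs in, and stitches the two together when it reads.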

KNOWLEDGE IS HAVING THE RIGHT ANSWER..INTELLIGENCE IS ASKING THE RIGHT QUESTIONS

After a lot of brainstorming and Guru Gyaan, one fine morning I woke up with a simple question in my head: why did I start thinking about Big Data in the first place? Was it just because it was the new kid on the block, or because the current database of Financial Hospital was giving me some trouble? After two decades of experience, one thing I know for sure is that the need for a technology switch mostly arises because what we have on hand has some pain points which need to be addressed. Yes, we were facing performance issues: trying to access a client’s Mutual Funds portfolio just kept giving us a loader icon which rolled away happily till I finished my cup of coffee!! Now that was something to really worry about! There were a few other such scenarios too, like a CRM report which was supposed to tell me the revenue earned by a relationship manager but kept me waiting for an eternity before giving me the final figure.

As suggested by the Gurus, I decided to dig deeper and answer a few questions before we jumped into a technology switch and, more importantly, to decide first whether we even needed one:

  • Why Big Data?
    • Reduces the effect of noise in the data
    • Speeds up processing time
    • Reduces storage requirements
    • Automates the pursuit of needles in haystacks
    • Discovers unexpected connections across data sources
    • Automatically generates questions
  • How does Big Data fit into our culture?
  • How much time do we have to implement the big data project?
  • How many resources do we have to implement the big data project?
  • What kind of data do we capture / generate as of now?
  • What kind of data do we want to capture / generate in the next 3 years?
  • Do we have complex queries?
  • Do we have frequent reads/writes? E.g. a gaming application has a lot of reads and writes
  • Do we have complex transactions?
  • Do we have dynamic calculations?
  • Is our data structured or unstructured or both? Can we identify where it is structured and where unstructured storage would have been a boon?
  • Does our data need security? Which part?
  • Is data volume really big?
  • Are queries slow?
  • Are we expecting schema changes frequently?
  • Are we looking at defining processes for archiving data? Which data fits the bill for archive?
  • Are there any tables with transient data? [They can be stored in NoSQL]
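  • Consistency-Availability-Partition Tolerance [CAP] [A distributed store cannot guarantee all three at once. Consistency here means every read sees the latest write, which is not the same as atomicity. Which trade-off can we live with?]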

I wrote down the answers to each of these questions (they can’t be disclosed because of Intellectual Property constraints). And finally I reached the conclusion…

SO, IS BIG DATA JUST A HYPE?

Albert Einstein once said – “I think and think for months and years. Ninety-nine times, the conclusion is false. The hundredth time I am right!”

The very first thing we did was dive right into the sea of code (I can hear Ashish and Aakash laughing!!) and check what it was in the queries that kept the loader icon merrily going round and round for ages. And boy!! What we found blew our minds. A few tweaks to the indices here and there, and the queries stopped misbehaving. For the reports, we came upon the interesting solution of OLAP (Online Analytical Processing) cubes. Where a report involves a lot of analytics and the results are mostly aggregate values (sum of leads, etc.), create a materialised view (in effect, a small OLAP cube) of the frequently accessed analytics. When a report is requested, read the results from the cube rather than executing the full query every time, which saves time.
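
Here is a rough sketch of both fixes, under loud assumptions. It uses SQLite (bundled with Python) so that it runs anywhere, not our actual MySQL setup, and the tables, columns and data (`holdings`, `leads`, `revenue_by_manager`) are made up for illustration. MySQL has no built-in materialised views, so a plain summary table that gets rebuilt periodically plays the role of the “cube” here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE holdings (client_id INTEGER, fund TEXT, units REAL);
    CREATE TABLE leads (manager TEXT, revenue REAL);
""")
# Hypothetical sample data: 20,000 holdings spread over 1,000 clients.
conn.executemany(
    "INSERT INTO holdings VALUES (?, ?, ?)",
    [(i % 1000, f"fund{i % 7}", 10.0) for i in range(20000)],
)
conn.executemany(
    "INSERT INTO leads VALUES (?, ?)",
    [("asha", 100.0), ("asha", 250.0), ("ravi", 75.0)],
)
conn.commit()


def plan(sql):
    """Return SQLite's query plan text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)  # column 3 holds the plan detail


# Fix 1: index the column the slow portfolio query filters on.
portfolio_sql = "SELECT fund, units FROM holdings WHERE client_id = 42"
plan_before = plan(portfolio_sql)  # full table scan
conn.execute("CREATE INDEX idx_holdings_client ON holdings (client_id)")
plan_after = plan(portfolio_sql)   # lookup through the new index


# Fix 2: pre-aggregate report figures into a summary table.
def refresh_summary():
    """Rebuild the summary from scratch (e.g. from a scheduled job)."""
    conn.executescript("""
        DROP TABLE IF EXISTS revenue_by_manager;
        CREATE TABLE revenue_by_manager AS
            SELECT manager, SUM(revenue) AS total_revenue
            FROM leads
            GROUP BY manager;
    """)


def revenue_for(manager):
    """Read the precomputed total instead of aggregating on every request."""
    row = conn.execute(
        "SELECT total_revenue FROM revenue_by_manager WHERE manager = ?",
        (manager,),
    ).fetchone()
    return row[0] if row else 0.0


refresh_summary()
```

Comparing `plan_before` with `plan_after` shows the query switching from a scan to an index lookup. The catch with the summary table is staleness: `revenue_for` only reflects what was in `leads` at the last `refresh_summary()` call, so how often to refresh is a trade-off each report has to settle for itself.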

After all of this, we came back to the conference room and decided that this really wasn’t the right time to make the move to Big Data. Our data was of a manageable size, and MySQL as an RDBMS was doing a good job of taking care of it and giving us results the way we wanted. The time will come when no amount of tweaking can give us the results we want. That will be the time to gather again and talk about which Big Data technology to use, and the set of questions we ask ourselves then will be totally different.

Big Data technologies have been around since time immemorial (in different forms) and are here to stay. Such a valuable technology cannot be just hype .. it’s just that the time has to be right to take it into your fold! When the time is right for Financial Hospital, I’ll be back with a new story about how much we scratched each other’s eyes out to decide whether MongoDB or Cassandra or CouchDB was the right choice .. Adios till then!
