Big Data is one of the most hyped buzzwords in the tech industry, which means it is surrounded by misunderstanding. Worse, concepts like “data science” are constantly thrown around, making this emerging technology look like black magic to the business leaders who could benefit from it. This blog aims to cut through the hype to help you understand what Big Data is and why it is relevant to your business.
What is Big Data?
If you look up Big Data, you’ll probably find a definition along the following lines:
“Big Data is the term for a collection of data of any type, where the collection is so large and complex that it cannot be managed as a single instance; and with potential value that exceeds the analytic capabilities of traditional stand-alone solutions.”
That’s not wrong, but it doesn’t really tell you what Big Data is, or why it’s so different. In particular, these definitions tend to focus on the amount of data, but that’s only one facet of the problem. Big Data has four key characteristics, known as the four V’s: Volume, Velocity, Variety and Value.
Volume refers to the fact that we generate a huge amount of data, and the amount keeps growing. For example, smartphones contain an array of sensors, such as a GPS receiver, which produce data that can be queried for use in analytics. As the number, complexity and exploitation of smartphones increase (more of them, producing more data, with users who know how to use them), the volume of data produced will increase as well.
Velocity means that the data changes rapidly. Traditional BI data about customer orders can be handled in batches; smartphone location data, by contrast, could be outdated within minutes or even seconds if, for example, the objective is to send a specific offer to a customer in a high street.
Variety refers to the many types of data and sources, from databases to audio and video objects (to which we can attach context and which become part of analytics) and increasing amounts of unstructured mobile and social data.
Value is exactly what it says: the better we get at analysing Big Data, the more value we can extract from it.
In other words, Big Data does not simply mean that we have a bigger version of a traditional table of rows and columns, like a relational database or an Excel spreadsheet. In a Big Data world we need to think about objects (literally “things”). For example, we can bring tweets and Facebook posts into a CRM system, because what matters are the characteristics: who wrote it, who read it, what it’s about and so on. So your Big Data view of a customer might not have the largest volume, but it is changing in real time according to their online activities.
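To make the idea of “objects” a little more concrete, here is a minimal sketch of a customer record that grows as social posts arrive. The class names (SocialPost, CustomerView), their fields and the sample data are all assumptions made for illustration; a real CRM platform will have its own model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SocialPost:
    """A hypothetical 'thing', described by its characteristics."""
    author: str
    source: str        # e.g. "twitter" or "facebook"
    topic: str
    posted_at: datetime


@dataclass
class CustomerView:
    """A hypothetical CRM record that changes as new objects arrive."""
    customer_id: str
    posts: list = field(default_factory=list)
    last_updated: Optional[datetime] = None

    def attach(self, post: SocialPost) -> None:
        """Add a post to the view and remember the newest timestamp seen."""
        self.posts.append(post)
        self.last_updated = max(self.last_updated or post.posted_at,
                                post.posted_at)

    def topics(self) -> set:
        """Everything this customer has posted about so far."""
        return {p.topic for p in self.posts}


earlier = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
later = datetime(2024, 1, 1, 9, 5, tzinfo=timezone.utc)

view = CustomerView("c42")
view.attach(SocialPost("c42", "twitter", "belts", earlier))
view.attach(SocialPost("c42", "facebook", "socks", later))
print(view.topics(), view.last_updated)
```

Note that the record is tiny, yet it changes every time the customer posts, which is the point being made above.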
Bringing more information together
The sheer variety of types of data causes a lot of confusion, simply because we aren’t used to thinking about merging different types of data into one record… which itself merges into every other record and object that it touches. The types of data involved can typically be broken down into the following:
Structured data: tables, relational data with semantics. Your typical database
Semi-structured data: such as documents with tables. It’s structured, but it won’t fit neatly in a database
Unstructured data: video, audio, images, raw text
Metadata: structured data about the data: who created it, tags, name, date
Streaming data: constantly changing data moving across networks without being stored. A live feed of your car’s current speed, for example
Temporal data: typically trends over time
Geospatial data: information on position in space, such as GPS location data or beacon navigation data
You can see from this list that there is a fair amount of overlap. For example, the geotag on a photo (which tells you where it was created) is metadata, but it could also be considered geospatial data, especially if it includes GPS coordinates; and streaming data can be observed over time to create temporal data. That’s why it’s called Big Data: all of this data is related to the object, but it cannot be easily pinned down, and it needs to be analysed differently to a traditional database.
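The overlap between these categories can be sketched in a few lines. Everything here is illustrative: the photo record, the data_types function and the speed readings are assumptions, not part of any particular product.

```python
from statistics import mean

# A hypothetical photo record: the geotag lives in the metadata,
# yet it is also geospatial data.
photo = {
    "content": b"...raw image bytes...",      # unstructured
    "metadata": {
        "created_by": "alice",
        "created_on": "2024-01-01",
        "gps": (51.5155, -0.1420),            # metadata AND geospatial
    },
}


def data_types(record: dict) -> set:
    """Label a record with every category from the list that applies."""
    labels = set()
    if "content" in record:
        labels.add("unstructured")
    if "metadata" in record:
        labels.add("metadata")
        if "gps" in record["metadata"]:
            labels.add("geospatial")
    return labels


# Streaming data: speed readings that arrive one at a time...
stream = iter([48.0, 50.0, 55.0, 61.0])

# ...become temporal data once we keep them and look across time.
history = []
for speed in stream:
    history.append(speed)

print(sorted(data_types(photo)))
print(mean(history))
```

One record ends up with three labels at once, which is why these types cannot be analysed in separate silos.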
How do we analyse Big Data?
Traditionally, we loaded data into cubes and used business intelligence (BI) tools to slice this data and gain insights. That worked really well, but it required the data to fit into databases and it only really told you what had already happened; you then needed to use a bit of ingenuity to guess what that meant for the future.
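For readers who have never seen a cube being “sliced”, the following is a rough sketch of the idea using plain Python rather than a real BI tool. The sales rows, the dimension names and both functions are made up for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, month, revenue).
sales = [
    ("north", "belt", "jan", 120.0),
    ("north", "socks", "jan", 40.0),
    ("south", "belt", "jan", 90.0),
    ("north", "belt", "feb", 150.0),
]
DIMENSIONS = ("region", "product", "month")


def slice_total(rows, **fixed):
    """Sum revenue over the rows that match the fixed dimension values."""
    total = 0.0
    for row in rows:
        dims = dict(zip(DIMENSIONS, row[:3]))
        if all(dims[name] == value for name, value in fixed.items()):
            total += row[3]
    return total


def roll_up(rows, dimension):
    """Total revenue grouped by a single dimension."""
    position = DIMENSIONS.index(dimension)
    totals = defaultdict(float)
    for row in rows:
        totals[row[position]] += row[3]
    return dict(totals)


print(slice_total(sales, region="north", product="belt"))
print(roll_up(sales, "month"))
```

Both functions only ever look backwards at rows that have already been loaded, which is exactly the limitation described above.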
One of the promises of Big Data is real-time and even predictive analytics, meaning that Big Data can tell you what is happening right now (so you can react faster) and even what is likely to happen in the near future. Big Data is also more granular and personal: if a shopper picks up three items and the data suggests that they are likely to be interested in socks, that relates to one person, not an entire demographic. Naturally, this means that more of the decisions need to be automated in order to take advantage of opportunities.
Therefore, we need to teach computers to extract knowledge, and then make this work reliably on a large scale. This is another area of Big Data that can sound intimidating, but it’s simpler than it sounds.
We teach computers by:
Providing them with agreed-upon semantic concepts (semantics is the study of meaning; so relating a word, symbol, data point and so on with its significance)
Mapping semantic content from structured or unstructured data into knowledge representations (meaning, “the computer sees this, what should it do with it?”)
Using analytics to create more meaning from extracted data, and map that into the knowledge representations as well.
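The three steps above can be sketched as a toy example. The vocabulary in CONCEPTS, the (subject, relation, concept) triples and the “LikelyPromoter” rule are all assumptions chosen to keep the example small; real platforms use far richer vocabularies and representations.

```python
# Step 1: agreed-upon semantic concepts (a tiny, hypothetical vocabulary).
CONCEPTS = {
    "belt": "Accessory",
    "socks": "Clothing",
    "love": "PositiveSentiment",
    "hate": "NegativeSentiment",
}


def extract_facts(author: str, text: str) -> set:
    """Step 2: map raw text into (subject, relation, concept) triples."""
    facts = set()
    for word in text.lower().split():
        word = word.strip(".,!?")
        if word in CONCEPTS:
            facts.add((author, "mentions", CONCEPTS[word]))
    return facts


def enrich(facts: set) -> set:
    """Step 3: derive new meaning and add it to the same representation."""
    derived = set(facts)
    for subject, _relation, concept in facts:
        if concept == "PositiveSentiment":
            derived.add((subject, "is", "LikelyPromoter"))
    return derived


facts = enrich(extract_facts("alice", "I love my new belt!"))
print(sorted(facts))
```

The computer never “understands” the sentence; it simply maps words it recognises onto concepts it has been given, then applies analytics on top.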
This is performed on a large scale by:
Dividing the problem into pieces and executing in parallel (so the whole is completed faster)
Building clever knowledge indexes for faster searching
Using high performance computers, with technologies such as In-Memory Data Grids
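The first two points in this list can be illustrated with Python’s standard library. This is only a sketch: the documents are invented, and a thread pool is used purely to keep the example short, whereas a real platform would spread the pieces across processes or machines.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical documents to analyse.
documents = {
    "d1": "belt socks belt",
    "d2": "socks shoes",
    "d3": "belt shoes shoes",
}


def count_words(text: str) -> Counter:
    """Work on one piece of the problem."""
    return Counter(text.split())


def parallel_word_count(docs: dict) -> Counter:
    """Hand each document to a worker, then merge the partial results."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=3) as pool:
        for partial in pool.map(count_words, docs.values()):
            total += partial
    return total


def build_index(docs: dict) -> dict:
    """An inverted index: word -> set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index


totals = parallel_word_count(documents)
index = build_index(documents)
print(totals["belt"], sorted(index["shoes"]))
```

With the index in place, finding every document that mentions “shoes” is a single lookup rather than a scan of all the text.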
Incidentally, if you employ Big Data solutions in your business, the majority of this work has already been done by the company that built the platform. All you need to do is set rules and point it at your data.
Once the knowledge has been extracted, statistical algorithms can be used to identify patterns and predict outcomes; then we can implement rules to trigger actions automatically in response. A wide variety of statistical algorithms are used, along with a broad array of visualisation techniques, from spreadsheets and various graphs to 3D mappings, to recognise what’s actually going on.
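As a rough illustration of “predict, then trigger a rule”, the sketch below fits a straight line to a short sales history and applies a simple stock rule to the forecast. The least-squares line is just one of many possible statistical algorithms, and the sales figures, the stock level and the function names are assumptions.

```python
def fit_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ...

    Needs at least two observations.
    """
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_next(ys):
    """Extend the fitted line one step beyond the last observation."""
    slope, intercept = fit_line(ys)
    return slope * len(ys) + intercept


def reorder_rule(predicted_demand, stock_on_hand):
    """A hypothetical business rule that fires on the prediction."""
    return "reorder" if predicted_demand > stock_on_hand else "hold"


daily_sock_sales = [10, 12, 14, 16]
forecast = predict_next(daily_sock_sales)
print(forecast, reorder_rule(forecast, stock_on_hand=15))
```

The rule is deliberately trivial; the point is that once a prediction exists, the response to it can be automated.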
Who does this?
Big Data is handled by data scientists, but this is just a catch-all term for people (or teams) who have certain skills and characteristics. While one person can fill all of these areas, in a team you might have one expert in each of the vertical segments but they will all have the horizontal skills.
How would you implement a Big Data solution?
All of this leads to the key question: do Big Data and its associated In-Memory data mean that traditional BI tools are on the way out? Any attempt to answer this just leads to more questions: should we extend existing enterprise integration to the new tools? Will In-Memory computing replace ETL and batch processing? Will the improvements on ETL such as process-based integration continue to lead the way in this new world? Or do we need a smart platform that can take all these elements and fit them together?
I don’t want to try to answer these questions here, as that’s a topic for another day; instead, I’d like to show the difference the new world can bring and let you start thinking about what this could do in your business.
Imagine a high street retailer like House of Fraser: in the traditional BI model they would want to accumulate your transaction history and to do so would offer you a loyalty card. Scanning this every time you make a purchase would let them track what you have bought, and put this data into a warehouse or cube where it could be carefully sliced to provide insight into which promotions you could be offered. The problem is that this is reactive, tries to extrapolate based on past activity and does not offer you a very personalised experience… and consumers are increasingly bored with loyalty cards.
In an In-Memory, Big Data world, the picture is very different. The retailer has no need to persuade you to take a loyalty card because they can track your purchases by your credit card number. They know what that card is used to buy, and can track differences between shopping trips. For example, if you make a pair of weekly shopping trips and bought a belt on the first, the system could intelligently offer you a promotion that you might be interested in on the second. Not another belt (you already have one) but perhaps an offer on the socks that other people who bought that belt also chose. In order to create and personalise that offer, it would be possible to bring in data about people near you, similar demographics, and what people who bought the same item said about it on social media. Who has endorsed the brand? Has it been featured in the news or in magazines? Not only is the amount of data that can be brought in growing, but it has to encompass both structured and unstructured data, and the kind of information that would previously have been too difficult to collect or analyse is now exactly what will persuade the customer to buy.
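The belt-and-socks example boils down to counting what other buyers of the same item also bought. Here is a minimal sketch of that idea, assuming a hypothetical purchase history keyed by an anonymised card token; the data and the suggest_offer function are invented for illustration, and a real system would bring in many more signals.

```python
from collections import Counter

# Hypothetical purchase history keyed by an anonymised card token.
purchases = {
    "card_a": {"belt", "socks"},
    "card_b": {"belt", "socks", "tie"},
    "card_c": {"belt", "hat"},
    "card_d": {"shoes"},
    "you": {"belt"},
}


def suggest_offer(history: dict, customer: str, trigger_item: str):
    """Suggest the item most often bought alongside trigger_item,
    skipping the trigger itself and anything the customer already owns."""
    owned = history[customer]
    counts = Counter()
    for card, basket in history.items():
        if card != customer and trigger_item in basket:
            counts.update(basket - owned - {trigger_item})
    if not counts:
        return None
    # Highest count wins; ties are broken alphabetically.
    return min(counts, key=lambda item: (-counts[item], item))


print(suggest_offer(purchases, "you", "belt"))
```

Because the belt is excluded from the candidates, the customer is never offered something they have just bought.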
This personalised offering is made possible in real time without any cubes of data or warehouses, without any BI, and just by having your purchases and every other customer’s held in memory. Real-time also means that offers don’t have to wait until you’re at a checkout, by which point you have psychologically shifted from “browsing” mode to “complete the purchase, get home” mode, and thus are less receptive. Instead, by combining it with the hand-held scanners that are increasingly popular in supermarkets, you can be presented with a personalised offer as you pick up an item, or as you walk past an aisle. If you’ve ever got your shopping home and then remembered something else you needed, you can imagine how useful a quick reminder about some of your frequently purchased items might be.
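The reminder idea can be sketched with nothing more than dictionaries held in memory. The InMemoryBasketTracker class, its threshold of three purchases and the sample data are assumptions; the plain dictionaries stand in for whatever in-memory technology a real retailer would use.

```python
from collections import Counter, defaultdict


class InMemoryBasketTracker:
    """Keeps every customer's purchase counts in memory (plain dicts here)."""

    def __init__(self, frequent_threshold: int = 3):
        self.history = defaultdict(Counter)  # customer -> item -> times bought
        self.baskets = defaultdict(set)      # customer -> items scanned this trip
        self.frequent_threshold = frequent_threshold

    def record_purchase(self, customer: str, item: str) -> None:
        """Remember a completed purchase from an earlier trip."""
        self.history[customer][item] += 1

    def on_scan(self, customer: str, item: str) -> list:
        """Called as the shopper scans an item; returns reminders to show now."""
        self.baskets[customer].add(item)
        return sorted(
            name
            for name, times in self.history[customer].items()
            if times >= self.frequent_threshold
            and name not in self.baskets[customer]
        )


tracker = InMemoryBasketTracker()
for _ in range(4):
    tracker.record_purchase("you", "milk")
for _ in range(3):
    tracker.record_purchase("you", "bread")
tracker.record_purchase("you", "belt")

# Scanning bread prompts a reminder for any frequent item not yet in the basket.
print(tracker.on_scan("you", "bread"))
```

Because the reminder is computed at the moment of the scan, it reaches the shopper while they are still browsing rather than at the checkout.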