While internet behemoths like Google and Amazon have built multibillion-dollar businesses on providing goods and services, the secret sauce that helps them differentiate and potentially dominate is the ability to extract useful insights from terabytes, and increasingly petabytes, of raw data.
Data processing and analytics technologies have evolved dramatically since the dawn of the internet. Irrespective of the various merits of different vendor solutions, the most common method of extracting information from data is to collect it in a big database and try to find patterns, correlations and trends.
However, as data sets become larger or problems become more complex, the sheer volume of data that needs to be transferred, aggregated, stored, processed, analysed and then potentially archived for the long term becomes mind-boggling.
Buying a chocolate bar at a retailer like Tesco generates only a few kilobytes of information at the till, but with several million customer transactions each day, the data pile quickly adds up.
This situation is further complicated by certain real-time and traffic-based scenarios that are prevalent when looking at areas like security and the Internet of Things.
For example, a credit card company looking for a fraudulent transaction cannot wait until the end of the day to crunch through billions of transactions to spot fraud. It needs a process that can quickly sift the data as it arrives and spot anomalous usage, with enough intelligence to understand the overall context of the transaction, merchant and account history to make an instant decision.
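The kind of streaming check described above can be sketched in a few lines. This is a deliberately simplified, hypothetical illustration (the class name, window size and threshold are invented for the example, not taken from any real fraud system): it keeps a rolling window of recent amounts per account and flags a transaction that is far outside that account's recent pattern.

```python
from collections import defaultdict, deque

class StreamingFraudCheck:
    """Hypothetical sketch: flag a transaction as anomalous when it is
    far outside the account's recent spending pattern."""

    def __init__(self, window=50, threshold=5.0):
        # Rolling window of recent amounts, kept separately per account.
        self.history = defaultdict(lambda: deque(maxlen=window))
        self.threshold = threshold

    def check(self, account, amount):
        """Return True if this transaction looks anomalous, then record it."""
        past = self.history[account]
        suspicious = False
        if len(past) >= 5:  # need some history before judging context
            mean = sum(past) / len(past)
            suspicious = amount > self.threshold * max(mean, 1.0)
        past.append(amount)
        return suspicious
```

A real system would of course weigh far richer context (merchant category, geography, device fingerprints), but the shape is the same: the decision is made per event as it streams in, not in an end-of-day batch.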
In many cases, data is not actually going from a single source into a single destination system. Instead, data often flows from many to many and interacts with many applications and processes along the way.
This is even more evident when trying to understand the habits of people, such as online shoppers, who may be accessing a number of sites or browsing different areas and sub-brands of a large online retailer.
The individual online stores may be on different servers, in different data centres and potentially in different countries. Whether the use case is retail, financial or healthcare, given this level of complexity, are there better ways to gather data and generate insights?
An emerging method that gets around some of the issues of trying to pool all data in one place for analysis is to take a look at the data while it is in transit between different sources and destinations.
This approach is starting to gain traction and is called wire data analysis. In essence, the technology looks at the data packets flowing between applications and decodes the contents of these data flows.
The wire data analysis software is intelligent enough to reconstruct the packets into an intelligible conversation, store the parts of interest in a database, or make decisions about what to do with a conversation based on programmable rules.
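The two steps described above, reassembling fragments into a conversation and then applying programmable rules to it, can be sketched as follows. This is an illustrative toy, not a real decoder: the packet fields (`src`, `dst`, `seq`, `payload`) and the rule format are assumptions made for the example.

```python
from collections import defaultdict

def reassemble(packets):
    """Group payload fragments by (src, dst) flow and join them in
    sequence order into one message per conversation."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[(pkt["src"], pkt["dst"])].append((pkt["seq"], pkt["payload"]))
    return {flow: "".join(payload for _, payload in sorted(frags))
            for flow, frags in flows.items()}

def apply_rules(messages, rules):
    """Run each programmable rule (a predicate paired with an action
    label) over every reconstructed conversation and collect the
    actions to take, e.g. store, alert or discard."""
    actions = []
    for flow, msg in messages.items():
        for predicate, action in rules:
            if predicate(msg):
                actions.append((flow, action))
    return actions
```

The key point the sketch makes is architectural: the analysis operates on whole reconstructed conversations, not on individual packets, which is what allows rules to reason about a transaction in context.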
Wire data is a new technology area that has initially found success as a means of understanding and managing application performance, and diagnosing faults on the network.
Because it can see and understand inter-application communication, the technology has many advantages over legacy systems that rely on machine data, such as log files, or on agent-based monitoring, which places software agents on a server to observe behaviour. Both of these legacy approaches have severe limitations, as neither can see the whole conversation.
Although demand for wire data is growing in this traditional application performance management role, the ability to peer inside IP packets and understand what is happening at a transaction layer is also proving valuable in solving challenging big data problems.
Wire data is already used in some of the most challenging environments, like online betting. For bet365, which provides services to over ten million customers, wire data is used to both understand how mission critical applications are performing and also for finding insights within the billions of digital transactions it processes each year.
At bet365, wire data is predominantly used for ‘end-user intelligence’, where, unlike monitoring products that only show what users are doing and experiencing on the frontend, it can correlate user activity and experience to performance in the backend IT infrastructure. In other words, the system does not just show what users are experiencing, it also explains why.
bet365 also runs an extensive security and fraud detection platform. An aspirational future project is to feed data into a complex event processing platform that helps the gambling industry both manage risk and avoid fraudulent activity.
Previously, funnelling data from multiple applications, locations and services into this system was incredibly complex. Instead, bet365 is experimenting with using wire data to examine, rebuild and generate real-time snapshots of the entire transaction landscape that can be quickly fed into the security and fraud detection platform.
The idea is that wire data will replace legacy solutions, which require multiple physical servers that continually struggle with the complex and growing workload.
However, the technology is not a panacea. There are still many exotic application and information transfer protocols that are difficult to understand, such as those found in the media industry and the utilities sector.
The vendors in this space are making progress and building more intelligent systems that can decode and understand a wider range of IP ‘conversations’ between applications and also devices within an Internet of Things landscape.
Unfortunately, there are few standards in the wire data space, which increases the complexity of getting data out of one system and into another that is better able to provide additional analysis.
A new technology direction that is potentially overcoming this limitation is open-data streams, which allows information captured by wire data tools to flow into relational and unstructured databases, search tools and other analytics engines.
The goal is to enable wire data to seamlessly feed into a wider collection of information sources to provide deeper insights and ultimately answer more complex questions by aggregating more data sources for the analysis.
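As a concrete illustration of that goal, an open-data stream ultimately lands decoded wire-data records somewhere other tools can query them. The sketch below, a hypothetical example using SQLite as the relational store (the table schema and record fields are invented for the example), shows the shape of that ingestion step.

```python
import sqlite3

def open_store(path=":memory:"):
    """Create (or open) a relational store for decoded transaction records."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS transactions (
        ts REAL, client TEXT, method TEXT, status INTEGER, latency_ms REAL)""")
    return conn

def ingest(conn, records):
    """Append a batch of decoded records; each record is a dict assumed
    to be produced upstream by a wire-data decoder."""
    conn.executemany(
        "INSERT INTO transactions VALUES "
        "(:ts, :client, :method, :status, :latency_ms)",
        records)
    conn.commit()
```

Once the records are in a shared store like this, they can be joined with log data, business records or other sources, which is exactly the deeper, cross-source analysis the open-data-stream approach is aiming at.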
If the phrase 'knowledge is power' is true, then innovations like wire data offer the potential to tap into a whole new stratum of information that until now was largely unreachable.
Sourced from Owen Cole, VP EMEA, ExtraHop Networks