Guest writer Anthony Osborn examines the cybersecurity benefits of correlating datasets in this new and exclusive piece.
In the last decade, the number of networks, devices, apps and cloud components a typical IT department must manage has exploded in complexity, and so has the list of vendor tools available for monitoring how effectively the different parts of an entire IT infrastructure are performing. But correlating datasets from multiple systems isn't always straightforward.
What Exactly Is Dataset Correlation?
The general term 'data correlation' can have multiple meanings. It can describe anything from how well the latest Covid-19 vaccine dosages are working on patients, through to the impact interest rates have on house prices. As a ubiquitous term, its use can sometimes be misleading. But 'data correlation' usually refers to testing the relationships between quantitative or categorical variables. In other words, it's a measure of how things are related. In this post, we're referring to correlations between IT datasets.
What Are the Typical Dataset Correlation Issues Affecting IT Security & Monitoring?
A recent report by Dell Technologies stated that 'most advanced cybersecurity attacks will go unnoticed for an average of 197 days'. This is usually because the data correlation signals are simply hidden in plain sight.
Other times, a warning signal might be triggered in one system: for example, an endpoint or malware issue tied to a network or device ID. But without correlated data, it might not be immediately obvious which users or groups could be affected, so the mean time to resolution (MTTR) suffers.
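To make the device-to-user gap concrete, here is a small sketch that joins an alert feed (keyed by device ID) against a user directory. All of the IDs, names and field layouts below are hypothetical:

```python
# Hypothetical feeds: alerts keyed by device ID, and a directory
# mapping device IDs to users -- every name here is illustrative.
alerts = [
    {"device_id": "LT-0042", "signal": "malware detected"},
    {"device_id": "LT-0077", "signal": "endpoint offline"},
]
directory = {
    "LT-0042": {"user": "j.smith", "group": "Finance"},
    "LT-0077": {"user": "a.jones", "group": "Engineering"},
}

# Correlate: enrich each alert with the user and group it affects
affected = [
    {**alert, **directory.get(alert["device_id"],
                              {"user": "unknown", "group": "unknown"})}
    for alert in alerts
]
for a in affected:
    print(f"{a['signal']}: {a['user']} ({a['group']})")
```

Without that join, the security team only sees device IDs; with it, they immediately know who to contact, which is what shortens the MTTR.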
Sometimes you need to cross-correlate IT Security & Monitoring data with other organisational data. For example, suppose you're running an online business and your server goes down after hitting maximum CPU during the busy Black Friday weekend. A quick correlation with the corresponding 'cost-of-being-offline' data can help justify your department's requirements to senior management, for example provisioning extra server capacity for the times it is likely to max out again, thus saving your organisation potential lost revenue.
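The 'cost-of-being-offline' figure in that justification is simple arithmetic once the datasets sit side by side. The numbers below are purely illustrative:

```python
# Illustrative figures only: average takings and outage length
revenue_per_minute = 1_200.0   # hypothetical Black Friday revenue rate (GBP)
outage_minutes = 45            # server down after hitting max CPU

# Correlating the outage window with the revenue stream gives a
# back-of-the-envelope cost of being offline
estimated_loss = revenue_per_minute * outage_minutes
print(f"Estimated lost revenue: £{estimated_loss:,.0f}")  # £54,000
```

A figure like this, placed next to the cost of extra server capacity, is the kind of comparison that makes the business case to senior management obvious.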
What Makes IT Dataset Correlation So Difficult?
Most attempts to correlate data across different platforms mean IT managers must manually log in to several systems simultaneously before they can even begin to make sense of the data.
Even when they can access the relevant data, the platforms will often show it in different units, metric formats or date/time ranges; the interpretation of those metrics can vary from one IT manager to another; and the export formats will probably differ too. Errors get introduced, and there is seldom an agreed 'single source of truth', which can cause disagreements within the team or, even worse, let an important warning signal get lost in the noise.
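As a sketch of the unit and timestamp mismatch, here are two hypothetical exports of the same event, one using a naive local timestamp and a CPU fraction, the other using ISO 8601 UTC and a percentage, being normalised into a common shape before any correlation can happen:

```python
from datetime import datetime, timezone

# Two systems exporting the same event in different formats (hypothetical)
system_a = {"ts": "2024-11-29 14:03:00", "cpu": 0.97}       # naive time, fraction
system_b = {"ts": "2024-11-29T14:03:00Z", "cpu_pct": 97}    # ISO 8601 UTC, percent

def normalise(ts: str) -> datetime:
    """Parse either export format into a timezone-aware UTC datetime."""
    if ts.endswith("Z"):
        return datetime.fromisoformat(ts.replace("Z", "+00:00"))
    # Assume system A reports in UTC -- in practice you must confirm this
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)

a_ts, b_ts = normalise(system_a["ts"]), normalise(system_b["ts"])
a_cpu, b_cpu = round(system_a["cpu"] * 100), system_b["cpu_pct"]  # common unit: %

print(a_ts == b_ts, a_cpu == b_cpu)  # both comparisons hold once normalised
```

Only after this kind of normalisation do two datasets become comparable; skipping it is exactly how the same event ends up counted twice, or missed entirely.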
A highly skilled Security/Monitoring/IT Manager will usually have great 'domain' knowledge of the datasets they're used to working with. But they aren't necessarily also a Software Engineer, Data Architect or Data Scientist, which means they won't be in their comfort zone accessing data via APIs and could quickly become stuck wrangling huge datasets.
What’s more, once you start moving data around there are data protection and governance policies that must be adhered to, which can present new challenges to the unfamiliar.
What Solutions Are There for Improving the Data Correlation Between Various Components of Your IT’s Infrastructure?
There are different ways you can improve your IT infrastructure's overall dataset correlation options, each with different cost, timescale, performance and robustness implications. Here are a few approaches:
1. Self-Build In-House
Many companies will attempt to address data correlation issues by building solutions in-house. This can work well for building something bespoke, but be prepared for it to turn into a very large project, as there can be several moving parts. You'll require Software Engineers (to connect to the respective APIs), Data Engineers (to correctly model the data flows and data pipelines), Data-Governance folk (to ensure you're adhering to GDPR and other data policies), and Data Scientists/Analysts and Data-Visualisation experts (to convert the modelled data into easily understandable insights). And as this can easily become a whole-team effort, they'll need a Product Owner or Project Manager, too.
2. Buying In (Yet More) Software Tools
Another typical approach to resolving data-correlation issues is buying in off-the-shelf tools, with a view to then stitching the output from each together. For example, you might purchase Alteryx for the ETL and Tableau, Sisense or Looker for your data visualisation. Whilst these tools are good in their own right, they aren't going to connect to all your data sources, nor will they address all the data flows or data automation for you. Also, systems built from lots of bolt-on component parts can easily fall apart. They are only as good as their weakest link, which means they run the risk of becoming like the U.K.'s recent Test and Trace programme, where Excel proved a very poor system choice and critical life-and-death data was simply lost.
3. Correlated Data, Built Exactly to Your Specifications
An entirely different approach to data correlation is to have all your API connectors, data modelling, data visualisations, dashboards, portal access and so on fully built for you. The advantage of this approach is that you'll be using a team that already does this day in, day out, and your work will sit atop a proven codebase. With this, you'll get the highest-quality build available and you'll be well set up for scale and continuous improvement.
This 'built-for-you' approach also means a third party is responsible for the data connections, the data modelling, keeping all those live connections running, and using best-in-class data visualisations for communicating insights, so the headache is off your plate. A great option for this would be a platform like Stratiam. The team behind Stratiam already have a growing library of API connectors, ready to plug in to your data sources and get you set up quickly. Custom connectors can also be created for any API, and their entire system has been built to be fully compliant with GDPR and other policies and is thoroughly pen-tested on a regular basis.
Roadmap for the Year Ahead
Whichever approach you take, improving how your IT datasets cross-correlate will add massive value to your overall IT infrastructure and will bolster your overall IT Security & Monitoring capabilities. If you don't currently have plans in place to address this, they should be added to your roadmap for the coming year.
About our Guest Writer
Product Manager, Intergence
Anthony Osborn has been working with digital data for over 20 years. He has an aptitude for distilling huge datasets down into meaningful insight. As a ‘visual thinker’, Anthony believes all datasets deserve a beautiful front-end from which trends, anomalies and signals can be communicated, explored and acted upon.
Prior to architecting Stratiam’s product development roadmap, Anthony was the Digital Analytics Manager at Dyson.