The way something is managed always has a direct impact on the results. Whether it’s a football match, the construction of a building, or the creation of a data protection strategy, if steps are left out, elements are not properly reviewed or considerations are overlooked, the results will fall short of expectations.
Take data backup and disaster recovery strategies, for example: they are supposed to protect a company’s most important asset, its information, yet they can be rendered virtually useless if they are not properly designed and resourced. Today many companies aim to provide their staff with data access 24/7, and if that access is interrupted for even a few minutes, the costs can easily run to millions of euros. But before we even get to the question of how long an outage lasts, we need to consider the possibility of data not being recovered at all.
Managing complexity means seeing it first
One of the world’s leading suppliers of fast-moving consumer goods (FMCG) realised, before it was too late, how crucial it is to ensure that its SAN and data protection strategies are up to scratch. As the company’s Global Enterprise Computing Director put it: ‘You can’t make good decisions and reduce costs without the right instrumentation. Paradoxically, the more you over-engineer, the higher the level of complexity and associated risks that components will fail. We have 12PBs of storage and nearly 15,000 Fibre Channel switch ports. The multi-path failover wasn’t working as intended and all the traffic from our tape library was going down one side of the fabric. These things send a shiver down your spine but at least you do know and can act.’
The missing link in this case was a holistic, complete view of the overall SAN infrastructure. It is a clear example of how organisations need to take a step back and secure their foundations before they even start thinking about their data protection strategies. Companies must first understand what it takes to gain a satisfactory view of their IT infrastructure as a whole.
Ever-agile infrastructure demands
The promise of virtualised dynamic resource allocation, although extremely compelling, introduces the risk of breaking the related management tools. With today’s ability to migrate application workloads across virtual systems (internally within private clouds, or bursting out to hybrid clouds for extra capacity), yesterday’s management tools are ill-equipped to track the real-time performance of those applications as they move across dynamically allocated resource pools.
The physical systems at the core of both virtual and cloud infrastructures grow ever more complex as they become increasingly abstracted by the layers of virtualization on top of them. Storage, which is clearly the largest shared resource and, I would argue, the first ‘cloud’ (it was abstracted from the physical server, centralized and shared back out), offers the most limited visibility into its use and none at all into its effect on the performance of the system. There are spot tools that can report the performance of an individual array, but they are confined to that vendor, to that array and to the performance of the array itself, not to the impact of that performance on the overall system.
This introduces several challenges to the market as customers ‘rush to the cloud and drive higher and higher levels of virtualization.’ First, the cloud environment is simply demanding new capabilities for what we’re calling Infrastructure Performance Management (IPM). The fundamental requirements we see are these: it must be a multi-vendor solution, applicable across multiple hypervisors, storage providers and network providers, and it must give real-time, granular visibility into the multiple layers of that system and their performance.
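To make these requirements a little more concrete, the sketch below is a minimal illustration in Python, with hypothetical layer and field names rather than any particular product’s schema, of the kind of vendor-neutral measurement record that collectors from different hypervisors, arrays and fabrics could be normalised into:

```python
from dataclasses import dataclass
from enum import Enum

class Layer(Enum):
    HYPERVISOR = "hypervisor"
    SERVER = "server"
    FABRIC = "fabric"
    STORAGE = "storage"

@dataclass(frozen=True)
class Measurement:
    """A vendor-neutral, layer-tagged metric sample (illustrative only)."""
    timestamp_us: int   # captured at line rate, microsecond resolution
    layer: Layer        # which tier of the stack produced the sample
    vendor: str         # e.g. "VMware", "IBM", "EMC" -- not tied to one provider
    source: str         # hypothetical device or host identifier
    metric: str         # e.g. "exchange_completion_time_us"
    value: float
```

A common record like this is what allows measurements from different vendors and layers to be compared and correlated rather than living in separate, device-specific silos.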
Especially when it comes to data protection and business continuity overall, it’s not sufficient to look solely at the server stack or virtual server stack. It’s not sufficient to look solely at the storage stack.
To meet the requirements of Infrastructure Performance Management, you must be able to look across all layers of a multi-vendor environment in real time. In doing so, you can eliminate the risk to application performance, and ensure availability, as workloads migrate to that virtual infrastructure or to a private cloud. When you do this throughout a multi-vendor, multi-level system, you can then align the costs and SLAs of that infrastructure with application and business requirements.
Guarantee and optimize the whole
It is no longer sufficient to optimize a single element and assume that your infrastructure, and therefore your data, are adequately protected. That would be like treating a sprained ankle and concluding that the patient is healthy overall. You may be able to optimize storage, but what is the impact of that effort on the overall performance of the systems that rely upon it? Likewise, virtualization may have driven significant increases in the utilization of server assets, but the impact on the underlying I/O may be unknown.
This introduces a second challenge: virtual systems management today is platform-specific. No customer runs a single platform for all applications and all server virtualization in their environment. The two most prevalent are AIX with logical partitioning for the mission-critical UNIX-based systems, and VMware on x86 for the Linux and Windows infrastructure. A true infrastructure performance management system must be able to address both.
Finally, in the physical world, enterprise systems management tools are device-specific and, quite frankly, no longer relevant to cloud infrastructure. Storage resource management, the traditional approach to the most costly and complex layer in the infrastructure, has no view into the performance of the system and cannot correlate its limited view with specific vendor or device utilization, or with overall system utilization.
So what is needed in order to provide a satisfactory data and systems protection strategy?
Today’s application-aligned infrastructure
In a cloud infrastructure in particular, and even in a purely virtualized environment, finding out about a problem or a performance issue minutes, or tens of minutes, after the fact is no longer tenable.
Firstly, you must measure performance in real time: at line rate (measured to the microsecond, then rolled up and delivered to the platform in one-second intervals) and at the protocol level. Real time is not a look back at an average of averages derived from interval polling, nor a calculation that estimates an actual performance metric.
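As a rough illustration of the distinction (a minimal sketch with made-up sample data, not any particular platform’s implementation), the snippet below rolls microsecond-resolution latency samples up into one-second buckets while preserving the minimum, maximum and count in each interval, so a slow outlier is not smoothed away as it would be in an average of polled averages:

```python
from collections import defaultdict

def roll_up(samples, interval_us=1_000_000):
    """Roll microsecond-resolution latency samples up into one-second buckets.

    `samples` is an iterable of (timestamp_us, latency_us) pairs, e.g. one per
    exchange observed on the wire. Each bucket keeps count/sum/min/max so that
    extremes seen at line rate survive the one-second roll-up.
    """
    buckets = defaultdict(lambda: {"count": 0, "sum": 0, "min": None, "max": None})
    for ts_us, latency_us in samples:
        b = buckets[ts_us // interval_us]          # index of the one-second bucket
        b["count"] += 1
        b["sum"] += latency_us
        b["min"] = latency_us if b["min"] is None else min(b["min"], latency_us)
        b["max"] = latency_us if b["max"] is None else max(b["max"], latency_us)
    return {
        second: {**b, "avg": b["sum"] / b["count"]}
        for second, b in sorted(buckets.items())
    }

# Hypothetical data: three normal exchanges in the first second, one slow
# outlier in the next; the outlier stays visible in that bucket's max.
print(roll_up([(120, 450), (300_500, 510), (990_000, 480), (1_200_000, 9_800)]))
```

Keeping the per-interval extremes, rather than only a mean of means, is what lets a one-second delivery interval still expose behaviour that was measured at line rate.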
Secondly, always monitor the system. To give context to the protocol-level data, you must understand what is happening all around it, so you need a systems-level solution acutely focused on increasingly unified compute and storage environments. As large blade centres are deployed against advanced fabrics supported by new virtualized storage technologies, you should look into each of those device levels and correlate them with the underlying protocol data to get a systems-level view in real time.
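To illustrate that correlation (again only a sketch, assuming the one-second roll-up above and hypothetical device identifiers), the snippet below attaches per-layer device metrics to each protocol-level bucket so that, for any given second, protocol latency can be read alongside what the hypervisor, fabric and array reported at the same moment:

```python
from dataclasses import dataclass

@dataclass
class DeviceSample:
    second: int        # one-second bucket, aligned with the protocol roll-up
    layer: str         # "hypervisor", "fabric", "array", ...
    device: str        # hypothetical identifiers, e.g. "esx-host-07", "fc-switch-2"
    metric: str        # e.g. "cpu_ready_pct", "port_util_pct", "array_latency_ms"
    value: float

def correlate(protocol_rollup, device_samples):
    """Attach per-layer device metrics to each one-second protocol bucket.

    `protocol_rollup` is assumed to be keyed by one-second bucket (as in the
    roll_up sketch above); the result pairs protocol-level latency statistics
    with whatever each device layer reported during the same second.
    """
    view = {sec: {"protocol": stats, "devices": []}
            for sec, stats in protocol_rollup.items()}
    for s in device_samples:
        if s.second in view:
            view[s.second]["devices"].append((s.layer, s.device, s.metric, s.value))
    return view
```

A joined view of this kind is the point of the exercise: a latency spike in one bucket can be read next to the fabric port utilisation or hypervisor contention recorded in the very same second, rather than minutes later from a separate tool.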