Human error poses the biggest internal threat to business continuity as the data centre industry refuses to wake up to change, according to intelligent PDU provider Enlogic. In the firm's latest survey[1] of industry professionals, more than a third of respondents named human error as the most likely cause of downtime. Equipment failure and the external threat of power outages ranked second and third.
Switching a temperature setting from Fahrenheit to Celsius by mistake, accidentally pulling the power cord from an IT asset, or overloading a circuit by plugging in one server too many are just a few of the ways in which data centre workers can cause a crash.
Unplanned downtime can lead to financial loss as well as reputational damage. A natural response is to implement a redundancy plan, which requires investment in additional equipment that sits idle until it is needed. This approach increases upfront purchase costs and pushes up energy bills by sending power to idle servers, and Enlogic urges companies to reconsider it.
“Human error has actually cropped up numerous times as the most prominent threat in this kind of survey; yet the industry allows the problem to remain. It’s deeply concerning that managers have been aware for some time now of the biggest cause of downtime and despite the solutions on offer, they have failed to implement the right technology”, comments Paul Inett, Vice-president, Enlogic EMEA.
“The general consensus is that a spate of downtime can cost anything from £50k to £1m per hour, depending on the company. Estimates from the survey suggest that just one minute of downtime can cost as much as £100k, so if a data centre suffers downtime for 60 minutes or more then it is likely to go out of business. If that critical failure has been caused by something as simple as a circuit overload, it’s a really painful mistake that could have been prevented. You’ll never eliminate human error from the data centre, but you can make choices in both technology and training that will help to reduce its severity and impact.”
Inett continues: “Interestingly, a quarter of data centre professionals cited equipment failure as the main reason for downtime, which proves that managers need to think about reliability over price when selecting technology. This doesn’t mean doubling up on investments to create redundancy. It’s about understanding terms such as ‘hot-swappable’, which means you can replace the network management card within a PDU, for example, without powering down the rack. When you consider that without a hot-swappable device, a minimum of 30 minutes downtime will occur while servers are unplugged and wired in to a working PDU, this level of understanding is invaluable to prevent downtime spiralling out of control.”