Outages Bring Cloud Monitoring into Focus!

The outage of Amazon's cloud service impacted many businesses hosted in their cloud data center

Over the last week, Amazon’s cloud service had a serious outage that caused many popular web businesses to go offline for several hours and resulted in significant loss of business.

All of a sudden, many in the press (and users as well!) are beginning to realize that applications hosted in the cloud are actually hosted on servers in data centers and are hence prone to the same kinds of problems as servers in their own data centers. Just because you did not have to purchase a server, provision it, or provide power and space for it does not mean that the server is failure-proof. As this article indicates, failures can happen for any number of reasons: a hardware failure, a network outage, an application coding error, etc. Even a configuration error inadvertently made by an administrator can cause catastrophic failures.

Many have gone overboard, predicting the end of cloud computing! If you look at the service contracts from these cloud service providers, they do not guarantee that the infrastructure will be 100% failure-proof. With cloud computing, as with everything else, you get what you pay for. Not every business that used Amazon suffered during this outage. The outage was limited to Amazon’s east coast (Northern Virginia) data center, and for enterprises that had paid for Amazon Web Services’ redundant cloud architecture, it was business as usual. Netflix, the popular movie rental site, was one such enterprise.

"You get what you pay for" applies to monitoring tools as well!

Outages like the one Amazon had bring cloud monitoring tools into focus. The saying “You get what you pay for” applies to monitoring tools as well. If all you want is to be alerted once a problem happens, a simple low-cost up/down monitoring tool suffices. On the other hand, if you want to be proactive like Netflix and detect problems before they affect revenue, you need a monitoring tool that can alert you to abnormal situations in advance, well before users notice the problem. More sophisticated cloud monitoring tools can also help you rapidly triage where the root cause of a problem lies: is it in the cloud data center, in your application, or in the infrastructure services (DNS, Active Directory, etc.)?

Monitoring tools provide insurance cover for your infrastructure. Just as Netflix would have weighed the cost of redundancy against the benefit of keeping its business up during the outage, you should assess the return on investment from a monitoring tool.

There are several ways to assess the ROI from a monitoring tool:

  1. By the number of times the monitoring tool can help you avert a problem by proactively alerting you and enabling you to take action before users notice the issue;
  2. By the time the monitoring tool saves by helping you to pinpoint the root cause of a problem;
  3. By the amount of time that the monitoring tool saves for your key IT personnel by allowing your first-level support teams to handle user complaints;
  4. By the savings that the tool provides by enabling you to optimize your infrastructure and to get more out of your existing investment, without having to buy additional hardware or to use additional cloud services.

Related links:

The top requirements for a Cloud-Ready Monitoring Solution

Customer Testimonial – Citrix Monitoring

Many thanks to Mr. Jacob Ackerman, Director of Information Technology at Horizon Business Services, for taking the time to write about his experiences with the eG Innovations software and services. We value every one of our customer relationships, and it is accolades like these that motivate us to even greater heights.

In his recent blog post, Mr. Ackerman writes:

Anyone looking to monitor your Citrix farm needs to take a serious look at eG Innovations – http://www.eginnovations.com/. They have a great product and unbelievable support.

Originally, I was looking at Citrix EdgeSight to monitor our XenApp 5 farm. While it’s a good product, it didn’t monitor the health of our entire farm/network. Problems caused by logon servers, database issues, firewalls, etc. just weren’t caught by the EdgeSight – it was just too focused on XenApp. We needed something that went beyond just XenApp.

We’ve been using eG Manager for about a month and a half, monitoring Citrix XenApp, Citrix STA, Citrix Web Interface, Citrix License Server, Domain Controllers, Active Directory, SQL, Windows DNS, Dell iDRAC, standard WMI and Cisco ASA.

Setup and implementation times were a fraction of what I thought it would be. The system comes with many preconfigured components each with their own preconfigured tests and thresholds. Within several days, we had metrics that made sense and allowed us to tune our farm.

They also have some of the best support I’ve seen. The software is easy to use but with the scope of what can be monitored, questions come up. Their technicians have been fast to respond and have been spot on in their responses.

Mr. Ackerman has picked up on every one of the points we strive to be the best at:

  • Monitoring of business services end-to-end, not just silos;
  • Rapid implementation, ease of use, and low cost of ownership;
  • Delivering return on investment in weeks, not years;
  • And the best support possible, so that customers value and enjoy working with us.

Justifying the Cost of a Monitoring Tool

Procuring a monitoring tool is like getting insurance coverage for your IT infrastructure! As long as your infrastructure is working fine, monitoring is not top of mind. The moment there is an incident, the first questions are: What changed? Which parts of the infrastructure are not working? Where is the root cause of the problem?

The best way to justify the investment in a monitoring tool is by considering the cost of downtime. Downtime costs vary by industry and depend on a company’s reliance on IT for its day-to-day operations and on the nature of the services that IT enables. The chart below illustrates the average cost of downtime per hour for different industries. Besides the financial cost of downtime, you also need to consider the loss of customer confidence, liability, and lost current and future business.

The cost of downtime by industry

The above data can be used to build a simple business case for procuring the monitoring solution. Consider the case of a financial institution looking to spend $200,000 on a monitoring tool. From the above chart, a $200,000 investment translates into 8 minutes of downtime. The decision this company needs to make is whether the monitoring tool can prevent 8 minutes of downtime. If the tool is being deployed across 100 servers, this translates to just about 5 seconds of downtime per server!
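The arithmetic behind this business case can be sketched in a few lines of Python. This is only an illustration: the function and variable names are made up for this example, and the hourly downtime cost of $1,500,000 is not read from the chart but is the rate implied by the statement that $200,000 corresponds to 8 minutes of downtime. Substitute the figure for your own industry.

```python
def break_even_downtime_minutes(tool_cost, downtime_cost_per_hour):
    """Minutes of avoided downtime needed to pay for the tool."""
    return tool_cost / downtime_cost_per_hour * 60


# Figures from the example in the text. The hourly cost is an assumption:
# it is the rate implied by "$200,000 translates into 8 minutes".
TOOL_COST = 200_000
DOWNTIME_COST_PER_HOUR = 1_500_000
SERVERS = 100

minutes = break_even_downtime_minutes(TOOL_COST, DOWNTIME_COST_PER_HOUR)
seconds_per_server = minutes * 60 / SERVERS

print(f"Break-even downtime: {minutes:.1f} minutes")
print(f"Per server: {seconds_per_server:.1f} seconds")
```

With these inputs the break-even point is 8 minutes in total, or 4.8 seconds per server, which is the "5 seconds" quoted above after rounding.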

Click here to view the results of a TechValidate survey of eG Enterprise users to see the benefits that you can expect from an enterprise-class monitoring solution.