Title: Clustering Solutions and Zero Downtime Hosting Pitfalls
Author: Godfrey Heron
Email: info@irieisle-online.com
Word Count: 1452
Copyright: © 2005 by Godfrey Heron
Article URL: www.irieisle-online.com/zero-downtime-hosting.htm
Publishing Guidelines: You may publish this article in your newsletter, on your web site, or in your print publication provided you include the resource box at the end. Notification would be appreciated but is not required.
-------------------------
Clustering Solutions and Zero Downtime Hosting Pitfalls
There are a number of benchmarks that we may use to evaluate hosting companies. One of these is reliability.
Like most things in this life, reliability in web hosting is typically a function of how much we are willing to spend on it. In essence, a “cost-effectiveness” equation needs to be determined and solved.
Reliability can be measured in terms of percentage availability. Industry personnel talk of reliability in terms of system availability with three nines (99.9%), four nines (99.99%) or five nines (99.999%).
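To make the "nines" concrete, the short sketch below converts an availability percentage into the downtime it allows per year. The function name and the 365-day year are illustrative assumptions, not part of any hosting standard.

```python
# Convert an availability percentage into the downtime it permits per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored (assumption)

def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes of downtime per year allowed at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

for label, pct in [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]:
    print(f"{label} ({pct}%): {downtime_minutes_per_year(pct):.1f} minutes/year")
```

Under these assumptions, three nines allows roughly 8.8 hours of downtime a year, four nines about 53 minutes, and five nines just over 5 minutes.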
Typically, web-hosting availability exceeding three nines was the purview of extremely large companies with multiple layers of redundancy built into their network and software systems. However, technology has now brought high-availability theory and cost-effective reality into alignment.
High availability can be achieved by removing, as far as possible, any “single point of failure”, or, where this is not altogether possible, minimizing the time spent in a “failure” situation.
One of the ways in which small businesses and ISPs can reasonably avoid single points of failure is by employing server farm clustering and load-balancing solutions.
Webopedia defines server farm clustering as follows:
“A server farm is a group of networked servers that are housed in one location. A server farm streamlines internal processes by distributing the workload between the individual components of the farm and expedites computing processes by harnessing the power of multiple servers.
The farms rely on load-balancing software that accomplishes such tasks as tracking demand for processing power from different machines, prioritizing the tasks and scheduling and rescheduling them depending on priority and demand that users put on the network. When one server in the farm fails, another can step in as a backup.”
It is important to note that web servers load-balanced in this manner typically display one external IP address to the public Internet, while using internal network IPs to communicate between the clustered servers and the load balancer. Now this is indeed fantastic! Not only do you receive web site peak demand scalability with web server clusters, but you also have the built-in “high uptime availability” component which is so important.
However, this is only half of the picture. There are very important cautionary notes to keep in mind.
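As a rough illustration of what the load balancer does behind that single external IP address, here is a minimal round-robin sketch that skips servers marked as failed. The class name, method names and internal addresses are hypothetical; real load balancers also run health checks and track connections, which this sketch leaves out.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hand out backend servers in rotation, skipping any marked as failed."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._rotation = cycle(self.backends)

    def mark_down(self, backend):
        # Called when a server in the farm is found to have failed.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        # Called when a known server has recovered.
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        # Look at each backend at most once per request.
        for _ in range(len(self.backends)):
            candidate = next(self._rotation)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")

# Hypothetical internal addresses of three clustered web servers.
balancer = RoundRobinBalancer(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
balancer.mark_down("10.0.0.12")
picks = [balancer.next_backend() for _ in range(4)]
print(picks)
```

With the second server marked down, requests alternate between the two remaining servers, which is how one machine "steps in" for another without visitors seeing a different address.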
Where web hosting is concerned, availability depends on two things:
1. Hardware reliability (RAID drives, server clustering, etc.) within the Data Center;
2. High Bandwidth Internet Connectivity to the Data Center / Network Operations Center (NOC).
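Because a site is reachable only when both of these are working, their availabilities multiply. The sketch below shows the arithmetic; the two figures are hypothetical, and the calculation assumes the two kinds of failure are independent of each other.

```python
def series_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up (values 0..1)."""
    total = 1.0
    for availability in components:
        total *= availability
    return total

hardware = 0.9999      # hypothetical: clustered servers inside the data center
connectivity = 0.999   # hypothetical: bandwidth link to the data center
combined = series_availability(hardware, connectivity)
print(f"{combined:.4%}")
```

The combined figure is always lower than the weaker of the two inputs, so a well-clustered server farm cannot make up for an unreliable connection.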
Now, with all your well-thought-out server clustering solutions, what would be the result if (as recently occurred at a very high-profile web company) a fire in the vicinity of the network caused the entire Data Center to shut down power for hours? Or if a bandwidth provider to the NOC had router problems? All your websites would be showing the dreaded “Page Cannot be Displayed” page.
The ideal solution, therefore, would be to employ clustering solutions with servers in entirely different Data Centers with different bandwidth providers. Redundant Data Centers eliminate the NOC itself being a single point of failure. The scenario becomes interesting at this point, because the difficulty of addressing potential problems now increases exponentially.
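The benefit of a second Data Center can be estimated with a simple formula: the site is down only when every location is down at once. The sketch below assumes the sites fail independently and that switching between them is instantaneous, which is an idealization; the 99.9% figure is hypothetical.

```python
def redundant_availability(single_site: float, sites: int) -> float:
    """Availability when any one of `sites` independent locations is enough."""
    return 1 - (1 - single_site) ** sites

one_site = redundant_availability(0.999, 1)   # hypothetical three-nines data center
two_sites = redundant_availability(0.999, 2)  # same figure, two independent sites
print(f"one site:  {one_site:.4%}")
print(f"two sites: {two_sites:.4%}")
```

On paper, two three-nines sites give about six nines. In practice the time taken to detect a failure and redirect visitors eats into that figure.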
We now have to deal with DNS caching, the concept of failover, and how static and dynamic web applications respond to failure events.
The terms failover and load balancing are frequently used interchangeably; however, they are in fact quite different.
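DNS caching matters because visitors whose resolvers have cached the old address may keep being sent to the failed site until the cached record expires. A rough upper bound, assuming resolvers honor the record's time-to-live (TTL) and using hypothetical timings, can be sketched as:

```python
def worst_case_stale_seconds(detection_seconds: float, dns_ttl_seconds: float) -> float:
    """Upper bound on how long a visitor may still be directed to the failed site.

    Assumes the DNS record is updated as soon as the failure is detected and
    that every resolver discards its cached answer when the TTL runs out.
    """
    return detection_seconds + dns_ttl_seconds

# Hypothetical values: failure detected after 60 seconds, record TTL of 300 seconds.
stale = worst_case_stale_seconds(60, 300)
print(f"up to {stale / 60:.0f} minutes")
```

A shorter TTL reduces this window at the cost of more DNS lookups.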
· Load balancing refers to sharing the servers' capacity, so that no one server is overloaded and swamped with requests.
· Failover, however, is the process that manually or automatically switches from a failed server or bandwidth provider to a standby server or network if the primary system fails or is temporarily shut down for servicing.
As such, failover software is an important component of mission-critical systems that rely on constant accessibility.
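The difference from load balancing can be seen in a minimal failover sketch: all traffic goes to the primary, and a standby is chosen only when the primary is found to be down. The host names and the health-check function are hypothetical placeholders.

```python
from typing import Callable, Sequence

def choose_active(targets: Sequence[str], is_healthy: Callable[[str], bool]) -> str:
    """Return the first healthy target in priority order: primary, then standbys."""
    for target in targets:
        if is_healthy(target):
            return target
    raise RuntimeError("primary and all standbys are down")

# Hypothetical health status: the primary has failed, the standby is up.
status = {"primary.example.com": False, "standby.example.com": True}
active = choose_active(list(status), lambda host: status[host])
print(active)
```

Once the primary is healthy again, the same call returns it, so no capacity is shared between the two systems at any time.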