'Design for Failure': A Reviving Strategy for the Cloud Model

By CIOReview | Monday, July 18, 2016

RightScale, a SaaS-based provider of cloud computing management solutions, reported that many users prefer the public cloud over hybrid and private cloud systems. The movement of enterprise IT toward the public cloud has also exposed it to technology failures such as outages. An outage, like the ones encountered by Amazon Web Services (AWS), can bring down a whole system and make the service unavailable for a period of time. Such breakdowns gave rise to the idea of “design for failure” in the cloud continuum. This approach helps organizations limit the effects of an outage even without implementing a separate Disaster Recovery (DR) strategy.

What does design for failure mean?

Traditional applications depend on the availability of the underlying infrastructure, so outages have an adverse effect on the business. Cloud applications can overcome this dependency because they can be designed to withstand major infrastructure failures. In this sense, the application is both the “strength” and the “weakness” of a cloud model. Its strength lies in withstanding outages and reducing the developer’s dependency on infrastructure, which helps developers approach 100 percent uptime for their cloud applications. Its weakness is that a cloud application not designed for failure can go down entirely when the underlying virtual machine fails.

While trying to catch failing servers and broken code, Netflix realized that breakdowns are an inherent part of the system, and that it is important to ‘design for failure’ so that outages have minimal impact on business-critical operations. George Reese, CTO of enStratus, lists some basic steps for designing applications that limit the impact of failures:

  • Each cloud application component must be deployed across redundant cloud components, ideally with minimal or no common points of failure.
  • Every application component must make no assumptions about the underlying infrastructure. The applications must adapt to changes in the infrastructure without facing any downtime.
  • Each application component should be able to survive network latency (or loss of communication) among the nodes that support the application component.
  • Automation tools must be in place to orchestrate application responses to failures or other changes in the infrastructure.
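
The first and third steps above can be illustrated with a minimal Python sketch, assuming each redundant component is reachable through a plain callable. The names `call_with_failover`, `failing_replica` and `healthy_replica` are hypothetical and chosen only for illustration; they do not come from Reese or from any AWS library.

```python
import time
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised when no redundant replica could serve the request."""


def call_with_failover(
    replicas: Sequence[Callable[[], str]],
    retries_per_replica: int = 2,
    base_delay: float = 0.01,
) -> str:
    """Try each redundant replica in turn, retrying with exponential backoff."""
    errors = []
    for replica in replicas:
        for attempt in range(retries_per_replica):
            try:
                return replica()
            except ConnectionError as exc:
                errors.append(exc)
                # Wait base_delay, then 2 * base_delay, ... before the next try.
                time.sleep(base_delay * (2 ** attempt))
    raise AllReplicasFailed(f"{len(errors)} attempts failed")


# Hypothetical stand-ins for two deployments of the same component.
def failing_replica() -> str:
    raise ConnectionError("zone unavailable")


def healthy_replica() -> str:
    return "ok from standby"


if __name__ == "__main__":
    print(call_with_failover([failing_replica, healthy_replica]))
```

The caller never assumes that a particular replica is alive. It treats a lost connection as a normal event and moves on to the next redundant copy, which is the behavior the steps above ask of each component.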

How to architect with a design-for-failure approach?

AWS follows the ‘design for failure’ model, combining software and management tools to keep applications available during system failures, with the aim of near-100 percent uptime during outages. To build such a durable cloud model, organizations must know how to design high-availability architectures on AWS.

Building a VPC (Virtual Private Cloud) environment: AWS VPC offers flexibility and security, enabling users to launch resources such as EC2 and RDS instances in an isolated virtual network. This gives control over all inbound and outbound network traffic.
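
One way to plan such an environment is to carve the VPC address range into one subnet per availability zone, so that a single zone is not a common point of failure. The sketch below uses only Python's standard `ipaddress` module. The CIDR block, the zone names and the `plan_subnets` helper are assumptions made for illustration, not values from the article, and the code does not call AWS.

```python
import ipaddress
from typing import Dict, Sequence


def plan_subnets(vpc_cidr: str, zones: Sequence[str], new_prefix: int) -> Dict[str, str]:
    """Assign one equally sized subnet of the VPC range to each zone."""
    network = ipaddress.ip_network(vpc_cidr)
    # subnets() yields the child networks in ascending address order.
    subnets = network.subnets(new_prefix=new_prefix)
    return {zone: str(subnet) for zone, subnet in zip(zones, subnets)}


if __name__ == "__main__":
    # Illustrative values only.
    plan = plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b", "us-east-1c"], 24)
    for zone, cidr in plan.items():
        print(zone, cidr)
```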

Set up AWS RDS (Relational Database Service): RDS makes it easier to set up, operate, and scale a relational database in the cloud. AWS RDS supports six database engines: Amazon Aurora, Oracle, Microsoft SQL Server, PostgreSQL, MySQL and MariaDB.

Auto-scaling the application: Auto Scaling ensures that the correct number of Amazon EC2 instances is available to handle the load of running cloud applications. Users can specify the maximum number of instances in each Auto Scaling group, which ensures that the group never grows beyond that size.
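
The effect of a group's size limits can be shown with a simplified model. This is a sketch of the idea only, not the AWS API: the `ScalingGroup` class, the load figures and the per-instance capacity are hypothetical, and the real service decides when to scale through its own scaling policies.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingGroup:
    """Toy model of a scaling group with a lower and an upper size bound."""
    min_size: int
    max_size: int

    def desired_capacity(self, load: float, capacity_per_instance: float) -> int:
        # Instances needed to carry the load, rounded up to a whole instance.
        needed = math.ceil(load / capacity_per_instance)
        # Clamp the result so the group stays within [min_size, max_size].
        return max(self.min_size, min(self.max_size, needed))


if __name__ == "__main__":
    group = ScalingGroup(min_size=2, max_size=6)
    for load in (50, 350, 950):
        print(load, group.desired_capacity(load, capacity_per_instance=100))
```

With a maximum of six instances, a load that would call for ten instances still yields a group of six, while the minimum keeps two instances running even when the load is low.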

Add Route 53 to the mix: The final step of the architecture is adding Route 53 to the above mix. Route 53 is a highly available and scalable cloud Domain Name System (DNS) web service. It connects user requests to infrastructure running in AWS, such as Amazon EC2 instances, Elastic Load Balancing load balancers, or Amazon S3 buckets, and can also route users to infrastructure outside of AWS.
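
Route 53 can also be combined with health checks so that DNS answers move away from an unhealthy endpoint. The article does not describe this setup, so the following is only a conceptual model of health-checked failover written in plain Python. The `resolve` function, the health-check callables and the documentation-range IP addresses are all hypothetical, and no Route 53 API is called.

```python
from typing import Callable, Optional, Sequence, Tuple

# An endpoint is an address paired with a health check that returns True or False.
Endpoint = Tuple[str, Callable[[], bool]]


def resolve(endpoints: Sequence[Endpoint]) -> Optional[str]:
    """Return the first healthy address in priority order, or None if all fail."""
    for address, is_healthy in endpoints:
        if is_healthy():
            return address
    return None


if __name__ == "__main__":
    endpoints = [
        ("192.0.2.10", lambda: False),    # primary, currently failing its check
        ("198.51.100.20", lambda: True),  # secondary, healthy
    ]
    print(resolve(endpoints))
```

Listing the primary first and the standby second means traffic returns to the primary as soon as its health check passes again, without any change to the application.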

Leverage CloudFront: CloudFront is a web service that speeds up the delivery of static and dynamic web content such as .html, .php and image files. With CloudFront, users receive content with the lowest possible latency. The service improves performance by directing each user’s request to the edge location that can best serve it.