
"The key to confidence in your cloud infrastructure could lie in regular failure, introduced randomly by the 'Chaos Monkey'."

CONRAD THOMPSON, PA CLOUD EXPERT

A smarter solution: building with confidence in the cloud

The complexity and degree of integration inherent in IT systems that incorporate cloud computing mean that each new launch or upgrade can be a source of anxiety. Yet the key to having confidence in your cloud infrastructure could lie in regular failure, introduced randomly by the 'Chaos Monkey'.

What is a 'Chaos Monkey'?

The Chaos Monkey is a software tool, a kind of robotic servant, employed by Netflix, the US movie rental service. To check that their systems work reliably, Netflix let the Chaos Monkey randomly switch off services and server instances within their Amazon Web Services cloud infrastructure. This testing regime forces the system builders to design for resilience, because they know that their creations must survive unpredictable failures like those the Chaos Monkey introduces. Reliability increases as a result.
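
The idea can be illustrated with a minimal sketch in Python. It is not Netflix's implementation and does not call any real cloud API: the `Instance` class, the `chaos_monkey` function and the `kill_probability` parameter are all hypothetical names chosen for illustration, and the 'service' is assumed to survive as long as one instance is still running.

```python
import random


class Instance:
    """A stand-in for a server instance (hypothetical, not a real cloud API object)."""

    def __init__(self, name):
        self.name = name
        self.running = True


def chaos_monkey(instances, rng, kill_probability=0.2):
    """Randomly switch off running instances and return the names switched off."""
    killed = []
    for instance in instances:
        if instance.running and rng.random() < kill_probability:
            instance.running = False
            killed.append(instance.name)
    return killed


def service_is_up(instances):
    """Assumption: the service works while at least one instance is running."""
    return any(instance.running for instance in instances)


# Let the monkey loose on a small, made-up fleet.
fleet = [Instance(f"web-{i}") for i in range(5)]
killed = chaos_monkey(fleet, random.Random())
print("switched off:", killed)
print("service still up:", service_is_up(fleet))
```

Running a check like this regularly, rather than waiting for a real outage, is what pushes the design towards redundancy: a fleet of one would fail the check far more often than a fleet of five.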

Chaos Monkey and the cloud

The cloud is the natural habitat of the Chaos Monkey. Reliable cloud-based applications must tolerate changes in infrastructure while running. Cloud applications are typically assembled in tiers of multiple components, with the tiers able to 'autoscale' - growing or shrinking in response to varying demand.

Furthermore, since the individual underlying components (for example, Amazon EC2 server instances) run on commodity hardware, they can and will fail occasionally. A reliable application needs to cope with such failures, and careful design can help ensure that nobody notices.

Netflix do this well. For example, if their personalised recommendation function is unavailable, that section of the page can be replaced with a general list of popular films; from a user's perspective the website still works and they might not notice the drop in functionality.
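
This kind of graceful degradation might be sketched as follows. The code is an illustration only, not Netflix's actual code: the function names, the `service_up` flag used to simulate an outage and the film titles are all assumptions made for the example.

```python
# Hypothetical general list shown when personalisation is unavailable.
POPULAR_FILMS = ["Popular film A", "Popular film B", "Popular film C"]


class RecommendationUnavailable(Exception):
    """Raised when the (hypothetical) recommendation service cannot be reached."""


def personalised_recommendations(user_id, service_up=True):
    """Stand-in for a call to a recommendation service; service_up simulates an outage."""
    if not service_up:
        raise RecommendationUnavailable("recommendation service is down")
    return [f"Pick for {user_id} #{n}" for n in (1, 2, 3)]


def films_to_show(user_id, service_up=True):
    """Prefer personalised picks, but fall back to the general list on failure."""
    try:
        return personalised_recommendations(user_id, service_up)
    except RecommendationUnavailable:
        return POPULAR_FILMS


print(films_to_show("alice"))                    # personalised picks
print(films_to_show("alice", service_up=False))  # general list of popular films
```

The page section is filled either way, so the failure of one component does not become a failure of the whole page.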

Outsmarting the Chaos Monkey

When PA Consulting Group recently built a large-scale cloud application for a client, we gave careful consideration to failure scenarios within our design. Statistically rare failures can become regular issues in high-volume 'web scale' solutions unless you accept that failure is inevitable and design to handle it gracefully.
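
A little arithmetic shows why rare failures become routine at scale. The sketch below assumes that each request fails independently with the same small probability; the figures used (a one-in-a-million failure rate and ten million requests a day) are illustrative assumptions, not measurements from the project described above.

```python
def expected_failures(requests, failure_probability):
    """Average number of failures, assuming independent requests."""
    return requests * failure_probability


def probability_of_at_least_one_failure(requests, failure_probability):
    """Chance that at least one of the requests fails: 1 - (1 - p)^n."""
    return 1 - (1 - failure_probability) ** requests


# Illustrative figures only.
requests_per_day = 10_000_000
failure_probability = 1e-6

print("expected failures per day:",
      expected_failures(requests_per_day, failure_probability))
print("chance of at least one failure per day:",
      probability_of_at_least_one_failure(requests_per_day, failure_probability))
```

Under these assumptions a one-in-a-million event happens about ten times a day on average, and a day without any failure at all is very unlikely.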

One of the keys here is keeping your design as simple as possible. By minimising whatever you can, whether it's the 'chattiness' of the interfaces between your components or the complexity of your data set, you give the Chaos Monkey less chance to catch you out.

To find out more about how PA can help you build confidently in the cloud, please contact us now.