Disaster Recovery in Cloud Computing

This article gives a brief overview of what disaster recovery (DR) is, how it can be implemented on cloud computing platforms such as Microsoft Azure, Amazon Web Services (AWS) and Google Cloud Platform (GCP), and how these platforms compare.

What is disaster recovery?


Disaster recovery is the process of protecting data, and the ability to restore it, when a disaster strikes: a natural calamity such as an earthquake or fire, or a network or power outage that leaves data vulnerable to being destroyed or compromised. Data is the most important asset for business continuity, so protecting it is a top priority for any business. One method of disaster recovery is to back up data to a remote, off-site data center, so that if an on-premise server fails, the redundant copy can be restored and the data remains available. This approach has its own costs, however: the redundant servers sit underutilized when traffic is unpredictable, and setting up new servers when traffic rises is tedious. Setting up backup and DR in the cloud can therefore help companies transition smoothly away from physical data centers and cut costs considerably.


Basics of DR Planning


DR is a subset of business continuity planning. It begins with a business impact analysis, which defines two key metrics that must be evaluated to create a successful DR strategy -


RTO (Recovery Time Objective) - The maximum permissible length of time for which an application can remain offline. This value is usually defined as part of the SLA provided for the DR service in use.


RPO (Recovery Point Objective) - The maximum permissible period, measured back from a major incident, over which data may be lost from your application. RPO can vary with the type of data. For frequently used data the RPO could be less than a minute, whereas for less frequently used data it can be several hours. Note that RPO does not describe the amount or quality of the data that is lost; it only bounds the span of time whose data may be lost.
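The two metrics can be checked against a single incident with a few lines of Python. This is only a minimal sketch: the `RecoveryObjectives` class, the `evaluate_incident` function and all timestamps and targets below are hypothetical values chosen for illustration, not figures from any provider.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    """Targets from the business impact analysis (illustrative values)."""
    rto: timedelta  # maximum tolerated downtime
    rpo: timedelta  # maximum tolerated window of lost data


def evaluate_incident(objectives, last_backup, outage_start, service_restored):
    """Return (rto_met, rpo_met) for one incident."""
    # RTO is about how long the application stayed offline.
    downtime = service_restored - outage_start
    # RPO is about how far back the last recoverable copy of the data is.
    data_loss_window = outage_start - last_backup
    return downtime <= objectives.rto, data_loss_window <= objectives.rpo


# Hypothetical targets: at most 4 hours offline, at most 1 hour of lost data.
objectives = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))

rto_met, rpo_met = evaluate_incident(
    objectives,
    last_backup=datetime(2024, 1, 10, 13, 15),
    outage_start=datetime(2024, 1, 10, 14, 0),
    service_restored=datetime(2024, 1, 10, 16, 30),
)
print(rto_met, rpo_met)  # True True
```

Here the application was offline for 2.5 hours and the last backup was 45 minutes old, so both objectives are met. Moving the last backup to 12:00 would break the RPO without affecting the RTO, which shows that the two metrics are independent of each other.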



Disaster Recovery using the most common cloud platforms in market - A Comparative study


1. DR using the Microsoft Azure platform -

Microsoft Azure is increasingly popular among small and medium-sized enterprises, not only for hosting applications, data and users but also as a DR strategy that keeps their data secure and keeps them connected to business-critical data during a disaster or an outage such as a network or power failure. Azure provides a range of disaster recovery services that can be set up and kept ready to protect business assets and IT infrastructure 24x7. Azure Site Recovery replicates workloads that run on physical and virtual machines at a primary site to a secondary location. When an outage occurs at the company's primary site, the workloads fail over to the secondary site, where the applications remain available to users. When the primary site returns to normal, the workloads can be failed back to the primary location. Azure Site Recovery manages disaster recovery for Azure virtual machines, for on-premise virtual machines on VMware and Hyper-V, and for physical servers running Windows or Linux. Azure Site Recovery comes with a 99.9% SLA and 24x7 support to keep users' applications running smoothly during outages. Azure can serve as the secondary site for an on-premise primary location, or Site Recovery can be configured between Azure regions so that one region takes over during an outage in another. Users pay only for the compute and storage resources needed to run applications in Azure during outages. There are no start-up costs or termination fees, and billing for Azure Site Recovery itself is based on the number of instances protected.


DR using Azure is preferred by organizations over the traditional on-premise data center deployment option because of certain advantages that it provides. They are -

Reduced Operational Costs - There is no need to build or manage infrastructure, and companies pay only for the resources they actually use.


Simpler Deployment and Management - Azure's DRaaS lets its users avoid the complexity and costs associated with a conventional DR implementation.


Scalability - Azure’s DR procedure is a highly scalable solution and can be integrated with different environments and also can be scaled up or down as per operational requirements.


Minimal Downtime - Azure's DRaaS is backed by a 99.9% uptime SLA, so interruptions to the service are expected to be rare and short, although not impossible.
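To put the 99.9% figure in perspective, the downtime that such an SLA still permits can be worked out directly. The sketch below assumes the SLA is measured over a 30-day month; that measurement period is an assumption made for illustration, and the function name `allowed_downtime_minutes` is hypothetical.

```python
def allowed_downtime_minutes(sla_percent, period_days=30):
    """Minutes of downtime a given uptime SLA still permits over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


# Assumed measurement period: a 30-day month.
monthly = allowed_downtime_minutes(99.9)
print(round(monthly, 1))  # 43.2
```

In other words, a 99.9% SLA leaves room for roughly 43 minutes of downtime in a 30-day month, which is why it is better read as "minimal downtime" than as "no downtime".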


2. DR using AWS Cloud -


AWS supports four different DR strategies. They are as follows -


Backup and Recovery - To recover data after a disaster, it must first be backed up from the source systems to the AWS cloud at regular intervals. The backup method and frequency are chosen based on the RPO (Recovery Point Objective). For example, if a disaster strikes at 2 pm and the RPO is 1 hour, the most recent backup must be no older than 1 pm, so a restore recovers all data up to at least 1 pm. AWS offers AWS Direct Connect and AWS Import/Export, which help speed up backups. Very frequently changing data, such as stock market data, must be backed up with a low RPO, that is, at very short intervals. If the data does not change very often, periodic incremental backups are sufficient. Once the backup mechanisms are in place, the application software and operating systems can be pre-configured as AMIs (Amazon Machine Images). When disaster strikes, EC2 (Elastic Compute Cloud) instances launched from those AMIs, with EBS (Elastic Block Store) volumes attached, read the backed-up data from S3 (Simple Storage Service) buckets to revive the system and keep it working normally.


Pilot-Light Approach - This approach takes its name from a gas heater, where a small flame that is always lit can ignite the larger flames of the burner. In the preparation phase, the on-premise database servers copy their data to data volumes on AWS; this small, always-running core is the pilot light. The application and caching servers are kept as pre-configured AMIs, which can be updated periodically but are not running; they are the burner in the analogy. When the on-premise systems fail, the application and caching servers are started from these AMIs and users are rerouted to the cloud environment using Elastic IP addresses. Recovery then takes only a few minutes.


Warm Standby Approach - This technique is the next level up from the pilot-light approach and reduces recovery time further. The application and caching servers for business-critical activities are always running in the cloud, but on only a small number of EC2 instances. When disaster strikes, ELB (Elastic Load Balancing, which distributes traffic) and Auto Scaling are used to scale the environment up, and user traffic is rerouted using Amazon Route 53 or Elastic IP addresses, so the system recovers with very little downtime.


Multi-site Approach - This is the next step beyond warm standby. The initial setup is similar to warm standby, except that the AWS environment also handles a portion of the live user traffic, distributed using Route 53. When a disaster happens, the rest of the user traffic is rerouted from the on-premise servers to the AWS cloud, and Auto Scaling brings up additional EC2 instances to handle the full production workload. The availability of the multi-site solution can be increased further by implementing Multi-AZ architectures.
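The traffic split in the multi-site approach can be pictured with a small model of weighted routing. This is a simplified sketch in plain Python and not a call to the Route 53 API: the `traffic_shares` function, the site names and the 80/20 weights are all hypothetical. It assumes that sites marked unhealthy receive no traffic and that the remaining traffic is divided among healthy sites in proportion to their weights.

```python
def traffic_shares(weights, healthy):
    """Return each site's share of user traffic, counting only healthy sites.

    weights: mapping of site name -> routing weight (illustrative values)
    healthy: set of site names currently passing health checks
    """
    active = {site: w for site, w in weights.items() if site in healthy and w > 0}
    total = sum(active.values())
    if total == 0:
        raise RuntimeError("no healthy site available to take traffic")
    # Unhealthy sites get a share of 0; healthy sites split traffic by weight.
    return {site: active.get(site, 0) / total for site in weights}


# Hypothetical split: on-premise serves most traffic, AWS serves a portion.
weights = {"on_premises": 80, "aws": 20}

normal = traffic_shares(weights, healthy={"on_premises", "aws"})
failover = traffic_shares(weights, healthy={"aws"})

print(normal)
print(failover)
```

In normal operation both sites serve users in proportion to their weights. Once the on-premise site drops out of the healthy set, the AWS environment receives all of the traffic, and Auto Scaling then has to add enough EC2 instances to carry that load.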


3. DR using GCP


GCP fares better than the other cloud platforms in terms of DR because it addresses the complicated planning requirements related to physical and network infrastructure, scalability, security, bandwidth and other parameters. Applications and data in GCP use the same security model as Google's own applications, which makes them reliable and secure. Simplified administration through GCP also lowers the cost of managing complex applications. GCP is also considered environment-friendly, because the majority of GCP data centers typically use half the energy consumed by typical data centers.


Advantages of disaster recovery using GCP -


GCP has a number of features that make it a better disaster recovery platform than its competitors. They are -


Better Pricing - Users of GCP pay only for the compute time they use, billed in minute-level increments. Long-running workloads also receive discounted pricing.


Tiered Cloud Network - GCP is the first major public cloud to provide a tiered cloud network with a well-provisioned, low-latency global network which ensures continuous traffic flow even in times of disruption.


Live Migration - GCP provides live migration for virtual machines, a feature that is not offered by either AWS or Azure. Live migration allows maintenance tasks such as patching, repairing and updating the underlying software and hardware to be carried out without rebooting the virtual machine.


Security - GCP offers advanced security features, which include encryption of all data using 256-bit AES and compliance with enterprise security standards and regulations such as SSAE 16, ISO 27017, ISO 27018, PCI and HIPAA.


Redundant Backups - GCP provides multiple storage options such as Coldline storage, Nearline storage, regional storage and multi-regional storage. Google documents Cloud Storage as designed for 99.999999999% (eleven nines) annual durability, so backed-up data is very unlikely to be lost during a disaster.


Wide Network - Because GCP is backed by Google's widespread and advanced global network, it can offer its users fast, scalable and consistent performance even in times of disaster.


Scalability - GCP can scale like any other application that Google has designed and is backed by managed services that enable it to withstand the sudden rise or fall in online traffic.
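The effect of the minute-level billing mentioned under Better Pricing can be illustrated with a short calculation. The sketch below is illustrative only: the `compute_cost` function is hypothetical, the hourly rate of 0.10 is a made-up number rather than a real GCP price, and discounts for long-running workloads are ignored.

```python
import math


def compute_cost(seconds_used, hourly_rate, increment_seconds=60):
    """Cost of a run when usage is rounded up to whole billing increments."""
    increments = math.ceil(seconds_used / increment_seconds)
    billed_hours = increments * increment_seconds / 3600
    return billed_hours * hourly_rate


# A DR test that runs for 95 minutes and 20 seconds at a hypothetical rate.
run_seconds = 95 * 60 + 20

per_minute = compute_cost(run_seconds, hourly_rate=0.10)  # billed as 96 minutes
per_hour = compute_cost(run_seconds, hourly_rate=0.10, increment_seconds=3600)  # billed as 2 hours

print(round(per_minute, 2), round(per_hour, 2))
```

With per-minute increments the run is billed as 96 minutes, whereas whole-hour increments would bill it as 2 hours. The difference matters for DR because recovery environments are often started only for short tests or short outages.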



