The answer is: almost! Even the smallest business, if it relies on IT systems for critical operations such as sales, production, industrial control systems, tax processing or electronic receipts, should have some kind of DR capability in place.
Depending on each system’s Recovery Time Objective (RTO) and Recovery Point Objective (RPO), that capability could range from an off-site tape shipping strategy to an automated geo clustering technology. The most widely used strategy in SMBs and enterprises is backup and recovery, even when it doesn’t meet RPO/RTO expectations. Reasons vary, but the most common ones I’ve heard range from not having a specialized or dedicated IT professional who could identify and explain this need, to not having enough budget to invest in an IT continuity strategy. Also, many think they don’t need it, or that it’s only for very large companies.
A business of any size could go out of business for lack of a DR strategy. Imagine that a fire destroys your server room and you lose your tax processing system, or your sales control system. You may say, “That’s why we have backup in place!” but remember, many businesses keep their backups on a device close to, or sometimes on top of, the production systems (when not in the free disk space of the servers themselves). That is one of the worst cases, but you may have an off-site backup copy to save you. Assuming you have that copy, the next question is:
- How long does it take you to recover your systems back to production, stable enough to resume processing your sales orders and taxes (include here the time required to get new servers, if needed)?
When you have the answer to this question, think about the questions below:
- Does this time fit your RTO, so that the financial impact caused by the downtime will not bankrupt your company?
- Are the transactions you lose by restoring from the point-in-time copy you have acceptable to your business?
If you answer yes to both of the questions above, a backup/recovery strategy might be enough for your company. If you answer no to either or both of them, you had better think about adding a new layer of DR to your most relevant systems.
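The questions above can be turned into a quick check. The following is only a minimal sketch in Python: the class, the function and all of the numbers are hypothetical, and it assumes the worst-case data loss equals the interval between backup copies.

```python
from dataclasses import dataclass


@dataclass
class RecoveryEstimate:
    """Hypothetical estimate of a restore from the off-site backup copy."""
    restore_hours: float          # time to restore data and bring systems up
    hardware_lead_hours: float    # time to obtain new servers, if needed
    backup_interval_hours: float  # time between backup copies (worst-case data loss)


def backup_is_enough(est: RecoveryEstimate, rto_hours: float, rpo_hours: float) -> bool:
    """Return True only if backup/recovery alone meets both RTO and RPO."""
    downtime = est.restore_hours + est.hardware_lead_hours
    meets_rto = downtime <= rto_hours
    meets_rpo = est.backup_interval_hours <= rpo_hours
    return meets_rto and meets_rpo


# Example (made-up figures): nightly backups, 8 hours to restore,
# 48 hours to get replacement servers, against a 24-hour RTO and 4-hour RPO.
estimate = RecoveryEstimate(restore_hours=8, hardware_lead_hours=48,
                            backup_interval_hours=24)
print(backup_is_enough(estimate, rto_hours=24, rpo_hours=4))
```

With these example figures the check fails on both counts, which is the situation where an extra DR layer is worth considering.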
You’ll need another server room at a remote location (it might be a branch office or a colocation datacenter), duplicate your most critical systems there, and set up a connection between the two sites with some kind of regular data update.
That’s what large companies usually do.
It is often so expensive that, even when the ROI is positive, many SMB and enterprise business owners prefer to take the risk and keep things as they are. Hopefully, the mindset of “It will never happen to me!” works forever.
In recent years, a new option has become popular, especially for SMB and enterprise businesses: leveraging the “potential” power of the public cloud. If I were to open a new business today, of any size, I would definitely consider deploying most of the IT infrastructure it needs in the cloud. It requires much less investment to build and scales more easily, you pay for what you use, and you normally get most of the resilience and redundancy that big companies are able to build on their own (remember, no continuity strategy today can safely replace backup; it is always your last layer of protection).
Let’s think about why I said most of the IT infrastructure, and not all of it, in the public cloud.
Some business operations might be highly impacted by the latency and bandwidth variation of a public internet link. Also, if the internet providers available to you are not reliable enough in terms of availability and quality of service, like the ones we have in Brazil, where connections drop constantly, you will probably need to keep your most critical systems in house so that your operations do not stop because the link is unavailable.
So, how do you protect these systems if you should not put them in production on the public cloud?
You could implement DR to the cloud for them, meaning that you run those systems in house under normal conditions but are able to quickly fail them over (or take them over) to the cloud in case of issues in your own environment.
You could implement that by using some kind of regular geo clustering or geo replication solution like Symantec Storage Foundation HA/DR (SFHA/DR), Oracle Data Guard, SQL Always ON, SQL/Exchange DAG, and so on. All of the ones I mentioned except SFHA/DR are application specific and will work only for that application.
Using a regular geo clustering/geo replication solution increases the cost of the cloud, because it usually requires a 1 to 1 relationship, meaning that for each protected system you will have one corresponding virtual machine (instance) running in the cloud. Because in cloud services you pay for what you use (processing power, memory and storage space), it’s probably cheaper than building it yourself, but it’s not really cost effective.
Symantec has partnered with Microsoft to address that, and created a new offering called Disaster Recovery Orchestrator for Microsoft Azure (DRO).
DRO was built with the mindset of delivering easy-to-use, cost-effective DR for SMBs and enterprises of all sizes. It can handle RPOs/RTOs of minutes or even seconds, without the need to keep many instances running in the cloud. When requested, DRO deploys a new instance, attaches the data, starts the application and is capable of reconfiguring whatever is needed, like DNS or Active Directory. It orchestrates all aspects of moving an application from your site to Azure, and back.
With that, you could have many on-premises systems replicating to one instance in Azure, a relationship of N to 1. This means that, while things are normal, you are running your critical production systems on premises and DRO is updating the data on Azure almost continuously. You are paying for only one instance running on Azure, instead of dozens. If you experience an outage, like failed hardware, a corrupted OS or a fire in your server room, you are able to quickly take those systems over to run on Azure (with 2 clicks), with your most recent data in place. This way, you can keep your business operations running with minimum downtime while you fix the original outage in your environment. After that, you can easily fail back to your on-premises servers (again with 2 clicks), because DRO sets replication back up if the systems are available, or keeps track of the changes to apply them after the systems are recovered.
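To see why the N to 1 relationship matters for the bill, here is a rough sketch in Python comparing it with the 1 to 1 approach. The function and every price in it are hypothetical and are not Azure or DRO figures. It also assumes that storage is paid per protected system in both cases and that the single standby instance costs the same as any other instance.

```python
def monthly_cloud_cost(protected_systems: int,
                       instance_cost: float,
                       storage_cost_per_system: float,
                       one_to_one: bool) -> float:
    """Estimate the monthly cost of keeping DR copies in the cloud.

    one_to_one=True  -> one running instance per protected system
    one_to_one=False -> a single running instance receives all replicas (N to 1)
    """
    running_instances = protected_systems if one_to_one else 1
    compute = running_instances * instance_cost
    storage = protected_systems * storage_cost_per_system
    return compute + storage


# Made-up figures: 12 protected systems, 100.0 per instance per month,
# 20.0 of replicated storage per system per month.
geo_cluster = monthly_cloud_cost(12, 100.0, 20.0, one_to_one=True)
n_to_one = monthly_cloud_cost(12, 100.0, 20.0, one_to_one=False)
print(geo_cluster, n_to_one)  # 1440.0 340.0
```

The sketch leaves out licensing and the cost of the instances started during a real takeover or a test, so treat it as a way to frame the comparison rather than as a quote.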
Another important aspect of having a DR strategy is that you must test it from time to time to ensure it will work when needed. Usually, DR tests require downtime while testing. DRO has a feature, which comes from SFHA/DR, that allows testing without disrupting production systems. You can test your DR environment with no impact on production, so your business keeps running while you make sure your DR infrastructure is healthy.
For more details and trial: http://www.symantec.com/disaster-recovery-orchestrator
Webcast Microsoft: https://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032589562&Culture=en-US&community=0