Moving along our organizational resilience journey, we focus on disaster recovery (DR), the perfect follow-up to business continuity (BC). The two go hand-in-hand, often referenced as BCDR, and both are key to your cyber resilience planning. If you recall from the previous piece, NIST SP 800-34 calls out a separate disaster recovery plan, as it supports business continuity and continuity of operations plans.
Small aside, the difference between a business continuity plan and a continuity of operations plan is that the latter focuses on restoring essential functions in the short term (think 30 days or less).
Before we jump in, think about it like this: if BC is about backup strategies for people, DR is about backup strategies for all the stuff people rely on. Therefore, you must understand your business drivers if you want to make an effective plan.
Cyber resilience for disaster recovery: How is it changing?
This is a good time to discuss disaster recovery (DR), as it is going through a change of sorts. Let’s start with some classic definitions of DR plans (bold my emphasis):
- NIST SP 800-34: A written plan for restoring one or more information systems at an alternate facility in response to a major hardware or software failure or destruction of facilities.
- NIST SP 800-82: A written plan for processing critical applications in the event of a major hardware or software failure or destruction of facilities.
- DRI International (Disaster recovery, not disaster recovery plan): The technical aspect of business continuity. The collection of resources and tasks to restore IT services (including components such as infrastructure, telecommunications, systems, applications and data) at an alternate site following a disruption of IT services. Disaster recovery includes the resumption and restoration of those tasks at a more permanent site.
Side note: DRI International defines ‘disaster recovery plan’ the same as NIST 800-34, but you will see shortly why I picked the broader meaning.
What jumps out at you? Keep those bolded words in mind for later in the piece.
Is high availability always good for cyber resilience?
In a word: yes. But there are risks that come with high availability, mostly related to ransomware. Therefore, don’t fall into the trap of thinking it is a true disaster recovery strategy by itself. NotPetya proved how fast systems can be wiped out: IT workers were “frantically notifying everyone they could” to shut down systems and stop the spread. Remember, efficiency can favor the attacker too. That’s why efficiency (ease) and security (redundancy) should always be weighed side-by-side when trying to harden systems.
Also remember this: getting your infrastructure up is probably the easy part of disaster recovery; having your data available for use may be the real challenge. You have heard the saying ‘cash is king’, right? Well, data is king too; it’s today’s most valuable currency.
The things you will always need to do
Regardless of which security framework you end up using, there are a couple of things you must do each and every time to check your cyber resilience:
- Running a business impact analysis
- Determining recovery time objective (RTO), recovery point objective (RPO) and maximum tolerable downtime (MTD). The last of these is sometimes referred to as maximum tolerable period of disruption or maximum acceptable outage.
These terms are generally self-explanatory, though you can find detailed definitions at NIST Computer Security Resource Center Glossary or the ISO 22300:2021 Security and Resilience Vocabulary (Section 3.1). Check both to determine what is right for you.
Massive pro tip: definitions matter. Cyber resilience planning requires multiple stakeholders. Some of them may have no major security, safety or resilience needs as part of their jobs and may be all about efficiency. Therefore, get on the same page with them.
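To make the relationship between recovery time objective (RTO), recovery point objective (RPO) and maximum tolerable downtime (MTD) concrete, here is a minimal Python sketch. It assumes a simple reading of the terms: RTO should not exceed MTD, backups should run at least as often as the RPO, and a tested restore should finish within the RTO. The class, function and example values are all hypothetical and only illustrate the idea; use the definitions your stakeholders have agreed on.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryTargets:
    """Targets agreed with stakeholders for one system (hypothetical structure)."""
    rto: timedelta  # recovery time objective: how quickly the system must be restored
    rpo: timedelta  # recovery point objective: how much recent data loss is tolerable
    mtd: timedelta  # maximum tolerable downtime for the business function


def check_targets(targets: RecoveryTargets,
                  backup_interval: timedelta,
                  tested_restore_time: timedelta) -> list[str]:
    """Return plain-language gaps; an empty list means no gap was found."""
    gaps = []
    if targets.rto > targets.mtd:
        gaps.append("RTO exceeds MTD")
    if backup_interval > targets.rpo:
        gaps.append("backup interval exceeds RPO")
    if tested_restore_time > targets.rto:
        gaps.append("tested restore time exceeds RTO")
    return gaps


# Illustrative values only, not recommendations.
payroll = RecoveryTargets(rto=timedelta(hours=4),
                          rpo=timedelta(hours=1),
                          mtd=timedelta(hours=24))
gaps = check_targets(payroll,
                     backup_interval=timedelta(hours=6),
                     tested_restore_time=timedelta(hours=3))
print(gaps)  # ['backup interval exceeds RPO']
```

In this example the six-hour backup schedule cannot satisfy a one-hour RPO, which is exactly the kind of mismatch a shared set of definitions helps surface early.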
Disaster recovery plans
Remember the words in bold? Well, we are going to talk about them in just a moment. Before we do, let’s list out some pretty obvious recovery strategies:
- Internal backup facilities: Great to have, but could be costly and could go down with your primary system.
- Hot site: Off-premises location ready to go. Again, helpful for cyber resilience but can be costly and you may have to rely on a third party. If you are sharing resources with others, what place in line do you get if everybody gets knocked down?
- Active/Active: Geographically clustered and real-time data replication. A nice strategy, but could be a costly one.
This is a tiny list, so check out this IBM and Tivoli Storage Manager page, which explains the seven tiers of disaster recovery. Once you’ve gotten that, we want to get to the fun stuff, namely cyber resilience in the cloud.
Whatever strategy you end up using, remember: it’s useless if you cannot access your data. So remember your cyber hygiene and back up, back up and then back up some more, because you need data. A data center and a running application are useless without your data.
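A backup only helps if the restored data is usable. One simple, partial check is to record a hash when the backup is taken and compare it after a test restore. The sketch below uses Python's standard hashlib module; the file names and the two helper functions are hypothetical, and a real restore test would also confirm that the application can actually open and use the data.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(restored_file: Path, expected_sha256: str) -> bool:
    """True if the restored file's hash matches the hash recorded at backup time."""
    return sha256_of(restored_file) == expected_sha256


# Demo with throwaway files standing in for a backup and its restored copy.
with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "ledger.db"
    original.write_bytes(b"example records")
    recorded_hash = sha256_of(original)  # would be stored alongside the backup

    restored = Path(workdir) / "ledger_restored.db"
    restored.write_bytes(original.read_bytes())  # stand-in for a real restore
    intact = restore_matches(restored, recorded_hash)

    restored.write_bytes(b"corrupted")  # simulate a damaged restore
    damaged = restore_matches(restored, recorded_hash)

print(intact, damaged)  # True False
```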
Cyber resilience in the cloud
What word got some bold shout-outs earlier? FACILITIES! We are nearing the end of this piece and we haven’t even mentioned the cloud yet. So, the trillion dollar question: is the cloud a suitable disaster recovery strategy?
Well, just like when you ask an accountant “What does 2 + 2 equal?”, the answer is: “It depends.”
Now, let’s up the ante: “Do I need a disaster recovery plan if I am operating in the cloud?”
Cyber resilience in the cloud changes our approach. Disaster recovery in the past has focused on facilities, data centers, colocation sites, third-party vendors, suppliers and so on. Those are not exactly aligned with the cloud, right? The considerations and risks are different, from operations to availability zones to data destruction practices.
Questions to ask regarding cloud providers
The questions, therefore, change. Ask yourself:
- Can I transfer to another cloud provider? This is perhaps easier said than done, even more so if you have built your application natively in the cloud. You may be stuck in vendor lock-in because certain tools do not exist at, or cannot transfer to, another provider. Think of the days when your mobile phone was locked into one carrier. You got a subsidized piece of hardware, but couldn’t take it to another network. There is always a cost.
- What guarantees do I have? Imagine a public cloud service provider goes down. The impact will be widespread and they will work to get you back up and running, but what you need to know is: when do I go back up? Do you know? Can you find out? Go back to the first bulleted question: if you are locked with a service provider — whatever the reasons — you may be at their mercy and you need to consider that.
- Will it matter where backup resides? Ask before disaster strikes. If your cloud provider says, “We can get you up and running, but your data will no longer be in your home country; we need to ship it off across an ocean to operate,” are you able to operate like that? What are the legal implications? Terms like ‘region,’ ‘availability’ and ‘service’ all need to be well-defined before you throw all your eggs into the cloud.
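The data residency question above lends itself to a simple check you can run before disaster strikes. The Python sketch below assumes you keep an inventory mapping each backup location to the country where the data is stored; the location labels, country codes and function name are hypothetical, and the legal question of which countries are acceptable is one for your counsel, not for code.

```python
def residency_violations(backup_locations: dict[str, str],
                         allowed_countries: set[str]) -> dict[str, str]:
    """Return the backup locations whose country is not on the allowed list."""
    return {
        name: country
        for name, country in backup_locations.items()
        if country not in allowed_countries
    }


# Hypothetical inventory: location label -> country code where data is stored.
locations = {"primary": "CA", "dr-replica": "CA", "archive": "US"}
violations = residency_violations(locations, allowed_countries={"CA"})
print(violations)  # {'archive': 'US'}
```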
Our journey through organizational resilience, focused on reducing your cyber risk, continues. With key concepts like business continuity and disaster recovery under your belt, you are building to bend while others break.
Senior Director, Educator and Author