Most businesses are critically reliant upon their IT systems. If these systems go down due to a natural disaster, temporary power outage, loss of a data center, ransomware or hacker attack, lost or corrupted files, or an application failure caused by malware, the result can be significant financial harm. In the worst case, the business will be unable to continue functioning. IT resilience supports business continuity in a crisis, but only if executed with forethought and diligence. Here are six tips you can use to make your IT more resilient:

1. Understand your business parameters

Each business has its own risk tolerance thresholds, and therefore its own protection and recovery needs. These factors determine how an organization should prepare for an unexpected outage. To understand your business's parameters, first identify your critical systems. Then ask yourself: How much data loss and downtime can I withstand for my critical and non-critical systems? What’s an adequate protection schedule? What would be the impact of being down for an hour, a day, or a week? Can I quantify a potential outage in terms of lost revenue, lost customers and decreased employee productivity?

Answer these questions and you’ll be ready to build a continuity system that works.
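As a rough illustration of quantifying an outage, the sketch below estimates lost revenue and lost productivity for outages of an hour, a day, and a week. Every figure and name in it (OutageAssumptions, estimate_outage_cost, the sample numbers) is a hypothetical placeholder, and it leaves out harder-to-measure effects such as lost customers.

```python
from dataclasses import dataclass


@dataclass
class OutageAssumptions:
    """Hypothetical inputs; replace with figures from your own business."""
    revenue_per_hour: float      # revenue that stops while systems are down
    affected_employees: int      # staff who cannot work during the outage
    loaded_cost_per_hour: float  # hourly cost of one affected employee
    productivity_loss: float     # fraction of work lost, from 0.0 to 1.0


def estimate_outage_cost(a: OutageAssumptions, hours_down: float) -> float:
    """Return a rough cost of an outage lasting `hours_down` hours."""
    lost_revenue = a.revenue_per_hour * hours_down
    lost_productivity = (
        a.affected_employees
        * a.loaded_cost_per_hour
        * a.productivity_loss
        * hours_down
    )
    return lost_revenue + lost_productivity


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any real business.
    assumptions = OutageAssumptions(
        revenue_per_hour=2_000.0,
        affected_employees=50,
        loaded_cost_per_hour=40.0,
        productivity_loss=0.5,
    )
    for label, hours in [("1 hour", 1), ("1 day", 24), ("1 week", 168)]:
        cost = estimate_outage_cost(assumptions, hours)
        print(f"{label}: {cost:,.0f}")
```

Running the same estimate separately for critical and non-critical systems gives a simple basis for deciding how much downtime each can tolerate.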

2. Enumerate the possible scenarios that will cause downtime for your business

By establishing the possible scenarios that can cause downtime, you can plan appropriately for each case. When planning for a crisis, most businesses assume a natural disaster-type event will be the culprit. Expanding this assumption to include other scenarios, including ransomware attacks, server failures, power outages and user errors (like accidental deletion of data), will lead to a more robust and comprehensive continuity strategy.

3. Specify the technical requirements that will ensure recovery is attainable

Key technical requirements might include items like:

  • Point-in-time snapshots to allow going back in time and recovering the state of data and systems.
  • Full image protection to recover an entire failed system rather than having to rebuild it from scratch, which would mean installing the operating system, applying maintenance patches, installing applications, restoring data, and so on.
  • Failover to quickly bring up systems while IT works to restore primary infrastructure, such as restoring power or replacing hardware.
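To make the point-in-time snapshot requirement in the list above more concrete, here is a minimal sketch of choosing which snapshot to recover from and measuring how much data would be lost. The function names and the six-hour schedule are assumptions made for illustration; they do not describe any particular backup product.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from typing import Optional, Sequence


def latest_snapshot_before(
    snapshots: Sequence[datetime], incident: datetime
) -> Optional[datetime]:
    """Return the newest snapshot taken at or before `incident`, or None.

    `snapshots` must be sorted in ascending order.
    """
    index = bisect_right(snapshots, incident)
    return snapshots[index - 1] if index > 0 else None


def data_loss_window(
    snapshots: Sequence[datetime], incident: datetime
) -> Optional[timedelta]:
    """Return the time between the usable snapshot and the incident."""
    snapshot = latest_snapshot_before(snapshots, incident)
    return None if snapshot is None else incident - snapshot


if __name__ == "__main__":
    # Hypothetical schedule: one snapshot every six hours.
    start = datetime(2024, 1, 1, 0, 0)
    schedule = [start + timedelta(hours=6 * i) for i in range(4)]
    incident_time = datetime(2024, 1, 1, 14, 30)

    print("Recover from:", latest_snapshot_before(schedule, incident_time))
    print("Data lost:", data_loss_window(schedule, incident_time))
```

Comparing the resulting window with the amount of data loss the business can tolerate shows whether the protection schedule is frequent enough.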

4. Implement a protection solution

When implementing a protection solution, weigh the cost versus coverage in your decision-making process. Apply a benefit-risk analysis, similar to what one does when purchasing insurance, to investigate coverage against the risk tolerance for your business. Next, consider how seamless your total solution needs to be from an IT administrative and recoverability perspective. This will enable you to decide if you’d prefer to use a single-vendor solution or aggregate solutions from multiple suppliers.

Another fundamental decision when implementing a protection solution is whether to replicate data and systems to secondary hardware, which will incur up-front capital expenditures and ongoing maintenance costs, or to shift to the on-demand computing advantage and operating costs of the cloud.
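The trade-off between secondary hardware and the cloud can be sketched as a simple cumulative cost comparison. The cost model and all figures below are assumptions chosen for illustration; a real analysis would also account for factors such as hardware refresh cycles, staffing, and data transfer charges.

```python
def secondary_hardware_cost(
    capex: float, annual_maintenance: float, years: int
) -> float:
    """Up-front purchase plus ongoing maintenance over `years`."""
    return capex + annual_maintenance * years


def cloud_cost(
    monthly_storage_fee: float,
    compute_rate_per_hour: float,
    failover_hours_per_year: float,
    years: int,
) -> float:
    """Recurring storage fees plus on-demand compute used during failover."""
    annual = monthly_storage_fee * 12 + compute_rate_per_hour * failover_hours_per_year
    return annual * years


if __name__ == "__main__":
    # Hypothetical figures; substitute quotes from your own vendors.
    for years in (1, 3, 5):
        hardware = secondary_hardware_cost(
            capex=40_000.0, annual_maintenance=4_000.0, years=years
        )
        cloud = cloud_cost(
            monthly_storage_fee=1_000.0,
            compute_rate_per_hour=5.0,
            failover_hours_per_year=100.0,
            years=years,
        )
        print(f"{years} yr: hardware {hardware:,.0f} vs cloud {cloud:,.0f}")
```

Because the hardware option front-loads its cost while the cloud option spreads it out, the cheaper choice can change depending on the time horizon you evaluate.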

5. Document protection and recovery procedures and keep them current

The midst of a crisis is the worst time to try to figure out recovery protocols. From a disaster readiness perspective, a mature IT organization has a clearly documented protection and recovery plan with detailed, current procedures. Having an executable recovery plan in place enables swift, direct action and reduces interruption to your business.

6. Test the solution regularly

Building IT resilience for business continuity requires sustained attention. Recovery protocols need to be run through regularly to ensure adequate breadth and depth of test coverage and to uncover any gaps or technical issues before a crisis hits. Changes in primary infrastructure will likely affect your solution, so the only way to ensure readiness is through frequent testing. IT resilience is dynamic, and your solution needs continued attention to ensure successful recovery.

The original article was posted on CSO Online here.