Preparing for Disaster Recovery Pt 1.

Disaster Recovery and Business Continuity (DRBC) traditionally means keeping a complete reproduction of a production environment in a different geographical location so that business can continue in the event of a disaster. This used to mean preparing the hardware and applications in advance, at great cost. With cloud-based architectures and DevOps tools for provisioning automation, we can instead build tooling that provisions a production environment dynamically, reducing the overall cost of keeping a full DRBC plan in place.

If you haven’t taken the time to document your disaster recovery plan in detail before taking your environment live, then when the need arises you’ll find that recovering reactively is a difficult and complicated process. Think of your disaster recovery plan as the total timeline to bring your services back online with no dependencies. If you’ve already built your production environment, ask yourself “How quickly can I do this again?” and “Is there anything I can do to speed up the process?”

I typically document the manual steps for building out the entire project. The result is a linear list of steps that we can then seek to automate, reducing both our build time and our susceptibility to error. For this example website application, I like to address the following requirements for each environment:

Regions

What hosting regions are in use for this site? Is this site a multi-region architecture?

Domains and sub-domains in use

What are the domains assigned by default?

What are the custom domains assigned to this site?

What are the custom sub-domains assigned to micro-sites?

SSL Certificates required

Is HTTPS in use? Is it required?

Is an SSL certificate installed? If yes, for which domains?

Number of public IPs needed

How many public IPs are needed for this architecture?

vnet size

How large a network do we need to create? Note that App Services do not require vnets.

subnet_X size

How large must each subnet be?
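The sizing questions above can be answered with a little arithmetic. Here is a minimal sketch using Python's standard `ipaddress` module. The address range, the function names, and the `reserved=5` default are my own assumptions for illustration (Azure reserves five addresses in every subnet; other platforms differ), so adjust them to your provider.

```python
import ipaddress
from itertools import islice


def plan_subnets(vnet_cidr, subnet_prefix, count):
    """Return the first `count` subnets of the given prefix length inside a vnet."""
    vnet = ipaddress.ip_network(vnet_cidr)
    available = 2 ** (subnet_prefix - vnet.prefixlen)
    if count > available:
        raise ValueError(f"{vnet} only holds {available} /{subnet_prefix} subnets")
    return list(islice(vnet.subnets(new_prefix=subnet_prefix), count))


def usable_hosts(subnet, reserved=5):
    """Addresses left for workloads after the platform's reserved addresses.

    reserved=5 is an assumption matching Azure; change it for other platforms.
    """
    return max(subnet.num_addresses - reserved, 0)
```

For example, `plan_subnets("10.0.0.0/16", 24, 3)` carves three /24 subnets out of a /16, and `usable_hosts` on any of them leaves 251 addresses under the Azure assumption.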

DNS Transaction

Query custom domains for resolution

TTL

The TTL affects both the total number of queries and how quickly records can be updated. Typical values are 1 day or 1 hour. A TTL of 5 minutes is useful for migrations, but if it is left that low it can use up the available query allowance.

Total queries

Total queries may not be available, but we can try to identify the DNS provider and compare against its known defaults.

Response time

DNS response time can indicate that lookups are being throttled or have reached their maximum concurrent capacity.
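To make the TTL trade-off concrete, here is a rough sketch of the arithmetic. It assumes the worst case, where every caching resolver re-queries a record the moment its TTL expires. The function name and the `resolvers` parameter are hypothetical, and real traffic is usually lower than this bound.

```python
import math

SECONDS_PER_DAY = 86_400


def max_queries_per_day(ttl_seconds, resolvers=1):
    """Upper bound on daily lookups for one record.

    Assumes each caching resolver refreshes the record once per TTL period.
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return resolvers * math.ceil(SECONDS_PER_DAY / ttl_seconds)
```

Under that assumption a 1 day TTL costs at most 1 query per resolver per day, a 1 hour TTL costs 24, and a 5 minute TTL costs 288, which is why a migration TTL should be raised again afterwards.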

Endpoints (firewall / load balancer / IP whitelisting)

Identify exposed services: HTTP, HTTPS, FTP, Web Deploy, SSH

Firewall Rules

Identify any open ports: common services are 80 (HTTP), 443 (HTTPS), 21 (FTP), 8172 (Web Deploy), 22 (SSH)

Load Balancing Rules

If we have multiple nodes, is the load being distributed? Can we identify the LB logic?

IP restrictions / whitelisting

Do IP-restricted zones actually respond as restricted?

SSL decoding (termination)

Is the SSL certificate installed on the app services or on the load balancer?
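Identifying exposed services can be partly automated. Below is a minimal sketch that attempts a TCP connection to each port using only the standard library. The function names and the default port list are my own choices for illustration; extend the list with whatever other services you identified, and only probe hosts you are authorized to test.

```python
import socket

# Illustrative defaults; add further ports for your own services.
DEFAULT_PORTS = {80: "HTTP", 443: "HTTPS", 21: "FTP", 22: "SSH"}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_endpoints(host, ports=DEFAULT_PORTS):
    """Map each port number to whether it accepted a connection."""
    return {port: is_port_open(host, port) for port in ports}
```

Running `check_endpoints` from inside and outside a whitelisted range is a simple way to see whether the IP restrictions behave as documented. A successful connection only shows that something is listening, not which service it is.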

Network

What are the application IPs, database IPs, and other service IPs?

vnets or NSGs (IP ranges)

subnets

Security groups

Verify that NSGs or ACLs reflect the firewall rules and IP restrictions identified.
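One way to run that verification is to check each address collected earlier against the ranges a security group allows. This is a sketch only: the helper names are hypothetical, and the addresses in the usage note come from the reserved documentation ranges rather than any real environment.

```python
import ipaddress


def is_allowed(source_ip, allowed_cidrs):
    """Return True if source_ip falls inside any of the allowed CIDR ranges."""
    ip = ipaddress.ip_address(source_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


def uncovered_ips(known_ips, allowed_cidrs):
    """List the documented IPs that no allowed range covers."""
    return [ip for ip in known_ips if not is_allowed(ip, allowed_cidrs)]
```

For instance, `uncovered_ips(["203.0.113.10", "198.51.100.7"], ["203.0.113.0/24"])` returns `["198.51.100.7"]`, flagging an address that the documented whitelist expects but the rule set does not cover.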

Application

Check for known application pages

Config files, connection strings

IIS / web server

(versions, features, permissions)

.NET / stack

(versions, app pools, permissions)

Deployment Dependencies

(Web Deploy / Octopus agent / SFTP)

Service Dependencies

Database Server

(versions, ports, users, permissions, transaction read/write times)

Search Server

(IP, index strategy)

NoSQL or other application database servers

CDN

(cache control headers, origin urls, cache clearing)
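When recording the CDN configuration, it helps to capture the cache control headers in a structured form so they can be compared after a rebuild. A small sketch follows; the function name is hypothetical and the parser handles only the common comma-separated directive form.

```python
def parse_cache_control(header):
    """Split a Cache-Control header value into a dict of directives.

    Valueless directives such as "public" map to True; others keep their
    value as a string, for example {"max-age": "3600"}.
    """
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') or True
    return directives
```

Parsing `"public, max-age=3600"` gives `{"public": True, "max-age": "3600"}`, which can be stored alongside the origin URLs in the documentation.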

OS

Patch level / updates

version

installed features

users

Deployment Users

Hypervisor

Is there a hypervisor layer installed on the hardware? Is this a cloud-based hypervisor?

Performance variation

What is the average performance, and what are the maximum acceptable outliers?
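A baseline like this is easy to compute from a set of response-time samples. The sketch below uses the standard `statistics` module. The function name, the millisecond unit, and the threshold parameter are assumptions for illustration; the acceptable maximum is something you have to decide for your own service.

```python
import statistics


def summarize_performance(samples_ms, max_acceptable_ms):
    """Summarize response times and list samples above the acceptable maximum."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    return {
        "mean_ms": statistics.mean(samples_ms),
        "max_ms": max(samples_ms),
        "outliers": [s for s in samples_ms if s > max_acceptable_ms],
    }
```

Recording the same summary for the rebuilt environment gives a like-for-like comparison against production.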

HA / SLA / clustering

What is the SLA based on the current configuration?
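When a request depends on several services in series, the combined availability is the product of the individual figures, so it is always lower than the weakest component. A minimal sketch of that arithmetic follows; the function names are hypothetical and the percentages in the usage note are illustrative, not any provider's published SLA.

```python
def composite_availability(*availabilities):
    """Combined availability of services that must all be up (serial dependency)."""
    result = 1.0
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availability must be a fraction between 0 and 1")
        result *= a
    return result


def downtime_minutes(availability, period_minutes=30 * 24 * 60):
    """Expected downtime over a period (default: a 30 day month)."""
    return (1.0 - availability) * period_minutes
```

For example, a web tier at 0.9995 in front of a database at 0.9999 gives `composite_availability(0.9995, 0.9999)` of roughly 0.9994, which `downtime_minutes` converts to about 26 minutes per 30 day month.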

Billing

Fixed-cost provisioned, or an incremental range (autoscaling)?

Hardware

Server, LB, firewall, or cloud service instance sizing

Dedicated solutions

Datacenter details

Cloud Generations

(AWS sizing / Azure Series Letter) Can we identify the sizing for the instances?

Disaster Recovery

Azure ARM or AWS CloudFormation templates for provisioning allow rapid environment rebuilds.

Webserver scaling

Recommended scaling method 

Database Mirroring or replication (service recovery)

Mirroring or PaaS replication?

Database offsite log shipping (data recovery)

Database offsite log shipping, geo-replication, backup exports
