Disaster Recovery Policy
Last updated: January 20, 2023
Introduction
- The Zora Communication Disaster Recovery Policy outlines the guidelines, procedures, and workarounds to follow in the event of a disaster, such as a power or telecommunications failure, social unrest, a terrorist attack, fire, or a natural disaster.
Scope
- Zora Communication is a "software as a service (SaaS)" organization, and its production infrastructure is hosted entirely on cloud-based providers. Consequently, a disaster can affect us in two primary ways:
- Our primary infrastructure provider for production services has degraded service or is unavailable. Specifically, one of its hosting regions has degraded service.
- Our primary office location(s) are partially or completely unavailable as a place for our staff to work safely and effectively.
- Since Zora Communication is a SaaS company, most of its core operations can be carried out even if staff members cannot reach their office(s). Staff members need only a laptop and an internet connection to carry out their basic job functions and keep our customer services uninterrupted. An advantage of Zora Communication's workforce is that even if clusters of people or systems are unavailable, the rest of the company continues to operate normally.
- As a result, the remainder of this Disaster Recovery Policy details the steps to follow when our primary production infrastructure is unavailable.
Assumptions
- This plan assumes that the following requirements are satisfied:
- Production data backups are available (as covered in our Data Backup Policy).
- Our infrastructure provider(s) have multi-region support, and at least one of their worldwide regions is unaffected by the disaster.
- Incidents that affect our customers or partners but have no effect on our production systems are handled by their own respective business continuity and disaster recovery plans.
Priority
- Restore our production systems in the following priority order:
- Critical Systems, such as application servers, background workers, and database servers, are the bare minimum requirements for a functioning production system. If unavailable, these systems affect the integrity of data and must be restored, or have a restoration process begun, immediately upon becoming unavailable.
- Non-Critical Systems, such as analytics, monitoring, and logging, do not prevent Critical Systems from functioning and being accessed appropriately. Non-Critical Systems are restored at a lower priority.
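The priority order above can be sketched in a few lines of Python. This is an illustration only, not part of the policy: the `Tier` and `System` types, the `restore_order` helper, and the inventory entries are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Tier(IntEnum):
    """Lower value means restored earlier."""
    CRITICAL = 0
    NON_CRITICAL = 1


@dataclass(frozen=True)
class System:
    name: str
    tier: Tier


def restore_order(systems: List[System]) -> List[System]:
    """Return Critical Systems first; sorted() is stable, so the
    original order is preserved within each tier."""
    return sorted(systems, key=lambda s: s.tier)


# Hypothetical inventory, based on the examples named in this policy.
INVENTORY = [
    System("analytics", Tier.NON_CRITICAL),
    System("database-server", Tier.CRITICAL),
    System("logging", Tier.NON_CRITICAL),
    System("application-server", Tier.CRITICAL),
    System("background-worker", Tier.CRITICAL),
]

ordered_names = [s.name for s in restore_order(INVENTORY)]
```

With this inventory, the three Critical Systems come first in `ordered_names`, followed by analytics and logging.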
Disaster Recovery Plan
- Disaster recovery includes the following steps:
- Is the event a disaster?
- As a guiding principle, if our systems are going to be unavailable for more than 24 hours, we may have to activate our disaster recovery plan.
- The first person to recognize a disaster should notify the CIEO. The CIEO will consider the information available and decide whether the situation warrants activation of the Disaster Recovery Plan or can be handled more appropriately under the Incident Response Plan. The CIEO may also decide to activate the Disaster Recovery Plan based on other criteria.
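As a rough illustration of the 24-hour guiding principle, the helper below turns an estimated outage duration into a recommendation. It is a sketch under stated assumptions: the function name and the returned labels are hypothetical, and the result is only an input to the CIEO, who makes the final decision and may apply other criteria.

```python
# Guiding threshold from this policy: unavailability of more than 24 hours.
ACTIVATION_THRESHOLD_HOURS = 24


def recommend_plan(expected_outage_hours: float) -> str:
    """Suggest which plan to consider; the CIEO makes the actual decision."""
    if expected_outage_hours > ACTIVATION_THRESHOLD_HOURS:
        return "Disaster Recovery Plan"
    return "Incident Response Plan"
```

For example, an estimated 36-hour outage yields a recommendation to consider the Disaster Recovery Plan, while an estimate of exactly 24 hours does not exceed the threshold.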
Assign a Team
- If the Disaster Recovery Plan has been activated, the CIEO will appoint a team of one or more people to carry out the subsequent steps. The CIEO will choose the team based on the type of disaster, the nature of the affected systems, and the availability of staff, giving priority to staff who have performed prior disaster recovery tests.
Recovery Groundwork
- The team should assess whether the disaster affected both Critical and Non-Critical Systems. If so, the team should prioritize the recovery of Critical Systems and proceed through the next steps before addressing Non-Critical Systems.
- The team should analyze damage to affected environments. If possible, it should back up persistent parts of the system (such as databases).
- Overall, the team should verify that previous backups of customer data are available before moving on to the next steps.
Recovery
- Our infrastructure provider(s) support multiple regions and availability zones. If the primary availability zone or region is unavailable, pick another availability zone or region with similar functionality for the subsequent steps.
- Begin building a new environment using the steps followed during disaster recovery testing. These steps may be available in a runbook. Use the database backups confirmed in the steps above.
- Test the new environment to check that all critical functionality works as expected (for instance, login, logout, and dashboard).
- Make the new environment live to our customers if everything seems okay. Keep a close watch for unexpected bugs or outages for 24 hours.
- Management should decide whether the new environment is a temporary setup or should continue indefinitely as the primary environment. If it continues as the primary environment, ensure that all backups and non-critical functionality, such as monitoring, are configured correctly.
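The first recovery step, picking an unaffected region, could be sketched as follows. This is a minimal example under assumptions: the region names and health flags are hypothetical placeholders, and in practice the health information would come from the infrastructure provider's status reporting rather than a hard-coded mapping.

```python
from typing import Mapping, Optional


def pick_recovery_region(primary: str, healthy: Mapping[str, bool]) -> Optional[str]:
    """Return the first healthy region other than the primary, or None
    if no alternative region is currently healthy."""
    for region, is_healthy in healthy.items():
        if region != primary and is_healthy:
            return region
    return None


# Hypothetical region names and health flags, for illustration only.
status = {"region-a": False, "region-b": False, "region-c": True}
chosen = pick_recovery_region("region-a", status)
```

Here `chosen` is `"region-c"`, the only healthy region that is not the primary. A `None` result would mean the assumption in this policy (at least one unaffected region) does not hold and the team needs to escalate.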
Postmortem
- After the disaster incident is addressed and all functionality in the production system is stabilized, the team should analyze the root cause of the disaster if it is not obvious.
- Analyze the available logs, databases, and other records to learn more about potential causes.
- The team should document learnings from the incident, update the relevant runbooks, if any, and share the learnings with relevant Zora Communication staff.
- Zora Communication management should communicate relevant details of the incident internally. In the event customer data has been compromised, customers must be notified.
Periodic Testing
- To ensure our ability to execute this plan effectively, the Disaster Recovery Plan is periodically tested. Specifically, we perform the following tests:
- Restore critical data backups created as per our Data Backup Policy.
- Tabletop review of this Disaster Recovery Plan.
Non-compliance
- Zoracom staff who violate this policy may face repercussions in proportion to the impact of their violation. Zoracom management will determine how serious a staff member's offense is and decide the appropriate penalty. Penalties may include:
- Reprimand
- Demotion
- Withdrawal of benefits for a definite or indefinite time
- Suspension or termination for more serious offenses