Disaster recovery
From Wikipedia, the free encyclopedia
Disaster recovery comprises the processes, policies and procedures for preparing to recover or continue the technology infrastructure critical to an organization after a natural or human-induced disaster.
Disaster recovery planning is a subset of a larger process known as business continuity planning and should include planning for the resumption of applications, data, hardware, communications (such as networking) and other IT infrastructure. A business continuity plan (BCP) includes planning for non-IT aspects such as key personnel, facilities, crisis communication and reputation protection, and should refer to the disaster recovery plan (DRP) for the recovery and continuity of IT infrastructure. This article focuses on disaster recovery planning as it relates to IT infrastructure.
Introduction
As information technology has become more important to business-critical functions, and as the economy has moved toward around-the-clock operation, protecting an organization's data and IT infrastructure against disruptive events has become a more visible business priority in recent years.
It is estimated that most large companies spend between 2% and 4% of their IT budget on disaster recovery planning, with the aim of avoiding larger losses if the business cannot continue to function because of a loss of IT infrastructure and data. Of companies that had a major loss of business data, 43% never reopened, 51% closed within two years, and only 6% survived long-term.[1]
Classification of Disasters
Disasters can be classified into two broad categories:
1. Natural disasters - Preventing a natural disaster is very difficult, but precautions can be taken to limit losses. These disasters include floods, fires, earthquakes, hurricanes and smog.
2. Man-made disasters - These disasters are a major cause of failure. Human error and intervention, whether intentional or unintentional, can cause massive failures such as loss of communications and utilities. These disasters include walkouts, sabotage, burglary, viruses and intrusions.
Security Holes
Security holes are vulnerabilities in computing hardware or software that give attackers an opening to exploit. Exploitation typically takes advantage of flaws in network software that allow unintended control within the network. Network components such as PCs and routers carry these holes in their operating systems. Technical details of a system should not be made public unless required. Once a hole is discovered, information about it should be passed immediately to the security professional responsible for the system, because the same information can quickly reach attackers who might want to break into the network. Security professionals should work continuously to close such holes and remove the opportunity for attack.
General steps to follow while creating a BCP/DRP
1. Identify the scope and boundaries of the business continuity plan.
This first step defines the scope of the BCP and gives an idea of the plan's limitations and boundaries. It also includes audit and risk analysis reports for the institution's assets.
2. Conduct a business impact analysis (BIA).
A business impact analysis is the study and assessment of the financial losses an institution would suffer from a destructive event, such as the unavailability of important business services.
3. Sell the concept of the BCP to upper management and obtain organizational and financial commitment.
Convincing senior management to approve the BCP/DRP is a key task. The security professional must obtain upper management's approval before the plan can take effect.
4. Ensure that each department understands its role in the plan and supports its maintenance.
In case of disaster, each department has to be prepared to act. To recover and protect critical systems, each department has to understand the plan and follow it accordingly. Each department should also help create and maintain its own part of the plan.
5. The BCP project team must implement the plan.
After approval from upper management, the plan should be implemented and maintained. The implementation team should follow the guidelines and procedures in the plan.
6. The NIST tool set can be used when building the BCP.
The National Institute of Standards and Technology (NIST) has published tools that can help in creating a BCP.
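The financial assessment in step 2 can be illustrated with a small calculation. The following is only a sketch: the linear loss model (hourly loss multiplied by outage length, plus a one-off recovery cost), the `BusinessService` fields and the sample figures are all assumptions made for illustration, not a prescribed BIA method.

```python
from dataclasses import dataclass


@dataclass
class BusinessService:
    """A service considered in the BIA (hypothetical fields)."""
    name: str
    revenue_loss_per_hour: float  # assumed direct loss while the service is down
    recovery_cost: float          # assumed one-off cost of restoring the service


def estimated_loss(service: BusinessService, outage_hours: float) -> float:
    """Assumed linear model: hourly loss times outage length, plus recovery cost."""
    if outage_hours < 0:
        raise ValueError("outage_hours must be non-negative")
    return service.revenue_loss_per_hour * outage_hours + service.recovery_cost


def rank_by_impact(services, outage_hours):
    """Return (name, estimated loss) pairs, most costly service first."""
    losses = [(s.name, estimated_loss(s, outage_hours)) for s in services]
    return sorted(losses, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Illustrative figures only.
    services = [
        BusinessService("payroll", revenue_loss_per_hour=500.0, recovery_cost=2000.0),
        BusinessService("order entry", revenue_loss_per_hour=4000.0, recovery_cost=1000.0),
    ]
    for name, loss in rank_by_impact(services, outage_hours=8):
        print(f"{name}: {loss:.2f}")
```

Ranking services by estimated loss gives a rough order in which to prioritize recovery effort; a real analysis would also weigh losses that are harder to quantify.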
Control measures in recovery plan
Control measures are steps or mechanisms that can reduce or eliminate computer security threats. Different types of measures can be included in a BCP/DRP.
Types of measures:
1. Preventive measures - These controls are aimed at preventing an event from occurring.
2. Detective measures - These controls are aimed at detecting or discovering unwanted events.
3. Corrective measures - These controls are aimed at correcting or restoring the system after disaster or event.
These controls should always be documented and tested regularly.
Strategies
Before selecting a disaster recovery strategy, a disaster recovery planner should refer to the organization's business continuity plan, which should specify the key metrics of recovery point objective (RPO) and recovery time objective (RTO) for the various business processes (such as running payroll or generating an order). The metrics specified for the business processes must then be mapped to the underlying IT systems and infrastructure that support those processes.
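The mapping from business processes to IT systems can be sketched in code. The example below assumes one simple rule, that a system shared by several processes inherits the strictest (smallest) RTO and RPO among them. The `Process` type, the system names and the hour values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    """A business process with its recovery objectives (hypothetical type)."""
    name: str
    rto_hours: float   # recovery time objective
    rpo_hours: float   # recovery point objective
    systems: tuple     # names of the IT systems the process depends on


def system_requirements(processes):
    """Map each system to (RTO, RPO), taking the strictest value of any dependent process."""
    requirements = {}
    for p in processes:
        for system in p.systems:
            rto, rpo = requirements.get(system, (p.rto_hours, p.rpo_hours))
            requirements[system] = (min(rto, p.rto_hours), min(rpo, p.rpo_hours))
    return requirements


if __name__ == "__main__":
    # Illustrative processes and values only.
    processes = [
        Process("run payroll", rto_hours=72, rpo_hours=24,
                systems=("hr-db", "file-server")),
        Process("generate an order", rto_hours=4, rpo_hours=1,
                systems=("order-db", "file-server")),
    ]
    for system, (rto, rpo) in sorted(system_requirements(processes).items()):
        print(f"{system}: RTO {rto} h, RPO {rpo} h")
```

Here the shared `file-server` ends up with the order process's tighter objectives, even though payroll alone would tolerate a much longer outage.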
Once the RTO and RPO metrics have been mapped to IT infrastructure, the DR planner can determine the most suitable recovery strategy for each system. Note, however, that the business ultimately sets the IT budget, so the RTO and RPO metrics need to fit within the available budget. While most business unit heads would like zero data loss and zero time loss, the cost of that level of protection may make the desired high availability solutions impractical.
The following is a list of the most common strategies for data protection.
- Backups made to tape and sent off-site at regular intervals (preferably daily)
- Backups made to disk on-site and automatically copied to off-site disk, or made directly to off-site disk
- Replication of data to an off-site location, which overcomes the need to restore the data (only the systems then need to be restored or synced). This generally makes use of storage area network (SAN) technology
- High availability systems which keep both the data and system replicated off-site, enabling continuous access to systems and data
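Choosing among the strategies listed above can be sketched as a lookup against a system's RTO and RPO. The table below is purely an assumption: the achievable RTO and RPO figures are illustrative placeholders rather than measured or standard values, and the entries are ordered on the assumption that cost rises down the list.

```python
# (strategy, assumed achievable RTO in hours, assumed achievable RPO in hours),
# ordered from assumed cheapest to assumed most expensive. Placeholder values.
STRATEGIES = [
    ("tape backup sent off-site", 48.0, 24.0),
    ("disk backup copied off-site", 12.0, 4.0),
    ("off-site data replication", 4.0, 0.25),
    ("high availability system", 0.0, 0.0),
]


def choose_strategy(rto_hours: float, rpo_hours: float) -> str:
    """Return the first (assumed cheapest) strategy that meets both objectives."""
    if rto_hours < 0 or rpo_hours < 0:
        raise ValueError("RTO and RPO must be non-negative")
    for name, achievable_rto, achievable_rpo in STRATEGIES:
        if achievable_rto <= rto_hours and achievable_rpo <= rpo_hours:
            return name
    # Not reachable with the table above, because the last entry meets any
    # non-negative objective; kept in case the table is edited.
    raise LookupError("no strategy meets the requested objectives")


if __name__ == "__main__":
    print(choose_strategy(rto_hours=72, rpo_hours=24))
    print(choose_strategy(rto_hours=4, rpo_hours=1))
    print(choose_strategy(rto_hours=0.5, rpo_hours=0))
```

Picking the cheapest option that still satisfies both objectives reflects the budget constraint: tighter objectives push a system toward the more expensive end of the list.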
In many cases, an organization may elect to use an outsourced disaster recovery provider to provide a stand-by site and systems rather than using their own remote facilities.
In addition to preparing for the need to recover systems, organizations must also implement precautionary measures with an objective of preventing a disaster situation in the first place. These may include some of the following:
- Local mirrors of systems and/or data and use of disk protection technology such as RAID
- Surge protectors to minimize the effect of power surges on delicate electronic equipment
- Uninterruptible power supply (UPS) and/or backup generator to keep systems going in the event of a power failure
- Fire prevention measures: alarms, fire extinguishers
- Anti-virus software and other security measures
See also
- Backup site
- Business continuity planning
- Continuous data protection
- IBM Global Mirror
- Recovery point objective
- Recovery time objective
- Remote backup service
- Secure virtual office
- Seven tiers of disaster recovery
- Virtual tape library
- HP Continuous access