Data recovery

From Wikipedia, the free encyclopedia

Data recovery is the process of salvaging data from damaged, failed, corrupted, or inaccessible secondary storage media when it cannot be accessed normally. Often the data are being salvaged from storage media formats such as hard disk drives, storage tapes, CDs, DVDs, RAID, and other electronics. Recovery may be required due to physical damage to the storage device or logical damage to the file system that prevents it from being mounted by the host operating system.

The most common "data recovery" issue involves an operating system (OS) failure (typically on a single-disk, single-partition, single-OS system), where the goal is simply to copy all wanted files to another disk. This can be easily accomplished with a Live CD, most of which provide a means to 1) mount the system drive, 2) mount a backup disk or removable media drive, and 3) move the files from the system drive to the backup with a file manager or optical disc authoring software. Further, such cases can be mitigated by disk partitioning and by consistently storing valuable data files on a different partition from the replaceable OS system files.
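The copy step in this scenario can be sketched in a few lines of Python. This is a minimal illustration, not a recovery tool: the function name `copy_wanted_files` and the default folder names are hypothetical, and it assumes both the failed system drive and the backup drive are already mounted.

```python
import shutil
from pathlib import Path


def copy_wanted_files(system_root, backup_root, wanted=("Documents", "Pictures")):
    """Copy selected directories from a mounted system drive to a backup drive.

    The folder names in `wanted` are illustrative defaults only.
    Returns the names of the directories that were actually copied.
    """
    system_root = Path(system_root)
    backup_root = Path(backup_root)
    copied = []
    for name in wanted:
        source = system_root / name
        if not source.is_dir():
            continue  # nothing to rescue under this name
        # dirs_exist_ok (Python 3.8+) lets the copy be re-run after an interruption
        shutil.copytree(source, backup_root / name, dirs_exist_ok=True)
        copied.append(name)
    return copied
```

Copying rather than moving leaves the original drive untouched, which is usually the safer choice while the state of the source disk is uncertain.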

The second type involves a disk-level failure such as a compromised file system, a damaged disk partition, or a hard disk failure, in each of which the data cannot be easily read. Depending on the case, solutions involve repairing the file system, partition table, or master boot record (MBR), or hard disk recovery techniques ranging from software-based recovery of corrupted data to hardware replacement on a physically damaged disk. Cases that call for hard disk recovery techniques typically indicate the permanent failure of the disk, so "recovery" means repair sufficient for a one-time retrieval of files.

A third type involves retrieving files that have been deleted from a storage medium. Although there is some confusion about the term, "data recovery" may also be used to refer to such cases in the context of forensics or spying.


Recovering data after physical damage

A wide variety of failures can cause physical damage to storage media. CD-ROMs can have their metallic substrate or dye layer scratched off; hard disks can suffer any of several mechanical failures, such as head crashes and failed motors; tapes can simply break. Physical damage always causes at least some data loss, and in many cases the logical structures of the file system are damaged as well. This causes logical damage that must be dealt with before any files can be salvaged from the failed media.

Most physical damage cannot be repaired by end users. For example, opening a hard disk in a normal environment can allow airborne dust to settle on the platter and become caught between the platter and the read/write head, causing new head crashes that further damage the platter and thus compromise the recovery process. Furthermore, end users generally do not have the hardware or technical expertise required to make these repairs. Consequently, costly data recovery companies are often employed to salvage important data. These firms often use "Class 100" / ISO-5 cleanroom facilities to protect the media while repairs are being made. (Any data recovery firm without a pass certificate of ISO-5 or better will not be accepted by hard drive manufacturers for warranty purposes.)

Recovery techniques

Recovering data from physically damaged hardware can involve multiple techniques. Some damage can be repaired by replacing parts in the hard disk. This alone may make the disk usable, but there may still be logical damage. A specialized disk-imaging procedure is used to recover every readable bit from the surface. Once this image is acquired and saved on a reliable medium, it can be safely analysed for logical damage, which may allow much of the original file system to be reconstructed.

Hardware repair

Media that has suffered a catastrophic electronic failure will require data recovery in order to salvage its contents.

Examples of physical recovery procedures are: removing a damaged PCB (printed circuit board) and replacing it with a matching PCB from a healthy drive; performing a live PCB swap (used when the System Area of the target drive is damaged: the System Area is read from the donor drive instead, and the PCB is then disconnected while still under power and transferred to the target drive); replacing the read/write head assembly with matching parts from a healthy drive; removing the hard disk platters from the damaged drive and installing them into a healthy drive; and, often, a combination of these procedures. Some data recovery companies use procedures that are highly technical in nature and are not recommended for untrained individuals. Any of them will almost certainly void the manufacturer's warranty.

Disk imaging

Figure: Result of a failed data recovery from a hard disk drive.

The extracted raw image can be used to reconstruct usable data after any logical damage has been repaired. Once that is complete, the files may be in usable form although recovery is often incomplete.

Open source tools such as DCFLdd v1.3.4-1 or DOS tools such as HDClone can usually recover data from all but the physically damaged sectors. A 2007 Defense Cyber Crime Institute study found that DCFLdd v1.3.4-1 installed on a Linux 2.4 kernel system produces extra "bad sectors", resulting in the loss of information that is actually available. The study states that when the tool is installed on a FreeBSD kernel system, only the bad sectors are lost. Another tool that can correctly image damaged media is ILook IXImager, which is available only to government and law enforcement agencies.[1]

Typically, hard disk drive data recovery imaging has the following abilities:[2]

  1. Communicating with the hard drive by bypassing the BIOS and operating system, which are very limited in their abilities to deal with drives that have "bad sectors" or take a long time to read.
  2. Reading data from "bad sectors" rather than skipping them (by using various read commands and ECC to recreate damaged data).
  3. Handling issues caused by unstable drives, such as resetting or repowering the drive when it stops responding, or skipping sectors that take too long to read (read instability can be caused by minute mechanical wear and other issues).
  4. Pre-configuring drives by disabling certain features, such as SMART and G-List re-mapping, to minimize imaging time and the possibility of further drive degradation.
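The retry-and-skip behaviour described above can be illustrated with a small Python sketch. It models only the control flow: real imaging tools issue low-level drive commands, which are not shown here. The function `image_with_skips` and the `read_block` callable are hypothetical names introduced for this example, and zero-filling unreadable blocks is one possible policy, assumed here for simplicity.

```python
def image_with_skips(read_block, total_blocks, block_size=512, max_retries=2):
    """Build an image block by block, zero-filling blocks that cannot be read.

    read_block(index) is a hypothetical callable that returns block_size bytes
    or raises OSError when the block is unreadable.
    Returns (image_bytes, bad_block_indices).
    """
    image = bytearray()
    bad_blocks = []
    for index in range(total_blocks):
        data = None
        for _attempt in range(1 + max_retries):
            try:
                data = read_block(index)
                break
            except OSError:
                continue  # unstable read: try the same block again
        if data is None:
            # Give up on this block, record it, and keep offsets aligned
            data = bytes(block_size)
            bad_blocks.append(index)
        image.extend(data)
    return bytes(image), bad_blocks
```

Recording the indices of the skipped blocks keeps the image the same size as the source and makes it possible to revisit only the failed areas in a later pass.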

Recovering data after logical damage

Logical damage is primarily caused by power outages that prevent file system structures from being completely written to the storage medium, but problems with hardware (especially RAID controllers) and drivers, as well as system crashes, can have the same effect. The result is that the file system is left in an inconsistent state. This can cause a variety of problems, such as strange behavior (e.g., infinitely recursing directories, drives reporting negative amounts of free space), system crashes, or an actual loss of data. Various programs exist to correct these inconsistencies, and most operating systems come with at least a rudimentary repair tool for their native file systems. Linux, for instance, comes with the fsck utility, Mac OS X has Disk Utility, and Microsoft Windows provides chkdsk. Third-party utilities such as The Coroner's Toolkit and The Sleuth Kit are also available, and some can produce superior results by recovering data even when the disk cannot be recognized by the operating system's repair utility. Utilities such as TestDisk can be useful for reconstructing corrupted partition tables.
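As an illustration of what a partition table check involves, the following Python sketch parses a classic MBR sector: four 16-byte entries starting at byte offset 446, followed by the 0x55AA signature at offset 510. It is a simplified example and does not represent how TestDisk or any other named utility works; the function name `parse_mbr` and the returned dictionary keys are choices made for this sketch.

```python
import struct

MBR_SIZE = 512
PARTITION_TABLE_OFFSET = 446
ENTRY_SIZE = 16
BOOT_SIGNATURE = b"\x55\xaa"


def parse_mbr(sector):
    """Return the non-empty entries of a classic MBR partition table.

    Raises ValueError if the sector is too short or lacks the 0x55AA signature.
    """
    if len(sector) < MBR_SIZE:
        raise ValueError("sector shorter than 512 bytes")
    if sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("missing 0x55AA boot signature")
    partitions = []
    for slot in range(4):
        start = PARTITION_TABLE_OFFSET + ENTRY_SIZE * slot
        entry = sector[start:start + ENTRY_SIZE]
        # Layout: status (1), CHS start (3, skipped), type (1),
        # CHS end (3, skipped), first LBA (4), sector count (4)
        status, ptype, lba_start, sectors = struct.unpack("<B3xB3xII", entry)
        if ptype == 0:
            continue  # type 0 marks an unused slot
        partitions.append({
            "slot": slot,
            "bootable": status == 0x80,
            "type": ptype,
            "lba_start": lba_start,
            "sectors": sectors,
        })
    return partitions
```

A sector that fails the signature check, or whose entries overlap or extend past the end of the disk, is a sign that the partition table needs reconstruction.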

Some kinds of logical damage can be mistakenly attributed to physical damage. For instance, when a hard drive's read/write head begins to click, most end users will associate this with internal physical damage. This is not always the case, however. Another possibility is that the firmware of the drive or its controller needs to be rebuilt in order to make the data accessible again.[citation needed]

Preventing logical damage

The increased use of journaling file systems, such as NTFS 5.0, ext3, and XFS, is likely to reduce the incidence of logical damage. These file systems can always be "rolled back" to a consistent state, which means that the only data likely to be lost is what was in the drive's cache at the time of the system failure. However, regular system maintenance should still include the use of a consistency checker. This can protect against both bugs in the file system software and latent incompatibilities in the design of the storage hardware.

One such incompatibility arises when the disk controller reports that file system structures have been saved to the disk when this has not actually occurred. This can often happen if the drive stores data in its write cache and then claims that it has been written to the disk. If power is lost, and this data contains file system structures, the file system may be left in an inconsistent state such that the journal itself is damaged or incomplete. One solution to this problem is to use hardware that does not report data as written until it actually is written. Another is to use disk controllers equipped with a battery backup so that the waiting data can be written when power is restored. Finally, the entire system can be equipped with a battery backup that may make it possible to keep the system on in such situations, or at least to give enough time to shut down properly.
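The idea of returning to a consistent state can be shown with a toy journal in Python. This is a deliberately simplified model, not the on-disk format or algorithm of NTFS, ext3, or XFS: the record layout and the function name `replay_journal` are assumptions made for the example. Only transactions that reached a commit record are applied; anything interrupted before its commit is ignored.

```python
def replay_journal(disk_state, journal):
    """Apply only the journal transactions that reached a commit record.

    journal is a list of tuples in a made-up format:
      ("begin", txid), ("write", txid, key, value), ("commit", txid)
    Returns a new dict; the input disk_state is not modified.
    """
    committed = {record[1] for record in journal if record[0] == "commit"}
    state = dict(disk_state)
    for record in journal:
        if record[0] == "write" and record[1] in committed:
            _kind, _txid, key, value = record
            state[key] = value
    return state
```

In this model, a power loss in the middle of a transaction leaves no commit record behind, so the partial writes are discarded and the state stays consistent.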

Recovery techniques

Two common techniques used to recover data from logical damage are consistency checking and data carving. While most logical damage can be either repaired or worked around using these two techniques, data recovery software can never guarantee that no data loss will occur. For instance, in the FAT file system, when two files claim to share the same allocation unit ("cross-linked"), data loss for one of the files is essentially guaranteed.
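The cross-linking case can be detected by following each file's cluster chain and noting any cluster claimed by more than one file. The Python sketch below uses a simplified in-memory model in which the allocation table is a dictionary from cluster number to next cluster; the function name `find_cross_links` and the `end_marker` value are hypothetical and do not correspond to the real FAT on-disk encoding.

```python
def find_cross_links(fat, start_clusters, end_marker=-1):
    """Find clusters that belong to more than one file's chain.

    fat maps a cluster number to the next cluster in the chain (or end_marker).
    start_clusters maps a file name to its first cluster.
    Returns {cluster: sorted list of file names} for shared clusters only.
    """
    owners = {}
    for name, cluster in start_clusters.items():
        seen = set()
        # The seen set guards against looping chains in a damaged table
        while cluster != end_marker and cluster not in seen:
            seen.add(cluster)
            owners.setdefault(cluster, []).append(name)
            cluster = fat.get(cluster, end_marker)
    return {c: sorted(names) for c, names in owners.items() if len(names) > 1}
```

Detection is the easy part: once two chains merge, the shared clusters can hold the contents of only one of the files, which is why data loss for the other is essentially guaranteed.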

Consistency checking

The first, consistency checking, involves scanning the logical structure of the disk and checking to make sure that it is consistent with its specification. For instance, in most file systems, a directory must have at least two entries: a dot (.) entry that points to itself, and a dot-dot (..) entry that points to its parent. A file system repair program can read each directory and make sure that these entries exist and point to the correct directories. If they do not, an error message can be printed and the problem corrected. Both chkdsk and fsck work in this fashion.

This strategy suffers from two major problems. First, if the file system is sufficiently damaged, the consistency check can fail completely. In this case, the repair program may crash trying to deal with the mangled input, or it may not recognize the drive as having a valid file system at all. The second issue is the disregard for data files. If chkdsk finds a data file to be out of place or unexplainable, it may delete the file without asking. This is done so that the operating system may run more smoothly, but the deleted files are often important user files that cannot be replaced. Similar issues arise when using system restore disks (often provided with proprietary systems such as those from Dell and Compaq), which restore the operating system by removing the previous installation. This problem can often be avoided by installing the operating system on a separate partition from user data.
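The dot and dot-dot check described above can be sketched in Python over a simplified in-memory model, in which each directory is a dictionary from entry name to the identifier of its target. This is an illustration only and is not how chkdsk or fsck are implemented; the function name `check_dot_entries` and the data layout are assumptions made for the example.

```python
def check_dot_entries(directories, root_id):
    """Check that every directory's '.' and '..' entries are consistent.

    directories maps a directory id to {entry name: target id}.
    The expected parent of a directory is the directory that lists it by name;
    the root is treated as its own parent.
    Returns a list of (directory id, problem description) tuples.
    """
    expected_parent = {root_id: root_id}
    for dir_id, entries in directories.items():
        for name, target in entries.items():
            if name not in (".", "..") and target in directories:
                expected_parent[target] = dir_id

    problems = []
    for dir_id in sorted(directories):
        entries = directories[dir_id]
        if entries.get(".") != dir_id:
            problems.append((dir_id, "'.' does not point to the directory itself"))
        parent = expected_parent.get(dir_id)
        if parent is None:
            problems.append((dir_id, "directory is not listed by any parent"))
        elif entries.get("..") != parent:
            problems.append((dir_id, "'..' does not point to the parent"))
    return problems
```

This sketch only reports problems. A repair tool would go on to rewrite the faulty entries, which is the step where the risks described above arise.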

Data carving

Data carving is a data recovery technique that allows data with no file system allocation information to be extracted by identifying the sectors and clusters belonging to a file. Data carving usually searches through raw sectors looking for specific desired file signatures. Because there is no allocation information, the investigator must specify a block size of data to carve out upon finding a matching file signature. This presumes that the beginning of the file is still present, and there is (depending on how common the file signature is) a risk of many false hits. Also, data carving requires that the files recovered be located in sequential sectors (rather than fragmented), as there is no allocation information to point to fragmented file portions. This method can be time and resource intensive.[3]
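A minimal carving pass can be sketched in Python using the JPEG format as an example, since JPEG files begin with the bytes FF D8 FF and end with FF D9. The sketch assumes unfragmented files, as described above. The function name `carve_jpegs` and the `max_size` limit are choices made for this example: when no end marker is found within `max_size` bytes, a fixed-size block is carved instead.

```python
JPEG_HEADER = b"\xff\xd8\xff"
JPEG_FOOTER = b"\xff\xd9"


def carve_jpegs(raw, max_size=10 * 1024 * 1024):
    """Return (offset, data) pairs for candidate JPEG files in raw bytes.

    Each candidate runs from a header signature to the next footer found
    within max_size bytes, or is cut at max_size if no footer is found.
    """
    results = []
    position = 0
    while True:
        start = raw.find(JPEG_HEADER, position)
        if start == -1:
            break
        end = raw.find(JPEG_FOOTER, start + len(JPEG_HEADER), start + max_size)
        if end == -1:
            # No end marker in range: fall back to a fixed-size block
            stop = min(start + max_size, len(raw))
        else:
            stop = end + len(JPEG_FOOTER)
        results.append((start, raw[start:stop]))
        position = stop  # continue scanning after the carved region
    return results
```

Every hit is only a candidate. The signature bytes can occur by chance inside unrelated data, and the two-byte end marker can also appear inside a larger file (for example in an embedded thumbnail), so carved output normally has to be validated by trying to open it.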

Recovering overwritten data

When data has been physically overwritten on a hard disk, it is generally assumed that the previous data can no longer be recovered. In 1996, Peter Gutmann, a computer scientist, presented a paper that suggested overwritten data could be recovered through the use of magnetic force microscopy.[4] In 2001, he presented another paper on a similar topic.[5] Substantial criticism has followed, primarily dealing with the lack of any concrete examples of significant amounts of overwritten data being recovered.[6][7] To guard against this type of data recovery, he and Colin Plumb designed the Gutmann method, which is used by several disk scrubbing software packages.

Although Gutmann's theory may not be wrong, there is no practical evidence that overwritten data can be recovered. Moreover, there are good reasons to think that it cannot.[8]

Recovery software

Bootable

Data recovery cannot always be done on a running system. As a result, a boot disk, Live CD, Live USB, or other live distribution containing a minimal operating system and a set of repair tools is often used.

  • Knoppix : The original Linux Live CD. It contains many useful utilities for data recovery.
  • Ubuntu Rescue Remix : A GNU/Linux live system that runs from a CD or USB pen drive and includes free, open source data recovery and forensics tools.[9]
  • SystemRescueCD : A Gentoo-based Live CD, useful for repairing unbootable computer systems and retrieving data after a system crash.
  • NeroBackItUp ImageTool : A bootable environment for restoring an image created by NeroBackItUp or NeroBackItUp ImageTool, returning the machine to a consistent state.
  • Selkie Rescue Data Recovery : A bootable CD that runs before Windows starts. It allows files to be retrieved to a second computer via a crossover cable or a network.

Consistency checkers

  • CHKDSK : A consistency checker for DOS and Windows systems.
  • Disk First Aid : A consistency checker for Mac OS 9.
  • Disk Utility : A consistency checker for Mac OS X.
  • fsck : A consistency checker for UNIX file systems.

File Recovery

Forensics

  • The Coroner's Toolkit : A suite of utilities aimed at assisting in forensic analysis of a UNIX system after a break-in.
  • The Sleuth Kit : Also known as TSK, The Sleuth Kit is a suite of forensic analysis tools developed by Brian Carrier for UNIX, Linux and Windows systems. TSK includes the Autopsy forensic browser.
  • EnCase : A suite of forensic tools developed by Guidance Software that is used for imaging and forensic analysis for UNIX, Linux, and Windows systems.

Imaging tools

  • ddrescue : The GNU tool for imaging failing hard drives.

References

  1. ^ IXImager Bad Sector Drive Imaging Study. Cyrus Robinson, Defense Cyber Crime Institute Cyber Files Reports and studies are available upon request.
  2. ^ 'Disk Imaging: A Vital Step in Data Recovery' - This white paper describes disk-level issues that must be handled during a hard disk data recovery imaging.
  3. ^ Advanced Data Carving. Special Agent Daniel Dickerman, IRS Criminal Investigation, Electronic Crimes Program
  4. ^ Secure Deletion of Data from Magnetic and Solid-State Memory, Peter Gutmann, Department of Computer Science, University of Auckland
  5. ^ Data Remanence in Semiconductor Devices, Peter Gutmann, IBM T.J. Watson Research Center
  6. ^ Feenberg, Daniel (14 May 2004). "Can Intelligence Agencies Read Overwritten Data? A response to Gutmann". National Bureau of Economic Research. Retrieved on 2008-05-21.
  7. ^ Data Removal and Erasure from Hard Disk Drives
  8. ^ Erasing hard disk drive data: How many passes are needed?
  9. ^ Ubuntu-rescue-remix
  10. ^ "RescuePRO Deluxe". Retrieved on 2009-03-27. 
