OSdata.com: failure handling and recovery 


Failure Handling and Failure Recovery

    No matter how stable an operating system, no matter how well maintained a facility, no matter how redundant the hardware, eventually systems will fail. Not “might” fail, but “will” fail.

    What happens when failure occurs? How catastrophic is a failure? What has to be done to recover from failure? These are all basic questions that are often overlooked.


OSdata.com is used in more than 300 colleges and universities around the world


    The first line of defense against any kind of failure (disk crash, virus attack, burned-out part, software freeze-up, software crash, whatever) is a set of backup copies of all essential software and data.

    Yet users, and even administrators, tend to neglect making backups on a regular basis.

    Having to redo work can be extremely expensive. Some data or customers might be lost forever. The expense can even be great enough to drive the company out of business.

    An archival backup is one that is kept in an archive for long term storage. Archival copies are particularly useful for audits or cross-referencing long term information.

    A periodic backup is one that occurs on a regular schedule (such as annually, quarterly, monthly, weekly, daily, hourly, etc.). In UNIX systems, a cron job can be created that automatically performs a periodic backup.
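
    As a rough sketch of the kind of script a cron job might run, the following Python function writes a timestamped archive of one directory into a backup directory. The function name make_backup, the script path, and the schedule shown in the comment are illustrative assumptions, not part of any particular system.

```python
import tarfile
import time
from pathlib import Path


def make_backup(source_dir, backup_dir, now=None):
    """Write a timestamped .tar.gz archive of source_dir into backup_dir.

    backup_dir should live outside source_dir, otherwise the archive
    would try to include itself.
    """
    source = Path(source_dir).resolve()
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # now=None means "use the current time"; passing a value helps testing.
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    archive = dest / f"{source.name}-{stamp}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=source.name)
    return archive


# A hypothetical crontab entry that runs a script calling make_backup()
# every night at 02:00 (fields: minute hour day-of-month month day-of-week):
#
#   0 2 * * * /usr/bin/python3 /usr/local/bin/nightly_backup.py
```

    Putting the timestamp in the file name keeps each run from overwriting the previous one, so older generations remain available until they are deliberately released.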

    A temporary backup is one that is kept for a short period of time. Organizations are unlikely to be able to keep every single backup made. Depending on the storage medium, it may be possible to reuse temporary backup media (although the total usage and age of the media should be tracked).

    The basic rule of thumb is to maintain at least three copies of a backup. It is optimal if these copies are all the same, but often organizations use the most recent three generations instead (to save time and money at the possible expense of some lost data). One of the three copies (the oldest, if using three generations) should be stored off-site. An off-site copy of applications and data can be essential for recovery from a major disaster, such as fire, hurricane, tornado, earthquake, flood, or terrorist attack. Keeping the off-site copy at a distant location saves the data even if there is a large-scale disaster. If your organization doesn’t have a distant office or branch that you can trade backup copies with, you may want to consider such options as a data storage or archival business, a bank safe deposit box, or a law office.
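
    The three-generation variant of this rule can be sketched in a few lines of Python. This is only an illustration: the function name plan_generations is hypothetical, and it assumes each backup is labeled with an ISO-style date so that sorting the labels sorts the backups by age.

```python
def plan_generations(backup_names, keep=3):
    """Split backups into on-site copies, one off-site copy, and reusable media.

    backup_names: labels that sort chronologically (assumed here to be
    ISO-style dates such as "2001-05-03").
    """
    newest_first = sorted(backup_names, reverse=True)
    kept = newest_first[:keep]        # the most recent generations
    reusable = newest_first[keep:]    # older media, released for reuse

    # The oldest of the kept generations goes off-site.
    offsite = kept[-1] if kept else None
    onsite = kept[:-1]
    return onsite, offsite, reusable


onsite, offsite, reusable = plan_generations(
    ["2001-04-30", "2001-05-01", "2001-05-02", "2001-05-03"]
)
# onsite   -> ["2001-05-03", "2001-05-02"]
# offsite  -> "2001-05-01"
# reusable -> ["2001-04-30"]
```

    Sending the oldest kept generation off-site means the newest copies stay close at hand for quick restores, at the cost of the off-site copy being slightly out of date.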

    Many arbitrary backup policies can be imposed. One very common system is to make complete archival backups on a quarterly, monthly, or weekly basis, and to make temporary backups more frequently (usually daily), with the temporary backups being destroyed or released for reuse at each archival backup. An arbitrary backup policy can impose discipline that saves a great deal of money, especially if the archival backup can be automated so that it always runs as scheduled.
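
    One way to picture this common policy is the short Python sketch below. It assumes, purely for illustration, a weekly archival backup on Sundays and a temporary backup on every other day; the names backup_kind and run_policy are hypothetical.

```python
from datetime import date, timedelta


def backup_kind(day, archive_weekday=6):
    """Return "archival" on the archive day (default Sunday), else "temporary"."""
    return "archival" if day.weekday() == archive_weekday else "temporary"


def run_policy(days, archive_weekday=6):
    """Walk through the given days and track which backups are still held."""
    archives = []      # archival backups are kept long term
    temporaries = []   # temporary backups held since the last archival backup
    for day in days:
        if backup_kind(day, archive_weekday) == "archival":
            archives.append(day)
            temporaries.clear()   # temporary media released for reuse
        else:
            temporaries.append(day)
    return archives, temporaries


# Ten consecutive days starting Monday, April 30, 2001.
start = date(2001, 4, 30)
archives, temporaries = run_policy([start + timedelta(days=i) for i in range(10)])
# archives    -> [date(2001, 5, 6)]
# temporaries -> [date(2001, 5, 7), date(2001, 5, 8), date(2001, 5, 9)]
```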

    The real rule for determining how often to perform short-term backups is to consider how long it would take to redo all of the work done since the last backup, and to back up your data often enough that this time never exceeds the amount of work you can afford to redo.
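
    The arithmetic behind this rule is simple enough to sketch in Python. The function name max_backup_interval and the redo_factor parameter are assumptions introduced for illustration: redo_factor is the number of hours needed to redo one hour of lost work (1.0 if redoing work takes as long as doing it the first time).

```python
def max_backup_interval(affordable_redo_hours, redo_factor=1.0):
    """Longest gap between backups, in hours, that keeps redo time affordable.

    In the worst case, a failure just before the next backup loses a full
    interval of work, which takes interval * redo_factor hours to redo.
    Keeping that at or below affordable_redo_hours gives the bound below.
    """
    if affordable_redo_hours <= 0 or redo_factor <= 0:
        raise ValueError("both arguments must be positive")
    return affordable_redo_hours / redo_factor


# If four hours of rework is the most the organization can absorb:
print(max_backup_interval(4))        # 4.0 (back up at least every 4 hours)
print(max_backup_interval(4, 2.0))   # 2.0 (rework is twice as slow)
```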

error reporting

    “OpenVMS will detect and give early warnings of most hardware failures. OpenVMS can diagnose both program and operating system problems while the system is running, and will provide an accurate report of why a program failed. Error reporting, operator, and security logs allow easy tracking of critical system events.” —John Malmberg [e85]

fail over

     “MCI has adopted a strategy of coexistence to address this issue. MCI runs NT Server in places where it’s most appropriate—departmental and middle-tier application environments—while continuing to rely on hosts and Unix servers for high-transaction applications, says Craig Ashapa, MCI’s senior manager for NT deployment. MCI’s most important information resides on Digital VAX VMS servers, he adds. Ashapa acknowledges NT’s limitations but works around them. The NT servers make up MCI’s enabling architecture that lets desktop systems access the back-end servers. The NT domain architecture is set up so that if an NT server goes down, the desktop systems can still access the back-end servers. Homegrown failover software and hardware instantly moves work to the desktop systems if there’s an outage.” —“The Hidden Cost Of NT” [w69]

hot-swapping hardware

     “Disks can be added to an OpenVMS server and put in use without rebooting the system.” —John Malmberg [e85]

manufacturer comments

     OS/2 Warp Server: “With OS/2 Warp Server, you gain … backup and recovery services, and more…all in a single, cost-effective solution that’s easy to install and manage.” [w35]

geek humor

    “It was working ten minutes ago, I swear…” —Rob McCool

    A web site on dozens of operating systems simply can’t be maintained by one person. This is a cooperative effort. If you spot an error in fact, grammar, syntax, or spelling, or a broken link, or have additional information, commentary, or constructive criticism, please e-mail Milo. If you have any extra copies of docs, manuals, or other materials that can assist in accuracy and completeness, please send them to Milo, PO Box 1361, Tustin, CA, USA, 92781.

    This web site handcrafted on Macintosh computers using Tom Bender’s Tex-Edit Plus and served using FreeBSD.

    Names and logos of various OSs are trademarks of their respective owners.

    Copyright © 1998, 2001 Milo

    Last Updated: May 4, 2001

    Created: June 5, 1998
