Lesson 8

The flashcards below were created by user slmckissack on FreezingBlue Flashcards.

  1. How can we make sure a system or network remains available?
    There are two primary methods: The application of redundant systems and devices, and built-in fault tolerance.
  2. involves the use of duplicate equipment and possibly software to improve the availability of a system, workstation, or processing function.
    Redundancy

    In a redundant system, the primary devices are active, and the secondary or backup devices are idle.
  3. also implements duplicate devices, but the difference is that both the primary and secondary devices are active.
    Fault Tolerance
  4. NOTE: When a redundant system fails, it typically requires human intervention to get the backup system up and running. Think of it this way: The spare tire in the trunk of your car is a redundant system. You have the component you need in the event of a failure, but unfortunately, you have to change the tire to restore the system (your car) back to working order.
  5. NOTE: A fault-tolerant system has the ability to withstand a failure and continue to operate normally. Going back to the flat tire example, a fault-tolerant system would be a run-flat tire that allows you to continue driving down the road without interruption in service. In a fault-tolerant system, both the primary and secondary devices are active and equally up-to-date so that if the primary device fails, the secondary device can activate to fulfill the same role in the processing, with little or no interruption in service.
  6. So how do we achieve redundancy or fault tolerance in a system?
    One common way is with RAID
  7. What does RAID stand for?
    Redundant array of independent disks
  8. is a disk storage technology that creates a single logical unit from multiple hard disk drives. Has become synonymous with a variety of disk storage schemes that separate and segregate data across multiple disk drives.
    Redundant Array of Independent Disks (RAID)

    Essentially, RAID divides and distributes stored data across multiple disk drives in one of several RAID levels, each of which provides different redundancy and performance. The system views the various RAID disk drives as one large storage device. The operating objective for RAID is data reliability, availability, and input/output (I/O) performance.
  9. How many different levels are a part of the RAID system?
  10. What are the 4 most common levels in the RAID system?
    • RAID 0
    • RAID 1
    • RAID 5
    • RAID 10
  11. Also known as disk mirroring, is a fault-tolerant and redundant level. Its system continuously copies data from one disk to another, making the second drive an exact replica or mirror of the first. The idea is that if the primary drive is lost, the second drive can take over without losing any data. If that happens, however, the second drive is now on its own. Another drawback is that you effectively have only half of the total physical disk space to use.
    RAID 1

    Provides a simple solution for data redundancy and fault-tolerance.
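The mirroring behavior described in this card can be sketched as a toy model. This is only an illustration: the class name `MirroredPair` and its methods are hypothetical, and each "drive" is just a Python list standing in for a disk.

```python
class MirroredPair:
    """Toy model of RAID 1: every write goes to both drives."""

    def __init__(self, num_blocks):
        # Each "drive" is a list of blocks; None marks an unwritten block.
        self.drives = [[None] * num_blocks, [None] * num_blocks]
        self.failed = [False, False]

    def write(self, block, data):
        # Copy the write to every drive that is still healthy.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                drive[block] = data

    def fail(self, drive_index):
        # Simulate losing one drive.
        self.failed[drive_index] = True

    def read(self, block):
        # Serve the read from the first healthy drive.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                return drive[block]
        raise IOError("both drives have failed")

    def usable_blocks(self):
        # Two drives are installed, but only one drive's worth is usable.
        return len(self.drives[0])
```

After `fail(0)`, reads still succeed from the second drive, but that drive is then on its own, as the card notes.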
  12. Also known as disk striping, it writes data across multiple disk drives, with a minimum of two disk drives. It is primarily a performance level. It writes equal-size blocks of data alternately to two or more disk drives. In a two-drive setup, it writes block A1 to the first drive, block A2 to the second, A3 to the first again, and so on. The drawback of this system is that if either drive is lost, the system fails because of the heavy data loss.
    RAID 0
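A minimal sketch of the alternating block writes described in this card, assuming a round-robin layout. The function names `stripe` and `unstripe` and the 4-byte block size are illustrative choices, not part of any real RAID tool.

```python
from typing import List


def stripe(data: bytes, num_drives: int = 2, block_size: int = 4) -> List[List[bytes]]:
    """Split data into equal-size blocks and deal them out round-robin."""
    if num_drives < 2:
        raise ValueError("striping needs at least two drives")
    drives: List[List[bytes]] = [[] for _ in range(num_drives)]
    for n, start in enumerate(range(0, len(data), block_size)):
        # Block 0 goes to drive 0, block 1 to drive 1, block 2 to drive 0, ...
        drives[n % num_drives].append(data[start:start + block_size])
    return drives


def unstripe(drives: List[List[bytes]]) -> bytes:
    """Reassemble the data by reading the drives in rotation."""
    out = []
    longest = max(len(d) for d in drives)
    for row in range(longest):
        for d in drives:
            if row < len(d):
                out.append(d[row])
    return b"".join(out)
```

Because no block is stored twice, dropping either list from the result of `stripe` leaves gaps that `unstripe` cannot fill, which is the drawback the card describes.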
  13. Is the most popular of the RAID levels. It requires three or more disk drives and writes striped data and generated parity data (for data recovery) across the disk drives.
    RAID 5
  14. NOTE: RAID 5 takes advantage of the fact that disk drives rarely fail entirely; rather, they fail in sectors. Should a disk sector fail, RAID 5 immediately and automatically recovers the data on that sector using the parity data and any remaining accessible data.
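Parity-based recovery can be illustrated with XOR, which is the usual way RAID 5 parity is computed, though the cards do not name it. The sketch below handles a single stripe only and ignores how a real array rotates parity across the drives. The helper names `xor_blocks` and `rebuild` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Recover the one missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# One stripe spread over three data blocks, plus its parity block.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
parity = xor_blocks([d0, d1, d2])

# Pretend d1 became unreadable and recompute it from the rest.
recovered = rebuild([d0, d2], parity)
```

XOR-ing the surviving blocks with the parity block cancels everything except the missing block, so `recovered` equals `d1`.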
  15. NOTE: RAID 5 is very common in enterprise servers and network attached storage (NAS) systems. RAID 5 performs better than RAID 1 (mirroring) and provides much better fault-tolerance.
  16. is really a shortened form of RAID 1+0. It combines RAID 1 mirroring with RAID 0 data striping. It provides the best performance of all of the RAID levels, but it requires twice as many disk drives as any other RAID level (a minimum of four drives). It is more common for high-use database servers with a high volume of disk writes. It is also available as a software implementation, but some of the performance advantages of the hardware-based system are lost.
    RAID 10
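To compare the four levels covered so far, the rough sketch below estimates usable space, assuming all drives are the same size. The function name `usable_capacity` is hypothetical. The RAID 5 rule (one drive's worth of space goes to parity) is standard behavior but is not stated in the cards.

```python
def usable_capacity(level, num_drives, drive_size_gb):
    """Estimate usable space in GB for RAID 0, 1, 5, or 10."""
    minimum_drives = {0: 2, 1: 2, 5: 3, 10: 4}
    if level not in minimum_drives:
        raise ValueError("only RAID 0, 1, 5, and 10 are modeled")
    if num_drives < minimum_drives[level]:
        raise ValueError("not enough drives for this RAID level")
    if level == 10 and num_drives % 2:
        raise ValueError("RAID 10 needs an even number of drives")

    if level == 0:
        return num_drives * drive_size_gb        # striping only, no copies
    if level == 1:
        return drive_size_gb                     # every drive holds the same data
    if level == 5:
        return (num_drives - 1) * drive_size_gb  # one drive's worth of parity
    return (num_drives // 2) * drive_size_gb     # RAID 10: mirrored pairs, striped
```

With 500 GB drives, two drives in RAID 1 give 500 GB, while four drives in RAID 10 give 1000 GB, half of the installed space in both cases.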
  17. NOTE: Aside from redundancy and fault tolerance, another benefit of a RAID system is data security. Because of data striping and its data storage methods, RAID makes data much harder to extract, especially by an intruder who doesn't have access to the RAID system I/O interface.
  18. Is the capability to switch out some devices while a system is running, rather than having to stop some or all of a system to replace a failed component.
    Drive Swapping
  19. What are the three levels of drive swapping?
    • Hot swap
    • Warm swap
    • Cold swap
  20. You can replace a hot-swappable component while the system is up and running, typically without service interruption. USB devices are hot-swappable.
    Hot swap
  21. To replace a warm-swappable component, the computer and operating system can be running, but you must stop all other services and applications. Many supposedly hot-swappable components are actually just warm-swappable.
    Warm swap
  22. As you may guess, to replace a cold-swappable component, everything--including the operating system and computer--must stop. Cold swap is the common state for nearly all components that install inside the system case of a desktop computer.
    Cold swap
  23. NOTE: Redundancy and fault tolerance aim to provide maximum uptime so that users and customers can rely on the availability of a system or network.
  24. NOTE: Availability is the opposite of unavailability, which translates to downtime. I know that sounds obvious, but to users, there is only one acceptable condition: available. If you've ever felt the frustration of not being able to access a network or application because the network or system is down, you know how users feel when they encounter this condition.
  25. Is a design and implementation approach that ensures a specific level of service.
    High Availability

    Typically, the specific level of service that the high availability system assures is part of a Service Level Agreement (SLA).
  26. document commits system availability at a specified level to users, internal or external.
    Service Level Agreement (SLA)
  27. Is a necessity on most networks. It is when maintenance, upgrades, and equipment swaps occur, usually during late night or very early morning hours.
    Scheduled downtime.
  28. NOTE: However, the design of a high availability system is to prevent unscheduled downtime.
  29. The percentage of uptime?
    Guaranteed availability

    For example, a commitment to 90% availability promises that the system or network will be available 90 percent of the time, allowing for 36.5 days of downtime in a year.
  30. NOTE: Typically, technicians express system availability in terms of a number of "nines." Ninety percent availability is one "nine"; 99% availability is two "nines"; 99.9% is three "nines"; 99.99% is four "nines"; and the ultimate goal for availability, 99.999%, is five "nines." If you wish to shoot for a nearly impossible 99.9999% availability (31.5 seconds of downtime per year), it's six "nines."
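The downtime figures quoted for each number of "nines" follow from simple arithmetic. The sketch below assumes a 365-day year. The function name `downtime_seconds_per_year` is illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def downtime_seconds_per_year(availability_percent):
    """Convert an availability commitment into allowed downtime per year."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return SECONDS_PER_YEAR * (100 - availability_percent) / 100


# Allowed downtime, in seconds per year, for one through six nines.
nines_table = {
    pct: downtime_seconds_per_year(pct)
    for pct in (90, 99, 99.9, 99.99, 99.999, 99.9999)
}
```

Dividing the 90% result by 86,400 seconds gives the 36.5 days mentioned above, and the 99.9999% entry comes out at roughly 31.5 seconds.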
  31. NOTE: So what's the connection between availability and security? As I mentioned earlier, security measures are only good when they're running. If an organization has a business rule that commits its network to 99.9% availability, a security attack or intrusion on the network may cause a system to be unavailable for more than the roughly 10 minutes of downtime per week that 99.9% allows.
  32. A related average time that represents the time a device takes to recover from a failure?
    mean time to recover
  33. The time it takes to effect a repair?
    Mean time to repair
  34. The length of time before a device should fail?
    Mean time before failure
  35. The average time the system is down for the repair?
    Mean downtime.
  36. MTTR
    Mean Time To Recover
  37. NOTE: It's a good idea when developing a contingency plan or disaster recovery plan to consider the MTTR for the major components and the entire system. In addition, if you have a maintenance contract on any of your major components, it should state the MTTR and refer to mean time to recovery, not mean time to respond.
  38. is the average time it takes to repair a failed device. It should not include the order lead-time of a replacement part or any other administrative or logistical downtime (ALDT). However, it should include the fault latency.
    Mean time to repair (MTTR)
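One way to apply this definition is to average repair records while counting fault latency and leaving out ALDT. The record layout below is an assumption made for illustration. The names `RepairRecord` and `mean_time_to_repair` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RepairRecord:
    fault_latency_h: float    # failure occurred -> failure detected
    hands_on_repair_h: float  # active diagnosis and repair work
    aldt_h: float             # waiting on parts, approvals, logistics


def mean_time_to_repair(records):
    """Average repair time in hours: includes fault latency, excludes ALDT."""
    if not records:
        raise ValueError("need at least one repair record")
    total = sum(r.fault_latency_h + r.hands_on_repair_h for r in records)
    return total / len(records)


history = [
    RepairRecord(fault_latency_h=0.5, hands_on_repair_h=2.0, aldt_h=48.0),
    RepairRecord(fault_latency_h=1.5, hands_on_repair_h=4.0, aldt_h=0.0),
]
mttr_hours = mean_time_to_repair(history)
```

The 48 hours spent waiting for a part in the first record do not affect the result, so `mttr_hours` is 4.0.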
  39. The time between the failure occurring and when it's detected?
    Fault latency
  40. Is the average time a system is unavailable during a particular span of time, which is typically one year. It includes all forms of downtime, including failures, scheduled and unscheduled downtimes, and any ALDT.
    Mean Downtime (MDT)
  41. A plan that specifies, among other things, the recovery time objective and the recovery point objective.
    Information Technology Service Continuity (ITSC)
  42. is the target time duration for restoring service after a system or network failure, disruption, or catastrophe (like severe weather). Is a product of a business interruption analysis.
    Recovery time objective

    Any time lost beyond the recovery time objective begins to cause a serious loss of business continuity.
  43. represents the time (and data) to which you plan to restore data, assuming that data could have been lost after the disrupting event.
    Recovery Point objective
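A small sketch of how the two objectives might be checked after an incident. It reads the recovery point objective as the largest acceptable gap between the last good backup and the failure, which is a common interpretation but is not spelled out in the cards. The function name `meets_objectives` and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta


def meets_objectives(last_backup, failure, service_restored, rto, rpo):
    """Compare an incident's outage and data-loss window with the RTO and RPO."""
    outage = service_restored - failure       # how long service was down
    data_loss_window = failure - last_backup  # work done since the last backup
    return {
        "rto_met": outage <= rto,
        "rpo_met": data_loss_window <= rpo,
    }


result = meets_objectives(
    last_backup=datetime(2015, 2, 18, 1, 0),
    failure=datetime(2015, 2, 18, 3, 0),
    service_restored=datetime(2015, 2, 18, 6, 0),
    rto=timedelta(hours=4),
    rpo=timedelta(hours=1),
)
```

Here the three-hour outage is inside the four-hour recovery time objective, but two hours of data since the last backup exceeds the one-hour recovery point objective.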
Card Set:
Lesson 8
2015-02-18 03:04:35