Enhancing the Stability of Off-the-Shelf Operating Systems

Authors

  • Michael M. Swift, Brian N. Bershad, and Henry M. Levy, Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA

Keywords:

Recovery, Device Drivers, Virtual Memory, Protection, I/O

Abstract

Despite decades of research in extensible operating system technology, extensions such as device drivers remain a significant cause of
system failures. In Windows XP, for example, drivers account for 85% of recently reported failures.
This paper describes Nooks, a reliability subsystem that seeks to greatly enhance OS reliability by isolating the OS from driver failures. The
Nooks approach is practical: rather than guaranteeing complete fault tolerance through a new (and incompatible) OS or driver architecture,
our goal is to prevent the vast majority of driver-caused crashes with little or no change to existing driver and system code. To achieve this,
Nooks isolates drivers within lightweight protection domains inside the kernel address space, where hardware and software prevent them
from corrupting the kernel. Nooks also tracks a driver’s use of kernel resources to hasten automatic clean-up during recovery.
To prove the viability of our approach, we implemented Nooks in the Linux operating system and used it to fault-isolate several device
drivers. Our results show that Nooks offers a substantial increase in the reliability of operating systems, catching and quickly recovering
from many faults that would otherwise crash the system. In a series of 2000 fault-injection tests, Nooks recovered automatically from 99%
of the faults that caused Linux to crash.
While Nooks was designed for drivers, our techniques generalize to other kernel extensions as well. We demonstrate this by isolating a
kernel-mode file system and an in-kernel Internet service. Overall, because Nooks supports existing C-language extensions, runs on a
commodity operating system and hardware, and enables automated recovery, it represents a substantial step beyond the specialized
architectures and type-safe languages required by previous efforts directed at safe extensibility.

Published

2020-04-12

Section

Articles

How to Cite

Enhancing the Stability of Off-the-Shelf Operating Systems. (2020). INTERNATIONAL JOURNAL OF MANAGEMENT RESEARCH AND REVIEW, 10(2), 01-16. https://ijmrr.com/index.php/ijmrr/article/view/482