Predictive Self Healing in the Solaris 10 Operating System -- Delivering Relentless Availability

The Solaris 10 Operating System (OS) introduces a new architecture for building and deploying systems and services capable of Predictive Self Healing. This technology enables Sun systems to accurately predict component failures and mitigate many serious problems before they actually occur. Solaris Fault Manager and Solaris Service Manager are the two main components of Predictive Self Healing. Solaris Fault Manager receives data relating to hardware and software errors and automatically diagnoses the underlying problem. Once it has diagnosed a problem, Solaris Fault Manager automatically responds by taking faulty components offline. Solaris Service Manager makes services, rather than processes, into first-class citizens, permitting automatic self-healing. Base Solaris services have service descriptions that include full dependency information for start, stop, and restart; applications can easily be converted to run under Solaris Service Manager.

Highlights

  • Maximized system and service availability through predictive diagnosis and isolation of faulty components
  • Automatic diagnosis of failed components and automatic restart of failed services in milliseconds
  • Simplified administration model for managing services, reducing cost of ownership
  • Fast and easy repair of problems with links to knowledge articles
  • Scalable architecture that can be rapidly upgraded and adapted to new problems without requiring downtime

Maximizing availability

Predictive Self Healing is designed to maximize the availability of the system and application services by automatically diagnosing, isolating, and recovering from faults. This not only reduces hardware failures but also limits the impact of application failures, increasing both system and application availability.

  • Reducing Hardware Failures: A self-healing system automatically diagnoses problems, and the results can trigger automated reactions such as dynamically taking CPUs, regions of memory, and I/O devices offline before these components can cause system failures. Solaris Fault Manager isolates and disables faulty components and helps ensure continuous service, even before administrators know there is a problem. In addition, remote service agents can retrieve information from Sun that is vital to diagnosing the root cause of the failure.
  • Reducing the Impact of Service Failures: If an application service fails, the built-in service restart mechanism in the Solaris 10 OS automatically restarts the application or service. This mechanism also extends into Sun Cluster software failover environments for even higher availability.
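
The restart behavior described above can be illustrated with a minimal sketch. This is not how Solaris Service Manager is implemented; it is a hypothetical supervisor loop, written in Python, that reruns a command whenever it exits with a failure status. The function name `supervise` and the `max_restarts` limit are assumptions made for illustration only.

```python
import subprocess
import sys


def supervise(command, max_restarts=3):
    """Run command, restarting it each time it exits with a non-zero status.

    Returns the number of restarts that were needed before a clean exit.
    Raises RuntimeError if the command keeps failing after max_restarts.
    """
    restarts = 0
    while True:
        exit_code = subprocess.run(command).returncode
        if exit_code == 0:
            return restarts  # clean exit: nothing left to heal
        if restarts >= max_restarts:
            raise RuntimeError(
                f"service still failing after {restarts} restarts"
            )
        restarts += 1  # failure detected: loop around and start it again


if __name__ == "__main__":
    # A stand-in "service" that exits cleanly, so no restart is needed.
    print(supervise([sys.executable, "-c", "pass"]))
```

A real service manager would also track dependencies between services and avoid restarting a service that fails repeatedly in a short period; the restart cap here is a crude stand-in for that safeguard.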

Automatic diagnosis and recovery from failures

With Solaris Fault Manager, the system automatically diagnoses faulty components, a function that in some cases can reduce analysis time from days to seconds. Once a fault is diagnosed, the system can quickly take corrective action and automatically restore application services. This technology helps business-critical applications and essential system services continue uninterrupted in the event of software failures, major hardware component failures, and even software misconfiguration problems.
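
One simple way to picture predictive diagnosis is a rate threshold: a component that reports too many errors within a short window is treated as faulty and taken out of service before it causes a failure. The sketch below assumes that approach; this document does not describe the diagnosis logic Solaris Fault Manager actually uses, and the class name, parameters, and component names are hypothetical.

```python
from collections import defaultdict, deque


class ErrorRateDiagnoser:
    """Flag a component as faulty when it reports too many errors too quickly."""

    def __init__(self, max_errors=3, window_seconds=60.0):
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # component -> recent error timestamps
        self.offline = set()               # components already taken offline

    def report_error(self, component, timestamp):
        """Record one error; return True if this report takes the component offline."""
        if component in self.offline:
            return False
        events = self._recent[component]
        events.append(timestamp)
        # Discard errors that have aged out of the sliding time window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        if len(events) >= self.max_errors:
            self.offline.add(component)
            return True
        return False


if __name__ == "__main__":
    diagnoser = ErrorRateDiagnoser(max_errors=3, window_seconds=60.0)
    for t in (0.0, 10.0, 20.0):
        taken_offline = diagnoser.report_error("cpu0", t)
    print(taken_offline, sorted(diagnoser.offline))
```

Isolated errors spread far apart never reach the threshold, so only a sustained burst leads to the component being taken offline.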

Customers can now deliver higher levels of availability and application services while minimizing downtime and associated administrative costs. Reduced downtime can potentially save companies $10,000 to $6 million per hour for mission-critical environments.

Simplified administration

Solaris Service Manager reduces complexity by abstracting problem diagnosis and services in a manner that is transparent to users and applications. It simplifies common administrative tasks, speeds system boot, and significantly reduces the human errors that lead to system failures, service downtime, and inefficient system management. Administrative tasks such as enabling and disabling services and changing properties are simplified and secure, with an undo capability to revert changes. In addition, service information is stored in a central repository, making systems easier to manage and maintain.

Self-healing technology can also help improve the productivity of support staff. They can spend much less time investigating and resolving issues, so each person can support more systems. A self-healing system can also reduce administration costs, because systems that perform many complex tasks without user intervention can be run by less specialized, less experienced, and therefore less costly staff.

Fast and easy repair

Solaris Fault Manager issues easy-to-understand diagnosis messages that link to knowledge articles at sun.com/msg. Each message carries a unique event ID that system administrators can use to access detailed information in the knowledge articles, which describe what failure occurred and what the system did to fix it. These knowledge articles guide system administrators through any tasks that require human intervention, including repairs, and explain predicted or detected problems using clear language and links to repair procedures and documentation. Together, these features greatly reduce the complexity of repairing the system.

Scalable and flexible architecture

The scalable architecture of Sun's Predictive Self Healing technology can be rapidly adapted to new problems and updated as new diagnosis and availability technologies are added to the system. Most future updates can be dynamically loaded and unloaded while the system is running, so the system can be upgraded on the fly without requiring downtime or losing previous diagnosis data.

Conclusion

With businesses operating around the clock and demanding uninterrupted service, service availability is of paramount importance. Predictive Self Healing delivers the next generation of availability technology today, including features that keep systems and services running and simple to administer. Over time, a rapidly evolving ecosystem of self-healing components can help provide consistent, easy-to-use, and always-available Sun systems.

Learn More

Get the inside story on the trends and technologies shaping the future of computing by signing up for the Sun Inner Circle program. You'll receive a monthly newsletter packed with information, plus access to a wealth of resources. Register today at sun.com/joinic.
