* Error Recovery *
Facilitate problem resolutions in both Web and desktop systems

Java developers have access to a variety of error detection tools, such as try/catch blocks and variants of the Exception class, but few mechanisms provide a structured, interactive recovery. As a result, users are frequently faced with frustrating application instability that can be difficult to resolve. Fortunately, an error recovery framework can be easily created and integrated into almost any desktop or Java server application.

Progressing from Error Detection
A typical application begins the error handling process by catching an Exception object bearing some kind of problem description, whether by virtue of the class type or attribute data. The system might also perform error checking and data validation on method return values for situations in which corrupt data will not cause a crash outright. Ideally, the application should be able to analyze each situation and take appropriate measures to repair the environment so that the task can be successfully completed. What typically happens instead is that the afflicted task is aborted, perhaps with a report or log of some kind.


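try
{
  // A null reference guarantees a NullPointerException on the next line.
  StringBuffer helloWorld = null;
  JOptionPane.showMessageDialog( null, helloWorld.toString() );
}
catch ( Exception error )
{
  // No recovery is attempted: report vaguely and abandon the task.
  JOptionPane.showMessageDialog( null, "Oops!" );
  System.exit( 1 );
}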

The problem is that providing recovery logic for each error situation can be a time-consuming task whose solutions must be distributed throughout a system's logic. Each application is unique, so preparing a general framework to handle error situations is a challenging task that many library and API development teams prefer to leave to the application developers.

The solution in these situations is for development groups to create their own framework that allows common logic to be reused within a context that allows for interactive system guidance. The following roles facilitate this progression from error detection to error resolution:

  • Problem description
  • Suggested solution
  • Resolution manager
  • Resolution listener
  • Resolution validator

Describing the Problem and Its Solution
The error description is the best place to begin when designing a recovery framework. Systems have the option of specifying a problem component that either serves as the base class for application exceptions, or is distinct with error information as a constructor parameter. The latter strategy works best with applications that are tightly bound to Sun's libraries and virtual machine because the exception architecture remains consistent. Regardless, the component must describe the problem in terms that both the user and the system can understand. The problem can be represented by a variety of perspectives that range in complexity - from a simple text string to a domain expert facility with access to a wide variety of resolution and logging mechanisms.


public interface IProblem
{
  public void setError( Throwable p_error );
  public String getErrorDescription();
  ...
}

The logic that discovered the error will typically create the problem object and work to resolve it, but a collection known to the application should track the object for later analysis to ensure persistence beyond the current operation.


catch ( Exception error )
{
  // Log the issue for later reference.  
  problemCollection.add( new Problem( error ) );
  ...
  // Work to resolve issue.
}

Figure 1 illustrates a problem description component that offers a reasonable amount of error and tracking detail.
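
As a rough illustration of the tracking idea above, the following sketch pairs a pared-down IProblem with a concrete Problem class and an application-level collection. The detection timestamp, the fallback description, and the ProblemLog class are assumptions made for this example and are not taken from the article's figures.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;

// Pared-down version of IProblem; only the two methods shown earlier.
interface IProblem
{
  void setError( Throwable p_error );
  String getErrorDescription();
}

// Hypothetical concrete problem that also records when the error was seen.
class Problem implements IProblem
{
  private Throwable m_error;
  private final Date m_detected = new Date();

  Problem( Throwable p_error )
  {
    m_error = p_error;
  }

  public void setError( Throwable p_error )
  {
    m_error = p_error;
  }

  public String getErrorDescription()
  {
    if ( m_error == null )
    {
      return "Unknown problem";
    }
    // Fall back to the class name when the exception carries no message.
    String message = m_error.getMessage();
    return ( message != null ) ? message : m_error.getClass().getName();
  }

  public Date getDetected()
  {
    // Defensive copy so callers cannot alter the recorded time.
    return new Date( m_detected.getTime() );
  }
}

// Hypothetical application-wide log that outlives the failing operation.
class ProblemLog
{
  private final List<IProblem> m_problems = new ArrayList<IProblem>();

  public synchronized void add( IProblem p_problem )
  {
    m_problems.add( p_problem );
  }

  public synchronized List<IProblem> snapshot()
  {
    return Collections.unmodifiableList(
      new ArrayList<IProblem>( m_problems ) );
  }
}
```

A catch block would then call something like log.add( new Problem( error ) ) before starting any resolution attempt, so the record survives even if the attempt itself fails.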

The error handling logic can also instantiate solutions suggested for the problem because of the failing operation's intimate familiarity with the process and the expected environment state. Not all solutions can be automated (such as the need for a missing disk), so the solution role is best broken into two classes: a suggestion component to provide resolution instructions and a solution component that automates the resolution for execution by the system on behalf of the user. Suggestion objects are directly associated with a problem in order to provide system access regardless of where the error is to be resolved, and each suggestion in turn can be associated with a solution object if the system can act on behalf of a user for the resolution attempt.


public interface ISuggestion
{
  // Call this if the suggestion can be automated.
  public void setSolution( ISolution p_automatedSolution );
  ...
}

Designers should keep in mind alternate description or instruction strategies that offer additional graphical or animated illustrations to support other language groups or people with varying disabilities. Figure 2 illustrates the relationship between potential solutions and the problem description.

The roles of IProblem and ISuggestion have little reason to vary, so the framework can provide concrete convenience classes for both. ISolution's role will almost always be specific to an application or library, so its implementation is left to the development team.
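
One possible shape for the suggestion and solution roles is sketched below. The accessor names (getInstructions, getSolution, isAutomated) and the no-argument solve() method are assumptions for illustration; the article only shows setSolution.

```java
// Hypothetical automated fix; each team supplies its own implementations.
interface ISolution
{
  void solve();
}

// Pared-down ISuggestion: instructions plus an optional automated solution.
interface ISuggestion
{
  String getInstructions();
  void setSolution( ISolution p_automatedSolution );
  ISolution getSolution();
  boolean isAutomated();
}

// Convenience class of the kind a framework could ship ready-made.
class Suggestion implements ISuggestion
{
  private final String m_instructions;
  private ISolution m_solution;   // null means the user must act

  Suggestion( String p_instructions )
  {
    m_instructions = p_instructions;
  }

  public String getInstructions()
  {
    return m_instructions;
  }

  public void setSolution( ISolution p_automatedSolution )
  {
    m_solution = p_automatedSolution;
  }

  public ISolution getSolution()
  {
    return m_solution;
  }

  public boolean isAutomated()
  {
    return m_solution != null;
  }
}
```

A suggestion such as "Insert the backup disk and retry" would be left without a solution, while one that the system can carry out itself would receive an ISolution through setSolution.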

Simultaneous attempts to solve a problem should be avoided because the main benefits of structured error recovery include consistency and predictability. As such, all of a problem's methods need to be synchronized, which has the added benefit of allowing the internal ISuggestion collection to use fast, unsynchronized data structures.
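
A minimal sketch of that synchronization approach follows. The class name SynchronizedProblem is hypothetical, and plain strings stand in for ISuggestion objects to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical problem class whose methods are all synchronized.
class SynchronizedProblem
{
  // A plain ArrayList is safe here only because every method
  // that touches it holds the object's lock.
  private final List<String> m_suggestions = new ArrayList<String>();

  public synchronized void addSuggestion( String p_instructions )
  {
    m_suggestions.add( p_instructions );
  }

  public synchronized int getSuggestionCount()
  {
    return m_suggestions.size();
  }

  public synchronized List<String> getSuggestions()
  {
    // Return a copy so callers never iterate the shared list.
    return new ArrayList<String>( m_suggestions );
  }
}
```

Returning a copy from getSuggestions is a design choice in this sketch: it keeps iteration outside the lock at the cost of an extra allocation.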

ISolution serves as a facade for all of the operations involved in a single attempt to recover from an error. Therefore, Suggestion only needs to contain one reference to an associated ISolution, and a value of null can represent a user-driven solution. This one-to-one relationship ensures that the user retains a clear understanding of the environmental and data consequences of any error resolution attempt, which in turn increases the user's confidence in the integrity of the system he or she is using.

A simple Suggestion class will often provide only a text description of the potential solution's steps and consequences, but architects should also consider the use of animations to clearly communicate what will occur with the suggestion's selection. Wizards can be used in automated solutions to collect custom data values that will affect the nature or severity of the consequence, such as how far to adjust steam pressure in overloaded industrial equipment.

Guiding the Resolution Process
The selection of a suggested solution is best provided through a component that can consistently orchestrate the resolution process. The Mediator pattern provides an excellent template for the problem-solver component by logically connecting the problem and its solutions to the needs and insight of the application. Figure 3 illustrates the IProblemSolver interface and its relation to IProblem.

Concrete implementations of IProblemSolver accept a specific problem, execute solutions as appropriate, and then check whether any given solution fixed the problem. Problem solvers can act independently, query the user for guidance, or combine both approaches as appropriate for user accessibility and the failing operation.


...
IProblemSolver solver =
  new AutomatedProblemSolver( someProblem );
boolean resolved = solver.solveProblem();

Automated problem solvers that use only ISolution objects are often best for server-side systems and client tasks that are trivial in their resolution. AutomatedProblemSolver is an example component that should be provided by the framework for just such a purpose. AutomatedProblemSolver looks through a given Problem instance for valid ISolution instances and organizes their execution according to a success likelihood indicator supplied by the associated Suggestion. Architects need to be careful when using automated problem solvers, however, because the user has no control over the outcome other than what executed solutions might provide. In fact, the solutions provided will often require expert knowledge about the system and environment, and the application may need to intercede between resolution attempts by the problem solver to reset any environment variables left in an inappropriate state from a failed attempt.
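
The ordering behavior described above might look roughly like the sketch below. It is not the framework's actual AutomatedProblemSolver: the RankedSuggestion class, the 0.0 to 1.0 likelihood scale, and the getSuggestions accessor are assumptions introduced for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical automated fix.
interface ISolution
{
  void solve();
}

// Stand-in for Suggestion: a likelihood score plus an optional solution.
class RankedSuggestion
{
  final double m_likelihood;   // assumed scale: 0.0 (unlikely) to 1.0
  final ISolution m_solution;  // null when only the user can act

  RankedSuggestion( double p_likelihood, ISolution p_solution )
  {
    m_likelihood = p_likelihood;
    m_solution = p_solution;
  }
}

interface IProblem
{
  List<RankedSuggestion> getSuggestions();
}

interface IResolutionValidator
{
  boolean validateResolution( IProblem p_problem );
}

class AutomatedProblemSolver
{
  private final IProblem m_problem;
  private final IResolutionValidator m_validator;

  AutomatedProblemSolver( IProblem p_problem,
                          IResolutionValidator p_validator )
  {
    m_problem = p_problem;
    m_validator = p_validator;
  }

  public boolean solveProblem()
  {
    // Keep only the suggestions the system can execute on its own.
    List<RankedSuggestion> candidates = new ArrayList<RankedSuggestion>();
    for ( RankedSuggestion suggestion : m_problem.getSuggestions() )
    {
      if ( suggestion.m_solution != null )
      {
        candidates.add( suggestion );
      }
    }

    // Try the most likely fix first.
    Collections.sort( candidates, new Comparator<RankedSuggestion>()
    {
      public int compare( RankedSuggestion p_a, RankedSuggestion p_b )
      {
        return Double.compare( p_b.m_likelihood, p_a.m_likelihood );
      }
    } );

    for ( RankedSuggestion suggestion : candidates )
    {
      suggestion.m_solution.solve();
      if ( m_validator.validateResolution( m_problem ) )
      {
        return true;
      }
    }
    return false;
  }
}
```

The sketch stops at the first validated fix and does not reset the environment between attempts; an application that needs that step would have to intercede between iterations, as the paragraph above notes.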

Alternatively, user-driven problem solvers take advantage of all provided ISuggestion objects by presenting to the user a subset of those most likely to succeed. The primary benefit of an interactive problem solver is the control felt by the user, but situations can also arise in which the user is essential to the task at hand through intimate knowledge of the environment or access to resources not programmatically accessible. The recovery framework can provide such a component, GuidedProblemSolver, that uses Swing or a similar technology to present the user with an error resolution dialog containing the top three suggestions. The resolution is iteratively attempted by users through the selection of a suggestion and the execution of its instructions. Automated solutions are supported through the association between ISuggestion and ISolution components, but the suggestion should clearly indicate the automated nature. A cancel button should always be provided by the dialog to allow the user the opportunity to cancel the task instead of resolving it (for example, a disk that is not on hand might be needed). The application should still communicate with the problem solver after each resolution attempt to ensure that the environment is in an appropriate state for the next attempt.
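
The selection step of such a dialog could be sketched as follows. This is an illustration under assumptions rather than the framework's GuidedProblemSolver: the UserSuggestion class, its likelihood field, and the method names are hypothetical, and the dialog uses the standard JOptionPane.showOptionDialog call with a Cancel entry appended after the top three choices.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import javax.swing.JOptionPane;

// Stand-in for a suggestion shown to the user.
class UserSuggestion
{
  final String m_instructions;
  final double m_likelihood;   // assumed scale: higher is more promising

  UserSuggestion( String p_instructions, double p_likelihood )
  {
    m_instructions = p_instructions;
    m_likelihood = p_likelihood;
  }
}

class GuidedProblemSolverSketch
{
  static final int MAX_CHOICES = 3;

  // Picks the most promising suggestions without altering the input list.
  static List<UserSuggestion> topSuggestions( List<UserSuggestion> p_all )
  {
    List<UserSuggestion> sorted = new ArrayList<UserSuggestion>( p_all );
    Collections.sort( sorted, new Comparator<UserSuggestion>()
    {
      public int compare( UserSuggestion p_a, UserSuggestion p_b )
      {
        return Double.compare( p_b.m_likelihood, p_a.m_likelihood );
      }
    } );
    int limit = Math.min( MAX_CHOICES, sorted.size() );
    return new ArrayList<UserSuggestion>( sorted.subList( 0, limit ) );
  }

  // Returns the chosen suggestion, or null if the user cancels
  // or closes the dialog.
  static UserSuggestion promptUser( String p_description,
                                    List<UserSuggestion> p_all )
  {
    List<UserSuggestion> choices = topSuggestions( p_all );
    Object[] options = new Object[ choices.size() + 1 ];
    for ( int i = 0; i < choices.size(); i++ )
    {
      options[ i ] = choices.get( i ).m_instructions;
    }
    options[ choices.size() ] = "Cancel";

    int picked = JOptionPane.showOptionDialog( null, p_description,
      "Resolve Problem", JOptionPane.DEFAULT_OPTION,
      JOptionPane.WARNING_MESSAGE, null, options, options[ 0 ] );

    if ( picked < 0 || picked >= choices.size() )
    {
      return null;
    }
    return choices.get( picked );
  }
}
```

A full solver would call promptUser in a loop, attempt the chosen suggestion, and consult the validator after each attempt until the problem is resolved or the user cancels.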

Just as with suggestions and solutions, several variations of the problem solver can be introduced to applications. For example, a wizard could query users on their preference for resolution with regard to the existing state of the operation, much in the same way some applications present users with a series of questions to determine which help document would best meet the user's needs. Alternatively, a Web system might use a problem solver to track the session and generate HTML forms that simulate an interactive dialog. As can be imagined, problem-solver components can become complex applets in their own right, and so require special attention from the technical and business experts.

Application Participation
The confirmation or rejection of an attempted solution is an important aspect of error resolution. A problem solver could be used for this task, but a role-based component better allows the application's seamless integration into the process without affecting the more general nature of the error recovery framework. Figure 4 illustrates the relationship between a problem solver and a resolution validator.

A concrete implementation of IResolutionValidator will be, by necessity, a domain expert component that is specific to the operation affected by the error. The resolution process can be complex because the observing component must intimately understand what a stable operation looks like from execution, data integrity, and environment perspectives. Validation logic must consider whether all three aspects of the operation have been repaired by an attempted solution; if not, the validator must consider whether additional attempts should be made, or if the operation will at least work well enough to allow the data to be saved in anticipation of a new session. For example, word processors often create backup files that can be reopened once the application has been restarted, and many systems can behave in a similar fashion for calculational tasks. However, it's important to understand that validators are not responsible for performing any repairs upon the system, but rather must only communicate to the problem solver whether or not the given problem remains an issue. Environment adjustments and data rescue are performed by solutions or additional problem-solver observers.


public class StringBufferValidator
  implements IResolutionValidator
{
  public boolean validateResolution( IProblem p_problem )
  {
    NullStringBufferProblem issue = 
      ( NullStringBufferProblem ) p_problem;

    // The problem is resolved once the buffer is no longer null.
    return !issue.isBufferNull();
  }
}

...
// Prime the problem solver; bufferProblem is an existing
// NullStringBufferProblem instance.
IProblemSolver solver = 
  new AutomatedProblemSolver( bufferProblem,
                 new StringBufferValidator() );

...
// Validating inside an automated solver.
currentSolution.solve();
boolean resolved = 
  m_validator.validateResolution( m_problem );

The system can provide listener objects to the problem solver to observe the resolution process, provide guidance to the problem solver, and adjust the environment as necessary in reaction to the outcome of any attempted solution. These components are often specific to the operation at hand, so it is reasonable for a single class to implement both the listener interface and the validation logic. Figure 5 illustrates the relationship between the problem solver and the problem-solver listeners.

IProblemSolverListener is the final component in the resolution process that facilitates the application's active participation. Instead of determining whether the operation can be reactivated for further processing, concrete implementations of IProblemSolverListener perform system housekeeping in reaction to the failure or success of the attempted solution. In many instances these listeners can record particulars of the error recovery attempt to facilitate the development of new system solutions that avoid the problem outright. Other listeners might attempt to reset the environment state if the process of executing a solution went awry. If the resolution attempt placed the task into a particularly dangerous or disrupted state, each listener has the opportunity to recommend to the problem solver that no further attempts be made (i.e., abort the task and let the system start over, even if that would result in lost data).

These recommendations are only advisory, however; the ultimate responsibility for deciding whether to abort or continue an operation remains with the problem solver's validator. For the best effect, a problem solver should notify each associated listener immediately after a solution is attempted.


...
// Inside an automated solver.
currentSolution.solve();
boolean resolved = 
  m_validator.validateResolution( m_problem );
for ( int i = 0; i < m_listeners.size(); i++ )
{
  IProblemSolverListener listener =
    ( IProblemSolverListener ) m_listeners.get( i );
  listener.solutionAttempted( m_problem, 
         currentSolution, resolved, countRemainingSolutions() );
}
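
One way a listener could both record attempts and recommend an abort is sketched below. The article does not show the return type of solutionAttempted, so the boolean return value (true meaning "recommend stopping") is an assumption, as are the FailureLimitListener class and the use of Object for the problem and solution parameters.

```java
import java.util.ArrayList;
import java.util.List;

// Pared-down listener; the boolean return value is an assumption.
interface IProblemSolverListener
{
  // Returns true to recommend that the solver make no further attempts.
  boolean solutionAttempted( Object p_problem, Object p_solution,
                             boolean p_resolved, int p_remaining );
}

// Hypothetical listener that keeps a history and recommends
// aborting after too many failed attempts.
class FailureLimitListener implements IProblemSolverListener
{
  private final int m_maxFailures;
  private final List<String> m_history = new ArrayList<String>();
  private int m_failures;

  FailureLimitListener( int p_maxFailures )
  {
    m_maxFailures = p_maxFailures;
  }

  public boolean solutionAttempted( Object p_problem, Object p_solution,
                                    boolean p_resolved, int p_remaining )
  {
    // Record the attempt for later analysis.
    m_history.add( p_solution
      + ( p_resolved ? " succeeded, " : " failed, " )
      + p_remaining + " remaining" );

    if ( !p_resolved )
    {
      m_failures++;
    }
    return !p_resolved && m_failures >= m_maxFailures;
  }

  public List<String> getHistory()
  {
    return new ArrayList<String>( m_history );
  }
}
```

A solver using this shape would collect the return values from all listeners after each attempt and treat them as advice only, leaving the final decision to its validator.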

Conclusion
The tools necessary to identify the occurrence of errors are already available to developers, but additional work is necessary to provide a consistent and reliable resolution mechanism. A general error recovery framework can be easily created and extended to facilitate problem resolution in both Web and desktop systems. In many instances, however, expert insight into how the system works will be required to make the most of the framework's capability to ensure the best user experience possible.

This article explored the design of such a framework to provide a precise method for describing the problem at hand, identifying and attempting solutions, and ensuring that the problem has been resolved. The process can be automated in situations where user input is impossible or inappropriate, or made interactive through the presentation of dialogs and wizards. A complete implementation of this framework can be found at http://home.insight.rr.com/thebretts/todd/research/.

Reference

  • Gamma, E., et al. (1994). Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley.

About Todd Brett
Todd has over 10 years of experience with multiplatform applications and holds an M.Sc. from Regis University, Denver, CO.
