Samson W. Tu, Mark A. Musen
Section on Medical Informatics, Medical School Office Building
Stanford University School of Medicine
Stanford, CA 94305-5479
email: {tu | musen}@smi.stanford.edu
This paper describes the reformulation of the episodic skeletal-plan refinement (ESPR) problem-solving method (PSM) in a new framework that seeks to integrate knowledge-based applications with a temporal data-abstraction and data-management system. In this framework, both applications and temporal-data managers are encapsulated as CORBA objects. We found that we needed to reformulate the method ontology, mapping relations, and control structure of the PSM. Our experience suggests that PSMs are not necessarily fixed structures that can be plugged into arbitrary application environments, and that we need to develop a flexible configuration environment and expressive mapping formalisms to accommodate the requirements of application environments. These requirements include the ways data are made available and the ways software components interact with each other.
The concept of a problem-solving method (PSM) has been one of the major organizing principles of recent knowledge-engineering frameworks (McDermott 1988; Steels 1990; Klinker et al. 1991; Chandrasekaran et al. 1992; Wielinga et al. 1992; Schreiber et al. 1993; Eriksson et al. 1995). Although details differ, most frameworks characterize a problem-solving method as having (1) a knowledge-level characterization of the method's functional competence and assumptions, (2) well-defined knowledge roles, (3) modular decomposition into subcomponents, and (4) a task control structure or explicit control knowledge. Knowledge-level characterization helps a user select and construct a PSM appropriate for the application task; knowledge roles guide developers as to the type of domain knowledge that should be acquired; and decomposition of methods and explicit control knowledge encourage reuse of subcomponents and custom-tailoring of a PSM for different application requirements.
Concerns about reuse of problem-solving methods and about knowledge acquisition have motivated our work in the PROTÉGÉ-II project. In PROTÉGÉ-II, we developed an ontology-based framework for generating knowledge-acquisition tools and configuring problem-solving methods (Tu et al. 1995). The knowledge roles and algorithms of problem-solving methods are formulated in domain-independent terms. These knowledge roles are instantiated with application equivalents through the use of mapping relations (Gennari et al. 1994). In PROTÉGÉ-II, a method either is not decomposable, in which case we call it a mechanism, or is made up of a set of subtasks and a control structure, in which case we call it a composite method. To solve these subtasks, we have to select additional methods or mechanisms.
Our hypothesis has been that we can define problem-solving components with sufficiently well-understood functionalities and knowledge requirements that these components can be reused for similar tasks in multiple application areas. With PROTÉGÉ-II, we have had some success in showing that such reuse is possible. For example, the separation of domain-specific terms and relations from method-specific knowledge roles allows us to reuse the propose-and-revise method in multiple applications (Gennari et al. 1995). By separating out the temporal-abstraction method from the episodic skeletal-plan refinement method, we showed that a component that was developed for temporal reasoning in one PSM can be reused for spatial reasoning in a different context (Molina and Shahar 1996).
Despite these examples of successful reuse, "plug-and-play" stories of PSM reuse remain rare and are mostly limited to research prototypes. Recent work on the theory of PSM construction suggests that, even if a PSM is selected to solve two application tasks that have the same functional requirements, the process of operationalizing the PSM to satisfy efficiency goals requires the introduction of assumptions about the types of available domain knowledge and resource constraints. For example, in the board-game method that the PROTÉGÉ-II group developed, where a player moves pieces from location to location to achieve a goal configuration of pieces on a board, one formulation of the method may require that the possible-move operation generate only legal moves of the game, whereas for other games it may be more efficient to generate possible moves that are not necessarily legal and to use a second step to prune moves that lead to contradictions (Eriksson et al. 1995). The resulting implementations of the same PSM for these two formulations may not have the same decomposition and control structure (Fensel, Straatman, and van Harmelen 1996). Thus, for each PSM, there may exist many variants with different assumptions and behaviors. Storing all variants of a PSM in a library may not be practical. Instead, a PSM library may have to provide tools and operators for transforming a PSM schema into operational methods appropriate for the operating environment of an application task.
Coming from an empirical rather than theoretical perspective, we have reached a similar conclusion about the variability of problem-solving methods. In the Sisyphus-2 experiment (Schreiber and Birmingham 1996), multiple research groups that tried to solve the same elevator-configuration task with the same propose-and-revise method ended up creating their own variants of the method. That experiment gave the knowledge-engineering community a valuable set of cases for studying the nuances of PSM construction. In this paper, we report another case study, one where the same research group implemented the same method for the same task in different settings.
At Stanford we have been developing decision-support systems for protocol-based care for over 10 years. The application task remains substantially the same: Given a detailed clinical protocol that specifies how to treat a patient with a particular disease, and a medical record about the patient's current and past conditions, determine the appropriate treatment steps at a given time. We have implemented several decision-support systems based on the episodic skeletal-plan refinement method (Tu et al. 1992). We modified and extended the skeletal-plan refinement method, originally developed for designing molecular-biology experiments (Friedland and Iwasaki 1985), to plan therapies for clinical trials. In the method, a clinical protocol is modeled as a skeletal plan that needs to be refined into an execution plan for treating patients at a given time. Because the method is used to instantiate skeletal plans at multiple time points, we call the method episodic skeletal-plan refinement (ESPR).
In successive implementations of the ESPR method, we found that the operating environments in which the PSM was embedded change the decomposition and the control structure of the method substantially, even when the overall functional requirements and the structure of domain knowledge remain essentially the same. The ESPR method was originally implemented as a monolithic problem solver for applying cancer protocols (Tu et al. 1989). We decomposed it into a number of subtasks and submethods in the T-HELPER system (Musen et al. 1992), which was designed to assist clinicians in the management of patients infected with HIV (Tu et al. 1995). In a new architecture that we are developing, called EON (Musen et al. 1996), we are re-engineering the PSMs as distributed objects and encapsulating temporal-reasoning capabilities in a data manager. In this new framework, we find that, because of the assumptions we are making about the capabilities of the database-management system, the functionalities and domain knowledge are redistributed to different components. ESPR has to be decomposed in new ways that conform to the requirements of the application environment. In sum, as we re-implement the ESPR method in different settings, we episodically refine it to adapt it to new operating environments.
The possibility of a multitude of versions of a PSM, each adapted to a particular set of assumptions about domain knowledge, resource constraints, and operating environments, raises some fundamental questions about the PSM approach to developing knowledge-based systems. Is the PSM approach useful only as a conceptual framework for formulating, analyzing, or reasoning about computational algorithms and their data requirements, but not effective as a strategy for developing reusable component software? Does it mean that the PSMs that we develop are rarely, if ever, sharable across institutions and operating environments?
We believe that reuse and sharing of PSMs are possible, although the conditions for their reuse depend on a number of factors, including both their knowledge-level competence and their implementation characteristics. The challenge facing us is to devise strategies that will maximize the reusability of problem-solving components. Not only do we need theoretical work that clarifies the semantics, formalizes the representation, and defines the knowledge requirements of PSMs, but we also need to explore alternative implementation frameworks that control for some of the possible variations in operating environments, and that make explicit the characteristics of the operating environment to which they are committed. In this paper, we describe an implementation framework that embodies a two-pronged approach to the problem of reusability: (1) to maximize interoperability of software components, interfaces to components of PSMs are defined in terms of the industry-standard Common Object Request Broker Architecture (CORBA), and (2) to circumscribe the operating environment, powerful temporal data-abstraction and data-management services are made available to applications. The resulting framework is thus suitable primarily for those classes of tasks requiring the supported temporal-reasoning or data-management capabilities. Making these commitments regarding an operating environment helps to clarify the type of applications for which the implemented PSMs are appropriate. The requirements of this operating environment in turn have implications for the way PSMs are formulated and configured. Thus, we shall see that certain formulations of method ontologies are more consistent with the way interfaces are defined in CORBA, and that the transactional nature of database queries imposes requirements on the relations that map domain-independent PSM terms into domain expressions. It is the thesis of this paper that implementation studies such as the one we present here help us to test the adequacy of our theoretical formulations.
We will first give a capsule summary of the ESPR method (Section 2), and review the way it was implemented in the T-HELPER system, noting the assumptions it makes about its operating environment (Section 3). Then we will sketch the EON framework that we are developing for more general automation of protocol-based care in medicine (Section 4). Following that, we will detail the formulation of the ESPR method in the EON framework (Section 5). We will see how method configuration can be done as mappings between subtask ontologies of a composite method and method ontologies of the methods and mechanisms selected for the subtasks. We will comment on extensions to ontology mapping that we are contemplating, and we will conclude by assessing the current stage of this work.
The ESPR method was formulated for the purpose of applying clinical protocols as skeletal plans in the treatment of patients afflicted with known diseases. A detailed description of the method is given elsewhere (Tu et al. 1995). In this section, we give only a short summary in order to motivate the rest of the discussion.
The ESPR method is decomposed into three main subtasks: (1) proposing plan actions based on a high-level skeletal plan, (2) identifying problems, and (3) modifying the plan actions based on the identified problems (Figure 1). A skeletal plan is a collection of planning entities that are organized into a compositional part-of hierarchy. The planning entities specify how a plan action (the details of how an action is to be carried out to affect the state of the world) can be instantiated in a particular case. The plan actions can be seen as forming an abstraction hierarchy where properties of plan actions have been partially determined. For a given case at a given time point, these plan actions form an execution plan. An execution plan is defined relative to a time point. As time passes and more data become available, a new execution plan appropriate at that time is required. Hence the planning is episodic in that a problem solver using ESPR may generate multiple execution plans, one for each time point where decisions need to be made. A planning entity might be a protocol for prescribing the drug AZT to a patient, and the corresponding plan action would be the prescription for a patient to take AZT over a particular interval of time.
For each subtask of the PSM, the PROTÉGÉII methodology requires a developer to select and configure a method or mechanism for accomplishing the goals or generating the output of the subtask. For the task of proposing an execution plan, we used the instantiate-and-decompose method. It creates a plan action from a given planning entity (or extends a plan action temporally if one already exists) by (1) decomposing the given planning entity into its constituent parts and (2) finding values for the attributes of the new plan action. The decomposition and value-finding processes were formulated as subtasks for which we defined the problem-solving mechanisms use-procedure and use-definition (Figure 1).
For the subtask of detecting problems that may affect the current therapy, we use the knowledge-based temporal-abstraction method made up of five temporal-abstraction mechanisms (Shahar and Musen 1993; Shahar et al. 1994). The temporal-abstraction mechanisms have been designed explicitly for domains where data may vary over time and where actions have temporal duration. These mechanisms process time-varying input data represented as values of primitive parameters (e.g., values of platelet count) to generate interval-based abstractions represented as abstract parameters (e.g., platelet state abstractions of low, normal, and high) (Shahar and Musen 1993).
Figure 1. The subtasks (ovals) and submethods and mechanisms (boxes) of the ESPR method. The method has three subtasks: propose plan, identify problem, and revise plan. For each subtask, we select a method or mechanism to solve it.
Depending on the methods used to generate an execution plan and to identify potential problems, a plan-revision subtask may be necessary. A plan-revision subtask, in this formulation, takes as inputs (1) an execution plan, (2) a set of problem patterns, and (3) an operator to modify plan actions, and outputs a revised execution plan.
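To make this control structure concrete, the following sketch (in Python) shows how the three subtasks might be sequenced at each decision time point. The sketch is ours and purely illustrative; none of the function or parameter names appear in our implementations, and the subtasks are passed in as callables to reflect the idea that a method or mechanism is selected separately for each subtask.

from typing import Any, Callable, Dict, List

def espr_refine(skeletal_plan: Any,
                case_data: Any,
                decision_times: List[int],
                propose_plan: Callable,
                identify_problems: Callable,
                revise_plan: Callable) -> Dict[int, Any]:
    # Episodic planning: generate one execution plan per decision time point.
    execution_plans = {}
    for now in decision_times:
        plan = propose_plan(skeletal_plan, case_data, now)   # subtask 1
        problems = identify_problems(plan, case_data, now)   # subtask 2
        if problems:
            plan = revise_plan(plan, problems)               # subtask 3
        execution_plans[now] = plan
    return execution_plans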
Following the PROTÉGÉ-II methodology, we defined method ontologies to specify the knowledge roles of the ESPR problem-solving method and its submethods, and an application ontology to describe the terms and relations, as engineered to satisfy the needs of the ESPR method. For bridging mismatches between subtasks and the methods that have been selected for these subtasks, and between terms of the application ontology and those of the method ontology, we defined two sets of mappings. The first set of mappings specifies, for example, that instantiate-and-decompose has been selected for the propose-standard-plan subtask, and sets up the correspondence between the inputs and outputs of the subtask and the method. The second set of mappings establishes correspondences such as those between protocols and planning entities and between clinical interventions and plan actions.
The implementation of the ESPR method in the T-HELPER system assumes that PSMs, method and application ontologies, and the domain knowledge base operate within a single process that communicates with the application's user interface and database through interfaces that are outside the specification of the PSMs. The PSMs are invoked from the user interface or possibly run in batch mode for a set of cases. The problem solvers load case data from a relational database, apply mappings to convert the data to the representation assumed by the PSMs, and write the results back into the database for later retrieval (Figure 2). Implementers of the PSMs can assume that all PSMs share the same implementation language and the same operating environment, run on a common platform, and have common access to shared knowledge bases and case data.
In this operating environment, PSMs perform their tasks as part of an essentially closed system. Mapping relations, ontologies, domain knowledge, and the code that implements PSMs operate in a monolithic environment that communicates with the external world through limited channels. The implementers of PSMs have a great deal of freedom in designing the interfaces among their PSMs and between the PSMs and the external data sources and data sinks. In the case of T-HELPER, the submethods of ESPR are invoked according to the specification of the mappings from submethods to subtasks. However, submethods share the global ESPR ontology, and common knowledge roles (e.g., planning entities and plan actions) are implicitly mapped from one to another. The result is a model of reuse where (1) applying a PSM to a new application requires that the interface between the PSMs and the application data be re-engineered from scratch each time, and (2) PSMs from different systems have little chance of interoperating with each other at run time.
Figure 2. The operating environment of the ESPR method within the T-HELPER system. The arrows represent data flow.
In the T-HELPER formulation of ESPR, the method ontology was defined to be the union of the method ontologies of all of the submethods and mechanisms selected for a particular configuration of ESPR (Figure 3) (Tu et al. 1995). Thus, we could not specify the method ontology of a composite method until all of its submethods and mechanisms had been selected. As a consequence, the method ontology of ESPR varies from case to case, and needs to be redefined each time a new application is built.
In contrast to the closed run-time environment of PSMs in the T-HELPER system, we have been developing a new architecture where PSM components can be invoked from anywhere and by any application for which they are appropriate. This new environment (1) uses an open standard for defining interfaces between components of a system, and (2) provides a standardized service for making abstractions and for matching patterns on temporal data. We call the new operating environment the EON framework (Figure 4).
The design of the EON framework is motivated by two recent developments. One is the maturing of distributed-object technology in the software industry. The other is the completion of work on temporal databases and a temporal query language in our laboratory at Stanford, and the realization that, combined with the temporal-abstraction method that we use in T-HELPER, the new temporal-database technology allows us to construct a powerful mediator (Wiederhold 1992) for applications that make use of temporal data (Das et al. 1994). As a mediator, this process sits between user-oriented applications and traditional database systems to provide temporal-abstraction and temporal-query functionalities.
Figure 3. Part of the method ontology of the ESPR method as defined in the T-HELPER system. The method ontology was the union of the method ontologies of the individual submethods and mechanisms of ESPR.
Object technology has been available for a long time. However, with the adoption of the Common Object Request Broker Architecture (CORBA) 2.0 specification in 1994, an object component that conforms to the CORBA standard can be made portable and invocable across programming languages, operating systems, and networks. With the advent of low-cost bandwidth on wide-area networks and a new generation of popular network-enabled operating systems, software sharing can take place between desktop systems distributed around the globe. The new standard opens up the possibility of reusing problem-solving components not only within one framework such as PROTÉGÉ-II, but across multiple frameworks as well (Gennari, Stein, and Musen 1996).
Developed by the Object Management Group (OMG), CORBA (OMG 1996) is a standard for a distributed-object infrastructure that defines the interface and mechanisms for sharing object-oriented services across platforms and networks. A distributed-object component is one or more objects that provide a service accessible by invoking methods associated with those objects. OMG requires the use of an Interface Definition Language (IDL) to specify a component's interface with potential clients. The interface consists of declarations of classes of objects, their attributes, the parent classes from which these objects inherit attributes, the methods to which the objects respond, the exceptions they can raise, and the typed events they can create. The IDL allows multiple inheritance and type definitions for the attributes of the objects.
Figure 4. The EON framework for developing decision-support applications that require temporal data abstraction and pattern matching. The components communicate through a federation of Object Request Brokers (ORBs) that are connected by a CORBA bus. The Tzolkin data manager provides temporal-abstraction and database management services.
Objects send requests to, and receive responses from, other objects through Object Request Brokers (ORBs). An ORB contains (1) the IDL stubs that define static interfaces to each service provided by the server, (2) interfaces for dynamic invocation of objects, (3) one or more object adapters that accept and dispatch requests for service, and (4) an implementation repository that provides a run-time repository of information about the classes of objects a server supports. To bridge object request brokers that run on different platforms, OMG relies on message exchanges that follow the specification of the Internet Inter-ORB Protocol (IIOP), a mandatory protocol that defines how CORBA messages are transmitted over TCP/IP networks. Each type of ORB for a platform is required to implement a bridge that connects the ORB to the IIOP-based CORBA bus (see Figure 4).
In the EON framework that we are developing, we will be implementing problem-solving methods as CORBA objects. This implementation of PSMs will address the problem of idiosyncratic and limited interfaces between PSMs and their external environment, and the similar problem of closed interfaces among components of the PSMs. Each sharable component of a PSM will have a public interface that can be inspected and invoked from any other CORBA-compliant program on the network. Components of a PSM will interact with each other in identical fashion, invoking methods of other objects and responding to methods invoked by others.
The second motivation in the design of the EON framework is our desire to make database technology an explicit part of the knowledge-based system. Most knowledge-based systems, if they use databases at all, treat the database merely as a data repository. As part of the T-HELPER project, we developed a model for representing temporal data in a relational database, extended the relational algebra to incorporate operations on temporal data, and extended SQL by embedding temporal operations and computations in the query syntax (Das et al. 1992). The implemented system, called Chronus (Das and Musen 1994), can process queries involving complex temporal patterns. It is thus not only a data repository, but also a component that provides a certain kind of temporal-reasoning capability, described below.
The model of temporal data used in Chronus is similar to that of RÉSUMÉ, the subsystem in T-HELPER that implements the knowledge-based temporal-abstraction method. Chronus stores patient data in relational history tables that time-stamp each tuple with two temporal attributes, START_TIME and STOP_TIME (Das et al. 1992). Thus, each row in the table maintains a fact about the patient with that fact's real-world period of validity, which corresponds to an interval on a timeline. For interval-stamped data (such as abstracted episodes of anemia), the first temporal attribute records the starting time point, whereas the second temporal attribute stores the stopping time point. We also can store instant-stamped data (such as hemoglobin-test results) in a history table; for this type of temporal data, the start and stop temporal attributes are equal. Time-invariant data are stored as intervals that exist for all possible time periods; the value of the START_TIME is negative infinity, and that of the STOP_TIME is now.
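For illustration, the following sketch (our notation, not Chronus syntax) shows how one interval-stamped fact, one instant-stamped fact, and one time-invariant fact would each occupy a row of a history table:

NEG_INFINITY = float("-inf")
NOW = "now"  # symbolic stand-in for the current time

history_table = [
    # Interval-stamped fact: an abstracted episode of anemia.
    {"patient": "p1", "parameter": "anemia", "value": "present",
     "START_TIME": 100, "STOP_TIME": 250},
    # Instant-stamped fact: a hemoglobin-test result (start equals stop).
    {"patient": "p1", "parameter": "hemoglobin", "value": 9.8,
     "START_TIME": 180, "STOP_TIME": 180},
    # Time-invariant fact: valid from negative infinity until now.
    {"patient": "p1", "parameter": "blood_type", "value": "O+",
     "START_TIME": NEG_INFINITY, "STOP_TIME": NOW},
]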
This representation of instant-based, interval-based, and time-invariant data in a history table allows uniform treatment of relational tables by the query language. The Chronus query language allows us to search for complex temporal patterns, such as an episode of myelotoxicity occurring during protocol enrollment, the second episode of myelotoxicity, or an episode of Grade 2 or 3 myelotoxicity.
These examples illustrate the ability of the Chronus query language to make comparisons of temporal intervals (myelotoxicity during protocol enrollment), to make ordinal selections (the second episode of myelotoxicity), and to concatenate dissimilar temporal intervals (Grade 2 or 3 myelotoxicity). These functionalities obviate the need for the PSMs to implement their own search for such conclusions.
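The following Python sketch suggests, under our assumptions, what these three kinds of operations compute; it is a toy re-implementation for exposition, not the Chronus query engine, and intervals are simply (start, stop) pairs.

from typing import List, Tuple

Interval = Tuple[int, int]  # (start_time, stop_time)

def during(a: Interval, b: Interval) -> bool:
    # Interval comparison: a lies within b
    # (e.g., myelotoxicity during protocol enrollment).
    return b[0] <= a[0] and a[1] <= b[1]

def nth_episode(episodes: List[Interval], n: int) -> Interval:
    # Ordinal selection: the n-th episode in temporal order
    # (e.g., the second episode of myelotoxicity).
    return sorted(episodes)[n - 1]

def concatenate(episodes: List[Interval]) -> List[Interval]:
    # Concatenation: merge meeting or overlapping episodes of dissimilar
    # values into single intervals (e.g., Grade 2 or 3 myelotoxicity).
    merged: List[Interval] = []
    for start, stop in sorted(episodes):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    return merged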
The temporal-pattern-matching functionality of the Chronus system is highly complementary to the temporal-abstraction capability of the RÉSUMÉ system. Whereas RÉSUMÉ extracts from primitive data high-level summaries of a patient's condition over time, Chronus provides a general SQL-based data-access language for complex temporal pattern matching on data stored in databases. RÉSUMÉ, unlike Chronus, does not support finding complex temporal patterns expressible as database queries; Chronus, unlike RÉSUMÉ, does not support identification of intervals that are not stored explicitly in the database. Each component thus provides a complementary type of temporal deduction using patient data.
In the EON framework, we are integrating these two components into a single temporal data-abstraction and data-management system that will ensure that the actions of the two modules are coordinated and synergistic. This integration allows different problem-solving applications to access a single system for temporal-abstraction and temporal-pattern-matching capabilities. The development of a unified temporal-data-management system also enforces consistency and compatibility of the temporal knowledge that is used by both components. The EON temporal-data-management system, called Tzolkin, combines extended versions of RÉSUMÉ and Chronus into a single "server" process (Das et al. 1994).
Formulating problem-solving modules as CORBA objects and separating out temporal abstraction and data management result in a task-specific architecture that developers can use to build knowledge-based systems that require the capabilities of temporal abstraction and temporal pattern matching. By adopting the CORBA system, we remove software and hardware barriers to reusing components of our PSMs. In developing the Tzolkin temporal data manager, we provide a service useful for a certain class of applications, and, in doing so, require these applications to make a particular set of commitments to the representation of case data and to the necessary temporal-abstraction knowledge.
The EON framework thus consists of three different components: (1) the domain knowledge base of medical concepts that specifies the application and temporal-abstraction knowledge necessary for reasoning in the EON framework, (2) one or more applications that use problem-solving methods formulated as collections of CORBA objects, and (3) the Tzolkin subsystem that performs temporal abstraction and temporal pattern matching (see Figure 4). All components are connected through Object Request Brokers to the CORBA bus. The legacy database in which case data are stored may or may not be encapsulated as a separate CORBA object. In the next section, we will examine in detail the implications of this architecture for the formulation of the ESPR method.
Relocating the temporal-abstraction mechanisms from the ESPR method to the Tzolkin data manager forces us to reformulate the ESPR method in significant ways. The temporal-abstraction mechanisms were previously activated during a discrete stage in ESPR's problem-solving process. Now they are invoked on demand as part of a temporal query sent to the Tzolkin server. In this paper, we will not discuss the query-evaluation algorithm for integrating temporal abstraction and temporal query (Das et al. 1994). Instead, we will focus on the reformulation of ESPR as a consequence of separating out the temporal-abstraction process and of re-implementing it as a collection of CORBA objects.
Creating Tzolkin as the module for both temporal abstraction and temporal pattern matching has two consequences. First, the subtask of determining problems with the proposed execution plan using temporal abstractions becomes trivial; the application needs only to query the Tzolkin data manager for the existence of particular temporal patterns. Second, at run time, PSMs will be sending transactional queries to, and accepting the results of such queries from, the Tzolkin temporal database manager, which uses domain-specific abstraction knowledge to resolve some of the queries. Thus, the application of mappings between domain terms and method terms is part of the run-time transactions.
Re-implementing PSMs to conform to the CORBA standard requires us to partition PSMs into objects. The CORBA model is agnostic about the granularity of objects. A CORBA object can be as fine-grained as a C++ object. Alternatively, an entire application can be represented as a single object. The choice of granularity depends on both the intended unit of reuse and the possible overhead costs involved in invoking code as CORBA objects. For example, because of the close interaction between temporal abstraction and temporal query processing, the Tzolkin temporal data manager is most easily modeled as a single CORBA object that encapsulates such interactions. We merely define an object to represent the temporal data manager. The application programming interface of the data manager (functions for operations such as opening a database session, executing a query, and receiving the results of a query) is naturally specified as the method interface of the object.
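From a client's point of view, such an object might present an interface like the following Python sketch. The operation names are hypothetical (the actual interface is specified in IDL), and the stub bodies stand in for invocations dispatched through the ORB.

class TemporalDataManagerStub:
    # A hypothetical client-side stub for a Tzolkin-like CORBA object.
    # Each public method corresponds to one operation of the object's
    # IDL interface.

    def open_session(self, database: str) -> int:
        # Open a database session and return a session handle.
        raise NotImplementedError("dispatched through the ORB")

    def execute_query(self, session: int, temporal_query: str) -> int:
        # Submit a temporal query, which may trigger temporal abstraction
        # on the server, and return a result handle.
        raise NotImplementedError("dispatched through the ORB")

    def fetch_results(self, result: int) -> list:
        # Retrieve the tuples that satisfy the query.
        raise NotImplementedError("dispatched through the ORB")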
A PSM such as ESPR, on the other hand, can be defined as a collection of CORBA objects. We need an object that represents the PSM itself. In addition, elements of the PSM's method ontology are naturally represented as objects in CORBA's IDL. CORBA's object-oriented paradigm is consistent with the frame-based formalism that PROTÉGÉ-II uses for its domain and method ontologies, although the procedural orientation of CORBA means that the representation of more complex declarative knowledge is outside its scope. IDL is designed to be an interface language, not a knowledge-representation language. Classes defined in IDL have associated access functions for the attributes of the classes; these functions are designed for external data access, and the attributes do not necessarily correspond to properties of the objects in question. Code that implements the functionalities of the declared CORBA classes has an internal representation for the objects it manipulates. Consequently, inferences on instances of the classes have to be performed on objects instantiated in the internal representation of the PSMs. As an interface language, IDL includes not only definitions of the publicly accessible classes, but also operations representing messages that can be sent to objects. Thus, the first requirement for reformulating ESPR in this new operating environment is that a PSM must have an IDL interface specification defining its public method ontology, and that the invocation of the PSM must be implemented as a message sent to an object representing the PSM.
In the next two subsections, we will describe in detail how we reformulate the ESPR method in terms of CORBA, and how we plan to extend the formulation and implementation of PROTÉGÉ-II's ontology-mapping relations to satisfy the requirements of the EON framework.
In contrast to the abstract characterization of the terms in the reformulated ESPR method ontology, the method terms in the T-HELPER version of ESPR were much more explicit. The method ontology was seen as the union of the terms used in the set of submethods and mechanisms that had been selected for the subtasks of ESPR. In the new implementation of ESPR as CORBA objects, much of the detail will be specified only as part of the submethod components, and not at the level of the ESPR method.
The need to reformulate a PSM so that it has an IDL interface specification defining its public method ontology dovetails with recent work on method configuration that the MIKE and PROTÉGÉ-II groups collaboratively developed (Studer et al. 1996). In short, the approach partitions a method ontology into (1) global definitions that include concepts and relations that are part of the interface specification of a PSM and (2) internal definitions that are used for defining data flow within a PSM. The global definitions include concepts that are part of the PSM's input and output specification, as well as higher-level concepts introduced to simplify definitions of more specialized concepts. The internal definitions specify concepts representing those outputs of a method's subtasks that are used only as inputs to other subtasks of the method. This separation between global interface definitions and internal definitions naturally lends itself to an implementation of the interface definitions using IDL.
The inputs and outputs of a method's subtasks are described in terms of subtask ontologies. A subtask ontology uses the terms and relations from the method ontology of the subtask's parent method. The subtask ontology defines the input and output interfaces of a subtask. Configuring a method or mechanism for a subtask involves specifying a mapping between the inputs, outputs, and ontology of the subtask and their equivalents in the method or mechanism. In those cases where inputs are being created from the application knowledge base or case data, the subtask-to-method configuration mapping simply sets up a correspondence between the subtask and the method inputs. Detailed mappings between domain concepts and method concepts are specified only for methods that actually use the corresponding data (Studer et al. 1996).
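As a sketch of what such a configuration might record, consider the selection of instantiate-and-decompose for the propose-standard-plan subtask discussed earlier. All names below are illustrative, and the pairings are purely formal correspondences:

configuration = {
    "subtask": "propose-standard-plan",
    "selected_method": "instantiate-and-decompose",
    "input_mapping": {        # subtask input -> method input
        "skeletal_plan": "planning_entity_hierarchy",
        "current_time": "current_time",
    },
    "output_mapping": {       # method output -> subtask output
        "instantiated_plan_actions": "execution_plan",
    },
}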
In the new approach, the use of a subtask ontology to insulate a method from the submethods and mechanisms that may be chosen to implement subtasks of the parent method allows us to specify the method ontology of ESPR without reference to submethods or mechanisms. On the other hand, because the details of the knowledge available to implement the subtasks of ESPR are not known until specific methods and mechanisms are chosen, we need to characterize abstractly the knowledge roles expected in the ESPR method for the method's subtasks. In the following, we briefly sketch the method ontology of the reformulated ESPR, and contrast it with the one defined in the T-HELPER version.
Figure 5 presents the reformulated ESPR method ontology in the EON framework. Each class listed as part of the global definition requires a CORBA IDL interface definition (see, for example, Figure 6). In the global part of the method ontology, a skeletal plan is made up of hierarchically organized planning entities and various kinds of knowledge needed to generate an execution plan made up of plan actions. A plan action is a method entity for describing a domain-specific action carried out within a plan. Properties of an action, such as its duration and its intensity, are specified as attributes of a domain-specific action object. Determining values for these attributes of action objects constitutes refinement of plan actions. We can abstractly characterize the refinement process as a search in the refinement space of a plan operator, where the refinement space is determined by the possible values of those attributes of a domain-specific action (e.g., possible values of drug dose and duration in a drug prescription) that are mapped into the plan-action knowledge role. Thus, refining a plan action is a subtask requiring specific refinement knowledge (e.g., knowledge to calculate the drug dose) that can be fully described only after an appropriate refinement mechanism for the subtask is selected. Similarly, determining which planning entities in the planning-entity decomposition hierarchy to instantiate, and which plan actions to modify given identified problem patterns, are subtasks that require domain knowledge whose exact form cannot be fully specified before the decomposition and revision mechanisms are chosen.
[Figure 5, comprising global definitions for input, global definitions for output, and internal definitions, appears here.]
Figure 5. The method ontology of the reformulated ESPR method.
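The refinement process just described can be sketched as a generate-and-test search over the cross product of attribute values. The sketch below is ours and assumes that the refinement knowledge is supplied as a single acceptability test; in practice that knowledge is available only after a refinement mechanism has been selected, and the attribute names and constraint are hypothetical.

from itertools import product
from typing import Callable, Dict, Iterable, List

def refine_plan_action(attribute_domains: Dict[str, Iterable],
                       acceptable: Callable[[Dict], bool]) -> List[Dict]:
    # Search the refinement space: the cross product of the possible
    # values of the attributes mapped into the plan-action role.
    names = list(attribute_domains)
    candidates = (dict(zip(names, values))
                  for values in product(*attribute_domains.values()))
    return [c for c in candidates if acceptable(c)]

# Example: refine an AZT prescription over hypothetical dose and
# duration values, with a made-up constraint on total exposure.
refinements = refine_plan_action(
    {"dose_mg": [100, 200, 300], "duration_days": [7, 14]},
    acceptable=lambda c: c["dose_mg"] * c["duration_days"] <= 2800)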
On the other hand, the form of problem knowledge (a placeholder for temporal-abstraction knowledge) is fully specified for the methods implemented in the EON framework. All of the abstraction knowledge, such as abstraction types and inference tables, that is necessary in the current version of the RÉSUMÉ temporal-abstraction system still needs to be present. In fact, whether an application can supply the necessary temporal-abstraction knowledge and make use of the temporal-pattern-matching capability of Tzolkin are good indicators of the suitability of EON for that application. Furthermore, the EON framework constrains the way abstraction knowledge is used. Instead of supplying a data-driven method that searches for all of the possible abstractions (many of which are events that make changes in the execution plan necessary), the database-oriented approach in EON makes the temporal-abstraction and temporal-pattern-matching process a query-driven one. Thus, the set of temporal patterns used in queries needs to be specified in advance.
The formulation of the ESPR method in EON requires that we extend the mapping relations that we had defined for PROTÉGÉ-II. In previous formulations of mapping relations, we defined four classes of mappings: rename, filter, attribute, and class mappings (Gennari et al. 1994; Tu et al. 1995). We used mapping relations as translation rules in two ways: (1) by creating method instances from application instances and (2) by creating method-specific views of application instances. Using the first technique, instances of elevator components in the VT task were transformed into instances of parameters used in the propose-and-revise method (Gennari et al. 1994). The advantage of this technique is that there is a clean separation of domain and method terms; the disadvantage is that the system is incapable of using dynamically changing domain data, especially if the translation is done at compile time. In the second technique, the method code actually passes around instances of domain classes (e.g., protocol objects), but accesses the domain objects through a set of method-specific views (e.g., accessing the attributes of planning entities instead of those of protocols). The advantage here is that we have dynamic update of and access to domain information. In this approach, however, each method instance must correspond to a single domain instance. In the EON framework, the requirement that our PSMs interact with a transaction-driven temporal data manager implies that mappings between domain data and knowledge roles of the PSMs have to be processed at run time for each transaction. Neither of the existing techniques is sufficient to satisfy this requirement.
interface Plan_Action {
  // datetime and attribute_value are auxiliary types defined
  // elsewhere in the IDL specification.
  attribute string name;
  attribute string action_type;
  attribute string status;
  attribute datetime start_time;
  attribute datetime stop_time;
  attribute sequence<string> domain_attributes;
  attribute sequence<attribute_value> attribute_value_pairs;
};
Figure 6. An example of an IDL specification. Action_type corresponds to the type of intervention (e.g., giving a drug prescription) that the plan action represents. Status records the status of the intervention (e.g., suspended medication). Start and stop times represent the duration of the intervention, and the domain attributes and attribute-value pairs specify the details of the intervention (e.g., drug dose, drug route).
In the EON architecture, we have a domain knowledge base containing time-invariant facts about an application area, a temporal database containing facts about particular cases (patients), and the knowledge roles of domain-independent problem-solving methods. We need to define mappings (1) from domain terms and relations in the domain knowledge base to knowledge roles of PSMs and (2) between the inputs and outputs of problem-solving methods and the interfaces of the tasks for which the PSMs are selected.
To illustrate the mapping problems involved in configuring a subtask and creating the domain-to-method mapping, we consider the task of decomposing a planning entity that specifies an ongoing intervention in the domain into the appropriate subcomponents of that planning entity. The subtask has as inputs the current time, the planning entities that are active, and the decomposition knowledge for the composite planning entity to be refined, and as output a set of planning entities, each of which is a part of an input planning entity.
We developed a mechanism called use-procedure for this decomposition subtask. The mechanism uses plan procedures to model the specification of the dynamic behavior of a composite planning entity. A procedure is a plan consisting of the temporal sequencing of other planning entities. We can visualize a procedure as a directed multigraph that has more than one class of nodes. The simplest type of procedure we have defined has two types of nodes, start steps and plan steps, and uses directed arcs called selections to make choices among alternative plan steps. A selection has a name and an associated selection condition. The mappings from the interface of the decompose subtask to that of the use-procedure mechanism are straightforward (Figure 7).
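A minimal sketch of this structure, under our assumptions, follows; the names are illustrative, and the selection conditions, modeled here as Boolean functions of case data, would in the EON framework be evaluated as queries to the Tzolkin data manager.

from typing import Callable, Dict, List, Tuple

class PlanProcedure:
    # A directed multigraph whose nodes are start steps and plan steps
    # and whose arcs (selections) carry named Boolean conditions.

    def __init__(self) -> None:
        # node -> list of (selection_name, condition, successor_node)
        self.selections: Dict[str, List[Tuple[str, Callable, str]]] = {}

    def add_selection(self, from_step: str, name: str,
                      condition: Callable[[dict], bool],
                      to_step: str) -> None:
        self.selections.setdefault(from_step, []).append(
            (name, condition, to_step))

    def next_steps(self, current_step: str, case_data: dict) -> List[str]:
        # Choose among alternative plan steps by evaluating each
        # selection condition against the case data.
        return [to_step
                for _, condition, to_step in self.selections.get(current_step, [])
                if condition(case_data)]

# Example: from the start step, enter induction therapy only when a
# (hypothetical) eligibility pattern holds for the patient.
procedure = PlanProcedure()
procedure.add_selection("start", "eligible",
                        lambda case: case.get("cd4_count", 0) < 500,
                        "induction_therapy")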
Figure 7. Mapping of inputs and outputs of the decompose subtask and use-procedure mechanism. The solid arrows indicate data flow and the shaded arrows denote mapping relations.
As described earlier, the decomposition knowledge in the decompose subtask ontology does not have detailed structure. A plan procedure, on the other hand, has specific knowledge requirements (e.g., plan steps, selections, and selection conditions) for determining the progression of plan steps over time. Because of this mismatch, the mapping from the decomposition knowledge of a planning entity to its plan procedure is mostly formal, involving little transfer of content knowledge. The important information that is transferred in the mapping is the identity of the planning entity that needs to be decomposed, so that the specific plan procedure for that planning entity can be selected. Detailed knowledge for instantiating terms used in the ontology of the use-procedure mechanism must be derived from the domain-to-method mapping of ontologies. In the domain of protocol-directed therapy planning, planning entities and plan steps correspond to protocols for performing medical procedures, and a plan procedure corresponds to a clinical algorithm for sequencing these medical procedures. Selection conditions in a plan procedure correspond to Boolean patterns in patient data. Thus, the mapping must transform objects in the application knowledge base (e.g., protocols) into objects in the method knowledge base (e.g., plan steps), and vice versa. In addition, Boolean conditions on the method side (e.g., selection conditions) must have application equivalents that can be evaluated by the Tzolkin data manager (e.g., predicates involving patient data).
We are developing a generalization of the current PROTÉGÉ-II mapping relations between application and method knowledge bases to satisfy these requirements. The mappings define ways of translating between domain predicates and method predicates, and between sets of domain objects and sets of method objects. For predicate translation, because the Tzolkin data manager is the engine that evaluates Boolean patterns involving patient data, it is redundant to translate domain data into method forms if the data are used only in Boolean queries evaluated by the Tzolkin data manager. Thus it is unnecessary to map all domain data from the application side to the method side. Instead, we define mappings between predicates of the two knowledge bases such that only the results of evaluating domain predicates are translated into results of method predicates.
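The following sketch suggests one way to realize such a predicate mapping; the names and the query pattern are hypothetical. Only the Boolean result of the domain-side evaluation crosses the mapping back to the method side.

from typing import Callable, Dict

def query_tzolkin(patient: str, pattern: str) -> bool:
    # Stand-in for submitting a temporal-pattern query to the Tzolkin
    # data manager; a real client would dispatch through the ORB.
    raise NotImplementedError

# Each method-side predicate is paired with the domain-side query
# that evaluates it against patient data.
predicate_map: Dict[str, Callable[[str], bool]] = {
    "eligibility_condition":
        lambda patient: query_tzolkin(
            patient, "exists myelotoxicity during enrollment"),
}

def evaluate_method_predicate(name: str, patient: str) -> bool:
    # No domain data are translated to the method side; only the
    # result of the domain predicate is returned.
    return predicate_map[name](patient)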
To translate between sets of domain objects and sets of method objects, we take an approach that borrows from relational theory the idea of taking joins between multiple relations. A mapping from instances of a set of classes in the source knowledge base to instances of a target class in the target knowledge base is defined by (1) taking the cross product of instances of the classes in the source knowledge base, (2) applying a set of constraints to select tuples from the cross product, and (3) projecting attributes of classes in the cross product onto attributes of the target class in the target knowledge base. A translation problem that needs to be resolved in the object-oriented framework is that the attribute values of objects can be references to other objects. To define mappings that do not rely on knowing object identifiers, we assume that each object can be identified by a "key." For the sake of simplicity, assume that each object has a "key" attribute whose value is not another instance, but a member of a primitive type such as string or integer. We define conditions under which an object in one knowledge base is translatable from a set of objects in another knowledge base. Based on these conditions, we define ontology mappings between classes of objects in one knowledge base and those in another.
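The three steps can be sketched directly in code. The sketch below is a toy version under our assumptions: source and target instances are dictionaries, constraints are Boolean functions over a tuple of source instances, and the projection is a set of attribute-valued functions; the protocol and plan-step names in the example are hypothetical.

from itertools import product
from typing import Callable, Dict, List

def map_instances(source_instance_sets: List[List[dict]],
                  constraint: Callable[..., bool],
                  projection: Dict[str, Callable[..., object]]) -> List[dict]:
    # (1) take the cross product of the source instance sets,
    # (2) keep only the tuples that satisfy the mapping constraint, and
    # (3) project source attributes onto attributes of the target class.
    targets = []
    for instances in product(*source_instance_sets):
        if constraint(*instances):
            targets.append({attr: get(*instances)
                            for attr, get in projection.items()})
    return targets

# Example: create plan-step instances by joining protocols with their
# procedure specifications on a "key" attribute.
protocols = [{"key": "AZT-protocol", "intervention": "prescribe AZT"}]
procedures = [{"protocol_key": "AZT-protocol", "step_name": "induction"}]
plan_steps = map_instances(
    [protocols, procedures],
    constraint=lambda prot, proc: prot["key"] == proc["protocol_key"],
    projection={"name": lambda prot, proc: proc["step_name"],
                "action": lambda prot, proc: prot["intervention"]})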
The literature on the reuse of problem-solving methods reports cases where the same domain-independent PSMs are used to solve problems in different application areas. Much recent work on the theory of problem-solving methods concentrates on the formal underpinnings of PSMs. This work has yielded a great number of insights into the conceptual structure of PSMs and formal methods to model them. On the other hand, we believe that there is an empirical component to the discipline of knowledge engineering. Implementations of PSMs in alternative operating environments can yield insights into the sensitivity of a PSM to changes in the application environment. The power of such an approach was amply demonstrated in the Sisyphus-2 experiment with the elevator-configuration task and the propose-and-revise method (Schreiber and Birmingham 1996). In this paper, we explore the changes to a PSM that we have needed to make when we re-engineered it to function in a new operating environment. In this section, we briefly summarize the points elaborated elsewhere in the paper.
First, the formulation of a PSM can change substantially depending on whether it is implemented as a standalone method, as a component of a complex method in a problem solver, or as part of another module in the system. Integrating the knowledge-based temporal-abstraction method into a temporal database obviously changes the way that the method can be used. Instead of a method with a data-driven control structure, the Tzolkin temporal data manager is a transactional program driven by the queries and commands sent to it. The temporal patterns that are the results of temporal abstractions need to be formulated as queries to the database. The requirement that our PSMs interact with a transaction-driven temporal data manager also implies that mappings between domain data and the knowledge roles of the PSMs have to be processed at run time for each transaction. Thus, we have to extend the existing PROTÉGÉ-II mapping relations.
Second, implementing the PSMs in a distributed-object framework has forced us to view the PSMs as discrete units that communicate with each other through well-defined interfaces. We have abandoned the previous concept of the method ontology of a composite PSM as the union of the method ontologies of its constituent submethods and mechanisms. Using the subtask ontology of a method as the source of mappings to the knowledge roles of the submethod selected to implement the subtask allows us to specify a method's ontology in a modular way. Before the selection of submethods and mechanisms for the subtasks, however, the detailed structure of the necessary domain knowledge is unknown. Thus, the concepts in the top-level method have to be formulated in abstract terms. The possibility that the mapping from a subtask input to a method input may be only a formal correspondence introduces additional complexity in the mapping relations between subtask ontologies and submethods and between the application and method ontologies.
We are not yet at the stage where we can formulate a theory of how the characteristics of an application's operating environment affect the formulation of PSMs. What we have described in this paper is a case study where, because the application problem and PSMs remain the same, the impact of the implementation environment is especially clear. It may appear that the approach we have taken here runs counter to much of the recent literature on knowledge-based system development, which exhorts that knowledge modeling be done without being constrained by implementation details. The CommonKADS modeling framework, for example, calls for "the description of problem solving behavior at a conceptual level that is independent from representation and implementation decisions" (Wielinga et al. 1993). The contradiction is only apparent, though. It is not our aim to dispute the need for knowledge-level analysis of PSMs and of problem domains. What we have shown here is (1) the utility of using implementations of PSMs to test the limits of our conceptual understanding of PSMs, and (2) that operating environments, that is, the computing resources available to implement an application, have an impact on the formulation of a PSM used in that application. Our goal is not to bring back the type of thinking in which symbol-level concerns dominate the construction of knowledge-based systems. Rather, we hope to elevate concerns about the operating environments of PSMs in the conceptual modeling of PSMs.
Our conclusions are similar to those reached in recent work on the competence theory of PSMs (Fensel, Straatman, and van Harmelen 1996), where it was shown that varying assumptions about the available domain knowledge and the functionality of the task lead to varying conceptual structures in the propose-and-revise problem-solving method. In this paper, through examining our past and current implementations of ESPR, we show that what affects the formulation of PSMs includes not only the available domain knowledge, but also the operating environment of the PSMs. Our conclusion is consistent with their claim that the role of PSMs is to introduce assumptions about domain knowledge and about the task to improve the efficiency of achieving a goal. Operating environments, in particular the available computing resources, certainly help to determine the efficiency of an application.
The practical implication of this paper points toward the importance of iterative modeling and implementation. Given that we have limited understanding of the constraints imposed by operating environments, and that such environments often change, we have to be open to the possibility of revising the formulation of our PSMs as we implement and test our applications. The MIKE approach (Angele et al. 1993) has provisions for revising specification items after designing and implementing parts of a system. The PROTÉGÉ-II system, with its tool kit for rapidly modifying ontologies and regenerating knowledge-acquisition tools, takes the same approach.
Finally, we believe that knowledge-based systems cannot be constructed in isolation from developments in the software-engineering community. We have much to learn from that community in the areas of implementation and software architecture. We should take advantage of emerging technology such as CORBA, and demonstrate that our frameworks are consistent with it. Now that heterogeneity of programming languages, operating systems, and platforms should, in principle, no longer be a barrier to sharing software, the remaining barriers are differences in ontological commitments, in assumptions made in algorithms and architectural designs, and in the resources shared by the components: topics that researchers in the knowledge-engineering community have been exploring for years and about which they have much to say.
This work has been supported in part by grants LM05708 and LM05304 from the National Library of Medicine, by grant CA65426 from the National Cancer Institute, and by contract N66001-94-D-6052 supported by the Defense Advanced Research Projects Agency. Dr. Musen is the recipient of National Science Foundation Young Investigator Award IRI-9257578.
We are grateful to Amar Das, John Gennari, Yuval Shahar, and Rudi Studer for discussions related to the problem-solving methods, and to Zaki Hasan for software development of the T-HELPER system.
Angele, J., Fensel, D., Landes, D., Neubert, S. and Studer, R. (1993). Model-based and incremental knowledge engineering: The MIKE approach, in J. Cuena (Ed.), Knowledge Oriented Software Design. Amsterdam: Elsevier. 139–168.
Chandrasekaran, B., Johnson, T.R. and Smith, J.W. (1992). Task-structure analysis for knowledge modeling. Communications of the ACM 35(9): 124–137.
Das, A.K. and Musen, M.A. (1994). A temporal query system for protocol-directed decision support. Methods of Information in Medicine 33(4): 358–370.
Das, A.K., Shahar, Y., Tu, S.W. and Musen, M.A. (1994). A temporal-abstraction mediator for protocol-based decision-support systems. Proceedings of the Eighteenth Annual Symposium on Computer Applications in Medical Care, Washington, DC, 320–324.
Das, A.K., Tu, S.W., Purcell, G. and Musen, M.A. (1992). An extended SQL for temporal data management in clinical decision-support systems. Proceedings of the Sixteenth Annual Symposium on Computer Applications in Medical Care, Baltimore, MD, 128–132.
Eriksson, H., Shahar, Y., Tu, S.W., Puerta, A.R. and Musen, M.A. (1995). Task modeling with reusable problem-solving methods. Artificial Intelligence 79: 293–326.
Fensel, D., Straatman, R. and van Harmelen, F. (1996). The mincer metaphor: A new view on problem-solving methods for knowledge-based systems? Proceedings of the 6th Workshop on Knowledge Engineering Methods and Languages, Paris, France.
Friedland, P.E. and Iwasaki, Y. (1985). The concept and implementation of skeletal plans. Journal of Automated Reasoning 1: 161–208.
Gennari, J.H., Tu, S.W., Rothenfluh, T.E. and Musen, M.A. (1994). Mapping domains to methods in support of reuse. International Journal of Human-Computer Studies 41: 399–424.
Gennari, J.H., Altman, R.B. and Musen, M.A. (1995). Reuse with PROTÉGÉ-II: From elevators to ribosomes. Proceedings of the ACM SIGSOFT 1995 Symposium on Software Reusability, Seattle, WA, 72–80.
Gennari, J.H., Stein, A. and Musen, M.A. (1996). Reuse for knowledge-based systems and CORBA components. Proceedings of the 10th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop, Banff, Canada.
Klinker, G., Bhola, C., Dallemagne, G., Marques, D. and McDermott, J. (1991). Usable and reusable programming constructs. Knowledge Acquisition 3: 117–135.
McDermott, J. (1988). Preliminary steps toward a taxonomy of problem-solving methods, in Automating Knowledge Acquisition for Expert Systems. Boston: Kluwer Academic. 225–256.
Molina, M. and Shahar, Y. (1996). Problem-solving method reuse and assembly: From clinical monitoring to traffic control. Proceedings of the 10th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop, Banff, Canada.
Musen, M.A., Carlson, R.W., Fagan, L.M., Deresinski, S.C. and Shortliffe, E.H. (1992). T-HELPER: Automated support for community-based clinical research. Proceedings of the Sixteenth Annual Symposium on Computer Applications in Medical Care, Baltimore, MD, McGraw-Hill, 719–723.
Musen, M.A., Tu, S.W., Das, A.K. and Shahar, Y. (1996). EON: A component-based approach to automation of protocol-directed therapy. Journal of the American Medical Informatics Association (in press).
The Object Management Group (1996). CORBA 2.0 specification. Technical Report ptc/96-03-04.
Schreiber, A.T., Wielinga, B.J. and Breuker, J.A., eds. (1993). KADS: A Principled Approach to Knowledge-Based System Development. London: Academic Press.
Schreiber, A.T. and Birmingham, W.P., eds. (1996). Special issue on the Sisyphus-VT initiative. International Journal of Human-Computer Studies 44(3/4): 1–365.
Shahar, Y., Das, A.K., Tu, S.W., Kraemer, F.M. and Musen, M.A. (1994). Knowledge-based temporal abstraction for diabetic monitoring. Proceedings of the Eighteenth Annual Symposium on Computer Applications in Medical Care, Washington, DC, 697–701.
Shahar, Y. and Musen, M.A. (1993). RÉSUMÉ: A temporal-abstraction system for patient monitoring. Computers and Biomedical Research 26(3): 255–273.
Steels, L. (1990). Components of expertise. AI Magazine 11: 30–49.
Studer, R., Eriksson, H., Gennari, J.H., Tu, S., Fensel, D. et al. (1996). Ontologies and the configuration of problem-solving methods. Proceedings of the 10th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop, Banff, Canada.
Tu, S.W., Kahn, M.G., Musen, M.A., Ferguson, J.C., Shortliffe, E.H. et al. (1989). Episodic skeletal-plan refinement based on temporal data. Communications of the ACM 32: 1439–1455.
Tu, S.W., Shahar, Y., Dawes, J., Winkles, J., Puerta, A.R. and Musen, M.A. (1992). A problem-solving model for episodic skeletal-plan refinement. Knowledge Acquisition 4(2): 197–200.
Tu, S.W., Eriksson, H., Gennari, J.H., Shahar, Y. and Musen, M.A. (1995). Ontology-based configuration of problem-solving methods and generation of knowledge-acquisition tools: Application of PROTÉGÉ-II to protocol-based decision support. Artificial Intelligence in Medicine 7: 257–289.
Wiederhold, G. (1992). Mediators in the architecture of future information systems. IEEE Computer 25: 38–50.
Wielinga, B.J., Schreiber, A.T. and Breuker, J.A. (1992). A modelling approach to knowledge engineering. Knowledge Acquisition 4(1): 5–53.
Wielinga, B.J., Van de Velde, W., Schreiber, G. and Akkermans, H. (1993). Towards a unification of knowledge modeling approaches, in J.M. David, J.P. Krivine and R. Simmons (Eds.), Second Generation Expert Systems. Berlin: Springer-Verlag. 299–335.