Abstract:
For many years, researchers in knowledge-based systems have worked toward the development of sharable and reusable problem-solving methods and knowledge bases. The aim is to reduce development and maintenance costs, and to build flexible, component-based systems that can be adapted to changing environments. Unfortunately, despite conceptual progress in building and connecting components, there has been little success with large-scale, cross-platform implementations of sharable component libraries.
Recently, the CORBA 2.0 specification has made this sort of component reuse possible. The software industry is moving to solve exactly those problems that have hindered implementation of reuse for knowledge-based systems. Thus, researchers interested in reusable components for knowledge-based systems should implement their ideas with these emerging standards. Although CORBA provides an implementation-level mechanism for reuse, the challenge remains to build frameworks for libraries of components that share some semantics and can interoperate to be used to build systems that solve large, real-world problems. We illustrate how CORBA might be used to implement knowledge-based systems reuse with a number of scenarios drawn from our experience with reuse and CORBA within the Protégé environment.
1. Knowledge-Based Systems and CORBA
For many years, researchers in knowledge acquisition and knowledge-based systems have described problem-solving methods as components that are reusable in a plug-and-play manner (Chandrasekaran, 1986; Walther, Eriksson & Musen, 1992; Eriksson et al., 1995; Wielinga et al., 1993). If problem-solving methods and the knowledge bases they use are both reusable components, then this should lead to a number of benefits for the developer: Since knowledge-base construction is expensive and time-consuming, the reuse of an existing knowledge base, even if it requires adaptation or augmentation, should lead to significant savings. Likewise, the use of a pre-tested and debugged problem-solving method should reduce maintenance costs for the developer.
To date, most progress in this area has been at a conceptual level, rather than at the implementation level. For example, the KADS project and related efforts have developed large libraries of problem-solving method specifications (Breuker & Van de Velde, 1994). Although these specifications are sharable and have proved worthwhile, the implementations of these methods are not sharable, and must be rebuilt at every site that uses a particular method. Similarly, our own work has focused on a theory of mapping relations that act as the glue to interconnect components of a knowledge-based system (Gennari, Tu, Rothenfluh & Musen, 1994). Unfortunately, it has been difficult to test this theory with many examples, since the cost of building a large library of components is high.
A significant impediment to further progress in research on reusable components has been technological: Knowledge-based systems have not been sharable across platforms and development environments. Different systems are built with a variety of knowledge-representation languages (e.g., CLIPS, LOOM, or OPS5), and require components to be in different formats. Researchers have built large-scale environments or toolsets for constructing components that fit together. For example, KresT (Steels, 1990; Goossens, 1995), VITAL (Shadbolt, Motta & Rouge, 1993), and PROTÉGÉ-II (Puerta, Egar, Tu, & Musen, 1992) are all architectures for the development of components and component-based systems, yet none of these can use components built outside of their own environment; these systems cannot interoperate. Thus, each group of developers must build up its own library of components at significant cost, rather than being able to start with components shared from other architectures. This seems almost ironic, since the research is aimed at reuse, where interoperation of shared components should be a primary benefit.
In December 1994, the Object Management Group approved version 2.0 of the Common Object Request Broker Architecture (CORBA), a specification that describes how software components can interoperate across networks, languages and platforms. More significantly, since the release of this specification, a number of software companies have aggressively moved to develop environments and products that conform to or incorporate this standard, including Sun Microsystems, Microsoft (via bridging mechanisms to OLE), and other major vendors. In short, the software industry is moving toward standards that will enable large-scale reuse of components and distributed computing.
This technological advance presents an opportunity and a challenge to developers of knowledge-based systems. We believe that academic ideas about reuse must be matched with this emerging industry standard for the implementation of distributed, reusable components. This is an important marriage, for several reasons. First, the ideas are complementary: CORBA is primarily about specifying component syntax and distributed implementations, whereas research in knowledge-based systems has addressed issues of component design, component glue, and component semantics. Second, we cannot evaluate our ideas without real-world, large-scale implementations of reuse component libraries. The CORBA standard should help distribute the cost of building reuse libraries: Any developer who subscribes to the standard can add to a world-wide library of CORBA components. Finally, if we fail to meet this challenge, part of our research effort could become obsolete and irrelevant if the software industry succeeds in creating a world of reusable, distributed software components.
To argue for this position, we begin by reviewing reuse for knowledge-based systems, and then briefly describe the CORBA approach, emphasizing those ideas that are most applicable to knowledge-based systems. Next, we provide a series of detailed reuse scenarios: examples of knowledge-based systems reuse implemented with CORBA components. These scenarios include our experiences building CORBA components and integrating these with the Protégé project. Finally, we conclude by discussing weaknesses of the CORBA framework and by describing the differences between building an ontology for a component versus specifying the interface of a component using CORBA's interface definition language (IDL).
2. Knowledge-Based Systems and Reuse
In addition to those who argue for the reuse of these large-scale components, a number of researchers support the idea that problem-solving methods themselves are decomposable into smaller-grained reusable components (Steels, 1990; Tu et al., 1995). For the PROTÉGÉ-II project, a decomposable problem-solving method is viewed as in Figure 1 (Walther et al., 1992; Eriksson et al., 1995). This figure shows a top-level task, or problem description, being matched by some problem-solving method. This method might be either custom-developed for this particular task, or it might be adapted from an existing problem-solving method in the library. In either case, the method should be decomposable: in Figure 1, the method is shown as decomposable into three sub-tasks, {x,y,z}. In turn, these sub-tasks need to be matched to lower-level problem-solving methods. The degree to which a method is decomposable is dictated by the amount of reuse expected: The aim is to share the lower-level methods that solve general-use tasks, such as the methods {a,b,e,f} in Figure 1, that implement the tasks {x, y, z}.
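The decomposition just described can be sketched as a small tree of methods and sub-tasks. This is only an illustration in Java: the class names (PsMethod, DecompositionDemo) are hypothetical, and because the text does not say which of the methods {a,b,e,f} implements which of the sub-tasks {x,y,z}, the bindings below (including the intermediate method "c" and the sub-tasks "z1" and "z2") are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** A problem-solving method: either primitive, or decomposed into named sub-tasks. */
class PsMethod {
    final String name;
    // Sub-task name -> the lower-level method chosen to implement it.
    final Map<String, PsMethod> subTasks = new LinkedHashMap<>();

    PsMethod(String name) { this.name = name; }

    /** Binds a sub-task of this method to a lower-level method; returns this for chaining. */
    PsMethod bind(String subTask, PsMethod implementation) {
        subTasks.put(subTask, implementation);
        return this;
    }

    /** Collects the primitive (leaf) methods, i.e. the candidates for sharing in a library. */
    List<String> leaves() {
        List<String> out = new ArrayList<>();
        collectLeaves(out);
        return out;
    }

    private void collectLeaves(List<String> out) {
        if (subTasks.isEmpty()) {
            out.add(name);
            return;
        }
        for (PsMethod m : subTasks.values()) {
            m.collectLeaves(out);
        }
    }
}

class DecompositionDemo {
    /** Builds one hypothetical configuration in the spirit of Figure 1. */
    static PsMethod example() {
        PsMethod top = new PsMethod("top-level method");
        top.bind("x", new PsMethod("a"));
        top.bind("y", new PsMethod("b"));
        // Assumption: sub-task z is matched by a method that is itself decomposable.
        top.bind("z", new PsMethod("c")
                .bind("z1", new PsMethod("e"))
                .bind("z2", new PsMethod("f")));
        return top;
    }

    public static void main(String[] args) {
        // Lists the leaf methods in the order their sub-tasks were bound.
        System.out.println(example().leaves());
    }
}
```

Only the leaves are listed because, under this reading, they are the general-use methods worth placing in a shared library; the intermediate method is specific to its parent task.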
For any type of reuse, components must be described in an abstract fashion to allow developers to find and understand components in a library; the abstractions act as indexes used to search the library of components. For knowledge-based systems, these abstractions are represented in ontologies (Gruber, 1993; Guarino & Giaretta, 1995). An ontology is a model for the component; for example, a domain ontology is a set of hierarchically organized classes that model the information in a knowledge base. A method ontology serves the same purpose for a problem-solving method, describing the inputs and outputs used by the method. Our aim is to build a large library of method and domain ontologies so that developers can access and reuse knowledge bases and problem-solving methods. As we will see, the CORBA 2.0 specification is also aimed at the construction of large libraries of components. Thus, we argue that methods and knowledge bases should be implemented as CORBA components.
3. CORBA: Reuse and Distributed Computing
Although a full description of the CORBA effort is beyond the scope of this paper (for example, see Orfali, Harkey & Edwards, 1996), we will give a brief presentation of the main ideas, highlighting those aspects that mesh tightly with ideas about reuse in the context of knowledge-based systems. Figure 2 provides a very high-level view of a CORBA implementation of components. At the bottom of the figure is the CORBA Bus, representing the network of machines with CORBA-compliant object request brokers (ORBs). Any developer makes his or her component available for reuse by registering the component into the local interface repository. A naming convention scheme is used to ensure unique component names across the CORBA bus. As part of this registration process, the developer must describe the component with an interface definition language (IDL) specification. In Section 5, we describe IDL specifications in more detail, and contrast them to ontologies for knowledge-based systems.
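The registration step can be pictured with a deliberately simplified, in-memory stand-in for an interface repository. This Java sketch does not use any CORBA API: the class InterfaceRegistry, its methods, and the site-prefixed names are all hypothetical, and the only property it illustrates is that a name can be bound to at most one interface description.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for an interface repository: unique name -> interface description. */
class InterfaceRegistry {
    private final Map<String, String> idlByName = new HashMap<>();

    /** Registers a component; returns false (and changes nothing) if the name is taken. */
    boolean register(String qualifiedName, String idlText) {
        return idlByName.putIfAbsent(qualifiedName, idlText) == null;
    }

    /** Returns the registered interface description, or null if the name is unknown. */
    String lookup(String qualifiedName) {
        return idlByName.get(qualifiedName);
    }
}

class RegistryDemo {
    public static void main(String[] args) {
        InterfaceRegistry repo = new InterfaceRegistry();
        // Hypothetical naming convention: a site prefix keeps names unique across sites.
        boolean first = repo.register("siteA/internist/KB_Loader", "interface KB_Loader { ... };");
        boolean second = repo.register("siteA/internist/KB_Loader", "interface Other { ... };");
        System.out.println(first);   // the name was free
        System.out.println(second);  // the name was already bound, so the call is rejected
        System.out.println(repo.lookup("siteA/internist/KB_Loader"));
    }
}
```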
The match between this model and the ideas for decomposable problem-solving methods shown in Figure 1 should be clear. As we will show in greater detail in Section 5, the lower-level generic methods in Figure 1 can be implemented as CORBA components, allowing developers of knowledge-based systems to plug-and-play the components as needed for particular tasks. In addition, the CORBA infrastructure allows for distributed computing and platform independence. Thus, components to be reused need not be imported into a local environment. In some situations, distributed computing can create run-time inefficiencies due to network costs. For example, a call from a client at machine C to the statistical analysis component on machine A must go through C's ORB, across the network to A's ORB, and then likewise travel between machine A and machine B (when calling the cluster analysis subroutine), before finally returning solutions back across the network to machine C. However, these run-time costs are usually balanced by the benefit of decreased installation and maintenance costs: If a CORBA component resides on a single machine, its maintenance costs are amortized over the set of all clients that use that component.
For developers of knowledge-based systems, the primary value of CORBA is its ability to provide a standard for communication among a library of sharable, platform-independent components. In the next section, we provide three examples of knowledge-based systems reuse, where components are implemented to use the CORBA standard for interoperation and reuse.
4. KBS Reuse Scenarios with CORBA
Scenario 1: A knowledge base server, with PSM clients
Different developers may want to apply a number of different problem-solving methods to a given knowledge base. The problem-solving methods designed by these developers are clients, making requests of a server for access to a knowledge base. For example, Figure 3 shows three developers using the Internist-1 knowledge base implemented within the Protégé environment (Miller, Pople & Myers, 1982; Musen, Gennari & Wong, 1995). This knowledge base includes medical diseases and diagnoses, patient symptoms and findings, links between diseases and symptoms, and a large set of current medical references, including their links to particular diseases and/or symptoms. A CORBA implementation of this knowledge base would include three parts: (1) a persistent store system for the actual data (shown in Figure 3 at the top), (2) the server object that implements accessor functions into the knowledge base, and (3) an IDL specification of those accessor functions. Because the Protégé environment imposes a common storage format for all knowledge bases, we can build a single server that provides CORBA access to any Protégé knowledge base. This approach is generalizable beyond Protégé. Any development environment or system for a common persistent store could have a single CORBA front-end that would make those knowledge bases available over the network for remote clients or users of the knowledge bases.
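The three parts listed above (store, server object, accessor specification) can be sketched in plain Java, with an ordinary interface standing in for the IDL and a map standing in for the persistent store. All names here (KnowledgeBaseAccess, GenericKbServer, KbClients) and the placeholder entries D1, D2, F1, F2, F3 are assumptions for illustration; no CORBA calls are made, and the two client functions are toy stand-ins for problem-solving methods.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Accessor functions exposed by the server (the role played by the IDL specification). */
interface KnowledgeBaseAccess {
    Set<String> findingsFor(String disease);
    Set<String> diseasesFor(String finding);
}

/** One generic server over a common storage format: here, a map disease -> findings. */
class GenericKbServer implements KnowledgeBaseAccess {
    private final Map<String, Set<String>> store = new HashMap<>();

    void addLink(String disease, String finding) {
        store.computeIfAbsent(disease, k -> new TreeSet<String>()).add(finding);
    }

    public Set<String> findingsFor(String disease) {
        Set<String> found = store.get(disease);
        // Return a copy so that clients cannot modify the store.
        return found == null ? new TreeSet<String>() : new TreeSet<String>(found);
    }

    public Set<String> diseasesFor(String finding) {
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : store.entrySet()) {
            if (e.getValue().contains(finding)) {
                out.add(e.getKey());
            }
        }
        return out;
    }
}

/** Two different clients that share the same server through the same interface. */
class KbClients {
    /** Client 1: how many diseases are linked to a finding. */
    static int candidateCount(KnowledgeBaseAccess kb, String finding) {
        return kb.diseasesFor(finding).size();
    }

    /** Client 2: findings that two diseases have in common. */
    static Set<String> sharedFindings(KnowledgeBaseAccess kb, String d1, String d2) {
        Set<String> out = new TreeSet<>(kb.findingsFor(d1));
        out.retainAll(kb.findingsFor(d2));
        return out;
    }

    /** Placeholder content; not taken from any real knowledge base. */
    static GenericKbServer sampleKb() {
        GenericKbServer kb = new GenericKbServer();
        kb.addLink("D1", "F1");
        kb.addLink("D1", "F2");
        kb.addLink("D2", "F1");
        kb.addLink("D2", "F3");
        return kb;
    }

    public static void main(String[] args) {
        GenericKbServer kb = sampleKb();
        System.out.println(candidateCount(kb, "F1"));
        System.out.println(sharedFindings(kb, "D1", "D2"));
    }
}
```

Because both clients depend only on KnowledgeBaseAccess, the same server code could front any knowledge base stored in the common format, which is the point of the single-server design.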
Scenario 2: The PSM as a server, with knowledge bases as clients
A single problem-solving method, if it is designed for reuse, may be used with a number of different knowledge bases. Figure 4 shows this scenario where the reusable problem-solving method is a constraint-satisfaction engine. For the knowledge-acquisition research community, a familiar use of such a problem-solving method is with the Sisyphus-2 elevator design problem (Schreiber & Birmingham, 1996). Another domain where tasks may require constraint satisfaction is molecular biology: For example, to determine the three-dimensional configuration of a ribosome, a number of atomic distance constraints between parts of the ribosome must be satisfied (Altman, Weiser & Noller, 1994).
Within PROTÉGÉ-II, Gennari et al. (1995) have demonstrated exactly the reuse indicated by Developers B and C in Figure 4. We used a single problem-solving method to solve both the ribosome configuration problem and the Sisyphus-2 elevator design problem. In PROTÉGÉ-II, the components were implemented as different knowledge bases and methods within the CLIPS production-system language. Because of this common base, communication problems among components were not an issue. To achieve interoperability across different development environments and platforms, we would need a component communication standard such as CORBA, and an implementation of components as outlined in Figure 4.
This scenario provides clear motivation for designing problem-solving methods as modular CORBA components. If problem-solving methods are designed to be independent of knowledge bases, then they can be called and reused with any number of different versions of the experimental data. For example, the RiboKB knowledge base includes experimental data about ribosomal sequence information and atomic distance constraints. Developer B may question the validity of this experimental data, and may want to build a different or modified knowledge base. As long as Developer B's knowledge base follows the same form as specified by the IDL for knowledge-base access included in the shared problem-solving method, then the same constraint satisfaction problem-solving method can be applied to different versions of the knowledge base. Any developer can use the shared constraint satisfier because this CORBA component does not care where the knowledge base resides; it simply makes accessing calls and uses the CORBA bus to find the implementation of those KB accessing calls.
It should be clear that the knowledge base for ribosomal information in Figure 4 could be reused just as the Internist-1 knowledge base was in scenario 1; thus, this knowledge base could be used with several different problem-solving methods, not just constraint satisfaction. Note that knowledge bases sometimes act as CORBA servers (scenario 1) and sometimes as clients (Developer B in scenario 2). This is not inconsistent: a single component can act both as a client and as a server. Thus, the IDL specifications for the constraint-satisfaction PSM component include both methods for constraint satisfaction that are implemented as a service, and specifications for KB access that the component uses as a client and does not implement; it is assumed that these access functions are implemented by some other component on the CORBA bus.
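The split between what the constraint-satisfaction component implements and what it merely calls can be sketched with two Java interfaces. This is a minimal illustration under stated assumptions: the interface and class names are hypothetical, the solver is a generic backtracking search (not the method used in PROTÉGÉ-II), and ToyKb is an invented two-variable problem rather than the ribosome or elevator knowledge base.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Required interface: the PSM calls these as a client; another component implements them. */
interface ConstraintKb {
    List<String> variables();
    List<Integer> domain(String variable);
    /** True if the (possibly partial) assignment violates no constraint. */
    boolean consistent(Map<String, Integer> assignment);
}

/** Provided interface: the service that the PSM component implements. */
interface ConstraintSatisfier {
    /** Returns a complete consistent assignment, or null if none exists. */
    Map<String, Integer> solve(ConstraintKb kb);
}

/** A generic backtracking search that knows nothing about any particular domain. */
class BacktrackingSatisfier implements ConstraintSatisfier {
    public Map<String, Integer> solve(ConstraintKb kb) {
        Map<String, Integer> assignment = new HashMap<>();
        return extend(kb, kb.variables(), 0, assignment) ? assignment : null;
    }

    private boolean extend(ConstraintKb kb, List<String> vars, int i, Map<String, Integer> a) {
        if (i == vars.size()) {
            return true; // every variable has a value
        }
        String v = vars.get(i);
        for (int value : kb.domain(v)) {
            a.put(v, value);
            if (kb.consistent(a) && extend(kb, vars, i + 1, a)) {
                return true;
            }
            a.remove(v); // undo and try the next value
        }
        return false;
    }
}

/** Toy knowledge base: p and q range over 1..3 and must satisfy q - p == gap. */
class ToyKb implements ConstraintKb {
    private final int gap;

    ToyKb(int gap) { this.gap = gap; }

    public List<String> variables() { return Arrays.asList("p", "q"); }

    public List<Integer> domain(String variable) { return Arrays.asList(1, 2, 3); }

    public boolean consistent(Map<String, Integer> a) {
        Integer p = a.get("p");
        Integer q = a.get("q");
        return p == null || q == null || q - p == gap;
    }
}

class SatisfierDemo {
    public static void main(String[] args) {
        ConstraintSatisfier psm = new BacktrackingSatisfier();
        // The same method is applied to two different versions of the knowledge base.
        System.out.println(psm.solve(new ToyKb(2)));
        System.out.println(psm.solve(new ToyKb(-2)));
    }
}
```

Swapping ToyKb for any other implementation of ConstraintKb requires no change to BacktrackingSatisfier, which mirrors the situation of Developer B supplying a modified knowledge base.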
Scenario 3: The problem-solving method as a server, with other PSMs as clients
For a number of years, the knowledge-based systems community has envisioned a library of problem-solving methods as reusable components, so that developers could build up these method components to satisfy the task at hand (Wielinga et al., 1993; Walther et al., 1992; van Heijst et al., 1995; Tu et al., 1995). However, until now, we have lacked an environment in which these ideas could be readily implemented. As described in Section 3, CORBA is designed to allow submethods to be called remotely as sharable components of a large problem-solving method.
As an example, Figure 5 shows the decomposition into components of a method to perform diagnosis using the Internist-1 knowledge base. The top-level problem-solving method is a loop that iterates through three sub-tasks: (1) elicit patient findings from the user, (2) retrieve potentially relevant diseases from the knowledge base, and (3) rank these diseases to produce a differential diagnosis based on known findings. As shown in the figure, these sub-tasks can be implemented by different sub-methods. For example, depending on needs for retrieval efficiency, the diseases may be stored in different sorts of persistent storage systems, requiring different implementations for the retrieve subtask. Similarly, different developers may prefer different algorithms for sorting and weighting the retrieved information to do the diagnose subtask.
Figure 5. A composite problem-solving method: the top-level task is broken into three subtasks, which may be implemented by different lower level problem-solving methods. In Figure 6, these three subtasks are implemented by CORBA components.
The ability to configure the Diagnosis problem-solving method with different choices of lower-level methods can be supported by implementing each of the submethods of Figure 5 as a CORBA component. Using a commercial product for our object request broker (ORB), we have completed an implementation of these components, using the Java language for the GUI that implements the Elicit findings sub-task, and C++ for the other two sub-tasks for the Diagnosis method. A complete scenario for reusing these CORBA components is shown in Figure 6, where Developers A and B build two different configurations of the same top-level diagnosis problem-solving method. Our initial implementation (Developer A) uses a flat file for knowledge base storage, and a simple version of the original Internist-1 algorithm. In contrast, Developer B may need a relational database (RDBMS) implementation for the knowledge base access (Intern_V2), and may build a diagnosis algorithm based on cover-and-differentiate. Although we have built a Java GUI front-end for eliciting patient findings, Developer B may want a special-purpose GUI that is more suited to local platform capabilities.
Figure 6. Reuse scenario 3: Two developers build alternative versions that implement the top level problem-solving method Diagnosis (see Figure 5), by calling CORBA components that implement various subtasks of the method.
The extent to which IDL specifications are shared dictates the ease of interoperation. Thus, if the Intern_V1 and Intern_V2 knowledge bases share the same IDL (as indicated in Figure 6), then developers can seamlessly interchange these two components. Likewise, if a developer wants to connect Diagnosis_B with Intern_V1, then the latter component must implement those access functions called by the PSM component. Of course, the IDL need not match exactly -- presumably, the problem-solving method IDL differs between Diagnosis_A and Diagnosis_B, and this difference would be reflected in any clients.
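The interchange of components that share an interface can be sketched as follows. This is a Java illustration only, not the Java/C++ implementation mentioned above: the interfaces Elicit, Retrieve and Rank are hypothetical stand-ins for the three sub-tasks of Figure 5, the two retrievers are in-memory substitutes for the flat-file and relational stores, the ranking rule (count of supporting findings) is a placeholder rather than the Internist-1 algorithm, and F1, F2, D1, D2, D3 are invented entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

interface Elicit { String nextFinding(); }                      // null when no findings remain
interface Retrieve { List<String> diseasesFor(String finding); }
interface Rank { List<String> rank(Map<String, Integer> evidenceCounts); }

/** Top-level method: loops over elicit, retrieve, rank. */
class Diagnosis {
    private final Elicit elicit;
    private final Retrieve retrieve;
    private final Rank rank;

    Diagnosis(Elicit e, Retrieve r, Rank k) { elicit = e; retrieve = r; rank = k; }

    List<String> run() {
        Map<String, Integer> counts = new TreeMap<>();
        List<String> differential = new ArrayList<>();
        String finding;
        while ((finding = elicit.nextFinding()) != null) {
            for (String d : retrieve.diseasesFor(finding)) {
                counts.merge(d, 1, Integer::sum);
            }
            differential = rank.rank(counts);
        }
        return differential;
    }
}

/** Stand-in for a GUI: replays a fixed list of findings. */
class ScriptedFindings implements Elicit {
    private final Iterator<String> it;
    ScriptedFindings(String... findings) { it = Arrays.asList(findings).iterator(); }
    public String nextFinding() { return it.hasNext() ? it.next() : null; }
}

/** Stand-in for a flat-file store: an index keyed by finding. */
class MapRetriever implements Retrieve {
    private final Map<String, List<String>> index = new HashMap<>();
    MapRetriever link(String finding, String... diseases) {
        index.put(finding, Arrays.asList(diseases));
        return this;
    }
    public List<String> diseasesFor(String finding) {
        List<String> hit = index.get(finding);
        return hit == null ? new ArrayList<String>() : hit;
    }
}

/** Stand-in for a relational store: rows of (finding, disease) scanned on each call. */
class RowRetriever implements Retrieve {
    private final List<String[]> rows = new ArrayList<>();
    RowRetriever row(String finding, String disease) {
        rows.add(new String[] {finding, disease});
        return this;
    }
    public List<String> diseasesFor(String finding) {
        List<String> out = new ArrayList<>();
        for (String[] r : rows) {
            if (r[0].equals(finding)) out.add(r[1]);
        }
        return out;
    }
}

/** Placeholder ranking: most supporting findings first, ties broken alphabetically. */
class CountRanker implements Rank {
    public List<String> rank(Map<String, Integer> counts) {
        List<String> names = new ArrayList<>(counts.keySet());
        names.sort((a, b) -> {
            int byCount = counts.get(b) - counts.get(a);
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return names;
    }
}

class DiagnosisDemo {
    static Diagnosis configA() {
        Retrieve kb = new MapRetriever().link("F1", "D1", "D2").link("F2", "D2", "D3");
        return new Diagnosis(new ScriptedFindings("F1", "F2"), kb, new CountRanker());
    }

    static Diagnosis configB() {
        Retrieve kb = new RowRetriever().row("F1", "D1").row("F1", "D2")
                                        .row("F2", "D2").row("F2", "D3");
        return new Diagnosis(new ScriptedFindings("F1", "F2"), kb, new CountRanker());
    }

    public static void main(String[] args) {
        System.out.println(configA().run());
        System.out.println(configB().run());
    }
}
```

The Diagnosis class is unchanged between the two configurations; only the object passed for the retrieve sub-task differs, which is what sharing an interface specification buys.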
Whereas the ability to interchange and reuse components of a knowledge-based system is not a new idea, the development of the CORBA standard for inter-component communication, and of commercial products that support this standard, is new. This standard allows developers to ignore lower-level communication details, and to specify components above the level of programming languages or development environments. As more developers use this standard, it will become easier for researchers and developers to reuse components built at other sites and in other environments.
5. IDL Specifications and Ontologies
CORBA is an object-oriented standard for component architectures. Therefore, IDL specifications are object-centered, and allow for inheritance, polymorphism, and encapsulation. Each object in IDL includes a declaration of shared methods and static attributes; these are interfaces, since they specify the syntax of the public interface into the object. Figure 7 shows the IDL specification for the Internist-1 problem-solving method, including the specifications for accessing an Internist-1 knowledge base. This specification includes two interfaces, KB_Loader and PSM_Session, each with attributes, methods and subsidiary data structures. Since this IDL must be compiled into components that either implement or use these interfaces, we have compiled it with each of the three components in our re-implementation of the Internist-1 system:
1) The problem-solving method component, which implements the methods listed under the PSM_Session interface, and makes client calls to the KB_Loader methods.
2) The knowledge-base server component, which implements the methods listed under the KB_Loader interface. Currently, these methods are implemented by interacting with flat files containing the Internist-1 database.
3) The Java front-end component, which makes client calls to methods in both PSM_Session and KB_Loader.
/*----------------------------------------------------------------------------*
/* File: INTERN.IDL
/* The idl specification for access to the Internist-1 knowledge base, and
/* use of a re-implementation of the Internist-1 algorithm for diagnosis.
/*
/* Programmer: Adam Stein (w/John Gennari)
/*------------------------------------------------------------------8/28/96-*/
module internist {
  enum evidence_status {POS, NEG};
  typedef sequence<string> StringSeq;

  /*************************
  /* Data structures:
  /* Manifestation, Evidence, & Classification.
  /* Each of these are used by accessors into the Internist-1
  /* KB. The attributes "evok" and "freq" have specific
  /* meanings for the Internist-1 problem-solving method.
  /*************************/
  struct Manifestation {
    string name;
    short evok, freq;
  };
  typedef sequence<Manifestation> ManSeq;

  struct Classification {
    string name;
    ManSeq manifestations;
  };

  struct Evidence {
    string name;
    short importance;
    evidence_status status;
    StringSeq classifications;
  };

  /*************************
  /* The KB Access interface:
  /* Includes methods that use the above data structures
  /* to access a knowledge base of Internist-1 data.
  /*************************/
  interface KB_Loader {
    readonly attribute string KB_name;
    Evidence getEvidence(in string evidence);
    Classification getClassification(in string classification);
    StringSeq getClassNames(in string startName, in short len);
    StringSeq getEvidenceNames(in string startName, in short len);
    StringSeq scanEvidence(in string search);
  };

  /*************************
  /* The PSM session:
  /* includes methods for initializing a session (by reading
  /* a KB), adding or removing evidence, and getting the
  /* conclusions or diagnoses (using the Internist-1 PSM).
  /*************************/
  struct Conclusion {
    string classification;
    short score;
  };
  typedef sequence<Conclusion> ConSeq;

  interface PSM_Session {
    void initSession(in KB_Loader loader);
    void clear();
    void addEvidence(in string evidence, in evidence_status status);
    void undoEvidence(in string finding);
    ConSeq getConclusions(in short n);
  };
}; /* end of internist module */
Figure 7. The IDL specification for the Internist-1 problem-solving method and for access into an Internist-1 knowledge-base.
These three components correspond exactly to the three subtasks of the Diagnosis method shown in Figure 5. Thus, KB_Loader is an interface to the CORBA component that implements the retrieve disease subtask of Figure 5. Likewise, PSM_Session specifies the interface to the component that implements the Internist-1 algorithm for performing the diagnose subtask of Figure 5. As specified by the IDL in Figure 7, these three components cannot be further sub-divided into subtasks or submethods. For CORBA, the granularity of components is defined by the interface construct. Thus, while the Java front-end could replace PSM_Session with an alternative implementation of the diagnose task, it cannot change any lower-level details of the Internist-1 algorithm.
The IDL specification of Figure 7 defines the exact syntax of the inputs and outputs of each component, and therefore allows us to plug-and-play components across networks and with any CORBA-compliant object request broker system. For example, we plan to replace our implementation of KB_Loader with a more generic interface for access into Protégé knowledge bases, as described in Figure 3. As long as the new component continues to support the original IDL specification for KB access, we can replace the knowledge base component with an alternative knowledge base and/or knowledge base server system, without affecting the front-end component or the problem-solving method component.
As currently specified, the KB_Loader in Figure 7 is custom-tailored for the diagnosis problem and for our Java GUI. This component is used as a service by both the problem-solving method component and by the Java front-end component. The former uses the methods getEvidence and getClassification to retrieve information about specific diseases and findings in the knowledge base. The front-end component uses the latter three methods to allow the end-user some browsing capability into the knowledge-base. For example, getEvidenceNames("fever", 30) returns 30 alphabetically sorted finding names, beginning with "fever".
Unfortunately, although IDL specifies the syntax necessary for interoperation, it does not describe the semantics for what these methods do, or how they are implemented. For example, the behavior of getEvidenceNames is clear only by explanation or documentation, and therefore susceptible to misinterpretation and misunderstanding. Because the implementation of such functions is hidden, clients must (1) trust that they understand the developer's intended semantics, and (2) trust that the developer correctly implemented those semantics. CORBA provides only the syntax for interoperation; semantics are implicit and unenforced.
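To make the gap between syntax and semantics concrete, the following Java sketch gives two implementations of the same getEvidenceNames signature that a client could not tell apart from the interface alone. The Java rendering of the operation (StringSeq shown as List<String>), the class names, and the sample finding names are assumptions for illustration; neither class is claimed to match the behavior of the actual KB_Loader component.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** One KB_Loader operation from Figure 7, rendered by hand as a Java interface. */
interface EvidenceNames {
    List<String> getEvidenceNames(String startName, short len);
}

/** Reading 1: up to len names in alphabetical order, starting at startName. */
class StartAtNames implements EvidenceNames {
    private final TreeSet<String> names;

    StartAtNames(List<String> all) { names = new TreeSet<>(all); }

    public List<String> getEvidenceNames(String startName, short len) {
        List<String> out = new ArrayList<>();
        // tailSet(x, true) is the sorted view of all names >= x.
        for (String n : names.tailSet(startName, true)) {
            if (out.size() == len) break;
            out.add(n);
        }
        return out;
    }
}

/** Reading 2: up to len names, in alphabetical order, that have startName as a prefix. */
class PrefixNames implements EvidenceNames {
    private final TreeSet<String> names;

    PrefixNames(List<String> all) { names = new TreeSet<>(all); }

    public List<String> getEvidenceNames(String startName, short len) {
        List<String> out = new ArrayList<>();
        for (String n : names) {
            if (out.size() == len) break;
            if (n.startsWith(startName)) out.add(n);
        }
        return out;
    }
}

class SemanticsDemo {
    /** Placeholder finding names, invented for the example. */
    static List<String> sample() {
        return Arrays.asList("rash", "fever", "cough", "headache", "fever-high");
    }

    public static void main(String[] args) {
        EvidenceNames one = new StartAtNames(sample());
        EvidenceNames two = new PrefixNames(sample());
        // Same call, same types, different answers.
        System.out.println(one.getEvidenceNames("fever", (short) 3));
        System.out.println(two.getEvidenceNames("fever", (short) 3));
    }
}
```

Both classes compile against the same interface, so a type checker (or an IDL compiler) accepts either; only documentation or a shared, explicit model of the operation can tell a client which behavior to expect.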
While CORBA makes it possible for developers to independently contribute to a library of components across platforms and languages, it offers little or no help with the knowledge-level task of ensuring that particular components actually can work together. For components to work together, they must share some semantics, and these must be made explicit. In fact, the Object Management Group recognizes this weakness, and is trying to develop standards beyond CORBA 2.0 for building suites of inter-connected components called Frameworks and higher-level components that share semantics called Business Objects. This work is still in the development stage, but would seem to be similar to research in knowledge-based systems aimed at integrating suites of problem-solving methods.
Acknowledgments
This work has been supported in part by grants LM05157, LM05652, and LM05208 from the National Library of Medicine, and by support from the Defense Advanced Research Projects Agency (NRAD contract #N66001-94-D-605). Dr. Musen is the recipient of National Science Foundation Young Investigator Award IRI-9257578.
We wish to thank Samson Tu for comments on an earlier version of this paper, and the entire Protégé group for our ideas about CORBA and Protégé. We also wish to thank all attendees of the First Protégé Users & Developers Workshop held in March, 1996; this paper includes many ideas from that workshop.
References
Chandrasekaran, B. (1983). Toward a taxonomy of problem-solving types. AI Magazine, 4(4), 9-17.
Steels, L. (1990). Components of expertise. AI Magazine, 11(2), 28-49.