Mappings for Reuse in Knowledge-Based Systems

John Y. Park, John H. Gennari, Mark A. Musen
Stanford Medical Informatics
Stanford University School of Medicine
Stanford, CA 94305-5479
email: {park,gennari,musen}@smi.stanford.edu

Abstract

By dividing the world into domain knowledge and problem-solving methods that reason over that knowledge, knowledge-based systems seek to promote reusability and shareability of the given components. However, to effect a working system, the components must be mated subsequently within some global structuring scheme. We created a knowledge-based design for one such structuring scheme, for composing working applications from reusable knowledge-based components. The design is based on the concept of declarative mapping relations, which are explicit specifications for the syntactic and semantic connections between entities in the knowledge and method components. We structured the design of the mapping relation types into a principled mapping ontology. We built a functional implementation of this design within the Protégé system. To assess its utility, we applied the mapping system to three evaluation studies of component-based reuse.

1. REUSABLE COMPONENTS AND MAPPINGS IN KNOWLEDGE-BASED SYSTEMS

Knowledge-based applications often make the distinction between declarative domain knowledge and the problem-solving methods that reason over that knowledge. This dichotomous abstraction promotes a degree of independence among the given components, and therefore offers the hope of being able to share and reuse these components. Work over many years in knowledge-based systems has resulted in large numbers of knowledge bases for many diverse domains, and of tested and refined problem-solving methods. Given a new task, there is a good chance that the relevant domain knowledge and an appropriate, implemented method already exist. The windfall would be realized if we could take this instantiated knowledge base and problem-solving method, and, with limited syntactic and semantic glue, bind them into an application to accomplish our task.

As part of the Protégé project, we are exploring the entire spectrum of knowledge-based system design, sharing, and reuse [Eriksson et al., 1995; Musen et al., 1995]. We are also addressing the background issues of problem-solving-method creation, formal description, and selection [Grosso et al., in press]. The research presented in this paper starts at the point where the candidate components have been selected and analyzed. We address the problem of binding the chosen elements into a working application entity, and providing the infrastructure that such a reuse task entails. Our work is concerned with both the theoretical and the engineering-related issues surrounding how best to connect these components, and the nature of translating objects and concepts between components. We propose a method of performing this intercomponent connection based on building sets of declarative mapping relations that define the translations, and we demonstrate a system that implements this approach to solve three diverse problems through component-based reuse.

2. COMPONENT-BASED REUSE

Our research focuses on synthesizing knowledge-based applications by combining discrete, independent components. Knowledge-based systems can be divided into components in many ways. A simple, natural decomposition common to many frameworks, and that used within the Protégé system, is to divide knowledge-based systems into two fundamental classes: domain knowledge bases, and the domain-independent problem-solving methods that can reason over these knowledge bases [Eriksson et al., 1995]. Another common interpretation of component-based reuse is the composition of problem-solving methods from existing component submethods. The work covered here can easily be adapted to the task of mapping between submethods, but that is not the subject of this paper.

The domain knowledge base encodes knowledge about the task domain: the subset of the knowledge in a problem domain (e.g., medicine) that is relevant to the particular task of interest (e.g., diagnosis). It can be modeled as taking form in two entities: a structured definition of the types of objects and relations that can exist in our domain (the domain ontology), and the set of instances of members of that structure (the knowledge-base instances) [Grosso et al., in press]. In such a model, knowledge per se can be represented either in the structure of the ontology or in the instances themselves; a given decision of where to encode what knowledge is often neither uniquely correct nor optimal.

The second major component of knowledge-based systems is the problem-solving method. A method can be thought of as a knowledge-processing algorithm or inferencing system. Researchers have done much prior work on reusable, domain-independent tasks and methods [Chandrasekaran, 1986; McDermott, 1988; Breuker and van de Velde, 1994]. We can also structure the input and output requirements of the problem-solving method into a method ontology, which describes the structure that the knowledge must have for the method to be able to process it [Gennari et al., 1994]. Protégé uses a method ontology; furthermore, the method ontology is a formal one, in the sense that it not only describes accurately the conceptual expectations of the problem-solving method, but also yields an implemented, working set of class definitions in the system in which the method is implemented. We are also analyzing and modeling the various ontologies that can be associated with problem-solving methods [Gennari, Grosso, and Musen, in press].

In the component-oriented model, an underlying design goal for the various entities is to support and encourage sharability and reusability. Therefore, the domain ontology, although oriented to a given task, is not intended to be matched directly to any one of the problem-solving methods that can achieve that task. Similarly, the method ontology is neutral with respect to application domains. Furthermore, even for simple domains, any particular instantiation of the domain ontology usually will reflect the personal abstractions of its designer, and there will be variability in structure across designers. Therefore, incompatibilities are bound to occur between the domain ontology and the method ontology, no matter how well matched one is to the other. So, to bridge the resulting conceptual impedance mismatch between the two ontologies, we need to define a syntactic and semantic bridge, or glue, to bind the components together.

Given a specific task, involving a given problem-solving method and problem domain, the various knowledge-based system development methodologies use a range of solutions for how the method and domain should be brought together [Fensel, 1994]. Some methodologies treat the problem as one of modeling: they start from a model of a generic task and its knowledge roles, and refine the model until it becomes domain-relevant. An interesting variation of this scheme uses evolutionary refinement and domain-grounding of the task's knowledge roles to drive task-specific knowledge acquisition [Gil and Melz, 1996]. However, these evolving-model schemes do not work well in the situation where both the domain knowledge base and problem-solving method components already exist. Some other approaches model the domain and method components independently, and fuse the two together with an explicit conversion layer. We believe that this model more closely approximates the situation in most practical reuse scenarios, and we build our system upon such a model.

The framework for our model is to conceptualize the binding of the domain-knowledge base and method components as generating mappings between concepts in the domain and analogous concepts in the method's universe of discourse (i.e., its ontology). We therefore need to define what mappings are in our framework.

2.1 Definition of Mappings

Throughout this paper, we refer to mappings between the knowledge base and problem-solving method components. In the global scheme, mappings are defined as whatever mechanisms are used to convert between structures existing in one component and analogous structures expected by another. In the research described in this paper, we focus specifically on mappings between structures in the domain-knowledge-base component and structures expected by the problem-solving-method component. However, mappings would also be relevant to conversions of structures passed among component submethods in a composite problem-solving method, or to any other intercomponent interface.

We classify mappings into three principal types: implicit, procedural, and declarative. Each is described in the following sections.

2.2 Implicit Mappings

Implicit mappings are conversions someone performs by specializing one or the other component (or both) to make the object definitions in one fit the requirements of the other, to achieve the task at hand. This is sometimes also called component adaptation. For example, modifying public-domain software to work on someone's own data structures would be a form of implicit mapping. In general, implicit mappings can take the form of either a modification of a pre-existing artifact, or a customization of the overarching design in anticipation of creation of the specialized component. In our dichotomy of knowledge base and problem-solving method, either the knowledge base can be modified to match the method's expectations, or the method can be altered to work with the native knowledge-base objects. In one case, we have to modify the domain ontology, and either perform a one-time conversion of the instances in the knowledge base, or work with the domain expert to reengineer the knowledge base. In the other case, both lexical and structural changes to the method implementation might be necessary.

The advantage of this approach is that it is conceptually direct and straightforward. It also has problems, mainly in the areas of specialization and maintenance. Because the resulting artifacts are specific to the application task at hand, multiple instances of reuse of a given component lead to parallel sets of knowledge bases or method implementations, each of which must be maintained individually. Another problem is that the specialization of working components can lead to the introduction of errors in the modified versions of both the knowledge base and the method implementation. Also, the modifications are often neither principled nor well specified.

2.3 Procedural Mappings

Creating procedural mappings involves writing translation code that converts instances from the domain knowledge base to the types that are required by the problem-solving method. The code has, embedded procedurally in its logic, knowledge of what domain instances map to what method requirements. This approach is taken by procedural mediators, such as those from the database systems community [Wiederhold 1992]. We have also experimented with such mediators in Protégé [Gennari, Cheng, Altman and Musen, in press].

The advantages of this kind of mapping technique are that it is procedural and usually direct, and is therefore efficient in execution, and that neither of the original artifacts (the knowledge base or problem-solving method) is modified. The disadvantages are that the procedural mapping modules are specific to the application task, domain knowledge base, and problem-solving method, and might be difficult to reuse. The code might be familiar to a system engineer, but is likely to be too complex to mean much to the domain experts; this division between people who understand the domain and people who understand the implementation often leads to problems. The procedural mappings also have the same difficulties with perspicuity and maintainability that other bodies of code often have. Low-level programming language code is often complex, and changing lines in the code without fully comprehending the inner workings of the procedure is risky.
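
As a rough illustration of the procedural style, the following Python sketch hand-codes a translation from a domain instance to a method instance. The slot names and the function disease_to_fault are hypothetical and are not taken from Protégé or from any mediator system; the point is only that the correspondence between domain slots and method slots exists nowhere except inside the code.

```python
# Hypothetical domain instance, represented as a plain dict.
domain_disease = {
    "disease-name": "influenza",
    "symptoms": ["fever", "cough"],
    "icd-code": "J11",
}

def disease_to_fault(disease):
    """Hand-coded translation from a domain 'disease' to a method 'fault'."""
    return {
        # The rename disease-name -> fault-name is visible only in this line.
        "fault-name": disease["disease-name"],
        # The decision to treat symptoms as findings is also buried here.
        "findings": sorted(disease["symptoms"]),
        # 'icd-code' is silently dropped; nothing declares that intent.
    }

method_fault = disease_to_fault(domain_disease)
```

Reusing this module for another domain or another method would mean rewriting the function body, which is the maintenance cost described above.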

2.4 Declarative Mappings

Declarative mappings constitute a descriptive method for defining the conversions between entities in the components [Gennari et al., 1994]. The run-time environment consists of two modules: a set of declarative mapping relations and a mapping interpreter. Mapping relations are explicit specifications that define the various conversions that must take place for translating objects between the knowledge base and the method. (In Protégé, they are embodied in a frame-based syntax.) The mapping interpreter is a translation engine that parses the mapping declarations and performs the run-time conversion to provide input to the method. For example, Figure 1 shows a declarative mapping that specifies that all instances of the domain class domain-widget-A are to be converted to instances of method class method-widget-B for input to the problem-solving method, by scaling the X, Y, and Z slot values by 2.54 (to convert units from inches to centimeters), and by dropping all other slots (since these are of no use to the method). The mapping interpreter would parse these specifications, then search for all instances of domain-widget-A and perform the conversions to produce proper method-widget-B instances.

We give a detailed description of declarative mappings in Section 4. For now, we claim certain advantages to the declarative approach. First, a large part of the work (the mapping-interpreter implementation) is a generic engine common to all application tasks, and completely reusable. Second, the declarative mapping specifications convey the intent of the mappings, and are not encumbered with the mechanisms of the mapping process, which are embedded in the mapping interpreter engine. By abstracting out the what from the how of the conversion specification, declarative mappings convey the intent of the designer more efficiently and clearly than procedural mappings would; this is important for future analysis of the mappings. The mappings are designed to meet the dual goals of being transparent to humans and of being efficient to implement; both goals were stressed in our design.

Figure 1. A diagrammatic example of a mapping that scales some slots numerically and drops others that do not correspond to method inputs.
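
The widget example of Figure 1 can be sketched in Python as follows. This is only an approximation under stated assumptions: Protégé expresses mapping relations in a frame-based syntax, whereas here a mapping relation is a small data record, and the names ScalingMapping and interpret are hypothetical. The sketch shows the separation between the declaration (pure data) and the generic interpreter that carries it out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScalingMapping:
    """A declarative mapping relation: data only, no conversion logic."""
    domain_class: str
    method_class: str
    scaled_slots: tuple   # slots copied after multiplication by `factor`
    factor: float

def interpret(mapping, domain_instances):
    """Generic interpreter: convert every instance of mapping.domain_class."""
    results = []
    for inst in domain_instances:
        if inst["class"] != mapping.domain_class:
            continue
        converted = {"class": mapping.method_class}
        for slot in mapping.scaled_slots:
            converted[slot] = inst[slot] * mapping.factor
        # Slots not named in the mapping are dropped.
        results.append(converted)
    return results

# The declaration: scale X, Y, and Z from inches to centimeters.
widget_mapping = ScalingMapping(
    "domain-widget-A", "method-widget-B", ("X", "Y", "Z"), 2.54
)

knowledge_base = [
    {"class": "domain-widget-A", "X": 1.0, "Y": 2.0, "Z": 10.0, "color": "red"},
    {"class": "domain-gadget", "X": 5.0},
]
method_inputs = interpret(widget_mapping, knowledge_base)
```

Changing the conversion factor or the list of slots touches only the declaration; the interpreter stays the same.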

3. MAPPINGS IN PROTÉGÉ

The core purpose of Protégé is to support component-based development, sharing, and reuse of knowledge-based systems. Protégé comprises both a framework and a tool kit, and supports practically every step in the development of such systems. First, it provides an ontology editor that aids the user in structuring the domain knowledge to be acquired, in the form of a domain ontology. It can then use this ontology to generate domain-specific knowledge acquisition tools for filling the structure with knowledge instances. It also supports the structuring of the knowledge needs of problem-solving methods into a method ontology. It also explicitly supports the use of declarative mappings to couple the knowledge base and method components into complete working systems. The concept of ontologies is central to all these structuring tasks.

3.1 The Central Role of Ontologies in Protégé

Protégé uses explicit ontologies to organize information into regular, well-structured hierarchies. There are three primary ontologies involved in any given application task: (1) the domain ontology, (2) the method ontology, and (3) the mapping ontology. The domain ontology describes the structure of knowledge about the domain.1 The domain ontology is designed with an awareness of the application task, and covers all necessary data for the problem-solving method, but it makes no promises to be in the structure exactly corresponding to that desired by the method. In other words, the sum of the explicit and implicit knowledge outlined in the domain ontology should be equivalent to, or a superset of, the input requirements of the method, but the form of that knowledge might need to be restructured and relabeled before the method can operate on it. The domain ontology has a structure and terminology designed to be meaningful to domain experts; for example, in a medical domain, it can refer to entities such as "drugs" and "symptoms." We use the domain ontology to drive the actual building of the knowledge base via an application-specific knowledge acquisition tool; thus, it explicitly describes the structure of the knowledge base instances.

The second Protégé ontology is the method ontology [Gennari, Grosso, and Musen, in press]. It specifies the structure and format of the informational needs (inputs) and solution instances (outputs) of the problem-solving method. The method ontology is domain-independent, and its elements are described in domain-neutral terms; for example, instead of "diseases," it might refer to "faults." As with the other Protégé ontologies, the method ontology is an explicit and functional representation, and its class definitions are used by the method when requesting instances from the domain knowledge side.

The third ontology is the mapping ontology [Gennari et al., 1994]. The mapping ontology describes the range of declarative mapping types that are supported by Protégé for bridging between the classes in the domain ontology and method ontology. Our motivations for organizing the instances of mappings into a formal structure are also covered in subsequent sections. Again, the ontology is concrete: The mapping ontology is used to guide the acquisition of instances of mappings, and by design, the ontology circumscribes the set of conversion types that are supported by the Protégé mapping engine.

3.2 Mapping Classes and Mapping Relations

Mapping relations in Protégé are formal, concrete entities. They are instances belonging to a prescribed set of types, and these types are meant to be common across all Protégé projects. The relationships among the mapping ontology's member classes, the mapping relations created from these classes, and the larger application context in general are diagrammed in Figure 2. At the top of the figure, we have a set of general classes of mappings, which are defined in the mapping ontology, and which cover the broad types of mapping functionalities that might be needed in any given component-based integration application. From this set, the application integrator picks the ones that are most appropriate for the conversions at hand. The integrator looks at each required input object type (i.e., class) in the problem-solving method's ontology, and matches that input class with the best semantically corresponding class in the domain ontology. The type of conversion we need for this pairing will drive the selection of the mapping type. The specific mapping is then instantiated as a mapping relation: The slots for the mapping type are filled in with the specifics of the required conversion. Thus, we have two levels of expansion: a fanning-out process. At the first level, the complete mapping description for a project comprises a collection of mapping relations, which are instances of the classes defined in the mapping ontology (each mapping class will likely give rise to many member instances in the mapping description). In turn, each mapping relation specifies the conversion from one or more classes in the domain ontology into a corresponding method class; thus, each mapping relation drives the conversion of a set of many instances in the domain knowledge base into instances for the problem-solving method.

Figure 2. The roles of the mapping ontology and mapping relations: this diagram shows the relationships among the mapping-related entities, domain objects, and method inputs. The mapping ontology defines the types of transformations that are possible under the system. The user selects from this ontology to define a set of mapping relations, which describe the specific mappings from domain classes to method classes. At run-time, the mapping interpreter, which implements the mapping types of the mapping ontology, scans for matching instances from the domain classes and creates corresponding method instances.

As an example of this process, we briefly describe our mapping of the elevator-configuration domain ontology to the propose-and-revise constraint satisfaction method [Rothenfluh et al., 1996]. The entities in the elevator-configuration domain (system components, such as doors, motors, and cables; and operational rules, such as operating limits on the components and permissible component combinations) must be mapped to the state-variables, constraint expressions, and other inputs expected by the propose-and-revise method. The system designer examines each given class of method inputs that propose-and-revise needs, determines which class (or classes) in the elevator domain corresponds to these method inputs, and then decides how to convert the slots in the latter into slots in the former. The designer then determines which of the prescribed set of mapping classes will suffice for this conversion. She might choose a simple renaming mapping for translating elevator component classes to propose-and-revise state variables, since the renamed slots' contents do not need any modification. On the other hand, she might need complicated expression manipulations for translating elevator component operating limits into constraint expressions. Once she has scoped the extent of the necessary mappings, she then explicitly specifies the conversion by instantiating the mapping class with the particulars of this class-to-class mapping. This design process is repeated for all the classes required by the method, and the corresponding classes in the domain.

At run-time, the mapping interpreter reads each mapping relation in and parses it, finds the domain class that it applies to (elevator doors, for example), and gathers all the instances of that domain class. Then, for each domain instance it has found (each elevator door type, to continue our example), it applies the mapping specifications in the mapping relation to create an appropriate method instance and fill its slot values. This process loops until all the mapping relations in the mapping declaration file are processed. At that point, the propose-and-revise method is passed a set of instances defined by the classes of its own ontology, which it can easily process.
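
The run-time process just described, with its two nested loops (over mapping relations, then over matching domain instances), might be sketched in Python as below. The class names, slot names, and the function run_mappings are hypothetical stand-ins; the actual elevator ontology and the Protégé interpreter are not reproduced here, and only renaming-style relations are handled.

```python
# Hypothetical mapping declarations: each relation renames slots of one
# domain class into slots of a method class.
mapping_relations = [
    {"domain_class": "elevator-door", "method_class": "state-variable",
     "renames": {"model": "name", "width": "value"}},
    {"domain_class": "elevator-motor", "method_class": "state-variable",
     "renames": {"model": "name", "horsepower": "value"}},
]

knowledge_base = [
    {"class": "elevator-door", "model": "door-A", "width": 36},
    {"class": "elevator-door", "model": "door-B", "width": 42},
    {"class": "elevator-motor", "model": "motor-10HP", "horsepower": 10},
]

def run_mappings(relations, instances):
    """Apply every mapping relation to every instance of its domain class."""
    method_instances = []
    for relation in relations:                 # first level: mapping relations
        matching = [i for i in instances
                    if i["class"] == relation["domain_class"]]
        for inst in matching:                  # second level: domain instances
            out = {"class": relation["method_class"]}
            for source_slot, target_slot in relation["renames"].items():
                out[target_slot] = inst[source_slot]
            method_instances.append(out)
    return method_instances

method_inputs = run_mappings(mapping_relations, knowledge_base)
```

Every resulting instance belongs to a class of the method's own ontology, so the method never sees domain terminology.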

Our decision to structure the mapping functionalities into a formal ontology of mappings was based on three reasons. First, it allows us to derive the general benefits of having formal ontological descriptions, including improved organization and comprehension. The placement of the various classes of mapping types in a structured context aids in the selection and application of the appropriate mappings. Second, the general mapping task is open-ended, and constraining the set of supported mapping types makes it tractable to build a generic mapping engine that supports all the possible mapping types. Third, the explicit mapping ontology allows us to reuse the Protégé tools to design and input mappings. The Protégé Ontology Editor, which is used to enter domain and method ontologies, also helps us design and enter the mapping ontology itself. Analogously, the Protégé Knowledge Acquisition Tool Builder, which is usually used to create domain-specific tools for acquiring domain knowledge bases, is used to build a custom interface for building and examining the declarative mapping "knowledge bases" for our various reuse projects, as described in Section 5.

4. A NEW DESIGN FOR THE MAPPING ONTOLOGY

From the beginning, Protégé focused on the advantages of using explicit, declarative mappings to connect knowledge-based components [Gennari et al., 1994]. The early results were an initial design for a mapping ontology, and a series of evaluation experiments of this design. As we gained more experience with using mapping relations, we evolved the design in an incremental, and somewhat ad hoc, fashion. This work culminated in a useful but limited first-generation design for the mapping ontology.

As the number of applications of the mapping ontology grew, we began to notice some deficiencies in the design. These problems were simply due to a lack of experience with mappings, and our tendency to modify the ontology in a demand-driven, somewhat ad hoc manner. Having accumulated an initial set of example mappings, we sought to resolve these problems with a principled, ground-up redesign of the ontology of mappings.

4.1 Six Desiderata for a Mapping Ontology

Our formal redesign of the ontology of mappings started with a declaration of some desiderata for our new system:

1. Expressiveness:
The set of mapping types should be fairly powerful, allowing almost arbitrary mappings if desired; in other words, while we are not obliged to optimize the mapping tool for every possible contingency, we should at least allow for many possibilities of the kinds of mappings users might want.

2. Ease of use:
The mapping ontology should have a broad range of classes, from simple to complex, to make the designing of mapping relations easy and straightforward, given the task at hand.

3. Clarity:
The created mapping relations should be easy to peruse and comprehend, and the designer's motivations should be readily apparent from the structure.

4. Parsimony:
The set of mapping relations needed for a given project should be minimized, and the number of mapping classes in our ontology should also be kept small.

5. Efficiency:
Mappings should be efficient to implement and execute.

6. Principled design/natural distinctions:
The mapping ontology should be based on careful analysis of theoretical and practical requirements of mapping tasks, and should embody any natural distinctions common to these tasks.

Following naturally from this last point, we will now present a different conceptualization of the mapping task that should shed some light on what an appropriate design should embody.

4.2 A Four-Dimensional Conceptualization of Mapping Properties

In formulating a new design for a mappings ontology, we start from the perspective of considering what the new mapping ontology should provide, instead of how it should be constructed. This leads us first to decompose the chore of mapping whole, complex objects into that of mapping a single datum at a time, which equates to a single slot value in an instance. From this, we can derive four relatively orthogonal dimensions for describing attributes of these mapping tasks: power/complexity, scope, dynamicity, and cardinality. We then enumerate the range of each of these dimensions (see Figure 3). This gives us the following properties:

· power/complexity:
This dimension deals with the spectrum of allowed expressive power and complexity of the transformation of the datum or data into a new mapped value. This ranges from the simplest null transformations (the renaming of a slot) all the way to arbitrary, functional transformations of multiple input slots into a single composite target slot. The principle here, and one of the recurring themes in our design, is that the mapping functionality should not be the limiting step in reuse, although it should reflect the cost of the complexity of the mapping.

· scope:
The second dimension describes the range of domain classes the mapping should apply to. We should be able to restrict the mapping to specific classes, or allow it to be inheritable to controllable depths.

· dynamicity:
This dimension controls when and how the mappings should be invoked: whether all the instances should be mapped prior to method invocation, or mapped on-demand at run-time.

· cardinality:
The last dimension specifies whether the mapping should be: a simple one-to-one mapping, converting single domain instances into corresponding method instances; or one-to-many, generating a set of related instances for each input instance; or even many-to-one, compositing objects from several instances across multiple classes, into a single method class instance.

Figure 3. The four-dimensional space of mapping features.
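
One way to picture the four dimensions is as independent fields of a small record. The Python sketch below is an assumption-laden simplification: the text treats each axis as a spectrum, whereas the sketch reduces power to its two endpoints and represents scope as an inheritance depth. The names MappingProfile, Power, Dynamicity, and Cardinality are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product

class Power(Enum):
    RENAMING = "renaming"        # simplest end of the spectrum
    FUNCTIONAL = "functional"    # arbitrary transformation of several slots

class Dynamicity(Enum):
    BEFORE_INVOCATION = "before-invocation"
    ON_DEMAND = "on-demand"

class Cardinality(Enum):
    ONE_TO_ONE = "one-to-one"
    ONE_TO_MANY = "one-to-many"
    MANY_TO_ONE = "many-to-one"

@dataclass(frozen=True)
class MappingProfile:
    power: Power
    inheritance_depth: int   # scope: 0 = named class only, n = n subclass levels
    dynamicity: Dynamicity
    cardinality: Cardinality

# Treating the axes as independent means any combination is a valid profile.
profiles = [
    MappingProfile(p, depth, d, c)
    for p, depth, d, c in product(Power, (0, 1), Dynamicity, Cardinality)
]
```

Here 2 x 2 x 2 x 3 = 24 distinct profiles result, each a point in the space of Figure 3.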

This analysis results in a four-dimensional, continuum-based conceptualization of what we are trying to accomplish. By organizing our properties as points along a spectrum, instead of as placements in some structural hierarchy, we gain a distinct perspective on the mapping process: one centered more on what mappings should accomplish than on how they should do it. From examining previous mapping experiments, the answer seems to be that any particular instance of a desired mapping can be a nearly-arbitrary point somewhere in this four-dimensional space, and that the four axes are mostly independent; for any given application, the set of mappings will range over this space. Thus, as well as being a useful conceptualization scheme, preserving the nature of these four axes would be useful for any actual reification of a design for mappings.

4.3 The Structure of the New Mapping Ontology

In designing the new mapping ontology, we started with the lessons learned from our earlier-generation mapping ontology. Then, structuring our design to cover all the axes of our multidimensional conceptualization, and guided by our six functional desiderata, we created a new ontology of mappings, shown diagrammatically in Figure 4. We will start our analysis of the new mapping ontology by studying the set of possible mappings for individual slot values. We will then see how these slot mappings are composited into mapping classes. Lastly, we will examine the ancillary features that implement properties such as slot composition and recursion.

Figure 4. The new mapping ontology.

4.3.1 The Spectrum of Slot Mappings

A mapping relation is composed of a set of component slot mappings, each of which specifies the mapping instructions for creating a single slot value in the method class instance. Since the prime component of any given mapping relation is the set of slot mapping specifications, understanding these slot mappings will give a solid picture of what a mapping does.

The set of slot mapping types covers a spectrum of mapping power and complexity (functionality that covers the other dimensions of our space of mapping features is addressed later). Thus, the mapping creator selects the appropriate slot mapping type on a per-slot basis. Only having to use an appropriately minimal level of complexity for any given slot mapping enhances both ease-of-use and clarity of the resulting mapping relations. The set of slot mapping types is:

· renaming-slot-map:
This slot mapping type is used where a slot simply needs to be copied from the domain class instance to the method class instance, with only a possible change in slot name.

· constant-slot-map:
This slot mapping type is used to make constant value assignments (usually default values) to the method class slots that don't have corresponding slots in the domain class instance, because that slot's concept is either foreign to the domain, or was omitted during domain knowledge acquisition.

· lexical-slot-map:
This slot mapping type allows method slots to be composed from lexical concatenations of multiple domain slots. It is most often used for minor syntactic variations on domain slot values, or for composing expressions from multiple fields.

· regular-expression-slot-map:
This is an enhanced version of the lexical-slot-map; regular-expression-slot-maps allow arbitrary regular expression editing of a composed field, which allows the contents of slot values to be modified, unlike lexical-slot-maps.

· numerical-expression-slot-map:
This slot mapping type allows arithmetic processing of one or more domain instance slots into a numerical method instance slot.

· functional-slot-map:
This type is the fallback mapping for arbitrarily complex transformations of slot values. It allows user-supplied functions to compose the method slot value from any subset of the domain class slots, using any transformational logic.

Using these six slot mapping types, the developer defines a class mapping into a method class one slot at a time. Examples of some of these slot-maps are shown in Figure 5.

Figure 5. The mapping relation "1" is composed of the three slot-mapping relations 2, 3, and 4, which are instances of a renaming slot map, a constant slot map, and a lexical slot map, respectively.
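
As a rough illustration of how the six slot mapping types differ in power, the following Python sketch models each type as a function from a domain instance to a single method slot value. It is not the Protégé implementation: the dictionary representation of instances, the "<slot>" template syntax, and every function, slot, and value name below are assumptions made for this example.

```python
import re
from typing import Any, Callable, Dict

Instance = Dict[str, Any]            # assumption: an instance is slot name -> value
SlotMap = Callable[[Instance], Any]  # assumption: a slot map yields one method slot value

def renaming_slot_map(domain_slot: str) -> SlotMap:
    """Copy one domain slot unchanged; only the slot name may differ."""
    return lambda inst: inst[domain_slot]

def constant_slot_map(value: Any) -> SlotMap:
    """Assign a fixed (typically default) value, ignoring the domain instance."""
    return lambda inst: value

def lexical_slot_map(template: str) -> SlotMap:
    """Concatenate domain slot values by filling <slot> placeholders in a template."""
    def apply(inst: Instance) -> str:
        return re.sub(r"<(\w+)>", lambda m: str(inst[m.group(1)]), template)
    return apply

def regular_expression_slot_map(template: str, pattern: str, replacement: str) -> SlotMap:
    """Compose a field lexically, then edit its contents with a regular expression."""
    compose = lexical_slot_map(template)
    return lambda inst: re.sub(pattern, replacement, compose(inst))

def numerical_expression_slot_map(expression: Callable[..., float], *domain_slots: str) -> SlotMap:
    """Compute a number from one or more domain slots."""
    return lambda inst: expression(*(float(inst[s]) for s in domain_slots))

def functional_slot_map(function: Callable[[Instance], Any]) -> SlotMap:
    """Fallback: an arbitrary user-supplied function over the whole domain instance."""
    return lambda inst: function(inst)

def apply_mapping(slot_maps: Dict[str, SlotMap], inst: Instance) -> Instance:
    """Build a method instance one slot at a time."""
    return {method_slot: slot_map(inst) for method_slot, slot_map in slot_maps.items()}

# Hypothetical domain instance and mapping relation.
domain_car = {"model": "Falcon", "year": 1998, "length_mm": 4500, "width_mm": 1800}
mapping_relation = {
    "name": renaming_slot_map("model"),
    "units": constant_slot_map("metric"),
    "label": lexical_slot_map("<model>-<year>"),
    "code": regular_expression_slot_map("<model> <year>", r"\s+", "_"),
    "footprint_m2": numerical_expression_slot_map(
        lambda length, width: length * width / 1e6, "length_mm", "width_mm"),
    "is_recent": functional_slot_map(lambda inst: inst["year"] >= 1995),
}
method_instance = apply_mapping(mapping_relation, domain_car)
```

In this sketch each slot of the method instance is produced by exactly one slot map, which mirrors the per-slot selection of mapping power described above.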

4.3.2 The Spectrum of Classes of Mappings

The set of slot mapping types described in the preceding section is integrated into four mapping classes (see Figure 4). Mapping classes are instantiated into mapping relations, which declare how a set of instances in a class is to be mapped from the domain's structure to the structure required by the problem-solving method. The choice of mapping class restricts the kinds of slot mappings that the resulting mapping relation may contain. When the application mapping designer designs a mapping, she is specifying the transformation of an entire class (or classes) from the domain ontology into a class in the method ontology; at run-time, the mapping interpreter uses this specification (the mapping relation) to drive the conversion of a set of instances from and to the respective classes.

For a given class-to-class mapping, the mapping designer must first select one of the four classes of mapping types (renaming, direct, lexical, and transformational), which cover a spectrum of mapping capabilities of increasing complexity. There is a clear parallelism between the spectrum of slot mappings and the spectrum of mapping classes. Each of the mapping classes is described below:

· renaming-mapping:
This class can only contain renaming-slot-maps; thus, it allows only the simplest of domain-class-to-method-class mappings, where all corresponding slots need simply to be renamed.

· direct-mapping:
In addition to simple slot renamings, as per the renaming-mapping, the direct-mapping class adds constant-slot-maps for constant value assignments. This is still a relatively simple type of mapping.

· lexical-mapping:
In addition to slot renamings and constant values as in the direct-mapping, this class adds the ability to compose lexical concatenations of multiple slots. This enables us to compose some interesting string expressions for input to the method.

· transform-mapping:
This class allows all possible slot mapping types; in addition to renamings, constants, and lexical concatenations, it adds regular expression, numerical, and functional transformations of agglomerations of multiple slots. As the name implies, transform-mappings include slot-mappings that modify the contents of slot values as they are copied or used in a composition.

The higher-complexity mapping classes are implemented by adding slot mapping types to the simpler classes, creating a strictly subsuming hierarchy. This strict ranking means that more complex mapping classes can accomplish everything that the relatively simpler mapping classes can. The motivation for using the simpler mapping types is that the choice of mapping class then conveys information about the level of expressive power needed for the mapping, and therefore gives cues about the level of mismatch between the domain and method classes being paired. For example, the choice of a direct-mapping class for a given mapping implies that only slot renamings and constant value assignments are necessary, and indicates that this is a relatively straightforward mapping between similar classes. Carefully selecting the appropriate level of mapping complexity enhances the understandability and reusability of the mappings at a high level, without requiring the user to perform a detailed analysis of the specifics of the mapping relations.
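
The subsuming hierarchy can be pictured as nested sets of permitted slot mapping types. The Python sketch below is only an illustration under that reading; the set representation and the helper name simplest_mapping_class are assumptions, not part of the mapping ontology itself.

```python
from typing import Iterable, List, Set, Tuple

# Each mapping class permits everything the previous class permits, plus more.
RENAMING: Set[str] = {"renaming-slot-map"}
DIRECT: Set[str] = RENAMING | {"constant-slot-map"}
LEXICAL: Set[str] = DIRECT | {"lexical-slot-map"}
TRANSFORM: Set[str] = LEXICAL | {
    "regular-expression-slot-map",
    "numerical-expression-slot-map",
    "functional-slot-map",
}

# Ordered from least to most expressive.
MAPPING_CLASSES: List[Tuple[str, Set[str]]] = [
    ("renaming-mapping", RENAMING),
    ("direct-mapping", DIRECT),
    ("lexical-mapping", LEXICAL),
    ("transform-mapping", TRANSFORM),
]

def simplest_mapping_class(slot_map_types: Iterable[str]) -> str:
    """Return the least expressive mapping class that permits all given slot map types."""
    needed = set(slot_map_types)
    for name, allowed in MAPPING_CLASSES:
        if needed <= allowed:
            return name
    raise ValueError(f"unknown slot map types: {sorted(needed - TRANSFORM)}")

chosen = simplest_mapping_class(["renaming-slot-map", "constant-slot-map"])
```

Selecting the first class whose permitted set covers the slot maps in use is one way to make the chosen class signal the degree of mismatch between the paired classes.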

4.3.3 Cardinality, Substructures and Composite Instances

In addition to the structured spectra of complexity in the slot and class mappings, the new mapping ontology defines a simpler and much more intuitive way to specify the other functional dimensions outlined in Section 4.2 (and shown in Figure 3).

We will first address the issue of cardinality. During the mapping task, it is sometimes necessary to compose multiple domain instances into a single method instance. Conversely, it is also sometimes necessary to create multiple method instances from a single domain instance. The most common causes of either of these scenarios are domain or method ontologies that contain nested, recursive definitions, in the form of slots that are themselves complete instances of another domain class. This kind of structured decomposition is often used in ontologies to convey information about common substructurings in classes. For example, a slot called "position" might be common to several classes in the domain, and the slot would probably be implemented as a nested substructure instance with x, y, and z slots. These kinds of constructs, which we refer to as substructurings, are now handled by a generalized form for slot specifications. Using a path-like syntax, substructures on the domain side are now handled in-line to arbitrary depths. For example, the syntax "<car.position.x>" would imply that the class "car" has a slot "position", which is actually an embedded instance with its own slot "x". On the method side, the mapping engine supports semi-automated creation of sub-instances to allow complex method class structures. An example would be a reference to the target method slot "vehicle.description.color", which would first create a nested instance of the correct type for the "description" slot, insert it into the parent structure, and then map the "color" slot value.
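
The two directions of path handling can be sketched in Python as follows. This is a simplified illustration, not the mapping engine: instances are assumed to be nested dictionaries keyed first by class name, the helper names are hypothetical, and the creation of a sub-instance "of the correct type" is reduced to creating an empty dictionary.

```python
from typing import Any, Dict

def read_domain_path(instances_by_class: Dict[str, Any], path: str) -> Any:
    """Follow a path such as "<car.position.x>" through embedded instances."""
    class_name, *slots = path.strip("<>").split(".")
    value = instances_by_class[class_name]
    for slot in slots:
        value = value[slot]          # descend into the embedded instance
    return value

def write_method_path(target: Dict[str, Any], path: str, value: Any) -> None:
    """Set a slot such as "vehicle.description.color", creating sub-instances as needed."""
    class_name, *slots = path.split(".")
    node = target.setdefault(class_name, {})
    for slot in slots[:-1]:
        node = node.setdefault(slot, {})   # create the nested instance if it is missing
    node[slots[-1]] = value

# Hypothetical domain data following the "car" example in the text.
domain = {"car": {"position": {"x": 1.5, "y": 2.0, "z": 0.0}, "paint": "red"}}
method: Dict[str, Any] = {}

x_value = read_domain_path(domain, "<car.position.x>")
write_method_path(method, "vehicle.description.color",
                  read_domain_path(domain, "<car.paint>"))
```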

Another common case of higher-order cardinality occurs when the domain ontology is structured hierarchically, partly into classes whose instances are individual domain elements, and then into classes whose instances encapsulate descriptions about sets of other domain instances. For example, we might have an ontology of vehicles, where the class cars has an instance for each car model, and the class body-types has instances covering whole subclasses of cars (e.g., one instance describing minivans, another pickups). We would run into problems if the method expects all of the information for any vehicle to come in as slots in the individual car instances. However, the member instances are not likely to have pointers to the group instances they are conceptually inheriting from, so our substructuring scheme will not work here.

The solution is to composite the individual instances from our two classes into meta-class instances by using key slots to correlate related instances. In essence, we are creating virtual instances for the purposes of the method.
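
One way to picture this compositing is as a join on key slots. The Python sketch below is an assumption-laden illustration only: the dictionary representation, the slot names, and the rule that member slots take precedence over group slots are all choices made for this example.

```python
from typing import Any, Dict, List

Instance = Dict[str, Any]

def composite_by_key(members: List[Instance], groups: List[Instance],
                     member_key: str, group_key: str) -> List[Instance]:
    """Build virtual instances by correlating member and group instances on key slots."""
    groups_by_key = {group[group_key]: group for group in groups}
    composites = []
    for member in members:
        group = groups_by_key.get(member[member_key])
        if group is None:
            continue                      # no related group instance; skip this member
        # Member slots override group slots with the same name (an assumption).
        composites.append({**group, **member})
    return composites

# Hypothetical instances of the classes "cars" and "body-types".
cars = [
    {"model": "model-A", "body": "minivan"},
    {"model": "model-B", "body": "pickup"},
]
body_types = [
    {"type-name": "minivan", "seats": 7},
    {"type-name": "pickup", "seats": 3},
]

virtual_cars = composite_by_key(cars, body_types, member_key="body", group_key="type-name")
```

Each resulting virtual instance carries both the individual car's slots and the slots of the body type it conceptually inherits from, which is the form the method is assumed to expect.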

The handling of meta-class instances (i.e., composite domain instances formed from combinations of instances from multiple domain classes) is partially supported in the mapping ontology by allowing multiple supporting domain classes in a single class mapping, and via an extension to the syntax of slot mappings and conditioning expressions. However, the current implementation does not yet support the necessary enumeration over combinations of multiple independent instances; we are currently working on ideas for controlling the combinatorics of the problem.

4.3.4 Scope, Inheritance and Recursion

Our new design supports the scope dimension through explicit control of mapping inheritance: in addition to acting on its targeted domain class, a mapping relation can be optionally extended to apply itself to the specified domain's immediate child classes, or even to all descendant classes. This has proven quite useful for defining a single mapping that applies to many related domain classes, in the case where the distinctions among the domain subclasses are not of importance to the problem-solving method.
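
The three scope settings can be sketched as a small function over a class hierarchy. The scope labels and the child-list representation of the hierarchy are assumptions made for this Python example, not names from the mapping ontology.

```python
from typing import Dict, List, Set

def classes_in_scope(target: str, children: Dict[str, List[str]], scope: str) -> Set[str]:
    """Return the domain classes that a mapping relation applies to."""
    if scope == "class-only":
        return {target}
    if scope == "children":
        return {target, *children.get(target, [])}
    if scope == "descendants":
        found = {target}
        stack = [target]
        while stack:                      # depth-first walk of the subclass tree
            for child in children.get(stack.pop(), []):
                if child not in found:
                    found.add(child)
                    stack.append(child)
        return found
    raise ValueError(f"unknown scope: {scope}")

# Hypothetical domain class hierarchy.
hierarchy = {"vehicle": ["car", "truck"], "car": ["minivan", "sedan"]}

only = classes_in_scope("vehicle", hierarchy, "class-only")
kids = classes_in_scope("vehicle", hierarchy, "children")
everything = classes_in_scope("vehicle", hierarchy, "descendants")
```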

The explicit recursion control allows specification of how mappings are to be applied to substructures that are copied over in slot mappings. This allows mappings to be designed for conversions of substructures (instances embedded in slots), which would not be triggered to create undesired independent top-level instances.

4.3.5 Dynamicity

Dynamicity refers to the timing and control of the mapping process itself. When we refer to static mappings, we are talking about mappings that are preprocessed by the Mapping Interpreter before the method begins execution; the parsing and method-instance creation are all done in advance. In contrast, dynamic mappings would be those executed incrementally on demand in the course of the execution of the problem-solving method. This would be an important issue if space or computational load issues were a factor, or for numerous other reasons. Currently, this area is still under development. At this point, we are planning to provide four modes: full static mapping, dynamic mapping by domain or method class (in which the method calls the mapping interpreter with a domain class to map from, or a method class to map to), dynamic mapping by domain instance (where all mappings that apply to that instance's class are applied), and dynamic mapping by mapping relation (where the given mapping instance is applied to all matching domain instances).
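
The difference between static and dynamic mapping can be illustrated with a toy interpreter. Since this area is described as still under development, the Python sketch below is purely hypothetical: the class, its methods, and the caching behavior are assumptions, and only two of the four planned modes (full static mapping and dynamic mapping by domain instance) are approximated.

```python
from typing import Any, Callable, Dict

Instance = Dict[str, Any]

class ToyMappingInterpreter:
    """Converts domain instances either all in advance or one at a time on demand."""

    def __init__(self, mapping_relation: Callable[[Instance], Instance],
                 domain_instances: Dict[str, Instance]) -> None:
        self.mapping_relation = mapping_relation
        self.domain_instances = domain_instances
        self._cache: Dict[str, Instance] = {}
        self.conversions = 0              # counts how many instances were actually mapped

    def _convert(self, key: str) -> Instance:
        if key not in self._cache:
            self._cache[key] = self.mapping_relation(self.domain_instances[key])
            self.conversions += 1
        return self._cache[key]

    def map_static(self) -> Dict[str, Instance]:
        """Static mode: map every domain instance before the method runs."""
        return {key: self._convert(key) for key in self.domain_instances}

    def map_instance(self, key: str) -> Instance:
        """Dynamic mode: map a single domain instance when the method asks for it."""
        return self._convert(key)

# Hypothetical domain instances and mapping relation.
domain = {"e1": {"speed": 2}, "e2": {"speed": 3}}
relation = lambda inst: {"velocity": inst["speed"]}

lazy = ToyMappingInterpreter(relation, domain)
one = lazy.map_instance("e2")             # only one conversion is performed

eager = ToyMappingInterpreter(relation, domain)
everything = eager.map_static()           # all conversions are performed up front
```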

4.4 Mapping Patterns

One additional mapping functionality that we have experimented with, not covered by the four dimensions of mapping features, is something we term mapping patterns. Our motivation for creating mapping patterns is that, even with the new mapping ontology, there would often be much unnecessary repetitive specification across a set of mappings that belonged to some kind of common conceptual group; in other words, a set of very similar mapping relations was being applied to similar domain-to-method class conversions with only minor variations (more in specification details than in intent). However, the necessary mapping specifications were syntactically just divergent enough to require discrete mapping relations. In our design, mapping patterns are user-defined meta-mapping relations. These create new meta-mapping classes, which are instantiated via a new Protégé knowledge-acquisition tool instance. These instances, combined with the meta-mapping class description, are used to automatically create the actual mapping relations in a boiler-plate fashion. The resulting final instances are members of the standard mapping ontology classes, and are therefore fully understood and executable by the mapping interpreter.

The output of this multi-step process is a set of mapping relations that are identical to those we had originally manually crafted, so the mapping interpreter will function just as before, and the problem-solving method will be given the same inputs. However, we are now able to describe the mappings in a much sparer fashion than before, in a form that distills out the salient features that are unique across the members of the set.

Mapping patterns can be considered to be a form of knowledge-based macro facility for mappings. As such, they relieve the user of the repetitive mechanics of generating multiple, similarly configured mapping relations. The automated expansion also enhances reliability of the mapping relations, by eliminating the replication errors that are likely with repetitive data entry. Just as important, the new pattern-based instances further reduce the verbosity of the mappings, which enhances perspicuity, one of our desiderata. Also, by abstracting the mappings into a small set of pattern descriptions, and a complementary set of invocations that distill out the unique elements of the mappings, mapping patterns allow us to convey the intent of the mappings, free from much of the mechanics.
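
The macro-like expansion can be sketched in Python as template instantiation. This is an illustration of the general idea only: the pattern and invocation formats, the brace placeholders, and all class and slot names are assumptions, and the real system performs the expansion through Protégé knowledge-acquisition tools rather than string formatting.

```python
from typing import Any, Dict, List

def expand_pattern(pattern: Dict[str, Any],
                   invocations: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Expand one pattern plus several terse invocations into full mapping relations."""
    relations = []
    for invocation in invocations:
        relations.append({
            "mapping-class": pattern["mapping-class"],
            "domain-class": invocation["domain_class"],
            "method-class": pattern["method-class"],
            # Fill the pattern's placeholders with the invocation's unique details.
            "slot-maps": {
                method_slot: template.format(**invocation)
                for method_slot, template in pattern["slot-maps"].items()
            },
        })
    return relations

# Hypothetical pattern: the shared shape of a family of constraint mappings.
upper_bound_pattern = {
    "mapping-class": "lexical-mapping",
    "method-class": "constraint",
    "slot-maps": {
        "name": "<{domain_class}.name>",
        "expression": "<{domain_class}.{bound_slot}> <= {limit}",
    },
}

# Hypothetical invocations: only what differs between the mapping relations.
invocations = [
    {"domain_class": "motor", "bound_slot": "horsepower", "limit": 40},
    {"domain_class": "cable", "bound_slot": "load", "limit": 1200},
]

mapping_relations = expand_pattern(upper_bound_pattern, invocations)
```

The expanded relations are ordinary mapping relations, so in this sketch, as in the design described above, nothing downstream needs to know that a pattern produced them.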

5. RESULTS

Our new ontology of mappings was used to re-engineer three systems that we had developed in prior Protégé mapping research. The fact that these component-based applications had also originally been composed and mapped under the old mapping ontology gives us an effective basis for comparing and evaluating our new design.

The first of the three projects is a re-creation of the Protégé implementation of the Sisyphus-II project [Rothenfluh et al., 1996], and involves the composition of a knowledge base on elevator configurations with a generic implementation of the propose-and-revise problem-solving method. We map instances of elevator components, specifications, and performance bounds, and component upgrade rules to the propose-and-revise inputs of state variables, constraint specifications, and violation fixes. The original set of mapping relations was large and complex, and therefore rather opaque to casual examination. Under the new mapping ontology, we are able to significantly reduce the size of the total mapping relation set. A promising result is that the new mapping relations are not only more terse, but surprisingly, also more understandable; in other circumstances, these properties (parsimony and perspicuity) would normally be traded off against each other. An example of a mapping relation from this application is shown in Figure 6. One of the significant improvements over the original mappings stems from the decoupling of the complexity level of individual slot mappings from the chosen level of the class mapping, via the new per-slot specification of mapping functionality. This finding is echoed in the other experiments we describe here. The new generalized syntax for controlling the cardinality by accessing target (method) substructure slots directly is also very useful. Lastly, for this project, the use of mapping patterns is especially helpful, allowing us to reduce the size of the mapping relation set ultimately to one-quarter the original size.

Figure 6. Sample mapping relation from the elevator configuration/propose-and-revise system.

The second project in which the new mapping ontology was used is the reuse of propose-and-revise in a completely different application domain: ribosomal conformation prediction [Gennari, Cheng, Altman and Musen, (in press)]. The task goal is to posit plausible three-dimensional conformations for the set of ribosomal subunits. Again, we map domain class instances like protein clusters and RNA helices, along with geometric positioning constraints and biological evidence, to propose-and-revise's state variables and constraints. Some examples of mappings from this application are shown in Figure 7. In addition to the new spectrum-based range of class and slot mappings, notice the in-line specification of domain class substructure slot accesses in the domain-slot specification for instance 9. Another interesting part of the mapping task is the mapping of an array of possible ribosomal unit positions in the domain ontology into an incremental constraint fix expression in the method ontology. We did this by synthesizing an index variable in a mapping relation to iterate over the array, assigning the next available position as a fix.

This project also benefits from the explicit control of recursion, since substructure instances must be left intact. The new mapping ontology allows us to achieve a two-fold reduction in the size of the mapping relation set here, while again increasing the understandability of the intention of the mappings.

Figure 7. Example of new mappings for Ribosome/propose-and-revise.

The third project applies a different problem-solving method, the Protean n-way yoking method [Altman et al., 1994], to the ribosomal structure domain. The results are very similar to those of the second project. Since Protean is a constraint satisfaction problem-solver, the ontology for this method is very similar to that for the propose-and-revise method. Therefore, we were able to reuse much of the experience from the ribosome/propose-and-revise mapping to help design the mappings for this application, since the mappings are very similar in intent and execution. The original Ribosome/propose-and-revise experiment is already a reuse project, and here we are conceptually reusing the artifacts of reuse (the mapping relations), so this experiment is as much a lesson in reusing mappings (meta-reuse, in a sense) as in creating mappings.

6. DISCUSSION

In this research, we have investigated the nature of mappings between knowledge-based components, and designed a new ontology of mapping classes to facilitate creating mappings between ontologies for reuse. Our mapping ontology was designed using a spectrum-based approach, which is reflected in both the range of mapping classes and the slot-mapping types from which the classes are composed. The mappings permit a wide range of transformations, from simple slot renamings and constant value assignments, to arbitrary functional transformations. The new ontology also supports explicit control of cardinality, run-time context, recursion, inheritance, and conditional interpretation. It also supports mapping patterns, a form of knowledge-based macro operators that further enhances both the parsimony and the perspicuity of the mapping declarations. We will now critique the design from the standpoint of our initial design goals.

6.1 The New Mapping Ontology and the Six Desiderata

In Section 4.1, we enumerated six guiding desiderata for our new design. These desiderata serve as good criteria by which to evaluate our work. We will now examine whether we have satisfied these six criteria.

1. Expressivity:
The full range of slot mapping types allows specification of everything from simple slot renamings all the way up to arbitrary functional manipulations of multiple source slot values. Additionally, the many-to-one and one-to-many substructure specifications allow full control of recursion. Also, inheritance is controllable.

2. Ease of Use:
The spectrum of mapping classes and slot mapping types allows intuitive selection of the proper level of mapping complexity both on a per-class and per-slot basis.

3. Clarity:
At the class mapping level, the mapping relations convey design intent through the mapping type selected for each class. Furthermore, within a given class mapping relation, the individual slot map types that were used further illuminate the purpose.

4. Parsimony of mappings:
The new mapping ontology is relatively simple: there are only four classes of mappings, and six basic slot mapping types. Nevertheless, the power of these mapping classes allows simple and direct specifications of the data conversion being defined. Also, allowing mixtures of slot mapping types in a single class map minimizes the required complexity for the overall mapping relation. Inheritance allows single mappings to apply to entire families of domain classes. Lastly, mapping patterns can significantly further reduce the size of the mapping descriptions.

5. Efficiency:
The new orthogonal, spectrum-based design allows simpler mapping interpreter design, implementation, and execution, by decoupling the various functionalities into independent attributes.

6. Principled design:
The new mapping classes were conceived in a componential, task-independent manner, instead of an ad hoc, case-driven fashion. Also, a multidimensional, spectrum approach was applied to formulating the relevant mapping attributes.

6.2 Future Work

We have only begun to explore the universe of possible mapping relations, and have many promising, parallel directions to explore. Locally, we will further explore the dimensions of recursion, inheritance, and dynamicity in our current design. More globally, we need to flesh out other aspects of the ontology as it stands. For example, we mentioned in Section 4.3.3 that we had only partially designed the support for composite instances; we need to develop this further.

Our larger research goal is to ascertain the scope of our proposed mapping design, to circumscribe the class of tasks for which it is sufficient, and to determine what its failings are for other classes of mapping tasks. The most critical task ahead of us is to find a way to evaluate the ideas in a disciplined manner, and to further the theoretical bases for this research.

At a more abstract level, we will be examining the relation of dynamic mappings to the more procedurally defined concepts of database and knowledge base mediation. Also, we would like to explore the possibility of semi-automated mappings: some means of using syntactic and semantic analysis to partially automate the process of creating mappings. We would like to see how far it is possible to automate the process of mapping knowledge bases to method inputs.

Lastly, we are very interested in the concept of "reuse of reuse": learning and adapting from other, related reuse cases. What are the dimensions of this meta-reuse (i.e., can prior reuse-through-mapping tasks inform subsequent tasks)? Mapping patterns may be important here.

7. CONCLUSIONS

We conclude that component-based reuse is a viable concept. Here, and in much previously published work on Protégé (e.g., [Gennari, Altman, and Musen, 1995]), we have demonstrated many examples of reuse, covering the gamut of domains from elevator configuration to ribosomal conformation prediction to medical diagnosis.

The next conclusion, the main one from this paper, is that declarative mappings provide a promising way to achieve component-based reuse. By analyzing our mapping needs in a more global, task-independent manner, we were able to create a set of mapping types and mapping attributes that generated mapping descriptions which met all of our design desiderata, achieving both greater expressive power and simultaneously greater simplicity and understandability.

The last conclusion concerns meta-reuse, in this case, conceptually reusing mapping relations from previous projects. When reusing components, whether they are pre-existing domain knowledge bases or legacy problem-solving methods, a good start for the task of defining the mappings is to examine prior mappings involving either of the components. In fact, one might start with the original mapping relations as a basis for evolving a new mapping project. Thus, the clarity of intent and implementation of the mapping relations is important. Our claim is that our new ontology of mappings enhances the prospects for this level of meta-reuse.

8. REFERENCES

Altman, R.B., Weiser, B., and Noller, H.F. (1994). Constraint satisfaction techniques for modeling large complexes: Application to central domain of the 16S ribosomal subunit. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 10-18, Stanford, CA.

Breuker, J.A. and van de Velde, W. (Eds.) (1994). The CommonKADS Library for Expertise Modelling. Amsterdam: IOS Press.

Chandrasekaran, B. (1986). Generic tasks in knowledge-based reasoning: high-level building blocks for expert system design. IEEE Expert, 1(3).

Eriksson, H., Shahar, Y., Tu, S.W., Puerta, A.R., and Musen, M.A. (1995). Task modeling with reusable problem-solving methods. Artificial Intelligence, 79(2), 293-326.

Fensel, D. (1994). A comparison of languages which operationalize and formalize KADS models of expertise. The Knowledge Engineering Review, 9(2), 105-146.

Gennari, J.H., Altman, R.B., and Musen, M.A. (1995). Reuse with Protégé-II: From elevators to ribosomes. Proceedings of the ACM-SigSoft 1995 Symposium on Software Reusability, pp. 72-80, Seattle, WA.

Gennari, J.H., Cheng, H., Altman, R.B., and Musen, M.A. (in press). Reuse, CORBA, and knowledge-based systems. International Journal of Human-Computer Studies.

Gennari, J.H., Grosso, W., and Musen, M.A. (in press). A method-description language: An initial ontology with examples. 11th Workshop on Knowledge Acquisition, Modeling, and Management 1998.

Gennari, J.H., Tu, S.W., Rothenfluh, T.E., and Musen, M.A. (1994). Mapping Domains to Methods in Support of Reuse. International Journal of Human-Computer Studies, 41, 399-424.

Gil, Y., and Melz, E. (1996). Explicit representations of problem-solving strategies to support knowledge acquisition. Proceedings of the Thirteenth National Conference on Artificial Intelligence, pp. 469-476, Menlo Park, CA: AAAI Press/MIT Press.

Grosso, W., Gennari, J.H., Fergerson, R., and Musen, M.A. (in press). When knowledge models collide. 11th Workshop on Knowledge Acquisition, Modeling, and Management 1998.

McDermott, J. (1988). Preliminary steps toward a taxonomy of problem-solving methods. In S. Marcus (Ed.), Automating Knowledge Acquisition for Expert Systems, Boston, MA: Kluwer Academic.

Musen, M.A., Gennari, J.H., Eriksson, H., Tu, S.W., and Puerta, A.R. (1995). Protégé-II: Computer support for development of intelligent systems from libraries of components. Proceedings of Medinfo '95, pp. 766-770, Vancouver, BC.

Rothenfluh, T.E., Gennari, J.H., Eriksson, H., Puerta, A.R., Tu, S.W., and Musen, M.A. (1996). Reusable ontologies, knowledge-acquisition tools, and performance systems: PROTÉGÉ-II solutions to Sisyphus-2. International Journal of Human-Computer Studies, 44(3-4), 303-332.

Wiederhold, G. (1992). Mediators in the architecture of future information systems. Computer, 25(3), 38-49.

1
In previous papers on Protégé [Gennari et al., 1994], we made a distinction between a domain ontology and an associated application ontology. In the descriptions in those papers, the domain ontology was method-independent, and described abstract, general knowledge. It was thus free of restraints regarding how the task was to be accomplished. However, in our experience, the original Protégé domain ontology turned out to be mostly a mental structuring construct; we never instantiated one as a concrete entity in any project, and in every case, we skipped straight to building application ontologies. Also, what we were calling application ontologies corresponded better to what other researchers in the knowledge acquisition community call the domain ontology. Therefore, in this paper, we exercise a terminology shift, and use the term domain ontology to denote what we previously called an application ontology: a method-relevant version of the pure domain ontology.