Seamless Development of Structured Knowledge Bases

Päivikki Parpola
Helsinki University of Technology
Department of Computer Science and Engineering
Hauenkalliontie 2 B 54, FIN-02170 Espoo, Finland
E-mail: pparpola@cc.hut.fi

Abstract:
Models corresponding to different stages of knowledge-base development are integrated using seamless transformations. Knowledge is initially elicited through dependency graphs (DG), which can be seen as a form of conceptual graphs. This enables combination of knowledge from different sources at an early stage of development. DGs, together with associated descriptions, can be transformed (seamlessly) into an initial inference structure with associated verbal descriptions. Its components refer to components of the domain model, constructed simultaneously with the DGs. Other seamless transformations are based on sharing the inference structure between analysis, design and implementation descriptions. The structured models can be implemented using object-oriented programming.

1. INTRODUCTION

1.1 The gap between phases of KA

Knowledge acquisition (KA), i.e. development and maintenance of knowledge bases (KB), can be divided into several phases, performed sequentially and iteratively. The most commonly recognized phases are requirements definition, analysis, design, and implementation. The disintegration, or gap, between phases has been recognized during early stages of KA [Marcus, 1988a; Motta, Rajan and Eisenstadt, 1988].

The problem has been overcome in narrow-focused automated tools, e.g. S-SALT [Leo, Sleeman and Tsinakos, 1994] and PROTÉGÉ-II [Eriksson et al., 1995]. In problems that cannot use scope-restricting heuristics, results achieved in an earlier phase of KA often cannot be fully utilized, and part of the work has to be duplicated. In general cases, the gap problem has been attacked for example through maintenance of the analysis model structure (recommended in structured KA methodologies like CommonKADS [Schreiber et al., 1994; de Hoog et al., 1994]). However, this maintenance is manual.

1.2 Integration in SE: OO models and seamless transformations

Object-oriented software engineering (OOSE) [Jacobson et al., 1992] views software engineering (SE) as an industrial process, and stresses the advantages of using objects in it. OOSE acknowledges that, in almost all systems, requirements change unpredictably, and the systems have to be designed for incremental development over their life-cycle.

Analysis and Construction process in OOSE
Figure 1.1. The analysis and construction parts of the OOSE process [Jacobson et al., 1992].

OOSE, partially illustrated in figure 1.1, consists of the iterative processes of analysis, construction, and testing. Requirements are used as input. Object-oriented (OO) models are created or modified during each phase. The requirements model consists of three parts, a domain model, a use-case model and an interface description. The analysis model is based primarily on the use case model, not the domain model, as there is experience indicating that the logical structure is more stable than the domain structure. By focusing on the more important aspects at an early stage, a base is laid for a maintainable system structure.

The aim is to keep the logical structure of the analysis model in the final system. An object identified during analysis must be found again in the code, so that the system is easy to understand and remains robust when modified. Here, the built-in traceability of object-orientation (OO) is an important characteristic. Different parts of the system can simultaneously be at different stages of development. Some parts may be reused, possibly with slight modifications.

The transitions between models are seamless - we are ideally able to tell in a foreseeable way how to get from objects in one model to objects in another model. This is absolutely crucial for an industrial development process, as the result must be repeatable. Rules for transformations are defined separately between each two models. To be able to maintain the system it is also necessary to have traceability between the models. This will come as a side-effect of the seamless nature of model transformations.

1.3 A Proposal for bridging phases of KA

A major factor causing the gap is apparently the difference between the observation-oriented nature of gathering knowledge, and the operationality-oriented nature of producing a KB. Different approaches often lead to different representations, so that the analysis models normally cannot be directly utilized in developing the executable formalisms. For bridging the gap, it is proposed that representations of different phases be integrated. In addition, representation of knowledge is structured. To strengthen the effects of uniform representation and suitable structuring, seamless transformations between models corresponding to different phases are defined.

1.4 Contents of this paper

The paper is structured as follows: Section 2 explains the three proposals for producing more integration in the development of KBs: unified representation, suitable representation structure, and seamless transformations. Section 3 describes seamless development of a KB, presenting a technique that is described step by step. Section 4 gives an example of using seamless development. Section 5 contains discussion and conclusions.

2. AVOIDING DISINTEGRATION

2.1 Uniform representation

2.1.1 Formalisms considered

As mentioned in the introduction, one apparent cause of disintegration is the use of different representation formalisms during different phases of development. Two representations are proposed for use in integration:

Conceptual graphs (CG) [Sowa, 1984] is a logic-based formalism, in which graphs consist of concepts and conceptual relations.
The OO paradigm distributes representation over a number of active entities, called objects, defined to represent abstract or concrete concepts. The essential components of OO have been defined slightly differently by different frameworks, e.g. OOD [Booch, 1991] and OMT [Rumbaugh et al., 1991], and different programming languages, e.g. Common Lisp Object System (CLOS) [Steele, 1990] and Smalltalk [Goldberg, 1984]. The features focused on can equally well be abstraction, encapsulation, modularity and hierarchy as classes, methods, multiple dynamic inheritance, and an extensive meta-object protocol.

aggregation	An attribute that is itself an object
association	A relation between object classes. For each class involved, it can be defined, whether the association applies to one single instance or an arbitrary number of instances of the class.
attribute	Description of a feature of the corresponding concept.
(object) class	A counterpart for a concrete or abstract real-world concept, or a generalization of other concepts. An object class is a template for all its instances.
inheritance, subclass, superclass	Inheritance is a mechanism allowing object classes to form a hierarchical structure, called an inheritance hierarchy or a class hierarchy. Parent nodes in this hierarchy are called superclasses, whereas child nodes are called subclasses. In this structure, subclasses by default inherit properties (attributes, methods) from their superclasses.
(object) instance	An individual realization of an object class. Attributes of the class are assigned values in an instance.
meta-class	A class, the instances of which are themselves classes.
method	Definition of a class-specific behaviour as a reaction to calling a method of an object instance with certain attributes.
object	Normally denotes an object instance. However, the term can be used in a more abstract sense, leaving unspecified whether an object class or an object instance is used. The term can also refer to the entire OO paradigm.

2.1.2. Conceptual graphs compared to OO networks

Features and capabilities of CGs and OO will now be compared in order to examine possibilities of implementing the former with the latter:

Origin and capabilities. CGs have a theoretical origin, providing direct translation to first-order logic. An entire theory of logic properties and manipulation of CGs exists, providing possibility for proofs. The OO approach is more practical. It provides a straightforward translation from development models to implementation of computer systems - through OO analysis, design and programming (OOA, OOD and OOP). Different (logical) properties may be defined through (meta) class methods.
Components. CGs, like most OO networks, consist of concepts and (conceptual) relations, defined between one or more concepts. Concepts and conceptual relations of CGs could be implemented as object classes.
Instances. CGs define instances of each concrete concept c as entities recognized by its image, i.e. real-world correspondent. Abstract concepts have no image, and thus no instances. All object classes may have instances, but only instances of concrete concept classes have correspondents in the real world. In implementing CGs, method 'instances' could be defined for each object class representing a concept.
Hierarchies. In CGs, types of concept and relation labels form two type hierarchies (that are also lattices), defining subtypes and supertypes. Also CGs themselves form hierarchies, defined through generalisation and specialisation (modification and combination) operations. The OO paradigm provides a possibility for defining inheritance hierarchies for object classes. Subtypes and supertypes are defined in these hierarchies. Conformity between types and referents in CGs is similar to the relation between object classes and instances. Type hierarchies can be implemented by defining separate object classes (instances of a meta-class 'concept-type' or 'relation-type'), forming a hierarchy. Each class or relation would have an attribute 'type' referring to the appropriate type class. For determining generalisation and specialisation of CGs, firstly special objects for representing CGs, and secondly methods for determining (performing) modification and combination operations should be defined.

It seems possible to implement CGs using OO networks. An alternative way of implementing CGs might be to enlarge the CG paradigm with attributes and methods, associated with concepts and conceptual relations.
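
The first option can be illustrated with a minimal sketch, assuming Python as the OO language. Concept types become classes created by a meta-class, the class hierarchy plays the role of the type hierarchy, and each concept class offers an 'instances' method. All names used (ConceptType, Concept, Person, Customer, conforms) are illustrative assumptions and are not taken from CG theory or from any existing tool.

```python
class ConceptType(type):
    """Meta-class: its instances are the classes that represent concept types."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls._own_instances = []  # referents created directly from this type

    def instances(cls):
        """Return the referents of this type and of all its subtypes."""
        found = list(cls._own_instances)
        for subtype in cls.__subclasses__():
            found.extend(subtype.instances())
        return found


class Concept(metaclass=ConceptType):
    """Root of the (assumed) concept type hierarchy."""

    def __init__(self, referent):
        self.referent = referent
        type(self)._own_instances.append(self)


class Person(Concept):
    pass


class Customer(Person):  # Customer is a subtype of Person
    pass


def conforms(concept_type, individual):
    """Conformity between a type and a referent, mapped onto isinstance."""
    return isinstance(individual, concept_type)


alice = Customer("Alice")
bob = Person("Bob")
print([c.referent for c in Person.instances()])
```

Generalisation and specialisation of whole graphs are not covered by this sketch; as noted above, they would need separate objects representing CGs.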

2.2. Structured representation

Uniform formalism will now be combined with structured representation of knowledge, consisting of three main types of networks:

domain model, consisting of concepts (with attributes) and (structural) relations,
dependency graph, consisting of attributes (of concepts in the domain model) and (inferential) dependencies, and
inference structure, consisting of roles (collections of domain concepts) and inferences (inference rules based on dependencies in the dependency graph). Inferences have associated analysis, design and implementation descriptions.

The three networks are described in more detail in sections 2.2.1, 2.2.2 and 2.2.3. The idea of an inference structure referring to the domain model comes from KADS [Wielinga and Breuker, 1986; Hesketh and Barrett, 1989] and CommonKADS [Schreiber et al., 1994; de Hoog et al., 1994] methodologies. Here, however, attributes instead of concepts are referred to. The inference structure is shared among different phases of development. Dependency graphs have been added.
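
As a rough illustration of how the three networks refer to each other, the following sketch uses Python dataclasses. The class and field names are assumptions made for this example only; the point is that dependencies and roles both refer to attributes of domain concepts, and that each inference carries one description per phase.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    concept: str  # owning domain concept, e.g. "Diet"
    name: str     # attribute name, e.g. "fishOils"


@dataclass
class DomainModel:
    concepts: dict = field(default_factory=dict)   # concept -> set of attribute names
    relations: list = field(default_factory=list)  # (concept, relation label, concept)

    def attribute(self, concept, name):
        """Register an attribute of a concept and return a reference to it."""
        self.concepts.setdefault(concept, set()).add(name)
        return Attribute(concept, name)


@dataclass
class DependencyGraph:
    # Each element is (dependent attribute, attribute it depends on).
    dependencies: set = field(default_factory=set)


@dataclass
class Role:
    name: str
    members: frozenset  # attributes of domain concepts


@dataclass
class Inference:
    premise: Role
    conclusion: Role
    descriptions: dict = field(
        default_factory=lambda: {"analysis": "", "design": "", "implementation": ""}
    )


dm = DomainModel()
fish = dm.attribute("Diet", "fishOils")
omega3 = dm.attribute("Diet", "omega3")
dm.relations.append(("Diet", "Intake", "Body"))

dg = DependencyGraph({(omega3, fish)})
inference = Inference(Role("sources", frozenset({fish})),
                      Role("omega3", frozenset({omega3})))
```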

The three types of networks that are connected to each other are illustrated in figure 2.1. Connections between CGs, via use of the same concepts, are illustrated by exporting these concepts, and drawing lines to identical concepts in different graphs. The notation for identical concepts in different CGs is adopted from problem maps, presented by Lukose [Lukose, 1996].

Domains, dependencies and inferences around attributes and dependencies.
Figure 2.1. CGs used in seamless structured KA. Different CGs are separated by thickened lines. Concepts, appearing outside CGs, indicate interdependencies between CGs through use of the same concepts.

2.2.1. Domain model

A domain model (DM) is a description of relevant parts of the domain. Relevant means here "involved in the task at hand". A DM consists of concepts, with attributes describing (the state of) different features, and relations, describing more or less permanent relationships between concepts or attributes of concepts. Relations between concepts include inheritance relations.

2.2.2. Dependency graphs

A dependency graph (DG) consists of concepts and binary directed dependency relations (dependencies) between concepts. Concept nodes have unique names, used as labels. If all dependencies of a DG are of the same type, the corresponding arcs (arrows) need not be labelled. If several relation types are used, the type of each dependency in a DG has to be indicated. Untyped DGs have been used in a system called Matias [Kontio, 1991]. The automated tools MOLE [Eshelman, 1988] and SALT [Marcus, 1988b] use untyped and typed dependencies between events.

If multiple sources of knowledge are considered, a DG can be created separately for each source. These initial DGs can be combined, as will be described below. In order to be able to combine knowledge acquired from different sources, it is first necessary to homogenize the terminology used. Also a domain model (see section 2.2.1), relating different concepts to each other, is necessary.

DGs can be considered as special forms of conceptual graphs (CGs, see section 2.1). Dependencies are taken as conceptual relations with one arc, in which the label may be omitted. In DGs, concept labels may also be considered as types, i.e. often type(c)=c. If a superclass of c has the same essential properties from the point of view of the application, type(c) can also be this superclass. DGs can be combined using rules, described below.

Theorem A group of DGs can be developed using the following rules:

Copy. An exact copy may be made of any DG u.
Remove.
1. Any unnecessary result concept c (a concept which no other concept depends on, and the value of which is not requested) can be removed (figure 2.2).
  
  Figure 2.2. An unnecessary result concept can be removed.
2. If c is an intermediate concept that depends on concept d, and that only one concept a depends on, then c can be removed and a set to depend directly on concept d. This rule is applicable also when c depends on several concepts (figure 2.3).
  
  Figure 2.3. An intermediate concept can be removed.
Join. If a concept c in DG u is compatible with concept d in DG v, i.e. c=d, then d can be deleted and all dependencies that had been linked to d can be linked to c (both to concepts that depend on c, and those that c depends on). See figure 2.4.

Figure 2.4 Join of two DGs.
Simplify. If two dependencies, r and s in DG u, are duplicates (of the same type, and defined between the same concepts a and b, in the same direction), then r may be removed from u (figure 2.5). Any information associated with r must be combined with the information associated with s.

Figure 2.5. One of two duplicate dependencies may be removed, however, preserving text descriptions of both.

Proof

Copy. Dependencies between concepts stay the same, independent of the DGs.
Remove
1. The validity of the DG is maintained, as no concept depends on the one removed, and the value of the removed concept is not needed.
2. Dependency (<-) is a transitive relation, i.e. if a <- c and c <- d then a <- d. If c depends on several concepts, the rule can be applied to each one separately.
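Join. Let $c \in u$, $d \in v$, $c = d$. The common generalization [Sowa, 1984, definition 3.5.5] $w$ of DGs $u$ and $v$ would be a DG with the single concept $d = d_w$. The join of $u$ and $v$ is truth-preserving if the projections $\pi_1: w \to u$ and $\pi_2: w \to v$ are compatible [Sowa, 1984, definition 3.5.6]:
- $\mathrm{type}(\pi_1 d_w) \cap \mathrm{type}(\pi_2 d_w) = \mathrm{type}(c) \cap \mathrm{type}(d) = \mathrm{type}(c) > \bot$,
- both referents of $c$ and $d$ conform to $\mathrm{type}(c)$, i.e. $\mathrm{type}(\pi_1 d_w) \cap \mathrm{type}(\pi_2 d_w) :: \mathrm{referent}(\pi_1 d_w)$ and $\mathrm{type}(\pi_1 d_w) \cap \mathrm{type}(\pi_2 d_w) :: \mathrm{referent}(\pi_2 d_w)$ ($\mathrm{type}(c) :: \mathrm{referent}(c)$ and $\mathrm{type}(c) :: \mathrm{referent}(d)$), and
- there are no individual markers as referents.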
Simplify. As a dependency relation only states the existence of some kind of inferential dependency between two concepts, r and s represent redundant information. The associated text information, however, is not necessarily redundant, so all of it has to be preserved with s.
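
The four rules can be sketched in code. The following is a minimal illustration in Python, not an implementation taken from any existing tool. It assumes that concepts are identified by their names, so that compatibility (c=d) is plain name equality, and that descriptions are kept as lists of text notes. The names DG, Dependency, join and the method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dependency:
    dependent: str                             # the concept that depends ...
    source: str                                # ... on this concept
    kind: str = ""                             # dependency type ("" if untyped)
    notes: list = field(default_factory=list)  # associated descriptions


@dataclass
class DG:
    concepts: set = field(default_factory=set)
    deps: list = field(default_factory=list)

    def add(self, dependent, source, note, kind=""):
        self.concepts |= {dependent, source}
        self.deps.append(Dependency(dependent, source, kind, [note]))

    def copy(self):
        """Copy: make an exact, independent copy of the DG."""
        deps = [Dependency(d.dependent, d.source, d.kind, list(d.notes))
                for d in self.deps]
        return DG(set(self.concepts), deps)

    def remove_result(self, c):
        """Remove 1: drop a result concept (caller checks its value is not requested)."""
        if any(d.source == c for d in self.deps):
            raise ValueError(f"another concept depends on {c!r}")
        self.deps = [d for d in self.deps if d.dependent != c]
        self.concepts.discard(c)

    def remove_intermediate(self, c):
        """Remove 2: bypass c, linking its single dependent to what c depends on."""
        users = [d for d in self.deps if d.source == c]
        inputs = [d for d in self.deps if d.dependent == c]
        if len(users) != 1 or not inputs:
            raise ValueError(f"{c!r} is not a removable intermediate concept")
        a = users[0]
        self.deps = [d for d in self.deps if d.source != c and d.dependent != c]
        for d in inputs:
            self.deps.append(
                Dependency(a.dependent, d.source, d.kind, a.notes + d.notes))
        self.concepts.discard(c)

    def simplify(self):
        """Simplify: merge duplicate dependencies, preserving all descriptions."""
        merged = {}
        for d in self.deps:
            key = (d.dependent, d.source, d.kind)
            if key in merged:
                merged[key].notes.extend(d.notes)
            else:
                merged[key] = d
        self.deps = list(merged.values())


def join(u, v):
    """Join: concepts with equal names (c = d) are identified with each other."""
    if not u.concepts & v.concepts:
        raise ValueError("the DGs share no concept to join on")
    w, other = u.copy(), v.copy()
    w.concepts |= other.concepts
    w.deps.extend(other.deps)
    return w


u = DG()
u.add("a", "c", "source 1: a depends on c")
u.add("c", "d", "source 1: c depends on d")
v = DG()
v.add("a", "c", "source 2: a depends on c")

w = join(u, v)              # three dependencies, two of them duplicates
w.simplify()                # a <- c now carries both descriptions
w.remove_intermediate("c")  # leaves the single dependency a <- d
```

In this sketch the descriptions of a bypassed intermediate concept are concatenated onto the new dependency, which is one possible reading of how associated information could be preserved.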

Concepts can also be generalized [Sowa, 1984, p. 100]. The validity of combining graphs that have not been compatible before generalizations of concepts depends on the domain, i.e. whether essential properties of the original concept hold also in the generalized concept.

It has to be kept in mind that dependencies presented in a DG need to be true only in a certain situation, and that these situations are not necessarily the same for all dependencies. In other words, the combined DG is only a hypothesis. Its value is in bringing together different pieces of knowledge, and showing a way in which they might be combined. Descriptions and possibly overlapping contexts provide more information.

DGs with undefined contexts present possibilities, not definite logical implications. Thus, combination of DGs cannot be described with normal terms of logic, like unification.

2.2.3. Inference model

The inference model is actually an integration of three models, the analysis, the design and the implementation models. These three models share the inference structure (IS), consisting of roles and inferences. Roles refer to a number of attributes of concepts in the domain model. To each inference are attached three descriptions that can contain different numbers of blocks (figure 2.6.a). Submodels (e.g. the analysis model) consist of the shared IS, together with corresponding (e.g. analysis) descriptions of inferences. Thus, to the IS are attached three sets of descriptions, each belonging to one of the models forming the inference model (figure 2.6.b).


Figure 2.6 a) Different descriptions of an inference can contain different numbers of blocks.	b) The inference model includes three models, each consisting of the (shared) inference structure (IS) and an attached set of descriptions.
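
The sharing of the IS between the three models can be sketched as follows. This is a minimal illustration in Python with hypothetical names (InferenceModel, submodel, PHASES); it only shows that each submodel is a view of the same structure combined with one set of description blocks.

```python
from dataclasses import dataclass, field

PHASES = ("analysis", "design", "implementation")


@dataclass
class Inference:
    name: str
    premise: str     # name of the premise role
    conclusion: str  # name of the conclusion role
    # One list of description blocks per phase; the lists may differ in length.
    blocks: dict = field(default_factory=lambda: {p: [] for p in PHASES})


@dataclass
class InferenceModel:
    structure: list = field(default_factory=list)  # the shared IS

    def submodel(self, phase):
        """One model: the shared IS together with the descriptions of one phase."""
        return [(i.name, i.premise, i.conclusion, i.blocks[phase])
                for i in self.structure]


inf = Inference("inf1", "A", "B")
inf.blocks["analysis"].append("verbal description of the inference")
inf.blocks["implementation"] += ["rule 1", "rule 2"]
model = InferenceModel([inf])

analysis_model = model.submodel("analysis")
implementation_model = model.submodel("implementation")
```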

In KB construction, logical details of knowledge can be presented in the analysis models. Here it must be remembered that in the iterative process of development each phase is visited several times, so 'analysis model' does not always mean the initial model. Need for changes may be acknowledged in other phases of development, but through traceability between models the corresponding parts in the analysis model can be detected. The changes can then be made first to the analysis model and only after that be propagated to other models. These modifications include modifications of the IS.

Keeping in mind that the process is iterative, it can be said:

In constructing a KB, all details can be first presented at the analysis phase, and the effort required in the design and implementation phases is to formalize them.

2.3. Seamless transformations (in structured KA)

In addition to using the same formalism in all models, and integrating the representation structure, one more way is used to integrate development of KBs. In KA, seamless transformations are templates for semi-automatic transformations between different models. Most of these transformations can be defined in both directions. Use of seamless transformations provides traceability between different models and careful utilization of work already done.

3. SEAMLESS DEVELOPMENT OF KBs

Seamless development of KBs consists of a sequence of seamless transformations. Figure 3.1 gives an overview of a technique for seamless development, called Seamless Structured Knowledge Acquisition (SeSKA). Three kinds of transformations are used: DGs are combined, the combined DG is transformed to the analysis model, and the analysis model is transformed via the design model to the final implementation model. The four rightmost transformations in figure 3.1 (DG <-> IS, DG <-> analysis, analysis <-> design, and design <-> implementation) are truly seamless, i.e. also two-directional. Transitions from an IS and analysis descriptions of inferences to DG with descriptions are ambiguous.

DG(1...n), DG+desc, DM+attr, analysis, IS, analysis, design and impl. interconnected.
Figure 3.1. An overview of the SeSKA technique.

3.1. Combining knowledge from multiple sources

DG(1...n) pointing to DG+desc.
Figure 3.2. Combining acquired DGs in SeSKA.

When pieces of knowledge elicited from different sources are presented in the form of DGs, they can be combined (figure 3.2) according to the rules presented in section 2.2.2. The rules 'copy', 'remove', 'join' and 'simplify' can be used to develop and combine a group of DGs.

Combination of conceptual graphs is useful in building a hypothesis. More information is provided through descriptions and contexts.

3.2. Creating a DM based on a DG

DG+desc pointing to DM+attr.
Figure 3.3. Forming the DM based on the DG in SeSKA.

The term 'class', in context of a DG, refers to an attribute class of the DM, i.e. either an actual aggregate attribute class, or a simple attribute of a concept class. The DM in SeSKA is constructed in parallel with DG(s) (figure 3.3). As domain concept attributes of the DM are concepts in the DG, relevant concepts and attributes are automatically selected to the DM. Descriptive comments may be attached to all dependency relations.

3.3. A DG producing an initial description of a KB

DG+desc pointing to analysis and IS.
Figure 3.4. Forming the inference structure and the analysis descriptions of inferences based on the DG in SeSKA.

A role refers to a collection of attributes of different concepts. These attributes are called member attributes of the role. The initial inference structure and the associated analysis descriptions can be constructed based on inferential dependencies with their associated descriptions (figure 3.4). One way to do this [Parpola, 1995] is to utilize the dependencies defined, so that a role is formed from the attributes that one attribute depends on, as illustrated in figure 3.5. Other heuristics can also be used, as well as manual construction and editing. If descriptions of dependencies have been given, the combination of descriptions of relevant dependencies forms the first approximation of the analysis-level description of an inference.

Dependencies and roles in a bank loan example.
Figure 3.5. Forming roles based on dependencies, in an application for granting a bank loan. c=customer, l=loan
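
The heuristic described above can be sketched in a few lines of Python. The function names and the bank loan attributes used here ('l.decision', 'c.income', 'c.debts') are invented for illustration and are not taken from figure 3.5; only the principle, that the attributes one attribute depends on form a role, comes from the text.

```python
from collections import defaultdict


def roles_from_dependencies(dependencies):
    """Map each dependent attribute to the role formed by what it depends on."""
    premise = defaultdict(set)
    for dependent, source in dependencies:
        premise[dependent].add(source)
    return {dependent: frozenset(sources) for dependent, sources in premise.items()}


def analysis_description(dependent, described):
    """First approximation: combine the descriptions of the relevant dependencies."""
    return " ".join(text for (dep, _source), text in described.items()
                    if dep == dependent and text)


# (dependent attribute, attribute it depends on) -> description
described = {
    ("l.decision", "c.income"): "Income must cover repayments.",
    ("l.decision", "c.debts"): "Existing debts reduce the limit.",
}

roles = roles_from_dependencies(described)
text = analysis_description("l.decision", described)
```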

3.4. Creating a DG based on the analysis model

Analysis and IS point to DG+desc.
Figure 3.6. Forming a DG based on the inference structure and the analysis descriptions.

Forming a DG based on the analysis model (figure 3.6) is not unambiguous, as several different DGs can produce the same analysis model. One way to form a DG is to take the roles connected by an inference, and set all concepts referenced by the conclusion role to depend on all concepts referenced by the premise role. The analysis description of the inference can be attached to all dependencies formed. Descriptions of dependencies can then be edited to include only information concerning a particular dependency between two concepts. The process can be repeated for each inference.
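
A minimal sketch of this reverse step, assuming that an inference is given as a triple of premise attributes, conclusion attributes and an analysis description (a representation chosen only for this example):

```python
def dg_from_inferences(inferences):
    """Let every conclusion attribute depend on every premise attribute."""
    dependencies = {}
    for premise, conclusion, description in inferences:
        for dependent in conclusion:
            for source in premise:
                # The whole analysis description is attached at first; it is
                # meant to be edited down for each dependency afterwards.
                dependencies.setdefault((dependent, source), []).append(description)
    return dependencies


inferences = [(["A", "B"], ["E", "F"], "analysis description of the inference")]
dg = dg_from_inferences(inferences)
```

The result is only one of the DGs that could have produced the same analysis model, which is why the step is followed by manual editing.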

3.5. Formalizing descriptions of inferences

Analysis, pointing to implementation via design form IS.
Figure 3.7. Formalizing descriptions of inferences in SeSKA.

Analysis descriptions are formalized to implementation descriptions, e.g. rules, via design descriptions (figure 3.7). The process of refinement is iterative and modular. Instances of member attributes of roles are used as variables in the inference process. Inferences defined between two roles use member attributes of the premise role as input, and produce values of member attributes of the conclusion role as output. The internal structure of an inference is either a set of rules, or a function written in some programming language. Different inferences can be defined in different ways.

When an inference uses rules, the format of pseudo-representation available is a rule table, containing a premise part (on the left) and a conclusion part (on the right). Different rules are presented one below the other. In an individual rule, different premises can be presented in a kind of structured block diagram (figure 3.8). Conclusion values, to be assigned to conclusion variables, are presented in columns for different conclusion variables. Rules presented in a rule table are directly converted into rules presented in the implementation formalism used.

Premise and concl. role member attribute table.
Figure 3.8. Rule table of an Inference between Roles with member attributes (A B C D) and (E F G H).
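
A rule table of this kind could be represented and evaluated roughly as follows. The sketch is in Python and makes two assumptions that the text does not fix: premises within a rule are plain equality tests, and every matching rule fires, with later rows overwriting earlier conclusion values. The attribute values are invented.

```python
PREMISE = ("A", "B", "C", "D")      # member attributes of the premise role
CONCLUSION = ("E", "F", "G", "H")   # member attributes of the conclusion role

# One row per rule: (premise conditions, conclusion values).
# Attributes missing from a row are unconstrained or left unassigned.
RULE_TABLE = [
    ({"A": 1, "B": 2}, {"E": "x", "F": "y"}),
    ({"A": 1, "C": 3}, {"G": "z"}),
]


def check_table(table):
    """Rows may only mention member attributes of the two roles."""
    for conditions, values in table:
        if not set(conditions) <= set(PREMISE) or not set(values) <= set(CONCLUSION):
            raise ValueError("rule refers to an attribute outside its roles")


def apply_rule_table(table, premise_values):
    """Return the conclusion values assigned by all rules whose premises hold."""
    conclusions = {}
    for conditions, values in table:
        if all(premise_values.get(attr) == wanted
               for attr, wanted in conditions.items()):
            conclusions.update(values)
    return conclusions


check_table(RULE_TABLE)
first = apply_rule_table(RULE_TABLE, {"A": 1, "B": 2, "C": 0, "D": 0})
both = apply_rule_table(RULE_TABLE, {"A": 1, "B": 2, "C": 3, "D": 0})
```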

If functions or procedures are used to define inferences, the input parameters must be member attributes of a premise role. Attributes modified (either through side effects during execution, or as result values) must be member attributes of the conclusion role. For reuse purposes, formal parameters are defined through ordinal numbers of attributes referenced.
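
One way to read the positional binding is sketched below in Python. The helper make_inference and the attribute names are hypothetical. A reusable function is written against positions only, and the roles supply the attributes in order, so the same function can serve different inferences.

```python
def make_inference(function, premise_role, conclusion_role):
    """Bind a reusable function to two roles by attribute position."""
    def run(state):
        # 1st, 2nd, ... premise attribute become the 1st, 2nd, ... argument.
        inputs = [state[attr] for attr in premise_role]
        outputs = function(*inputs)
        if len(outputs) != len(conclusion_role):
            raise ValueError("one result value per conclusion attribute expected")
        new_state = dict(state)
        new_state.update(zip(conclusion_role, outputs))  # only conclusion attributes change
        return new_state
    return run


def ratio(first, second):
    """Reusable function: refers to its inputs by position only."""
    return (first / second,)


debt_ratio = make_inference(ratio, ("c.debts", "c.income"), ("c.debtRatio",))
yearly = make_inference(ratio, ("l.amount", "l.years"), ("l.yearlyPayment",))

state = {"c.debts": 20, "c.income": 80, "l.amount": 1000, "l.years": 5}
state = yearly(debt_ratio(state))
```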

3.6. Changing earlier models based on implementation errors

Implementation pointing to analysis via design, form the IS.
Figure 3.9. Tracing back to earlier models.

Often a need for change is acknowledged in implementation (or design). According to principles of seamless development, the changes, however, are not done directly in these models. Instead, the corresponding parts are traced in the analysis model, possibly via the design model, using the shared inference structure (figure 3.9). Corrected forms of action are first verbalized and then propagated to the more formal models. In practice, changes that have already been made may often be described afterwards, but the important thing is to keep the logical description up to date. However, representing the problem on the abstract (analysis) level may sometimes help in finding the solution.

4. AN EXAMPLE OF USING SESKA

4.1. About the domain

Multiple sclerosis (MS) is a disease of the central nervous system, causing malfunctions mainly in the motor and sensory nerves or optic nerves. The symptoms are caused by auto-immune reactions against the protective myelin sheath of nerves in the spinal cord or in the white matter of the brain. The mechanism of damage is quite well-known, but there is disagreement on the causes. Mainstream research on MS concentrates on developing drugs. However, the best achievement until now only decreases the probability of new symptoms (for those who can tolerate it).

There has also been some research on the effects of nutrition on MS, mainly on fatty acid levels, absorption and effects. Antioxidants, for example, have also been investigated. Research on nutrients is regrettably scattered: only a small number of researchers try to build a conclusive hypothesis. The technique described in this paper will now be used to build a hypothesis of how the results of different research workers might be combined.

Figure 4.1 presents a hierarchy of fatty acids to provide a chemical background.

FA, sat.FA, UFA, PUFA, omega-3 & -6, linolenic & linoleic acid.
Figure 4.1 Inheritance hierarchy of fatty acids.
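
Such a hierarchy maps directly onto an inheritance hierarchy of object classes. The sketch below, in Python, assumes the usual chemical classification (linolenic acid under Omega 3, linoleic acid under Omega 6, both groups under polyunsaturated fatty acids); the exact layout of figure 4.1 may differ, and the class names and the helper type_of are illustrative.

```python
class FattyAcid:                             # FA
    pass


class SaturatedFA(FattyAcid):                # sat. FA
    pass


class UnsaturatedFA(FattyAcid):              # UFA
    pass


class PolyunsaturatedFA(UnsaturatedFA):      # PUFA
    pass


class Omega3(PolyunsaturatedFA):
    pass


class Omega6(PolyunsaturatedFA):
    pass


class LinolenicAcid(Omega3):
    pass


class LinoleicAcid(Omega6):
    pass


def type_of(concept, levels_up=0):
    """Return the concept itself or one of its superclasses as its type."""
    return concept.__mro__[levels_up]


generalised = type_of(LinoleicAcid, 1)  # the Omega 6 group
```

Generalising a concept in a DG then corresponds to choosing a superclass as its type, provided the essential properties hold for the whole group.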

4.2. Forming initial dependencies

Some research results concerning certain groups of fatty acids, vitamins and minerals have been elicited from article abstracts, and been presented as DGs, by the author (figures 4.2 - 4.9). Captions show the research results elicited, used as descriptions associated to dependencies. A few generalisations, e.g. replacing linoleic acid by Omega 6, have been made in the DGs. This particular replacement is based on some properties that are common to the Omega 6 group of fatty acids [FAO, WHO, 1993]. Captions show the original form of results elicited. All of these figures describe the situation in an MS patient, possibly compared with a healthy person.

1)

Body.metabolicMalfunction dep.on Body.omega-3 & -6 & sat.FA.

Figure 4.2. a) Omega 3 (levels) < normal [Cunnane et al., 1989; Neu, 1985]; b) Omega 6 < normal [Cunnane et al., 1989; Fisher et al., 1987; Navarro and Segura, 1988; Navarro and Segura, 1989]; c) saturated FA > normal [Navarro and Segura, 1988; Navarro and Segura, 1989]; d) A predisposing factor causing MS seems to be related to a disturbance of the lipid and fatty acid metabolism [Neu, 1985]. A common aspect appears to be a lipid imbalance involving the essential fatty acids (EFA), linoleic and linolenic, and trace fatty acids which result from faulty lipid metabolism [Marshall, 1991].

2)

Dependencies between Body, Diet, air & antioxidants.

Figure 4.3. a) main dietary source of linoleic acid is vegetable seed oils; b) main dietary sources of linolenic acid and its products are leaves and fish oils; c) EFAs are easily peroxidized in air; d) vitamin E prevents peroxidation [Sinclair, 1984].

3)

Dependencies between omega-3 & -6 & fishOils in Body & Diet.

Figure 4.4. a) Omega 3 is absorbed from dietary fish oils [Nightingale et al., 1990]; b) linoleic acid showed significant correlations with diet [Fitzgerald et al., 1987].

4)

Condition.benefit depends on Diet.omega3 and Diet.omega6.

Figure 4.5. Supplementation of Omega 3 and Omega 6 fatty acids caused reduction of the severity and frequency of relapses and a mild overall benefit [Bates, 1990].

5)

Condition.degradation dep.on Diet.saturatedFA dep.on Diet.animalFats.

Figure 4.6. a) Use of saturated FA increased deterioration and lethality; b) Animal fats are saturated [Swank, 1991].

6)

Condition.degradation dep.via Body.peroxidation on oxygenFreeRadicals.

Figure 4.7. A rise in pentane (a peroxidation product) always coincided with relapses (exacerbations). It has been concluded that oxygen free radical activity is enhanced during exacerbations of multiple sclerosis [Toshniwal and Zarling, 1992].

7)

Body.peroxidation & antiox dep.on Diet.antiox dep.on vitE,vitC,Se.

Figure 4.8. Antioxidants (selenium, E, C) normalized abnormalities, i.e. a) lowered the increased peroxidation rates [Clausen, Jensen and Nielsen, 1988]; and b) raised the lowered selenium levels [Clausen, Jensen and Nielsen, 1988].

8)

Dependencies of Condition, Leukotrienes, Body and Diet.

Figure 4.9. a) dietary fish oils may be beneficial; b) dietary antioxidants+UFA may be beneficial; c) fish oils lead to production of leukotrienes with less inflammatory properties. d) antioxidants inhibit leukotriene synthesis; e) leukotrienes might be the underlying cause of certain symptoms (retrobulbar neuritis). f) visual solar radiation releases rhodopsin with vitamin A, in the visual pigment of the retina [Hutter, 1993].

4.3. Forming the domain model simultaneously with the DGs

The DM, illustrated in figure 4.10, has been formed using the concepts and attributes appearing in the DGs of section 4.2. Simultaneous formation of DGs reveals what attributes the concepts should have.

Diet, Body (dep. on Int. & Ext. factors), Body Leukotrienes, & Condition.
Figure 4.10. Domain model for nutrition of a person with MS. The rounded rectangles represent concepts, and the bullets inside them attributes. The stable relation between concepts 'Diet' and 'Body' is 'Intake'. 'Condition' and 'Leukotrienes' are aggregate attributes (section 2.1.1) of 'Body'.

4.4. Combining knowledge from different sources

An example of combining DGs will now be given. DGs 1 and 3b, illustrated in figures 4.2 and 4.4, are combined, based on the appearance of node 'Body.Omega 6' in both DGs, and rule 'join' in section 2.2.2. The resulting DG is illustrated in figure 4.11.

Fatty acid content of the body depends on diet and metabolic fault.
Figure 4.11. Combination of DGs 1 and 3b.

After joining DGs 3a and 2ab (in figures 4.4 and 4.3), there are duplicate relations between 'Diet.fishOils' and 'Diet.Omega 3', so rule 'simplify' has to be used. After joining DGs 4 and 5 also, the DG in figure 4.12 is achieved.

A DG with 12 attributes and 13 dependencies
Figure 4.12. Combination of DGs 1, 2, 3, and 4.
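
The effect of rule 'simplify' on duplicate relations can be sketched as follows. The sketch assumes that simplifying means collapsing relations with the same source and target into one relation that keeps the labels of all contributing DGs; the exact rule is defined in section 2.2.2. The labels and the second edge are illustrative, and `omega3` is an assumed ASCII spelling.

```python
def simplify(edges):
    """Merge duplicate (source, target) relations, keeping every provenance label."""
    merged = {}
    for src, dst, label in edges:
        merged.setdefault((src, dst), []).append(label)
    # One relation per (source, target) pair, labels joined with commas.
    return [(src, dst, ",".join(labels)) for (src, dst), labels in merged.items()]

# Illustrative result of joining DGs 2ab and 3a (assumed content).
joined = [
    ("Diet.fishOils", "Diet.omega3", "2a"),
    ("Diet.fishOils", "Diet.omega3", "3a"),
    ("Diet.omega3", "Body.omega3", "2b"),
]
simplified = simplify(joined)
print(simplified)
```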

Starting from a different point, DGs 6, 7, and 8 are combined. After joining DGs 6 and 7 by node 'Body.peroxidation', DG 8 can also be joined. Several nodes have to be joined, namely 'Condition.degradation', 'Diet.antioxidants' and 'Body.antioxidants', which causes a duplicate relation between 'Diet.antioxidants' and 'Body.antioxidants' to be simplified. The combined DG is illustrated in figure 4.13.

A DG with 13 attributes and 14 dependencies.
Figure 4.13. Combination of DGs 6, 7 and 8.

When DGs 1 - 8, presented in figures 4.2 - 4.9, or alternatively the two combined DGs in figures 4.12 and 4.13, are combined, the DG illustrated in figure 4.14 is obtained. Two additional (descriptive) links have been added; these links are unlabelled. Explanations associated with dependencies can be found in the captions of the DG figures in section 4.2 (numbers attached to dependencies refer to the numbers of DGs, letters to the dependencies in these DGs).

A DG with 19 attributes and 22 dependencies.
Figure 4.14. The combined DG. The numbers associated with dependency arcs refer to DGs in section 4.2.

The combined diagram is in agreement with the conclusions of some authors [Marshall, 1991; Hutter, 1993], as well as with the author's own experiences.

4.5. Forming the initial KB

The IM, illustrated in figure 4.15, is based mainly on the dependencies in figure 4.14, using the technique described in section 3.3: the attributes that a certain attribute depends on are combined into a role. Roles have then been given identifiers and descriptive names. The contents of roles are described in the caption. Inferences simply refer to the dependencies between components of roles. Explanations of the inferences are those of the component dependencies.

A simple example of a transformation will now be given. 'Diet. Omega 3' depends on 'Diet.fishOils' and 'Diet.leaves', which are combined to form the role '(A)DietarySourcesOf Omega 3' in figure 4.15. Capital letters in parentheses are used only for identifying roles in the caption; the actual name is supposed to be more descriptive. 'Diet. Omega 3' itself is a member of '(B) Omega 3factors'. A more complex example is that, in figure 4.14, 'Condition.benefit' depends on 'Body. Omega 3', 'Body. Omega 6' and 'Body.antioxidants', which are collated into the role '(E)BenefitFactors' in figure 4.15.

Roles and inferences
Figure 4.15. The initial inference structure with analysis descriptions. The numbers associated with dependency arcs refer to initial DGs in Section 4.2. Contents of ROLES: (A){Diet.fishBodyOil, Diet.leaves}; (B){Diet. Omega 3, Body.metabolicMalfunction}; (C){Diet.vegetableSeedOils}; (D){Diet. Omega 6, Body.metabolicMalfunction}; (E){Body. Omega 3, Body. Omega 6, Diet.antioxidants}; (F){E,C,A,selenium}; (G) ?; (H){Body.oxygenFreeRadicals, Diet.antioxidants}; (I){Diet.saturated(Animal)Fats}; (J){Diet.saturatedFats, Body.metabolicMalfunction}; (K){Leukotrines.inflammProp, Leukotrines.synthesis}; (L){Body.saturatedFats, Leukotrines.activity, Body.peroxidation}; (M){Condition.degradation, Condition.benefit}
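
The grouping step of this transformation can be sketched in Python, using the two examples given in the text. The sketch assumes that a DG is available as a list of (source, target) dependencies and that a role is simply the set of attributes a given attribute depends on; assigning identifiers and descriptive names is left to the developer. The function name `roles_from_dependencies` is hypothetical, and `omega3`/`omega6` are assumed ASCII spellings.

```python
from collections import defaultdict

def roles_from_dependencies(edges):
    """For each target attribute, collect the attributes it depends on as one role."""
    inputs = defaultdict(set)
    for src, dst in edges:
        inputs[dst].add(src)
    return {dst: frozenset(srcs) for dst, srcs in inputs.items()}

# The two examples from the text, as (source, target) dependencies.
edges = [
    ("Diet.fishOils", "Diet.omega3"),
    ("Diet.leaves", "Diet.omega3"),
    ("Body.omega3", "Condition.benefit"),
    ("Body.omega6", "Condition.benefit"),
    ("Body.antioxidants", "Condition.benefit"),
]
roles = roles_from_dependencies(edges)
for target, members in roles.items():
    print(target, "<-", sorted(members))
```

The role collected for 'Diet.omega3' corresponds to role (A), and the one collected for 'Condition.benefit' to role (E) as described in the text.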

4.6. Formalizing descriptions

Only one example of rule formalisation is given. The verbal description of rule number '1bd,3b' is "(Due to a fault in FA metabolism) Body. Omega 6 < normal; Omega 6 is absorbed from dietary supplementation." The same content can be presented as a rule table (figure 4.16.a), which converts directly into the rules in figure 4.16.b, presented in the implementation language. As mentioned in section 3.5, the development process is iterative and modular.

	`if Body.metabolicMalfunction = true and Diet. Omega 6 is normal then Body. Omega 6 < normal`
	`if Body.metabolicMalfunction = true and Diet. Omega 6 is supplemented then Body. Omega 6 = normal`
Figure 4.16. a) Rule table describing rule '1bd,3b', i.e. the transformation from role D to role E. b) Rules formed from the rule table.
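
How a rule table converts directly into executable rules can be sketched in Python. This is not the implementation language of the paper; the table layout (one row per rule, one column per attribute), the names `rule_table` and `infer`, and the ASCII spelling `omega6` are assumptions made for illustration.

```python
# One row per rule: condition attributes plus the concluded attribute.
rule_table = [
    {"Body.metabolicMalfunction": True, "Diet.omega6": "normal",
     "Body.omega6": "< normal"},
    {"Body.metabolicMalfunction": True, "Diet.omega6": "supplemented",
     "Body.omega6": "normal"},
]
CONCLUSION = "Body.omega6"

def infer(facts, table=rule_table, conclusion=CONCLUSION):
    """Return the conclusion of the first row whose conditions all hold, else None."""
    for row in table:
        conditions = {k: v for k, v in row.items() if k != conclusion}
        if all(facts.get(k) == v for k, v in conditions.items()):
            return row[conclusion]
    return None

supplemented = infer({"Body.metabolicMalfunction": True, "Diet.omega6": "supplemented"})
unsupplemented = infer({"Body.metabolicMalfunction": True, "Diet.omega6": "normal"})
print(supplemented, "|", unsupplemented)
```

Because the rules are read from the table at run time, editing a row changes the behaviour without touching the inference code, which fits the iterative style of development.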

5. DISCUSSION AND CONCLUSIONS

The gap between phases of knowledge acquisition (KA) is a problem in general cases, when no specific heuristic is applicable. Results achieved in an early phase of KA often cannot be fully utilized, but part of the work has to be duplicated. Object-Oriented Software Engineering (OOSE) uses seamless transformations to integrate object-oriented (OO) models, corresponding to different phases of development. Seamless transformations are characterized in the following way: "We are ideally able to tell in a foreseeable way, how to get from objects in one model to objects in another model".

It is proposed to attack the problem of disintegration between phases by combining three approaches: a unified representation formalism, structured representation, and seamless transformations. The representation formalism considered is conceptual graphs (CG), which, it is suggested, are to be implemented using OO networks. The representation structure includes dependency graphs (DG; a specialization of CGs describing inferential dependencies), a domain model, and an inference structure (IS) that is shared by the descriptions for different phases of development. Sharing the IS is possible because the presentation structure can differ between descriptions, and modifications of the IS required in other phases can be made in the analysis phase. Seamless development of knowledge bases (KB) is a series of seamless transformations, each following a certain template. The transformations are performed iteratively. Templates are defined for a number of seamless transformations: combining DGs, transforming a combined DG into an initial inference structure, building the domain model based on components in DGs, and transitions based on the shared inference structure.

SeSKA has been tested manually. A tool supporting SeSKA is under development using the programming language Java. Network templates will be implemented for the domain model, the dependency graph and the inference model. Components of these networks will be templates for components of the actual networks created by the user.

The approach provides three kinds of contribution to KA:

Rules developed to combine DGs allow combining pieces of knowledge acquired from different sources, at an early stage of development. These combination rules for DGs, however, are not definite rules of logic. Both initial and combined DGs are networks of possible dependencies. Nevertheless, combining DGs is useful in developing a hypothesis.
Mechanical transformation from dependency graphs to an initial description of a KB reduces the effort needed for design of the KB. The first approximation for a KB may (have to) be edited manually. Different applications might need different ways of performing this transformation, but until now only one way has been defined.
Use of a shared inference structure guarantees traceability between different descriptions of inference and thus the initial description and the final implementation. This helps in utilizing work already done in development, and makes maintenance easier. However, transformations between different descriptions are only semi-automatic, i.e. details of the work have to be performed manually.

Seamless development of KBs is a rather thin approach, as it ignores, for example, all aspects of user interface development. This, however, makes it a very easy way of development, in fact ideal for fast prototyping. When used for real development, it has to be combined with other tools.

ACKNOWLEDGEMENTS

I thank Pekka A. Jussila, MSc. (tech.), for his fruitful comments and his help in practical matters, and Professor Markku Syrjänen for his advice and encouragement. I also thank Dr. Marja Mutanen for her help in finding literature on the basic properties of groups of fatty acids, and Mr. Michael Vollar for checking the correctness of the English language.

REFERENCES

Bates, D. (1990). Dietary lipids and multiple sclerosis, Uppsala Journal of Medical Sciences - Supplement Vol. 48.

Booch, G. (1991). Object-Oriented Design with Applications. Menlo Park, California: The Benjamin/Cummings Publishing Company.

Clausen, J., Jensen, G.E. and Nielsen, S.A. (1988). Selenium in chronic neurologic diseases. Multiple sclerosis and Batten's disease. Biological Trace Element Research, Vol. 15 (No. 1).

Cunnane, S.C., Ho, S.Y., Dore-Duffy, P., Ells, K.R. and Horrobin, D.F. (1989). Essential fatty acid and lipid profiles in plasma and erythrocytes in patients with multiple sclerosis. American Journal of Clinical Nutrition, Vol. 50 (No. 4).

Eriksson, H., Shahar, Y., Tu, S.W., Puerta, A.R. and Musen, M.A. (1995). Task Modelling with Reusable Problem-Solving Methods. Artificial Intelligence, Vol. 79 (No. 2).

Eshelman, L. (1988). MOLE: A Knowledge-Acquisition Tool for Cover-and-Differentiate Systems, in Marcus, S. (Ed.), Automating Knowledge Acquisition for Expert Systems. Kluwer International Series in Engineering and Computer Science. Boston: Kluwer Academic Publishers.

FAO (Food and Agriculture Organization of the United Nations), WHO (World Health Organization) (1993). Fats and Oils in Human Nutrition. Report of Joint Expert Consultation in Rome. FAO Food and Nutrition Paper 57.

Fisher, M., Johnson, M.H., Natale, A.M. and Levine, P.H. (1987). Linoleic acid levels in white blood cells, platelets, and serum of multiple sclerosis patients. Acta Neurologica Scandinavica, Vol. 76 (No. 4).

Fitzgerald, G., Harbige, L.S., Forti, A. and Crawford, M.A. (1987). The effect of nutritional counselling on diet and plasma EFA status in multiple sclerosis patients over 3 years. Human Nutrition. Applied Nutrition, Vol. 41 (No. 5).

Goldberg, A. (1984). Smalltalk-80: The Interactive Programming Environment. Reading, Massachusetts: Addison-Wesley.

Hesketh, P., and Barrett, T. (1989). An Introduction to KADS Methodology, Esprit Project P1098 report M1, STC Technology Ltd.

de Hoog, R., Martil, R., Wielinga, B., Taylor, R., Bright, C., and van de Velde, W. (1994). The CommonKADS model set. KADS-II//M1 /DM1.1b/UvA/18/6.0/FINAL. DM1.1c.

Hutter, C. (1993). On the causes of multiple sclerosis. Medical Hypotheses, Vol. 41 (No. 2).

Jacobson, I., Christerson, M., Jonsson, P., and Övergaard, G. (1992). Object-Oriented Software Engineering, A Use Case Driven Approach. Reading, Massachusetts: Addison-Wesley.

Kontio, J. (1991). Matias: Development and Maintenance of a Large but Well-defined Application. Expert Systems with Applications, Vol. 3 (No. 2).

Leo, P., Sleeman, D., and Tsinakos, A. (1994). S-SALT, A Problem Solver Plus; Knowledge Acquisition Tool Which Additionally Can Refine Its Knowledge Base, in EKAW-94, the 8th European Knowledge Acquisition Workshop in Hoegaarden, Belgium.

Lukose, D. (1996). MODEL-ECS: Executable Conceptual Modelling Language. 10th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop in Banff, Alberta, Canada.

Marcus, S. (1988a). Introduction, in Marcus, S. (Ed.), Automating Knowledge Acquisition for Expert Systems. Kluwer International Series in Engineering and Computer Science. Boston: Kluwer Academic Publishers.

Marcus, S. (1988b). SALT: A Knowledge-Acquisition Tool for Propose-and-Revise Systems, in Marcus, S. (Ed.), Automating Knowledge Acquisition for Expert Systems. Kluwer International Series in Engineering and Computer Science. Boston: Kluwer Academic Publishers.

Marshall, B.H. (1991). Lipids and neurological diseases. Medical Hypotheses, Vol. 34 (No. 3).

Motta, E., Rajan, T., and Eisenstadt, M. (1988). A Methodology and Tool for Knowledge Acquisition in KEATS-2, in 3rd AAAI-Sponsored Knowledge Acquisition for Knowledge-Based Systems Workshop in Banff, Alberta, Canada.

Navarro, X. and Segura, R. (1988). Plasma lipids and their fatty acid composition in multiple sclerosis. Acta Neurologica Scandinavica, Vol. 78 (No. 2).

Navarro, X., and Segura, R. (1989). Red blood cell fatty acids in multiple sclerosis. Acta Neurologica Scandinavica, Vol. 79 (No. 1).

Neu, I.S. (1985). Metabolic aspects of multiple sclerosis (Stoffwechselaspekte der Multiplen Sklerose). Wiener Medizinische Wochenschrift, Vol. 135 (No. 1-2).

Nightingale, S., Woo, E., Smith, A.D., French, J.M., Gale, M.M., Sinclair, H.M., Bates, D. and Shaw, D.A. (1990). Red blood cell and adipose tissue fatty acids in mild inactive multiple sclerosis. Acta Neurologica Scandinavica, Vol. 82 (No. 1).

Parpola, P. (1995). Object-Oriented Knowledge Acquisition, Licentiate's Thesis. University of Helsinki.

Rumbaugh, J., Blaha, M., Premerlani, W., Eddy, F., and Lorensen, W. (1991). Object-Oriented Modeling and Design. Englewood Cliffs, New Jersey: Prentice-Hall.

Schreiber, G., Wielinga, B., de Hoog, R., Akkermans, H. and van de Velde, W. (1994). CommonKADS, a Comprehensive Methodology for KBS Development. IEEE Expert, Vol. 9 (No. 6).

Sinclair, H.M. (1984). Essential fatty acids in perspective. Human Nutrition. Clinical Nutrition, Vol. 38 (No. 4).

Sowa, J. (1984). Conceptual Structures: Information Processing in Mind and Machine. Reading, Massachusetts: Addison-Wesley.

Steele, G.L. (1990). Common Lisp, the Language, Second Edition. USA: Digital Press.

Swank, R.L. (1991). Multiple sclerosis: fat-oil relationship. Nutrition, Vol. 7 (No. 5).

Toshniwal, P.K., and Zarling, E.J. (1992). Evidence for increased lipid peroxidation in multiple sclerosis. Neurochemical Research, Vol. 17 (No. 2).

Wielinga, B., and Breuker, J. (1986). Models of Expertise, in ECAI '86, Seventh European Conference on Artificial Intelligence in Brighton, UK.