So you want to validate your PSMs?

Joost Breuker and Alexander Boer

 

Social Science Informatics (SWI)

 

University of Amsterdam

 

{breuker,aboer}@swi.psy.uva.nl

 

Abstract

This paper describes the first steps in an attempt to use the Cokace environment to verify and validate the problem solving methods (PSMs) that are collected and indexed in the CommonKADS Library for Expertise Modelling. Cokace is an implementation of the conceptual modelling language (CML) that was used to specify the PSMs in this Library. The aim is to verify and validate these PSMs in an empirical way. At first sight, this appeared to be a faster, easier and in particular more transparent way to test this Library, given our suspicion that the PSMs exhibit large functional overlaps. However, an important lesson that can be drawn from this paper is that a proper way of testing involves many preparatory steps to obtain maximal control over the procedure, so that the advantage in efficiency over the formal approach may be small or non-existent: it may be another case of the law of conservation of trouble. On the other hand, these preparatory steps may have important side effects, because it turns out that an exhaustive exercise in reuse of domain knowledge is indicated at the same time, which enables us to assess the size and detailed nature of the interaction between PSMs and domain knowledge (the ``interaction hypothesis''). A major part of the paper therefore consists of the justification of the planning and design of the experiments. As a first step, the design of a testing domain is explained. The domain is a window world, a kind of ``blocks'' world. This world is defined by an ontology described in this paper. Some examples of how this ontology is to be used are presented, and some problems we had with CML's domain specification are discussed.

1 Introduction

To the reader

This paper is intended as a discussion paper. It presents the design and justification of a rather complex experiment in validating a large collection of PSMs. As we have noticed that what initially appeared to be a straightforward implementation and testing exercise easily grows to unmanageable complexity when all issues that come up are taken into account, it seems advisable to discuss the merits of this design before we collect the first results, which are still open to multiple interpretations. We also report a first step: a world of screen rectangles (windows) which is analogous to, but richer than, the classical blocks world of MIT (cf. (Winograd, 1972)). This first step is already riddled with problems, as this world should allow the specification of literally all kinds of problems. As a moral we can already state that the intention of careful experimentation has pushed the actual execution of the experiments further away, but has also given us a clearer view of what the theoretical stances on PSMs imply. We hope that the readers will bring us closer to the actual experiments again by pointing out unnecessary complexity in our design, but we rather fear that we have overlooked complications.

A short history

In 1994 a library of reusable problem solving components was published as one of the results of the KADS-II project (Breuker & Van de Velde, 1994). This CommonKADS Library for Expertise Modelling (short: Library) was still a paper document and contained a few hundred problem solving methods (PSMs). The PSMs were indexed by the kind of problem or ``task'' they were supposedly good for. Originally, it was intended that this Library should be part of CommonKADS workbenches. Much effort was spent on the design of the Library itself, which should support the knowledge engineer in selecting and combining PSMs, or parts (``components'') of these, to construct a conceptual model of the application. Each PSM was to be expressed in three versions: a verbal one, one in CML, the CommonKADS conceptual modeling language (Schreiber et al., 1994), (Van de Velde, 1994), and one in a formal version of CML (called (ML)2), which allows a good translation from CML (Van Harmelen & Balder, 1992). These versions were supposed to play different roles in the modeling process. The text version explained the component; the CML version, in particular in graphical format, was to be used as part of a CommonKADS workbench (conceptual model editor), and the formal version would allow consistency checking. A fourth version, useful for prototyping, could have been an operational one, but this was not foreseen as -- in those days -- no operational CML systems were available. Only part of these plans could be executed, resulting in the paper version, consisting of PSMs described in textual form and in general also as CML code, categorized by the typical generic tasks (problem types).

A paper version of a library of PSMs is not an invitation for reuse. The book provides ample documentation about the methods, but it is of limited practical use compared to the specified, but not yet implemented, system version. In that version a hypertext-like structure should facilitate browsing and searching; the graphical CML structures lend themselves to simple direct manipulation and to keeping an overview of the structure. Also, it would be a small next step to make an implemented Library available on the internet, so that it could be easily accessed world wide. In this form it also invites new contributions. These contributions can be expected to be of two kinds. First, new PSMs could be added to the collection: it may be the largest available at the moment, but it is certainly not complete (see e.g., (Benjamins et al., 1996), who are still extending the planning section of the Library). A second kind of contribution consists of experiences of users (knowledge engineers), who can add documented evidence about actual use, potential problems and bugs. In short, a published and publicly available, easy to use and to maintain Library would create an ideal environment for empirical validation by practical use and feedback. In summary: the Library is intended to be an open library, which allows the incorporation of practical experience and experiment (Valente et al., 1994; Van de Velde, 1994).

However, besides the managerial problems in updating and maintaining such an open library, the modelling components (PSMs) should be as correct and valid as possible. This is not only a responsibility the authors/maintainers of the Library have towards practical users; it is also needed to obtain more accurate feedback. In empirical science it has become clear that nature and practice do not speak for themselves, because empirical data are easily prone to divergent interpretations. One needs a theory or model that is sufficiently explicit and close to the world modelled that the feedback from experiences can lead to unambiguous diagnoses and revisions of the model(s). Moreover, individual experiences are not the most well-controlled material on which to build a valid set of models. By controlling and manipulating parameters and comparing ``effects'', a systematic approach may reveal deficiencies, overlap and scope of validity in a more cost-effective way. The a priori evidence for the validity of the PSMs in the Library varies widely. Most PSMs were collected from descriptions in the literature; in particular the methods for planning, configuration and diagnosis were obtained this way. Other methods were derived from known applications and still others were constructed from some principles. Even if PSM specifications in the Library were obtained from well documented sources, it is possible and to be expected that errors of interpretation and specification have occurred. Therefore, when making the Library publicly available for practical use, one should take care that the models (PSMs) are correctly and consistently described and that there is evidence that they work as specified and claimed.

Having collected the PSMs and expressed them in the same formalism moreover allows a systematic, even exhaustive, comparative approach. Looking at the descriptions of many PSMs, it intuitively appears that they are similar, have overlapping components or seem to reflect common principles (rationales). For instance, the Propose & Revise PSM (Marcus & McDermott, 1989) (see also (Schreiber & Birmingham, 1996)) looks like a specific subtype in (Chandrasekaran, 1990)'s ``Propose, Verify, Critique & Modify'' family of PSMs. Hypothesis discrimination methods in (model based) diagnosis look similar to the verification methods in PSMs for design and for configuration (see also (Breuker, 1994b)). There is no better way to assess the real differences and similarities between the PSMs (components) than by putting them all in the same framework and subjecting them to the same ``tests''. This comparative framework can then also be used to detect and abstract underlying rationales.

These were the initial motives for the CoCo project, of which we describe the first steps and problems in this paper. [1] The major part of the paper is concerned with the design of the empirical studies that are required to obtain insight into the validity and the relative merits, characteristics and requirements of the PSMs (see Section 2).

1.1 What are the questions?

Assessing the validity and quality of a collection of PSMs raises more questions -- and more interesting ones -- than the validation of a single knowledge base or knowledge system. Standard verification and validation procedures are only concerned with the correctness, respectively the effectiveness, of a particular system. Verification addresses the question whether one has constructed the system ``right'', while validation can be captured by the question whether the system is the ``right system'' (Laurent, 1992), (Plant & Preece, 1996). The system is right when it complies with the user requirements. In CoCo the validation questions are not aimed at whether a system (i.e., a PSM) is right for a specific purpose, but at what the scope of effectiveness of a PSM is. In other words, we have to establish the scope of reuse of PSMs. In the literature on reuse one will find only case studies (e.g., (Marcus 1988)) in which a PSM is used in one or two new application domains. This yields only incidental evidence for reuse, from which it may be difficult to generalise. In CoCo we do not vary the domain, but we will be able to show that some PSMs include the scope of other PSMs, i.e., are more generally applicable (reusable), in particular with respect to the types of problems they solve. As a result, we will be able to state more precisely what a PSM is good for. The kinds of questions in this comparative validation and verification study are:
Correct and complete?
The PSMs have been translated and/or specified in CML and errors may have been made.
Effective?
Like any method, PSMs are used to achieve some goal. The question is whether a PSM can effectively be (re)used to obtain the results it is supposed to achieve. In other words, the question is whether the PSM works according to its (claimed) function.
Efficient?
Methods may differ widely in the resources they require. In a comparative study we can assess whether PSMs that yield the same results differ in their demands on resources.
The first question refers to the classical verification problem, while the second (and to some extent also the third) question concerns the validity of the method. As validity implies a correct and complete specification or implementation, the first and major question is about the function of a PSM (see Section 1.2 for more about verification). The function of a PSM states what kind of problems it solves. Given that for all PSMs in the Library some function is already claimed, we will check in the first place whether the PSM effectively does what it is supposed to do. For a hammer or other tool its function(s) may be obvious, but for a PSM this is not so self-evident. It is curious that this question has not yet been an object of research on reuse of PSMs. Usually, the function of a PSM is described by a term that indicates a `generic' task, such as planning, diagnosis etc. For instance, Propose & Revise is claimed to be useful for configuration (Marcus & McDermott, 1989) or ``parametric design'' (Schreiber & Terpstra, 1996). [2] (Clancey, 1985) shows that heuristic classification can be used for (some?) diagnosis problems, but also for assessment problems. GDE, the `General' Diagnostic Engine, finds faulty components (de Kleer & Williams, 1987), but cannot be applied to the typical diagnosis problems that are handled by heuristic classification, e.g., identifying the disease(s) of a patient. However, there exists a very precise (set of) definition(s) of the kind of problems that GDE is able to solve (de Kleer et al., 1992). Besides the fact that the terms are somewhat arbitrarily applied, there is no consensus on what the terms mean. (Breuker, 1994) shows that there are at least three completely distinct meanings of the term ``diagnosis'' (see also (Stefik 1995; p. 671-672), who talks about ``dimensions of variation in diagnosis tasks''). However, the most intriguing problem is that task type and method tend to be described in terms of each other, i.e., tautologically. When PSMs are described as task decompositions, the function of a PSM coincides with its top-level task (Breuker, 1997). We will return to this problem in Section 2.

Validating (testing) systems or devices differs somewhat from validating PSMs, because a PSM is always incomplete. It has applicability conditions and needs domain knowledge to operate. In the specification of reusable PSMs these conditions and the required domain knowledge are called features (in the Library; see also (Steels, 1990)), or requirements and assumptions (Fensel & Benjamins, 1996). The domain requirements often go under the name `method ontology' (Eriksson et al., 1995). The effect (main function) of a PSM can also be viewed and described as an assumption or requirement that is implied by the PSM. For instance, GDE finishes successfully with the identification of one or more components of a system which can be blamed for some malfunction(s) of that system, whereas heuristic classification can be used to classify the malfunctioning of a system, which is assumed to be a diagnosis. GDE therefore requires that `components' are an explicit part of the domain knowledge, while heuristic classification requires that there is some hierarchy of classes of malfunctions. As a PSM consists of subtasks, input/output requirements can be specified for each subtask. In CML these are called roles. Therefore PSMs are riddled with assumptions: not only about what constitutes the outcome in terms of domain knowledge, but about all the roles the PSM requires the domain knowledge to play. Besides this ``method ontology'' (Eriksson et al., 1995) of a PSM with respect to domain knowledge, many more kinds of assumptions and requirements are involved in PSMs, which act as applicability conditions restricting the scope of use (Fensel & Benjamins, 1996). Therefore, we may be concerned more generally with testing the assumptions and requirements of the methods, and not only focus on those that describe the main function. This makes the problem even more complex, as the assumptions and requirements are not of a uniform nature.
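To make the notion of a method ontology a little more concrete, the following sketch (in Python, with invented names; the Library itself specifies PSMs in CML) records a PSM's roles and assumptions and checks which role requirements a given domain ontology cannot supply. It is only an illustration of the bookkeeping involved, not part of the Library.

    from dataclasses import dataclass, field

    @dataclass
    class Role:
        # An input or output role and the type of domain knowledge that must fill it.
        name: str
        required_type: str

    @dataclass
    class PSMDescriptor:
        # Illustrative record of a PSM's method ontology and further assumptions.
        name: str
        input_roles: list = field(default_factory=list)
        output_roles: list = field(default_factory=list)
        assumptions: list = field(default_factory=list)

    # GDE requires explicit components; its output role is filled with faulty components.
    gde = PSMDescriptor(
        name="GDE",
        input_roles=[Role("system model", "component-topology"),
                     Role("observations", "parameter-values")],
        output_roles=[Role("diagnosis", "faulty-components")],
        assumptions=["component behaviour models are available",
                     "faults are localised in components"],
    )

    def missing_requirements(psm, domain_types):
        # Which role fillers can the given domain ontology not supply?
        needed = {r.required_type for r in psm.input_roles + psm.output_roles}
        return sorted(needed - set(domain_types))

    print(missing_requirements(gde, {"component-topology", "parameter-values"}))
    # -> ['faulty-components']: the domain lacks an explicit notion of a faulty component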

The use of assumptions also lies at the bottom of the next validation question: the efficiency of a method. Given two PSMs with the same effect (function), one PSM may use fewer resources (computations, domain knowledge) than the other. In general, there is a trade-off between efficiency and assumptions: making assumptions instead of making inferences makes problem solving cheap. Compiled-out knowledge takes fewer steps than derivations from ``first principles'' to arrive at the same conclusion, etc. Assumptions that make a PSM efficient or even tractable shift the burden of validation to the completeness and correctness of the specific application domain knowledge, and in particular to the credibility of the source of the knowledge. Therefore, the efficiency of a PSM cannot be the sole criterion for preferring it, as its choice may lead to knowledge systems that are less flexible and maintainable than less efficient candidates.

Assumptions are called ``features'' in the Library. However, it is certain that only a small part of the assumptions and requirements of the PSMs are made explicit: most are specified to discriminate between PSMs, which means that only the features that distinguish between PSMs with similar functionality are described. Therefore, an important goal of the CoCo project is to check the PSMs in such a way that assumptions become explicit. This is expected to occur in various ways, as we will explain in the next section.

1.2 Formal vs operational validation

The approved road to answering the verification and validation (V&V) questions -- in particular the questions about correctness and effectiveness -- is by logical formalization. In principle, the Library allows an easy way to verify the consistency and typing of the PSMs. The pivotal specification of a PSM is an inference structure, consisting of inference functions which are connected by input-output roles. For the Library, a set of canonical inference functions was defined and formalised: a typology of inferences was constructed based upon input-output mappings (Aben, 1994; Van Harmelen & Aben, 1996). All PSM specifications were supposed to bottom out in these canonical inferences, so that the well-defined semantics of these building blocks could be used to propagate the consistency checking through the complete structure. However, this guideline was not followed systematically in the construction of the Library. Besides this practical obstacle, this process of checking the correctness of the specification would miss the most problematic component of a PSM: the method ontology (or: the inference-domain knowledge interface of the PSM). The canonical inferences are based upon a very abstract and general ontology (a standard terminological one), which does not capture the specific demands a particular PSM makes on the domain knowledge. Capturing these is only possible when, in the process of V&V, domain knowledge can be used that is sufficiently specific to be subsumed by (any) PSM.

Another way the Library can be evaluated is by translating the CML specifications into (ML)2, as was also foreseen in the KADS-II project. (ML)2 preserves the structure of the expressions in CML, so that there is an easy mapping between the two versions. This mapping allows a semi-automatic initial translation by the use of the Si((ML)2) toolset, which makes the correspondence between a conceptual and a formal specification highly transparent (Aben et al., 1994). In (Van Harmelen & Aben, 1996) the process of translating the ``informal'', but well structured, CML specifications into (ML)2 specifications, and its use in specification validation, is fully described. Structure preserving means in this context that the ``layers'' or types of knowledge that the CommonKADS CML distinguishes -- domain, inference and task -- are identified in both languages. However, within each layer there is no one-to-one correspondence between the syntactic categories of CML and those of (ML)2. For instance, CML distinguishes at the domain layer a number of categories, like concept, property, attribute, relation, expression, etc., while this layer is expressed in (ML)2 by classical FOPL (extended with ordered sorts). The translation here implies a `descent' to less ontological commitment. However, this is not true for the inference layer, which requires a more precise commitment to the domain layer than CML does. The CML inference layer expresses the inferences as functions between input and output roles, and these roles are to be filled with domain knowledge. Therefore, in (ML)2 the inference layer is a logical meta-theory, which also `knows about' roles and their content, i.e., the reflective predicates ask-domain-axiom and ask-domain-theorem are used to establish the connections with the domain layer. For the task layer, which specifies the control of the reasoning, (ML)2 uses dynamic logic. Some `verification-by-refinement' is therefore implied by this translation process: (ML)2 requires one to specify which axioms or theories are to be consumed by a specific inference function. However, in this case too we need domain knowledge to validate the PSMs.

Besides a specification of a (generic) knowledge base of domain knowledge, one needs some dynamics as well. The formal specifications written in (ML)2 can be `animated' by the use of theorem proving procedures. These proof procedures are an instrument to verify and validate (V&V) the specification (see (Van Harmelen & Aben, 1996) for the general picture and a worked-out example). The proof procedures perform consistency checking and the tracing of the validity of the specifications in one and the same general theorem proving exercise. Using this formal route certainly has its advantages, but it still requires a lot of hand-crafted formalization in (ML)2, which is by no means a trivial effort (Ruiz et al., 1994).

Although formalization is certainly the royal road to validation, other procedures are available as well, particularly because PSMs are not quality-critical products such as, e.g., safety systems. A 100% guarantee of correctness and effectiveness is not required, and further refinements are to be expected in the use of the Library itself. The verification procedures can be separated from the validation, which does not necessarily require animation by theorem proving, but can rely on an interpreter, which is for instance the traditional way in which rule-based systems are checked (see e.g., (Murrel & Plant, 1996) for a compact overview). The semantics of production systems are well known, so that a rule base can be checked on the basis of formal, ``syntactic'' properties. Validation means in this context checking the operations of a system against its specification. As a PSM is (also) a decomposition of a goal into subgoals -- specified as the contents of output roles -- a PSM can be validated in an operational way by checking the contents of output roles during execution, given the initial input roles (data; domain knowledge).
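As an illustration of this operational style of validation, the following sketch (Python, with hypothetical names; Cokace itself interprets CML and rule bases) runs a PSM as a sequence of inference steps over a shared store of roles, records the successive role contents, and checks the final output role both against the expected solution type and against a known correct solution.

    def run_psm(inference_steps, roles):
        # Execute each inference in order; every step reads and writes the shared roles.
        trace = []
        for step in inference_steps:
            step(roles)
            trace.append({name: set(content) for name, content in roles.items()})
        return trace

    def validate(trace, solution_role, expected_type, is_correct):
        # Operational validation: inspect the content of the output role after execution.
        final = trace[-1].get(solution_role, set())
        fits_method_ontology = all(expected_type(filler) for filler in final)
        return fits_method_ontology and is_correct(final)

    # Toy example: a 'diagnosis' PSM that should end with faulty windows in its output role.
    def generate(roles):
        roles["hypotheses"] = {"w1-stuck", "w2-stuck"}

    def discriminate(roles):
        roles["diagnosis"] = {"w1-stuck"}

    trace = run_psm([generate, discriminate], {"observations": {"w1 does not resize"}})
    print(validate(trace, "diagnosis",
                   expected_type=lambda x: x.endswith("-stuck"),
                   is_correct=lambda s: s == {"w1-stuck"}))   # True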

Strictly speaking, CML has neither a formal semantics nor an interpreter. The mapping to (ML)2 only provides `semantics-by-association'. For instance, in (ML)2 the domain layer is represented by sorted logic, which does not distinguish the syntactic categories of the official grammar.[3] However, at least three operational versions of CML are available:

Cokace
(Corby & Dieng, 1996) is built on Centaur, an environment which allows one to generate a software engineering environment, including structured editors, type-checkers and interpreters on the basis of (formal) specifications. In this way one can construct environments for classical languages such as Ada, C etc. (Corby & Dieng, 1996) used Centaur to specify CML, resulting in the Cokace environment (see further Section 2).
KLoom
(Coelho & Lapalme, 1996) is an operational version of CML expressed in terms of the Loom knowledge representation system (MacGregor, 1991). The inference and task layer are defined as Loom concepts, while the domain layer is directly expressed in Loom itself.
OCML
(Motta, 1996) stands for Operational CML, i.e., the conceptual modelling language of VITAL, which can be viewed as a dialect of the CommonKADS CML. OCML differs slightly from the latter at the domain layer, where the specification categories are the traditional ones (concepts, relations, and attributes).

Cokace

Cokace was selected for our purpose, primarily for two reasons: (1) it provides a ready-made programming and testing environment for CML specifications, in particular because of its type-checking, and (2) Cokace covers the full CML specification, while the other two systems have some minor variations: in particular, they do not follow the CML domain categories, but have their own (like (ML)2). Additional reasons are that the Cokace environment lends itself very well to a full implementation of the Library, complete with the search and navigation tools required by the design specification of the Library. Finally, mechanisms are under construction that allow interactive search, selection, editing and prototyping of CML specifications via the internet (see http://www.inria.fr/acacia/Cokace).

The Cokace environment is constructed at INRIA and consists of three components (see (Corby & Dieng, 1996) for the details):

  • A structured CML editor for writing syntactically correct CML specifications.
  • A type checker that monitors the consistent use of types in the CML expressions.
  • A rule interpreter that executes CML specifications. The interpreter allows some initial ``validation by prototyping'', i.e., checking whether the CML specification works, and works as intended.
The type checker and CML editor allow us to have a first semi-automatic verification of the CML specification. The interpreter is the major vehicle for validity testing. This testing of the PSMs can only be performed if some domain knowledge is available to operate upon.

Cokace has an apparent disadvantage compared to the two other operational CML environments: the task/inference layer does not operate directly on the CML domain specifications. For each inference, a small knowledge base of rules has to be constructed by hand to make the interpreter run. [4] We assumed this disadvantage to be a relative one. Because of all the interactions one may expect between the PSMs and the domain knowledge, a single knowledge base will probably never suffice to test all PSMs anyway; below we explain this in more detail.

2 Planning the experiments: the CoCo cycle

The CoCo experiments follow the classical paradigm of manipulating inputs and observing the outputs or effects. As we are not simply interested in absolute effects, i.e., whether the supposedly required inputs (including assumptions) are handled by the PSMs as claimed, but also in their relative effects, we need to keep as many factors as possible constant and change only the relevant one, i.e., the PSM. As we will see, this principle cannot be applied in all respects, but we will try to come as close as possible. In the next sections we discuss the issues involved in preparing the inputs: the domain knowledge (Section 2.1.1) and the problem situations (Section 2.1.2). Then, in Section 2.2, we discuss what effects we will observe and how.

2.1 Specifying the inputs

2.1.1 Keeping domain knowledge constant

In order to test or identify functional similarities between PSMs, it is necessary to keep the inputs to these PSMs as constant, or even identical, as possible. PSMs have two kinds of inputs: the dynamic input, consisting of data and the problem statement, and the static input, which is the (generic) domain knowledge. By definition -- because it defines the problem -- the dynamic input has to change with the (kind of) PSM. The domain knowledge, however, can be made as uniform as possible.

The first step is to use one and the same domain for all validation procedures, so that we can reuse the same terms or concepts all the time. Or rather: we can select from the same domain ontology the concepts we need for constructing knowledge bases. By domain ontology we also mean the inclusion of its ontological commitments. As many PSMs use various types of domain knowledge, we may specify these types as generic model specifications. For instance, many PSMs use causal, behavioural, functional, structural etc. knowledge, often in the form of models of the problem situation, or as generic library components from which models of the situation can be constructed. Model-based reasoning PSMs in particular require these kinds of typed generic model knowledge. These generic models are not necessarily ``task neutral'', as they may contain specific interpretations of the relations. For instance, in many abductive PSMs for finding causes of abnormal states, the interpretation of the causal relation is more ``compiled out'' than in the behavioural models that are used in model based diagnostic reasoning. CML distinguishes between ontologies and models, but this is only a nominal distinction: the language categories used are the same for both. [5] These ontologies are definitely not ``task neutral'', and they are probably not meant to be. This is not to say that they are not reusable for a limited range of tasks/problem types.

Knowledge bases are to be constructed from these ontologies and this model knowledge. We will leave aside for the moment the problem of translation into different knowledge representation formalisms. CML specifications have a frame-based, object-oriented flavour, while Cokace uses rules. This translation can be performed largely in a syntactic way, as long as there are explicit semantics, which is to a large extent what ontologies are about and what makes them `portable' (Gruber 1993). Where semantics come in we have a more general problem anyway: an ontology and the knowledge typing cannot be mapped directly onto a knowledge base, at least not in a straightforward manner, if one keeps the notion of ontology distinct from that of a knowledge base, i.e., as task neutral as possible. It has been suggested that the components of CML-specified ontologies can be turned into knowledge bases via relatively simple mapping operations (Schreiber et al., 1995). However, the distance between what is in an ontology and the requirements of a knowledge base, given a PSM, is not necessarily small. There can be two types of these ``knowledge mismatches'':

  • In general, the method ontology does not seamlessly fit the domain ontology (Gennari et al., 1994; Coelho & Lapalme, 1996; Schreiber et al., 1995). In practice this means that the method ontology is specified at a somewhat more abstract level than the domain ontology. It appears that the mapping or ``repairs'' required can be performed by simple database-like operators (Coelho et al., 1996; Visser et al., 1997).
  • Not all knowledge that is used in solving problems in an application domain can be expressed in an ontology (see (Motta et al., 1996) for an illuminating case study). The efficiency of PSMs is based upon assumptions one can make, i.e., upon knowledge that is not to be used -- and verified -- by the problem solver. An important category of these assumptions resides in compiled-out knowledge. Compiled-out knowledge consists of shortcuts in inferences that often have their roots or underpinning in fields that, properly speaking, do not belong to the application domain. This is in particular the case where the level of detail of the domain makes a `quantum leap', i.e., becomes of a completely different nature. For example, medical reasoning may make many shortcuts at the level of (bio)chemistry. Not only lines of inference can be compiled out; the derivation of views may also become problematic. For instance, medical diseases are processes, and a ``correct'' ontology would specify diseases as such. However, for pragmatic reasons these process descriptions can be abstracted or reified as static classes with symptoms as properties. By characterizing processes by properties that make up (pseudo-)taxonomies, e.g., by using a taxonomy of micro-organisms that are the sources of disturbance of the normal physiological processes, one puts a view (= model) on the ontology. [6]
It is obvious that in the latter case the construction of a knowledge base from an ontology (via models) is by no means a straightforward mapping. It may require complex transformations and even additional knowledge acquisition. In the CoCo project we can choose or construct an artificial domain in which these complications in domain modeling and knowledge base construction are avoided. In the domain we have designed, no (relevant) knowledge has to be imported to explain the working of lower grain-size levels (Section 3). However, as the method ontologies of many PSMs will require compiled-out types of knowledge, we may have to transform the domain ontology either by hand and/or by the use of machine learning techniques (in particular, explanation based generalization). It is too early to say whether we will need more, as we have no exhaustive inventory yet of the method ontologies of the PSMs. It is difficult to see how a formal testing procedure could have avoided these problems. Some experience in constructing formal, (ML)2 versions from CML specifications and reusing these suggests similar problems, i.e., only reuse of highly generic parts (Ruiz et al., 1994). [7]
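As an illustration of the first, simpler kind of mismatch, the sketch below (Python; the relation names and facts are invented) renames and projects relations from a domain knowledge base so that they fill the roles a method ontology expects. Anything beyond such database-like operators -- e.g., compiling out intermediate causal steps -- already belongs to the second, harder kind of mismatch.

    # A (hypothetical) domain knowledge base as relation tuples.
    domain_facts = [
        ("causes", "stuck-button", "window-does-not-open"),
        ("causes", "window-does-not-open", "blank-screen-area"),
        ("part-of", "button", "frame"),
    ]

    def rename(facts, old_rel, new_rel):
        # Map a domain relation onto the relation name the method ontology expects.
        return [(new_rel, a, b) if rel == old_rel else (rel, a, b) for rel, a, b in facts]

    def project(facts, rel):
        # Select only the tuples needed to fill a particular role.
        return [triple for triple in facts if triple[0] == rel]

    # An abductive PSM may only require some directed 'explains' relation;
    # 'causes' is renamed and projected to fill that role.
    method_view = project(rename(domain_facts, "causes", "explains"), "explains")
    print(method_view)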

A small complication in the use of Cokace as our testing environment is the fact that the CML task/inference layer does not operate on one single knowledge base. Each inference requires its own knowledge base. This means that the knowledge has to be distributed in advance over these inferences. An additional effect is that we will also be able to study the reusability of specific domain inference knowledge. We expect this reusability to be rather limited, as the KADS inferences define formal functions rather than the content on which they operate. Figure 1 summarizes the specification steps for the static inputs (domain knowledge) of our experiments.

Figure 1: Dependencies (steps) in constructing knowledge bases for the CoCo test.

2.1.2 Generating tests

The dynamic input of a PSM consists of a problem situation description, which can be summarised into a problem statement and data. The problem statement is a generic description of the solution, for instance a design, a diagnosis etc. (Breuker, 1994b). The data consist of values for domain parameters. Given the type of problem and the PSM, the set of these parameters is fixed. [8] An ideal, exhaustive test would consist of all possible combinations of values for this set. This is impossible, not only because of the combinatorics involved, but also because the values may come from continuous parameters. Therefore we need to use qualitative values. As in qualitative reasoning, we may also look for landmarks in the ranges of the values, or perform sampling operations. Obviously, given that the set of PSMs is already large -- a few hundred -- we have to be as parsimonious as possible in generating the inputs for the PSMs.
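The sketch below (Python; the parameters and landmark values are invented for illustration) shows the intended style of test generation: each domain parameter is reduced to a few qualitative landmark values, and instead of enumerating the full cross product, a sample of combinations is drawn.

    import itertools
    import random

    landmarks = {
        "width":          ["0", "small", "screen-width"],
        "height":         ["0", "small", "screen-height"],
        "display-status": ["open", "closed"],
        "overlap":        ["none", "partial", "complete"],
    }

    def all_cases(parameters):
        # Exhaustive combination of landmark values (only feasible for tiny sets).
        names = list(parameters)
        for combination in itertools.product(*(parameters[n] for n in names)):
            yield dict(zip(names, combination))

    def sampled_cases(parameters, k, seed=0):
        # Parsimonious alternative: draw k cases instead of the full cross product.
        random.seed(seed)
        cases = list(all_cases(parameters))
        return random.sample(cases, min(k, len(cases)))

    print(len(list(all_cases(landmarks))))    # 54 combinations, even for this toy set
    for case in sampled_cases(landmarks, 3):
        print(case)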

Assessing the interaction hypothesis

This (exhaustive) testing of the PSMs also implies an experiment on the exhaustive reuse of a domain ontology. The expectation that several, probably many, knowledge bases have to be constructed to test all our PSMs is a consequence of the famous interaction `hypothesis', which states that knowledge is geared to its use (Chandrasekaran, 1987). Although there is no doubt that this interaction between knowledge and task/method exists, it is not clear to what extent it holds. First there is the problem of what is meant by `knowledge'. We assume here that knowledge as specified in an ontology, i.e., the terminology of a domain, is use-neutral, but also in no way directly usable by a PSM. By restating the interaction hypothesis as saying that all operational knowledge is geared to its use, we make it almost trivial and tautological. Therefore, an operational definition of the interaction effect is that not all knowledge that can be used in one PSM can be reused in another PSM. That is exactly what the CoCo experiment is going to find out. A side effect of this project is therefore the assessment of the size and nature of the interaction effect. In KADS the interaction effect was only recognized late (at the end of the 80s) and assumed to be of relatively little importance ((Wielinga et al., 1992) talk about the ``limited interaction hypothesis''). It was assumed that a large part of the interaction effects were due to sloppy modeling and knowledge acquisition, which took expert reasoning as its first guideline and paid no attention to the underlying, reusable domain principles, much in the spirit of (Clancey, 1985). At the other extreme we may (still?) find the Generic Task approach, hypothesizing that no reuse should be possible if the PSMs have nothing in common. In the 90s this potential controversy was not followed up, but it is clear that an assessment of this effect may have large implications for the expected revenues of reusing domain knowledge for problem solving.

2.2 What is to be observed?

Validation implies a match between intended behaviour and actual behaviour. The behaviour that is of main importance for a PSM is its `final state'. In teleological terms we want to see whether the intended effect coincides with the actual result of the execution. However, as briefly explained in the introduction, the specification of the intended effects is sloppy. The terms used have only a common sense meaning and are often ambiguous. In designing the Library, we have tried to give a more precise meaning to these terms and to relate these terms in such a way that they become exclusive and cover the full range of types of problems (Breuker, 1994b; Breuker, 1994a). This ontology is not to be represented as a taxonomy. It turns out that problem types are dependent on one another. A device can only be diagnosed when its design (structure description) is known. A prediction can only be made when input values have been assigned, etc. Moreover, it turns out that these dependencies between problem types can also be observed within PSMs. For instance, the verification of a design involves the specification of input values (test generation) and comparing the predicted output values with the required values, etc. In Figure 2 these dependencies are shown, while Table 1 summarizes the (still informal) definitions of the types.

type of problem                       generic solution
modelling                             separating system from environment
design                                structure of elements
planning/reconstruction               sequence of actions
assignment/scheduling/configuration   distribution/assignment of objects or values
prediction                            state of system, value of parameter
monitoring                            predicted or discrepant states
diagnosis                             faulty elements
assessment                            class/grade attribution
Table 1: Problem types are characterized by the generic solution they aim at.
Figure 2: Dependencies between types of problems: a problem type requires the solution of its dependent problems. The structural view focuses on the internal structure of a system; the behavioural view focuses on the interactions between system(s) and environment.

However, we are not only interested in the final state and in matching it against the problem type. That match can in many cases also be accomplished by a static verification of the kind of domain knowledge that is supposed to fill the solution role of a PSM; in fact this is what Cokace can easily achieve for these cases. In operational validation we can be more precise:

  • We also want to check whether individual solutions are correct ones. In complex, real-life domains it is often not possible to specify correct solutions in advance, and only pseudo golden standards can be elicited from domain experts. In our case, however, we can keep the domain relatively simple and transparent, so that the correctness of a solution can easily be checked or obtained in an exhaustive way (i.e., by very simple theorem proving). In fact, in the domain proposed -- a window world -- the solutions can be demonstrated in a pseudo-Wittgensteinian way (Section 3.1).
  • To compare PSMs it is not sufficient to check whether they obtain correct solutions; we should also be able to identify to what extent these solutions are obtained in the same manner: the dynamics should be compared as well. These can be read from the successive states (contents) of the dynamic roles, which are the temporary data stores of the problem solver. In fact, we are not only interested in the effect, but also in assessing the specific method ontology for each role.
  • As the effect is still a major index for reusing a PSM, we also want to establish the scope of a PSM. Can we reuse heuristic classification for identifying a defective component, as (Clancey, 1985) claims for the SOPHIE-III system? Can Cover & Differentiate (C & D) indeed be used for all types of problems, as (Duursma, 1992) argues, or can it only generate explanations in terms of semi-causal networks? Therefore, we will also apply PSMs to problem types beyond those for which they were identified by the original authors. In this way we can assess part of the competence of a PSM. Note that implicitly we may also stretch the specification of the full method ontology: not only that of the last output role. However, this will not occur in the same systematic way as for the effects. Of course, in many cases the violation of the method ontology will simply lead to a stop or crash, which is not very informative. To extend the test to its limit we may have to stretch the terms in which the method ontology is specified to more ``task neutral'' terms, as proposed by (Beys et al., 1996). The causal path in C & D may be reduced to its symbolic version, i.e., any directed graph, so that knowledge structures that do not satisfy this specific knowledge level characterization may also be handled adequately (Breuker, 1997).
Assessing the dynamics by comparing what happens to the intermediary roles, instead of only looking at what is left in the last role slot, also provides an empirical test of the hypothesis that has led to the construction of the suite of problem types: that the dependencies within a PSM follow the suite. It sounds plausible that the testing procedures for selecting a good and correct design or plan follow the paths that lead to assessment and/or diagnosis, but we need far more detailed and systematic observations to support or reject this hypothesis. A straightforward static comparison between PSMs is obscured not only by the fact that the terms for roles and subtasks are common-sense and often arbitrary ones, but also by the fact that the PSMs in the literature are presented as decompositions, so that the dependencies are only visible in an indirect way. In CoCo we can operationalize the content of the dependencies by reference to the same domain entities.
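A minimal sketch of such a comparison of dynamics is given below (Python; the role names and traces are invented): the successive contents of the dynamic roles of two PSMs are logged, and the proportion of shared (role, content) states gives a crude measure of how far they solve the problem in the same manner.

    def trace_overlap(trace_a, trace_b):
        # Each trace is a list of snapshots: role name -> set of fillers at that step.
        states_a = {(role, frozenset(v)) for step in trace_a for role, v in step.items()}
        states_b = {(role, frozenset(v)) for step in trace_b for role, v in step.items()}
        shared = states_a & states_b
        return len(shared) / max(1, len(states_a | states_b))

    # Two invented traces that pass through the same 'hypotheses' state.
    t1 = [{"hypotheses": {"w1-stuck", "w2-stuck"}}, {"diagnosis": {"w1-stuck"}}]
    t2 = [{"hypotheses": {"w1-stuck", "w2-stuck"}}, {"candidates": {"w1-stuck"}}]
    print(trace_overlap(t1, t2))   # 1/3: one shared state out of three distinct states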

2.3 Summarizing the CoCo cycle

In the next section the first steps of the CoCo project are presented: the specification of a screen windows world by means of an ontology, some model fragments and some examples of how problems can be specified in this world. We do not expect to have this world constructed ``right'' from the start. Therefore we foresee that we had better start the testing with try-outs and explorations, in which actual use may teach us about the behaviours of this small virtual world. This ``formative'' way of testing is expressed in Figure 3 as a cycle.

Figure 3: The CoCo validation cycle of problem solving methods (PSM).

3 The design of a domain

In this section we describe the first steps in the CoCo validation cycle and some of their results. As stated above, to ensure a good comparison between PSMs, and in particular to find similarities and overlapping components between them, we need to apply the methods to the same domain. The choice of such a domain is crucial; below we first discuss its requirements. Then we propose a domain: windows in graphical user interfaces.

3.1 Clean windows

Some of the requirements for the testing domain have been addressed implicitly in the previous sections. Here we review these and others:
Meaningful problem situations
The domain should enable the specification of all types of problem situations in a meaningful way. If the domain ontology can be mapped onto the full range of method ontologies, the domain is suitable for exhaustive testing of all PSMs. Although one may believe that as a matter of principle all problem types in some way apply to all subject domains, it is obvious that many natural domains have typical problems. For instance, in electro-mechanical domains, planning problems are more difficult to specify than design or diagnosis problems.
No functional bias
Related to the previous requirement, we need a domain whose ontological specification does not introduce functional biases, and is really ``task neutral''. Most natural domains do not satisfy this requirement. This problem is well exemplified in the Sisyphus-II studies (Schreiber & Birmingham, 1996). The ontology prepared by (Gruber et al., 1996) that was the basis for this study can only be reused for configuration (assignment) tasks, despite all efforts to make it as task-neutral as possible (see also the comments of (Motta et al., 1996) on this issue). For instance, the ontology assumes that the spatial and mechanical design of the elevator has been fixed and can therefore be reified to simple parameters. Therefore it is impossible to use the VT domain ontology as a basis for design, planning and diagnosis problems.
Transparent
The domain knowledge should be as transparent as possible, i.e., its features should be explicit. This requirement can be satisfied in the first place by making ontological commitments explicit by specifying which top ontologies are included. However, if we are not completely explicit or correct, in the formative evaluation cycles hidden ontological commitments may become apparent during the construction of the various knowledge bases.
Simple
The domain knowledge should be represented as accurately as possible. There should be no hidden, low-level layers that may give rise to unexpected interactions (Simmons, 1992), and there should be no levels of aggregation that imply qualitative shifts in conceptualisation, as e.g., in going from physical descriptions to chemical processes. This means that the domain should be rather simple. Simplicity is of course also associated with transparency. However, by keeping the domain simple, we may not be able to assess the efficiency of PSMs. As PSMs differ in particular in the way they handle the complexity of most natural domains, this puts a real limit on our original goals. We may not be able to assess in an easy way how PSMs scale up, other than by formal analysis -- the classical way -- or by augmenting the complexity of the domain in significant steps. The latter option looks simple and attractive, but it adds an independent variable to the design of CoCo, which increases the required effort by at least a factor of two. [9] Probably a more efficient way to assess the efficiency of PSMs is by sampling extremes and using the inclusion relations between the assumptions of methods to derive qualitative metrics for efficiency.
Observability
The effects of the execution of the PSMs should be easily inspectable. For instance, visualisation allows an easy grasp of what changes and what remains persistent in a world, as our perceptual apparatus for visual input is tuned to comparing complex patterns and distinguishing differences and changes.
These requirements fit typical toy domains used in AI, like the blocks world. A disadvantage of the blocks world is that it does not satisfy the first requirement. The blocks do not exhibit active processes, unless we add a world of physics that includes gravity; the blocks world is not very ``lively'': it is a typical steady-state domain. For instance, it is difficult (but not impossible) to see what kind of ``devices'' can be constructed from connection blocks (design problems), or how blocks can be defective components. Therefore, we rather create a virtual blocks world in the machine, in which the blocks themselves can be active elements. Such a blocks world can easily be found in the basic notions underlying computer windowing systems. When we abstract from the content of these windows and their underlying management systems and exclusively specify what happens on the screen, we have a simple world of ``two-and-a-half'' spatial dimensions, where the windows themselves have processes which can affect their own spatial behaviour and/or that of other windows. The windows can be programmed as minimal `agents'.

The PARC Windows user interface has been the leading paradigm for graphical user interfaces (GUIs) for more than a decade, and with good reason. It exploits an intuitive analogy between information structure and graphical layout. Our abstract domain is about these windows. We strip away all additional features and end up with a domain as abstract as the blocks domain. For instance, our windows have no scrollbars, as they have no content: they are really transparent, ``clean windows''. The clean windows are empty information containers, which have the capacity to open/close, move and change size. These actions can be triggered via buttons, and the buttons can be connected so as to propagate the initiation of specific actions. In this way all states are visible on the screen. We are implementing this relatively simple domain in such a way that it can be interfaced (via X-Windows) to Cokace, so that all problem solving states can be directly mapped onto the screen. In the next section we present the ontology that is the specification both of Clean Windows and of the domain knowledge for CoCo.
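The sketch below (Python; the class and method names are ours, not part of the actual implementation) illustrates the behavioural core of Clean Windows: contentless rectangles that can open, close, move and resize, and buttons whose connections propagate triggers to windows, turning them into minimal agents.

    class CleanWindow:
        # A contentless window: only its spatial state and display status matter.
        def __init__(self, name, x, y, width, height):
            self.name, self.x, self.y = name, x, y
            self.width, self.height = width, height
            self.display_status = "closed"

        def open(self):
            self.display_status = "open"

        def close(self):
            self.display_status = "closed"

        def move(self, dx, dy):
            self.x, self.y = self.x + dx, self.y + dy

        def resize(self, dw, dh):
            self.width, self.height = self.width + dw, self.height + dh

    class Button:
        # A controller: pressing it propagates operations to the connected windows.
        def __init__(self, label, connections):
            self.label = label
            self.connections = connections   # list of (window, operation name, arguments)

        def press(self):
            for window, operation, args in self.connections:
                getattr(window, operation)(*args)

    w1 = CleanWindow("w1", x=0, y=0, width=200, height=100)
    open_button = Button("open w1", [(w1, "open", ()), (w1, "move", (10, 10))])
    open_button.press()
    print(w1.display_status, w1.x, w1.y)   # open 10 10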

3.2 An ontology for a window world

This section describes the ontology we constructed for the CoCo project and, as an example, relates it to one of the problem types to which it should be mapped. The CML specification is not included for practical reasons; it specifies nothing that cannot be expressed in the textual description, and we hope to save some trees by omitting meaningless syntax. An overview of the most important concepts in the windows domain is given by the generalization tree in Table 2. The subtypes imply disjoint, but not necessarily exhaustive, membership, and the features that differentiate between these subtypes will all be mentioned in the description.

individual
    m-individual
        region
            screen
            device
                frame
                    window
                component
                    display
                        fixed-size-display
                            icon
                            label
                        variable-size-display
                            text-view
                            image-view
                    controller
                        button
                            label-button
                            icon-button
        application
        state
        event
            operation
                open
                close
                resize
                move
                trigger
            action
                press
    parameter
        boundary
            x-boundary
            y-boundary
        stimulus
    connection

Table 2: Main concepts in the GUI domain.
Any object visible on a screen can be described by its spatial and temporal extents: the region it occupies in space and the interval it occupies in time. Three objective measurement scales exist: the x-axis, the y-axis and time. If an object is furthermore part-of another object, it is inside that object or equal to it, in both a spatial and a temporal sense. Composite objects on the screen can be characterized by their cohesion over time. Cohesion can be axiomatized by the connectedness of devices in spatial terms. A design may put constraints on both the cohesion (layout topology) and the spatial extension (fixed or minimal/maximal extension) of objects. Cohesion in this sense is a variety of causal connection; if x and y are connected in this way, moving x might for instance cause y to move in the same direction and with the same displacement. Since these views are quite common, this ontology reuses the mereology and topology definitions of the PHYSSYS ontology (Borst & Akkermans, 1997) and applies Allen's thirteen primitive relations between intervals (e.g., Allen and Kautz in (Hobbs & Moore, 1985)) to both time and space.

The primary entities visible on the screen usually go by the name of objects or devices in GUI designers' parlance. Since object is too general and already has a fixed meaning in CML, we adopt the name device as a general term for the window-system and its subsystems. The notion of a device in a programming environment does not correspond directly to that of a physical device. The most notable difference is that devices on the screen are not easily described in terms of distinguishable and static inputs and outputs. The basic meaning, however, is fixed: a device is some man-made artifact that shows some useful behaviour. Note that the screen is not a device. The screen cannot be designed and is therefore for all practical purposes not a device, but something like a place: an environment in which the two-dimensional spatial extension of devices makes sense. The screen itself and the devices displayed on it share some parameters inherited from the concept region; both are rectangular, observable spatial entities. Rectangularity implies that regions can be described with exactly two x-boundaries and exactly two y-boundaries; we call these their x-start, x-end, y-start and y-end. The region is subject to the constraints x-start =< x-end and y-start =< y-end. Two other parameters of regions, width and height, are derived from the boundaries and can be defined by the constraints width = x-end - x-start and height = y-end - y-start. Other derived measures, like area and aspect ratio, spring to mind, but these are not actually used to classify types of regions.
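The boundary constraints and derived parameters described above can be summarised in a small sketch (Python, for illustration only; the ontology itself is specified in CML):

    class Region:
        # A rectangular screen region described by two x-boundaries and two y-boundaries.
        def __init__(self, x_start, x_end, y_start, y_end):
            assert x_start <= x_end and y_start <= y_end   # the region constraints
            self.x_start, self.x_end = x_start, x_end
            self.y_start, self.y_end = y_start, y_end

        @property
        def width(self):
            return self.x_end - self.x_start   # derived parameter

        @property
        def height(self):
            return self.y_end - self.y_start   # derived parameter

    r = Region(10, 110, 20, 70)
    print(r.width, r.height)   # 100 50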

Boundaries, and differences between boundaries of the same type, can be ordered on a dimension (measuring scale). All obvious ordering relations apply to boundaries (=, >, <, =<, >=). Rectangles are usually defined as polylines, which are in turn defined by points, but this solution is more complex and ignores a concrete fact about screens: screens can only display rectangles. A point, or more accurately a pixel, is a type of region subject to the constraints x-start = x-end and y-start = y-end. It thus has a width and height of 0, but would still be a rectangle on the screen. Our regions and pixels are "places" according to Hayes (e.g., in (Hobbs & Moore, 1985)), not positions. A region defines two intervals in two dimensions (x and y), to which all thirteen primitive relations between intervals identified by Allen and Kautz (e.g., in (Hobbs & Moore, 1985)) apply for each single dimension. Note that a qualitative calculus based on these primitives does not always suffice for reasoning in two dimensions as it does in one dimension; numbers are assigned to boundaries in order to prevent ambiguity. A region includes its boundaries as its starting (s) and finishing (f) subintervals. The boundary interval represents a row of pixels on the screen and cannot be decomposed any further. The width and height intervals of the screen define what is commonly known as its resolution. Furthermore we have a 2.5th dimension: the order in which devices should be displayed.
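As an illustration of how the interval relations apply per dimension, the sketch below (Python; the relation names follow Allen, the function itself is ours) classifies the relation of one region's x- or y-interval to another's. Eight of the thirteen relations are named explicitly; the remaining five are reported as inverses.

    def allen_relation(start_a, end_a, start_b, end_b):
        # Relation of interval a to interval b along one axis (x, y or time).
        if end_a < start_b:                                          return "before"
        if end_a == start_b:                                         return "meets"
        if start_a == start_b and end_a == end_b:                    return "equal"
        if start_a == start_b and end_a < end_b:                     return "starts"
        if end_a == end_b and start_a > start_b:                     return "finishes"
        if start_a > start_b and end_a < end_b:                      return "during"
        if start_a < start_b and end_a > end_b:                      return "contains"
        if start_a < start_b and start_b < end_a and end_a < end_b:  return "overlaps"
        # The remaining five cases (after, met-by, overlapped-by, started-by,
        # finished-by) are the inverses of the relations above.
        return "inverse of " + allen_relation(start_b, end_b, start_a, end_a)

    # A window placed inside a frame is 'during' the frame on the x-axis:
    print(allen_relation(20, 80, 0, 100))   # during
    print(allen_relation(0, 50, 50, 120))   # meets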

Regions All regions are mereological individuals (m-individuals). The difference between screens and devices is that the screen cannot be a proper-part-of a device, while devices can be proper-part-of a screen. The screen is thus composed of devices. The binary relation s-proper-part-of is a one-to-many relation between a screen and devices. Another difference is that a device has one or more stimulus (boolean) parameters and a named parameter display-status, with the value set {open, closed}, its value depending on whether the device is at present visible on the screen or not. At least some devices can be decomposed. To this end serves the distinction between frames, which have other devices as their proper parts, and components, which cannot have any parts (simple-m-individuals). The relation between a frame and its parts is modelled by the one-to-many relation f-proper-partition-of. A frame must have at least one device as its part, and only frames can be directly s-proper-part-of the screen. The enumeration of the proper partitions of the frame is exhaustive (and of course disjoint), meaning that any arrangement of devices can be assembled in a frame. For now we skip the window and continue with components.

Controllers and displays Another important differentiation is that between controllers and displays. Exactly one action acts on a controller, and this action will usually trigger operations, through a stimulus, that change the state of the Clean Windows GUI (dummy controllers are not disallowed by this definition). Controllers have a parameter control-status. Displays achieve the function of displaying some type of information, e.g. a text or image. Controllers always display some type of information indicating the consequences of manipulating them. Most often this is text, since actions are notoriously hard to picture. The only controllers we are concerned with are buttons. Buttons add the value set {pressed, depressed} to control-status and realize the action press. Full dialog including e.g. line- and text-editors is not considered for this domain, since it is too complex and irrelevant. If, for instance, we consider a standard "open file" frame, we will see a line-editor, a list of choice-item buttons displaying filenames and a button displaying "open" or something like that (and a "cancel" button that closes the frame). Clean Windows will not react to the selection of a filename and, if the "open" button is pressed, will either open some type of window or do nothing except close the frame. Editor components can be pressed and will become "active" wrt. keyboard events, but this event never changes the interface. The application events are irrelevant to the window behaviour in which we are interested. In the kind of applications we will design, all events have direct visible consequences on the screen.

Fixed vs variable The last differentiation we make wrt. components is that between fixed-size and variable-size components. Fixed-size components always have a predefined width and/or height and thus constrain the assignment of x-start, x-end, y-start and y-end. Buttons are always of fixed size wrt. both width and height. A text-view, on the other hand, does not have to display the entire text. Optimality requirements may however specify a preferred or minimal size. Note that width and height more or less correspond with the "duration" of a temporal interval in Allen and Kautz (in Hobbs & Moore, 1985). Fixed-size components cannot be resized. The leaf component concepts, like label and icon, are opaque to this ontology: they are not decomposed or analysed any further.

Application The application concept is similar to the frame and has only recently been added to disambiguate the frame. The application is an m-individual that has a parameter name (a string) and serves as a wrapper for coherent, possibly reusable ("case knowledge") designs or for classes of known systems to be used in classification problem solving. Behaviour, state and layout of frames in relation to each other can be encapsulated in application descriptions. An application contains at least one region; this is modelled with the one-to-many relation a-proper-part-of. The difference between frames and applications is that applications are not necessarily rectangular: an application cannot be described in terms of boundaries.

Mereology extensions The relations s-proper-part-of and f-proper-partition-of imply a transitive layout relation: inside. A region that is contained by another region is inside that other region. If a component is f-proper-partition-of a frame that is s-proper-part-of the screen, the component is r-proper-part-of the screen and thus inside the screen. Insideness is transitive. The screen as well as the frame could be a-proper-part-of an application, and the component would then be transitively proper-part-of the application while not being inside it. Obviously, the proper-part-of relation is irreflexive and asymmetric. The proper-part-of generalization tree is shown below in Table 3 with the appropriate domains.

proper-part-of (m-individual, m-individual)

-       r-proper-part-of (region, region)

-       -       s-proper-part-of (screen, device)

-       -       f-proper-partition-of (frame, device)

-       a-proper-part-of (application, region)

-       p-proper-part-of (process, event)
Table 3: decomposition relations in the GUI domain
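As an illustration of how the inside relation can be derived from the concrete part-of tuples, consider the following sketch. The relation names follow Table 3; the Python encoding, the (container, part) tuple ordering and the example individuals are our own assumptions.

def inside_pairs(s_proper_part_of, f_proper_partition_of):
    """Derive (part, container) pairs of the transitive inside relation.
    a-proper-part-of is deliberately excluded: applications are not regions
    and therefore do not induce insideness."""
    direct = set(s_proper_part_of) | set(f_proper_partition_of)   # (container, part)
    inside = {(part, container) for container, part in direct}
    changed = True
    while changed:                                # naive transitive closure
        changed = False
        for a, b in list(inside):
            for c, d in list(inside):
                if b == c and (a, d) not in inside:
                    inside.add((a, d))
                    changed = True
    return inside

# screen > window > {icon, label, canvas}, as in the window model used later
pairs = inside_pairs({("screen", "window")},
                     {("window", "icon"), ("window", "label"), ("window", "canvas")})
assert ("icon", "screen") in pairs   # the icon is transitively inside the screen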
Layout Layout constraints have thus far been mentioned only indirectly. The transitive and asymmetric relation inside has already been mentioned as a layout consequence of an r-proper-part-of relation between two appropriate regions. A number of other layout relations between regions share this feature of transitivity: left-of, right-of, above, below, covers, is-covered-by. The symmetric relations outside and overlaps are not transitive. All layout relations are subtypes of l-related, and some l-relation always applies to any two regions. L-related is irreflexive. Layout relations reflect obvious constraints on the defining boundaries of the regions involved. Some of these constraint definitions are quite complex, like e.g. overlaps, but their meaning is clear and can be sorted out with a sheet of paper.
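To give a flavour of these boundary constraints, the sketch below spells out one possible quantitative reading of a few layout relations (the full generalization hierarchy follows in Table 4). The function names mirror the relation names; the encoding of regions as (x-start, x-end, y-start, y-end) tuples and the assumption that the y-scale runs downwards over the screen are ours.

def left_of(a, b):
    # x:(< m) -- the x-interval of a lies entirely before (or meets) that of b
    return a[1] < b[0]

def above(a, b):
    # y:(< m) -- assuming y increases downwards, as usual on screens
    return a[3] < b[2]

def inside(a, b):
    # x,y:(= s d f) -- every boundary of a lies within the corresponding interval of b
    return b[0] <= a[0] and a[1] <= b[1] and b[2] <= a[2] and a[3] <= b[3]

def outside(a, b):
    # x,y:(< m mi >) -- the regions share no pixel
    return a[1] < b[0] or b[1] < a[0] or a[3] < b[2] or b[3] < a[2]

def overlaps(a, b):
    # the regions share pixels, but neither is inside the other
    return not outside(a, b) and not inside(a, b) and not inside(b, a)

# a region is a tuple (x-start, x-end, y-start, y-end); values are illustrative
window, icon = (0, 199, 0, 99), (0, 15, 0, 15)
assert inside(icon, window) and not overlaps(icon, window)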

Only when devices overlap or are inside each other does a third (or 2.5th) display dimension come into play: constraints on the order in which devices should be displayed must be imposed. If a device is r-proper-part-of another device, the containing device is displayed first. If two devices merely overlap, no such criterion can be specified; in that case a device covers the other simply because it has been displayed last. Covers only adds asymmetry to the symmetric overlaps relation. The generalization hierarchy in Table 4 shows these layout relations. All concrete (leaf) relations are asymmetric. The meaning of the l-relations wrt. Allen and Kautz (in Hobbs & Moore, 1985) is added as a disjunction of applicable primitives on the x-axis and y-axis. Note that the rationale of the hierarchy is based on quantitative constraints, not on these primitives. The meets (m mi) primitive has a specific meaning with respect to boundaries between intervals: m(region1, region2) implies (x-end of region1 - x-start of region2 = -1).

l-related {x,y:(= < > s si d di f fi m mi o oi)}

-       outside {x,y:(< m mi >)}

-       -       above {y:(< m)}

-       -       -       above-attached {y:(m)} 

-       -       below {y:(> mi)}

-       -       -       below-attached {y:(mi)}

-       -       left-of {x:(< m)}

-       -       -       left-attached {x:(m)}

-       -       right-of {x:(> mi)}

-       -       -       right-attached {x:(mi)}

-       inside {x,y:(= s d f)}

-       has-inside {x,y:(= si di fi)}

-       overlaps {x,y:(o oi)}

-       -       covers (inv is-covered-by) {x,y:(o oi)}
Table 4: some layout relations in the GUI domain
Topology extensions Although the layout relations can be very helpful during design, we distinguish between layout relations, which serve as a descriptive aid, and connections, which actually define coherence between devices. Connections connect boundaries. Again the meets (m mi) primitive plays a pivotal role. Table 5 shows the available connections. Connections mediate causality and are fixed over time. B-connections are asymmetric. The difference between the values of the connected boundaries remains constant at -1, 0 or 1, depending on the type of connection.

connection (parameter, parameter)

-       b-connection (boundary, boundary)

-       -       eq-b-connection (boundary, boundary) (=)

-       -       m-b-connection (boundary, boundary) (m)

-       -       mi-b-connection (boundary, boundary) (mi)

-       eq-connection (parameter, parameter) (=)
Table 5: topology in the GUI domain
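The invariant maintained by a b-connection can be stated very compactly. In the sketch below we assume, in line with the meets definition above, that an m-b-connection fixes the difference boundary1 - boundary2 at -1, an mi-b-connection at 1 and an eq-b-connection at 0; the Python rendering is ours.

REQUIRED_DIFFERENCE = {"eq-b-connection": 0, "m-b-connection": -1, "mi-b-connection": 1}

def connection_holds(kind: str, boundary1: int, boundary2: int) -> bool:
    # connections are fixed over time: the difference between the
    # connected boundary values must remain constant
    return boundary1 - boundary2 == REQUIRED_DIFFERENCE[kind]

# region1 ends on the pixel row just before region2 starts
assert connection_holds("m-b-connection", 99, 100)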
Assignment example To show how a PSM should exploit this kind of domain knowledge we create an assignment problem for the layout of a window design model. The Assignment problem type is characterized by two disjoint sets of elements, where each element of one set, the demand set, must be assigned to exactly one element from the other set, the supply set, satisfying given requirements while not violating given constraints. The assignment problem type is covered by Sundin (1994) in the CommonKADS Library document. Assignment is related to parametric design/configuration. The difference between these two problem types is apparently that in the latter problem type some trivial redesign can be included, affecting the assignment sets, and that the supply set can be further carved up into disjoint sets of possible candidates or hypotheses for each demand element. In order to cast a problem into the assignment mold we must show how it involves two disjoint sets and how the solution to the problem is characterized as a set of relation tuples between elements of the two sets, where each element of one set necessarily participates in a relation tuple.
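In other words, a candidate solution is a mapping from the demand set into the supply set that violates none of the constraints. Below is a minimal sketch of such a solution check; the representation of constraints as predicates over the mapping, and the toy instance, are our own choices.

def is_assignment_solution(demand, supply, assignment, constraints):
    """assignment maps each demand element to exactly one supply element."""
    if set(assignment) != set(demand):                  # every demand element participates
        return False
    if not all(value in supply for value in assignment.values()):
        return False
    return all(constraint(assignment) for constraint in constraints)

# a toy instance: two parameters, three candidate values, one constraint
demand = {"y-start(icon)", "y-start(label)"}
supply = {0, 1, 2}
equal_tops = lambda a: a["y-start(icon)"] == a["y-start(label)"]
assert is_assignment_solution(demand, supply,
                              {"y-start(icon)": 0, "y-start(label)": 0}, [equal_tops])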

In the window ontology it may not be immediately obvious how a non-trivial assignment problem can be specified. We do however have an example of an assignment problem that is as complex or as simple as we want, depending on which knowledge and assumptions we use to solve it. The configuration of a window gives us all the ingredients we need to formulate an example of an assignment problem. Our standard window concept is visualized in Figure 4 below. We assume that a model of this device has been designed. It specifies that we have a window that is s-proper-part-of a screen and has a number of devices as its proper partitions: an icon-button (to close), a label to display the title and a device that is not further specified (which we will call the canvas). Two out of three parts of the window are fixed-size. The concepts screen and device introduce a number of integer-valued parameters describing a two-dimensional region: x-start, y-start, x-end, y-end. The relations between the screen and devices, as well as between devices and devices, introduce a number of constraints on the values these parameters can take. Constraints come from the following sources:

  • The four devices are all inside the screen.
  • The three devices part-of the window are all inside the window frame.
  • Two of the parts put constraints on width and/or height. The icon has both width and height fixed. The label has a fixed height.
The graphical layout of the window model introduces additional constraints. The close-icon is always at the top-left of a window and both the title and the close-icon are at the top of the window. The three parts never overlap. A good window satisfies at least the following constraints in this ontology:
  • The devices inside the window do not overlap.
  • The title and icon are above the canvas.
  • The x-start and y-start of the icon have an eq-b-connection with the x-start and y-start of the window.
  • The y-start of the label has an eq-b-connection with the y-start of the window.
  • The y-end of the label has an eq-b-connection with the y-end of the window.
  • The icon is left-attached to the label (an m-b-connection).
We have a bag of parameters: the boundaries of the screen, the window, the icon, the title and the canvas, which number 5*4 = 20 in total. We have two bags of boundary values (integers) that can be assigned to these parameters, the value sets, and we have a number of constraints specifying that the values assigned to a device or to the screen form an actual description of a region, i.e. x-start =< x-end and y-start =< y-end for all devices and the screen. Furthermore we have the constraints already summed up above. Although this problem is very simple, we have already assembled 37 constraints on the values that the parameters can take, not counting the no-overlap constraints. The other constraints already exclude the possibility of overlap between devices. A requirement may be, for instance, a preferred size of the canvas. A graphic representation of the spatial constraints on the devices of interest is given in Figure 4.
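The sketch below assembles a representative subset of this constraint problem as Python predicates over a candidate assignment of integer values to the 20 boundary parameters. The parameter naming scheme, the predicate encoding and the illustrative fixed icon size are our own assumptions; the constraints themselves follow the enumeration above.

DEVICES = ["screen", "window", "icon", "title", "canvas"]
ICON_SIZE = 16   # illustrative: the icon has both width and height fixed

def region_ok(a, d):
    # x-start =< x-end and y-start =< y-end for every device and the screen
    return a[f"x-start({d})"] <= a[f"x-end({d})"] and a[f"y-start({d})"] <= a[f"y-end({d})"]

def inside(a, part, whole):
    # all boundaries of the part lie within the corresponding intervals of the whole
    return (a[f"x-start({whole})"] <= a[f"x-start({part})"] <= a[f"x-end({part})"] <= a[f"x-end({whole})"]
            and a[f"y-start({whole})"] <= a[f"y-start({part})"] <= a[f"y-end({part})"] <= a[f"y-end({whole})"])

constraints = [lambda a, d=d: region_ok(a, d) for d in DEVICES]
constraints += [lambda a, d=d: inside(a, d, "screen") for d in ["window", "icon", "title", "canvas"]]
constraints += [lambda a, d=d: inside(a, d, "window") for d in ["icon", "title", "canvas"]]
constraints += [lambda a: a["x-start(icon)"] == a["x-start(window)"],   # eq-b-connections of the
                lambda a: a["y-start(icon)"] == a["y-start(window)"],   # icon's top-left corner
                lambda a: a["x-end(icon)"] - a["x-start(icon)"] == ICON_SIZE,
                lambda a: a["y-end(icon)"] - a["y-start(icon)"] == ICON_SIZE]

def satisfies(assignment):
    # True iff a candidate assignment of integers to the parameters violates none of these constraints
    return all(c(assignment) for c in constraints)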

Figure 4: A spatial representation of the imposed order of x-boundaries and y-boundaries of devices on the screen. Ordering constraints are depicted by the relative locations of the devices. Fill patterns show width and height constraints. The grey arrows denote connections between boundaries.

Significant reductions in the number of constraints can be achieved by taking into account the transitivity of <, =<, > and >= and the possibility of unifying =, >= and =< for x-boundaries and y-boundaries. Unifying >= and =< into = does, however, add strong commitments to the ontology. In some cases it is clearly correct to assume that boundaries that can be equal are indistinguishable. In other cases this assumption may prevent finding a valid solution to the problem. In most non-trivial cases, for instance, one cannot assume that the window can be "maximized" (spatially indistinguishable from the screen). Solving this problem with a generate & test approach (propose any boundaries and then test all constraints) is extremely complex. This PSM is obviously warranted by the ontology, but it is very inefficient for this problem.

Typical assignment PSMs, such as Propose & Revise (Marcus & McDermott, 1989) and other lookalikes, e.g. (Poeck & Puppe, 1992), suggest exploiting constraints in a more productive way. If we take transitive relations into account to do static presorting, we are left with 10 x-boundaries and 10 y-boundaries, each totally ordered by increasing value. If we exploit the unification of equality operators (including the width and height definitions) and the m-b-connections to define calculations, this is reduced to seven key design parameters for which a value should be proposed. Note that changing equality constraints into assignment calculations introduces the requirement that there be no cyclic dependencies between parameters (Marcus & McDermott, 1989). This requirement is trivially satisfied by the two boundary scales. Transitivity can be exploited to deduce incremental fixes, but for this problem this only makes sense if we do no presorting. Fixes usually contain two types of revision knowledge: redesign knowledge and dynamic sorting knowledge (in effect changing the order in which parameters or values are tried), both based on diagnostic knowledge. This type of problem does not involve dynamic sorting criteria on parameters and values. Dynamic assignment problems can however be contrived for this domain if we take behaviour and causation into account.
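For concreteness, the following is a minimal, generic Propose & Revise control loop, written as a sketch of the general idea rather than as the Library's or SALT's specification; the presorted parameter order, the representation of calculations as propose functions and the representation of fixes as alternative revisions are all our own assumptions.

def propose_and_revise(parameters, propose, constraints, fixes):
    """parameters: presorted list of parameter names.
    propose[p](design) -> value: a calculation or default for parameter p.
    constraints: list of (name, scope, test), scope being a set of parameter names.
    fixes[name]: list of functions(design) -> revised design, tried in order."""
    design = {}
    for p in parameters:                         # static, presorted order
        design[p] = propose[p](design)
        for name, scope, test in constraints:
            if scope.issubset(design) and not test(design):
                for fix in fixes.get(name, []):  # revise the design until the constraint holds
                    design = fix(design)
                    if test(design):
                        break
                else:
                    raise ValueError(f"cannot revise violated constraint: {name}")
    return design

# a toy run: propose the window top, derive the icon top by calculation
design = propose_and_revise(
    ["y-start(window)", "y-start(icon)"],
    {"y-start(window)": lambda d: 0,
     "y-start(icon)": lambda d: d["y-start(window)"]},   # eq-b-connection as a calculation
    [("icon-at-top", {"y-start(window)", "y-start(icon)"},
      lambda d: d["y-start(icon)"] == d["y-start(window)"])],
    {})
assert design["y-start(icon)"] == 0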

Behaviour In order to be able to design, plan, predict and diagnose, we have to address the behaviour of devices and the way in which connections in a design mediate causation. First we describe states and change. A state of a device is an abstraction of the parameter values and roles of a device. Sets of parameter-value tuples referring to one device can be grouped together as a state. States should always maintain integrity wrt. the specified constraints on the parameter values. Most constraints are concerned with coherent layout; one constraint is concerned with coherent display-status: the display-status of a frame cannot be closed if the display-status of any of its parts is open. A state persists until it is changed, in a way consistent with the structure description of its device. Shoham (1988) discusses persistence and the assumptions that must be taken into account in order to make a unique and consistent prediction of the resulting history of a device.

An event is a change of state of some device, and the type of an event is defined by constraints on the initial and final state of the device. A derived process can be characterized as a sequence of events, or an ordering of states induced by change. For this purpose there is the relation before (<) and its inverse after (>) between events, imposing a temporal scale on events. Note that often a description of a process in terms of its parts will be sufficient (during interval etc.). We may, for instance, not care to know the sequence (or lack of it in a concurrent system) in which the parts of a frame close. All interesting events happen to devices and change the values of the parameters x-start, x-end, y-start, y-end, width, height and/or display-status. Open and close are defined as a change of display-status from closed to open or from open to closed, respectively. Resize changes the value of one or more of the parameters x-start, y-start, x-end and y-end; move changes all of these while width and height remain constant. Any number of events can act on a device. This is modelled with the many-to-one relation acts-on between events and devices. At least open and close act on any device. The events discussed thus far are operations. An example description of the operation close is given below in Table 6:

operation: close
description: This operation changes the state of a device from open to closed
acts-on: device
initial-state: display-status = open
final-state: display-status = closed
constraints: none
Table 6: An example event
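The operation description of Table 6 can be read as a precondition/postcondition pair over a device state. The following sketch makes this reading executable; the dict-based state representation and the function names are our own, not CML's.

def applies(operation, state):
    # the initial-state description must hold in the current state
    return all(state.get(p) == v for p, v in operation["initial-state"].items())

def apply_operation(operation, state):
    if not applies(operation, state):
        raise ValueError(f"{operation['name']} does not apply to this state")
    new_state = dict(state)
    new_state.update(operation["final-state"])   # impose the final-state description
    return new_state

close = {"name": "close", "acts-on": "device",
         "initial-state": {"display-status": "open"},
         "final-state": {"display-status": "closed"}}

window_state = {"display-status": "open", "x-start": 0, "x-end": 639, "y-start": 0, "y-end": 479}
assert apply_operation(close, window_state)["display-status"] == "closed"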
An added complication is that the flow of causality between devices should be derivable from the structural connections between devices. This presents a special problem for actions on controllers. For operations, the flow of causality can be inferred in part from the known constraints on states that preserve coherence with the structure description. We know, for instance, that the closing of the parts must accompany the closing of a frame. Similarly, resizing operations can be ordered in multiple ways to achieve a coherent layout. The consequences of acting on a controller, however, are not constrained at all.

We will use the window model as an example for behaviour. The window has three parts that have the same display-status as the window itself. These four display-status parameters are all connected by eq-connections. If the window closes, it changes from a state in which display-status is open to one in which it is closed. Since the same change happens to the parts of the window, we have established that during this process its parts have closed. The same applies to other operations. If the window resizes by changing its y-start (becoming longer), and the y-start of the icon and label have an eq-b-connection with this parameter, the same change happens to the y-start of the icon and label. These, however, have a fixed height. To keep height constant, the icon and label change their y-end too, resulting in two move operations that occur during the resize process of the window. If the y-start of the canvas had an m-b-connection with the icon and label, the canvas would resize. Since we do not know in which sequence these events must have happened, we assume concurrency.
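The equality part of this propagation is easily sketched: when a parameter changes, the same value is imposed on every parameter connected to it by an eq-connection or eq-b-connection. The graph representation below is our own; the derived changes that follow from fixed-size constraints (the moves of the icon and label) are not covered by this sketch.

from collections import deque

def propagate(values, eq_connections, changed, new_value):
    """values: parameter -> value; eq_connections: parameter -> set of
    parameters connected by an equality connection (kept symmetric)."""
    values = dict(values)
    values[changed] = new_value
    queue, seen = deque([changed]), {changed}
    while queue:
        p = queue.popleft()
        for q in eq_connections.get(p, ()):       # impose the same value on connected parameters
            if q not in seen:
                values[q] = values[p]
                seen.add(q)
                queue.append(q)
    return values

# the four display-status parameters of the window and its parts
eq = {"display-status(window)": {"display-status(icon)", "display-status(label)", "display-status(canvas)"},
      "display-status(icon)": {"display-status(window)"},
      "display-status(label)": {"display-status(window)"},
      "display-status(canvas)": {"display-status(window)"}}
state = {p: "open" for p in eq}
state = propagate(state, eq, "display-status(window)", "closed")
assert all(v == "closed" for v in state.values())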

Events that can occur concurrently, or events that have consistent conditions, cannot have inconsistent consequences. Resize and move as defined are able to break constraints if, for instance, they are allowed to move a device off the screen. The only way to prevent this is to give the device nonlocal access to other parameters: the resolution boundaries of the screen. This violates the principle of locality of behaviour description (a more specific version of no-function-in-structure), unless we introduce more complex connections to, among others, the screen to accommodate conditions on events. The disadvantage of this solution is that these connections cannot be visualized. This is an accurate reflection of the domain: we have to assume some underlying invisible machinery to explain things like this. Systems dynamics concepts can only give an incomplete description of behaviour in this domain; some global constraints simply cannot be factored out. Our guesses about the structure and behaviour of the underlying machinery that explains these constraints are as good as anyone else's.

The only root cause of any process is some trigger event caused by an action on a controller. The action is an unqualified input to the system. The trigger effect in this case is a stimulus. The action press acting on a controller will change the control-status of the button to pressed, which is the precondition of a trigger operation that sets a stimulus to true. The stimulus can have an eq-connection to another stimulus in another device. This stimulus in turn can be a precondition of some individual event, for example the resize already described. The stimulus is a bit of a poor man's solution to this problem, similar to the notion of effort (the cause of change or flow) in physical domains (see e.g. (Borst & Akkermans, 1997)). Connections between stimuli are the main source of degrees of freedom in design.

Models

Spread out over this section we have introduced parts of a domain model of a single window. The window is a device that can be acted upon by an external system by pressing a close-button that closes the window. The window can be decomposed into an icon-button, a label and a frame. To complete the design in order to achieve the desired behaviour, a connection topology has been specified that includes several b-connections between boundaries, eq-connections between display-status parameters and the eq-connection between stimuli. The resulting model is an example of a reusable "case" for design case libraries. After that, an assignment problem has to be solved: assigning numbers to the boundaries and values to the parameters in accordance with the design. The resulting parametrized state suffices to predict the behaviour of the window, which is quite trivial in this case.

We have thus far focussed on the first part of the problem dependency graph in Figure 2. Both the graph and common sense suggest that we could not have done it the other way around. We have tried to cater for two mainstream approaches to design: hierarchical/case-based design and arrangement of connection topologies. Nontrivial planning is possible in this domain, as opposed to, for instance, common electronic domains or the blocks domain. The problem wrt. locality of behaviour descriptions we mentioned before is not a design, planning or prediction problem. It presents a problem if we want to localize faulty behaviour in a structural entity in a design, which is the goal of several diagnosis PSMs (see e.g. (de Kleer & Williams, 1987)). This need not be a serious problem, since we established, for instance, that a Cover & Differentiate PSM (Duursma, 1992) can be applied to postdict a specific subset of design models (barring feedback loops). PSMs need not be universal for this world: we just want to know why they are not.

A recurring problem in constructing this ontology was the lack of a problem-specific context (or the abundance of them). For instance, for every primitive introduced, the distinction between candidates and facts arises wrt. relations (``can be'' vs. ``is'' variants of basically the same relation). These roles are part of what is now commonly known as a method ontology (and we could add task and problem ontologies). Problem-specific contexts introduce their own semantics, as the Propose & Revise assignment example, for instance, showed for the change of the equality operator into an assignment operator. Only then does the cyclic dependency problem come into existence. The accuracy of knowledge bases may depend less on their meaning wrt. this ontology than it does on method-specific features. In a way this is an encouragement, if it means that assumptions in general have less to do with the domain than with our way of looking at it.

4 Conclusions

The major conclusion that we can draw is that the actual empirical testing exercises appear even further away than our initial expectations suggested. Instead of a straightforward empirical cleaning-up operation using Cokace as an operational environment for CML specifications, we went into theoretical explorations which made clear that appropriate verification and validation involves a number of well-controlled steps. The first and most important step involves the use of a well-defined, transparent domain that can feed the PSMs with the required knowledge. A second step is the definition of problem situations that form the dynamic inputs to the methods. Because the indexing of the PSMs requires that their similarities and differences be assessed, the validation also involves a large-scale comparative analysis of the results. In total, the design of the experiment shows that the apparent straightforwardness of operational validation compared to formal validation is somewhat misleading, as many control activities have to be built in. One of the questions we have now is whether we can obtain the same results in a somewhat more efficient way, in particular by reducing the number of test cases that we have to generate for testing each PSM.

However, the other side of this coin is that CoCo provides a unique way to assess the relative importance of the ``interaction hypothesis'', which states that domain knowledge is never task-neutral. This is indeed unique because thus far this effect has only been assessed in studies of reuse where domain knowledge was used for other purposes (tasks), e.g. (Coelho et al., 1996), or in rational reconstructions as in the Sisyphus-II project (Schreiber & Birmingham, 1996). Another side effect is that another hypothesis can be tested: that problem types, and therefore also the decompositions implied by the PSMs, are dependent upon one another in a fixed way, as specified by a ``suite of problem types'' (Breuker, 1994b).

One conclusion that this project is not supposed to yield concerns the specification language, CML. Whether specified in CML or in any other language, the working of the PSMs should be the same. However, in passing we have seen that CML is in fact not well suited for operationalization, in particular with respect to the domain layer specifications. Not one of the operational versions of CML in fact uses the specification categories of the domain layer, and neither do the formal versions! [10] Therefore, we will probably use a classical terminological representation for the domain layer, both for specification and for building the knowledge bases. As modern terminological systems allow us to specify and verify ontologies in the T-Box, it seems only a small step to construct knowledge bases on top of that using the A-Box.


References

Aben, 1994
 ABEN, M. (1994). Canonical functions: CommonKADS inferences. In Breuker, J. & Van de Velde, W., (Eds.), CommonKADS Library for Expertise Modelling. Amsterdam, IOS Press.
Aben et al., 1994
 ABEN, M., BALDER, J., & VAN HARMELEN, F. (1994). Support for the formalisation and validation of KADS expertise models. Deliverable DM2.6a ESPRIT Project P5248 KADS-II/M2/TR/UvA/63/1.0, University of Amsterdam.
Bauer & Karbach, 1992
 BAUER, C. & KARBACH, W., (EDS.) (1992). Proceedings Second KADS User Meeting, Munich. Siemens AG.
Benjamins et al., 1996
 BENJAMINS, R., DE BARROS, L. N., & VALENTE, A. (1996). Constructing planners through problem solving methods. In Gaines, B. & Musen, M., (Eds.), Proceedings of 10th Knowledge Acquisition for Knowledge-Based Systems Workshop.
Beys et al., 1996
 BEYS, P., BENJAMINS, R., & VAN HEIJST, G. (1996). Remedying the reusability-usability trade-off for problem solving methods. In Gaines, B. & Musen, M., (Eds.), Proceedings of the KAW-96, Banff, Ca.
Borst & Akkermans, 1997
 BORST, P. & AKKERMANS, H. (1997). Engineering ontologies. International Journal of Human-Computer Studies, 46:365 -- 408.
Breuker, 1994a
 BREUKER, J. (1994a). Components of problem solving. In Steels, L., Schreiber, G., & van de Velde, W., (Eds.), A Future for Knowledge Acquisition: proceedings of the EKAW-94, European Knowledge Acquisition Workshop, pp. 118 -- 136, Berlin. Springer Verlag.
Breuker, 1994b
 BREUKER, J. (1994b). A suite of problem types. In Breuker, J. & Van de Velde, W., (Eds.), CommonKADS Library for Expertise Modelling, pp. 57--87. Amsterdam, IOS Press.
Breuker, 1997
 BREUKER, J. (1997). Problems in indexing problem solving methods. In Benjamins, R. & Fensel, D., (Eds.), Proceedings of the IJCAI'97 Workshop on Problem Solving Methods.
Breuker & Van de Velde, 1994
 BREUKER, J. & VAN DE VELDE, W., (EDS.) (1994). CommonKADS Library for Expertise Modelling. Amsterdam, IOS Press.
Chandrasekaran, 1987
 CHANDRASEKARAN, B. (1987). Towards a functional architecture for intelligence based on generic information processing tasks. In Proceedings of the 10th International Joint Conference on Artificial Intelligence, pp. 1183--1192, Milano.
Chandrasekaran, 1990
 CHANDRASEKARAN, B. (1990). Design problem solving: a task analysis. AI Magazine, pp. 59--71.
Clancey, 1985
 CLANCEY, W. J. (1985). Heuristic classification. Artificial Intelligence, 27(4):289--350.
Coelho & Lapalme, 1996
 COELHO, E. & LAPALME, G. (1996). Describing reusable problem solving methods with a method ontology. In Gaines, B. & Musen, M., (Eds.), Proceedings of the KAW-96, Banff, Ca.
Coelho et al., 1996
 COELHO, E., LAPALME, G., & PATEL, V. (1996). From KADS models to operational problem solving methods. In van Harmelen, F., (Ed.), Proceedings of the KEML-96.
Corby & Dieng, 1996
 CORBY, O. & DIENG, R. (1996). Cokace: A Centaur-based environment for CommonKADS Conceptual Modelling Language. In Wahlster, W., (Ed.), Proceedings ECAI-96, pp. 418--422.
de Kleer et al., 1992
 DE KLEER, J., MACKWORTH, A., & REITER, R. (1992). Characterizing diagnoses and systems. Artificial Intelligence, 56(2--3):197 -- 222.
de Kleer & Williams, 1987
 DE KLEER, J. & WILLIAMS, B. C. (1987). Diagnosing multiple faults. Artificial Intelligence, 32:97--130.
Duursma, 1992
 DUURSMA, C. (1992). Interpretation models and problem solving methods. In Neumann, B., (Ed.), Proceedings of the Tenth European Conference on Artificial Intelligence, Vienna, Austria.
Eriksson et al., 1995
 ERIKSSON, H., SHAHAR, Y., TU, S., PUERTA, A., & MUSEN, M. (1995). Task modelling with reusable problem solving methods. Artificial Intelligence, 79:293--325.
Fensel & Benjamins, 1996
 FENSEL, D. & BENJAMINS, R. (1996). Assumptions in model based diagnosis. In Gaines, B. & Musen, M., (Eds.), Proceedings of the 10th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop, pp. 5-1 -- 5-18.
Gennari et al., 1994
 GENNARI, J. H., TU, S. W., ROTHENFLUH, T. E., & MUSEN, M. A. (1994). Mapping domains to methods in support of reuse. International Journal of Human-Computer Studies, 41:399--424.
Gruber et al., 1996
 GRUBER, T., OLSEN, G., & RUNKEL, J. (1996). The configuration design ontologies and the VT elevator domain theory. International Journal of Human-Computer Studies, 44:569 -- 598. special issue on Sisyphus-VT.
Gruber, 1993
 GRUBER, T. R. (1993). A translation approach to portable ontology specification. Knowledge Acquisition, 5(2):199 -- 220.
van Harmelen & Aben, 1996
 VAN HARMELEN, F. & ABEN, M. (1996). Structure preserving specification languages for knowledge-based systems. International Journal of Human-Computer Studies, 44:187--212.
Hobbs & Moore, 1985
 HOBBS, J. R. & MOORE, R. C., (EDS.) (1985). Formal theories of the common sense world. Norwood, Ablex Publishing Company.
Laresgoiti et al., 1996
 LARESGOITI, I., ANJEWIERDEN, A., BERNARAS, A., CORERA, J., SCHREIBER, A. T., & WIELINGA, B. J. (1996). Ontologies as vehicles for reuse: a mini-experiment. In Gaines, B. R. & Musen, M. A., (Eds.), Proceedings of the 10th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop, Alberta, Canada, November 9-14, volume 1, pp. 30.1--30.21. SRDG Publications, University of Calgary.
Laurent, 1992
 LAURENT, J. (1992). Proposals for a valid terminology in KBS validation. In Neumann, B., (Ed.), Proceedings of the ECAI-92, pp. 829--834. John Wiley & Sons.
MacGregor, 1991
 MACGREGOR, R. (1991). Inside the LOOM classifier. SIGART Bulletin, 2(3):70--76.
Marcus, 1988
 MARCUS, S. (1988). Automating Knowledge Acquisition for Expert Systems. Amsterdam, Kluwer.
Marcus & McDermott, 1989
 MARCUS, S. & MCDERMOTT, J. (1989). SALT: A knowledge acquisition language for propose-and-revise systems. Artificial Intelligence, 39(1):1--38.
Motta et al., 1996
 MOTTA, E., STUTT, A., ZDRAHAL, Z., O'HARA, K., & SHADBOLT, N. (1996). Solving VT in VITAL. International Journal of Human-Computer Studies, 44:333 -- 372.
Patil, 1988
 PATIL, R. S. (1988). Artificial intelligence techniques for diagnostic reasoning in medicine. In Shobe, H. E. & AAAI, (Eds.), Exploring Artificial Intelligence: Survey Talks from the National Conferences on Artificial Intelligence, pp. 347--379. San Mateo, California, Morgan Kaufmann.
Plant & Preece, 1996
 PLANT, R. & PREECE, A. (1996). Verification and Validation. International Journal of Human-Computer Studies, 44(2):123--126. Editorial special issue.
Poeck & Gappa, 1992
 POECK, K. & GAPPA, U. (1992). An interpretation model for heuristic classification using the hypothesize--and--test strategy. in: [Bauer & Karbach, 1992].
Ruiz et al., 1994
 RUIZ, F., VAN HARMELEN, F., ABEN, M., & VAN DE PLASSCHE, J. (1994). Evaluating a formal modelling language. In Steels, L., Schreiber, G., & van de Velde, W., (Eds.), A Future for Knowledge Acquisition, Proceedings of EKAW'94, pp. 26--45, Berlin. Springer Verlag.
S & Plant, 1996
 S, M. & PLANT, R. (1996). On the validation and verification of production systems: a graph reduction approach. International Journal of Human-Computer Studies, 44:127--144.
Schreiber & Birmingham, 1996
 SCHREIBER, A. T. & BIRMINGHAM, W. P. (1996). The Sisyphus-VT initiative. International Journal of Human-Computer Studies, 43(3/4):275--280. Editorial special issue.
Schreiber & Terpstra, 1996
 SCHREIBER, A. T. & TERPSTRA, P. (1996). Sisyphus-VT: A CommonKADS solution. International Journal of Human-Computer Studies, 43(3/4):373--402.
Schreiber et al., 1995
 SCHREIBER, A. T., WIELINGA, B. J., & JANSWEIJER, W. H. J. (1995). The KACTUS view on the 'O' word. In IJCAI Workshop on Basic Ontological Issues in Knowledge Sharing.
Schreiber et al., 1994
 SCHREIBER, G., WIELINGA, B., AKKERMANS, H., VAN DE VELDE, W., & ANJEWIERDEN, A. (1994). CML: the CommonKADS conceptual modelling language. In Steels, L., Schreiber, G., & Van de Velde, W., (Eds.), A Future for Knowledge Acquisition: Proceedings of the 8th EKAW, pp. 1--25. Springer Verlag.
Shoham, 1988
 SHOHAM, Y. (1988). Reasoning about Change. Cambridge, Massachusetts, MIT Press.
Simmons, 1992
 SIMMONS, R. G. (1992). The role of associational and causal reasoning in problem solving. Artificial Intelligence, 53(2--3):159--207.
Steels, 1990
 STEELS, L. (1990). Components of Expertise. AI Magazine, 11(2):29--49.
Stefik, 1995
 STEFIK, M. (1995). Introduction to Knowledge Systems. San Francisco, CA, Morgan Kaufmann.
Sundin, 1994
 SUNDIN, A. (1994). Assignment and scheduling. In Breuker, J. & Van de Velde, W., (Eds.), CommonKADS Library for Expertise Modelling, pp. 107--156. Amsterdam/Tokyo, IOS-Press/Ohmsha.
Valente & Breuker, 1996
 VALENTE, A. & BREUKER, J. (1996). Towards principled core ontologies. In Gaines, B. & Musen, M., (Eds.), Proceedings of 10th Knowledge Acquisition for Knowledge-Based Systems Workshop, pp. 301--320.
Valente et al., 1994
 VALENTE, A., VAN DE VELDE, W., & BREUKER, J. (1994). The CommonKADS expertise modelling library. In Breuker, J. & Van de Velde, W., (Eds.), CommonKADS Library for Expertise Modelling, pp. 31--56. Amsterdam/Tokyo, IOS-Press/Ohmsha.
Van de Velde, 1994
 VAN DE VELDE, W. (1994). A constructivist view on knowledge engineering. In Cohn, A., (Ed.), Proceedings of the European Conference on Artificial Intelligence, pp. 729--734. John Wiley and Sons.
van Harmelen & Balder, 1992
 VAN HARMELEN, F. & BALDER, J. R. (1992). (ML)2: a formal language for KADS models of expertise. Knowledge Acquisition, 4(1). Special issue: `The KADS approach to knowledge engineering'.
Visser et al., 1997
 VISSER, P., JONES, D., BENCH-CAPON, T., & SHAVE, M. (1997). An analysis of ontology mismatches; heterogeneity vs. interoperability. In Proceedings of AAAI-Symposium on Ontological Engineering, pp. 164--172, Stanford, CA. AAAI.
Wielinga et al., 1992
 WIELINGA, B. J., SCHREIBER, A. T., & BREUKER, J. A. (1992). KADS: A modelling approach to knowledge engineering. Knowledge Acquisition, 4(1):5--54. Special issue `The KADS approach to knowledge engineering'.
Winograd, 1972
 WINOGRAD, T. (1972). Understanding natural language. Cognitive Psychology, 3:1--191. reprinted as a book by Academic Press.

Notes

[1]
CoCo is a joint research project between SWI (University of Amsterdam) and INRIA at Sophia Antipolis (F). CoCo stands for `` CommonKADS Library in Cokace''.

[2]
It has never been explained in what respect these two terms differ!

[3]
In this respect it looks similar to the relationship between KIF and Ontolingua, where KIF is equivalent to FOPL and Ontolingua is KIF extended with a frame ontology. The difference is that this frame ontology is ``written'' in KIF; this is not the case for CML.

[4]
We lived for a while under the assumption that this interpreter could be replaced by a theorem prover, but this led to some technical problems and would moreover not have changed the validity checking procedures or the knowledge base specifications.

[5]
An example of this confusing way of distinguishing ontologies from models is that in the KACTUS framework, which also uses CML, the generic model specifications are described as separate ontologies that represent ``viewpoints'': so there are process, structural, reconfiguration etc. ontologies [Laresgoiti et al., 1996]. The confusion is even more pronounced in KACTUS where both ontologies and ``theories'' are used to indicate viewpoints [Schreiber et al., 1995].

[6]
This strongly suggests that taxonomies of diseases shouldn't belong to a medical ontology, but to medical models. In a very principled way this is true, but as medicine, law, engineering etc. are domains of practice with a strong focus on certain types of tasks, one may assume that medical ontologies can be less task-neutral. Indeed, [Valente & Breuker, 1996] recommend a strong functional perspective in constructing the core ontology that should index the ontologies of the various domains in fields of practice like medicine or law. However, in the case of medicine one may find both process and taxonomic views in the reasoning [Patil, 1988].

[7]
The article is not very precise in which specific reuse problems have occurred with respect to domain knowledge and other types of tasks/methods.

[8]
In fact, the PSM assumes a type of problem (see Section 1.1), so it is somewhat pleonastic to see the type of problem and the PSM as independent determinants of the data-set. An actual data-set may in practice be smaller than the required one, as some PSMs may be robust against (some) missing data. We are not going to investigate this complication.

[9]
We would need at least three values for the complexity variable to see differences in the efficiency rates, so two more test cycles are needed, which moreover have increasing complexity.

[10]
CML is under revision. The current version (2.8) certainly has improvements, but the descriptors for inference, task and PSM knowledge have changed and it is not certain whether operationalisation is still possible.

