Simulated Expert Evaluation of Multiple Classification Ripple Down Rules

Byeong Ho Kang
School of Computing
Hoseo University
Asan, Choongnam 336-795, Korea

Phil Preston and Paul Compton
School of Computer Science and Engineering
University of New South Wales,
Sydney NSW 2052, Australia

Email : kang@computer.org

Abstract

Ripple Down Rules (RDR) is an approach to building knowledge based systems (KBS) which allows an expert to build and maintain a KBS without the assistance of a knowledge engineer or knowledge engineering expertise. While the early version of RDR provides a single classification for a set of data, Multiple Classification Ripple Down Rules (MCRDR) allows multiple independent classifications while retaining the ease of use of RDR. As an extension of RDR, it is important to demonstrate the soundness and performance of MCRDR. Evaluation of knowledge acquisition (KA) methods using simulated experts has been proposed to overcome the problems of evaluating KA. In this paper, the construction of a KBS with the MCRDR system is simulated using experts which are themselves simulated by another KBS. These studies indicate that MCRDR performs at least as well as RDR even in a single classification domain. It seems likely that, as well as dealing with multiple classifications, MCRDR will provide a basis for building systems for problems beyond simple classification, again without requiring knowledge engineering assistance.

Introduction

Most expert system developments are based on the assumption that there exists knowledge which can be extracted from an expert using some type of knowledge engineering technique. The knowledge engineering process generally focuses on how to find or extract complete knowledge from the expert and finding a suitable representation for it. Of course, this assumes that the expert is able to communicate his expertise in a way that the knowledge engineer can use. However, an expert is usually good at judging cases, but is not good at providing knowledge in abstract forms (Manago and Kodratoff 1987). Also, an expert's explanation for a particular set of data can be influenced by the situation (Clancey 1993). Following Newell's idea of a knowledge level (Newell 1982), much current knowledge acquisition (KA) research is concerned with modeling problem solving and focuses on an initial analysis of the task being undertaken, the reasoning required and a suitable domain model (Chandrasekaran 1986; Wielinga, Schreiber et al. 1992).

The Ripple Down Rule (RDR) approach similarly eschews any notion of extracting or mining the expert's knowledge. RDR grew specifically from the experience gained in maintaining an early medical expert system, the GARVAN-ES1, for a number of years (Compton, Horn et al. 1989; Compton and Jansen 1990). Observation of experts during maintenance suggested that experts never provide information on how they reach a specific judgement; rather, the expert provides a justification that their judgement is correct, and the justification they provide varies with the context in which they are asked to provide it (Compton, Horn et al. 1989; Compton and Jansen 1990). PEIRS (Pathology Expert Interpretative Report System), an expert system used to add clinical interpretations to chemical pathology laboratory reports (Compton, Edwards et al. 1992; Edwards, Compton et al. 1993), is the major success with RDR. PEIRS went into routine use with about 200 rules, with the remaining rules (to date about 1800) added while in routine use. Most importantly, all rules have been added by an expert without any knowledge engineering or programming assistance or skill. A knowledge engineer/programmer was required only for the initial data modeling.

A limitation of RDR is that only single classifications can be produced. The aim of MCRDR is to have the same ease of knowledge acquisition in a system that can handle multiple classifications. Providing multiple conclusions is a major step towards systems that can cope with a wide range of task types without the necessity for analysing or specifying the nature of the problem solving required (Compton, Kang et al. 1993). This study aims to provide an objective evaluation of the MCRDR method using simulated experts: knowledge based systems that take the place of human experts.

Multiple Classification Ripple Down Rules

Inference
MCRDR can be represented as an n-ary tree. MCRDR evaluates all the rules in the first level of KB. It then evaluates the rules at the next level of refinement for each rule that was satisfied and so on. The process stops when there are no more children to evaluate or when none of these rules can be satisfied by the case in hand. It thus ends up with multiple paths, with each path representing a particular refinement sequence and hence multiple conclusions. The structure of an MCRDR knowledge base can be drawn as an n-ary tree with each node representing a rule. Fig 1 shows such a structure and also shows the inference for a particular case.
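The following minimal sketch (in Python) illustrates this inference scheme under the assumption that a case is simply a set of attribute values and that a rule's conditions are a set of values which must all be present in the case; the data structures and names are illustrative only, not the original implementation.

from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Rule:
    conditions: Set[str]                  # attribute values that must all appear in the case
    conclusion: Optional[str]             # None models a stopping rule (null conclusion)
    children: List["Rule"] = field(default_factory=list)

def last_rules(rules: List[Rule], case: Set[str]) -> List[Rule]:
    # Return the last satisfied rule on every satisfied path.
    result = []
    for rule in rules:
        if rule.conditions <= case:                    # rule is satisfied by the case
            deeper = last_rules(rule.children, case)   # try its refinements
            result.extend(deeper if deeper else [rule])
    return result

def infer(root_rules: List[Rule], case: Set[str]) -> Set[str]:
    # Conclusions are read off the terminating rule of each path; stopping rules
    # (conclusion None) suppress their path, and duplicate conclusions collapse.
    return {r.conclusion for r in last_rules(root_rules, case)} - {None}

# Tiny example: one top-level rule with a refinement, plus an independent rule.
kb = [
    Rule({"a"}, "Class 1", children=[Rule({"c", "d"}, "Class 2")]),
    Rule({"g"}, "Class 5"),
]
print(infer(kb, {"a", "c", "d", "g"}))   # {'Class 2', 'Class 5'}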

Knowledge Acquisition
When a case has been classified incorrectly or is missing a classification, knowledge acquisition is required and can be divided into three parts.

First, the system acquires the correct classifications from the expert.
Second, the system decides on the new rules' location.
Thirdly, the system acquires new rules from the expert and adds them to correct the knowledge base.

It is likely that experts may find the system more natural if the order of steps two and three is reversed, thereby better hiding the implicit knowledge engineering that is going on. However, the order is not crucial in terms of the algorithm.

Acquiring New Classifications
Acquiring new classifications is trivial: the expert simply needs to state them. For example, if the system produces classifications Class 2, Class 5 and Class 6 for a given case, the expert may decide that Class 6 does not need to be changed but that Class 2 and Class 5 should be deleted and Class 7 and Class 9 added.

Locating Rules
The system must then find locations for the new rules that will provide these classifications; it needs at least two rules, one for Class 7 and one for Class 9. If a new classification is a refinement of a wrong one, the system attaches the new rule under the rule which produced the wrong conclusion. Otherwise, the system adds the new rule at a place in the tree independent of the current conclusions. As well as attempting to decide whether a classification is best seen as a refinement or an independent classification, we note that in some ways it does not matter: both are workable solutions for any classification. After the system has decided the locations of all new rules for the new conclusions, it attaches a stopping rule (which has a null conclusion) to every rule that produces a wrong conclusion and is not yet refined by a new rule. Stopping rules play a major role in MCRDR in preventing wrong classifications being given for a case.
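A minimal sketch of this locating step, under the same illustrative assumptions as the inference sketch above (the function and argument names are ours, not the authors'): wrong_rules are the terminating rules whose conclusions the expert rejected, and refines maps each new conclusion either to the wrong rule it refines or to nothing for an independent classification.

def locate_new_rules(new_conclusions, wrong_rules, refines):
    placements = {}
    for conclusion in new_conclusions:
        # a refinement is attached under the rule that gave the wrong conclusion;
        # an independent classification goes under the root of the tree (None)
        placements[conclusion] = refines.get(conclusion)
    # every wrong rule not corrected by a refinement gets a stopping rule
    # (a rule with a null conclusion) attached beneath it
    refined = {rule for rule in refines.values() if rule is not None}
    stopping_rule_parents = [rule for rule in wrong_rules if rule not in refined]
    return placements, stopping_rule_parents

# Example matching the text: the system gave Class 2, Class 5 and Class 6; the
# expert keeps Class 6, rejects Class 2 and Class 5, and adds Class 7 (here
# assumed to refine Class 2) and Class 9 (an independent classification).
placements, stops = locate_new_rules(
    new_conclusions=["Class 7", "Class 9"],
    wrong_rules=["rule for Class 2", "rule for Class 5"],
    refines={"Class 7": "rule for Class 2"},
)
print(placements)   # {'Class 7': 'rule for Class 2', 'Class 9': None}
print(stops)        # ['rule for Class 5']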

Fig 1. The highlighted boxes represent rules that are satisfied for the case {a,c,d,e,f,g,h,k}. As a result of the inference, four different rule paths from the root are produced. Classes 2, 5 and 6 are presented as the conclusions for the test case, since Rules 3, 5, 6 and 10 are the last rules in each path, and Rules 10 and 5 have the same conclusion, Class 5.

The inference process can be understood in terms of capturing 'paths', as shown in Fig 1. When paths are produced there are a number of questions about whether a path produces a classification, whether the classification is redundant because it is produced elsewhere, and so on.

Acquiring Rule Conditions - Rule Validation
Verification and validation are concerned with ensuring the KBS system performs as it is meant to. Verification research is normally concerned with ensuring the internal consistency of a knowledge base. The normal approach in verification is to attempt to reduce the KB to pathways from data to conclusions and then look at the relationships between these pathways, for example, the data they use and the intermediate conclusion they establish (Preece, Shinghal et al. 1992). Validation in terms of maintenance or incremental acquisition is concerned with testing whether other cases previously correctly classified will be misclassified by a new rule, as well as ensuring the new rule covers the new case. We are mainly interested in validation, in particular in validating a KBS by testing it on cases.

A standard technique is to use a database of standard cases (Buchanan, Barstow et al. 1983). In this situation one depends on the cases being representative of the cases the system is meant to cover. With RDR, a case is associated with a rule because the rule was added to deal with that particular case. A new rule must distinguish between the case that caused its creation and the case associated with the rule that gave the previous incorrect classification. With MCRDR, a number of cases can reach a new rule, and the higher the rule is in the tree the more cases can reach it. The new rule should distinguish between the new case and all of these cornerstone cases. That is, MCRDR has multiple cornerstone cases per rule, compared to RDR where there is one per rule.

A rule at some level can be hit by all the cases associated with its siblings at the same level and their children lower in the system. The rule has to be made sufficiently specific so that none of these other cases satisfies it. However, it does not matter if other cases which include the same classification reach this particular rule. If a rule is added at a level below the top level, only cases which satisfy the parent rule above need to be considered as cornerstone cases. Note that as the system develops, cases may arise which correctly satisfy a rule but are added to the system because a rule is needed elsewhere to add a further classification. Such a case will become a cornerstone case for new rules below the rule it satisfies and for which the classification is correct. As the tree develops, the rules lower down will naturally have fewer cornerstone cases associated with them.

The aim then is to make a new rule sufficiently precise so that it is satisfied only by the case it is being added for and by no other stored cases, except that it does not matter if it happens to satisfy cases which include the same classification. The algorithm for selecting conditions to make the rule sufficiently precise is very simple, and some discussion is needed of why a more sophisticated approach was not chosen. Consider a new case A and two cornerstone cases B and C. In creating a new rule, one may imagine that an expert should choose at least one of the conditions from

(Case A - (Case B ∪ Case C))
or
negated conditions from ((Case B ∩ Case C) - Case A).

However, as seen in Fig 2, these may be empty, leading to the situation where no rule conditions can be found. Alternatively, the difference list may contain only trivial conditions that are irrelevant. In other words, there may be no common conditions that distinguish the presented case from all the cornerstone cases, but a number of different conditions distinguish different cases, and these conditions must all be included in the new rule.

Left-hand table:
  Case A: a, b, c, f
  Case B: c, d, e, f
  Case C: b, c, d, g
  Difference list between A and (B and C): a, not d

Right-hand table:
  Case A: a, b, c, d, f
  Case B: a, c, d, e, f
  Case C: b, c, d, g
  Difference list between A and (B and C): (empty)
  Difference list between A and B: b, not e
  Difference list between A and C: a, f, not g
Fig 2. The left-hand table has some conditions in the difference list between A and (B and C). However, as the right-hand table shows, this is not always the case.
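As a concrete check of the figure, the sketch below (assuming cases are sets of attribute values and 'not x' denotes a negated condition) computes the difference lists for the cases shown in Fig 2; the helper name is ours.

def difference_list(new_case, cornerstones):
    # conditions in the new case that appear in no cornerstone case, plus the
    # negation of conditions present in every cornerstone case but absent from it
    union = set().union(*cornerstones)
    inter = set.intersection(*cornerstones)
    positives = new_case - union
    negatives = {f"not {x}" for x in inter - new_case}
    return positives | negatives

# Left-hand table of Fig 2: a usable joint difference list exists.
A, B, C = {"a", "b", "c", "f"}, {"c", "d", "e", "f"}, {"b", "c", "d", "g"}
print(difference_list(A, [B, C]))      # {'a', 'not d'}

# Right-hand table: the joint difference list is empty, so the cornerstone
# cases must be differentiated one at a time.
A2, B2, C2 = {"a", "b", "c", "d", "f"}, {"a", "c", "d", "e", "f"}, {"b", "c", "d", "g"}
print(difference_list(A2, [B2, C2]))   # set()
print(difference_list(A2, [B2]))       # {'b', 'not e'}
print(difference_list(A2, [C2]))       # {'a', 'f', 'not g'}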

The algorithm we use is as follows. The expert is asked to select conditions from a difference list between the presented case and one of the cornerstone cases in the list of cornerstone cases to be considered. The system then tests all cornerstone cases in the list against the conditions selected and deletes from the list those cornerstone cases that do not satisfy the conditions selected. The expert is then asked to choose conditions from a difference list between the current case and one of the remaining cornerstone cases in the list. The conditions selected are added as a conjunction to the rule. The system repeats this process until there is no cornerstone case in the list which satisfies the rule. A crucial question for the following evaluation is whether this process requires too many cycles of adding conditions to rules.
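The loop just described can be sketched as follows (assuming, as before, that cases are sets of attribute values and 'not x' marks a negated condition; ask_expert stands for whatever mechanism, human or simulated, picks one or more conditions from a difference list, and all names are illustrative):

def difference_list(new_case, cornerstone):
    # difference list between the new case and a single cornerstone case
    positives = new_case - cornerstone
    negatives = {f"not {x}" for x in cornerstone - new_case}
    return positives | negatives

def satisfies(case, conditions):
    # negated conditions ("not x") require x to be absent from the case
    for cond in conditions:
        if cond.startswith("not "):
            if cond[4:] in case:
                return False
        elif cond not in case:
            return False
    return True

def acquire_rule_conditions(new_case, cornerstone_cases, ask_expert):
    rule_conditions = set()
    remaining = list(cornerstone_cases)
    while remaining:
        # the expert chooses from the difference list against one remaining cornerstone case
        rule_conditions |= ask_expert(difference_list(new_case, remaining[0]))
        # keep only the cornerstone cases that still satisfy the rule built so far;
        # the others have already been distinguished from the new case
        remaining = [c for c in remaining if satisfies(c, rule_conditions)]
    return rule_conditions

# e.g. an 'expert' that simply takes the whole difference list each time:
rule = acquire_rule_conditions({"a", "c", "d"}, [{"c", "d", "e"}, {"a", "c", "g"}], set)
print(rule)   # {'a', 'd', 'not e', 'not g'} (two cycles were needed)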

After the system adds a rule with the selected conditions, it tests the remaining cornerstone cases associated with the parent rule and any cases which can satisfy the new rule are saved as a cornerstone case of the new rule. Note again that it is permissible for cases which include the classification given by the rule to satisfy the rule, and the saved cornerstone cases include such cases.

Finally, the new case is added to the cornerstone case database. The lists of cornerstone cases for the other rules correctly satisfied by the case (i.e. giving a correct classification for the case) are also updated to include the new case. The system is now ready to run another case and if the classifications provided are incorrect, for more knowledge acquisition.

It should be noted that MCRDR systems may be developed for a whole domain or incrementally, a sub-domain at a time, to avoid demands on the expert. That is, cases which have more than one classification may only be given one classification initially. In dealing with sub-domains incrementally, the algorithm is the same except for the extra requirement of consulting the expert as to whether the new classification may apply to each of the cornerstone cases which can satisfy the rule. However, the feature remains that as the rule becomes more precise to exclude a specific case, the other cases that are also excluded are not considered further.

Evaluation

Three different data sets (Chess, Garvan-ES1 and Tic-Tac-Toe) from the Irvine Machine Learning Repository are used for the evaluation. Simulated experts are created with a machine learning method, Induct/RDR (Gaines and Compton 1992), using these data sets.

Experimental Method
An MCRDR KBS is built by correcting errors, that is, by adding new rules for cases which have not been given the correct classifications. To do this the expert selects relevant conditions from the difference lists for that case. The method used here is identical except that any expertise used in selecting important conditions from the difference list is provided by the rule trace from another KBS processing the same case. It should not be expected that the simulation will perform better than a real expert. The best that could be achieved is the performance of the KBS underlying the simulated expert, built by machine learning. This itself is problematic, as a number of compromises have to be made in producing expertise for the simulated expert. The following steps are required:

Preparation
Collect a set of cases, and produce a knowledge base using machine learning. This becomes the basis for the Simulated Expert (described more fully later).
Create nine randomly ordered versions of each data set. All experimental studies are done on each of the nine versions of the various sets.
Start the MCRDR system with an empty knowledge base.

Processing
Step 1     get the next case in the database
Step 2     ask the (simulated) expert for a conclusion
Step 3     ask the expert system under construction for a conclusion
Step 4     if they agree, go back to Step 1
Step 5     if they disagree, make new (valid) rules and go back to Step 1.

Step 5 is the crux. The new rule needs to be constructed and located in the KB. For an MCRDR simulation, 'Step 5' consists of:

Step 5.1 Run the case against the Induct/RDR KBS and produce a rule trace. This is essentially the justification for the conclusion.
Step 5.2 Run the case on the developing MCRDR system and decide all possible new rule locations. In this study, all rules producing a new conclusion are added under the root node and stopping rules are added under all rules which produce wrong conclusions. A major question in MCRDR is how much the load for the expert increases in knowledge acquisition, since it handles multiple cornerstone cases for every rule. Adding a rule at the top in a single classification domain will lead to a worst case situation, since the case needs to be differentiated from all cornerstone cases when the rule is added at the top.
Step 5.3 Get conditions for all new rules (see the Knowledge Acquisition section above). The simulated expert will do this rather than a human expert. The level of expertise of the simulated expert will be varied in different studies.

The scenario above shows how both RDR and MCRDR are normally used. For evaluation, the expert is replaced by a simulated expert.
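A minimal sketch of this processing loop follows, where the three callables stand in for the simulated expert (built from the Induct/RDR KBS), the developing MCRDR system, and the rule addition of Step 5; the names are placeholders, not the authors' implementation.

def run_simulation(cases, expert_conclusions, mcrdr_conclusions, add_rules):
    ka_incidents = 0
    for case in cases:                          # Step 1: next case
        wanted = expert_conclusions(case)       # Step 2: simulated expert's conclusion
        given = mcrdr_conclusions(case)         # Step 3: system under construction
        if given == wanted:                     # Step 4: agreement, move on
            continue
        add_rules(case, wanted, given)          # Step 5: make new (valid) rules
        ka_incidents += 1                       # count knowledge acquisition incidents
    return ka_incidents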

Analysis
All studies were carried out using the last 25% of the data set as unseen test cases. The size and performance of the various KBS are recorded after various percentages of the data set, up to 75%, have been seen by the system. This means that each case is run on the KBS and, if the wrong conclusion is reached, a rule is added. Note that the curves for Induct/RDR are also based on the same increasing training set, but for Induct all previous cases are used in the training set at each stage. In contrast, when Induct is used for the simulated expert the whole data set is used. Statistical analyses have not been performed; the differences are discussed in qualitative terms only.

The Expertise of the Simulated Expert
The human expert using an MCRDR system selects conditions from the difference list until no cornerstone cases satisfy the rule that is being developed. The simulated expert is simply a mechanism for similarly selecting conditions from the difference list to go into a new rule. Note that if no rule is satisfied, the difference list includes all the features of the present case. The case is run on a KBS developed using Induct/RDR on the entire data set. The simulated expert then uses this rule trace to guide selection of conditions from the difference list. The level of expertise of the simulated expert can be controlled by changing the number of conditions selected from the rule trace. Note that increasing the number of conditions (with a maximum of all the conditions in the rule trace) does not necessarily give the best selection, since the rule may become too specific to the case and may not cover other cases.

Also, note that sometimes it is desirable not only to use the rule trace of the input case but also the negations of conditions in the rule trace of the cornerstone case for the rule that gave the wrong classification. Induct/RDR first finds the conclusion which occurs most frequently and uses it as a default conclusion, so that there will be no rule trace for cases with this classification. If there are no conditions in the rule trace, the simulated expert tries to use the cornerstone case of the corrected rule. If this case's rule trace is not empty, the system can use the negated conditions of this rule trace since these conditions may be better than random selection from the difference list.

In the MCRDR evaluation design, four different levels of expertise are used. The first level ('Total') uses all the conditions in the Induct rule trace. The second level uses only one condition from the intersection of the Induct rule trace and the difference list; this is 'Partial' expertise. As a minor refinement, since the order in which Induct selects conditions may affect the outcome, two extremes of selecting the single condition from the top or the bottom of the rule trace were used. For both 'Total' and 'Partial' expertise, when there is no condition in the intersection, the MCRDR system chooses a condition from the difference list randomly. Note that at this stage there is a difference between the RDR and MCRDR evaluation systems. The final level is 'None'. This does not use the Induct rule trace at all and selects a condition from the difference list randomly. RDR data for this case are not included as they are of little interest, since such an expert has no expertise.
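A sketch of how these levels of expertise could drive condition selection, assuming rule_trace is the ordered list of conditions in the Induct/RDR rule trace for the case and diff is the current difference list; the exact treatment of 'Total' (here restricted to trace conditions that also appear in the difference list) and the random fallback are illustrative assumptions.

import random

def select_conditions(level, rule_trace, diff):
    usable = [c for c in rule_trace if c in diff]    # intersection, keeping trace order
    if level == "Total":
        chosen = usable                              # all usable trace conditions
    elif level == "Partial I (bottom)":
        chosen = usable[-1:]                         # one condition, from the bottom of the trace
    elif level == "Partial II (top)":
        chosen = usable[:1]                          # one condition, from the top of the trace
    else:                                            # "None": no expertise at all
        chosen = []
    if not chosen:                                   # empty intersection, or the 'None' expert:
        chosen = [random.choice(sorted(diff))]       # random pick from the difference list
    return set(chosen)

print(select_conditions("Partial II (top)", ["a", "c", "f"], {"c", "f", "not d"}))   # {'c'}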

Results

The x-axis in all graphs represents different sizes of the training set. For RDR and MCRDR the training set represents the cases seen and classified by the system, but only the cases for which an incorrect classification was made were used for adding a new rule. For Induct, all the previous cases are used as training data, so that 20% means the first 20% of the cases were used as machine learning training data.

The shaded portion of the graphs shows the range of values observed for the nine randomized data sets. The points at the edge of the shaded area are not from a single data set but are extreme values from the nine randomized sets for each training set size. Graphs from different levels of expertise frequently overlap. For clarity the graphs include Induct/RDR data for a single randomization only. As was shown previously, there is little scatter in the Induct/RDR results, so a single randomization is sufficient for using Induct/RDR as the gold standard here (Compton, Preston et al. 1995).

Error on Test Cases
Fig 3 shows the error rates of the various MCRDR systems as they develop. The error data are obtained by testing the cases in the 25% test set. Note that the starting error rate is 48.8% for Chess, as the default classification is correct for 52.2% of cases, 22.7% for Garvan-ES1, as the null classification is correct for 77.3% of cases, and 37.4% for Tic-Tac-Toe, as the default classification is correct for 62.6% of cases.

Fig 3. These show error (%) on test cases. These represent the ranges from 9 randomized data sets for various levels of expertise.

In the Chess domain, all levels of expertise in MCRDR except 'None' show better performance than Induct/RDR and RDR during the early stages. It should be noted that Induct has a fairly small training set at this stage. Secondly, perhaps validation against a number of cornerstone cases is better at producing accurate rules than adding rules in context. Induct/RDR catches up at about a 40% training set, but it does not show clearly better performance than MCRDR. The variance of the output for RDR at all levels of expertise seems greater than for MCRDR, and this seems to be a trend throughout nearly all the data. This again suggests that using multiple cornerstone cases produces a more robust system, less affected by the order in which cases are seen by the system.

In the Garvan-ES1 domain, MCRDR with Partial I (bottom) seems to show higher errors than RDR throughout training, except perhaps at the start where it is superior; for the other two levels of expertise, MCRDR performs similarly or perhaps better. In RDR, Partial I (bottom) performs better than Partial II (top). The first 19 of the 32 conditions in a Garvan-ES1 case are translated from fields such as 'Dr's Comment' and 'Source', and these will mostly be FALSE. Therefore, when conditions are chosen from the top of the difference list, they may not be relevant to the case. Interestingly, the bias in selecting conditions in 'Partial I (bottom)', which should show up mainly when conditions are selected not from the rule trace or difference list but only from the case itself, is sufficient to enable Partial I to outperform Total. This again suggests the comparative weakness of the simulated experts against true human expertise.

The key feature of all these results is that, apart from the case of using no expertise ('None'), the various MCRDR experts are comparable to Induct/RDR in the errors observed. However, as might be expected, Induct/RDR marginally outperforms MCRDR after large numbers of cases have been seen. Note that for Induct/RDR, as well as the other methods, the error rate continues to fall as more cases are added to the training set. As shown by Catlett, this is a common phenomenon with induction applied to very large training sets (Catlett 1991). Further, MCRDR outperforms or is comparable to RDR despite the fact that these are single classification domains. Since RDR has been shown to work in real situations involving human experts (PEIRS), it seems reasonable to conclude that MCRDR could be equally well used in such domains. Since PEIRS is actually a multiple classification domain, but one where sub-domains have been selected to minimise multiple-classification problems, we would expect MCRDR to be more generally useful and powerful. Finally, it is to be expected that a real expert would outperform the simulated experts here because of the problems noted above.

Error on Seen Cases
Fig 4 shows how incremental validation and verification can affect the earlier performance of the knowledge base. These figures show the results of testing all previously seen cases against the developing knowledge base. These are all cases which were either correctly classified previously or for which rules were specifically introduced to provide correct classifications. As the knowledge base develops this error is always well below the error rate on unseen cases. The difference between test cases and previously seen cases continues throughout training. Of course there are no errors on seen cases for Induct/RDR, as all seen cases are used for training.

These results on seen cases for MCRDR are of considerable importance. With RDR, rules are hidden at the bottom of a path so that comparatively few cases reach a newly added rule. For MCRDR, rules may be added anywhere and in this study were deliberately added at the top, so that any likelihood of misclassifying cases that were previously classified correctly might be exacerbated. However, these results show that this is not a problem and in fact MCRDR seems to be slightly better than RDR at the same level of expertise, except for 'Partial I (Bottom)' in Garvan-ES1, where RDR has a distinct advantage. This implies that rules can be well validated by cornerstone cases and that multiple cornerstone cases provide excellent validation.

Fig 4. These show error (%) on seen cases. These represent the ranges from 9 randomized data sets for various levels of expertise.

How often does knowledge acquisition occur?
Although the error rates for all but the stupid expert are reasonable, the critical question is whether these results are achieved at the cost of increased knowledge acquisition. Fig 5 shows the amount of knowledge acquisition required to achieve the error rates on test cases and seen cases. Note that the Y axis indicates the number of knowledge acquisition incidents, that is, cases that had to be dealt with as being misclassified. For any given case more than one rule may be added. In contrast, in the RDR system each case is dealt with by a single rule, so that the number of cases dealt with is the same as the number of rules.

Fig 5. These show knowledge acquisition incidents. These represent the ranges from 9 randomized data sets for various levels of expertise.

Note that the performance of the MCRDR systems is at least as good as that of the RDR systems except for 'Partial I (Bottom)' in Garvan-ES1, but again this is a special case. In the other domains, MCRDR always outperforms RDR at the same level of expertise. It does not seem likely that this is due to the minor differences in the simulated expertise algorithms. The Partial algorithms are the same if there is an Induct trace, but if there is not, MCRDR chooses randomly from the difference list or case while RDR chooses from the top or bottom of the difference list or case. This seems unlikely to give MCRDR any particular advantage. We can note that even in the case of Garvan-ES1, where Partial I (Bottom) has a clear advantage for RDR, even outperforming Total (RDR), the number of knowledge acquisition occurrences in MCRDR is still less than in RDR. The much more likely explanation is that MCRDR is better at ensuring that rules do not get hidden away in particular contexts, which would require their repetition elsewhere. Although the size of the KB is always smaller for Induct/RDR, it should be noted that Induct/RDR is an exceptional gold standard. It would probably be generally agreed that producing a KB as small as one produced by C4.5 is a reasonable result. Earlier simulation studies showed RDR knowledge bases comparable in size to those of C4.5, but the Induct/RDR KBs were about half the size (Compton, Preston et al. 1995).

Fig 6. These represent the average from 9 randomized data sets for various levels of expertise. Each data point is the average of the cases seen per rule since the last data point.

Number of Cases Seen by the Expert in Adding a Rule
Another crucial consideration with MCRDR is the complexity of the KA task, i.e. the number of difference lists the expert must select from to make a sufficiently precise rule. However, rules which are not at the top level will not have this problem, since the number of cornerstone cases reduces dramatically as the length of the rule chain increases. Again a 'worst case' evaluation is adopted, with new rules added only at the top level; stopping rules will therefore all be second level rules. The results are shown in Fig 6. If we exclude the stupid expert as not being representative of a human expert, then on average the expert only has to see 2-4 difference lists per case for Chess, 2-6 for Garvan-ES1 and 3-5 for Tic-Tac-Toe. The number of cases to be dealt with increases as the knowledge base increases in size and more cornerstone cases are stored (Fig 6). Note that these figures are again worst cases because the graphs cover only rules added at the top level. It can be seen with 'Total' expertise that the worst case, when the knowledge base is near completion, is an average of 3 cases per rule in Chess, 3-4 cases per rule for Garvan-ES1 and 4 cases for Tic-Tac-Toe, and with Partial I and II an average of 4-5 cases for Chess, 7 for Garvan-ES1 and 5-6 for Tic-Tac-Toe. Also note that the number increases rapidly in the initial stage, followed by a gradual increase during most of the development. This is particularly noticeable for the 'Total' expert on the Garvan-ES1 data, which has the largest number of cases.

The above data are averages across the nine randomised data sets. If we consider the worst case (the maximum number of difference lists seen) of the worst situation ('None'), the number of difference lists is 12 in Chess, 19 in Garvan-ES1 and 10 in Tic-Tac-Toe. It should be noted that this is a most extreme situation: the simulated expert selects a single condition randomly from the difference list, so no expertise at all is used. This is the equivalent of getting a monkey to select conditions from the difference list (in between writing poetry by randomly hitting typewriter keys).

It can be noted again that the range of results across the nine data orderings was smaller for MCRDR than for RDR, suggesting that MCRDR, with its multiple cornerstone case validation and its rules not being hidden in local contexts, is less susceptible to the order in which cases are processed.

Conclusion

These results demonstrate that a manually built MCRDR expert system performs similarly to an RDR system and to an inductively built expert system. The error rates versus the number of training cases for induction, or the number of cases evaluated by the manual MCRDR system, are very similar; the manual MCRDR error rate is slightly better than the inductive error rate. These results are not unexpected; in fact we would expect a human expert to do better than the synthetic expert or induction for small training sets.

We anticipate better results with a human expert as opposed to simulated experts. With RDR the actual sequence of rules that fail in a pathway is critical, as the rule above a rule in RDR removes cases before they reach the later rule, allowing the later rule to be quite general. We only used satisfied rules in the Induct rule pathways. Further, it was frequently found that there was no intersection between the difference list and the Induct rule trace, so that a condition had to be randomly selected from the difference list. In this study this is a major problem with stopping rules. Because all pathways are explored with MCRDR, we expect repetition to be less of a problem with MCRDR than with RDR, so that fewer knowledge acquisition incidents are required. The only case where this shows up clearly in this study is with the moderate expert.

Regardless of whether or not MCRDR will produce a more compact KB than RDR, this study clearly demonstrates that it does not greatly increase the knowledge acquisition required in a single classification domain. This implies that in a multiple classification domain it will provide a viable solution for the incremental development of KBS.

We propose to use the same experimental design on other data sets used in the machine learning literature, to confirm that these results are domain independent. The advantage of the current data set is that it is taken from a real domain. Building a system by taking sequential cases and testing it on cases that occur after the system is assumed complete (here at 15,000 cases) is exactly the scenario that occurs in the real world, where one expects the system to apply to future cases. Although the test approach we have used is entirely consistent with a real world system, we propose to use some of the techniques in machine learning for separating test and training sets to provide further comparative data.

The conclusion from this work is that the effort that is put into trying to organise knowledge into an optimal model is often unnecessary. It is perfectly adequate merely to keep patching errors with local corrections. Not only is this practical, but the task does not require a knowledge engineer or knowledge engineering skills, thus transforming the practical possibilities of KBS. Despite the efforts of AI to develop methods which will give optimal models, we have suggested that testing and fixing is perhaps an easier way to go than investing the effort demanded in trying to develop the "right" model (Menzies and Compton 1994).

The critical question for the RDR approach has been whether a local patch approach can be found for tasks other than single classification. Despite the greater complexity of MCRDR compared to RDR, the multiple classification task does not seem much different from the single classification task. However, we have argued that a multiple classification system provides the basis for dealing with a range of problem types (Compton, Kang et al. 1993) such as configuration, scheduling and related problems. The full development of such a system is a long way off and some amendments are necessary to extend MCRDR in this direction. Another paper in this workshop demonstrates its promise for a configuration task (Ramadan, Compton et al. 1997).

References

Buchanan, B., D. Barstow, et al. (1983). Constructing an Expert System. Building expert systems Eds. F. Hayes-Roth, D. Waterman and D. Lenat. Reading Massachusetts, Addison Wesley. 127-67.

Catlett, J. (1992). Ripple-Down-Rules as a mediating representation in interactive induction. Proceedings of the Second Japanese Knowlege Acquisition for Knowledge-Based Systems Workshop, Kobe, Japan,

Chandrasekaran, B. (1986). Generic Tasks in Knowledge Based Reasoning: High-Level Building Blocks for Expert System Design. IEEE Expert 1(3): 23-30.

Ramadan, Z., P. Compton, et al. (1997). From Multiple Classification Ripple Down Rules to Configuration Ripple Down Rules. Australian Knowledge Acquisition Workshop at AI'97.

Clancey, W. J. (1993). Situated Action: A Neuropsychological Interpretation (Response to Vera and Simon). Cognitive Science 17(1): 87-116.

Compton, P., Edwards, G., Srinivasan, A., Malor, R., Preston, P., Kang, B. and Lazarus, L. (1992). Ripple down rules: turning knowledge acquisition into knowledge maintenance. Artificial Intelligence in Medicine 4: 47-59.

Compton, P., K. Horn, et al. (1989). Maintaining an Expert System. Applications of Expert Systems. Ed. J. R. Quinlan. London, Addison Wesley. 366-385.

Compton, P. and R. Jansen (1990). A Philosophical Basis for Knowledge Acquisition. Knowledge Acquisition 2: 241-257.

Compton, P., B. H. Kang, et al. (1993). Knowledge Acquisition Without Analysis. Knowledge Acquisition for Knowledge Based Systems, Lecture Notes in AI (723). Eds. G. Boy and B. Gaines. Berlin, Springer Verlag. 278-299.

Compton, P., P. Preston, et al. (1994). Local patching produces compact knowledge bases. A Future for Knowledge Acquisition. Eds. L. Steels, G. Schreiber and W. V. d. Velde. Berlin, Germany, Springer-Verlag. 104-117.

Edwards, G., Compton, P., Malor, R., Srinivasan, A. and Lazarus, L. (1993). PEIRS: a pathologist maintained expert system for the interpretation of chemical pathology reports. Pathology 25: 27-34.

Gaines, B. (1989). Knowledge acquisition: the continuum linking machine learning and expertise transfer. The Third European Workshop on Knowledge Acquisition for Knowledge-Based Systems, Paris, 90-101

Gaines, B. R. and P. J. Compton (1992). Induction of Ripple Down Rules. AI '92. Proceedings of the 5th Australian Joint Conference on Artificial Intelligence, Hobart, Australia, World Scientific, Singapore, 349-354

Linster, M. (1991). Sisyphus 91 : modelling the office-assignment domain. Sisyphus'91 : Models of Problem Solving.

Manago, M. V. and Y. Kodratoff (1987). Noise and Knowledge Acquisition. The Tenth International Joint Conference on Artificial Intelligence, Milano, Italy, Morgan Kaufmann, 1:348-354

Menzies, T. and P. Compton (1994). Knowledge Acquisition for Performance Systems, or: When can "test" replace "tasks"?, The 8th Knowledge Acquisition for Knowledge Based Systems Workshop, Banff Canada, 34.1-34.20

Newell, A. (1982). The Knowledge Level. Artificial Intelligence 18: 87 - 127.

Preece, A. D., R. Shinghal, et al. (1992). Principles and practice in verifying rule-based systems. The Knowledge Engineering Review 7(2): 115 - 141.

Wielinga, B. J., A. T. Schreiber, et al. (1992). KADS: a Modeling Approach to Knowledge Engineering. Knowledge Acquisition 4: 5-54.