The inability of experts to articulate the knowledge required to solve a problem is, arguably, the greatest challenge to building an expert system. The problem is made worse in situations where the response of the expert must be so rapid that there is not even a chance of a plausible post hoc reconstruction of the decision processes involved. For that reason, construction of the knowledge base by example is the only approach available. Examples can be used in two ways. They may be used as input to an induction program whose task is to find an abstraction of a control strategy from the data. Examples may also be used to induce the expert to discern differences between cases, thereby allowing the knowledge acquisition system to construct rules semi-automatically. The work presented in this paper demonstrates the feasibility of both approaches. In particular, it shows that the RDR methodology can be extended to domains where the expertise involved is necessarily subcognitive. This is demonstrated by the application of a combination of ripple-down rules and machine learning to the task of acquiring piloting skills for an aircraft in a flight simulator.
Learning symbolic rules by observing a human operator is also called behavioural cloning (Michie, Bain & Hayes-Michie, 1990; Sammut et al., 1992; Urbancic & Bratko, 1994) and has two stages. First, an operator skilled in a task is asked to control the system. During his or her performance, the state of the system, along with the operator's actions, is logged to a file. In the next stage, a learning program uses the logged information to construct control rules.
Sammut et al. (1992) demonstrated a particularly difficult application of behavioural cloning in learning to control an aircraft in a flight simulator. They used a flight simulator on a Silicon Graphics computer and restricted the task to learning to fly a Cessna 150 on a predefined flight path. Three human pilots were asked to fly the aircraft thirty times through the same flight plan. The accumulated file for the thirty flights for each pilot was used to create a controller for each of the four controls available, namely, elevators, ailerons, thrust, and flaps. Quinlan's C4.5 (Quinlan, 1993) was used as the induction engine.
Although the original learning-to-fly experiments successfully constructed an autopilot capable of flying the aircraft through the required flight plan, they encountered a number of problems: a large number of training examples was required, the induced rules were large and difficult to understand, and the resulting controllers were not always robust. These difficulties suggest combining induction with knowledge acquired directly from the pilot.
This paper describes just such an approach. We show that ripple-down rules (RDR) can be used to construct a controller for a complex dynamic system, such as an aircraft. This is significant in that RDR have mostly been demonstrated on classification tasks. We also show that relatively compact and understandable rules can be built without an excessive number of examples being presented by the trainer, and that these rules can be quite robust. Another feature of this system is that it allows machine learning methods to be mixed with RDRs, drawing on the advantages of each approach.
In the following sections, we briefly describe the problem domain and the nature of the data available. We give an overview of the knowledge acquisition system and then give details of the machine learning and RDR tools used. We then report the results of our experiments and conclude with some indications of future research directions.
The source code of the flight simulator has been modified so that the state of the aircraft and the pilot's control actions can be logged during a flight, and so that the autopilot code can be replaced by the rules constructed with our system.
The same inability to introspect motivated the development of ripple-down rules. However, the RDR methodology does not seek to entirely exclude the human from the process of building rules to emulate the desired skill. Rather, the human is asked to critique the performance of the rules and point out differences between cases when the program fails. In the case of controlling a dynamic system, RDRs make use of the fact that while the human operator is not aware of the mechanisms involved in a low-level skill, he or she may be aware of goals and sub-goals. This knowledge allows the operator to reason about cases.
Dynamic Ripple-Down Rules (DRDR) implement the RDR method for controlling dynamic systems (Shiraz, 1997). The basic algorithm of DRDR is the same as that of RDR. In the actual implementation, DRDR stores the knowledge base as a binary tree with a rule at each node. Each node has a rule condition and a conclusion, and two branches that are followed depending on whether or not the rule is satisfied by the data being considered (Figure 1).
A ripple-down rule has the general form:
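For illustration, a rule of this general form might be written as follows, where a, b, c and d stand for conditions and the conclusions are control actions; the conditions and conclusions here are placeholders, not rules taken from the flight domain:

    if a and b then conclusion_1          (case 1)
        except if c then conclusion_2     (case 2)
    else if d then conclusion_3           (case 3)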
The numbers 1, 2 and 3 in this example refer to cases. RDRs are built incrementally. Each time the RDR fails to produce the correct answer, a new rule, along with the case that caused its creation, is added to the RDR. Suppose an RDR fails. We compare the new case with the case associated with the last rule that fired. The differences give rise to the conditions that will be placed in a new rule to distinguish the new case from the old. If a rule's conditions were satisfied by a case when they should not have been, a new rule is added as an exception. If a rule's conditions were not satisfied by a new case when they should have been, a new rule is added as an alternative (else).
The initial RDR usually has the form:
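A typical default, written in the same notation as above, is:

    if true then take_no_action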
That is, in the absence of any other information, the RDR recommends taking some default action. For example, in a control application the default may be to assume that everything is normal and make no changes. If a rule's condition succeeds when it should not, an exception is added (i.e. a nested if-statement). Since the initial condition is always satisfied, an exception is added whenever the do-nothing action is inappropriate.
A ripple-down rule can also be viewed graphically as a binary tree in which nodes are condition/conclusion pairs and the exceptions and alternatives follow the branches of the tree, as shown in Figure 1. The figure also shows a typical case associated with a rule. Each case contains values for all of the variables listed in Table 1.
Although the basic RDR methodology has been retained, we require additional features in order to apply ripple-down rules in this domain. These extensions form the basis of Dynamic RDRs (DRDR) and are described next.
DRDR handles the four knowledge bases and outputs multiple conclusions (one from each of the four RDRs). Whenever the pilot pauses the flight to investigate the current rules, he or she is able to see DRDR's conclusions for all action variables (Figure 2).
Before describing the knowledge acquisition system as a whole, we first describe the learning algorithm.
To automatically construct rules from the pilot's behaviour, we use LDRDR (Shiraz, 1997). The basic algorithm for LDRDR is to search the data logged from a flight, record by record, and find those attributes that cause an action, then create rules based on those attributes and their qualitative states. The assumption is that the qualitative state of variables changes when an action is performed.
The main reason for introducing LDRDR (Shiraz, 1997) instead of using
one of the existing machine learning programs was the need to deal with
sequential data. In addition, it was necessary to have a program that was
compatible with DRDR, with the ability to learn incrementally. Other machine
learning algorithms that construct ripple-down rules, for example, Induct/RDR
(Gaines & Compton, 1992), are batch algorithms and are not designed
to deal with sequential data. LDRDR is specifically designed to work with
sequential data. It takes the current knowledge base, the behavioural traces,
and a priority list as its input. It creates rules that are added to the
knowledge base to deal with cases not previously handled correctly. The
LDRDR algorithm constructs a controller as follows:
The logged data usually contains information from different stages. This data is usually noisy and contains many redundant records. Pre-processing prepares the logs for the learning program. This includes segmenting the logged data, discretising control values, eliminating spurious values and creating separate inputs for each control action.
After pre-processing, each of the data files and the existing RDRs for a particular control action are passed to the LDRDR algorithm. LDRDR also uses a priority list, which we discuss later. The output of LDRDR is an extension of the original RDR that covers cases in the input data not handled by the original RDR. The new RDR is converted into C if-statements by recursively traversing the RDR and creating one if-statement for each rule. An if-statement's conditions are the conjunction of all true conditions in the RDR from the root to the rule. The rules are executed sequentially. If any rule fires, control jumps to the end of the list to guarantee a single conclusion from the tree. If none of the rules fires, the last rule, which is the default rule, is executed.
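As an illustration of this translation, the following sketch shows a hypothetical flattened RDR for the elevator. The conditions and action names are modelled on the rules in Table 2, but the code itself is not taken from the system:

#include <stdio.h>

/* Hypothetical conclusions for the elevator control (names modelled on Table 2). */
enum elevator_action { NO_CHANGE, LEVEL_PITCH, PITCH_UP_1, PITCH_DOWN_2 };

/* Sketch of a flattened RDR.  Each if-statement's condition is the conjunction
 * of the true conditions on the path from the root to the rule.  The statements
 * are tried in order (an exception before its parent rule), the first one that
 * fires returns the single conclusion, and the final branch is the default rule. */
enum elevator_action elevator_rdr(double airspeed, double elevation, double y_feet)
{
    if (elevation <= 110.0 && y_feet > 1970.0)
        return PITCH_DOWN_2;          /* exception to the rule below */
    if (elevation <= 110.0)
        return PITCH_UP_1;            /* the rule the exception refines */
    if (airspeed <= 50.0)
        return LEVEL_PITCH;           /* an alternative (else) rule */
    return NO_CHANGE;                 /* default rule, always last */
}

int main(void)
{
    /* elevation 105 feet and y_feet 2000 feet: the exception fires */
    printf("%d\n", elevator_rdr(60.0, 105.0, 2000.0));
    return 0;
}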
There is always a delay between a stimulus and a response. Ideally, we would like to record the pilot's action together with the event that actually provoked it. The problem is knowing when that event occurred. This is not a trivial issue, because human reaction time is not constant; it varies from person to person, and the reaction time of a given person varies with the task. If a task is performed frequently and becomes familiar, the response time is shorter than in situations that are new to the operator. In addition, pilots usually anticipate the future location of the aircraft and prepare a response for that state. In this experiment, following Sammut et al. (1992), we decided to use a response time of one second in each stage of the flight. We have not attempted to model the pilot's predictive behaviour.
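A minimal sketch of how this one-second response time might be applied when pairing stimuli with responses is shown below. The record layout and the function are hypothetical, not the system's actual pre-processor:

#include <stdio.h>

#define RESPONSE_TIME 1.0   /* seconds, following Sammut et al. (1992) */

/* Hypothetical log record: a timestamp, some state variables and an action code. */
struct record { double time; double state[10]; int action; };

/* Pair each recorded action with the state logged roughly one second earlier,
 * which is assumed to be the stimulus that provoked it. */
void pair_stimulus_response(const struct record *log, int n, FILE *out)
{
    int i, j = 0;
    for (i = 0; i < n; i++) {
        if (log[i].action == 0)
            continue;                  /* no control change in this record */
        /* advance to the newest record at least RESPONSE_TIME before the action */
        while (j + 1 < i && log[j + 1].time <= log[i].time - RESPONSE_TIME)
            j++;
        fprintf(out, "%f %d\n", log[j].state[0], log[i].action);
    }
}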
As with DRDR, different sets of rules are created for each stage. In each stage, different knowledge bases are created for each action. Therefore, twenty-eight knowledge bases have to be created for a complete flight (4 control actions × 7 flight stages).
The first stage of pre-processing is to segment the data into the stages where they were recorded. To make the segmentation of logged data easy, a new attribute has been added at the beginning of the recorded data to show the record's stage number. Based on this attribute, the filtering program segments the recorded data.
The Cessna aircraft's control system consists of four moveable control
surfaces (elevator, flaps, ailerons, and rudder) plus the throttle. The
values of the control variables are continuous. However, the learning algorithm
can only deal with discrete class values. To solve this problem, a pre-processor
is used to sub-divide the variable settings into intervals that can be
given discrete labels. The range for each partition is chosen by analysing
the frequency of occurrence of the values. The method of discretisation
follows Sammut et al. (1992).
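A minimal sketch of such a discretisation is shown below. The breakpoints and labels are invented for illustration; in the system they would be derived from the frequency analysis of the logged values:

#include <stdio.h>

/* Hypothetical interval boundaries for the elevator; in practice these are
 * chosen by analysing the frequency of occurrence of the logged values. */
static const double elevator_breaks[] = { -0.10, -0.02, 0.02, 0.10 };
static const char  *elevator_labels[] = {
    "pitch_down_2", "pitch_down_1", "level_pitch", "pitch_up_1", "pitch_up_2"
};

/* Map a continuous elevator setting to a discrete class label. */
const char *discretise_elevator(double value)
{
    int i;
    for (i = 0; i < 4; i++)
        if (value <= elevator_breaks[i])
            return elevator_labels[i];
    return elevator_labels[4];
}

int main(void)
{
    printf("%s\n", discretise_elevator(0.05));   /* prints pitch_up_1 */
    return 0;
}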
The algorithm extends RDR as follows:
Inputs: the current knowledge base; the behavioural traces; a priority list.

For each record in the trace (processed in sequence), work through the attributes in the priority list:

1. Determine whether the attribute's qualitative state has changed since the previous record.
2. If it has, create a test for the attribute. The test is based on the attribute's current value and its previous qualitative state (one possible test form is shown in the code sketch below).
3. Add the test to the new rule's condition list.
4. Attributes in the priority list are ordered by a numerical score; increment the attribute's priority.
5. If the number of tests in the condition list reaches a user-defined maximum, scan the rest of the attributes and simply update their priorities if their qualitative state has changed.
If the condition list is not empty, create a rule and add it to the RDR. The conclusion of the rule is the action recorded in the trace, and the current record becomes the rule's cornerstone case. The new rule is added as an exception to the last rule evaluated in the RDR if that rule evaluated true (true branch), or as an alternative (else) if it evaluated false.
The output of LDRDR is an extension of the original RDR to cover cases
in the input data that were not covered by the original RDR.
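The following sketch illustrates the per-record loop described above. The data structures, the maximum of three tests, and the exact form of the test are assumptions made for illustration rather than details of the actual implementation:

#define MAX_TESTS 3   /* user-defined maximum number of tests per rule (assumed value) */

/* Hypothetical data structures for this sketch. */
enum qual_state { DECREASING, STEADY, INCREASING };

struct attribute {
    const char     *name;
    double          value;       /* value in the current record              */
    enum qual_state state;       /* qualitative state in the current record  */
    enum qual_state prev_state;  /* qualitative state in the previous record */
    int             priority;    /* numerical score in the priority list     */
};

struct test { const char *attr; const char *op; double threshold; };

/* Build the condition list for one new rule from one logged record.
 * attrs must already be sorted by descending priority. */
int make_conditions(struct attribute *attrs, int n_attrs, struct test *conds)
{
    int n_conds = 0;

    for (int i = 0; i < n_attrs; i++) {
        struct attribute *a = &attrs[i];

        if (a->state == a->prev_state)
            continue;                      /* no change in qualitative state */

        a->priority++;                     /* the attribute contributed, so reward it */

        if (n_conds < MAX_TESTS) {
            /* Assumed test form: compare the attribute with its current value,
             * the direction taken from its previous qualitative state. */
            conds[n_conds].attr = a->name;
            conds[n_conds].op = (a->prev_state == INCREASING) ? ">=" : "<=";
            conds[n_conds].threshold = a->value;
            n_conds++;
        }
        /* once the maximum is reached, keep scanning only to update priorities */
    }
    return n_conds;   /* if non-zero, the rule is added as an exception or alternative */
}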
There is a priority list for each control action. These lists can be transferred from one stage to another. During learning, the priority list is updated automatically by considering attributes that contribute more in rule generation (except for the attributes that the expert decides are to be tested first). The priority of an attribute increments by one each time LDRDR notices a change in its qualitative state. This list is always sorted by priority.
LDRDR chooses attributes from the top of the list. After choosing the
attribute, LDRDR looks at the logged data to decide whether there is a
change in the qualitative state of that attribute or not. The attribute
will be included in one of the tests in the new rule if there is a change
in its qualitative state. This process continues until LDRDR creates the
maximum number of tests allowed or reaches the end of the priority list.
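A small, self-contained sketch of how the priority list might be kept sorted after its scores are updated; the structure and function names are ours, not taken from the original program:

#include <stdlib.h>

/* Hypothetical priority-list entry: an attribute name and its numerical score. */
struct priority_entry { const char *attr; int priority; };

/* Sort in descending order of priority so that LDRDR always examines the
 * highest-priority attributes first.  Attributes that the expert has decided
 * must be tested first would be kept at the head of the list and excluded
 * from this sort. */
static int by_priority(const void *a, const void *b)
{
    const struct priority_entry *x = a, *y = b;
    return y->priority - x->priority;
}

void sort_priority_list(struct priority_entry *list, size_t n)
{
    qsort(list, n, sizeof list[0], by_priority);
}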
The manual knowledge acquisition part of this system (DRDR) is effective when it is possible for the pilot to articulate rules related to his or her performance. These rules tend to be general in nature and, even if they are not complete in themselves, they are often useful as constraints on the learning system. Thus, the complexity of the rules that must be learned is reduced.
Knowledge acquisition begins with some simple rules, created either manually with DRDR or by applying LDRDR to logged information. In both cases, the rules can be tested by running the simulation in autopilot mode. During the flight, if the aircraft does not follow the expected path, the pilot is able to pause the flight and trace the rules that have been executed or are currently being executed. The pilot is also able to modify existing rules that seem incorrect by adding new rules. In this case, the previous and current status of the aircraft, plus all of the simulation's state variables, are presented to the pilot, and all of the rules under execution for each control action are reported.
If the pilot decides to create rules using LDRDR, he or she must fly the aircraft and record data. This task can be repeated as many times as the pilot wants. After logging the data, the log files are pre-processed and passed to LDRDR.
The above procedure is repeated until a set of successful clones has been created. In our experiments, knowledge bases are created one stage at a time. This has the advantage that rules constructed for one stage can be transferred and adapted for other stages, thus reducing some of the effort required in later stages. The priority lists can also be transferred from one stage to another, or new priority lists can be created for each new stage.
After every modification of the RDR (manual or automatic), the flight simulation is run in autopilot mode to test the new RDR. To do this, the code of the original autopilot is replaced by the RDR (translated into C). A C function is also incorporated into the flight simulator to determine the current stage of the flight and when to change stages (a sketch of such a function appears after Table 2). The appropriate set of rules for each stage is then selected; within each stage there are four independent blocks of if-statements, one for every control action (Sammut et al., 1992). Table 2 shows a controller built for the first stage. Note that these are the rules created by one pilot; another pilot would almost certainly create slightly different rules.
Table 2. The controller built for the first stage of the flight.

Elevator:
    if airspeed <= 50 then level_pitch
    else if elevation < 100 then pitch_up_3
    else if elevation <= 110 then pitch_up_1
        except if y_feet > 1970 then pitch_down_2
    else if elevation < 130 then pitch_down_1
    else if elevation < 130 then pitch_down_3

Flaps:
    if y_feet <= 250 then full_flaps
    else if y_feet <= 500 then half_flaps

Ailerons:
    if azimuth <= 50 then left_roll_1
    else if azimuth <= 1800 then left_roll_2
    else if azimuth <= 3550 then right_roll_1
    else if azimuth <= 3599 then right_roll_2
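The stage-determination function mentioned earlier might look something like the following sketch. The variables are drawn from the simulator's state, but the thresholds and the number of branches shown here are invented for illustration:

/* Hypothetical sketch of the C function that decides the current flight stage.
 * The real flight plan has seven stages; only the general shape is shown. */
int current_stage(double elevation, double y_feet, double azimuth)
{
    if (elevation < 200.0 && y_feet < 2000.0)
        return 1;                 /* take-off run and initial climb */
    if (elevation < 2000.0 && azimuth < 10.0)
        return 2;                 /* climbing out on the runway heading */
    /* ... the remaining stages would be tested in the same way ... */
    return 7;                     /* final approach and landing */
}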
DRDR has the option to convert the RDR into a decision list of if-then rules. In this case, each node of an RDR is displayed as a rule (Figure 4), which includes all the satisfied conditions from the root to that node. In our experiments, we found this option very useful for users when they needed to create new rules. Rules are stored as C if-then statements. A macro, "THEN", has been added for readability.
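A minimal sketch of how such a macro could work, assuming it simply expands to nothing (the actual definition used in the system may differ):

/* "THEN" expands to nothing; it only makes the generated rules read
 * more like the if-then notation used elsewhere in this paper. */
#define THEN

void level_pitch(void);   /* one of the elevator actions */

void elevator_rule(double airspeed)
{
    if (airspeed <= 50.0) THEN level_pitch();
}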
DRDR makes it possible for the pilot to investigate each node by entering its number. In this case, the current status of the aircraft, plus the conditions that satisfied the last rule, are presented. The conclusion reached and all of the satisfied conditions are also reported to the pilot. The rule's number, plus the numbers of its parent and children, are available as well, and the pilot is able to see the information about all of these nodes (Figure 5).
Parvaz was tested using three volunteers. Only one of the subjects, the first author, was already familiar with Parvaz, while another was familiar with a flight simulator. All subjects were postgraduate students in computer science. Their task was to create a set of knowledge bases using Parvaz that could successfully complete the previously specified flight plan.
Prior to the experiments, the subjects received a one-hour tutorial in the use of Parvaz and attended a demonstration. During the demonstration, the demonstrator explained how he flew the aircraft and the subjects were allowed to ask questions about the flight and the flight plan. They were allowed to practice with the simulator until they became proficient in flying the aircraft. During the experiments, data on each subject's performance and use of the system were collected.
Examples can be used in two ways. They may be used as input to an induction program whose task is to find an abstraction of a control strategy from the data. Examples may also be used to induce the expert to discern differences between cases, thereby allowing the knowledge acquisition system to construct rules semi-automatically. The work presented in this paper demonstrates the feasibility of both uses of examples. In particular, it shows that the RDR methodology can be extended to domains where the expertise involved is necessarily subcognitive.
There are several directions in which future research might head.
Bratko, I. (1993). Qualitative reasoning about control. In Proceedings of the ETFA '93 Conference, Cairns, Australia.

Compton, P. (1992). Insight and knowledge. In AAAI Spring Symposium: Cognitive Aspects of Knowledge Acquisition, Stanford University, pp. 57-63.

Compton, P. J., & Jansen, B. (1988). Knowledge in context: A strategy for expert system maintenance. In Proceedings of the Australian Artificial Intelligence Conference.

Dzeroski, S. (1993). Discovering dynamics. In P. Utgoff (Ed.), Proceedings of the 10th International Conference on Machine Learning, Amherst, Massachusetts, pp. 97-103.

Gaines, B. R., & Compton, P. (1992). Induction of ripple-down rules. In Proceedings of the 5th Australian Joint Conference on Artificial Intelligence, Hobart, Tasmania.

Kang, B. H. (1995). Validating Knowledge Acquisition: Multiple Classification Ripple Down Rules. Ph.D. thesis, University of New South Wales.

Makarovic, A. (1991). A qualitative way of solving the pole balancing problem. In J. Hayes, D. Michie, & E. Tyugu (Eds.), Machine Intelligence. Oxford, pp. 241-258.

Michie, D., & Chambers, R. A. (1968). Boxes: An experiment in adaptive control. In E. Dale & D. Michie (Eds.), Machine Intelligence 2. Edinburgh: Oliver and Boyd.

Michie, D., Bain, M., & Hayes-Michie, J. E. (1990). Cognitive models from subcognitive skills. In M. Grimble, S. McGhee, & P. Mowforth (Eds.), Knowledge-based Systems in Industrial Control. Peter Peregrinus.

Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.

Sammut, C., Hurst, S., Kedzier, D., & Michie, D. (1992). Learning to fly. In D. Sleeman & P. Edwards (Eds.), Proceedings of the Ninth International Conference on Machine Learning. Aberdeen: Morgan Kaufmann.

Sammut, C. (1996). Automatic construction of reactive control systems using symbolic machine learning. Knowledge Engineering Review.

Shiraz, G. M., & Sammut, C. (1997). Combining knowledge acquisition and machine learning to control dynamic systems. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI-97), Nagoya, Japan.

Urbancic, T., & Bratko, I. (1994). Reconstructing human skill with machine learning. In A. Cohn (Ed.), Proceedings of the 11th European Conference on Artificial Intelligence. John Wiley & Sons.