Udo Hahn, Manfred Klenner & Klemens Schnattinger
Computational Linguistics Lab -- Text Knowledge Engineering Group
Platz der Alten Synagoge 1, D-79085 Freiburg, Germany
The work reported in this paper is part of a large-scale project aiming at the development of a German-language text knowledge acquisition system [Hahn et al.1996c] for two real-world application domains -- test reports on information technology products (current corpus size: approximately 100 documents with $10^5$ words) and medical findings reports (current corpus size: approximately 120,000 documents with $10^7$ words). The knowledge acquisition problem we face is two-fold. In the information technology domain lexical growth occurs at dramatic rates -- new products, technologies, companies and people continuously enter the scene such that any attempt at keeping track of these lexical innovations by hand-coding is clearly precluded. Compared with these dynamics, the medical domain is lexically more stable but the sheer size of its sublanguage (conservative estimates range about $10^6$ lexical items/concepts) also cannot reasonably be coded by humans in advance. Therefore, the designers of text understanding systems for such challenging applications have to find ways to automate the lexical/concept learning phase as a prerequisite and, at the same time, as a constituent part of the text knowledge acquisition process. Unlike the current mainstream with its focus on statistically based learning methodologies [Lewis1991,Resnik1992,Sekine et al.1992], we advocate a symbolically rooted approach in order to break the concept acquisition bottleneck. This approach is based on expressively rich knowledge representation models of the underlying domain [Hahn et al.1996a,Hahn et al.1996b,Hastings1996].
We consider the problem of natural language based knowledge acquisition and concept learning from a new methodological perspective, viz. one based on metareasoning about statements expressed in a terminological knowledge representation language. Reasoning either is about structural linguistic properties of phrasal patterns or discourse contexts in which unknown words occur (assuming that the type of grammatical construction exercises a particular interpretative force on the unknown lexical item), or it is about conceptual properties of particular concept hypotheses as they are generated and continuously refined by the on-going text understanding process (e.g., consistency relative to already given knowledge, independent justification from several sources). Each of these grammatical, discourse or conceptual indicators is assigned a particular "quality" label. The application of quality macro operators, taken from a "qualification calculus" [Schnattinger & Hahn1996], to these atomic quality labels finally determines which of several alternative hypotheses actually hold(s).
The decision for a metareasoning approach is motivated by requirements which emerged from our work in the overlapping fields of natural language parsing and learning from texts. Both tasks are characterized by the common need to evaluate alternative representation structures, either reflecting parsing ambiguities or multiple concept hypotheses. For instance, in the course of concept learning from texts, various and often conflicting concept hypotheses for a single item are formed as the learning environment usually provides only inconclusive evidence for exactly determining the properties of the concept to be learned. Moreover, in "realistic" natural language understanding systems working with large text corpora, the underdetermination of results can often not only be attributed to incomplete knowledge provided for that concept in the data (source texts), but it may also be due to imperfect parsing results (originating from lacking lexical, grammatical, conceptual specifications, or ungrammatical input). Therefore, competing hypotheses at different levels of validity and reliability are the rule rather than the exception and, thus, require appropriate formal treatment. Accordingly, we view the problem of choosing from among several alternatives as a quality-based decision task which can be decomposed into three constituent parts: the continuous generation of quality labels for single hypotheses (reflecting the reasons for their formation and their significance in the light of other hypotheses), the estimation of the overall credibility of single hypotheses (taking the available set of quality labels for each hypothesis into account), and the computation of a preference order for the entire set of competing hypotheses, which is based on these accumulated quality judgments.
The knowledge acquisition methodology we propose is heavily based on the representation and reasoning facilities provided by terminological knowledge representation languages (for a survey, cf. woods92). As the representation of alternative hypotheses and their subsequent evaluation turn out to be major requirements of that approach, provisions have to be made to reflect these design decisions by an appropriate system architecture of the knowledge acquisition device (cf. Fig.1). In particular, mechanisms should be provided for:
The notion of context we use as a formal foundation for terminological metaknowledge and metareasoning is based on McCarthy's context model [McCarthy1993]. We here distinguish two types of contexts, viz. the initial context and the metacontext. The initial context contains the original terminological knowledge base (KB kernel) and the text knowledge base, a representation layer for the knowledge acquired from the underlying text by the text parser [Hahn et al.1994]. Knowledge in the initial context is represented without any explicit qualifications, attachments, provisos, etc. Note that in the course of text understanding -- due to the working of the basic hypothesis generation rules (cf. Section "Hypothesis Generation") -- a hypothesis space is created which contains alternative subspaces for each concept to be learned, each one holding different or further specialized concept hypotheses. Various truth-preserving translation rules map the description of the initial context to the metacontext which consists of the reified knowledge of the initial context. By reification, we mean a common reflective mechanism, which splits up a predicative expression into its constituent parts and introduces a unique anchor term, the reificator, on which reasoning about this expression, e.g., the annotation by qualifying assertions, can be based. This kind of reification is close to the one underlying the FOL system [Weyhrauch1980,Giunchiglia & Weyhrauch1988]. Among the reified structures in the metacontext there is a subcontext embedded, the reified hypothesis space, the elements of which carry several qualifications, e.g., reasons to believe a proposition, indications of consistency, type and strength of support, etc. These quality labels result from incremental hypothesis evaluation and subsequent hypothesis selection, and, thus, reflect the operation of several second-order qualification rules in the qualifier (quality-based classifier).
The derived labels are the basis for the selection of those representation structures which are assigned a high degree of credibility -- only those qualified hypotheses will be remapped to the hypothesis space of the initial context by way of (inverse) translation rules. Thus, we come full circle. In particular, at the end of each quality-based reasoning cycle the entire original i-th hypothesis space is replaced by its (i+1)-th successor in order to reflect the qualifications computed in the metacontext. The (i+1)-th hypothesis space is then the input of the next quality assessment round.
Figure 1. Architecture for Text Knowledge Acquisition
Terminological Logic. We use a standard terminological concept description language, referred to as , which has several constructors combining atomic concepts, roles and individuals to define the terminological theory of a domain (for a subset, see Table 1).
Table 1. Syntax and Semantics for a Subset of
Table 2. Axioms
Concepts are unary predicates, roles are binary predicates over a domain , with individuals being the elements of . We assume a common set-theoretical semantics for -- an interpretation is a function that assigns to each concept symbol (the set A) a subset of the domain , , to each role symbol (the set P) a binary relation of , , and to each individual symbol (the set I) an element of , . Concept terms and role terms are defined inductively. Table 1 contains corresponding constructors and their semantics, where C and D denote concept terms, while R and S denote roles. represents the set of role fillers of the individual d, i.e., the set of individuals e with .
By means of terminological axioms (for a subset, see Table 2) a symbolic name can be introduced for each concept. It is possible to define necessary and sufficient constraints (using ) or only necessary constraints (using ). A finite set of such axioms is called the terminology or TBox. Concepts and roles are associated with concrete individuals by assertional axioms (see Table 2; a,b denote individuals). A finite set of such axioms is called the world description or ABox. An interpretation is a model of an ABox with regard to a TBox, iff satisfies the assertional and terminological axioms. Terminology and world description together constitute the terminological theory for a given domain.
Reification. Let us assume that any hypothesis space H contains a characteristic terminological theory. In order to reason about that theory we split up the complex terminological expressions by means of reification. We here define the (bijective) reification function , where is a terminological expression known to be true in the hypothesis space H and is its corresponding reified expression, which is composed of the reificator (an instance of the concept class REIF), the type of binary relation involved (including INST-OF and ISA), the relation's domain and range, and the identifier of the hypothesis space in which the expression holds. Table 3 gives two definitions for , more complex ones are provided in schnattinger-et-al95. By analogy, we may also define the function with the corresponding inverse mapping.
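The reification step can be illustrated by a minimal Python sketch. It assumes that every terminological expression is a binary assertion (a role, INST-OF or ISA) and that reificators are simply numbered; the class names `Assertion`, `Reified` and `Reifier` are hypothetical and not part of the system described here.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Assertion:
    """A binary terminological assertion, e.g. (Compaq INST-OF NOTEBOOK-PRODUCER)."""
    domain: str
    relation: str  # a role name, or INST-OF / ISA
    range_: str


@dataclass(frozen=True)
class Reified:
    """Reified counterpart: a reificator plus the constituent parts."""
    reificator: str
    relation: str
    domain: str
    range_: str
    hypo_space: str


class Reifier:
    """Bijective mapping between assertions (per hypothesis space) and reified terms."""

    def __init__(self):
        self._ids = count(1)
        self._forward = {}   # (assertion, space) -> Reified
        self._backward = {}  # reificator -> (assertion, space)

    def reify(self, assertion, hypo_space):
        key = (assertion, hypo_space)
        if key not in self._forward:
            reificator = f"r{next(self._ids)}"  # assumed naming scheme
            self._forward[key] = Reified(
                reificator, assertion.relation,
                assertion.domain, assertion.range_, hypo_space,
            )
            self._backward[reificator] = key
        return self._forward[key]

    def unreify(self, reified):
        """Inverse mapping: recover the assertion and its hypothesis space."""
        return self._backward[reified.reificator]
```

Because the same assertion receives a distinct reificator in each hypothesis space, qualifications attached to one space do not leak into another.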
Table 3. Sketch of the Reification Function
Given the set which denotes all and the set of all instances r of the class REIF (i.e., ), we supply the function , which maps each reified expression to the corresponding instance of REIF, i.e., the reificator:
Translation between Contexts. Translation rules are syntactic transformations which derive sentences in the metacontext that are equivalent to sentences in the initial context. A translation rule from context to context is any axiom of the form with and being formulas. These translation rules are lifting rules in the sense of McCarthy, as they also relate the truth in one context to the truth in another one.
Instead of supplying a translation rule for each conceptual role from the set P, for brevity, we state a single second-order axiom such that the initial context be translatable to the metacontext under truth-preserving conditions:
In the metacontext, qualifications can now be expressed instantiating the specific role QUALIFIED by a qualifying assertion with respect to some reificator r.
In a similar way, we may construct a translation scheme which (re)translates the metacontext to the initial context. This rule incorporates the quality of some reified element r, which must exceed a specific threshold criterion (cf. Section "Quality-Based Reasoning with Qualification Rules").
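The round trip between the two contexts may be sketched as follows. This is an illustration under stated assumptions only: assertions are plain triples, qualifications are stored per reificator, and the threshold is passed in as a predicate (`passes_threshold`, a hypothetical parameter) because the concrete criterion is defined elsewhere.

```python
from collections import defaultdict


def lift(assertions, space):
    """Initial context -> metacontext: one reified record per assertion."""
    meta = {}
    for i, (domain, relation, range_) in enumerate(assertions, start=1):
        meta[f"r{i}"] = {
            "rel": relation, "domain": domain, "range": range_, "space": space,
        }
    return meta


def qualify(qualified, reificator, label):
    """Attach a qualifying assertion to a reificator via the role QUALIFIED."""
    qualified[reificator].append(label)


def lower(meta, qualified, passes_threshold):
    """Metacontext -> initial context, keeping only sufficiently credible items."""
    return [
        (m["domain"], m["rel"], m["range"])
        for reificator, m in meta.items()
        if passes_threshold(qualified[reificator])
    ]


# Usage with an assumed threshold: drop anything marked inconsistent.
meta = lift([("Venturas", "INST-OF", "NOTEBOOK"),
             ("Venturas", "INST-OF", "ACCU")], "H2")
qualified = defaultdict(list)
qualify(qualified, "r1", "SUPPORTED")
qualify(qualified, "r2", "INCONSISTENT-HYPO")
kept = lower(meta, qualified, lambda labels: "INCONSISTENT-HYPO" not in labels)
```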
In the architecture we propose, text parsing and concept acquisition from texts are tightly coupled. For instance, whenever two nominals or a nominal and a verb are supposed to be syntactically related the semantic interpreter simultaneously evaluates the conceptual compatibility of the items involved. Since these reasoning processes are fully embedded into a terminological knowledge representation system, checks are being made whether a concept denoted by one of these objects is allowed to fill a role of the other one or is an instance of this concept. If one of the items involved is unknown, i.e., a lexical and conceptual gap is encountered, this interpretation mode generates initial concept hypotheses about the class membership of the unknown object, and, as a consequence of inheritance mechanisms holding for concept hierarchies, provides conceptual role information for the unknown item.
Besides these conceptually rooted computations, the hypothesis generation process also assigns labels which indicate the type of syntactic construction under analysis. These labels convey information about the language-specific provenance of the hypotheses and their individual strength. This idea is motivated by the observation that syntactic constructions differ in their potential to limit the conceptual interpretation of an unknown lexical item. In other words, linguistic structures constrain the range of plausible inferences that can be drawn to properly locate their associated concepts in the domain's concept hierarchy. For example, an apposition like "the operating system OS/2" doubtlessly determines the superclass of "OS/2" (here considered as an unknown item) to be "operating system", while "IBM's OS/2" at best allows one to infer that "OS/2" is one of the products of IBM (e.g., a computer or a piece of software). Thus we may stipulate that hypotheses derived from appositions are more reliable ("certain") than those derived from genitival phrases only, independent of the conceptual properties being assigned.
The general form of a parser query (cf. also Fig. 1) triggering the generation of hypotheses is: query-type (target, base, label), with target being the unknown lexical item, base a given knowledge base concept, and label being the type of syntactic construction which relates base and target. In the following, we will concentrate on two particular instances of query types, viz. those addressing permitted role fillers and instance-of relations, respectively.
Table 4. Syntactic Qualification Rule PermHypo
The basic assumption behind the first syntactic qualification rule, PermHypo (Table 4), is that the target concept fills (exactly) one of the n roles of the base concept (only those roles are considered which admit nonnumerical role fillers and are "non-closed", i.e., still may accept additional role fillers). Since the correct role cannot yet be decided, n alternative hypotheses are opened (unless additional constraints apply) and the target concept is assigned as a potential filler of the i-th role in its corresponding hypothesis space. As a result, the classifier is able to derive a suitable concept hypothesis by specializing the target concept (initial status "unknown") according to the value restriction of the base concept's i-th role. Additionally, PermHypo assigns a syntactic quality label to each i-th hypothesis indicating the type of syntactic construction in which the (lexical counterparts of the) target and base concept co-occur in the text. These qualifying assertions are expressed at the terminological level by linking the reificator of a terminological term via a role QUALIFIED to a qualifying proposition.
In the syntactic qualification rules described in Tables 4 and 5 the symbol "" separates the condition part (starting from the operator EXISTS) from the action part (containing the TELL operator). The procedural semantics of the operators FORALL and EXISTS should be intuitively clear; the operator TELL is used to initiate the assertion of terminological propositions. For PermHypo, we assume , and = PP-ATTRIBUTION,GENITIVE-ATTRIBUTION, CASE-FRAME-ASSIGNMENT, which is a subset of the syntactic quality labels. generate is a function that -- in the initial context -- either retrieves an already existing hypothesis space containing a particular terminological assertion, or, if no such hypothesis space yet exists, creates or specializes a hypothesis space and asserts a particular terminological term in this newly constructed hypothesis space. A transformation rule immediately maps this terminological assertion to its reified form in the metacontext.
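The effect of PermHypo can be approximated by the following Python sketch. It is a simplification under several assumptions: roles are given as a mapping from role name to a value restriction and an "open" flag, hypothesis spaces are plain dictionaries, and the instance name `offer-01` as well as the PATIENT restriction `PRODUCT` are hypothetical.

```python
# Syntactic quality labels that trigger PermHypo, as listed above.
PERM_LABELS = {"PP-ATTRIBUTION", "GENITIVE-ATTRIBUTION", "CASE-FRAME-ASSIGNMENT"}


def perm_hypo(target, base, label, roles):
    """Open one hypothesis space per open role of the base concept.

    roles maps a role name to (value_restriction, is_open); closed roles
    cannot accept further fillers and are skipped.
    """
    if label not in PERM_LABELS:
        raise ValueError(f"{label} does not trigger PermHypo")
    spaces = []
    for role, (restriction, is_open) in roles.items():
        if not is_open:
            continue
        filler = (base, role, target)
        spaces.append({
            # the target fills the role and is specialized to its restriction
            "assertions": [filler, (target, "INST-OF", restriction)],
            # the syntactic label qualifies the role-filler assertion
            "qualified": [(filler, label)],
        })
    return spaces


# Usage: two alternative readings of an unknown item in a case frame.
roles = {"AGENT": ("PRODUCER", True), "PATIENT": ("PRODUCT", True)}
spaces = perm_hypo("Venturas", "offer-01", "CASE-FRAME-ASSIGNMENT", roles)
```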
The second syntactic qualification rule, SubHypo (Table 5), is triggered if a target has been encountered in an exemplification phrase ("operating systems like OS/2"), as part of a compound noun ("WORM technology") or occurs in an apposition ("the operating system OS/2"). As a consequence, an instance-of relation between the target and the base item is hypothesized and, in addition, that syntactic quality label is asserted which indicates the language-specific construction figuring as the structural source for that hypothesis. For SubHypo, we assume , and = EXEMPLIFICATION-NP, NOUN-DECOMPOSITION, APPOSITION-ASSIGNMENT, which is another subset of the syntactic quality labels.
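A corresponding sketch for SubHypo is given below, again as an illustration only: hypothesis spaces are represented as plain dictionaries, and the concept name `OPERATING-SYSTEM` is an assumed rendering of the base item from the apposition example.

```python
# Syntactic quality labels that trigger SubHypo, as listed above.
SUB_LABELS = {"EXEMPLIFICATION-NP", "NOUN-DECOMPOSITION", "APPOSITION-ASSIGNMENT"}


def sub_hypo(target, base, label):
    """Hypothesize an instance-of relation between target and base item."""
    if label not in SUB_LABELS:
        raise ValueError(f"{label} does not trigger SubHypo")
    assertion = (target, "INST-OF", base)
    return {
        "assertions": [assertion],
        # the label records the construction that licensed the hypothesis
        "qualified": [(assertion, label)],
    }


# Usage: the apposition "the operating system OS/2".
space = sub_hypo("OS/2", "OPERATING-SYSTEM", "APPOSITION-ASSIGNMENT")
```

In contrast to PermHypo, a single hypothesis is produced, since the construction already fixes the relation between target and base.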
Table 5. Syntactic Qualification Rule SubHypo
In this section, we will focus on the kind of quality assessment which occurs at the knowledge base level only; it is due to the operation of (second-order) conceptual qualification rules. Within second-order logic we may quantify over relations of first-order terms, e.g., quantifies over all roles R which relate a and b. Our intention is to use second-order expressions in order to reason about the properties of terminological descriptions and, thus, to determine the credibility of various concept hypotheses. Such expressions can be integrated into the condition part of production rules in order to generate qualifying assertions for concept hypotheses.
Qualifying assertions are the raw data for the computation of quality labels by the classifier which asserts INST-OF relations to the corresponding quality labels (we here only deal with simple quality labels that are associated with exactly one qualifying role, though more complex conditions can be envisaged). Combining the evidences collected this way in terms of a quality ranking of concept hypotheses, only those reified terms that reach a certain credibility threshold after each quality assessment cycle are transferred from the metacontext back to the initial context (cf. Fig.1; qualified hypo space in the initial context). Through the use of reification, we avoid the problems associated with the computational intractability of second-order logic, and still stay on solid first-order ground.
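Why reification keeps the reasoning first-order can be seen from a small sketch: once every assertion is a record, "all roles R relating a and b" becomes an ordinary query over records instead of a quantification over relations. The record layout and the instance name `offer-01` are assumptions made for illustration.

```python
def roles_relating(reified, a, b, space):
    """All role names R such that (a R b) holds in the given hypothesis space."""
    return {
        item["rel"]
        for item in reified
        if item["domain"] == a and item["range"] == b and item["space"] == space
    }


# Usage on a few reified assertions of a hypothesis space H0.
reified = [
    {"reificator": "r1", "rel": "AGENT", "domain": "offer-01",
     "range": "Compaq", "space": "H0"},
    {"reificator": "r2", "rel": "PATIENT", "domain": "offer-01",
     "range": "LTE-Lite", "space": "H0"},
    {"reificator": "r3", "rel": "PRODUCES", "domain": "Compaq",
     "range": "LTE-Lite", "space": "H0"},
]
found = roles_relating(reified, "Compaq", "LTE-Lite", "H0")
```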
In the remainder of this section we supply verbal and graphical descriptions of four conceptual qualification rules. These rules are tested in the metacontext immediately after the reification function has been applied to some proposition in the initial context. The bold portions in the figures indicate the terminological terms to be qualified, while the lighter ones depict the qualifying instance. A detailed example of the working of these rules will be provided in the next section. A formal description of the rules is given in [Hahn et al.1996a].
Assessment of Quality Labels. During each learning step several qualification rules may fire and thus generate various quality labels. In order to select the most credible hypotheses from each cycle, we take the direction (positive/negative) and the individual `strength' of each label into account by formulating the following Threshold Criterion:
Ranking of Hypotheses. Only those hypotheses that continuously reach the credibility threshold after each quality assessment cycle are transferred from the metacontext back to the initial context. At the end of the text analysis a final ranking of those concept hypotheses is produced that have repeatedly passed the Threshold Criterion by applying the following Ranked Prediction Criterion:
We will now exemplify quality-based terminological reasoning by considering a concept acquisition task in the domain of information technology. As a result of applying syntactic and conceptual qualification rules different degrees of credibility are assigned to concept hypotheses and, finally, one hypothesis is selected as the most credible one. Let us assume the following terminological axioms (for technical details, cf. Section "Formal Framework of Quality-Based Reasoning"):
In addition, the following reified assertional axioms are stipulated:
Finally, two verb interpretation rules (for "develop" and "offer", respectively) are supplied mapping lexical items onto "conceptually entailed" propositions of the text knowledge base:
Consider the phrase "Marktanalytiker bestätigen, daß Compaq seit Jahren erfolgreich LTE-Lites anbietet und seit kurzem auch Venturas." Assuming Venturas to be the target concept, two ambiguities arise (these are rephrased in English terms): (1) "Market analysts say that Compaq has been successfully offering LTE-Lites for many years and Venturas [AGENT] has recently begun to do so as well." vs. (2) "Market analysts say that Compaq has been successfully offering LTE-Lites for many years and has recently begun to offer Venturas [PATIENT] as well.". For the first part of the sentence (up to "anbietet" (offer)), the parser incrementally generates a new instance of OFFER, assigns Compaq as AGENT and LTE-Lite as PATIENT of that instance (it has not yet encountered the unknown item Venturas; a partial graphical representation is given in Fig. 2). Thus, we get in hypothesis space H0 (fragments of the reified knowledge structures are depicted in Fig. 3):
Figure 3. Concept Graph from the Metacontext
Figure 2. Concept Graph from the Initial Context
The verb interpretation rule for OFFER has no effects, since PRODUCES is already true ().
As already mentioned, the unknown item Venturas can either be related to LTE-Lite via the AGENT role or to Compaq via the PATIENT role of OFFER. This is achieved by the application of the hypothesis generation rule PermHypo (Table 4) that opens two hypothesis subspaces of H0, H1 and H2, for each interpretation of Venturas. Their reified counterparts, H 1 and H 2, are assigned the syntactic quality label CASE-FRAME-ASSIGNMENT (for the ease of readability, we will defer the consideration of syntactic quality labels in the formal descriptions of the current example until they contribute to the discrimination between different hypotheses in later stages of the sample analysis).
The creation of (together with ) triggers the generation of two quality labels ADDITIONAL-ROLE-FILLER according to rule III from the previous section (note that for both spaces the propositions to are assumed to hold). , and cause the verb interpretation rule for OFFER to fire yielding . Applying the terminological axioms for OFFER leads to the deduction of (the AGENT role of OFFER restricts any of its fillers to PRODUCER and its taxonomic superconcepts, viz. COMPANY () and PHYSICAL-OBJECT (), by way of transitive closure). In particular, H 1, which covers the AGENT interpretation contains the following assertions ( to ):
On the other hand, H 2 ( to ) covers the PATIENT interpretation:
In H 2 all hypotheses generated for Venturas ( to ) receive the quality label SUPPORTED, in H 1 none. This support (cf. rule II from the previous section) is due to the conceptual proximity and have relative to and , respectively. Since both hypothesis spaces, H1 and H2, imply that Venturas is at least a PHYSICAL-OBJECT the label MULTIPLY-DEDUCED (rule IV) is derived in the corresponding reified spaces, H 1 and H 2. Rule III also triggers in H 1 and H 2, so it does not contribute to any further discrimination.
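The taxonomic part of this step can be sketched in Python. The ISA links PRODUCER, COMPANY and PHYSICAL-OBJECT follow the text; the concept `PRODUCT` and its link to PHYSICAL-OBJECT are hypothetical stand-ins for the PATIENT reading, and the function names are illustrative only.

```python
def superconcepts(concept, isa):
    """Transitive closure of the ISA relation, excluding the concept itself."""
    seen, stack = set(), [concept]
    while stack:
        current = stack.pop()
        for parent in isa.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def multiply_deduced(hypotheses, isa):
    """Concepts implied by every hypothesis space (candidates for MULTIPLY-DEDUCED)."""
    closures = [superconcepts(c, isa) | {c} for c in hypotheses.values()]
    return set.intersection(*closures)


# Usage: both readings agree that the target is at least a PHYSICAL-OBJECT.
isa = {
    "PRODUCER": ["COMPANY"],
    "COMPANY": ["PHYSICAL-OBJECT"],
    "PRODUCT": ["PHYSICAL-OBJECT"],  # assumed link, for illustration
}
shared = multiply_deduced({"H1": "PRODUCER", "H2": "PRODUCT"}, isa)
```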
H2 and thus H 2, however, can further be refined. According to the terminological axioms, Compaq -- a NOTEBOOK-PRODUCER () -- produces NOTEBOOKs or ACCUs. Thus, Venturas, the filler of the PRODUCES role of Compaq () must either be a NOTEBOOK ( H 2)
or an ACCU ( H 2)
According to this distinction (cf. rule II), H 2 (but not H 2) is supported by PRODUCES , and PRODUCES .
Since we operate with a partial parser, the linguistic analyses may remain incomplete due to extra- or ungrammatical input. Assume such a scenario as in: "... Venturas ... entwickelt ... Compaq." ("... Venturas ... develops ... Compaq."). The parser triggers the syntactic qualification rule PermHypo on a presumed case frame assignment for the verb "entwickelt" (develop). As a result, the newly generated (reified) hypotheses receive the same syntactic quality label, viz. CASE-FRAME-ASSIGNMENT.
Due to the value restrictions that hold for DEVELOP, Compaq ( a priori known to be a PRODUCER) may only fill the AGENT role. Correspondingly, in H 1 and H 2 the following reified propositions hold:
In H1, where Venturas is assumed to be a PRODUCER, Venturas (additionally to Compaq) may fill the AGENT role of DEVELOP, leading to H 1:
In H2 and the spaces it subsumes, viz. H2 and H2, Venturas may only fill the PATIENT role. Given the occurrences of , and , the verb interpretation rule for DEVELOP fires and produces . This immediately leads to the generation of a CROSS-SUPPORTED label (rule I from the previous section) relative to proposition ():
Next, one of the interpretations of an ambiguous phrase such as "Die NiMH-Akkus von Venturas ..." invalidates context H2 (ACCU hypothesis) leading to the generation of the (entirely negative) quality label INCONSISTENT-HYPO in H 2. The inconsistency is due to the fact that an accumulator cannot be part of another accumulator. For the hypothesis space H 1 we get
and for H 2 we derive
As a consequence of the application of the rule PermHypo, the learner assigns to each reified hypothesis a syntactic label indicating that a prepositional phrase containing the target was attached to the base item (label PP-ATTRIBUTION).
Finally, consider the globally ambiguous sentence "Kein Wunder, daß der Notebook Venturas auf der Messe prämiert wurde.". Again, two readings are possible: (1) "There was no surprise at all that the Venturas notebook had been awarded at the fair" and (2) "There was no surprise at all that the notebook from [the manufacturer] Venturas had been awarded at the fair". Considering the first reading which implies VENTURAS to be a notebook, the SubHypo qualification rule (cf. Table 5) is triggered and assigns the (strong) syntactic label APPOSITION-ASSIGNMENT to proposition from hypothesis space H 2.
On the other hand, the second reading (genitive phrase attachment) invokes the PermHypo rule and the assignment of the (weaker) syntactic label GENITIVE-ATTRIBUTION to proposition . Note that the referent of "der Notebook Venturas" (the notebook from Venturas) has been resolved to by the anaphor resolution component of our parser (cf. strube95). This reference resolution process is legitimated by hypothesis of H 1, viz. (Venturas PRODUCES ). Thus, we get for H 1:
Note that in our example we now encounter for the first time two different syntactic labels as derived from the same verbal input. This is due to the global ambiguity of the sentence. Up to this point, all reified hypothesis spaces have received the same type and number of syntactic quality labels (i.e., CASE-FRAME-ASSIGNMENT (2) and PP-ATTRIBUTION (1), respectively). Hence, considering only syntactic evaluation criteria, a ranking of hypotheses based on the syntactic quality labels would prefer H2 over H1, since the label APPOSITION-ASSIGNMENT (from H 2) is stronger than the label GENITIVE-ATTRIBUTION (from H 1), all other labels being equal.
Considering the collection of conceptual quality labels we have derived, this preliminary preference is further supported. The most promising hypothesis space is H2 (covering the NOTEBOOK reading for Venturas) whose reified counterpart holds 10 positive labels: MULTIPLY-DEDUCED (1), SUPPORTED (8), CROSS-SUPPORTED (1), but only one negative label (ADDITIONAL-ROLE-FILLER). In contrast, H2 (covering the ACCU reading) is ruled out, since an inconsistency has been detected by the classifier. Finally, H 1 (holding the PRODUCER interpretation for Venturas) has received a weaker level of confirmation -- MULTIPLY-DEDUCED (1), SUPPORTED (3), CROSS-SUPPORTED (1), and ADDITIONAL-ROLE-FILLER (2) -- and H1 is, therefore, less plausible than H2 (cf. also the "Ranked Prediction Criterion" in Section "Quality-Based Reasoning with Qualification Rules"). Summarizing our sample analysis, we finally have strong indications for choosing hypothesis H2 over H1 based on conceptual and syntactic assessment criteria, since both point in the same direction (which, of course, need not always be the case).
The preference aggregation scheme we have just sketched is based on a solid formal decision procedure which weighs the contributions of different types and numbers of quality labels (cf. [Schnattinger & Hahn1996]).
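The outcome of the sample analysis can be replayed with a deliberately simplified Python sketch. It assumes that spaces carrying INCONSISTENT-HYPO are discarded and that the remaining ones are ordered by the unweighted difference of positive and negative labels; the actual decision procedure may weigh labels differently. The space names are hypothetical, and only the inconsistency label is recorded for the ACCU reading.

```python
POSITIVE = ("MULTIPLY-DEDUCED", "SUPPORTED", "CROSS-SUPPORTED")
NEGATIVE = ("ADDITIONAL-ROLE-FILLER",)


def score(counts):
    """Unweighted balance of positive and negative labels (an assumption)."""
    positive = sum(counts.get(label, 0) for label in POSITIVE)
    negative = sum(counts.get(label, 0) for label in NEGATIVE)
    return positive - negative


def rank(label_counts):
    """Drop inconsistent spaces, then order the rest by descending score."""
    consistent = {
        space: counts for space, counts in label_counts.items()
        if counts.get("INCONSISTENT-HYPO", 0) == 0
    }
    return sorted(consistent, key=lambda s: score(consistent[s]), reverse=True)


# Conceptual label counts from the sample analysis.
labels = {
    "H1-PRODUCER": {"MULTIPLY-DEDUCED": 1, "SUPPORTED": 3,
                    "CROSS-SUPPORTED": 1, "ADDITIONAL-ROLE-FILLER": 2},
    "H2-NOTEBOOK": {"MULTIPLY-DEDUCED": 1, "SUPPORTED": 8,
                    "CROSS-SUPPORTED": 1, "ADDITIONAL-ROLE-FILLER": 1},
    "H2-ACCU": {"INCONSISTENT-HYPO": 1},
}
ranking = rank(labels)
```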
While major theoretical principles of our approach have already been previously published [Hahn et al.1996a,Hahn et al.1996b], our model, so far, lacked a serious empirical justification. In this section, we present some empirical data from a preliminary evaluation of the quality-based concept learner. In these experiments we focus on the issues of learning accuracy and the learning rate. Due to the given learning environment, the measures we apply deviate from those commonly used in machine learning approaches to concept learning. In concept learning algorithms like IBL [Aha et al.1991] there is no hierarchy of concepts. Hence, any prediction of the class membership of a new instance is either true or false. However, given such a hierarchy, a prediction can be more or less precise, i.e., it may approximate the goal concept at different levels of specificity. This is captured by our measure of learning accuracy which takes into account the conceptual distance of a hypothesis to the goal concept of an instance, rather than simply relate the number of correct and false predictions, as in IBL.
In our approach learning is achieved by the refinement of multiple hypotheses about the class membership of an instance. Thus, the measure of learning rate we propose is concerned with the reduction rate of hypotheses as more and more information becomes available about one particular new instance. In contrast, IBL-style algorithms consider only one concept hypothesis per learning cycle and their notion of learning rate relates to the increase of correct predictions as more and more instances are being processed.
Altogether we considered 18 texts which contained 6344 tokens. The input for the learner consisted of terminological assertions representing the semantic interpretation of the sentences as exemplified by Fig. 2 (cf. Section "A Concept Acquisition Example").
Figure 5. Impact of the Qualification Calculus on the Learning Accuracy
Figure 4. Learning Accuracy at Different Prediction Levels
In a first sequence of experiments we investigated the learning accuracy of the system, i.e., the degree to which the system correctly predicts the concept class which subsumes (realizes) the target concept under consideration. The following parameters relate to the discussion of the learning accuracy:
Learning accuracy (LA) is defined as:
where SP specifies the length of the shortest path (in terms of the number of nodes traversed) from the TOP node of the concept hierarchy to the maximally specific concept subsuming (realizing) the instance to be learned; CP specifies the length of the path from the TOP node to that concept node which is common both for the shortest path (as defined above) and the actual path to the predicted concept (whether correct or not); FP specifies the length of the path from the TOP node to the predicted (in this case false) concept and DP denotes the distance between the predicted node and the most specific concept correctly subsuming the target instance.
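How such a measure can be computed from the four path parameters is sketched below. The sketch assumes one particular combination, namely CP/SP for a prediction on the correct path (FP = 0) and CP/(FP + DP) for a false prediction, averaged over all texts; this form is an assumption made for illustration, and the sample values are hypothetical.

```python
def accuracy_single(sp, cp, fp=0, dp=0):
    """Accuracy for one text under the assumed form.

    sp: shortest path length from TOP to the target's most specific concept
    cp: path length from TOP to the common concept node
    fp: path length from TOP to a falsely predicted concept (0 if none)
    dp: distance between the predicted node and the correct concept
    """
    if fp == 0:
        return cp / sp
    return cp / (fp + dp)


def learning_accuracy(cases):
    """Mean accuracy over all texts."""
    return sum(accuracy_single(**case) for case in cases) / len(cases)


# Usage with hypothetical path lengths: a hit, a near miss, a false prediction.
cases = [
    {"sp": 4, "cp": 4},
    {"sp": 4, "cp": 3},
    {"sp": 4, "cp": 2, "fp": 4, "dp": 4},
]
mean_la = learning_accuracy(cases)
```

Under this form a near miss is penalized only by the missing depth, whereas a false prediction is additionally penalized by its distance from the correct concept.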
Table 6. Summary of Some Concept Learning Results
Most of these parameters appear in Table 6 which summarizes the learning data for four different texts. The first two rows depict correct predictions (texts about "ASI-168" and "Eizo-T560i"), the first one with a significant reduction rate for the hypothesis spaces, the second with plenty of (low-valued second-order) predictions. Text "B310" illustrates a near miss (the target NOTEBOOK has only been narrowed down to COMPUTER-SYSTEM; cf. also the corresponding accuracy rate), while the final text "Ultra-Lite" gives an example of a "false" prediction (however, correct up to the level of HARDWARE).
As the system provides a ranking of learning hypotheses (cf. the PRED column in Table 6), we investigated the optimal choice of rankings. Fig. 4 gives the accuracy graph for the different rankings produced by the learner. We considered each level of predictions (up to the third; based on the application of the "Ranked Prediction Criterion") independently for each text and, finally, determined its mean accuracy for each level (91% (18), 39% (4), 50% (2) for first-, second-, and third-order prediction, respectively; absolute numbers of cases are in brackets). The mean values for second- and third-order predictions should be considered with care, as the base number of cases is very low. In 14 of the 18 cases the "Threshold Criterion" reduces the number of concept hypotheses to 1; hence only a few second- and third-order predictions were generated (we interpret this finding as an indication of confidence in the underlying quality-based reasoning mechanisms). Since none of the (admittedly few) second- and third-order predictions ever exceeded the accuracy value of the prediction at the first level, the "Ranked Prediction Criterion" seems discriminative already at this top level. So we may conclude that choosing only the first level of predictions yields the highest benefits.
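The per-level means reported above amount to grouping accuracy values by prediction rank and averaging within each group. A minimal Python sketch of that bookkeeping follows; the function name and the sample records are hypothetical and do not reproduce the 18 evaluation texts.

```python
from collections import defaultdict
from statistics import mean


def mean_accuracy_by_level(predictions):
    """Group (text_id, level, accuracy) records by prediction level.

    Returns {level: (mean accuracy, number of cases)}, so that means based
    on very few cases remain visible as such.
    """
    by_level = defaultdict(list)
    for _text_id, level, accuracy in predictions:
        by_level[level].append(accuracy)
    return {
        level: (mean(values), len(values))
        for level, values in sorted(by_level.items())
    }


# Hypothetical records: every text has a first-order prediction,
# only some texts have second-order ones.
records = [
    ("text-a", 1, 1.0),
    ("text-b", 1, 0.8),
    ("text-b", 2, 0.4),
]
for level, (avg, n) in mean_accuracy_by_level(records).items():
    print(f"level {level}: mean accuracy {avg:.2f} over {n} case(s)")
```

Reporting the case count next to each mean makes it explicit when a value, like those for the second and third level, rests on only a handful of predictions.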
In order to illustrate the contribution of the qualification calculus to the learning accuracy achieved, Fig. 5 depicts the average accuracy value with the qualification rules (solid upper line) and without them (dotted lower line). In the latter case, neither the "Threshold Criterion" nor the "Ranked Prediction Criterion" (cf. Section "Quality-Based Reasoning with Qualification Rules") was applied, though the usual terminological reasoning facilities were still used.
Figure 6. Mean Number of Included Concepts per Learning Step
The learning accuracy focuses on the final result of the learning process. By considering the learning rate, we supply background data from the step-wise development of the learning process. Fig. 6 contains the mean number of transitively included concepts for all considered hypothesis spaces per learning step (each concept hypothesis denotes a concept which transitively subsumes various subconcepts). Note that the most general concept hypothesis denotes TOP and, therefore, includes the entire knowledge base (currently, 340 concepts). We grouped the 18 texts into two classes in order to normalize the number of propositions they contained, viz. one ranging from 7 to 11 propositions (class 1, reduction to 7 propositions), the other ranging from 12 to 25 propositions (class 2, reduction to 12 propositions). The left graph in Fig. 6 gives the overall mean learning rate for both classes. The right one (zooming in on learning steps 6 to 12) focuses on the reduction achieved by the qualification calculus, yielding a final drop of approximately 50% (class 1: from 4.9 to 2.6 concepts, class 2: from 3.7 to 1.9).
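To make the learning-rate measure concrete, the following Python sketch counts the concepts transitively included by a hypothesis space and averages that count over several spaces at one learning step. The toy taxonomy, the function names and the decision to count the union over all hypotheses of a space are assumptions made for illustration; only the concept names TOP, HARDWARE, COMPUTER-SYSTEM and NOTEBOOK and the four reported figures are taken from the text.

```python
from statistics import mean

# Hypothetical miniature taxonomy: concept -> direct subconcepts.
TAXONOMY = {
    "TOP": ["HARDWARE", "SOFTWARE"],
    "HARDWARE": ["COMPUTER-SYSTEM", "PRINTER"],
    "COMPUTER-SYSTEM": ["NOTEBOOK", "DESKTOP"],
}


def included_concepts(concept, taxonomy):
    """Return the concept together with everything it transitively subsumes."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(taxonomy.get(current, []))
    return seen


def space_size(hypotheses, taxonomy):
    """Number of distinct concepts covered by all hypotheses of one space."""
    covered = set()
    for hypothesis in hypotheses:
        covered |= included_concepts(hypothesis, taxonomy)
    return len(covered)


def mean_included(spaces, taxonomy):
    """Mean number of included concepts over several hypothesis spaces."""
    return mean(space_size(space, taxonomy) for space in spaces)


def relative_reduction(before, after):
    """Fraction by which a mean concept count has dropped."""
    return (before - after) / before


# With TOP as the only hypothesis, the whole toy knowledge base is included.
print(space_size(["TOP"], TAXONOMY))
print(mean_included([["HARDWARE"], ["COMPUTER-SYSTEM", "PRINTER"]], TAXONOMY))
# Reductions implied by the figures reported in the text.
print(round(relative_reduction(4.9, 2.6), 2), round(relative_reduction(3.7, 1.9), 2))
```

Applied to the reported values, the reduction comes to roughly 47% for class 1 and 49% for class 2, which matches the stated drop of approximately 50%.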
Summarizing this preliminary evaluation experiment, the quality-based learning system yields competitive accuracy rates (a mean of 91%; cf. Fig. 4) and exhibits significant and valid reductions of the hypothesis spaces (cf. Fig. 6) based on the working of the qualification calculus (cf. Fig. 5).
Our approach bears a close relationship to the work of Mooney, Martin and Hastings, who aim at the automated learning of word meanings and their underlying concepts from context. But our work differs from theirs in that the need to cope with several competing concept hypotheses is not an issue in these studies. Considering, however, the limitations from which almost any parser available for realistic text understanding tasks currently suffers (finally leading to the generation of partial parses only), multiple concept hypotheses will usually be derived from a given natural language input. Therefore, we stress the need for a hypothesis generation and evaluation component as an integral part of any robust natural language system that learns in tandem with such coverage-restricted devices.
Other systems aiming at text knowledge acquisition differ from our approach in that they rely on hand-coded input [Skuce et al.1985], use overly simplistic keyword-based content analysis techniques [Agarwal & Tanniru1991], are restricted to a deductive learning mode [Handa & Ishizaki1989], use quantitative measures for uncertainty to evaluate the confidence of learned concept descriptions (as the WIT system, cf. Reimer), or lack the continuous refinement property of pursuing alternative hypotheses (as the SNOWY system, cf. Gomez). Nevertheless, WIT and SNOWY are closest to our approach, at least with respect to the text understanding methodologies being used, namely, the processing of realistic texts and the acquisition of taxonomic knowledge structures with the aid of a terminological representation system.
Since WIT, like our system, uses a partial parser, its learning results are also uncertain and their confidence level needs to be recorded for evaluation purposes. However, WIT uses a quantitative approach to evaluate the confidence of learned concept descriptions, while we prefer a qualitative method which is fully embedded into the terminological reasoning process underlying text understanding. This allows us to reason about the conditions which led to the creation and the further refinement of concept hypotheses in terms of quality assessment.
SNOWY is a knowledge acquisition system that processes real-world texts and builds up a taxonomic knowledge base that can be used in classification-based problem solving. In contrast to our system, SNOWY is domain-independent, i.e., it uses no domain-specific background knowledge for text understanding and knowledge acquisition. As a consequence, SNOWY has no means for a knowledge-based assessment of new hypotheses. This is in contrast to our approach, in which conceptual qualification rules are incrementally applied to judge the quality of new and alternative hypotheses given a sufficient domain theory.
The processing of written texts can, in general, be considered a major step towards the automation of knowledge acquisition [Virkar & Roach1989] which, so far, has been dominated by interactive modes in a dialogue setting, e.g., KALEX [Schmidt & Wetter1989] or AKE [Gao & Salveter1991].
We have introduced a methodology for automated knowledge acquisition and learning from texts that relies upon terminological (meta)reasoning. Concept hypotheses which have been derived in the course of the text understanding process are assigned specific "quality labels" (indicating their significance, reliability, strength). Quality assessment of hypotheses accounts for conceptual criteria referring to their given knowledge base context as well as linguistic indicators (grammatical constructions, discourse patterns), which led to their generation.
Metareasoning, as we conceive it, is based on the reification of terminological expressions, the assignment of qualifications to these reified structures, and the reasoning about degrees of credibility these qualifications give rise to based on the evaluation of first-order syntactic and second-order conceptual qualification rules. Thus, the metareasoning approach we advocate allows for the quality-based evaluation and a bootstrapping-style selection of alternative concept hypotheses as text understanding incrementally proceeds. A major constraint underlying our work is that this kind of quality-based metareasoning -- through the use of reification mechanisms -- is completely embedded in the homogeneous framework of standard first-order terminological reasoning systems using multiple contexts, so that we may profit from their full-blown classification mechanisms.
The applicability of this terminological metareasoning framework has been shown for a concept acquisition task in the framework of realistic text understanding. We are currently focusing on the formulation of additional qualification rules and query types, the formalization of a qualification calculus which captures the evaluation logic of multiple quality labels within a terminological framework [Schnattinger & Hahn1996], and an in-depth empirical evaluation of our approach based on a larger corpus of texts. The knowledge acquisition and learning system described in this paper has been fully implemented in LOOM [MacGregor1994].
Acknowledgments. This work was partially supported by grants from DFG under the account Ha 2097/2-1 and Ha 2097/3-1. We would like to thank the members of our group for fruitful discussions. We also gratefully acknowledge the provision of the LOOM system from USC/ISI.
We only consider the derivation of quality labels referring to the target concept and leave out those labels that support propositions already contained in the a priori knowledge of the KB kernel.
By normalization with respect to the number of propositions we account for the fact that each text is characterized by a specific number of learning steps per unknown concept. The result of a learning step is either a refinement of the concept to be learned or a confirmation of the current state of the concept description. In order to compare texts with varying numbers of learning steps, we grouped together those texts which exhibited a common behavior with respect to the confirmation of concept hypotheses (i.e., the hypotheses no longer changed at all; this usually happens at the very end of the learning process) and simply eliminated learning steps which exhibit pure confirmation behavior. This coincides with empirical evidence we have collected for the fact that the proposed learning procedure quite rapidly finds the relevant discriminations, which then tend to be continuously confirmed during the remaining learning process.
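The elimination of confirmation steps can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the procedure actually used: a learning step is represented by the set of concept hypotheses that hold after it, a confirmation step is one that leaves this set unchanged, and such steps are removed from the end of the sequence until a target length (such as 7 or 12) is reached. The function name and the sample data are hypothetical.

```python
def normalize_steps(steps, target_len):
    """Drop confirmation steps, starting at the end, until target_len is reached.

    steps: sequence of hypothesis sets, one per learning step.
    A step counts as a confirmation if its hypothesis set equals that of the
    preceding step. If too few confirmation steps exist, the result stays
    longer than target_len; refinement steps are never removed.
    """
    result = [frozenset(step) for step in steps]
    index = len(result) - 1
    while len(result) > target_len and index > 0:
        if result[index] == result[index - 1]:
            del result[index]
        index -= 1
    return result


# Hypothetical run: two refinements followed by two pure confirmations.
run = [
    {"HARDWARE", "SOFTWARE"},
    {"HARDWARE"},
    {"COMPUTER-SYSTEM"},
    {"COMPUTER-SYSTEM"},
    {"COMPUTER-SYSTEM"},
]
print([sorted(step) for step in normalize_steps(run, 3)])
```

Removing steps from the end reflects the observation that confirmations cluster late in the learning process, so the refinement steps that carry the relevant discriminations are preserved.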