Providing User-Support in Performing Knowledge Discovery in Databases

Robert Engels, Michael Erdmann, Rainer Perkuhn, Rudi Studer
Institut für Angewandte Informatik und Formale Beschreibungsverfahren
University of Karlsruhe (TH)
D-76128 Karlsruhe (Germany)
e-mail: {engels | erdmann | perkuhn | studer}@aifb.uni-karlsruhe.de

Knowledge Management (KM) is becoming a success factor for industrial organisations. Gaining control over data and extracting information from it helps to achieve an organisation's goals more effectively. Knowledge (or information) thus becomes a very important resource, one that must be adequately procured, stored, processed and communicated. These tasks are central to Knowledge (and Information) Management, which embraces key issues such as knowledge acquisition, data warehouses, data mining, database management systems, knowledge representation, case-based reasoning, hypermedia, workflow management, and decision support systems.

As one can see, besides the information systems aspect, AI methods and concepts play a significant role in Knowledge Management, especially concerning the procurement (knowledge acquisition), storage (knowledge representation), and processing (e.g. data mining, decision support systems) of knowledge.

One important source of information and knowledge is the growing number of large databases which are built up and maintained in large organisations and, by now, also in many small and medium-sized enterprises (SMEs). As a consequence, the research and application area 'Knowledge Discovery in Databases' (KDD) (cf. [Frawley et al. 91], [Fayyad and Uthurusamy 95], [Simoudis et al. 96], [Fayyad et al. 96]) has gained importance in recent years. Successful KDD applications in marketing, financial investment, or network management indicate that the development of KDD applications may result in strategic advantages in performing business tasks (cf. e.g. [Brachman et al. 96]).

However, experience shows that the development of (successful) KDD applications is a complex and error-prone process [Brachman and Anand 96]. In general, the KDD process consists of a task analysis step (for identifying the real application problem), a pre-processing step (for identifying and selecting relevant data), a data mining step, and a post-processing step (for evaluating the data mining results). So far, only first steps towards a comprehensive methodology that supports such complex and iterative KDD processes have been proposed (cf. e.g. [Wirth and Reinartz 96], [Engels 96]). On the other hand, there is a clear indication that such a methodology is needed since:

- especially in SMEs, KDD applications will be developed by application specialists and not by KDD specialists,
- task analysis has to be supported in a systematic way in order to come up with a KDD problem specification which really meets the application needs,
- there exist so many dependencies between different KDD algorithms (e.g. for pre-processing and data mining), as well as between data characteristics and applicable algorithms, that some kind of planning support is required for achieving a well-defined and consistent KDD process,
- the KDD process is highly iterative, which means that results of later steps (e.g. evaluation) will have to be fed back to earlier steps (e.g. pre-processing).
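
The four steps named above, and the feedback from evaluation to pre-processing, can be pictured as a simple loop. The following Python sketch is only an illustration under stated assumptions: the function names, the toy records, the quality measure (share of complete records) and the mean-based "mining" step are all hypothetical and are not part of the methodology itself.

```python
from statistics import mean

# Hypothetical toy data; None marks a missing value.
EXAMPLE_RECORDS = [
    {"value": 10.0, "age": 3},
    {"value": 20.0, "age": None},
    {"value": None, "age": None},
    {"value": 30.0, "age": 5},
]


def task_analysis():
    # Assumed problem specification: required data quality and an iteration limit.
    return {"target_quality": 0.9, "max_iterations": 5}


def preprocess(records, max_missing):
    # Select the records that have at most `max_missing` missing fields.
    return [r for r in records
            if sum(v is None for v in r.values()) <= max_missing]


def mine(records):
    # Stand-in for a data mining algorithm: the mean of all known values.
    values = [r["value"] for r in records if r["value"] is not None]
    return mean(values) if values else None


def evaluate(records):
    # Post-processing: share of selected records without any missing field.
    if not records:
        return 0.0
    complete = sum(all(v is not None for v in r.values()) for r in records)
    return complete / len(records)


def run_kdd(records):
    spec = task_analysis()
    max_missing = 2
    result, history = None, []
    for _ in range(spec["max_iterations"]):
        selected = preprocess(records, max_missing)
        result = mine(selected)
        quality = evaluate(selected)
        history.append((max_missing, quality))
        if quality >= spec["target_quality"] or max_missing == 0:
            break
        # Feedback: the evaluation result tightens the pre-processing step.
        max_missing -= 1
    return result, history
```

Calling `run_kdd(EXAMPLE_RECORDS)` runs three iterations, each with a stricter selection, before the assumed quality target is met. A real KDD process would of course feed back far richer information than a single threshold.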

Our approach for providing user-support in performing Knowledge Discovery in Databases [Engels 96] aims at developing a methodology to support the user when performing entire KDD processes and to support the reuse of previously successfully applied KDD processes.

The methodology should provide a user guidance module that enables application specialists (not necessarily KDD experts) to develop a KDD process by appropriately (re-)using successfully applied (parts of) KDD processes, data mining algorithms, and pre- and post-processing algorithms. Such a reuse-oriented approach is expected to decrease the development time of new applications while yielding higher-quality solutions. The user guidance module includes a repository which contains previously applied KDD processes, data mining algorithms, and pre-/post-processing algorithms. To be able to retrieve these processes and algorithms (e.g. by applying case-based reasoning methods), there must exist a uniform description that highlights the differences in their functionality. In addition, the initial specification of the KDD task must be decomposed into appropriate subtasks in order to make the complex KDD process more tractable and to be able to associate adequate algorithms with the identified subtasks (cf. [Engels 96], [Engels and Perkuhn 96]). Furthermore, an appropriate specification of data characteristics is a prerequisite for selecting applicable analysis algorithms (cf. e.g. [Hoppe 96], [Michie et al. 94]).
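
To make the role of a uniform description more concrete, the following Python sketch shows how a repository might be queried for algorithms that fit a given subtask and given data characteristics. Everything in it is an assumption made for illustration: the description fields, the repository entries and their names are hypothetical, and the lookup is plain filtering rather than the case-based reasoning mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlgorithmDescription:
    # Hypothetical uniform description of a reusable algorithm.
    name: str
    subtask: str            # e.g. "pre-processing" or "data mining"
    handles_missing: bool   # can it cope with missing values?
    input_types: frozenset  # attribute types the algorithm accepts


# Illustrative repository entries; names and properties are made up.
REPOSITORY = [
    AlgorithmDescription("imputer", "pre-processing", True,
                         frozenset({"numeric", "nominal"})),
    AlgorithmDescription("tree_learner", "data mining", True,
                         frozenset({"numeric", "nominal"})),
    AlgorithmDescription("linear_model", "data mining", False,
                         frozenset({"numeric"})),
]


def applicable(desc, data):
    # An algorithm is applicable if it accepts every attribute type in the
    # data and tolerates missing values whenever the data contains them.
    types_ok = data["types"] <= desc.input_types
    missing_ok = desc.handles_missing or not data["has_missing"]
    return types_ok and missing_ok


def retrieve(repository, subtask, data):
    # Return the names of all algorithms for `subtask` that fit the data.
    return [d.name for d in repository
            if d.subtask == subtask and applicable(d, data)]


# Hypothetical data characteristics of two data sets.
MIXED_DATA = {"types": frozenset({"numeric", "nominal"}), "has_missing": True}
CLEAN_NUMERIC_DATA = {"types": frozenset({"numeric"}), "has_missing": False}
```

With these entries, `retrieve(REPOSITORY, "data mining", MIXED_DATA)` returns only `tree_learner`, whereas `CLEAN_NUMERIC_DATA` admits both data mining entries. The point is merely that comparable descriptions of algorithms and of data are what make such a lookup possible.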

Such a user guidance module integrates concepts from the Knowledge Acquisition community and the KDD community, e.g. structured knowledge descriptions on several levels ([Schreiber et al. 94], [Angele et al. 96a]), task decomposition [Angele et al. 96b], and the definition of repositories of reusable processes and algorithms.

Since our approach is developed in cooperation with an industrial partner, emphasis is also put on evaluating the developed methodology in real-world domains such as the quality assessment of cars.

As mentioned at the beginning, KDD is only one facet of Knowledge Management. Because of its capability to extract relevant information from large amounts of data and to use the discovered knowledge in a strategic way, we consider KDD an important facet of Knowledge Management.

Acknowledgements

Part of this work is supported by Daimler-Benz AG, Research and Development.

References

J. Angele, D. Fensel, and R. Studer (1996a): Domain and Task Modelling in MIKE. In: A.G. Sutcliffe, D. Benyon, and F. van Assche (eds.): Domain Knowledge for Interactive System Design. Proceedings of the TC8/WG8.2 Conference on Domain Knowledge in Interactive System Design, Geneva, Switzerland, May 1996. Chapman & Hall, London, 1996.

J. Angele, S. Decker, R. Perkuhn, and R. Studer (1996b): Modelling Problem-Solving Methods in New KARL. In: Proceedings of the 10th Knowledge Acquisition Workshop (KAW'96), Banff, Canada, 1996.

R. Brachman and T. Anand (1996): The Process of Knowledge Discovery in Databases: A Human-Centered Approach. In: U. Fayyad et al. (eds.): Advances in Knowledge Discovery and Data Mining. AAAI Press, Menlo Park, California, 1996.

R. Brachman, T. Khabaza, W. Kloesgen, G. Piatetsky-Shapiro, and E. Simoudis (1996): Mining Business Databases. Communications of the ACM 39, 11 (November 1996), pp. 42-48.

R. Engels (1996): Planning Tasks for Knowledge Discovery in Databases. Performing Task-Oriented User-Guidance. In: Proceedings of the 2nd International Conference on Knowledge Discovery in Databases (KDD'96). Portland, Oregon, August 1996.

R. Engels and R. Perkuhn (1996): Describing and Integrating Competence Theories for Problem-Solving Components and Machine Learning Algorithms. In: Position Paper Collection of the European Knowledge Acquisition Workshop (EKAW '96), Matlock Bath, U.K., May 1996. (ftp://ftp.aifb.uni-karlsruhe.de/pub/ren/ekaw96.ps.Z)

U.M. Fayyad and R. Uthurusamy (eds.) (1995): Proceedings of the First International Conference on Knowledge Discovery & Data Mining. AAAI Press, Menlo Park, 1995.

U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (eds.) (1996): Advances in Knowledge Discovery and Data Mining. AAAI Press, Menlo Park, California, 1996.

W. Frawley and G. Piatetsky-Shapiro (1991): Knowledge Discovery in Databases. Cambridge, Mass., 1991.

D. Michie, D.J. Spiegelhalter, and C.C. Taylor (1994): Machine Learning, Neural and Statistical Classification. Ellis Horwood, 1994.

Th. Hoppe (1996): Kriterien zur Auswahl maschineller Lernverfahren (Criteria for selecting machine learning algorithms). Informatik Spektrum 19(1), pp. 12-19, 1996.

A. Th. Schreiber, B. Wielinga, R. de Hoog, H. Akkermans, and W. van de Velde (1994): CommonKADS: A Comprehensive Methodology for KBS Development. IEEE Expert, December 1994, pp. 28-37.

E. Simoudis, J. Han, and U. Fayyad (eds.) (1996): Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. AAAI Press, Menlo Park, 1996.

R. Wirth and Th. Reinartz (1996): Detecting Early Indicator Cars in an Automotive Database: A Multi-Strategy Approach. In: E. Simoudis, J. Han, and U. Fayyad (eds.): Proceedings of the 2nd Int. Conference on Knowledge Discovery in Databases, Portland, Oregon, 1996.