Evaluation Issues for Visual Programming Languages

Tim Menzies
Artificial Intelligence Department
School of Computer Science and Engineering
The University of NSW  
http://www.cse.unsw.edu.au/~timm

February 20, 1998


Visual programming systems have several advantages: (1) they are very motivating for beginners; (2) a spatial representation simplifies certain limited kinds of inferencing; (3) the use of ill-structured diagrams may assist in brain-storming. However, these three benefits may not be widely applicable. Many software engineering and knowledge engineering problems are not inherently spatial. Also, most VP tools do not support ill-structured diagrams. Lastly, diagrams are not necessarily superior explanation tools. In many cases, studies claiming certain benefits with visual systems can be matched by a counter-study with the opposite results. Clearly, some variable is not being controlled for within these opposing studies. It is possible that the task of a system is more important than its presentation (visual or textual).


Many knowledge acquisition systems use some sort of visual presentation. Kremer argues convincingly that such visual languages have numerous advantages for knowledge acquisition (KA) [Kremer, 1998]. Other researchers claim numerous benefits for visual frameworks. For example:

When we use visual expressions as a means of communication, there is no need to learn computer-specific concepts beforehand, resulting in a friendly computing environment which enables immediate access to computers even for computer non-specialists who pursue application [Hirakawa & Ichikawa, 1994].

The case that pictures assist in explaining complicated knowledge seems intuitively obvious. But is it correct? Other widely held, intuitively obvious beliefs have been found to be incorrect, sometimes spectacularly so:

This article takes a critical look at the available evidence on the efficacy of visual programming (VP) systems. After an introduction to VP, we will review theoretical studies and small-scale experimental studies that suggest an inherent utility in visual expressions. However, when we explore the available experimental evidence, we find numerous contradictory results. This exploration extends my previous arguments in this area [Menzies, 1996].

A (Brief) Introduction to Visual Programming

As a rough rule of thumb, a visual programming system is a computer system whose execution can be specified without scripting, except for entering unstructured strings such as "Monash University Banking Society" or simple expressions such as "X above 7". Visual representations have been used for many years (e.g. Venn diagrams) and even centuries (e.g. maps). Executable visual representations, however, have only arisen with the advent of the computer. With falling hardware costs, it has become feasible to build and interactively manipulate intricate visual expressions on the screen.

More precisely, a non-visual language is a one-dimensional stream of characters while a VP system uses at least two dimensions to represent its constructs [Brown & Kimura, 1994]. We distinguish between a pure VP system and a visually supported system:

Arguments for the Advantages of VP

Many authors argue that VP systems are a better method for users to interact with a program. Green et al. [Green et al., 1991] and Moher et al. [Moher et al., 1993] summarise claims such as the above quote from [Hirakawa & Ichikawa, 1994] as the superlativist position; i.e. graphical representations are inherently superior to textual representations. Both the Green and Moher groups argue that this claim is not supported by the available experimental evidence. Further, they argue against claims that visual expressions offer higher information accessibility; for example:

Pictures are superior to texts in a sense that they are abstract, instantly comprehensible, and universal. [Hirakawa & Ichikawa, 1994]

My own experience is that students find visual environments very motivating. Others have had the same experience:

The authors report on the first in a series of experiments designed to test the effectiveness of visual programming for instruction in subject-matter concepts. Their general approach is to have the students construct models using icons and then execute these models. In this case, they used a series of visual labs for computer architecture. The test subjects were undergraduate computer science majors. The experimental group performed the visual labs; the control group did not. The experimental group showed a positive increase in attitude toward instructional labs and a positive correlation between attitude towards labs and test performance. [Williams et al., 1993]

For another example of first-year students being motivated by a VP language, see [Glinert & Tanimoto, 1984] (pp. 18-19). However, motivating the students is only half the task of an educator. Educators also need to train students in general concepts that can be applied in different circumstances. The crucial test for VP systems is whether they improve or simplify the task of comprehending some conceptual aspect of a program. If we extend the concept of VP systems to diagrammatic reasoning in general, then we can make a case that VP has some such benefits. Larkin and Simon [Larkin & Simon, 1987] distinguish between:

While these two representations may contain the same information, their computational efficiency may be different. Larkin and Simon present a range of problems modeled in a diagrammatic and sentential representation using production rules. Several effects were noted:

A common internal representation for VP systems is one that preserves physical spatial relationships. For example, Narayanan et al. [Narayanan et al., 1995] use Glasgow's array representation [Glasgow et al., 1995] to reason about device behaviors. In an array representation, physical objects are mapped into a 2-D grid. Adjacency and containment of objects can be inferred directly from such a representation. Inference engines can then be augmented with diagrammatic reasoning operators which execute over the array (e.g. boundary following, rotation).
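
To make the idea of reading adjacency and containment straight off a grid concrete, here is a minimal sketch in Python. It is not the array representation of [Glasgow et al., 1995] or the system of [Narayanan et al., 1995]; the grid layout, the object symbols (T, V, P) and the function names are all assumptions invented for illustration. Containment is approximated by a flood fill: an object counts as contained if it cannot reach the edge of the grid without crossing the outer object.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]

# Hypothetical device layout (an assumption for illustration only):
# T = tank wall, V = valve, P = pipe, . = empty space.
GRID: List[str] = [
    "........",
    ".TTTTT..",
    ".T...T.P",
    ".T.V.TPP",
    ".T...T..",
    ".TTTTT..",
    "........",
]


def cells_of(grid: List[str], name: str) -> Set[Cell]:
    """Return every (row, column) cell occupied by the named object."""
    return {(r, c) for r, row in enumerate(grid)
            for c, symbol in enumerate(row) if symbol == name}


def neighbours(cell: Cell) -> List[Cell]:
    """Return the four edge-sharing neighbours of a cell (bounds not checked)."""
    r, c = cell
    return [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]


def adjacent(grid: List[str], a: str, b: str) -> bool:
    """True if some cell of object a shares an edge with some cell of object b."""
    b_cells = cells_of(grid, b)
    return any(n in b_cells
               for cell in cells_of(grid, a)
               for n in neighbours(cell))


def contains(grid: List[str], outer: str, inner: str) -> bool:
    """True if inner cannot reach the grid edge without crossing outer."""
    rows, cols = len(grid), len(grid[0])
    frontier = list(cells_of(grid, inner))
    if not frontier:
        return False
    seen = set(frontier)
    while frontier:
        r, c = frontier.pop()
        if r in (0, rows - 1) or c in (0, cols - 1):
            return False  # escaped to the edge, so not enclosed
        for nr, nc in neighbours((r, c)):
            if (nr, nc) not in seen and grid[nr][nc] != outer:
                seen.add((nr, nc))
                frontier.append((nr, nc))
    return True
```

With this layout, adjacent(GRID, "T", "P") and contains(GRID, "T", "V") both hold, while contains(GRID, "T", "P") does not. No rules about tanks, valves or pipes are needed; the inferences fall out of the spatial encoding itself.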

Other authors have argued that diagrams are useful for more than just spatial reasoning. Koedinger [Koedinger, 1992] argued that diagrams can support and optimise reasoning since they can model whole-part relations. Kindfield [Kindfield, 1992] studied how diagram use changes with expertise level. According to Kindfield, diagrams are like a temporary swap space which we can use to store concepts that (1) don't fit into our head right now and (2) can be swapped in rapidly; i.e. with a single glance. Goel [Goel, 1992] studied the use of ill-structured diagrams at various phases of the process of design. In a well-structured diagram (e.g. a picture of a chess board), each visual element clearly denotes one thing of one class only. In an ill-structured diagram (e.g. an impressionistic charcoal sketch), the denotation and type of each visual element is ambiguous. In the Goel study, subjects explored

using a well-structured diagramming tool (MacDraw) and an ill-structured diagramming tool (freehand sketches using pencil and paper). Freehand sketches would generate many variants. However, the well-structured tool seemed to inhibit new ideas rather than help organise them. Once something was recorded in MacDraw, that was the end of the evolution of that idea.

One gets the feeling that all the work is being done internally and recorded after the fact, presumably because the external symbol system (MacDraw) cannot support such operations. [Goel, 1992]

Goel found that ill-structured tools generated more design variants (i.e. more drawings, more ideas, more use of old ideas) than well-structured tools. We draw two conclusions from Goel's work. Firstly, at least for preliminary design, ill-structured tools are better. Secondly, after the brainstorming process is over, well-structured tools can be used to finalise the design.

Evaluating the Arguments for VP

It is not clear which of the above advantages apply to general software or knowledge engineering. Many software engineering or knowledge engineering problems are not naturally two-dimensional. For example, while we write down an entity-relationship diagram on the plane of a piece of paper, the inferences we can draw from that diagram are not dependent on the physical position of (e.g.) an entity.

In terms of the ill-structured/well-structured division, the VP tools I have seen in the KA field are all well-structured tools. That is, they are less suited to brain-storming than producing the final product.

Jarvenpaa and Dickson (hereafter, JD) report an interesting pattern in the VP literature [Jarvenpaa & Dickson, 1988]. In their literature review on the use of graphics for supporting decision making, they find that most of the proponents of graphics have never tested their claims. Further, when those tests are performed, the results are contradictory and inconclusive. For example:

Similar contradictory results can be found in the study of control-flow and data-flow systems.

Given these conflicting results, all that we can conclude at this time is that the utility of control-flow or data-flow visual expressions is an open issue.

In other studies, the Green group explored two issues: superlativism and information accessibility (defined above). Subjects attempted some comprehension task using both visual expressions and textual expressions of a language. The Green group rejected the superlativism hypothesis when they found that tasks took longer using the graphical expressions than the textual expressions. The Green group also rejected the information accessibility hypothesis when they found that novices had more trouble than experts in reading the information in their visual expressions. That is, the information in a diagram is not instantly comprehensible and universal. Rather, such information can only be accessed after a training process.

The Moher group performed a similar study to the Green group. In part, the Moher study used the same stimulus programs and question text as the Green group. Whereas the Green group used the LABVIEW data-flow system, the Moher group used Petri nets. The results of the Moher group echoed the results of the Green group. Subjects were shown three variants on a basic Petri net formalism. In no instance did these graphical languages outperform their textual counterparts.

The Moher group caution against making an alternative superlativism claim for text; i.e. text is better than graphics. Both the Moher and Green groups distinguished between sequential programming expressions such as a decision tree and circumstantial programming expressions such as a backward-chaining production rule. Both sequential and circumstantial programs can be expressed textually and graphically. The Moher group comments that:

Not only is no single representation best for all kinds of programs, no single representation is ... best for all tasks involving the same program. [Moher et al., 1993]

Sequential programs are useful for reasoning forwards to perform tasks such as prediction. Circumstantial programs are output-indexed; i.e. the thing you want to achieve is accessible separately to the method of achieving it. Hence, they are best used for hypothesis-driven tasks such as debugging.
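
The sequential/circumstantial distinction can be sketched in a few lines of Python. The car-diagnosis domain, the rule contents and the function names below are assumptions made up for this illustration; they are not the stimulus programs used by the Green or Moher groups. The sequential form is read top to bottom from inputs to an outcome. The circumstantial form indexes each rule by the outcome it concludes, so "what would have to hold for this outcome?" is a direct lookup that a simple backward chainer can follow.

```python
from typing import Dict, List, Set


# Sequential form: a decision tree, read forwards from inputs to an outcome.
def diagnose_sequential(engine_turns_over: bool, gauge_reads_zero: bool) -> str:
    if not engine_turns_over:
        return "flat_battery"
    if gauge_reads_zero:
        return "out_of_fuel"
    return "ignition_fault"


# Circumstantial form: hypothetical rules indexed by their conclusion.
# Each goal maps to the sub-goals that must all hold for it to be true.
RULES: Dict[str, List[str]] = {
    "out_of_fuel": ["engine_turns_over", "tank_empty"],
    "tank_empty": ["gauge_reads_zero"],
}


def prove(goal: str, facts: Set[str]) -> bool:
    """Backward chaining: a goal holds if it is a known fact, or if it has
    a rule whose sub-goals can all be proved in turn."""
    if goal in facts:
        return True
    body = RULES.get(goal)
    return body is not None and all(prove(sub, facts) for sub in body)
```

Predicting an outcome from observations is a single forward pass through diagnose_sequential. Testing a hypothesis, as in debugging, starts from the outcome: RULES["out_of_fuel"] names its preconditions directly, whereas the same question asked of the decision tree requires tracing every path that ends in that outcome.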

VP as Explanation

The core of the case for VP is something like "VP lets us explain the inner workings of a system at a glance". This section explores the issue of VP and explanation using the BALSA system.

In the BALSA animator system [Brown & Sedgewick, 1985], students can (e.g.) contrast the various sorting algorithms by watching them in action. Note that animation is more than just tracing the execution of a program. Animators aim to explain the inner workings of a program. Extra explanatory constructs may be needed on top of the programming primitives of that system. For example, when BALSA animates different sorting routines, special visualisations are offered for arrays of numbers and the relative sizes of adjacent entries.

Animators like BALSA may or may not be pure VP systems. BALSA does not allow the user to modify the specification of the animation. To do so requires extensive textual authoring by the developer. BALSA therefore does not satisfy Rule 2 of a pure VP system (defined above).

One drawback of the BALSA system is that its explanations must be hand-crafted for each task. General principles for explanation systems are widely discussed in AI. Wick and Thompson [Wick & Thompson, 1992] report that the current view of explanation is more elaborate than merely "print the rules that fired" or the "how" and "why" queries of traditional rule-based expert systems. Explanation is now viewed as an inference procedure in its own right rather than a pretty-print of some filtered trace of the proof tree. In the current view, explanations should be customised to the user and the task at hand. For example:

Summarising the work of Wick and Thompson, Leake, and Paris, I diagnose the reason for the lack of generality in BALSA's explanation system as follows. BALSA's explanation systems were hard to maintain since BALSA lacked:

  1. The ability to generate multiple possible explanations;
  2. An explicit user model;
  3. A library of prior explanations;
  4. A mechanism for using (2) and (3) to selectively filter (1) according to who is viewing the system.
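
The four items above can be read as a small pipeline, sketched here in Python. Everything in this block is a hypothetical illustration and not a description of BALSA or of the systems of Wick and Thompson, Leake, or Paris: the class names, the idea of tagging each explanation with the concepts it presupposes, and the "prefer what worked before" ordering are all assumptions chosen to keep the sketch short.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Explanation:
    text: str
    requires: FrozenSet[str]  # concepts the viewer must already know


@dataclass
class UserModel:               # item 2: an explicit user model
    known_concepts: Set[str]


# Item 1: several candidate explanations of one step in a sort animation.
CANDIDATES: List[Explanation] = [
    Explanation("The loop invariant now holds for one more element.",
                frozenset({"loop invariant"})),
    Explanation("a[i] > a[i+1], so the two entries were swapped.",
                frozenset({"array indexing"})),
    Explanation("The taller bar moved right.", frozenset()),
]

# Item 3: a library of explanations that were used successfully before.
LIBRARY: Set[str] = {"The taller bar moved right."}


def select_explanations(candidates: List[Explanation],
                        user: UserModel,
                        library: Set[str]) -> List[Explanation]:
    """Item 4: keep only explanations the viewer can follow, and list
    those already in the library of prior explanations first."""
    understandable = [e for e in candidates
                      if e.requires <= user.known_concepts]
    # sorted() is stable, so ties keep their original candidate order.
    return sorted(understandable, key=lambda e: e.text not in library)


novice = UserModel(known_concepts={"array indexing"})
expert = UserModel(known_concepts={"array indexing", "loop invariant"})
```

Calling select_explanations(CANDIDATES, novice, LIBRARY) drops the loop-invariant explanation, while the same call with expert keeps all three. The point of the sketch is only that the filtering depends on who is viewing the system, which is what a hand-crafted animation cannot vary.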


On the positive side, visual systems are more motivating for beginners than textual systems. In the case of spatial reasoning problems, a picture may indeed be worth 10,000 words [Larkin & Simon, 1987]. Given some 2-D representation of a problem (e.g. an array representation), spatial reasoning can make certain inferences very cheaply. Also, ill-structured diagramming tools are very useful for brainstorming ideas.

On the negative side, beyond the above three specific claims, the general superlativist case for VP is not very strong. Many software engineering and knowledge engineering problems are not inherently spatial. Most of the VP systems I am aware of do not support Goel's ill-structured approach to brainstorming. The JD research suggests that claims of the efficacy of VP systems have been poorly documented. The Moher and Green groups argue that VP evaluations cannot be made in isolation from the task of the system being studied. Lastly, a diagram may not necessarily support information accessibility for knowledge. A good explanation device requires far more than impressive graphics (recall the BALSA case study). Like many of our current approaches for knowledge engineering [Menzies, 1997b, Menzies et al., 1997, Menzies, 1997a], VP systems need to be better evaluated.


Boehm-Davis & Fregly, 1985
Boehm-Davis, D. & Fregly, A. (1985). Documentation of Concurrent Programs. Human Factors, 27:423-432.

Brown & Sedgewick, 1985
Brown, M. & Sedgewick, R. (1985). Techniques for Algorithm Animation. IEEE Software, pages 28-39.

Brown & Kimura, 1994
Brown, T. & Kimura, T. (1994). Completeness of a Visual Computation Model. Software - Concepts and Tools, pages 34-48.

Glasgow et al., 1995
Glasgow, J., Narayanan, N. H., & Chandrasekaran, B. (Eds.) (1995). Diagrammatic Reasoning: Cognitive and Computational Perspectives. MIT Press.

Glinert & Tanimoto, 1984
Glinert, E. & Tanimoto, S. (1984). Pict: An Interactive Graphical Programming Environment. IEEE Computer, pages 7-25.

Goel, 1992
Goel, V. (1992). "Ill-Structured Diagrams" for Ill-Structured Problems. In Proceedings of the AAAI Symposium on Diagrammatic Reasoning, Stanford University, March 25-27, pages 66-71.

Green et al., 1991
Green, T., Petre, M., & Bellamy, R. (1991). Comprehensibility of Visual and Textual Programs: The Test of Superlativism Against the "Match-Mismatch" Conjecture. In Empirical Studies of Programmers: Fourth Workshop, pages 121-146.

Hirakawa & Ichikawa, 1994
Hirakawa, M. & Ichikawa, T. (1994). Visual Language Studies - A Perspective. Software - Concepts and Tools, pages 61-67.

Jarvenpaa & Dickson, 1988
Jarvenpaa, S. & Dickson, G. (June 1988). Graphics and Managerial Decision Making: Research Based Guidelines. Communications of the ACM, 31(6):764-774.

Kindfield, 1992
Kindfield, A. (1992). Expert Diagrammatic Reasoning in Biology. In Proceedings of the AAAI Symposium on Diagrammatic Reasoning, Stanford University, March 25-27, pages 41-46.

Koedinger, 1992
Koedinger, K. (1992). Emergent Properties and Structural Constraints: Advantages of Diagrammatic Representations for Reasoning and Learning. In Proceedings of the AAAI Symposium on Diagrammatic Reasoning, Stanford University, March 25-27, pages 154-159.

Kremer, 1998
Kremer, R. (1998). Visual Languages for Knowledge Representation. Submitted to KAW'98: Eleventh Workshop on Knowledge Acquisition, Modeling and Management, Voyager Inn, Banff, Alberta, Canada. To appear.

Larkin & Simon, 1987
Larkin, J. & Simon, H. (1987). Why a Diagram is (Sometimes) Worth Ten Thousand Words. Cognitive Science, pages 65-99.

Leake, 1993
Leake, D. (1993). Focusing Construction and Selection of Abductive Hypotheses. In IJCAI '93, pages 24-29.

Menzies, 1996
Menzies, T. (1996). Visual Programming, Knowledge Engineering, and Software Engineering. In Proceedings of the Eighth International Conference on Software Engineering and Knowledge Engineering. Knowledge Systems Institute, Skokie, Illinois, USA. ISBN 0-9641699-3-2. Available from http://www.cse.unsw.edu.au/~timm/pub/docs/96seke.ps.gz.

Menzies, 1997a
Menzies, T. (1997a). Evaluating Issues with Critical Success Metrics. In Banff KA '98 workshop. Available from http://www.cse.unsw.EDU.AU/~timm/pub/docs/97evalcsm.

Menzies, 1997b
Menzies, T. (1997b). Evaluation Issues for Problem Solving Methods. Banff KA workshop, 1998. Available from http://www.cse.unsw.edu.au/~timm/pub/docs/97eval.

Menzies et al., 1997
Menzies, T., Cohen, R., Waugh, S., & Goss, S. (1997). Evaluating Conceptual Qualitative Modeling Languages. Submitted to the Banff KAW '98 workshop. Available from http://www.cse.unsw.EDU.AU/~timm/pub/docs/97evalcon.

Mintzberg, 1975
Mintzberg, H. (1975). The Manager's Job: Folklore and Fact. Harvard Business Review, pages 29-61.

Moher et al., 1993
Moher, T., Mak, D., Blumenthal, B., & Leventhal, L. (1993). Comparing the Comprehensibility of Textual and Graphical Programs: The Case of Petri Nets. In Empirical Studies of Programmers: Fifth Workshop, pages 137-161.

Narayanan et al., 1995
Narayanan, N. H., Suwa, M., & Motoda, H. (1995). Behaviour Hypothesis from Schematic Diagrams. In Glasgow, J., Narayanan, N. H., & Chandrasekaran, B. (Eds.), Diagrammatic Reasoning, pages 501-534. The AAAI Press.

Paris, 1989
Paris, C. (1989). The Use of Explicit User Models in a Generation System for Tailoring Answers to the User's Level of Expertise. In Kobsa, A. & Wahlster, W., (Eds.), User Models in Dialog Systems, pages 200-232. Springer-Verlag.

Scanlan, 1989
Scanlan, D. (1989). Structured Flowcharts Outperform Pseudocode: An Experimental Comparison. IEEE Computer, 6(5):28-36.

Shneiderman, 1983
Shneiderman, B. (1983). Direct Manipulation: A Step Beyond Programming Languages. Computer, pages 57-69.

Swigger & Brazile, 1989
Swigger, K. & Brazile, R. (1989). Experimental Comparisons of Design/Documentation Formats for Expert Systems. International Journal of Man-Machine Studies, 31:47-60.

Wick & Thompson, 1992
Wick, M. & Thompson, W. (1992). Reconstructive Expert System Explanation. Artificial Intelligence, 54:33-70.

Williams et al., 1993
Williams, M., Ledder, W., Buehler, J., & Canning, J. (1993). An Empirical Study of Visual Labs. In Proceedings 1993 IEEE Symposium on Visual Languages, pages 371-373. IEEE Comput. Soc. Press.