How to do conformance testing?

Daniel W. Connolly
Mon, 12 Dec 94 12:49:13 EST

Way back in March or May when I first started drafting HTML 2.0, I
intended to instigate a conformance test suite to go along with the
spec.

I planned to archive a test case for every issue that came up during
the discussion of the spec: the test case would go in either the
positive test suite or the negative test suite, depending on whether
the idiom was standard or an error.
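
A minimal sketch of how such a pair of suites might be run, written
in Python rather than Perl. Everything named here (run_suite,
toy_accepts, the two sample cases) is hypothetical; "accepts" stands
for whatever yes/no verdict a real implementation can be made to
report.

```python
def run_suite(accepts, positive, negative):
    """Return the (suite, case-name) pairs the implementation got wrong."""
    failures = []
    # Every document in the positive suite should be accepted.
    for name, doc in sorted(positive.items()):
        if not accepts(doc):
            failures.append(("positive", name))
    # Every document in the negative suite should be rejected.
    for name, doc in sorted(negative.items()):
        if accepts(doc):
            failures.append(("negative", name))
    return failures


def toy_accepts(doc):
    # Stand-in for a real implementation, for illustration only:
    # it checks nothing more than that "<" and ">" counts match.
    return doc.count("<") == doc.count(">")


POSITIVE = {"title": "<title>t</title>"}
NEGATIVE = {"unclosed-tag": "<title"}

print(run_suite(toy_accepts, POSITIVE, NEGATIVE))
```

An empty list means the implementation agreed with both suites; each
entry otherwise names the suite and the test case that disagreed.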

Somebody recently raised this issue on this list...

I do have a test suite, though it hardly covers every issue that has
come up along the way. See:

That public version is somewhat out of date w.r.t. my development
sources, but you can get the idea...

I'm sure other vendors keep regression suites to test their browsers
too. Perhaps we could all leverage each other's work.

The question is: how do we actually test conformance? Given a document
and an implementation, what do I do to prove that the implementation
handles the document "correctly"?

One thing to test is that the document was parsed correctly. We could
do something like the SGML RAST stuff: the implementation would write
out a canonical form of the parsed document. But I doubt current
implementations do stuff like default attribute value handling in
their parsers -- so we'd have to do something slightly different from
RAST.
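
A minimal sketch of the canonical-form idea, in Python rather than
Perl. It assumes the implementation under test can be driven the way
Python's standard html.parser.HTMLParser is; that parser is not an
SGML parser and applies no DTD attribute defaults. The
one-line-per-event format and the names CanonicalDump and canonical
are made up for illustration; this is not RAST itself.

```python
from html.parser import HTMLParser


class CanonicalDump(HTMLParser):
    """Record each parse event as one line of a canonical text form."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # Sort attributes so that source order does not affect the output.
        for name, value in sorted(attrs, key=lambda a: a[0]):
            self.lines.append("A%s %s" % (name, "" if value is None else value))
        self.lines.append("(" + tag)

    def handle_endtag(self, tag):
        self.lines.append(")" + tag)

    def handle_data(self, data):
        # Escape backslashes and newlines so each event stays on one line.
        self.lines.append("-" + data.replace("\\", "\\\\").replace("\n", "\\n"))


def canonical(document):
    """Parse a document and return its canonical dump as one string."""
    parser = CanonicalDump()
    parser.feed(document)
    parser.close()
    return "\n".join(parser.lines)


a = canonical('<P ALIGN=center ID="x">hi</P>')
b = canonical("<p id='x' align='center'>hi</p>")
print(a == b)
```

Two implementations that parse a test document the same way would
produce identical dumps, so a plain text comparison would be enough
to check them against each other or against a reference dump.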

But correct parsing wouldn't show that browsers interoperate with
respect to issues like:

* newline handling in <PRE> and <TEXTAREA>

hmmm... I can't think of any other interoperability issues that haven't
been taken care of by making them illegal.

So is there any benefit to conformance testing for HTML? Do we just
gather a big test suite and say "Have at it!" or do we specify some
way to show that an implementation passes the tests?

As food for thought, I'll include a little perl ditty I wrote to
exhaustively enumerate the (current-state, next-input-char) transition
table of a lexical analyzer for HTML. I've used it to debug several
HTML lexer/parser implementations.

for sample output.

It would be useful to take something like the tree generated by Earl
Hood's dtd2html tools and build an HTML document (or set of documents)
that exhaustively enumerate the (current-state, next-input-tag)
transition table of a tag parser for HTML.
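
A minimal sketch of that tag-level enumeration, in Python rather
than Perl. It treats the currently open element as the parser state
and tries every tag as the next input. The CONTENT table is a
hypothetical, heavily simplified content model, not the HTML 2.0 DTD
and not dtd2html output; a real suite would derive it from the DTD.

```python
# Hypothetical content model: element -> set of tags allowed as children.
CONTENT = {
    "UL": {"LI"},
    "LI": {"P", "UL"},
    "P": set(),
}


def enumerate_cases(content):
    """Build one test fragment per (open element, next tag) pair."""
    tags = sorted(content)
    cases = []
    for parent in tags:
        for child in tags:
            doc = "<%s><%s></%s></%s>" % (parent, child, child, parent)
            # Allowed pairs go in the positive suite, the rest in the negative.
            suite = "positive" if child in content[parent] else "negative"
            cases.append((parent, child, suite, doc))
    return cases


for parent, child, suite, doc in enumerate_cases(CONTENT):
    print("%-8s %s" % (suite, doc))
```

With three elements this yields nine fragments; the number grows as
the square of the number of element types, just as the character
table grows with the number of character classes.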

# $Id:,v 1.2 1994/05/01 05:01:59 connolly Exp $
# perl >states.html
# This script builds a killer html file that enumerates all the
# parsing states of an HTML parser.

$rcsid = '$Id:,v 1.2 1994/05/01 05:01:59 connolly Exp $';
$rcsid =~ s/\$//g; # strip the RCS keyword delimiters

print <<EOF;
<title>HTML Implementors' Guide: parse state enumeration</title>
<!-- generated by $rcsid -->
EOF

%names = (
"\t", 'tab',
"\n", 'newline',
" ", 'space',
"!", 'bang',
'"', 'quote',
'#', 'hash',
'&', 'amp',
"'", 'tic',
"-", 'dash',
".", 'dot',
"/", 'slash',
"0", 'digit',
"<", 'lt',
"=", 'eq',
">", 'gt',
"?", 'question',
'A', 'letter',
'[', 'lsqb',
']', 'rsqb',
);

@productions = (
'<IMG SRC="" ALT=\'\' ISMAP>',
'<UNKNOWN attr=val>d<br>d</UNKNOWN>',
'<P><UNKNOWN attr=val>d<br>d</UNKNOWN></P>',
);

@chars = sort keys %names;

%seen = ();

$pn = 0;

foreach $p (@productions){
$ln = 0;
foreach $l (1..length($p)){
$base = substr($p, 0, $l);

next if $base =~ /[a-zA-Z]$/ && substr($p, $l, 1) =~ /[a-zA-Z]/;

next if $seen{$base}++;

$cn = 0;
foreach $c (@chars){
next if "$base$c" eq '<>'; # drives SGMLs nuts
print "$pn.$ln.$cn $n: $base$c >;\"'>/--> ",
"<!--", $names{$c}, "--><br>\n";