Re: SGML confirming? and HTML conformance testing

Craig Hubley
Tue, 18 Apr 95 19:26:12 EDT

> >I would like to see a HTML specification that is a strict subset of

Properly speaking, HTML is an application of SGML, or more usefully, HTML
is an instance of an SGML-compliant markup language... I hate to be a pedant,
but I have yet to see a trade journalist get this right.

It is true that the tags defined in an HTML DTD are a subset of those that could
possibly be defined in SGML, or that have been defined in all SGML DTDs to
date. However, this falls somewhat short of saying that the language itself
is a subset; HTML's syntax/parsing must be *exactly the same* as that of
SGML to be considered compliant, and no shortcuts at this level are possible.
Even acceptable abbreviations, such as those covered by SHORTTAG, are specified
very exactly in SGML.

Summarizing my reasons for believing that we should stick strictly to SGML:

- The work of the SGML committees, early adopters (including DoD and FCC),
implementors and vendors, etc., could not possibly be duplicated in this
century. Anyone who thinks they have a 'better answer' to problems of
large scale document management can take it up with all of the above. I
for one have to consider the SGML answer the 'best answer' that could be
agreed upon, until the SGML community itself finds/adopts another answer.

- If parsers are unable to reliably recognize SGML tags, or nonstandard HTML
tags (which will inevitably be added by vendors, and may not follow SGML
syntax if this precedent is not absolutely clear), they cannot reliably ignore
them either. In short order this could destroy the compatibility
and reliability of the Web, as browsers continually fail to parse HTML that
was created with other browsers in mind. If vendors add only nonstandard
tags that are in strict SGML format, this can be rendered mostly harmless.

- Even if one believes the global Web is doomed, an SGML-compliant HTML standard
must exist anyway as the basis for more robust/reliable internal applications
that could choose not to permit the full range of browsers and tools used. It
would be a great disservice to users of HTML applications on LANs to let HTML
deviate from SGML... for one thing it means two document management systems.
Internally-produced-and-used HTML is far more easily kept SGML compliant, and
organizations can decide to standardize on tools that meet this requirement.

- SGML is the basis of most practical modern document management systems. To
let HTML diverge from SGML means two different markets for document management.
If you want to save trees, keep HTML compliant with SGML!

- Hundreds of vendors already support SGML! Probably as many support 'just
HTML' by this point, but they are generally not organizations with ten or
more years' worth of experience with large scale document management tools.
The experience of one SoftQuad or OpenText is worth fifty Netscapes here.
On the other hand, there are presentation-side issues that SGML was simply
never designed to cover, but MIME extensions and PDF types cover those well.
There is no reason to believe that parsing 'innovations' or 'shortcuts'
beyond those specified in SGML could do anything to aid presentation.
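
To make the point about ignoring unknown tags concrete, here is a minimal
sketch. It assumes Python's standard html.parser module as a stand-in for a
browser's tokenizer (it is a lenient HTML tokenizer, not a validating SGML
parser), and the names KNOWN_TAGS, TolerantRenderer and render are
hypothetical, chosen only for illustration. It shows that when a vendor tag
follows the ordinary tag syntax, a program that has never heard of it can
still find where the tag starts and ends, skip it, and keep the text.

```python
from html.parser import HTMLParser

# Hypothetical set of tags this toy "browser" understands.
KNOWN_TAGS = {"p", "em", "h1"}


class TolerantRenderer(HTMLParser):
    """Collects text content and records any tags it does not know."""

    def __init__(self):
        super().__init__()
        self.text = []      # pieces of character data, in document order
        self.ignored = []   # names of unknown start tags that were skipped

    def handle_starttag(self, tag, attrs):
        # An unknown tag is skipped, not treated as an error. This only
        # works because its boundaries follow the usual tag syntax.
        if tag not in KNOWN_TAGS:
            self.ignored.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def render(markup):
    """Return (plain text, list of ignored tag names) for the markup."""
    renderer = TolerantRenderer()
    renderer.feed(markup)
    renderer.close()
    return "".join(renderer.text), renderer.ignored


# A vendor-specific tag written in regular tag syntax is skipped cleanly.
text, ignored = render('<p>Hello <blink speed="fast">world</blink>!</p>')
print(text)     # Hello world!
print(ignored)  # ['blink']
```

If the vendor tag did not follow the standard syntax, the tokenizer would
have no agreed rule for where it ends, and different browsers could disagree
about what is markup and what is text.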

> >SGML. What is otherwise the point of using SGML?
> I agree, though the question will probably be endlessly debated...

If there are issues remaining in SGML that HTML is in a position to qualify,
or where HTML requires a stronger statement or uniform implementation decision,
these issues should be taken up with the SGML committee(s). However there is
no excuse for failing to research the SGML precedents and understand the
priorities that drove the SGML committee to do what it did. That experience
is priceless.

Craig Hubley                 Business that runs on knowledge
Craig Hubley & Associates    needs software that runs on the net
Seventy Eaton Avenue, Toronto, Ontario, Canada M4J 2Z5
416-778-6136    416-778-1965 FAX