Re: SGML conforming? and HTML conformance testing

Daniel W. Connolly
Mon, 17 Apr 95 16:36:39 EDT

Arthur van Hoff writes:
> I'm using what I think is the standard HTML-2 DTD.

"the standard DTD"? The document is an internet draft -- not
even a proposed standard yet.

We need a way of reliably identifying these things. I tend
to use the RCS ID. My copy is labelled:

$Id: html.dtd,v 1.25 1995/03/29 18:53:13 connolly Exp $
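Because the RCS ID packs the file name, revision, check-in date and time, author, and state into one line, a tool can pull out the date mechanically. Here is a minimal Python sketch. It assumes every ID follows the layout shown above, and the names `parse_rcs_id` and `RcsId` are hypothetical:

```python
import re
from datetime import datetime
from typing import NamedTuple


class RcsId(NamedTuple):
    filename: str
    revision: str
    checked_in: datetime
    author: str
    state: str


# Assumed layout of an expanded RCS keyword (as in the ID quoted above):
# $Id: <file>,v <revision> <YYYY/MM/DD> <HH:MM:SS> <author> <state> $
RCS_ID_PATTERN = re.compile(
    r"\$Id: (?P<filename>\S+),v (?P<revision>[\d.]+) "
    r"(?P<date>\d{4}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<author>\S+) (?P<state>\S+) \$"
)


def parse_rcs_id(text: str) -> RcsId:
    """Extract the fields of the first RCS $Id$ keyword found in text."""
    match = RCS_ID_PATTERN.search(text)
    if match is None:
        raise ValueError("no RCS $Id$ keyword found")
    checked_in = datetime.strptime(
        match["date"] + " " + match["time"], "%Y/%m/%d %H:%M:%S"
    )
    return RcsId(match["filename"], match["revision"], checked_in,
                 match["author"], match["state"])


dtd_id = parse_rcs_id("$Id: html.dtd,v 1.25 1995/03/29 18:53:13 connolly Exp $")
print(dtd_id.revision, dtd_id.checked_in.date())  # 1.25 1995-03-29
```

Comparing `checked_in` values is then enough to tell which of two copies of the DTD is newer.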

In general, the date of the DTD tends to be a reliable identifier.
For internet-draft versions of the DTD, use the date of the

> It is causing me
> a lot of trouble because it doesn't describe many of the existing
> documents.

Well... we tried! If you have specific suggestions, this is the
place to air them!

> A lot of documents have images in <pre> content, this
> is not allowed by that DTD.

Oops! I thought that was fixed. I'll make sure it's fixed in
the next draft.

> I hope you are right. We should think about how we are going to make
> this happen. I hope that the HTML3 standard is going to be A LOT
> stricter.

Do you think making the spec stricter will help? Will that stop folks
from doing whatever their browsers support? I think it will only
make the problem (that is, the spec not matching reality) worse. I
hope to see many more validation tools released to support authoring
of conforming SGML documents. Until we make it easier (i.e., more
cost-effective) to create conforming SGML documents than erroneous
ones, erroneous documents will be the norm.
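A useful checking tool need not be a full SGML parser. The following Python is only a rough sketch. It uses the standard library's `html.parser`, which does not validate against a DTD, to report elements found inside <PRE> that a DTD excludes. The excluded set (IMG and P) is an assumption taken from the complaints in this thread, not read from any DTD, and `check_pre` is a hypothetical name:

```python
from html.parser import HTMLParser


class PreContentChecker(HTMLParser):
    """Collect (line, tag) for excluded start tags found inside <PRE>."""

    def __init__(self, excluded):
        super().__init__()
        # html.parser lower-cases tag names, so compare in lower case.
        self.excluded = {name.lower() for name in excluded}
        self.pre_depth = 0  # greater than 0 while inside a PRE element
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if self.pre_depth and tag in self.excluded:
            line, _column = self.getpos()
            self.problems.append((line, tag))
        if tag == "pre":
            self.pre_depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self.pre_depth:
            self.pre_depth -= 1


def check_pre(document, excluded=("img", "p")):
    """Return (line, tag) pairs for excluded elements nested in <PRE>."""
    checker = PreContentChecker(excluded)
    checker.feed(document)
    checker.close()
    return checker.problems


sample = '<PRE>\nFigure: <IMG SRC="fig1.gif">\n<P>\n</PRE>\n<P>Outside PRE.'
print(check_pre(sample))  # [(2, 'img'), (3, 'p')]
```

A real validator reads the content model from the DTD itself. Hard-coding it, as here, only illustrates the kind of report a cheap tool could give authors while they write.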

> But when I read the spec it still says things like "The <P>
> tag [in <PRE>] should be avoided, but for robustness, user agents are
> recommended to treat these tags as line breaks". Am I supposed to write
> my own DTD that accepts <P> in <PRE>?

That's one option.

I have been thinking for a while that we need two DTDs: one for what a
browser is expected to be able to render, and one for what an
editor/conversion tool is expected to be able to understand. The
browser DTD would be just "tag soup," but the editing DTD would do
more to take advantage of features of SGML, to allow more robust
document management.

But I haven't found time to do the work. In the meantime, try the
%HTML.Prescriptive and %HTML.Deprecated flags. Perhaps <P> should
be allowed in <PRE> in non-strict mode.
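Flags like these are SGML parameter entities. Their value, INCLUDE or IGNORE, switches marked sections of the DTD on or off. The Python sketch below imitates that mechanism on a hypothetical DTD excerpt (not the draft's actual text) in which <P> joins the content of <PRE> only when %HTML.Deprecated is INCLUDE. It handles only unnested marked sections, and the function names are made up for illustration:

```python
import re

# Hypothetical DTD excerpt for illustration, not text from the draft.
DTD_FRAGMENT = """
<![ %HTML.Deprecated [
<!ENTITY % pre.content "#PCDATA | A | BR | P">
]]>
<!ENTITY % pre.content "#PCDATA | A | BR">
"""

# Matches <![ %flag [ ... ]]>; nested marked sections are not handled.
MARKED_SECTION = re.compile(r"<!\[\s*%([\w.-]+);?\s*\[(.*?)\]\]>", re.DOTALL)
ENTITY_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.-]+)\s+"([^"]*)"')


def resolve_marked_sections(dtd_text, flags):
    """Keep bodies whose flag is INCLUDE; drop the rest (IGNORE)."""
    def replace(match):
        keyword = flags[match.group(1)]  # KeyError for an undeclared flag
        return match.group(2) if keyword == "INCLUDE" else ""
    return MARKED_SECTION.sub(replace, dtd_text)


def parameter_entity(dtd_text, name, flags):
    """Return the value of the first declaration of %name after resolution.

    SGML binds the first declaration of an entity and ignores later ones.
    """
    resolved = resolve_marked_sections(dtd_text, flags)
    for decl_name, value in ENTITY_DECL.findall(resolved):
        if decl_name == name:
            return value
    raise KeyError(name)


print(parameter_entity(DTD_FRAGMENT, "pre.content", {"HTML.Deprecated": "INCLUDE"}))
# #PCDATA | A | BR | P
print(parameter_entity(DTD_FRAGMENT, "pre.content", {"HTML.Deprecated": "IGNORE"}))
# #PCDATA | A | BR
```

SGML binds the first declaration of an entity. A document could therefore pick the lenient or strict model by declaring the flag in its own DOCTYPE subset, ahead of the DTD's default.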

I hope to begin an HTML conformance testing push as soon as the HTML
2.0 document is out the door. I hope to run a service like the HTML
validation service, but with options like "please add this document to
the public test suite" and "please give a rationale for why this
document doesn't parse" and perhaps even "I suggest this change
to the DTD."
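A service like that implies a suite of documents, each recorded with whether it should parse and why. Below is a minimal Python sketch of that bookkeeping. It assumes a validator is any function that returns a list of error messages, with an empty list meaning the document parsed. `ConformanceSuite`, `ConformanceCase`, and `toy_validator` are hypothetical names, and the toy validator is a stand-in, not an SGML parser:

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interface: a validator returns error messages,
# and an empty list means the document parsed.
Validator = Callable[[str], List[str]]


@dataclass
class ConformanceCase:
    name: str
    document: str
    should_parse: bool
    rationale: str = ""  # why the document is or is not expected to parse


@dataclass
class ConformanceSuite:
    cases: List[ConformanceCase] = field(default_factory=list)

    def add(self, case: ConformanceCase) -> None:
        """Add a submitted document to the public suite."""
        self.cases.append(case)

    def run(self, validate: Validator) -> List[Tuple[str, List[str]]]:
        """Return (name, messages) for each case whose outcome disagrees
        with the expectation recorded in the suite."""
        failures = []
        for case in self.cases:
            messages = validate(case.document)
            if (not messages) != case.should_parse:
                failures.append((case.name, messages))
        return failures


def toy_validator(document: str) -> List[str]:
    """Stand-in for a real SGML parser: only rejects <P> inside <PRE>."""
    if re.search(r"<PRE>(?:(?!</PRE>).)*<P>", document, re.IGNORECASE | re.DOTALL):
        return ["P is not allowed in PRE (hypothetical strict rule)"]
    return []


suite = ConformanceSuite()
suite.add(ConformanceCase("p-in-pre", "<PRE>a<P>b</PRE>", should_parse=False,
                          rationale="P is excluded from PRE in strict mode"))
suite.add(ConformanceCase("plain-pre", "<PRE>a\nb</PRE>", should_parse=True))
print(suite.run(toy_validator))  # []
```

Each disagreement the suite reports means the document, the validator, or the DTD is wrong. That is the question behind the "rationale" and "change to the DTD" requests.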

I don't have resources for this lined up right now. Volunteers are
welcome to apply!