Re: SGML confirming? and HTML conformance testing

Arthur van Hoff (Arthur.Vanhoff@Eng.Sun.COM)
Mon, 17 Apr 95 19:15:27 EDT

Hi Daniel,

> > I hope you are right. We should think about how we are going to make
> > this happen. I hope that the HTML3 standard is going to be A LOT
> > stricter.
>
> Do you think making the spec stricter will help? Will that stop folks
> from doing whatever their browsers support? I think that will only
> make the problem (that is the spec doesn't match reality) worse. I
> hope to see lots more validation tools released to support authoring
> of conforming sgml documents. Until we make it easier (i.e. more
> cost-effective) to create conforming SGML documents than to create
> erroneous documents, erroneous documents will be the norm.

If the HTML standard is defined poorly then there will be poorly
defined documents. Had the HTML standard been stricter (i.e. a subset of
SGML) there would have been far fewer illegal documents. I know this
places a burden on both the parser implementor and the author of HTML
text, but right now I am faced with "fixing" my SGML parser so that it
implements all the peculiarities of HTML.

To give an example, http://www.hpl.hp.co.uk/people/dsr/html3/HTMLandSGML.html
specifies how to resolve inconsistencies between HTML and SGML. THIS
IS WRONG! Resolving inconsistencies means that you will allow them in
the future, which means that the number of inconsistent HTML documents
will only grow.

Take for example the fact that a lot of implementations allow any
character that is not a space or '>' in unquoted attribute values. As
a result everybody specifies URLs without quoting them. But SGML
clearly specifies that you are only allowed to use name characters!
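
A rough sketch of the difference, assuming the reference concrete
syntax, where the name characters are letters, digits, '.' and '-'.
The function names are made up for this illustration and do not come
from any real parser:

```python
import re

# Assumption: SGML reference concrete syntax, where name characters are
# letters, digits, '.' and '-'. A DTD's SGML declaration may extend this.
NAME_CHARS = re.compile(r"[A-Za-z0-9.-]+")


def strict_unquoted_ok(value: str) -> bool:
    """True if value contains only name characters (the strict SGML rule)."""
    return NAME_CHARS.fullmatch(value) is not None


def lenient_unquoted_ok(value: str) -> bool:
    """True if value is non-empty and has no whitespace or '>'.

    This mimics the lenient behaviour described above; it is not SGML.
    """
    return value != "" and not any(c.isspace() or c == ">" for c in value)


url = "http://java.sun.com/people/avh/"
print(strict_unquoted_ok("center"), lenient_unquoted_ok("center"))
print(strict_unquoted_ok(url), lenient_unquoted_ok(url))
```

The URL passes the lenient check but fails the strict one, because ':'
and '/' are not name characters, so under the strict rule it would
have to be quoted.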

Another example. Most parsers end a tag at the first '>', even when
it occurs inside a quoted attribute value. This case is explicitly
mentioned in the HTML3 spec, and users are advised to use &gt; to
escape '>' in tags. It appears that the spec forces you to allow
both, which means that my parser will not be SGML compliant because
the HTML spec was not enforced strongly enough.
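
To make the two behaviours concrete, here is a minimal sketch of both
ways of finding the end of a tag. Both functions are hypothetical and
much simplified (no comments, no marked sections); they only show
where the scanners disagree:

```python
def naive_tag_end(text: str, start: int = 0) -> int:
    """Index of the first '>' at or after start, ignoring quotes."""
    return text.find(">", start)


def quote_aware_tag_end(text: str, start: int = 0) -> int:
    """Index of the first '>' that is not inside a quoted attribute value.

    Returns -1 if the tag is never closed.
    """
    quote = None  # the quote character we are inside, if any
    for i in range(start, len(text)):
        c = text[i]
        if quote is not None:
            if c == quote:
                quote = None  # closing quote ends the literal
        elif c in "\"'":
            quote = c  # opening quote starts a literal
        elif c == ">":
            return i
    return -1


tag = '<img alt="a > b" src=x.gif>'
print(tag[: naive_tag_end(tag) + 1])
print(tag[: quote_aware_tag_end(tag) + 1])
```

The naive scanner cuts the tag off in the middle of the alt value,
while the quote-aware one returns the whole tag. A document that
writes &gt; inside the value is handled the same way by both.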

I would like to see an HTML specification that is a strict subset of
SGML. Otherwise, what is the point of using SGML? It would mean that
there are a lot of invalid documents out there, but these will have to be
updated eventually. In the end there should be one HTML standard, and
not just a bunch of interpretations of the standard.

Have fun,

Arthur van Hoff (avh@eng.sun.com)
http://java.sun.com/people/avh/
Sun Microsystems Inc, M/S UPAL02-301,
100 Hamilton Avenue, Palo Alto CA 94301, USA
Tel: +1 415 473 7242, Fax: +1 415 473 7104