Re: SGML confirming?

Arthur van Hoff (Arthur.Vanhoff@Eng.Sun.COM)
Mon, 17 Apr 95 15:50:52 EDT

Hi Craig,

> More seriously,
>
> > > I'm trying to use a DTD driven SGML parser to read HTML documents but I
> > > am finding that the HTML standard has diverged significantly from the
> > > SGML standard.
>
> Please be more specific.

I'm using what I think is the standard HTML-2 DTD. It is causing me
a lot of trouble because it doesn't describe many of the existing
documents. A lot of documents have images in <pre> content, which
that DTD does not allow.
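
To make the mismatch concrete, here is a minimal sketch in Python of
the kind of check involved. It is not a DTD-driven SGML parser: it uses
the standard html.parser module, and ALLOWED_IN_PRE is an assumed,
simplified stand-in for the DTD's <pre> content model, not a copy of
it. The names PreChecker and check_pre are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: a simplified stand-in for the DTD's <pre> content model,
# not copied from the actual HTML 2.0 DTD.
ALLOWED_IN_PRE = {"a", "br", "hr", "b", "i", "em", "strong", "code", "tt"}


class PreChecker(HTMLParser):
    """Collects start tags inside <pre> that the assumed model forbids."""

    def __init__(self):
        super().__init__()
        self.pre_depth = 0
        self.violations = []  # list of (tag name, line number)

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this method.
        if tag == "pre":
            self.pre_depth += 1
        elif self.pre_depth and tag not in ALLOWED_IN_PRE:
            self.violations.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag == "pre" and self.pre_depth:
            self.pre_depth -= 1


def check_pre(document):
    """Return the forbidden tags found inside <pre> in the given text."""
    checker = PreChecker()
    checker.feed(document)
    checker.close()
    return checker.violations
```

Calling check_pre on a document with an <img> inside <pre> reports that
tag and the line it starts on; the same <img> outside <pre> is ignored.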

> What version of the HTML DTD are you using ? Are the documents that you
> are trying to read, stated as being compliant with that version of HTML?
> I find that there are substantial variations in the degree to which different
> browsers react to conventions they don't recognize... someone on this list
> stated that handling all the common HTML buglike-features that could inhibit
> SGML parsing had doubled or tripled the work of supporting HTML.

Right. That is what I am finding out.

> I don't doubt that there is a lot of crap out there pretending to be HTML,
> I have written some of this myself under duress, but this is one of the
> things that the standard is supposed to help rectify.

Ok, but in the meantime I have to write an SGML-compliant parser
that can deal with those documents. The end result is not an SGML
parser but an HTML parser.

> I should note that C++ templates were specified in 1989 but not supported
> by the most popular compilers on the Windoze platforms until 1994... it is
> often a very long haul before the big guys actually support an accredited
> standard. If HTML becomes basically incompatible with SGML, however, it is
> pretty clear that an SGML web could arise from its ashes fairly easily...

I hope you are right. We should think about how we are going to make
this happen. I hope that the HTML3 standard is going to be A LOT
stricter. But when I read the spec it still says things like "The <P>
tag [in <PRE>] should be avoided, but for robustness, user agents are
recommended to treat these tags as line breaks". Am I supposed to write
my own DTD that accepts <P> in <PRE>?
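
The robustness rule quoted above amounts to a second, lenient mode
layered over the strict one. A minimal sketch in Python, assuming the
standard html.parser module rather than a DTD-driven parser; PreText,
pre_text and the strict flag are hypothetical names for illustration:

```python
from html.parser import HTMLParser


class PreText(HTMLParser):
    """Extracts the text of <pre> blocks.

    In strict mode a <p> inside <pre> is an error; in lenient mode it
    is treated as a line break, as the quoted recommendation suggests.
    """

    def __init__(self, strict=False):
        super().__init__()
        self.strict = strict
        self.in_pre = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.in_pre = True
        elif tag == "p" and self.in_pre:
            if self.strict:
                raise ValueError("<p> is not allowed inside <pre>")
            self.chunks.append("\n")  # lenient: treat as a line break

    def handle_endtag(self, tag):
        if tag == "pre":
            self.in_pre = False

    def handle_data(self, data):
        if self.in_pre:
            self.chunks.append(data)


def pre_text(document, strict=False):
    """Return the concatenated text found inside <pre> elements."""
    parser = PreText(strict=strict)
    parser.feed(document)
    parser.close()
    return "".join(parser.chunks)
```

The strict flag keeps both behaviours in one parser, which is the
trade-off in question: the strict path is what the DTD describes, and
the lenient path is what existing documents need.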

Have fun,

Arthur van Hoff (avh@eng.sun.com)
http://java.sun.com/people/avh/
Sun Microsystems Inc, M/S UPAL02-301,
100 Hamilton Avenue, Palo Alto CA 94301, USA
Tel: +1 415 473 7242, Fax: +1 415 473 7104