Yes, the end user market doesn't know or care about how the input was parsed.
Nor should they, in principle. However, if their access to tools is drastically
cut down because the cost of developing HTML-compatible tools is artificially
inflated (by the need to write non-SGML parsers, etc.), they will start noticing.
But by then it will be too late.
An education effort to this effect has to start *now*, with major magazines,
papers, etc. It also wouldn't hurt to have a program go out, read every publicly
posted page of HTML, and issue a parsing-failure report to each page's webmaster...
if this can be made reliable enough not to destroy its own reputation, perhaps
something like 'we found problems, email xxxx@nitpicky.org to see what they were'.
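To make the idea concrete, here is a minimal sketch of such a checker in
Python. Everything specific in it is an assumption for illustration: the seed
URL list is hypothetical, and it tests only one common illegality (overlapping
tags) using the standard library's HTMLParser, where a real effort would
validate against the actual HTML DTD with a true SGML parser such as James
Clark's sgmls.

from html.parser import HTMLParser
from urllib.request import urlopen

# End tags the HTML DTD lets authors omit (plus empty elements), so that
# legal tag omission is not flagged as an error.  A partial list, assumed
# for this sketch.
OMITTABLE = {"p", "li", "dt", "dd", "tr", "td", "th", "option",
             "br", "hr", "img", "input", "meta", "link", "base"}

class NestingChecker(HTMLParser):
    """Flags one common illegality: overlapping (badly nested) elements."""
    def __init__(self):
        super().__init__()
        self.stack = []    # (tag, line number) of currently open elements
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in OMITTABLE:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in OMITTABLE:
            return
        line = self.getpos()[0]
        if tag not in [t for t, _ in self.stack]:
            self.errors.append("line %d: stray </%s>" % (line, tag))
        elif self.stack[-1][0] != tag:
            top, opened = self.stack[-1]
            self.errors.append("line %d: </%s> overlaps <%s> opened at line %d"
                               % (line, tag, top, opened))
            while self.stack[-1][0] != tag:   # recover and keep parsing
                self.stack.pop()
            self.stack.pop()
        else:
            self.stack.pop()

def check_url(url):
    text = urlopen(url).read().decode("latin-1", "replace")
    checker = NestingChecker()
    checker.feed(text)
    checker.close()
    return checker.errors

if __name__ == "__main__":
    for url in ("http://example.com/",):   # hypothetical seed list
        for err in check_url(url):
            print("%s: %s" % (url, err))

The missing piece, of course, is mailing each report to the page's webmaster,
and keeping the false-positive rate low enough that the reports are welcomed
rather than filtered.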
> had to add specific code to our parser to duplicate specific common HTML/SGML
> parsing bugs in order to be able to convince people that our browser isn't
> broken (and we still get bug reports that amount to "page X is displayed
> wrong--I checked in both Netscape and Mosaic!").
NCSA Mosaic, to my knowledge, intends to adhere to the HTML standard itself.
This is the starting point for any compliance effort. Netscape, on the other
hand, has something to gain from promoting incompatibilities that are expensive
to duplicate (can you say 'the next IBM'? 'the next Microsoft'?), so this must
be kept in mind. This is not to say that individuals there act in bad faith,
only that they will not in general be highly motivated to keep things compliant
and to help others catch up.
I don't want to see the C++ compiler nightmare repeated, where for five
years the major compilers were out of step with the standard (Microsoft only
added templates, fully defined by 1990 and easy to support, in 1995!). This
kind of thing retards the tools market, as any non-compliance with a standard
does.
> We're probably going to go the route of Arena and put up a big red
> "ILLEGAL HTML" sign whenever we hit something, and maybe offer to pop up
> a bug report form detailing exactly where and how the page is illegal.
A great idea, especially if the report is automatically forwarded to the
webmaster (under the user's name) and isn't obviously from a vendor's
automated parser.
As long as there is a way to write the page so that it does not require or
exploit the parsing bug, I don't think people have much trouble with this.
It's only when you *cannot* be both compliant and compatible with popular
tools that problems arise.
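For example, overlapping tags such as <B><I>text</B></I> are illegal under
SGML's nesting rules but are rendered without complaint by the popular
browsers; since the properly nested <B><I>text</I></B> displays identically
everywhere, an author loses nothing by writing it correctly. It is the
constructs with no compliant equivalent that cause real trouble.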
> I think you'll see vendors starting to enforce strict SGML compliance just
> as soon as the WWW community at large decides it's a feature instead of a bug.
This will happen as soon as the WWW community at large is generally using a
variety of tools from smaller vendors that are built on SGML parsers.
Perhaps the SGML community should band together and create a compliance body,
similar to X/Open, to award an 'HTML 3.0 compliant' branding label to vendor
products.
> We can get away with taking a stronger stance than some vendors because we
> don't make all our money off of HTML software, but even we have to be
> pragmatists or we lose sales and reputation.
Of course. This is not a problem you can be expected to solve alone.
> We spent about twice as much
> time getting our parser to swallow common illegal HTML than we did getting
> it up and running on legal content.
This is a fascinating statistic: you are saying that the effort of supporting
illegal HTML has tripled your development cost (twice the legal-HTML effort,
spent again on top of it). How much of this was SGML-compliance issues, i.e.
tag form, character set, etc.? The answer should provide sufficient motivation
for the small guys to stick with SGML.
--
Craig Hubley Business that runs on knowledge
Craig Hubley & Associates needs software that runs on the net
craig@passport.ca 416-778-6136 416-778-1965 FAX
Seventy Eaton Avenue, Toronto, Ontario, Canada M4J 2Z5