Re: A thought on implementation...

Erik Naggum <erik@naggum.no>
Errors-To: listmaster@www0.cern.ch
Date: Thu, 17 Feb 1994 16:30:44 --100
Message-id: <19940217.0690.erik@naggum.no>
Reply-To: erik@naggum.no
Originator: www-talk@info.cern.ch
Sender: www-talk@www0.cern.ch
Precedence: bulk
From: Erik Naggum <erik@naggum.no>
To: Multiple recipients of list <www-talk@www0.cern.ch>
Subject: Re: A thought on implementation...
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
Content-Length: 2788

(I'm still busy, so this is just a quick reply to Dan Connolly's message of
yesterday.)

Briefly put: The problems that people seem to have with SGML appear to
come from the idea that SGML is so much like text that it can be treated as
text _without_ markup.  This is not true.

That one should be able to process an SGML file without doing any SGML
parsing is not a particularly good idea, nor is it desirable.  Rather than
roll your own primitive and dysfunctional SGML processors, why not use an
actual SGML parser?  Some of them are heavy-weights, and some insist on
being run as separate processes, but the project I'm working on has shown
that one can build a small and conforming SGML parser that is also fast.
Commercial implementations are usually built to be separate programs that
run script languages to do conversion to some other language for use by
other applications.  This has made it very hard for people to use "native
SGML" in their applications, for things to which that SGML would otherwise
be eminently suited.  I find HTML to be a very interesting example of such
an application, but I am concerned with the amount of heuristics involved
in the "pseudo-parsing", and particularly concerned that more of the same
heuristics-based approach is suggested for HTML+.
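
To make the concern about heuristics concrete, here is a minimal sketch in
Python. It is an illustration only, not code from any actual browser or
parser; the function names and the sample string are hypothetical. It
assumes the common heuristic "a tag runs from `<` to the next `>`", which
breaks as soon as a quoted attribute value contains a `>`:

```python
import re


def strip_tags_heuristic(text):
    # Heuristic: a tag is "<", then anything that is not ">", then ">".
    return re.sub(r"<[^>]*>", "", text)


def strip_tags_quote_aware(text):
    # Slightly better: a ">" inside a quoted attribute value does not end
    # the tag. This is still not an SGML parser; it treats every "<" as
    # the start of a tag and knows nothing about the DTD.
    out = []
    i = 0
    n = len(text)
    while i < n:
        if text[i] != "<":
            out.append(text[i])
            i += 1
            continue
        i += 1  # step past "<"
        quote = None
        while i < n:
            ch = text[i]
            if quote:
                if ch == quote:
                    quote = None  # closing quote of the literal
            elif ch in "\"'":
                quote = ch  # opening quote of a literal
            elif ch == ">":
                break  # real end of the tag
            i += 1
        i += 1  # step past ">"
    return "".join(out)


sample = '<A HREF="x.html" TITLE="a > b">link</A>'
print(repr(strip_tags_heuristic(sample)))    # ' b">link'
print(repr(strip_tags_quote_aware(sample)))  # 'link'
```

The second function fixes this one case, but only by adding one more
special rule, and it still fails on other legal input. That is the pattern
the paragraph above objects to.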

The idea that comments should be disallowed has one major disadvantage: If
I can't use comments or marked sections in my HTML+ files, it means I must
have two copies, one published, the other not.  Such comments and marked
sections are necessary if I want to keep important information about the
document available over its lifetime.  Since we are not talking about
snapshot documents and ephemeral information _all_ the time, lack of this
feature may be an untenable situation in practice.  Marked sections are
sometimes the only means to "comment out" a larger block of text that
already has comments in it, or hyphens that could terminate the comments
prematurely.  One of the beauties of HTML today is precisely that one can
reference the source documents without any of the stupid pre-processing and
multiple sources that are required in other hypertext systems, which remain
closed systems because of these limitations.
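
The point about hyphens ending a comment prematurely can be sketched in
Python. This is a simplified illustration under stated assumptions, not a
conforming implementation: it assumes the SGML rule that a comment
declaration is `<!`, then zero or more comments each delimited by `--` on
both sides (with optional white space between them), then `>`. The function
names are hypothetical, and marked sections are not handled at all.

```python
import re

# Heuristic often used instead of parsing: everything from "<!--" to the
# first "-->" is one comment.
NAIVE_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments_naive(text):
    return NAIVE_COMMENT.sub("", text)


def parse_comment_declaration(text, pos=0):
    """Parse one comment declaration starting at text[pos].

    Returns (list of comment bodies, offset just past the closing ">").
    Raises ValueError if the declaration is not well formed under the
    simplified rule described above.
    """
    if not text.startswith("<!", pos):
        raise ValueError("not a markup declaration")
    i = pos + 2
    comments = []
    while True:
        # White space is allowed between comments and before ">".
        while i < len(text) and text[i].isspace():
            i += 1
        if text.startswith(">", i):
            return comments, i + 1
        if not text.startswith("--", i):
            raise ValueError(f"text outside a comment at offset {i}")
        close = text.find("--", i + 2)
        if close == -1:
            raise ValueError("unterminated comment")
        comments.append(text[i + 2:close])
        i = close + 2  # step past the closing "--"
```

With these definitions, `parse_comment_declaration("<!-- one -- -- two -->")`
returns two comments, while `"<!-- price -- see note -->"` raises
`ValueError`, because the `--` after "price" closes the comment and
"see note" is left outside it. The naive pattern silently accepts both,
which is why a block that already contains comments or double hyphens
cannot safely be wrapped in another comment and needs a marked section
instead.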

A restriction on the legality of SGML constructs also means that we can no
longer use ordinary SGML tools to test for conformance to the HTML+ DTD and
document conventions, but will have to build new tools to validate already
valid SGML documents.  I maintain that this is a very bad idea.  The cost
of the alternative is small in comparison.

Please note that my main mail address is <erik@naggum.no>.

Best regards,
</Erik>
--
Erik Naggum <erik@naggum.no> <SGML@ifi.uio.no>  |  Memento, terrigena.
ISO 8879 SGML, ISO 10744 HyTime, ISO 10646 UCS  |  Memento, vita brevis.