SGML/HTML (long)

Nathan Torkington <Nathan.Torkington@vuw.ac.nz>

Mail folder: WWW Talk Jul-Oct 1993

Date: Thu, 12 Aug 1993 08:53:41 +1200
From: Nathan Torkington <Nathan.Torkington@vuw.ac.nz>
Message-id: <199308112053.AA12618@kauri.vuw.ac.nz>
To: www-talk@nxoc01.cern.ch
Subject: SGML/HTML (long)
Status: RO

Something I've been giving a lot of thought to recently is the issue
``is HTML a presentation or a semantic encoding'', and one justifiable
view is that if you want semantic encoding you should be writing in
another DTD that represents the semantics you want (DocBook, ManPage,
Memo, HypertextThingie) and translating it to a presentation language
for display.

This is because HTML can no way, no how, encode all the semantics that
Real Live Authors want.  The whole point of SGML is that you can
encode all the semantics you want, and leave presentation as a later
translation.  To force the author into the very limited semantics
provided by HTML (headings, lists, emphasis, paragraphs) is to render
useless the enormous amount of effort that has gone into designing the
various DTDs.  I guess that hypertext links would have to be regarded
as common to all the DTDs, to provide hypertextishness.
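
To make the translation step above concrete, here is a rough sketch of
what such a converter might look like.  It is only an illustration under
stated assumptions: the "memo" element names and the mapping table are
made up for the example (they come from no real DTD), and well-formed
XML stands in for SGML so that Python's standard parser can read it.
The <a> element is passed through unchanged, on the assumption that
links are common to every DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from semantic "memo" elements to HTML tags.
# None of these source names come from a real DTD; they are illustrative.
MEMO_TO_HTML = {
    "memo": "body",
    "subject": "h1",
    "para": "p",
    "important": "em",
    "a": "a",  # hypertext links assumed common to all DTDs
}


def to_html(elem):
    """Recursively copy a semantic element tree into an HTML element tree.

    Raises KeyError for any element that has no entry in MEMO_TO_HTML.
    """
    out = ET.Element(MEMO_TO_HTML[elem.tag])
    if elem.tag == "a":
        # Carry link attributes (e.g. href) across untouched.
        out.attrib.update(elem.attrib)
    out.text = elem.text
    for child in elem:
        new_child = to_html(child)
        new_child.tail = child.tail  # keep text that follows the child
        out.append(new_child)
    return out


def translate(source):
    """Parse a semantic document and return its HTML rendering as text."""
    return ET.tostring(to_html(ET.fromstring(source)), encoding="unicode")


MEMO = (
    "<memo><subject>Lunch</subject>"
    "<para>Meet at <important>noon</important>. "
    'See <a href="menu.html">the menu</a>.</para></memo>'
)
html = translate(MEMO)
print(html)
```

The author keeps writing <subject> and <important>; only the mapping
table decides how those are presented, so a different table could
target a different presentation language without touching the source.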

This is one extreme.  The other extreme is to regard HTML as a
semantics-only encoding, in which case it is going to have to be a
bloody big DTD to cope with all the semantics that authors typically
have.  This would also mean that we should ship style sheets with
documents; authors need control over presentation.  To remove this
control from authors is rude, and to claim that the whole point of
SGML is that authors don't have to worry about it is only half-true:
SGML separates the presentation from the semantics, but doesn't
dictate that authors can't control the presentation.

The midway is a compromise, and is what we have now.  It encodes some
minimal set of semantics (currently represented by paragraphs, titles,
emphasis and (unlikely) headings) and some minimal set of abstract
presentation (headings, bold, preformatted).  The minimal semantics
are geared to presentation and don't utilise the full semantic power
of SGML.  The presentation markup is abstract: for example, it
specifies headings, but not the point size and font.

I wouldn't mind seeing the first option implemented, and a set of
useful DTDs and conversion tools made available (or the conversion
tools built into the server).  I'm willing to be convinced
otherwise, though.

Nat.