Re: SGML, HTML and CS:

lee@sq.com
Mon, 12 Sep 94 14:07:05 EDT

Just a few notes after Dan's article, with my own view of the subject...

There's a paper, I think by Sandy Mamrak, on how you can't use Yacc and Lex
to parse full SGML. Your script only handles the reference concrete syntax...
if I were to change the definition of NMCHAR (in the SGML declaration) it'd
fall over.
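
To make the NMCHAR point concrete, here is a minimal sketch in Python (not
Dan's script, and not a real SGML parser). The function name
make_name_pattern and its extra_name_chars parameter are my own invention
for illustration; the default of "." and "-" is what the reference concrete
syntax allows in names besides letters and digits.

```python
import re

def make_name_pattern(extra_name_chars=".-"):
    """Build a regex for SGML names from a configurable set of name characters.

    Assumption: names start with a letter and continue with letters, digits,
    and whatever extra characters the SGML declaration adds.
    """
    extra = re.escape(extra_name_chars)
    return re.compile(r"[A-Za-z][A-Za-z0-9" + extra + r"]*")

# A lexer hard-wired to the default set stops at the underscore.
print(make_name_pattern().match("sect_1").group())       # sect

# With "_" added as a name character, the same input is a single name.
print(make_name_pattern(".-_").match("sect_1").group())  # sect_1
```

A Lex specification bakes the first pattern in at build time, which is why
a changed declaration makes it fall over.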

Well, you know that, of course; I just want to warn people that SGML isn't
restricted to Yacc's LALR(1) grammars, and is not context-free. Worse, the
lexical class of each token is not fixed, and tokens can be redefined on
the fly. You can include name characters in the start-tag open delimiter,
for example, so that
    begin(H1)
can be made the same as
    <H1>
just by changing the SGML declaration.
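
A rough sketch of what that means for a scanner, again in Python and again
only an illustration: the delimiters have to be parameters, not constants.
The function make_start_tag_scanner and its stago/tagc arguments are
hypothetical names of mine, and the name pattern assumes the reference
concrete syntax.

```python
import re

def make_start_tag_scanner(stago="<", tagc=">"):
    """Return a function that lists the element names of start tags in a text.

    stago and tagc stand in for the start-tag open and tag close delimiters
    that a real parser would read from the SGML declaration.
    """
    pattern = re.compile(
        re.escape(stago) + r"([A-Za-z][A-Za-z0-9.-]*)" + re.escape(tagc)
    )

    def scan(text):
        return pattern.findall(text)

    return scan

reference = make_start_tag_scanner()              # recognises <H1>
variant = make_start_tag_scanner("begin(", ")")   # recognises begin(H1)

print(reference("<H1>Title"))        # ['H1']
print(variant("begin(H1)Title"))     # ['H1']
print(reference("begin(H1)Title"))   # []
```

The last line is the trouble: to a scanner built for the reference
delimiters, begin(H1) is just data.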

And then there are the idiosyncrasies -- for example, an element declared
as EMPTY cannot have an end tag at all, whereas for other elements an omitted
end-tag is a minimization that the parser has to infer from context.
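
The EMPTY case can be sketched as a special branch in an otherwise ordinary
element stack. This is a toy in Python under loud assumptions: the set
EMPTY_ELEMENTS and the function check_tags are made up for illustration, the
input is a pre-tokenized list of tags, and inference of omitted end-tags is
not attempted at all.

```python
# Assumption: these elements are declared EMPTY in the DTD being used.
EMPTY_ELEMENTS = {"BR", "HR", "IMG"}

def check_tags(events):
    """Check a list of ("start", name) / ("end", name) pairs.

    Returns a list of error messages; an empty list means no complaints.
    """
    errors = []
    open_elements = []
    for kind, name in events:
        if kind == "start":
            # An EMPTY element is complete as soon as its start tag is seen,
            # so it is never pushed on the stack.
            if name not in EMPTY_ELEMENTS:
                open_elements.append(name)
        elif name in EMPTY_ELEMENTS:
            errors.append(f"end tag not allowed for EMPTY element {name}")
        elif open_elements and open_elements[-1] == name:
            open_elements.pop()
        else:
            errors.append(f"unexpected end tag for {name}")
    return errors

print(check_tags([("start", "P"), ("start", "BR"), ("end", "P")]))  # []
print(check_tags([("start", "BR"), ("end", "BR")]))
```

The second call reports the end tag on BR as an error, which is the special
case; everything else goes through the normal stack.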

SGML is not rigorously defined. It is not beautiful. It is not easily
implemented. It is not, in itself, `good'.

BUT it is an international standard. There is a lot of software that
supports it, with more being announced at Seybold tomorrow :-)
As long as you stick close to the reference concrete syntax, you can generally
interchange SGML documents between programs with very little editing.
SGML does not solve all of the problems of file transfer. It gives you a
way of representing structured objects on a data stream, and that's all.

But it turns out that that's a lot. It turns out that it's very useful.

Yes, almost anyone with a computer science or mathematical background looks
at SGML and wants to run away, or to change it (if very brave). But it's
a standard and you can't. On the other hand, that's the most important
difference I see between SGML and RTF: both describe structured data, but
RTF as interchange turns out to be very weak. And with the group of
computer programmers on the net making their own changes to RTF, and the
various versions of MS Word making incompatible changes and yet not supporting
structures more than 2 levels deep, it's getting worse.

When `ISO syndrome' was mentioned at the IETF meetings, I couldn't help
thinking of SGML, even though the people at the time were really thinking
of OSI and X.400.

Unlike OSI, SGML has caught on, and has proved itself to be useful.
Perhaps one day there will be a rigorously defined subset of SGML that can
be implemented reliably using proper program-proving techniques. For now,
commercial SGML software is pretty solid, and there are a few pieces of
non-commercial software such as SGMLS.

I suppose this is a place for a reminder: when it comes to passing round
sections of the draft for people to edit, I offered a copy of Author/Editor,
our SGML editor, or of HoTMetaL Pro (if it's in HTML) to the people doing the
work, if it will help. This offer still stands; let me know if you are asked
to do editing of the draft in SGML and want an SGML-native tool.

Lee

-- 
Liam Quin, Manager of Contracting, SoftQuad Inc +1 416 239 4801 lee@sq.com
HexSweeper NeWS game;OPEN LOOK+XView+mf-fonts FAQs;lq-text unix text retrieval
SoftQuad HoTMetaL: ftp.ncsa.uiuc.edu:Web/html/hotmetal, and also doc.ic.ac.uk:
packages/WWW/ncsa/..., gatekeeper.dec.com:net/infosys/Mosaic/contrib/SoftQuad/