Re: SGML, HTML and CS:

Daniel W. Connolly (
Mon, 12 Sep 94 14:39:20 EDT

In message <>, Peter Flynn writes:

>> Oh, my brother: would that you were wrong! After spending about two
>> weeks reading the SGML standard, one realizes that SGML provides few
>> features above and beyond lex/yacc. It is disheartening to realize that
>> a technology that should represent one man-month to implement actually
>> requires more like a man-year or two. There should have been a libSGML
>> years ago that would, by now, be in /usr/lib on every machine on
>> the planet.
>Right. But I'd venture to say that the SGML spec is more robust than
>one for lex or yacc (I've never seen a spec for either), which have an
>unerring tendency to fall flat on their faces at critical times.

OK, so lex and yacc are not commercial-grade software. But there are
many commercial-grade compiler-building toolkits based on the same
technology. Flex and Bison are pretty good, for example. As to a spec,
how about the Dragon Book?

[For those unfamiliar:
Compilers -- Principles, Techniques, and Tools
by Aho, Sethi, and Ullman
ISBN 0-201-10088-6 ]

Lex and Yacc are actually specified in tech reports from Bell
Labs. Convex still distributes reprints of these tech reports in their
collection of tutorial papers for ConvexOS. I still use them. The yacc
paper is cited as:

S. C. Johnson, Yacc: Yet Another Compiler Compiler, Computing
Science Technical Report No. 32, 1975, Bell Laboratories,
Murray Hill, NJ, 07974.

>> Amen, brother. You're preaching to the choir. Now: break out your time
>> machine, go back a few years and talk TimBL out of basing HTML on SGML
>> (or maybe it was me that really made the connection between HTML and
>> SGML -- but it was Tim's idea). Better yet, go back 10 or 15 years
>> and teach the SGML committee about compiler technology and automated
>> parsing.
>No good. The problem is that SGML had to pass the ISO cttees to make
>IS, so it's written in ISO-ese. Plus a lot of the groundwork done by
>Charles G was done in the days of old IBM mainframe technology, which
>is a maze of twisty little passages all alike, compared with "normal" :-)
>Unix-based CS today, which is a maze of twisty little passages all
>different :-)...

I have heard this argument -- "SGML was designed before anybody knew
about automated parsing" -- and I just don't buy it. The Dragon
Book has a 28-page bibliography including the original work by Chomsky:

Chomsky, N. [1956]. "Three models for the description of language,"
IRE Trans. on Information Theory IT-2:3, 113-124

That's right: 1956. There's a paper by Church from 1941. This stuff
was not novel in 1986 when SGML became a standard, nor during the 10
previous years when it was being developed. The designers of SGML simply
failed to do their homework.

> I've signed with Van Nostrand Reinhold to do a book on
>network publishing with WWW. I hope that this will complement the docs
>that Dave is writing with A-W.

Cool. Keep us posted!