Re: SGML, HTML and CS:

"Daniel W. Connolly" <connolly@hal.com>
Date: Mon, 12 Sep 94 14:39:20 EDT
Message-id: <9409121840.AA02453@ulua.hal.com>
Reply-To: connolly@hal.com
From: "Daniel W. Connolly" <connolly@hal.com>
To: Multiple recipients of list <html-wg@oclc.org>
Subject: Re: SGML, HTML and CS: 
X-Comment: HTML Working Group (Private)
In message <9409121012.AA04381@curia.ucc.ie>, Peter Flynn writes:

>> Oh, my brother: would that you were wrong! After spending about two
>> weeks reading the SGML standard, one realizes that SGML provides few
>> features above and beyond lex/yacc. It is disheartening to realize that
>> a technology that should represent one man-month to implement actually
>> requires more like a man-year or two. There should have been a libSGML
>> years ago that would, by now, be in /usr/lib on every machine on
>> the planet.
>
>Right. But I'd venture to say that the SGML spec is more robust than
>one for lex or yacc (I've never seen a spec for either), which have an
>unerring tendency to fall flat on their faces at critical times.

OK, so lex and yacc are not commercial-grade software. But there are
many commercial-grade compiler-building toolkits based on the same
technology. Flex and Bison are pretty good, for example. As to a spec,
how about the Dragon Book?

	[For those unfamiliar:
		Compilers -- Principles, Techniques, and Tools
		by Aho, Sethi, and Ullman
		Addison-Wesley
		ISBN 0-201-10088-6 ]

Lex and Yacc are actually specified in tech reports from Bell
Labs. Convex still distributes reprints of these tech reports in their
collection of tutorial papers for ConvexOS. I still use them. The yacc
paper is cited as:

	S. C. Johnson, Yacc: Yet Another Compiler-Compiler, Computing
	Science Technical Report No. 32, 1975, Bell Laboratories,
	Murray Hill, NJ, 07974.
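
The claim that SGML needs little beyond lex/yacc technology can be illustrated with a minimal sketch. It assumes a toy tag language rather than real SGML: explicit start and end tags only, with no tag omission, attributes, entity references, or DTD, which are exactly the parts a full SGML parser must add. A regular expression plays the role of lex, and a hand-written recursive-descent parser stands in for a yacc-generated LALR parser. The names `tokenize`, `parse`, and `Element` are hypothetical, chosen for illustration.

```python
import re
from dataclasses import dataclass, field

# Toy tag language assumed for this sketch (NOT full SGML): explicit start and
# end tags only; no omitted tags, attributes, entities, comments, or DTD.
# Element names are treated case-insensitively by folding them to upper case.

# Lexical stage (the job lex would do): each token class is a regular expression.
TOKEN_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)>|([^<]+)")


def tokenize(text):
    """Yield ("start", NAME), ("end", NAME) or ("text", data) tuples."""
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            # Only a '<' that does not begin a well-formed tag can fail here.
            raise SyntaxError(f"unexpected '<' at offset {pos}")
        slash, name, data = m.groups()
        if name is not None:
            yield ("end" if slash else "start", name.upper())
        else:
            yield ("text", data)
        pos = m.end()


@dataclass
class Element:
    name: str
    children: list = field(default_factory=list)  # text strings and Elements


def parse(text):
    """Recursive-descent parser for the grammar
         element -> START content END   (START and END names must match)
         content -> (TEXT | element)*
    yacc would instead build a table-driven LALR parser from a grammar like
    this, with the name check done in a semantic action."""
    tokens = list(tokenize(text))
    pos = 0

    def element():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != "start":
            raise SyntaxError("expected a start tag")
        name = tokens[pos][1]
        pos += 1
        node = Element(name)
        # content -> (TEXT | element)*, stopping at the first end tag
        while pos < len(tokens) and tokens[pos][0] != "end":
            kind, value = tokens[pos]
            if kind == "text":
                node.children.append(value)
                pos += 1
            else:
                node.children.append(element())
        if pos >= len(tokens) or tokens[pos][1] != name:
            raise SyntaxError(f"missing </{name}>")
        pos += 1
        return node

    root = element()
    if pos != len(tokens):
        raise SyntaxError("content after the root element")
    return root


doc = parse("<html><title>Re: SGML</title><p>Hello</p></html>")
print(doc.name, [child.name for child in doc.children])  # prints: HTML ['TITLE', 'P']
```

Mismatched tags such as `<p>x</b>` raise `SyntaxError`; supporting SGML's omitted end tags would require the parser to consult a DTD's content models, which this sketch does not attempt.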

>> Amen, brother. You're preaching to the choir. Now: break out your time
>> machine, go back a few years and talk TimBL out of basing HTML on SGML
>> (or maybe it was me that really made the connection between HTML and
>> SGML -- but it was Tim's idea). Better yet, go back 10 or 15 years
>> and teach the SGML committee about compiler technology and automated
>> parsing.
>
>No good. The problem is that SGML had to pass the ISO committees to
>become an International Standard (IS), so it's written in ISO-ese.
>Plus a lot of the groundwork done by
>Charles G was done in the days of old IBM mainframe technology, which
>is a maze of twisty little passages all alike, compared with "normal" :-)
>Unix-based CS today, which is a maze of twisty little passages all
>different :-)...

I have heard this argument -- "SGML was designed before anybody knew
about automated parsing" -- and I just don't buy it. The Dragon
Book has a 28-page bibliography including the original work by Chomsky:

	Chomsky, N. [1956]. "Three models for the description of language,"
	IRE Trans. on Information Theory IT-2:3, 113-124

That's right: 1956. There's a paper by Church from 1941. This stuff
was not novel in 1986, when SGML became a standard, nor during the
preceding ten years, when it was being developed. The designers of SGML simply
failed to do their homework.

>I've signed with Van Nostrand Reinhold to do a book on
>network publishing with WWW. I hope that this will complement the docs
>that Dave is writing with A-W.

Cool. Keep us posted!

Dan