History of html.dtd [Was: Hot Metal and HTML ]

"Daniel W. Connolly" <connolly@oclc.org>
Date: Wed, 15 Jun 94 11:45:15 EDT
Message-id: <9406151544.AA00648@ulua.hal.com>
To: Multiple recipients of list <html-ig@oclc.org>
X-Comment: HTML Implementation Group

I'll address Tim's constructive proposals in a separate message, but I
feel the need to respond to Tim's comments on the history of the DTD.

In message <9406150919.AA02162@www3.cern.ch>, Tim Berners-Lee writes:
>(note capital HTML -- Dan changed the filename for some reason too)
>is now the original while we sort this out.  

The original filename was html.dtd:

 $Id: html.dtd,v 1.2 1992/12/03 02:04:29 connolly Exp $

The reason for this is that the entity manager in the version of SGMLs
that I was using at the time mapped certain names to lower case before
looking them up in the Unix filesystem. I'm not sure if it still does
that or not, but that's why the name was originally "html.dtd". I don't
know why you changed it to HTML.dtd.
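
As a rough illustration of that case-folding behavior (a sketch in Python, not the actual SGMLs entity manager), the hypothetical resolve_entity_file() below folds the requested name to lower case before looking it up. The set files_on_disk stands in for a case-sensitive Unix directory; both names are invented for this example.

```python
def resolve_entity_file(name, files_on_disk):
    """Return the stored filename for `name`, or None if it is not found.

    Assumption: the lookup folds the requested name to lower case first,
    which is the behavior described above. `files_on_disk` is a set of
    exact filenames, modelling a case-sensitive directory.
    """
    folded = name.lower()
    return folded if folded in files_on_disk else None


if __name__ == "__main__":
    # A DTD stored as html.dtd is found however the request is spelled.
    print(resolve_entity_file("HTML.dtd", {"html.dtd"}))
    # A DTD stored as HTML.dtd is never found, since the folded name differs.
    print(resolve_entity_file("html.dtd", {"HTML.dtd"}))
```

Under that assumption, only the all-lower-case filename is reachable, which would explain the original choice of "html.dtd".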

>Sorry, my mistake for folding in Dan's proposals without testing them all.

If you had tested them, you would have found that, for example,
the fragment "<h1>head</h1>para" is legal by each version of the
DTD that I have tested and released.

I keep a detailed log of all the changes I make to this stuff. I
have a log of html.dtd going back to December of 1992. Perhaps
it would be interesting for folks to have a look at it...

RCS file: /u/connolly/cm/web/html-spec/html.dtd,v
Working file: html.dtd
head: 1.16
locks: strict
access list:
symbolic names:
	timbl-review: 1.13
	draft-iiir-html-01: 1.6
comment leader: "# "
keyword substitution: kv
total revisions: 19;	selected revisions: 19
formal specification of HTML
revision 1.16
date: 1994/06/13 20:55:50;  author: connolly;  state: Exp;  lines: +13 -378
Split HTML DTD into three parts:
	html.dtd -- level 2 version, which includes
		html-1.dtd -- level 1 version, which includes
			html-2.dtd -- level 0 version
revision 1.15
date: 1994/06/03 22:09:14;  author: connolly;  state: Exp;  lines: +1 -3
backed out some HTML.phrase stuff cuz it exercises a bug
in one of my tools.
revision 1.14
date: 1994/06/03 20:02:52;  author: connolly;  state: Exp;  lines: +24 -15
* Changed public identifier to W3O

* Started messing with Level0 feature test entities.
revision 1.13
date: 1994/05/18 17:23:29;  author: connolly;  state: Exp;  lines: +23 -10
diff -b -w -u -r1.5 html.decl
diff -b -w -u -r1.12 html.dtd

* Revised comments
* Moved HTML.Version to top of html.dtd
revision 1.12
date: 1994/05/17 21:07:53;  author: connolly;  state: Exp;  lines: +65 -45
ISOlat1: changed entities from text entities (which get parsed
redundantly) to CDATA entities, which matches the semantics
of the implementation I'm developing.

Removed references to doctype-mosaic.
Changed public identifier to reference no particular version of the
DTD (sneaky...)

Added links to a few interesting things.


removed "255 1 UNUSED" stuff, as per somebody's suggestion...
WEK or somebody... can't exactly remember.

Added comment about RE vs SEPCHAR stuff...


* Added comments in the HTML.Prescriptive marked section
  moved bogus second public identifier to another file...

* Removed bogus isindexatend, HTML.GO, HTML.aEndOmissable stuff

* added bodyBlockOnly feature test

* changed KEY, U to feature-test-controlled elements

* added a @@ note about relative HREF's

* changed BASE HREF attr to be required

* changed amp, lt, etc. entities to be CDATA entities,
so they don't get parsed at runtime.

* moved obsolete elements after forms

updated w.r.t KEY, U
revision 1.11
date: 1994/04/30 03:17:56;  author: connolly;  state: Exp;  lines: +32 -13
* doctype-mosaic.sgml is obsolete: The "default" version of
the DTD parses everything now... (after a few tweaks here and there)

* Changed BODY to allow %htext, making <P> start tags not necessary,
but keeping <P> as a container. This means I don't really need
a separate mosaic mode any more.

* Moved several of the features that are incompatible with extant
docs under a %HTML.Prescriptive feature set.

* Added HTML.Version entity, for use by code generation tools.
This entity varies according to the feature set used.

* Changed the "default" mode of some features:
+<!ENTITY % HTML.font-phrase "INCLUDE"

* Changed several attribute names to coincide with their
values, for a hacked version of attribute minimization
support in libwww.

* Added %block to ADDRESS content

* Added a #DEFAULT entity so that undefined entities are legal
revision 1.10
date: 1994/04/19 17:24:06;  author: connolly;  state: Exp;  lines: +2 -1
added &quot;
revision 1.9
date: 1994/04/14 01:23:26;  author: connolly;  state: Exp;  lines: +31 -10
After testing a few more files, inc. some from ORA GNN.

Added a few new feature test entities:
> 	-- The GO element used in ORA GNN. What is this??? -->
> <!ENTITY % HTML.aEndOmissable "IGNORE"
> 	-- infer </A> tags, as in ORA GNN stuff -->
> <!ENTITY % HTML.isindexAtEnd "IGNORE"
> 	-- allow ISINDEX after HEAD and BODY, as in ORA GNN stuff -->

NEXTID is "on" by default -- it doesn't hurt anything, I guess.

ISINDEX is allowed in %body-content if HTML.forms is "on"
@@ Hmmm... this means one must search the whole doc, not just
the head, to see if it's an index. Bad.

Expanded the BLOCKQUOTE content model to include all sorts
of %block stuff -- not just P and ADDRESS.

Fixed missing * in FORM content model.
revision 1.8
date: 1994/04/09 01:02:10;  author: connolly;  state: Exp;  lines: +275 -128
* Added feature test entities for stuff that's handled different ways
by different HTML implementations or specifications.

* Removed %headelement, %bodyelement, %oldstyle, in favor of
using OMITTAG to infer <HEAD>, <BODY> tags.

* Changed %URL to %URI, and cited specification

* Revamped %linkattributes in light of feature test entities

* Revamped HTML, HEAD, elements in light of feature test entities

* Anchor names may or may not be ID's based on the HTML.anchorNameID
feature-test entity.

* Changed %inline to be composed of %phrase and %font, where
%font is controlled by %HTML.font-phrase

* Changed EM, CODE, SAMP, etc. from (#PCDATA) to (%htext)+

* Added P, BR to %text

* replaced %stext with %block and %htext

* Changed BODY, A content models.

* Added BR element

* Changed DL content model to (DT*, DD?)+, changed DT, DD from EMPTY
to containers with omissible end tags. This matches all the cases I
found during testing.

* Changed OL, UL, etc. similarly

* Replaced ISO latin 1 entity declarations with an entity reference

* Added FORMs

* Removed emacs local-variable cruft
revision 1.7
date: 1994/04/01 19:21:25;  author: connolly;  state: Exp;  lines: +3 -98
branches:  1.7.2;
Extracted the DTD from the <!SGML .. ><!DOCTYPE [ ...DTD... ]>
stuff, and put the SGML declaration in a separate file.

The DTD can now be used in the more traditional:

	<!DOCTYPE HTML SYSTEM "html.dtd">

revision 1.6
date: 1994/03/30 02:29:15;  author: connolly;  state: Exp;  lines: +211 -109
DTD as released in draft-iiir-html-01.txt
revision 1.5
date: 1994/03/30 02:28:06;  author: connolly;  state: Exp;  lines: +164 -148
The DTD as I originally released it.
revision 1.4
date: 1993/02/03 21:30:13;  author: connolly;  state: Exp;  lines: +148 -164
checked in with -k by connolly at 1994/03/30 00:56:13
revision 1.3
date: 1993/01/07 00:38:36;  author: connolly;  state: Exp;  lines: +66 -39
checked in with -k by connolly at 1994/03/30 00:36:49
revision 1.2
date: 1992/12/03 02:04:29;  author: connolly;  state: Exp;
checked in with -k by connolly at 1994/03/30 00:20:44
date: 1994/04/07 00:33:25;  author: connolly;  state: Exp;  lines: +115 -57
This DTD represents current practice as represented
by a random sampling of docs, mostly from NCSA.

Added forms
Changed <A NAME= attr to NMTOKEN (yuk!)
Rearranged lots of stuff.
date: 1994/04/04 23:58:38;  author: connolly;  state: Exp;  lines: +5 -4
Fixed a couple of weirdos with ADDRESS and such.
date: 1994/04/01 20:30:17;  author: connolly;  state: Exp;  lines: +38 -95
Changed P, LI, DT, DD from EMPTY to containers.

Changed BODY, DL, etc. to have ELEMENT content.

Changed lists to allow embedded lists.

Removed NEXTID element -- should be a processing instruction.

Note: The BR feature should be represented as an entity &br;
that expands to a processing instruction <? break line>
rather than an element <br>.

The PRE style newline handling should be a different

Changed content model of %inline elements to include A.

Changed content model of A to ANY.

B, I, U, TT only allowed inside PRE. EM, STRONG, etc.
not allowed inside PRE.

ISOLat1 stuff moved to separate file.

removed emacs local variables.
date: 1994/04/01 20:07:22;  author: connolly;  state: Exp;  lines: +5 -16
took out a few obsolete features... moved them to html-compat-doc.sgml
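
The DL content model from revision 1.8, (DT*, DD?)+, can be read as a regular expression over the names of a DL element's children. The Python sketch below is only an illustration: matches_dl_model() is a made-up helper, and it ignores everything a real SGML parser does (tag omission, inclusions, whitespace handling).

```python
import re

# Direct transliteration of (DT*, DD?)+ with each child name followed by a space.
DL_MODEL = re.compile(r"(?:(?:DT )*(?:DD )?)+")


def matches_dl_model(children):
    """Check a list of child element names against (DT*, DD?)+."""
    sequence = "".join(name.upper() + " " for name in children)
    return DL_MODEL.fullmatch(sequence) is not None


if __name__ == "__main__":
    print(matches_dl_model(["DT", "DD"]))              # one term, one definition
    print(matches_dl_model(["DT", "DT", "DD", "DD"]))  # runs of terms and definitions
    print(matches_dl_model(["DD"]))                    # definition with no term
    print(matches_dl_model(["DT", "LI"]))              # LI is not part of the model
```

The first three calls should report a match and the last should not, which is consistent with the log's remark that the model matches all the cases found during testing.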