Re: comments on the DTD in Nov 16 draft

Daniel W. Connolly
Mon, 21 Nov 94 14:23:05 EST

In message <>, Paul Grosso writes:
>Use of public text display version field in FPIs

Excellent argument. Makes me glad I insisted that this list
be archived.

> In other words, I would recommend that the
>version info (be it version number and/or revision date) should be
>part of the public text description field. This would mean that
>the FPIs in the DTD and in various places in the spec (including
>in the sample SGML Open entity catalog) would look more like:
> PUBLIC "-//IETF//DTD HTML//EN" html.dtd
> PUBLIC "-//IETF//DTD HTML 2.0//EN" html.dtd
> PUBLIC "-//IETF//DTD HTML Level 2//EN" html.dtd
> PUBLIC "-//IETF//DTD HTML 2.0 Level 2//EN" html.dtd
> -- Ways to refer to Level 1: most general to most specific --
> PUBLIC "-//IETF//DTD HTML Level 1//EN" html-1.dtd
> PUBLIC "-//IETF//DTD HTML 2.0 Level 1//EN" html-1.dtd
> -- Ways to refer to Level 0: most general to most specific --
> PUBLIC "-//IETF//DTD HTML Level 0//EN" html-0.dtd
> PUBLIC "-//IETF//DTD HTML 2.0 Level 0//EN" html-0.dtd

The current scheme is the result of an e-mail exchange between
Erik Naggum and me, and is not based on any serious research
or experience.

Now that I can see more clearly what all this means, I second
the above proposal. I'll make the change this week, barring objections.
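
For anyone who wants to see where the version info ends up, here is a
minimal sketch in Python that splits an FPI into its fields. The names
parse_fpi and FPI are made up for illustration, and the sketch assumes
the simple "//"-separated shape used in this thread (it ignores the
unavailable-text indicator and does no validation of the field contents).

```python
from typing import NamedTuple, Optional


class FPI(NamedTuple):
    registration: str  # "+" registered, "-" unregistered, "" for an ISO owner
    owner: str
    text_class: str
    description: str
    language: str
    display_version: Optional[str]


def parse_fpi(fpi: str) -> FPI:
    """Split a formal public identifier on "//" (illustrative only)."""
    parts = fpi.split("//")
    if len(parts) < 3:
        raise ValueError(f"unexpected FPI shape: {fpi!r}")
    if parts[0] in ("+", "-"):
        registration, owner, rest = parts[0], parts[1], parts[2:]
    else:
        # ISO-owned identifiers start directly with the owner name.
        registration, owner, rest = "", parts[0], parts[1:]
    if len(rest) not in (2, 3):
        raise ValueError(f"unexpected FPI shape: {fpi!r}")
    # The first word of this field is the public text class (DTD, ENTITIES, ...);
    # the remainder is the public text description.
    text_class, _, description = rest[0].partition(" ")
    display_version = rest[2] if len(rest) == 3 else None
    return FPI(registration, owner, text_class, description, rest[1],
               display_version)


proposed = parse_fpi("-//IETF//DTD HTML 2.0 Level 2//EN")
print(proposed.description)      # version info lives in the description
print(proposed.display_version)  # and the display version field is unused
```

Under the proposal, "2.0" and "Level 2" are part of the description
field, which leaves the display version field free for its intended use.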

>Creating a copy of ISO Added Latin 1 character entity declaration set
>Why is the HTML spec creating a new character entity set that is
>basically identical to the ISO set (as far as I can tell) and giving it
>a different FPI?

I never was sure about the right way to spell this in SGML, but
I'm very clear on the meaning:

The ISO Added Latin 1 entity set declares the szlig entity as:

<!entity szlig SDATA "[szlig ]">

(or pretty close to that.) As I understand it, that means that the
markup &szlig; refers to "your system's representation of the German
double-s ligature."

In contrast, in HTML, the markup '&szlig;' is specified to be
equivalent to the markup: '&#223;'. There's nothing system-defined
about it. Hence I thought it was wrong to refer to the ISOlat1
public text.
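
The distinction can be sketched in Python. This is only an illustration:
Python's html module stands in for an HTML parser, and the two lookup
tables and the resolve_sdata helper are hypothetical, invented to show
what "system-defined" means.

```python
import html

# HTML reading: "&szlig;" is specified to mean the same thing as "&#223;".
# (Python's html module is used here as a stand-in; that is an assumption.)
named = html.unescape("&szlig;")
numeric = html.unescape("&#223;")
print(named == numeric)  # the same character either way

# SDATA reading: the replacement text "[szlig ]" is only a key that each
# system resolves for itself. Both tables below are hypothetical.
LATIN1_TERMINAL = {"[szlig ]": "\u00df"}
ASCII_TERMINAL = {"[szlig ]": "ss"}


def resolve_sdata(key: str, system_table: dict) -> str:
    """Look up a system-specific rendering for an SDATA entity."""
    return system_table[key]


print(resolve_sdata("[szlig ]", LATIN1_TERMINAL))
print(resolve_sdata("[szlig ]", ASCII_TERMINAL))
```

With the SDATA declaration, two conforming systems may legitimately
produce different results; with the HTML definition they may not.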

> [I'm reading section 3.15.2 of the spec.] I don't
>think we should do that. I'm guessing it might have something to
>do with the device-specific replacement text one might wish to have for
>the entities (and if that is the case, we shouldn't change the
>base FPI, but instead make use of the public text display version

OK... so we should be using something like: ???

-- ISO latin 1 entity set for HTML --
PUBLIC "ISO//ENTITIES Added Latin 1//EN//HTML" ISOlat1.sgml

> but I can't really get into details of a suggestion without
>understanding more the reason behind this.

With the above clarification, could you "get into the details of
a suggestion" please?

>Use of + versus * occurrence indicator on content models with #PCDATA
>while the use of the * instead of the + helps remind the casual
>reader at a glance that P's can be empty. Therefore, I have
>heard it recommended that the * be used instead of the + in
>these cases for the increased clarity it provides. I would make
>this recommendation throughout the DTD. My quick scan notes
>the following declarations:
><!ELEMENT (%font;|%phrase) - - (%text)+>
> <!ENTITY % A.content "(%text)+"
><!ENTITY % A.content "(%heading|%text)+">
><!ELEMENT P - O (%text)+>
><!ELEMENT ( %heading ) - - (%text;)+>
><!ELEMENT PRE - - (%pre.content)+>
><!ELEMENT DT - O (%text)+>
><!ELEMENT OPTION - O (#PCDATA)> [I'd make it ... (#PCDATA)*> ]
><!ELEMENT TITLE - - (#PCDATA)> ditto

Okie dokie.

>Non-compliant use of parameter entities
>There are several occurrences of non-compliant use of parameter entities
>in the latest DTD. In brief, you cannot define a parameter entity with
>"dangling" connectors such as "| FORM | ISINDEX".

Jeez... learn something new every day. It was when I first began
playing with parameter entities that I discovered that SGML is like
quantum mechanics: you have to let go of your intuition in order to
get anywhere. It's important to forget anything you know about cpp,
m4, and other macro processors before attempting to grok SGML.
Why is a question that I value my sanity too greatly to ask.

>There are several ways to rearrange things to be valid. Here I
>make one suggestion:

OK... I'll try to incorporate something like that. But I don't have
any way to be sure I've gotten it right. Perhaps I'll mail it to you
so you can run it through a sufficiently anal-retentive parser before
my next release.
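
In the meantime, a rough lint along the following lines might catch the
obvious cases. It is a heuristic sketch in Python, not a substitute for
a validating parser: it only flags parameter entity literals that begin
or end with a "|" or "," connector. The function name and the sample
entity name block.forms are invented for the example, and "&" is left
out because it is hard to tell from the start of a reference.

```python
import re
from typing import List

# Matches <!ENTITY % name "literal"> or <!ENTITY % name 'literal'>.
ENTITY_DECL = re.compile(
    r'<!ENTITY\s+%\s+(\S+)\s+(["\'])(.*?)\2',
    re.DOTALL | re.IGNORECASE,
)
CONNECTORS = "|,"


def dangling_entities(dtd: str) -> List[str]:
    """Return names of parameter entities whose text starts or ends
    with a connector (heuristic check only)."""
    names = []
    for name, _quote, text in ENTITY_DECL.findall(dtd):
        stripped = text.strip()
        if stripped and (stripped[0] in CONNECTORS
                         or stripped[-1] in CONNECTORS):
            names.append(name)
    return names


sample = '''
<!ENTITY % block.forms "| FORM | ISINDEX">
<!ENTITY % text "#PCDATA | A | IMG | BR">
'''
print(dangling_entities(sample))
```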

>Why not use LITA?
> I recommend the above line be replaced with either:
><!ENTITY % version.attr 'VERSION CDATA #FIXED "%HTML.Version;"'>
> or (equivalently)
><!ENTITY % version.attr "VERSION CDATA #FIXED '%HTML.Version;'">

As I recall, this exercised a bug in a perl script that I use
to extract info from the HTML DTD. I'll see if I can fix the
perl script and remove this wart.

>CDATA terminated by any ETAGO
>The comment associated with the declaration for %literal; is somewhat

Please suggest an alternative.

> Since the use of CDATA for element content models can lead
>to surprises [I'm glad this is deprecated], I think it's important to
>make the comment clearer.
>The fact is that an element whose content model is CDATA is terminated
>by ANY end tag--even an invalid one! That is, if the string "</"
>occurs in CDATA content, the current element will be terminated.

Well... </[a-zA-Z], to be precise. And as long as we're picking nits... :-)
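
To make the nit concrete, here is a small Python sketch of the rule as
stated: CDATA content ends at the first "</" that is immediately
followed by a letter, whatever tag name comes after it. The function
name cdata_end is invented for illustration.

```python
import re

# ETAGO ("</") counts as a delimiter only when a name start character follows.
ETAGO_IN_CONTEXT = re.compile(r"</[a-zA-Z]")


def cdata_end(text: str) -> int:
    """Return the index at which CDATA content is terminated,
    or len(text) if no terminating "</" + letter occurs."""
    match = ETAGO_IN_CONTEXT.search(text)
    return match.start() if match else len(text)


content = "if (a </ b) then </B> oops</XMP>"
end = cdata_end(content)
print(repr(content[:end]))
```

Here the "</ " followed by a space is passed over, but the "</B>"
terminates the content even though the author presumably meant the
element to run up to "</XMP>".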

>Furthermore, there is no way for a user or editing interface to
>escape the "</" to allow, for example, SGML code to be shown within
>such an element. That is why I usually recommend the use of RCDATA
>for such content models. This would allow the escaping of "</"
>and "&" by using entity references.

That's not the way it's implemented in the zillion browsers that
are deployed. We cannot re-write history; we can only "deprecate"
it :-)

Speaking of history... this is probably the issue that started this
whole HTML spec effort! It's fun to go trolling through the old
www-talk archives* to see that these same issues have been there since
day one:
Subject: HTML DTD enclosed
Date: Wed, 15 Jul 92 22:35:19 CDT
From: Dan Connolly <>

<!-- BUG:
tags.html says that you can put anything but </XMP> in the
text of an XMP element. SGML says that ETAGO, "</" ends a CDATA

If you've followed any of the recent safe-tcl, tcl vs scheme/python/perl
threads on usenet, this should provide a chuckle:

To: (Robert Raisch)
Subject: Re: Links, Types and Documents (Third time's a charm)
In-reply-to: Your message of "Mon, 22 Jun 92 13:30:55 EDT."
Date: Mon, 22 Jun 92 14:38:26 CDT
From: Dan Connolly <>

> 6. Execution
> -- when activated, some arbitrary function is performed
> The point that was mentioned about the lack of an
> ubiquitious scripting language is well made. Lisp
> is too arcane for most. Shell languages are too
> platform specific. What is needed is a simple
> to understand, freely available scripting platform.
> Although I hesitate to mention it, REXX might be
> a reasonable choice due to it's broad availability.
Ah... if you want commentary, state an arguable thesis. No one can argue
against a platitude like "What is needed is a simple to understand,
freely available scripting platform." I vote for some brand of Lisp, perhaps
XLisp or ELK.

Daniel W. Connolly "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project (512) 834-9962 x5010