comments on the DTD in Nov 16 draft

Paul Grosso (pbg@texcel.no)
Mon, 21 Nov 94 14:05:06 EST

I've got a few comments on the latest spec. Apologies if some
are repeats of others' comments.

section 3.4, page 13:
- the example shows "-//W3O//DTD W3 HTML 2.0//EN" as the
formal public identifier for the HTML DTD. This differs from
the DTD and other places in the spec (e.g., section 6.1.1).

section 3.4.3, page 14:
- in the explanation of "string literal," I suppose it's obvious,
but we might want to add "and not containing any occurrences of
the delimiting character"
- a couple of lines later, I'd recommend changing
"Some implementations consider any occurrence of the > ..."
to
"Some non-SGML implementations consider any occurrence of the > ..."
so as to make it clear that the SGML standard is not ambiguous on
this point.
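
As a rough illustration of the string-literal rule (a literal may not
contain its own delimiter), here is a minimal Python sketch. The function
name and the escape_gt flag are hypothetical, not anything from the spec.
It tries double quotes first, then single quotes, and only falls back to
the numeric character reference &#34; when the value contains both. It
does not handle a literal & in the value.

```python
def quote_attribute_value(value, escape_gt=False):
    """Return value wrapped in a delimiter that does not occur inside it.

    escape_gt is a hypothetical option for tools that stop at the first
    '>' they see: it rewrites '>' as the numeric reference &#62;.
    """
    if escape_gt:
        value = value.replace(">", "&#62;")
    if '"' not in value:
        # Double quotes are safe as the delimiter.
        return '"' + value + '"'
    if "'" not in value:
        # Value contains a double quote but no single quote.
        return "'" + value + "'"
    # Both quote characters occur: escape the double quotes.
    return '"' + value.replace('"', "&#34;") + '"'
```

For example, quote_attribute_value('a > b', escape_gt=True) gives
"a &#62; b" in double quotes, which should be safe even for a tool that
ends the tag at the first >.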

section 3.4.3, page 15:
- very first line, I don't understand the need for this use of "
[and I would really dislike the use of " as done in the DTD] when
one can use LITA (that is, a single quote) as the delimiter. While
I'm not opposed to pointing out the possible use of " (though
I would argue against recommending "), I'd like to see the option
of using single quotes pointed out. As written, the text makes it
sound like the use of " is the only option.
- in the NOTE a few lines down, please change "Some implementations"
to "Some non-SGML implementations" (or "Some browser implementations").
- following this note is:
    Attributes with a declared value of NAME (e.g. ISMAP, COMPACT)
    may be written using a minimized syntax. The markup:
        <UL COMPACT="compact">
    can be written as
        <UL COMPACT>
    NOTE: Unless you use the minimized syntax, some implementations
    won't understand.
This doesn't make clear to me which form is "minimized." Assuming
strict SGML terminology (where 'minimized' is the opposite of 'minimal'),
the latter form is minimized. In that case, be sure you say "some
non-SGML implementations" since all SGML implementations must understand
the non-minimized (aka minimal, sometimes colloquially called normalized)
form. And, if this is the case, it's too bad, because some SGML tools
only understand (and even more--the great majority, in fact--only produce)
minimal aka normalized form.
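
To make the two forms concrete, here is a minimal Python sketch that
renders a start tag either in normalized form or with the minimized
syntax. The function names are hypothetical, and the sketch assumes the
omitted value is spelled the same as the attribute name, as in the
COMPACT example; it is not a general SGML normalizer.

```python
def expand_minimized(attrs):
    """Give every (name, value) pair an explicit value.

    A value of None stands for an attribute written in minimized form,
    e.g. COMPACT in <UL COMPACT>. Assumption: the implied value is the
    lowercased attribute name.
    """
    return [(name, name.lower() if value is None else value)
            for name, value in attrs]


def render_start_tag(tag, attrs, minimized=False):
    """Render a start tag in normalized or minimized form."""
    parts = [tag]
    for name, value in expand_minimized(attrs):
        if minimized and value.lower() == name.lower():
            # Minimized form: only one token is written.
            parts.append(name)
        else:
            # Normalized form: name, value indicator, quoted value.
            parts.append(name + '="' + value + '"')
    return "<" + " ".join(parts) + ">"
```

With this, render_start_tag("UL", [("COMPACT", None)]) produces the
normalized form, and passing minimized=True produces <UL COMPACT>.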

section 3.4.4, page 15:
- first paragraph says "HTML generators should generate strictly conforming
HTML." How/where is this defined--I mean above and beyond the DTD? In
particular, what about my previous point? Is an SGML editor that produces
normalized SGML (and therefore does NOT produce minimized syntax for
attributes such as UL's COMPACT) generating strictly conforming HTML?
- next paragraph, for non-understood tag names, I recommend amending
    ...behave as though,
    in the case of a tag, the whole tag had not been there
    but its content had, or in the case of an attribute,
    that the attribute had not been present.
to read:
    ...behave as though,
    in the case of a tag, both the start tag (including its entire
    attribute specification list) and matching end tag (if any) had not
    been there but its content had, or in the case of an attribute,
    that the entire attribute specification (for that attribute)
    had not been present.
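
The amended wording could be sketched roughly as follows, using Python's
standard html.parser module. The KNOWN table and the names
DropUnknown and strip_unknown are hypothetical and cover only a few
elements for illustration. Unknown start and end tags are dropped while
their content is kept, and unknown attribute specifications are dropped
from known tags.

```python
import html
from html.parser import HTMLParser

# Hypothetical table: element name -> attribute names it understands.
KNOWN = {"p": {"align"}, "ul": {"compact"}, "li": set()}


class DropUnknown(HTMLParser):
    """Re-emit markup, skipping tags and attributes not in KNOWN."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN:
            return  # drop the start tag and its whole attribute list
        parts = [tag.upper()]
        for name, value in attrs:
            if name not in KNOWN[tag]:
                continue  # drop the entire attribute specification
            if value is None:
                parts.append(name.upper())
            else:
                parts.append(name.upper() + '="' + html.escape(value) + '"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        if tag in KNOWN:
            self.out.append("</" + tag.upper() + ">")

    def handle_data(self, data):
        self.out.append(data)  # content always survives


def strip_unknown(markup):
    parser = DropUnknown()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

Note that html.parser is a lenient HTML tokenizer, not an SGML parser,
so this only approximates the behavior described.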

section 3.12.2, page 30:
- in the item about line boundaries, if the statement is meant to match
what SGML defines, I would recommend amending:
    * Line boundaries within the text are rendered as a move to
      the beginning of the next line, except for one immediately
      following or immediately preceding a tag.
to
    * Line boundaries within the text are rendered as a move to
      the beginning of the next line, except for one immediately
      following a start tag or immediately preceding an end tag.
[record end handling in SGML is potentially more complex, but my
suggested modification should make things practically (and I mean
that in both senses) accurate.]
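
A minimal Python sketch of the suggested rule, under the simplifying
assumption that a tag is anything between < and the next > (so it does
not cope with > inside attribute values). The function and pattern names
are hypothetical. It removes a line boundary only when it immediately
follows a start tag or immediately precedes an end tag.

```python
import re

# A start tag followed directly by a line boundary.
START_TAG_THEN_NEWLINE = re.compile(r"(<[A-Za-z][^>]*>)\r?\n")
# A line boundary followed directly by an end tag.
NEWLINE_THEN_END_TAG = re.compile(r"\r?\n(</[A-Za-z][^>]*>)")


def drop_ignored_line_boundaries(text):
    """Remove the line boundaries that the amended rule says to ignore."""
    text = START_TAG_THEN_NEWLINE.sub(r"\1", text)
    return NEWLINE_THEN_END_TAG.sub(r"\1", text)
```

Line boundaries before a start tag or after an end tag are left alone,
which is the difference from the draft's "following or preceding a tag"
wording.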

section 3.12.5, page 32:
- I think there's an error in one of the META examples. In:
    <META EXPIRES HTTP-
    EQUIV="Expires">Tue, 04 Dec 1993 21:29:02 GMT</expires>
    <META HTTP-EQUIV="Keywords" CONTENT="Fred, Barney">
    <META HTTP-EQUIV="Reply-
    to" content="fielding@ics.uci.edu (Roy Fielding)">
I can't figure out the "EXPIRES" (the DTD gives no attribute named "expires"
from what I see). Of course, the line break in the middle of HTTP-
EQUIV isn't valid, but I'm assuming that's just a formatting glitch.
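
If the first example was meant to be a plain HTTP-EQUIV/CONTENT pair like
the other two (an assumption on my part; the draft may have intended
something else), a tool could collect the pairs along these lines. This
is a minimal Python sketch using the standard html.parser module; the
names MetaHeaders and collect_http_equiv are hypothetical.

```python
from html.parser import HTMLParser


class MetaHeaders(HTMLParser):
    """Collect (HTTP-EQUIV, CONTENT) pairs from META start tags."""

    def __init__(self):
        super().__init__()
        self.headers = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # html.parser lowercases attribute names; values keep their case.
        found = dict(attrs)
        name = found.get("http-equiv")
        content = found.get("content")
        if name is not None and content is not None:
            self.headers.append((name, content))


def collect_http_equiv(markup):
    parser = MetaHeaders()
    parser.feed(markup)
    parser.close()
    return parser.headers
```

A META tag without both attributes, such as the one with the stray
EXPIRES, contributes nothing to the result.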

section 5.1.3, page 48:
- "nbsp;" -> "&nbsp;"
- "Soft-hypen" -> "Soft-hyphen"

section 6.1.1, page 53:
- I've commented in earlier email about the FPIs and the ISOlat issue.
(I've also posted some explanatory text to be inserted in this section.)

section 6.2.1, page 61:
- in the heading, "Defintions" -> "Definitions"
- This whole section scares me a bit. As I wrote elsewhere, I'd rather
just reference the ISO set. If we want to publish the byte numbers
in the HTML spec that may be used by some browsers, we can do that,
but that's just a question of display-tool-dependent encoding of
the standard ISO character entities. And other tools may not need
or want to use (or be able to use) that particular encoding.
The whole point is that authoring/editing tools should write HTML
documents--and browsers should read/process HTML documents--using
the character entity references defined in the ISO character set
such as &Aacute;. I don't see it as part of the definition of HTML
to tell tools what potentially device-dependent replacement text they
must use.
- If we are going to create our own entity set, we cannot include the
ISO copyright. I recommend we also do not use %ISOlat1; as the
example entity name.
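
To illustrate the point about writing documents with entity references
rather than device-dependent bytes, here is a minimal Python sketch. It
relies on the standard html.entities table, which is Python's own table
of HTML entity names (it includes the Latin 1 names), not the ISO entity
set file itself. The function name is hypothetical.

```python
from html.entities import codepoint2name


def to_entity_references(text):
    """Replace Latin 1 characters 160-255 with named entity references."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0xA0 <= code <= 0xFF and code in codepoint2name:
            # Write the name, e.g. &Aacute;, instead of a raw byte.
            out.append("&" + codepoint2name[code] + ";")
        else:
            out.append(ch)
    return "".join(out)
```

The document then carries only names such as &Aacute;, and each tool is
free to map them to whatever encoding it uses internally.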

paul

Paul Grosso
VP Research, ArborText, Inc.
Chief Technical Officer, SGML Open

Email: paul@arbortext.com
or pbg@texcel.no