Re: HTML 2.0 editing status

Terry Allen (terry@ora.com)
Mon, 5 Sep 94 14:27:09 EDT

| In message <199409031449.HAA06296@rock>, Terry Allen writes:
| >
| >The DTDs don't parse together without generating errors
| >about duplication. This may be unavoidable given the structure
| >involved, and they're really warnings rather than errors,
| >but it will be unsettling to many. It would be much nicer
| >if no errors or warnings were generated; I'd almost prefer
| >3 DTDs, at least for the final cut.

I take back that last remark.

| Could you give some details? I agree that the usage of the
| various DTD fragments is underdocumented, but when used as intended,
| they produce no warnings nor errors for me. Try the html validation
| service, for example.

I'm using sgmls -degruv and for html-0.dtd I get:

sgmls version 1.1
sgmls: In file included at litl, line 1:
Warning at ./html-0.dtd, line 114 in declaration parameter 4:
Duplicate specification occurred for "%block"; duplicate ignored
sgmls: In file included at litl, line 1:
Warning at ./html-0.dtd, line 245 in declaration parameter 4:
Duplicate specification occurred for "%html.content"; duplicate ignored

and if you look at the DTD you see that

<!ENTITY % block "P | %list | DL
| PRE | BLOCKQUOTE %block-2">

is defined at line 114, but above it we find

<![ %HTML.Obsolete [
<!ENTITY % block "P | %list | DL
| PRE | XMP | LISTING
| BLOCKQUOTE %block-2">
]]>

however,

<![ %HTML.Prescriptive [
<!ENTITY % HTML.Obsolete "IGNORE">
]]>

<!ENTITY % HTML.Obsolete "INCLUDE"
-- marks things that may disappear in future revisions -->

so I'm rather confused, because

<!ENTITY % HTML.Prescriptive "IGNORE"
-- marks things that may become standard in future revisions -->

So we end up with the Obsolete and the normal entity definitions,
thus the warning. Something's amiss here, or I don't have the
right versions of the DTDs.
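
To make the ordering concrete, here is a cut-down sketch of how I
read the declarations. It is not the DTD itself: the values given
for %list and %block-2 are placeholders of mine. In SGML the first
declaration of an entity is the one that binds, and I believe the
d in -degruv is what asks sgmls to report the later ones.

```sgml
<!-- Placeholders so the sketch stands alone (assumed values) -->
<!ENTITY % list "UL | OL">
<!ENTITY % block-2 "">

<!ENTITY % HTML.Prescriptive "IGNORE">

<![ %HTML.Prescriptive [
<!-- Skipped, because HTML.Prescriptive is IGNORE -->
<!ENTITY % HTML.Obsolete "IGNORE">
]]>

<!-- First declaration actually seen, so this one binds -->
<!ENTITY % HTML.Obsolete "INCLUDE">

<![ %HTML.Obsolete [
<!-- Section is INCLUDEd: %block is bound here, with XMP and LISTING -->
<!ENTITY % block "P | %list | DL
| PRE | XMP | LISTING
| BLOCKQUOTE %block-2">
]]>

<!-- Second declaration of %block: ignored, and reported as a duplicate -->
<!ENTITY % block "P | %list | DL
| PRE | BLOCKQUOTE %block-2">
```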

I am also concerned that the target audience, developers without
much experience with SGML, will find these double negatives
troublesome.

| >The DTDs are named html-0, html-1, and plain html. The last
| >ought to be html-2, shouldn't it?
|
| The names of the files containing the DTD are arbitrary.

They appear to be meaningful here, hence should be consistently
meaningful. I have another solution, but for completeness I'll
follow out this point.

| The "full"
| DTD file is called html.dtd to make it convenient to parse documents
| that start with:
| <!DOCTYPE HTML ...>
| Where ... might be any number of idioms, including nothing at all, i.e.
| <!DOCTYPE HTML>

See below. You are using an sgmls hack here.

| I would take this declaration to mean "gimme the current version of the
| HTML DTD."

But here we're defining 3 current versions.

| Anyway... it's just more convenient in practice to have something
| called html.dtd. Perhaps html.dtd should be a synonym (implemented
| as a symlink?) for a file called html-2.dtd.

It may seem silly to continue flogging this horse, but then naming is
always contentious. There are lots of "HTML" DTDs floating around;
here we are setting everyone up for confusion if what we call the
HTML 2.0 DTD is called html.dtd *in a set that includes html-0
and html-1*. I should think HTML2.0.DTD would be about right.

My solution to the three-DTD problem is to fold html-1 and plain
html into html-0, as marked sections, with the basic content models
supplied with empty parameter entities that expand within
those marked sections. Then there's only one file named
.dtd, and the IGNORE/INCLUDE operations are simple. Example
on request.
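
In the meantime, a rough sketch of the shape I have in mind. The
switch names HTML.Level1 and HTML.Level2, the helper entity text-1,
and the content fragments are made up for illustration; block-2 is
borrowed from the existing %block definition. None of this is taken
from the draft DTDs.

```sgml
<!-- Hypothetical switches: a document overrides these in its
     DOCTYPE declaration subset to select a lower level -->
<!ENTITY % HTML.Level1 "INCLUDE">
<!ENTITY % HTML.Level2 "INCLUDE">

<![ %HTML.Level1 [
<!-- Level 1 additions (illustrative content only) -->
<!ENTITY % text-1 "| EM | STRONG">

<![ %HTML.Level2 [
<!-- Level 2 additions, nested so they need Level 1 as well -->
<!ENTITY % block-2 "| FORM">
]]>
]]>

<!-- Empty fallbacks, bound only when the sections above are skipped -->
<!ENTITY % text-1 "">
<!ENTITY % block-2 "">

<!-- Basic content models, written once for all levels -->
<!ENTITY % text "#PCDATA | IMG | BR %text-1">
<!ENTITY % block "P | DL | PRE | BLOCKQUOTE %block-2">
```

A document wanting Level 0 would declare HTML.Level1 as IGNORE in
its declaration subset; one wanting Level 1 would do the same for
HTML.Level2. The empty fallbacks are still second declarations when
a level is switched on, so I would expect sgmls -d to go on
reporting those as duplicates.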

| > Seems to me that if we're documenting current
| >practice (approximately) we can't very well mark anything
| >Obsolete,
| My view is that in fact, XMP and LISTING are obsolete in current
| practice. Their actual definition is not expressible in SGML, and
| they are only supported through backwards-compatibility hacks.

I see that from the doc, but then to be strict about it they
can't appear in any DTD. Would it not do just as well to leave
them in and deprecate their use? Are we willing to say that
for Level 2 these elements may not be used at all? If not,
let's eliminate these marked sections. Proposed stuff should
fall into Level 1 or 2 or be eliminated.

The use of Prescriptive might be avoided by making these changes
between versions of the DTD. As it stands now, these categories
(Proposed, Prescriptive) crosscut the 0, 1, 2 DTD structure,
making it possible to have Level 1 with or without Prescriptive,
etc. Let's collapse those categories into the 0, 1, 2 sequence
of changes, if possible. Then the only marked sections would
be the ones including Level 1 stuff and (within that) including
Level 2 stuff.

Maybe the structure of Proposed content models is too complex
for that to work; Dan, you would know best.

I also think that without the wider discussion that will follow
publication, it is unwise to get prescriptive. We are *describing*
here, in order to get to a more advanced stage than the present
mess. Let's get there first, then decide where to go. For
example html-0 now says:

<!ENTITY % htext "A | %text" -- Plus links, no structure -->
<![ %HTML.Prescriptive [
<!ENTITY % A.content "(%text)+"
-- The standard content model for A allows nonsense like:
<H2><A>xyz<H1>h1</H1></A></H2>
-->
]]>

I stoutly deny that is nonsense: it makes a hot spot with
certain formatted text within it. That may make sense for
some display advertising; I can see no other way to get hot
spot text in large type. We'll never hear from the people who
are twisting around HTML to get special effects such as that
until after we publish a new DTD and they rise up in wrath,
because we've taken away something they need.

| Yet there are lots of instances where the _usage_ of XMP and LISTING
| is compatible with an SGML definition. So I left it in the DTD.
| It's a coin toss.
| I have a test suite that guides me through these issues. I have test
| cases with <XMP> tags. If we take <XMP> out of the DTD, I will have
| to move those cases to the "errors" section of the test suite. Perhaps
| I should have an "obsolete" section of the test suite.
| Whether HTML.Obsolete is INCLUDE or IGNORE is a coin-toss, if you ask
| me. But I like to have those sections in there for those test cases
| involving XMP, LISTING, etc.
| Perhaps I could maintain a separate DTD for testing purposes, but
| I don't think that's a good idea, and I hope you don't either.

I quite see your point, but the published DTD(s) don't need these
testing constructs if the point of the testing is to figure out
how to arrive at the published DTD(s). Or do you see this as a
useful feature, to be included in the published DTD?

| > Eliminating these entities would also reduce the
| >number of errors reported due to redefinitions and nesting
| >of marked-section entity definitions.
| I still think you're just not using the DTD files as intended.
| If you're using sgmls, set your SGML_PATH to
| ./%N.dtd:%N.sgml
| and put all 5 files (html*.dtd, html.decl, ISOLat1.sgml) in the
| current directory, and you should be all set. Validate with:
| sgmls -s html.decl foo.html
| if foo begins with
| <!DOCTYPE HTML>
| or if it doesn't, create a file, html-prologue.sgml:
| <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//2.0">
| and validate with:
| sgmls -s html.decl html-prologue.sgml foo.html

This is a hack, relying on a certain behavior of sgmls that I
don't use. If you like to use it, fine, but in my usage the name of
the DTD doesn't have to match the DOCTYPE, and I don't want to
have to redefine my SGML_PATH to get such behavior; I strongly
suspect that that would break other setups I run.

The distribution has 3 files called DTDs. They all have
FPIs:

html-0.dtd: "+//ISBN 82-7640-037::WWW//DTD HTML Level 0//EN//2.0"
html-1.dtd: "+//ISBN 82-7640-037::WWW//DTD HTML Level 1//EN//2.0"
html.dtd: "+//ISBN 82-7640-037::WWW//DTD HTML//EN//2.0"

(and surely that 037 is not part of the publisher prefix?)

I must be able to parse against html-1 if, for example, I want
to exclude Forms (if I start with html.dtd they're already
defined). And as the DTDs have FPIs, I ought to be able to parse
them as PUBLIC, not SYSTEM (as you are doing with sgmls), by
their public names, not their filenames. (I don't do that
in practice, but if DTDs are to become distributed someone
will have to start doing it as a matter of practice.)
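
By public name I mean a prolog along these lines, using the
identifiers quoted above exactly as they stand. How the parser maps
a public identifier to a file is up to its entity manager and
differs between systems, so I leave that part out.

```sgml
<!-- Level 1 only, so that forms are excluded. Swap in the Level 0
     or full identifier to validate against those instead. -->
<!DOCTYPE HTML PUBLIC "+//ISBN 82-7640-037::WWW//DTD HTML Level 1//EN//2.0">
```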

| >Finally, the doc is so chunked that it is needlessly difficult
| >to navigate. This may be a religious issue, so I don't expect a
| >change,
| It's a good suggestion, but (if I understand your suggestion correctly)
| it's a sweeping editorial change, the kind of thing that takes a lot
| of time to implement, and greatly destabilizes the document, creating
| a need to completely re-review the document. Think carefully before
| you advocate this.

I'm not advocating a change, only making an observation. Chunked
hypertext is opaque because it can't be scanned quickly.

| On the other hand, if you're just talking about cutting the document
| into fewer, larger, HTML nodes, I suppose this can be easily accomplished
| through a WebMaker configuration option.

That wouldn't hurt. In particular, flattening the directory tree
would be a help. However, what I really want is to know the
linear sequence the doc will take as an RFC, so I can read it
in that order and look for info (such as how to use the marked
sections in the DTD) in sequence, while knowing that I'm traversing
the entire set of nodes. If HTML_TOC describes that, I'm happy.

-- 
Terry Allen  (terry@ora.com)   Editor, Digital Media Group
O'Reilly & Associates, Inc.    Sebastopol, Calif., 95472