(no subject)

Daniel W. Connolly (connolly@hal.com)
Thu, 1 Dec 94 19:57:54 EST

In message <G00002C4F01DEC199409104113*@MHS>, A.RUITER@ELSEVIER.nl writes:
>1. General criticism on the draft
>
>In general we have observed, in our meeting of 11 November, that the
>draft document is full of typing errors.

I suggest you be more specific. Everyone working on the draft is doing
their best. If you see typing errors, they are (still) there because

- we haven't had time to fix them yet, or
- we weren't aware of them

You may want to browse the archives to see if a particular mistake has
already been reported -- if not, by all means, report it here.

I made some suggestions a while ago about how I'd like to see
comments come in. Perhaps it's time to repeat them:

http://www.acl.lanl.gov/HTML/html-archive.messages/1.html
=====================================
Date: Thu, 09 Jun 1994 08:03:59 -0500
From: "Daniel W. Connolly" <connolly@austin2.hal.com>

Here are some more and less effective ways to comment on
the 2.0 spec:

* "I'd like to see HTML extended to include ..."
This is out of the scope of the 2.0 effort. We're trying
to be largely descriptive of current practice. www-talk
is still an OK forum for these comments... Dave Raggett
is always listening. But I have blinders on.

* "Here's a proposal for an HTML extension ... [proposed
spec extension included]"
I might have time to stick this in as a proposed feature.
No guarantees.

* "In http:...., you wrote XXX, which is wrong. What
actually happens is YYY"
I'll try to address these, but this format doesn't
save me any time -- it only creates work for me to do.

* "The http:.... node isn't quite right. Here's a replacement
[or diffs] and a few test cases to demonstrate the
subtleties"
Bingo. You're nearly guaranteed to get your comments in
if you submit them this way.

Thank you for your support.
=====================================

> Furthermore we noticed that
>HTML 1 was designed by someone with little or no knowledge of SGML,
>and that traces of this are still visible in HTML 2 (capacities,
>literal and name lengths, ...)

I'm not sure what sort of response this comment calls for.

If, by HTML 1, you refer to the original HTML language: that was
designed by Tim Berners-Lee. When I was first exposed to the WWW
project, I compared the source code and documentation that TimBL
developed with the SGML specification, ISO-8879:1986, and surmised
that TimBL was not intimately familiar with that document -- that he
had designed HTML in terms of his intuitions about SGML that he had
developed while working with some SGML-based tools at CERN.

It turns out that this is true. So you are right in pointing out that
HTML was designed by someone who was not well-versed in the subtleties
of SGML.

However, the specification of capacities and name lengths was not a
critical issue in the design of WWW. The critical design issues have
been a matter of public record since very early in the project[1].

Among the critical features of HTML is platform independence[2]. This,
and other requirements[3], motivated the choice of SGML as a document
representation. I believe TimBL is to be praised for his foresight in
this decision.

[1] "HyperText Design Issues"
http://info.cern.ch/hypertext/WWW/DesignIssues/Overview.html

[2] "HyperText Design Issues: Availability"
http://info.cern.ch/hypertext/WWW/DesignIssues/Availability.html

[3] "About document formats (Design Issues)"
http://info.cern.ch/hypertext/WWW/DesignIssues/Formats.html

My area of expertise is in formal systems -- grammars, logic,
specifications, proof systems, etc. I took it upon myself to refine
the specification of HTML in the WWW materials to bring it into
conformance with the SGML specification:

From: "Toward a Formalism for Communication On the Web"
$Id: html-essay.html,v 1.2 1994/02/15 20:07:12 connolly Exp $
http://www.hal.com/%7Econnolly/drafts/html-essay.html
>
>Thus I chose for my battle to find some formal relationship between
>the SGML standard and the HTML that was "out there." The quest was:
>
> Find some DTD such that the vast majority of HTML documents
> are instances of that DTD, conversely, such that all its
> instances make sense to the existing WWW clients.
>

I believe the current SGML declaration and DTD are a reasonable
solution to the above problem. Evidence to the contrary is welcome.

>The dtd for HTML 2 will probably parse -- at least we assume the
>editor has done this -- but it is definitely a badly designed dtd:
>
>- the dtd displays an odd mixture of form and structure;

Odd in what way? Could you be more specific? I believe the structure
of HTML was originally influenced by the RichText object in the
NeXTStep programming environment which TimBL used to develop the
original WWW implementation.

The same RichText sequence-of-paragraphs structure is the basis of
almost all WYSIWYG word-processing systems. I conjecture that this
structure is used to represent the vast majority of casual
computer-mediated communication these days.

The sequence-of-paragraphs structure has been enhanced somewhat since
then to accommodate idioms like nested lists (probably as a result of
support for this idiom by Mosaic and the LaTeX2HTML tools).

In any case, the DTD is meant to _describe_ the way HTML is used
today, not to _prescribe_ some structure that should be used.

>- there are three elements, |<head>|, |<body>| and |<html>|, that allow
>tag omission for both the start and the end tag;

This is consistent with both conventional usage of HTML and with the
SGML specification. Could you explain why this is representative of a
"badly designed dtd"?

>- many deprecated HTML features are included in the dtd, using the
>mechanism of marked sections in a dtd.
>
>We do not like the many uses of entities in the dtd; we believe that
>removing these will improve readability.

It might, but it would do so at the cost of functionality. I believe
it is valuable to be able to validate documents according to the
various levels, and according to the recommended "strict" mode in
addition to the standard mode. This belief is corroborated by several
messages to this mailing list from authors and designers of authoring
systems. Response from the HTML Validation Service[4] also supports
this belief.

[4] "HaLsoft HTML Validation Service"
http://www.hal.com/%7Econnolly/html-test/service/validation-form.html

>A great deal of confusion is caused by the various interpretations of
>the paragraph tag, |<p>|, within this one draft document; see
>page 13 of the draft for a good example. The question is: ``Is
>|<p>| the start or the end of a paragraph?''

<p> is the start of a paragraph. If the draft doesn't make that clear,
we are open to suggestions as to how it could be made more clear.

>2. Page-by-page criticism on the draft
>
>11: Descriptions of browsers (applications of the standard) and
>exceptions do not belong in an RFC.

I agree. Unfortunately, the consumers of the HTML spec need to know
this information, and right now, there is no place else to get it.

So while I don't believe we are in a position to address this comment
in the 2.0 draft, I believe the descriptions of browsers, servers,
etc. will migrate to other documents in subsequent revisions.

>17: Why is namelen 72? In standard SGML, i.e. the Reference Concrete
>Syntax (RCS), 8 suffices. It will probably suffice for HTML 2 as well.

BLOCKQUOTE is more than 8 characters. Hence we cannot use the RCS.

HTML follows the precedent of CALS, IBM InfoMaster, DocBook, and
numerous other SGML applications in using a concrete syntax with
NAMELEN increased.

The number 72 is admittedly arbitrary, as noted in html.decl:

NAMELEN 72 -- somewhat arbitrary; taken from
internet line length conventions --

Could you be specific as to why this presents a problem?
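
As a rough illustration of the NAMELEN point (a sketch only, not part
of the draft or of html.decl): the Python snippet below checks a
handful of element names against both limits. The sample list and the
helper name names_exceeding are assumptions made for this example; the
list is not the full set of names in the DTD.

```python
# Illustrative check: which element names are too long for a given NAMELEN?
# SAMPLE_NAMES is a small hand-picked sample, not the complete HTML DTD.
RCS_NAMELEN = 8     # NAMELEN in the Reference Concrete Syntax
HTML_NAMELEN = 72   # NAMELEN as set in html.decl

SAMPLE_NAMES = ["HTML", "HEAD", "BODY", "TITLE", "ADDRESS", "BLOCKQUOTE"]


def names_exceeding(names, namelen):
    """Return the names that are longer than namelen characters."""
    return [name for name in names if len(name) > namelen]


print(names_exceeding(SAMPLE_NAMES, RCS_NAMELEN))   # ['BLOCKQUOTE']
print(names_exceeding(SAMPLE_NAMES, HTML_NAMELEN))  # []
```

Any limit of 10 or more would admit BLOCKQUOTE; the sketch only shows
that 8 does not.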

>18: If a feature is deprecated, just don't allow it in the dtd!
>27: There is no point in stating ``this is discouraged''. Either allow
>something in the dtd or don't!

This would cause too much discrepancy between current practice in HTML
and the specification. "Too much" is a relative term. I don't have any
hard-and-fast evidence to cite here. I can only say that I believe the
current DTD reflects the rough consensus of this working group.

If you have a specific alternative proposal, please submit it.

>19: If browsers do not support short tags etc., why not put ``shorttag
>no'' in the SGML declaration?

Browsers support idioms such as <IMG ISMAP src="xxx">, which would be
illegal if SHORTTAG NO were in the SGML declaration.
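
To see how common this idiom is in a given document, something like
the following Python sketch could be used. It relies on the standard
html.parser module, which is a lenient HTML tokenizer and not an SGML
parser, so it only detects attributes written without an explicit
value; it does not validate anything against the DTD. The class name
MinimizedAttrFinder is made up for this example.

```python
from html.parser import HTMLParser


class MinimizedAttrFinder(HTMLParser):
    """Collect (tag, attribute) pairs written without an explicit value."""

    def __init__(self):
        super().__init__()
        self.minimized = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports a valueless attribute with value None
        # and lowercases tag and attribute names.
        for name, value in attrs:
            if value is None:
                self.minimized.append((tag, name))


finder = MinimizedAttrFinder()
finder.feed('<IMG ISMAP src="xxx">')
print(finder.minimized)  # [('img', 'ismap')]
```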

>19: The length of an attribute value is limited to 1024 characters,
>whereas in the RCS 12 is used. This will limit the length of the
>|alt| attribute, but we believe this should be an element, since
>it is really the caption of a figure.

In my testing, I found several documents with attribute values longer
than the RCS value of 128. So I concluded that in current practice
LITLEN should be higher than 128.

The value 1024 is something of a compromise. I believe that current
implementations actually don't have a limit on the length of an
attribute. But SGML requires that we specify some maximum. I chose
1024 probably because that was the maximum supported by some version
of the sgmls parser.
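
A survey of this kind can be approximated with a short Python sketch
such as the one below. This is not the tool used for the testing
described above; it uses the standard html.parser module (not an SGML
parser), and the names LongAttrFinder and find_long_attrs, the sample
document, and the example.org URL are all assumptions made for
illustration. The two limits are the figures quoted in this message.

```python
from html.parser import HTMLParser

CITED_RCS_LIMIT = 128  # the RCS figure quoted in the reply above
DRAFT_LITLEN = 1024    # the limit chosen for the draft


class LongAttrFinder(HTMLParser):
    """Record attribute values longer than a given limit."""

    def __init__(self, limit):
        super().__init__()
        self.limit = limit
        self.long_values = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Valueless attributes are reported as None; skip them.
            if value is not None and len(value) > self.limit:
                self.long_values.append((tag, name, len(value)))


def find_long_attrs(document, limit):
    """Return (tag, attribute, length) for each over-long attribute value."""
    finder = LongAttrFinder(limit)
    finder.feed(document)
    finder.close()
    return finder.long_values


# Hypothetical sample: an anchor whose HREF is 219 characters long.
sample = '<A HREF="http://example.org/' + "x" * 200 + '">link</A>'
print(find_long_attrs(sample, CITED_RCS_LIMIT))  # [('a', 'href', 219)]
print(find_long_attrs(sample, DRAFT_LITLEN))     # []
```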

>22: What is the relation between links, head, isindex and anchors? How
>should links be represented?
>
>24: An anchor must have either an href or a name attribute, but we are
>aware that this cannot be expressed in SGML. A remark of this kind
>should be inserted in the documentation. On the whole, the explanation
>of links and anchors should be made more clearly.
>25: We do not understand the semantics of the methods attribute to
>|<a>|.

The order and occurrence of the links, head, isindex, and A elements
are given in the DTD. The semantic relationship between them is not
clearly specified in the draft; this is part of the specification of a
WWW user agent which is given informally and incompletely in the HTML
spec.

On the other hand, some linking model could be specified in a manner
that's independent of a description of browser functionality. I have
never taken the time to write such a model up, though I agree that
this would be a useful addition to the spec.

I don't expect this situation to be completely resolved in the HTML
2.0 draft for reasons of expediency. If you see something in the draft
which is clearly inaccurate or misleading, rather than just
incomplete, please call it out.

>26: Shouldn't there be a |<p>| immediately after the |<h1>|? This will
>not parse!

In "strict" mode, it won't parse. In standard mode, you should note
that #PCDATA is in the content model of BODY, and the example does
parse.

I agree that it is a bad example, given that we are suggesting that
folks start paragraphs with <p> tags.

>26: Headings h1--h6 do not indicate heading levels, but merely
>heading styles. If levels are what you want, the dtd needs to be
>rewritten.

The DTD can only specify order and occurrence. The "application
conventions" accompanying the DTD specify any additional semantics of
an SGML element type. The draft specifies that h1-h6 indicate levels.

If you have a proposal as to how the DTD could be rewritten to capture
these intentions -- without conflicting too badly with current
practice -- we'd be glad to review it.

>28: The explanation of nested fonts should be rewritten. The present
>explanation is unclear, since it looks as if two different examples
>give the same output. It should be made absolutely clear that this
>can only happen when a client does not have certain fonts. We suggest
>that first an example is given of correct output, and then an example
>of what a client can do when certain fonts are missing.

That's not the intent of the spec. The intent is that authors
shouldn't count on <b> nested in <i> to produce bold-italics, because
in practice, testing shows that this is not the case on all
implementations. (See Corprew Reed's test results on nested
highlighting in the html-wg archives.)

>29: Sections 3.8 and 3.9 show considerable overlap. What is the
>difference between |var|, |kbd| and |tt|?

These elements derive from the TeXinfo system. I sent out a complete
description to the list recently. You're not the first to comment on
this. Evidently the prose needs work. Suggestions are welcome (I
think!).

>30: |alt| should become an element (a caption for a figure).

Completely out of scope for 2.0. But see the FIG element of html 3.0...

>31: We suggest that HTML 2 has only one generic list element to cover
>|dit|, |menu| and |ul|, with an attribute to indicate the presentation
>type. This makes maintenance of documents much easier.

Again, out of scope for 2.0.

>36: What is |</expires>|? December 4, 1993 was not a Tuesday.

This is a known problem. See Mr. Buchard's recent (and not so recent
:-) comments on the FORMS sections.

>53: Hyphenation is not explained satisfactorily.

How should hyphenation be explained? I don't see how hyphenation
is relevant to the HTML spec at all.

>66: Is it really necessary to use |%version_attr| in this obscure way?

It's not absolutely necessary, but I find it useful to have the DTD
version available in the ESIS. I got the idea from the folks who
developed the Rainbow DTD, who probably got it from some other SGML
application.

Does it cause problems?

>Why is the HTML version a public identifier?

No particular reason. Should it be something else?

>89: The list of terms is incomplete.

I agree. But could you be more specific?

Thank you for your comments.

Dan