Re: HTML todo list

Dan Connolly <connolly@pixel.convex.com>
Message-id: <9301121901.AA15286@pixel.convex.com>
To: timbl@nxoc01.cern.ch
Cc: www-talk@nxoc01.cern.ch
Subject: Re: HTML todo list 
In-reply-to: Your message of "Tue, 12 Jan 93 18:28:58 +0100."
             <9301121728.AA01295@www3.cern.ch> 
Date: Tue, 12 Jan 93 13:01:52 CST
From: Dan Connolly <connolly@pixel.convex.com>

>>  In http://info.cern.ch/hypertext/WWW/FAQ/List.html
>>  4. HTML should support QUESTION and RESPONSE elements to
>>  support converting USENET FAQ's to HTML
>
>Not sure about that one. There are several ways of mapping FAQs onto
>HTML as it is ... I would prefer to see for example a MENU of
>questions, each linked to the answer in a separate document which had
>its question as for example an H1 heading and TITLE.

Well then maybe the USENET FAQ project needs a separate DTD. I'm
willing to table the issue for now.

>>  Special character entities?
>>  Yeah! It uses numeric character references already!
>No -- it used named entities.  I'll leave it

So lt, gt, and amp are "Deprecated" rather than "Obsolete", that
is, they are not recommended, but they will be supported. In that
case, we should update the DTD to include them.
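
To make the named-versus-numeric distinction concrete, here is a small
sketch. It uses Python's standard html module purely as a stand-in for
an entity-aware reader; it is not the HTML DTD, and the sample strings
are made up for illustration.

```python
import html

# The same text written two ways (sample strings are hypothetical):
named = "&lt;P&gt; &amp;"        # named entities lt, gt, amp
numeric = "&#60;P&#62; &#38;"    # numeric character references

# Both spellings decode to the same character data.
decoded_named = html.unescape(named)
decoded_numeric = html.unescape(numeric)
```

Either form reaches the application as the same characters; the open
question is only which spellings the DTD declares.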


>>  12. Default text: this node seems to confuse lots of issues. I
>>  suggest we get rid of it. The way to look at HTML is as a stream
>>  of data and markup. Newlines are handled differently all over the
>>  place. It might be reasonable to talk about how newlines are
>>  handled by the text formatter, after they have been handed over
>>  from the SGML parser.
>
>
>People writing SGML don't want to know about parser and formatters
>(an arbitrary distinction which is very questionable in the definition
>of a DTD or SGML -- it is only relevant to the definition of the
>software interface to an SGML engine)

The distinction between parsers and formatters (i.e. applications)
is very much defined by SGML: a conforming application is not allowed
to act on anything but the ESIS. For example, it's illegal to
treat attribute values delimited by single quotes differently from
those surrounded by double quotes, because that information is
not reported by the parser. The same is true for newlines: it's
illegal to treat
<foo>
content
</foo>
differently from <foo>content</foo> because the difference is
not reported by the parser (unless we do some shortref magic
to force the parser to report the difference.)
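
A rough sketch of the point, with loud caveats: Python's html.parser is
not a conforming SGML parser, and the newline rule below (drop one
newline right after a start tag and one right before an end tag) is a
hand-written approximation of the SGML record-boundary rules. The names
EventCollector and parse_events are invented for this illustration.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Collect the events an application would see (a toy stand-in for ESIS)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs holds (name, value) pairs; the quote style is not included.
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse_events(markup):
    """Return the event list, with an approximated record-boundary rule applied."""
    collector = EventCollector()
    collector.feed(markup)
    collector.close()
    raw = collector.events
    result = []
    for i, event in enumerate(raw):
        if event[0] != "data":
            result.append(event)
            continue
        text = event[1]
        # Approximation: ignore a newline immediately after a start tag...
        if i > 0 and raw[i - 1][0] == "start" and text.startswith("\n"):
            text = text[1:]
        # ...and a newline immediately before an end tag.
        if i + 1 < len(raw) and raw[i + 1][0] == "end" and text.endswith("\n"):
            text = text[:-1]
        if text:
            result.append(("data", text))
    return result
```

With this, parse_events("<foo>\ncontent\n</foo>") and
parse_events("<foo>content</foo>") give the same event list, as do the
single-quoted and double-quoted forms of an attribute value. An
application that sees only these events has nothing to act on.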

In any case, I think "people writing SGML" is the group for
whom an understanding of these issues is most critical.
They should be referred to the implementors' guide. This
business of "default text" or "Character Data" is thoroughly
discussed in "Text and Markup" under "Parsing content into
data and markup" and in "Recommended Usage" under "Body elements."

I wrote the "Text and Markup" node to replace this "Default Text"
or "Character Data" node, and I still think the node does
more harm than good.


>>  In http://info.cern.ch/hypertext/WWW/MarkUp/Tags.html
>>  13. This text is out of place: 
>
>>  Each tag starts
>>  with a tag opener (a less than sign)
>>  and ends with a tag closer (a greater
>>  than sign).   Many tags have corresponding
>>  closing tags which are identical except
>>  for a slash after the tag opener.
>
>
>Take this as an informal intro not a spec.
>Let's keep the spec in parallel.

I took great pains to make "Text and Markup" an
accessible yet correct intro to SGML syntax. I'd like
to see folks referred to that document for these issues.
If it's not readable, let's fix it.

We must be very careful
of two things: 1. that these redundant informal blurbs
do not in any way conflict with the SGML standard,
and 2. that they are not misleading.

This blurb mostly passes criterion 1: all tags do
indeed start with a less than sign, and I guess
"tag opener" is close enough to "start tag open delimiter."
But "... which is identical except for a slash after
the tag opener" is goofy: </, the end tag open delimiter,
is not viewed as a start tag open delimiter followed by
a slash.

But not all less than signs indicate tags: a less than
sign is only recognized as STAGO when it's followed by
a letter.  And most A end tags are hardly identical
to their start tags, even modulo the slash: the start tag
usually carries attributes such as HREF. The case
of the start tag can also differ from the case of the
end tag. I fear that folks will read this blurb and
write broken sed scripts.
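
For what it's worth, here is a minimal sketch of a scanner that at
least gets those two points right: it treats "<" or "</" as a tag
opener only when a letter follows, and it folds names to one case
before comparing. The names TAG_OPEN and tag_names are made up for this
example, and it is deliberately incomplete: it ignores comments, marked
sections, and a "<" inside an attribute value, so it is no substitute
for a real SGML parser.

```python
import re

# "<" or "</" counts as a tag opener only when a name start character
# (a letter) follows; name characters are letters, digits, "." and "-".
TAG_OPEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)")


def tag_names(text):
    """List (kind, NAME) for each tag opener found, with names upper-cased."""
    return [
        ("end" if slash else "start", name.upper())
        for slash, name in TAG_OPEN.findall(text)
    ]
```

Given 'if a < b then <A HREF="x">link</a>', this reports one start tag
and one end tag, both named A; the bare "<" in "a < b" is left alone as
data.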

Certainly there should be a link from this blurb to
http://info.cern.ch/hypertext/WWW/MarkUp/Connolly/Current/Text.html#Tags

>>  In http://info.cern.ch/hypertext/WWW/MarkUp/Elements/HEAD.html
>>  14. These blurbs should probably quote their element declarations
>>  from the DTD, in order to help folks learn to read the DTD.
>
>Yes. And the DTD should be in PRE with links back to the blurbs.
>I have started a //info.cern.ch/hypertext/WWW/MarkUp/HTML.dtd.html

Excellent idea. But again, there are maintenance issues we must
sort out.

>That's where I got to

I certainly appreciate the speedy response, and the notes
of encouragement I got from a few others.

I am much more confident that this will all be resolved soon.

Dan