HTML todo list

Dan Connolly <connolly@pixel.convex.com>
Message-id: <9301120436.AA00126@pixel.convex.com>
To: www-talk@nxoc01.cern.ch
Subject: HTML todo list
Date: Mon, 11 Jan 93 22:36:43 CST
From: Dan Connolly <connolly@pixel.convex.com>
Please excuse the bitching and moaning from the last message.
My motto is "Accept the situation or do something about it."
Kind of a condensed version of the serenity prayer.

Since I don't have resources to fix the whole HTML situation,
here's a laundry list of everything I can think of about
HTML. Perhaps discussions and code fixes can refer
to "#15 in the HTML todo list" for example.


In http://info.cern.ch/hypertext/WWW/Technical.html

1. My dictionary lists "markup" as one word, not "mark-up".
  HTML format[4]          A description of the mark-up language used for some
                         documents and for search hit-lists.

In http://info.cern.ch/hypertext/WWW/Daemon/Bugs.html
2. The PLAINTEXT situation should be logged as a bug against
the server. PLAINTEXT is deprecated.

In http://info.cern.ch/hypertext/WWW/FAQ/List.html
4. HTML should support QUESTION and RESPONSE elements, to make it
possible to convert USENET FAQs to HTML.

In http://info.cern.ch/hypertext/WWW/Provider/ShellScript.html
5. PLAINTEXT is deprecated. Use PRE, and use a sed script
to change & to &#38;, < to &#60;, and > to &#62; (& first).
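Here's a rough Python equivalent of that sed script. The names escape_for_pre() and to_pre() are made up for illustration. Order matters: if & is converted last, the & in the freshly inserted &#60; and &#62; gets escaped a second time.

```python
def escape_for_pre(text: str) -> str:
    """Replace the characters that are markup-significant in PRE content."""
    # Convert & first; converting it last would mangle the &#60; and &#62; just inserted.
    return text.replace("&", "&#38;").replace("<", "&#60;").replace(">", "&#62;")


def to_pre(text: str) -> str:
    """Wrap plain text in a PRE element instead of using PLAINTEXT."""
    # An SGML parser does not report the newline right after <PRE>, so it is harmless here.
    return "<PRE>\n" + escape_for_pre(text) + "</PRE>\n"
```

For example, to_pre("x < y\n") gives a PRE element whose content is "x &#60; y".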

6. http://info.cern.ch/hypertext/WWW/Tools/HTMLGeneration/dir2html.txt
This thing doesn't quote attributes, and it uses XMP instead of
PRE with numeric character references.

7. http://info.cern.ch/hypertext/WWW/Tools/HTMLGeneration/ls2html.awk.txt
Needs quotes around HREFs; uses PLAINTEXT.

8. http://info.cern.ch/hypertext/WWW/Daemon/Implementation/asis.txt
Quote HREFs; use numeric character references where necessary.

9. http://info.cern.ch/hypertext/WWW/HytelnetGate/src/htn2html.c
Uses HEADER instead of HEAD.
Quote HREFs.
Special character entities?
Yeah! It uses numeric character references already!
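For the generator scripts in 6 through 9, the fix comes down to always emitting a quoted attribute value literal and escaping the link text. Below is a sketch in Python. attr_literal(), anchor() and dir_listing() are hypothetical names and are not taken from any of those scripts.

```python
from typing import Iterable


def attr_literal(value: str) -> str:
    """Return value as a double-quoted attribute value literal."""
    # A bare " would end the literal early and & would start a reference,
    # so both are written as numeric character references.
    return '"' + value.replace("&", "&#38;").replace('"', "&#34;") + '"'


def text_data(text: str) -> str:
    """Escape text so that it can appear as ordinary character data."""
    return text.replace("&", "&#38;").replace("<", "&#60;").replace(">", "&#62;")


def anchor(href: str, text: str) -> str:
    return "<A HREF=" + attr_literal(href) + ">" + text_data(text) + "</A>"


def dir_listing(names: Iterable[str]) -> str:
    """One list item per directory entry, each one linking to the entry by name."""
    items = "".join("<LI>" + anchor(name, name) + "\n" for name in names)
    return "<UL>\n" + items + "</UL>\n"
```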

In http://info.cern.ch/hypertext/WWW/MarkUp/MarkUp.html
10. "Mark-up" again.

11. This text seems out of place:
WWW parsers should ignore tags which they do not understand, and
ignore attributes which they do not understand of tags which they
do understand.
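Whether or not the rule belongs on that page, it is easy to state in code. Here is a rough sketch of a tokenizer that follows it. The KNOWN table and the regular expressions are illustrative assumptions. This is nowhere near a real SGML parser: it has no comments, no unquoted attribute values, and no > inside literals.

```python
import re
from typing import Dict, Iterator, Set, Tuple

# Illustrative subset: element name -> attributes the browser understands.
KNOWN: Dict[str, Set[str]] = {"A": {"HREF", "NAME"}, "P": set(), "TITLE": set()}

TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)([^>]*)>")
ATTR = re.compile(r'([A-Za-z][A-Za-z0-9.-]*)\s*=\s*"([^"]*)"')


def events(source: str) -> Iterator[Tuple]:
    """Yield ("data", text), ("start", GI, attrs) and ("end", GI) events."""
    pos = 0
    for m in TAG.finditer(source):
        if m.start() > pos:
            yield ("data", source[pos:m.start()])
        pos = m.end()
        is_end, gi = m.group(1) == "/", m.group(2).upper()
        if gi not in KNOWN:
            continue  # unknown tag: drop the tag itself, keep the text around it
        if is_end:
            yield ("end", gi)
        else:
            # Known tag: keep only the attributes this element understands.
            attrs = {name.upper(): value
                     for name, value in ATTR.findall(m.group(3))
                     if name.upper() in KNOWN[gi]}
            yield ("start", gi, attrs)
    if pos < len(source):
        yield ("data", source[pos:])
```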

12. Default text: this node seems to confuse lots of issues. I suggest
we get rid of it. The way to look at HTML is as a stream of data and
markup. Newlines are handled differently all over the place. It might
be reasonable to talk about how newlines are handled by the text
formatter, after they have been handed over from the SGML parser.

In http://info.cern.ch/hypertext/WWW/MarkUp/Tags.html
13. This text is out of place:
Each tag starts with a tag opener (a less than sign) and ends with
a tag closer (a greater than sign). Many tags have corresponding
closing tags which identical except for a slash after the tag opener.

It's more thoroughly discussed in
http://info.cern.ch/hypertext/WWW/MarkUp/Connolly/Current/Text.html
[which still needs revision: it's correct, but could use better
organization.]

In http://info.cern.ch/hypertext/WWW/MarkUp/Elements/HEAD.html
14. These blurbs should probably quote their element declarations
from the DTD, to help folks learn to read the DTD.
"Only certain elements are allowed" is vague: sometimes there are
restrictions on the order and occurrence of elements, too.

In http://info.cern.ch/hypertext/WWW/MarkUp/Elements/TITLE.html
15. This seems redundant:
The title of a document is given
between title tags:
	<TITLE> ... </TITLE>
Lexical details should be discussed elsewhere. The example is
good, but the mention of tags is out of place.

16. What does this mean?
It should
[...]
ideally fit on one line.

17. Should the TITLE element be CDATA, RCDATA, or PCDATA?
If we want to be able to use Latin chars in the title,
it can't be CDATA. The only difference between RCDATA
and PCDATA (with no subelements allowed) is that comments
are recognized in PCDATA, whereas they are just regular
data in RCDATA.
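To make the difference concrete, here's a toy model of what a parser would report as data for the same TITLE content under each declaration. content_data() is made up. The comment handling only knows the simple <!-- ... --> form, and only numeric character references are resolved. "#PCDATA" stands for mixed content with no subelements.

```python
import re

CHAR_REF = re.compile(r"&#([0-9]+);?")
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def resolve_char_refs(text: str) -> str:
    return CHAR_REF.sub(lambda m: chr(int(m.group(1))), text)


def content_data(text: str, declared: str) -> str:
    """Data reported for `text` as the content of an element declared `declared`."""
    if declared == "CDATA":
        # No references recognized: &#233; is reported exactly as written.
        return text
    if declared == "RCDATA":
        # Character references recognized; a comment is just more data.
        return resolve_char_refs(text)
    if declared == "#PCDATA":
        # Mixed content: comments are recognized (and dropped) before references resolve.
        return resolve_char_refs(COMMENT.sub("", text))
    raise ValueError("unknown declared content: " + declared)
```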

In http://info.cern.ch/hypertext/WWW/MarkUp/Elements/ISINDEX.html
18. The word "Format" connotes lexical details, which are discussed
elsewhere. I endorse the use of examples, but I'd like to keep
the model of
	SGML source ==parser==> ESIS ==WWW semantics==>formatted output
consistent. The WWW semantics processor doesn't deal with <>'s etc.
It just sees the presence of the ISINDEX element and acts accordingly.
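Here is a sketch of that model in Python. The input lines are loosely modeled on the ESIS output of the sgmls parser: a line beginning with "(" is a start tag, ")" is an end tag, and "-" is data. Escapes in data lines are not decoded. The semantics() function is hypothetical. It only ever sees element events, never < or >.

```python
from typing import Iterable, Iterator, Tuple


def esis_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Turn sgmls-style ESIS lines into (kind, value) events."""
    kinds = {"(": "start", ")": "end", "-": "data"}
    for line in lines:
        if line and line[0] in kinds:
            yield kinds[line[0]], line[1:]
        # Other record types (attributes etc.) are ignored in this sketch.


def semantics(events: Iterable[Tuple[str, str]]) -> dict:
    """Act on element structure only: collect the title, note ISINDEX."""
    title, in_title, searchable = "", False, False
    for kind, value in events:
        if kind == "start" and value == "TITLE":
            in_title = True
        elif kind == "end" and value == "TITLE":
            in_title = False
        elif kind == "data" and in_title:
            title += value
        elif kind == "start" and value == "ISINDEX":
            searchable = True  # e.g. the browser would offer a search prompt
    return {"title": title, "searchable": searchable}
```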

In http://info.cern.ch/hypertext/WWW/MarkUp/Elements/NEXTID.html
19. The status of each element should be noted consistently. e.g.
Mainstream	Consistently used by past, present, and future implementations.
Deprecated	In use and will be supported, but should be avoided. (XMP)
Obsolete	In use in some documents, but will not be supported. (NEXTID)
Proposed	Not yet in the DTD or widely supported (e.g. LINK)
Standard	Not yet widely supported, but will be (e.g. PRE)
Extra		It's legal to ignore these. (e.g. EM)

In http://info.cern.ch/hypertext/WWW/MarkUp/Elements/LINK.html
20. How many of these are allowed? I could change
<!ELEMENT HEAD - -  (TITLE? & ISINDEX? & NEXTID?)>
to
<!ELEMENT HEAD - -  (TITLE? & ISINDEX? & NEXTID? & LINK?)>
or
<!ELEMENT HEAD - -  (TITLE? & ISINDEX? & NEXTID? & LINK*)>
I don't know if the latter is legal SGML. I'd have to try
it out.
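One reading of the & connector: each token in the group is matched once, in any order. Under that reading, LINK* would match one contiguous run of LINK elements, not LINKs scattered among the other head elements. Below is a Python sketch of a checker built on that assumption. HEAD_MODEL and head_content_ok() are made-up names, and the checker only handles ? and * tokens. Running a real SGML parser over test documents is still the way to settle it.

```python
from itertools import groupby
from typing import Dict, Sequence

# Hypothetical encoding of (TITLE? & ISINDEX? & NEXTID? & LINK*):
# element name -> occurrence indicator of its token in the & group.
HEAD_MODEL: Dict[str, str] = {"TITLE": "?", "ISINDEX": "?", "NEXTID": "?", "LINK": "*"}


def head_content_ok(children: Sequence[str], model: Dict[str, str] = HEAD_MODEL) -> bool:
    """Check element names against an & group of distinct element tokens."""
    used = set()
    for gi, run in groupby(children):
        count = len(list(run))
        if gi not in model or gi in used:
            # Not allowed in the group, or its token was already matched earlier.
            return False
        if model[gi] == "?" and count > 1:
            return False
        used.add(gi)
    return True
```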

21. Link types are not well defined. The only reason to put
something in a public specification is so everybody can agree
on some semantics. If there are no semantics to agree on,
why include the TYPE attribute? (Its status is at best
"proposed" in my mind, though it's in the DTD.)

In http://info.cern.ch/hypertext/WWW/MarkUp/Headings.html
22. "(at least six)" -- how about exactly six? Though I've
seen a lot of style guides that frown on anything more than 4.

In http://info.cern.ch/hypertext/WWW/MarkUp/SGML.html
23. We should give at least one complete reference to the standard, i.e.

	ISO 8879:1986, Information Processing -- Text and Office Systems --
	Standard Generalized Markup Language (SGML)

24. In the Archive section, we could mention comp.text.sgml,
the SGMLs parser materials, and the ifi.uio.no archive.

In http://info.cern.ch/hypertext/WWW/MarkUp/Elements/A.html
25. All attribute values have to be quoted, including NAME.
The example is wrong.

26. The TYPE attribute hardly seems worth mentioning. In the DTD,
it's a NAME, not just any old string.

27. We should look at modeling anchors as HyTime linkends
and/or ilinks.

28. We should look at modeling the LINK element as a HyTime
construct as well.

In http://info.cern.ch/hypertext/WWW/MarkUp/Elements/P.html
29. I don't like the use of "exact representation" here:
The exact representation of this (indentation, leading, etc) is not
defined here, and may be a function of other tags, style sheets etc.

A reader could be confused between the source representation
and the rendered format. The source representation certainly
is defined. I'd say the "rendered representation" or some such.

30. Where are P's allowed? In the DTD, they're allowed in:
HTML, BODY, ADDRESS, BLOCKQUOTE, PRE but not in HEAD, A,
CODE, SAMP, etc.

In http://info.cern.ch/hypertext/WWW/MarkUp/Lists.html
31. Ordered lists: Obsolete or Standard?

32. "The format is:" Here again, this is an example, but it's
hardly a specification of the format of a UL element.

33. What does this mean?
The opening list tag  must be immediately
followed by the first list element.

34. The important difference between UL, MENU, and DIR is not
how they are displayed, but their semantic meanings. A MENU
is a list of things to choose from. A DIR is a list of names
in a directory.

35. We could also make this semantic distinction between PRE,
XMP, and LISTING, were it not for the syntactic confusion
surrounding XMP and LISTING.

36. Get rid of this:
the closing tag must obviously match
the opening tag.

In http://info.cern.ch/hypertext/WWW/MarkUp/Elements/PRE.html
37. Wording of the newline documentation:
Line boundaries within the text are significant, except for one
immediately following or immediatly preceding a tag.

I don't like saying "newlines are significant" or "not significant."
Something like "newline characters shall be rendered as line breaks..."
or "newlines shall be ignored by the renderer..." would be better.

38. Semantics of newlines in PRE. Given the current DTD, a newline
after the PRE start tag or before the PRE end tag is not reported
by an SGML parser.

I think I can cook up some magic SHORTREF declarations that will
cause the SGML parser to report the newlines (possibly as P tags).
[This would obviate the need for special newline processing code
in libHTML]

In any case, I'd suggest that ALL NEWLINES REPORTED BY THE SGML
PARSER IN THE PRE ELEMENT BE DISPLAYED AS LINE BREAKS. That only
leaves the issue of which newlines are reported, which is governed
by the SGML standard.

39. I don't like the way this is worded:
The <p> tag should not be used.
If found, it should be interpreted
as a single new line.

I'd suggest: "it should be displayed as a line break" to avoid treating
<P> as "\n" and interpreting "\n" in some strange way.
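Taken together, 38 and 39 make the renderer's job inside PRE simple: every newline the parser reports, and every stray P, is a line break. The sketch below works on the raw content between <PRE> and </PRE>. It only approximates the SGML record-boundary rules, by dropping one newline right after the start tag and one right before the end tag. pre_lines() is a hypothetical name.

```python
import re


def pre_lines(content: str) -> list:
    """Split raw PRE content into the lines a browser should display."""
    # Approximation: an SGML parser does not report these two newlines.
    if content.startswith("\n"):
        content = content[1:]
    if content.endswith("\n"):
        content = content[:-1]
    # Each reported newline and each <P> is a line break. Split on both,
    # rather than rewriting <P> as "\n" and reinterpreting it later.
    return re.split(r"\n|<[Pp]>", content)
```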

40. "... character character highlighing elements may be used."
Ack! I don't recommend this! Hmmm... maybe only the B, I, and U
elements. This certainly conflicts with the current DTD.

In http://info.cern.ch/hypertext/WWW/MarkUp/Highlighting.html
41. These have status "Extra":
Where not supported by implementations,
like all tags, these should be ignored.

This should be a warning to providers that some information may
be lost on some browsers.

42. (Definition of these and reference
- Dan?)
They come from TeXinfo.

43. I left the TeXinfo @file element out. I don't remember why.
It might have been an oversight. Do we want it in there?

44. Examples (TBD): see complete.html in my stuff.

In http://info.cern.ch/hypertext/WWW/MarkUp/Tags.html#z41
45. The PLAINTEXT tag terminates the HTML entity. What
follows is not SGML. Instead, there's an HTTP convention
that what follows is a text/plain body.
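In other words, a reader has to split such a document in two at the start tag and hand the rest to a text/plain viewer. Here is a minimal sketch. split_plaintext() is a made-up name, and a naive search like this would also fire on a <PLAINTEXT> inside a comment.

```python
import re
from typing import Optional, Tuple

PLAINTEXT_START = re.compile(r"<PLAINTEXT>", re.IGNORECASE)


def split_plaintext(body: str) -> Tuple[str, Optional[str]]:
    """Return (html_part, plain_part); plain_part is None if there is no PLAINTEXT tag."""
    m = PLAINTEXT_START.search(body)
    if m is None:
        return body, None
    # Everything after the start tag is text/plain: no markup, no end tag.
    return body[:m.start()], body[m.end():]
```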

46. "The text may contain any ISO Latin printable characters" --
this conflicts with the DTD. I think a design that treats Latin
characters as external data entities is cleaner than one that
treats them as text characters, but I'm willing to go the
other way.

47. "including the
tag opener, so long as it does not
contain the closing tag in full."
For Pete's sake, could we get this out of there once and for all?
Perhaps it deserves a historical note or something, but we can't
leave it in as part of the standard. I'm willing to support
unquoted attribute values, but not this.

48. "The XMP tag..." Use the term "element"; the
term "tag" doesn't include the content of the element.

In http://info.cern.ch/hypertext/WWW/MarkUp/MarkUp.html
49. "Special characters are represented
by SGML entities"
They're represented by numeric character references.
The lt, gt, and amp entities are not in the DTD. They should
be supported for historical reasons, but they are obsolete.
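On the reading side, this means a browser should resolve numeric character references and still accept the lt, gt, and amp names in old documents. Below is a sketch. decode_refs() and the LEGACY table are illustrative, and unknown entity names are simply left alone.

```python
import re

# Not declared in the DTD; accepted only so that older documents still display.
LEGACY = {"lt": "<", "gt": ">", "amp": "&"}
REF = re.compile(r"&(?:#([0-9]+)|([A-Za-z][A-Za-z0-9]*));?")


def decode_refs(text: str) -> str:
    """Resolve numeric character references and the legacy lt/gt/amp names."""
    def replace(m):
        if m.group(1) is not None:
            return chr(int(m.group(1)))  # numeric character reference
        return LEGACY.get(m.group(2), m.group(0))  # legacy name, else leave as-is
    return REF.sub(replace, text)
```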

In http://info.cern.ch/hypertext/WWW/MarkUp/Connolly/Current/HTML.html
50. I'd like to move the Abstract, Specification, and the reference to
"Text and Markup" up into
http://info.cern.ch/hypertext/WWW/MarkUp/MarkUp.html
That node would look like

<H1>HyperText Markup Language</H1>
<H3>Abstract</H3>...
<H2>Language Reference</H2>
	<A>Text and Markup</A>
	<A>The Elements</A>
	<A>Implementors' Guide</A>
<H2>Specification</H2>
	<A>the DTD</A>
<H2>Appendices</H2>
	<A>futures</A>
	<A>constraints</A>

and this node would become "Implementors' Guide", with
pointers to recommended, complete, tolerated, errors,
libHTML, and SGMLs.

In http://info.cern.ch/hypertext/WWW/MarkUp/Connolly/Current/html.dtd
51. Include ISO Latin 1 character set in SGML declaration?

52. Put PLAINTEXT back in HTML element (fell out by mistake.)

53. LINK element?

54. Get rid of H5 and H6?

55. Get rid of link TYPE attribute?

56. Document BLOCKQUOTE in Elements reference.

57. EXPIRES attribute on HEAD?

58. Get rid of NEXTID element?

59. Document URN, TITLE, METHODS attributes of A element.

60. Proposed HEADERS element, like a DL, for RFC822 message headers:
<HEADERS>
<dt>To<dd>connolly@convex.com
<dt>Subject<dd>HTML todo list
</HEADERS>
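Since this element is only a proposal, here's just a sketch of how a mail-to-HTML gateway might fill it in. It uses Python's standard email module to parse the message. headers_element() is hypothetical.

```python
from email import message_from_string


def _data(text: str) -> str:
    """Escape header text so that it can appear as character data."""
    return text.replace("&", "&#38;").replace("<", "&#60;").replace(">", "&#62;")


def headers_element(raw_message: str) -> str:
    """Render the RFC822 headers of raw_message as the proposed HEADERS element."""
    msg = message_from_string(raw_message)
    lines = ["<HEADERS>"]
    for name, value in msg.items():
        lines.append("<dt>" + _data(name) + "<dd>" + _data(value))
    lines.append("</HEADERS>")
    return "\n".join(lines)
```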

61. List STYLE attribute?

62. XMP and LISTING: CDATA or RCDATA?

In http://info.cern.ch/hypertext/WWW/MarkUp/Connolly/Current/Text.html
63. Under "Parsing Content Into Data and Markup," improve the
explanation of the MIXED, ELEMENT, EMPTY, CDATA, and RCDATA content
types (PCDATA is the wrong term) and how it affects parsing.

64. Revise the section on the sample implementation, libHTML, and
supported.html.

In http://info.cern.ch/hypertext/WWW/Test/test.html
65. This node should be moved to the implementors' guide.

In http://info.cern.ch/hypertext/WWW/MarkUp/Future.html
66. Delete the reference to the perl script.

67. There are two references here to old versions of my spec.

68. Header: it's in there: HEAD

69. Highlighting: it's in there (get rid of HP1 etc. in Elements reference)

70. Fixed width with anchors: it's in there: PRE.

71. Entities: Latin chars are in there. What do we need bullets for?

72. Comments: the comment element is a bad idea. SGML comments are
documented and supported.

73. Link types: we should look at HyTime before we go much further
on this.


In the midaswww-1.0 browser: [by the way: I've fixed all these in my copy]

74. HREFs with quotes don't work.

75. Unrecognized tags are treated as data, rather than ignored.

76. Numeric character references and entity references aren't supported.

77. ETAGO doesn't end XMP, LISTING, PLAINTEXT unless it's the right
GI. (e.g. <XMP>foo</foo> blah : blah should not be in the XMP element.)
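Here is a sketch of the SGML rule behind this, simplified to treat only ASCII letters as name start characters. CDATA or RCDATA content ends at the first </ followed by a name start character, whatever name follows. A mismatched name is an error, but the element still ends there. cdata_content() is a made-up name.

```python
import re
from typing import Tuple

# "</" followed by a name start character (simplified to ASCII letters).
ETAGO_IN_CONTEXT = re.compile(r"</[A-Za-z]")


def cdata_content(after_start_tag: str) -> Tuple[str, str]:
    """Split the text following <XMP> or <LISTING> into (content, rest)."""
    m = ETAGO_IN_CONTEXT.search(after_start_tag)
    if m is None:
        return after_start_tag, ""
    return after_start_tag[:m.start()], after_start_tag[m.start():]
```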

78. The local: access scheme is weird. I suggest we go with ftp: and
local-file: to match MIME, and get rid of local: and file:


Well, that's all I can think of. Good night.

Dan