Re: HTML todo list

Tim Berners-Lee <timbl@www3.cern.ch>
Date: Thu, 14 Jan 93 18:02:07 +0100
From: Tim Berners-Lee <timbl@www3.cern.ch>
Message-id: <9301141702.AA00591@www3.cern.ch>
To: Dan Connolly <connolly@pixel.convex.com>
Subject: Re: HTML todo list
Cc: www-talk@nxoc01.cern.ch
Reply-To: timbl@nxoc01.cern.ch

My machine crashed from too many windows and I lost a few unsent mail
messages with that.  So I may repeat myself at first, up to point 14.

Changes to the DTD I have made are in 

/hypertext/WWW/MarkUp/HTML.dtd.html

Connolly/Current/* is untouched.

> Date: Mon, 11 Jan 93 22:36:43 CST
> From: Dan Connolly <connolly@pixel.convex.com>
> 


> 1. My dictionary lists "markup" as a word, not mark-up.
Fixed
> 2. The PLAINTEXT situation should be logged as a bug against
OK it is but not many servers use it and clients like to be able to  
get source of postscript files for example easily. HTTP2 will fix it.
> 4. HTML should support QUESTION and RESPONSE elements to
> support converting USENET FAQ's to HTML
Too specific I think.
> In http://info.cern.ch/hypertext/WWW/Provider/ShellScript.html
> 5. PLAINTEXT is deprecated. Use PRE, and use a sed script
Done.  text2html.sed is on the web under HTML generation tools.
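
For readers without sed to hand, a minimal sketch of the kind of
conversion meant here, written in Python. It is NOT the text2html.sed
script itself (which is not reproduced in this message); the function
name text_to_pre is hypothetical, and the choice of numeric character
references for the three markup characters is an assumption.

```python
def text_to_pre(text):
    """Wrap plain text in a PRE element, escaping markup characters.

    Assumption: '&', '<' and '>' are written as numeric character
    references. '&' must be replaced first so that the references
    produced for '<' and '>' are not escaped a second time.
    """
    escaped = (text.replace("&", "&#38;")
                   .replace("<", "&#60;")
                   .replace(">", "&#62;"))
    return "<PRE>\n%s</PRE>\n" % escaped
```
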
> 6. .../WWW/Tools/HTMLGeneration/dir2html.txt
> This thing doesn't quote attributes; ...
Fixed. 

> 7. .../WWW/Tools/HTMLGeneration/ls2html.awk.txt
> Quotes around HREFS, PLAINTEXT.
Fixed
 

> 8. .../WWW/Daemon/Implementation/asis.txt
> Quote HREFS, numeric character references where necessary.
Quotes in online version, original is being rewritten anyway I am
told.
> 9. http://info.cern.ch/hypertext/WWW/HytelnetGate/src/htn2html.c
> Uses HEADER in stead of HEAD.
Fixed.
> Quote HREFs.
Fixed.
> Special character entities?
> Yeah! It uses numeric character references already!
Does it? You mean entities I think.
> In http://info.cern.ch/hypertext/WWW/MarkUp/MarkUp.html

> 10. Mark-up again
Fixed

> 11. This text seems out of place:
OK I have hidden it. :-) Does your spec say it anywhere?

> 12. Default text: this node seems to confuse lots of issues.
OK Reference to your doc instead

> In http://info.cern.ch/hypertext/WWW/MarkUp/Tags.html
> 13. This text is out of place: 

Gone.
 

> In http://info.cern.ch/hypertext/WWW/MarkUp/Elements/HEAD.html
> 14. These blurbs should probably quote their element declarations
I have started an HTML.dtd.html with links.
 

> In http://info.cern.ch/hypertext/WWW/MarkUp/Elements/TITLE.html
> 15. This seems redundant:
Fixed.

> 16. What does this mean?
Elaborated and more specific.
 

> 17. Should the TITLE element be CDATA, RCDATA, or PCDATA?
> If we want to be able to use Latin chars in the title,
> it can't be CDATA. The only difference between RCDATA
> and PCDATA (with no subelements allowed) is that comments
> are recognized in PCDATA, whereas they are just regular
> data in RCDATA.

Good point.

 - If we specify Latin 1 as the base set, can't we have Latin 1
   characters in CDATA?

 - If we can't, then I guess we use PCDATA, as TITLE would otherwise
   be the only place except for <XMP> and <LISTING> where we use
   RCDATA.
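
A rough sketch of the distinction at stake, in Python. It assumes a
toy model in which CDATA content is taken literally while RCDATA
content has character and entity references replaced. The function
names and the small entity table are illustrative only and are not
taken from any SGML parser or from the DTD.

```python
import re

# Illustrative entity table; a real DTD would declare its own.
ENTITIES = {"lt": "<", "gt": ">", "amp": "&", "eacute": "\u00e9"}

# Matches numeric references (&#60;) and named references (&lt;).
_REF = re.compile(r"&(#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def read_cdata(content):
    """CDATA: no references are recognized, content is literal."""
    return content


def read_rcdata(content):
    """RCDATA: character and entity references are replaced."""
    def expand(match):
        name = match.group(1)
        if name.startswith("#"):
            return chr(int(name[1:]))
        # Unknown entity names are left untouched in this sketch.
        return ENTITIES.get(name, match.group(0))
    return _REF.sub(expand, content)
```

In this model a title written as "Caf&eacute;" only comes out right
under RCDATA (or PCDATA); under CDATA the accented character would
have to be typed directly as an 8-bit character.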
   

> In http://info.cern.ch/hypertext/WWW/MarkUp/Elements/ISINDEX.html
> 18. The word "Format" connotes lexical details, which are discussed
> elsewhere. I endorse the use of examples, but I'd like to keep
> the model of
> 	SGML source ==parser==> ESIS ==WWW semantics==> formatted output
> consistent. The WWW semantics processor doesn't deal with <>'s etc.
> It just sees the presence of the ISINDEX element and acts accordingly.

Yes.  OK.  But I want as I said before (unless the crash lost the
message) to have two documents out of this. One is the HTML spec for  
MIME IANA registration.  The other is a readable document which is  
NOT 100% a precise reference document but can be read by human beings  
WITHOUT SGML knowledge.  I can guess that this document will have 10  
times the readership of the other if it is readable, as <10% of the  
people creating HTML will know about SGML CROs etc etc.

It is good to have a lot of cross-reference between them.

> In http://info.cern.ch/hypertext/WWW/MarkUp/Elements/NEXTID.html
> 19. The status of each element should be noted consistently. e.g.
> Mainstream	Consistently used by past, present, and future implementations.
> Deprecated	In use and will be supported, but should be avoided. (XMP)
> Obsolete	In use in some documents, but will not be supported. (NEXTID)
> Proposed	Not yet in the DTD or widely supported (e.g. LINK)
> Standard	Not yet widely supported, but will be (e.g. PRE)
> Extra		It's legal to ignore these. (e.g. EM)


We have almost as many categories as elements!  I'd add
Obsolescent	Will be obsolete when the alternative implementation
		(eg HTTP2) is available.
		
I'd make PRE mainstream as there are no implementations for which a
new PRE-understanding version is not available or easily made
available. And so cut out "Standard".  Oops, I put it in again ...
see #31.

I have made NEXTID Mainstream.  Editors need it: can't do without it
really.  I would perhaps change it to <EDITING NEXTID=z27> if that  
was felt to be more logical.

We also need a hook for a version for the checkin/out/lock logic
Dan(?) proposed.  That was that when you
lock or PUT a document, you specify the version so that a document
can be PUT or CHECKED IN by a different person to the one who GOT it.
This means the server gives a key, a version or date code, with the
document. This is all HTTP2 except when a document is stored
somewhere, passed around and then eventually returned to the server.
In that case, it needs a place to hold its original version number
on the server.

<EDITING NEXTID=z27 CHECKEDOUTAS="19930217234507">

Thoughts?
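
To make the check-in idea concrete, here is a minimal sketch in
Python of what the server side might do with the value carried in
CHECKEDOUTAS. Everything in it is an assumption for illustration:
the class DocumentStore, its method names, and the rule that a
check-in is refused when the version no longer matches. None of it
is a defined part of HTTP2.

```python
class DocumentStore:
    """Toy server-side store: document name -> (version, body)."""

    def __init__(self):
        self._docs = {}

    def put_initial(self, name, version, body):
        """Store a first version of a document."""
        self._docs[name] = (version, body)

    def get(self, name):
        """Return (version, body); the version is the key that
        travels with the document, e.g. in CHECKEDOUTAS."""
        return self._docs[name]

    def check_in(self, name, checked_out_as, new_version, body):
        """Accept the body only if it derives from the current version."""
        current_version, _ = self._docs[name]
        if checked_out_as != current_version:
            # Someone else checked in since this copy was taken.
            return False
        self._docs[name] = (new_version, body)
        return True
```

Because the key is held in the document itself, it does not matter
who returns the document or how long it was passed around first.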

> http://info.cern.ch/hypertext/WWW/MarkUp/Elements/LINK.html
> 20. How many of these are allowed? I could change
Any non-negative integer
> ... <!ELEMENT HEAD - -  (TITLE? & ISINDEX? & NEXTID? & LINK*)>
> I don't know if the latter is legal SGML. I'd have to try
> it out.
I think that's what we want.

> 21. Link types are not well defined. The only reason to put
> something in a public specification is so everybody can agree
> on some semantics. If there are no semantics to agree on,
> why include the TYPE attribute? (It's status is at best
> "proposed" in my mind, though it's in the DTD.)


Yes and no.  We need some well-defined link types but we also need this
as a hook for the future, for which we haven't enough experience.  Link
types should be registered.

This is a flexibility point, but it must be firm ... like
a towing ball on the back of your pickup: you want to be able
to connect anything onto it but you want it well fixed onto the
truck!

But I want to make it REL instead of TYPE as people think TYPE
refers to the object type of the destination object rather than the  
link.  (From messages on this list).

> In http://info.cern.ch/hypertext/WWW/MarkUp/Headings.html
> 22. "(at least six)" -- how about exactly six? Though I've
> seen a lot of style guides that frown on anything more than 4.

I agree.  I would frown on anything over 3 in a hypertext document.
However, it is useful to generate a great big HTML document by
concatenating little ones, demoting their heading levels. You then  
print the big document. This generates up to 6 easily.  Maybe we  
should go to 9 but frown on >4.
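
A minimal sketch, in Python, of the demotion step described above.
The function name demote_headings and the policy of clamping at the
deepest allowed level are assumptions; a real tool might prefer to
report an error instead of clamping.

```python
import re

# Matches the start of a heading start tag or end tag: <H1, </h3, ...
_HEADING = re.compile(r"(</?)[Hh]([1-6])\b")


def demote_headings(html, levels=1, max_level=6):
    """Push every heading down by `levels`, clamping at max_level."""
    def shift(match):
        new_level = min(int(match.group(2)) + levels, max_level)
        return "%sH%d" % (match.group(1), new_level)
    return _HEADING.sub(shift, html)
```

Note that clamping merges distinct levels once the limit is reached,
which is the practical argument for allowing more than six.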

> In http://info.cern.ch/hypertext/WWW/MarkUp/SGML.html
> 23. We should give at least one complete reference to the standard, 

Done.

> 24. In the Archive section, we could mention comp.text.sgml,
> the SGMLs parser materials, and the ifi.uio.no archive.

Link put in crudely.

> In http://info.cern.ch/hypertext/WWW/MarkUp/Elements/A.html
> 25. All attribute values have to be quoted, including NAME.
> The example is wrong.

I have changed NAME to be a NAME -- ie doc-wide unique which it must
be. Numeric ones are then not valid but I don't generate them any
more.  I think that we should stick to the intended ID system. In the  
future, we can think about IDs on many other elements.

> 26. The TYPE attribute hardly seems worth mentioning. In the DTD,
> it's a NAME, not just any old string.

I have made it REL as I said above and I think it is very important.

> 27. We should look at modeling anchors as HyTime linkends
> and/or ilinks.

Yes I agree when someone has time to get into that.

> 28. We should look at modeling the LINK element as a HyTime
> construct as well.
Ditto.

> In http://info.cern.ch/hypertext/WWW/MarkUp/Elements/P.html
> 29. I don't like the use of "exact representation" here:
OK we stick to "rendering" for that

> 30. Where are P's allowed? In the DTD, they're allowed in:
> HTML, BODY, ADDRESS, BLOCKQUOTE, PRE but not in HEAD, A,
> CODE, SAMP, etc.

That's right.  They are not in the CERN implementations allowed in  
<DL> or <UL> etc, but they would be useful in those.
Comments?

> In http://info.cern.ch/hypertext/WWW/MarkUp/Lists.html
> 31. Ordered lists: Obsolete or Standard?

Standard. Bother, I thought we'd got rid of that! (The next editor will  
turn them into unordered lists at the moment but I can fix that)
 

> 32. "The format is:" Here again, this is an example, but it's
> hardly a specification of the format of a UL element.

Ok. example.

> 33. What does this mean?
> The opening list tag  must be immediately
> followed by the first list element.

(LI | (A|%text)+)  in SGML I suppose just as you say.
You can't
	<UL>and here they all are:
	<LI>The first..
	<LI>the second
	</UL>
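
A rough check for this rule can be sketched in Python as below. It
is a pattern match, not an SGML parse. The function name is
hypothetical, and applying the rule to OL, MENU and DIR as well as
UL is an assumption. An empty list is also reported as not starting
with an item.

```python
import re

# A list start tag followed (after optional whitespace) by anything
# other than an LI start tag.
_BAD_LIST_START = re.compile(
    r"<(UL|OL|MENU|DIR)\b[^>]*>\s*(?!<LI\b)\S", re.IGNORECASE)


def list_starts_with_item(html):
    """True if every list start tag is directly followed by <LI>."""
    return _BAD_LIST_START.search(html) is None
```
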
 

> 34. The important difference between UL, MENU, and DIR is not
> how they are displayed, but their semantic meanings. A MENU
> is a list of things to choose from. A DIR is a list of names
> in a directory.

Yes and no.  I too like logical definitions -- I am sold on semantic  
markup but HTML is to cover a vast range of data and semantics. MENU
These things are NOT necessarily what their names suggest -- many a  
selectable menu is set out as a DIR or a DL. The element names are
mnemonic only.  The blurb talks about how much text is in the  
paragraphs.

> 35. We could also make this semantic distinction between PRE,
> XMP, and LISTING, were it not for the syntactic confusion
> surrounding XMP and LISTING.

We could but we are deprecating XMP and LISTING and PRE will do for
all.  You can only be very semantic in a very narrow application.
This is not one.

> 36. Get rid of this:
Gone

> In http://info.cern.ch/hypertext/WWW/MarkUp/Elements/PRE.html
> 37. Wording of the newline documentation:
> Line boundaries within the text are
Reworded with "render"

> 38. Semantics of newlines in PRE. Given the current DTD, a newline
> after the PRE start tag or before the PRE end tag is not reported
> by an SGML parser.
> 

> I think I can cook up some magic SHORTREF declarations that will
> cause the SGML parser to report the newlines (possibly as P tags).
> [This would obviate the need for special newline processing code
> in libHTML]
> 

> In any case, I'd suggest that ALL NEWLINES REPORTED BY THE SGML
> PARSER IN THE PRE ELEMENT BE DISPLAYED AS LINE BREAKS. That only
> leaves the issue of which newlines are reported, which is governed
> by the SGML standard.

... and with the issue of explaining the end result to the
simple HTML writer and to me without our needing to call on the
model of the SGML engine and application. Awaiting the results
of your tests with SHORTREF.

> 39. I don't like the way this is worded:
> The &#60;p&#62; tag should not be used.
Ok done

> 40. "... character character highlighing elements may be used."
> Ack! I don't recommend this! Hmmm... maybe only the B, I, and U
> elements. This certainly conflicts with the current DTD.

Serious point here folks.  There was a great demand for B I U
for man pages and the like. Why prohibit anything other than TT?
Or, to keep it simple, allow anything and mention TT should not be  
used, and the constraints of fixed width may limit the ability to  
render some highlighting.

I have introduced %htext noting that text always occurred with A.
I hope I have done it right.
 

> In http://info.cern.ch/hypertext/WWW/MarkUp/Highlighting.html
> 41. These have status "Extra"
> Where not supported by implementations,
> like all tags, these should be ignored.<p>
> 

> This should be a warning to providers that some information may
> be lost on some browsers.
> 

> 42. (Definition of these and reference
> - Dan?)
> They come from TeXinfo.
Thanks

> 43. I left the TeXinfo @file element out. I don't remember why.
> It might have been an oversight. Do we want it in there?

No, too specific. We have enough.

> 44. Examples (TBD) see complete.html in my stuff.

I repeat that I like your examples but I would like them split
into GOOD HTML documents describing bad HTML documents,
with links to the bad documents for testing only.
We don't want people to follow links to the only documentation to  
find their parser has core dumped :-)

> In http://info.cern.ch/hypertext/WWW/MarkUp/Tags.html#z41
> 45. The PLAINTEXT tag terminates the HTML entity. What
> follows is not SGML. In stead, there's an HTTP convention
> that what follows is a text/plain body.
OK -- in.

> 46. "The text may contain any ISO Latin printable characters" --
> this conflicts with the DTD. I think a design that treats Latin
> characters as external data entities is cleaner than one that
> treats them as text characters, but I'm willing to go the
> other way.

I'm glad.  Let's.  I think that a full 8-bit character base will be
easier.  I think the text should be able to contain any Latin 1.

> 47. "including the
> tag opener, so long as it does not
> contain the closing tag in full."
> For Pete's sake, could we get this out of there once and for all?
OK OK OK :-) I hope "The text may contain any ISO Latin printable  
characters, but not the end tag opener. (See Historical note)" is OK


> 48. "The <a  NAME="z22">XMP tag</a>..." Use the term "element". The
> term "tag" doesn't include the content of the element.
Done.

> In http://info.cern.ch/hypertext/WWW/MarkUp/MarkUp.html
> 49. "Special characters are represented
> by SGML entities"
> They're represented by numeric character references.
> The lt, gt, and amp entities are not in the DTD. They should
> be supported for historical reasons, but they are obsolete.
I would like them in the DTD. While people are still reading/writing
HTML they are useful. My mental ASCII table is in hex, not decimal,  
anyway.  Are they any overhead? Why the war against them? For the ISO
characters you wanted the opposite.  (Does your mental ASCII table  
stop at 128? Mine too)

Comments?
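
For anyone comparing the two notations, a small Python sketch. The
three-entry table mirrors the lt, gt and amp entities discussed
here; the function names are hypothetical.

```python
# The three entities under discussion, by the character they denote.
NAMED_ENTITIES = {"lt": "<", "gt": ">", "amp": "&"}


def numeric_reference(char):
    """Numeric character reference: decimal code, e.g. &#60;"""
    return "&#%d;" % ord(char)


def named_reference(char):
    """Entity reference if a name is known, else fall back to numeric."""
    for name, value in NAMED_ENTITIES.items():
        if value == char:
            return "&%s;" % name
    return numeric_reference(char)
```

The numeric form needs the decimal code (60 for "<"), whereas a hex
table would list it as 3C, which is the readability point made above.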

> In http://info.cern.ch/hypertext/WWW/MarkUp/Connolly/Current/HTML.html
> 50. I'd like to move the Abstract, Specification, and the reference to
> "Text and Markup" up into
> http://info.cern.ch/hypertext/WWW/MarkUp/MarkUp.html
> That node would look like
> 

> <H1>HyperText Markup Language</H1>
> <H3>Abstract</H3>...
> <H2>Language Reference</H2>
> 	<A>Text and Markup</A>
> 	<A>The Elements</A>
> 	<A>Implementors' Guide</A>
> <H2>Specification</H2>
> 	<A>the DTD</A>
> <H2>Appendices</H2>
> 	<A>futures</A>
> 	<A>constraints</A>
> 

> and this node would become "Implementors' Guide", with
> pointers to recommended, complete, tolerated, errors,
> libHTML, and SGMLs.
> 

> In http://info.cern.ch/hypertext/WWW/MarkUp/Connolly/Current/html.dtd
> 51. include ISO Latin 1 character set in SGML declaration?
> 

> 52. Put PLAINTEXT back in HTML element (fell out by mistake.)
> 

> 53. LINK element?
> 

> 54. Get rid of H5 and H6?
> 

> 55. Get rid of link TYPE element?
> 

> 56. Document BLOCKQUOTE in Elements reference.

This BLOCKQUOTE... it should be one thing or the other.
If it cannot contain other paragraph styles then it should be a  
paragraph style like address, and not be able to contain address.
This is easy for everyone to implement.

If it can contain ADDRESS then why not let it contain anything - in  
particular, headings. Trouble is, I can't represent that in RTF  
easily so that blows the NeXT and Mac browsers. So let's
make it 

<!ELEMENT BLOCKQUOTE - - (%htext;|P)+>
like ADDRESS, and bear it in mind for HTML3 which will have SECTION  
in, ie without the linear RTF constraint.

> 57. EXPIRES attribute on HEAD?

I took it off .. it's in HTTP2, as it applies to all formats not just
HTML.
> 58. Get rid of NEXTID element?
 Nope .. needed to stop editors reusing deleted IDs. See above.
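
A minimal sketch in Python of why the counter has to live in the
document. The class AnchorNames and the "z" prefix (borrowed from
the z27 example above) are illustrative assumptions, not a
description of how any particular editor is implemented.

```python
class AnchorNames:
    """Hands out anchor names from a counter that never goes back."""

    def __init__(self, next_id=1, prefix="z"):
        self.next_id = next_id    # the value stored in NEXTID
        self.prefix = prefix
        self.in_use = set()

    def allocate(self):
        """Return a fresh name and advance the counter."""
        name = "%s%d" % (self.prefix, self.next_id)
        self.next_id += 1
        self.in_use.add(name)
        return name

    def delete(self, name):
        """Remove an anchor; the counter is deliberately NOT rewound."""
        self.in_use.discard(name)
```

An editor that instead picked the lowest unused number would hand
z27 to a new anchor after the old z27 was deleted, and existing
links to #z27 would then silently point at the wrong place.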
  

> 59. Document URN, TITLE, METHODS attributes of A element.
Ooo yes. Done. Lots of "notes" attached for info only.
 

> 60. Proposed Headers element (like a DL; for RFC822 message headers:
> <HEADERS>
> <dt>To<dd>connolly@convex.com
> <dt>Subject<dd>HTML todo list
> </HEADERS>)


Hmmm.
1. In fact, <DL COMPACT> looks very similar and has less narrow a
meaning.
2. In fact the headers information could rather be regarded as part of
the metainfo in the <HEAD> element. Many of the RFC822 things will
in fact be outside the document in the HTTP layer.  This is a bit
chicken-and-egg. Here we are describing an SGML DTD for a specific
format for a MIME-wrapped RFC822 body, and in it we want to put the
RFC822 header. Hmmm.  Something has got muddled. But I understand
what you mean: very often one quotes mail messages as text. Strictly,
one shouldn't though. You shouldn't be able to edit the headers.

Currently there is DL COMPACT which does that. It is implemented in
www. I am torn between generality and the practicality of getting
something defined and out the door and I think the latter wins so
let's put COMPACT as an attribute for DL and leave the HEADERS if you
don't mind too much.


> 61. List STYLE attribute?

No I don't think so -- see discussion #60


> 62. XMP and LISTING: CDATA or RCDATA?
CDATA is probably nearest to the original intention?

This is your stuff Dan I think:
> In http://info.cern.ch/hypertext/WWW/MarkUp/Connolly/Current/Text.html
> 63. Under "Parsing Content Into Data and Markup," improve the
> explanation of the MIXED, ELEMENT, EMPTY, CDATA, and RCDATA content
> types (PCDATA is the wrong term) and how it affects parsing.
> 64. Revise the section on the sample implementation, libHTML, and
> supported.html.
> 

> In http://info.cern.ch/hypertext/WWW/Test/test.html
> 65. This node should be moved to the implementors' guide.
Same comments as above -- moved in <PRE>

> In http://info.cern.ch/hypertext/WWW/MarkUp/Future.html
> 66. Delete the reference to the perl script.
done
 

> 67. There are two references here to old versions of my spec.
> 68. Header: it's in there: HEAD
> 69. Highlighting: it's in there

> 70. Fixed width with anchors: it's in there: PRE.
gone .. all gone!

> (get rid of HP1 etc. in Elements reference)
No -- I will put that lot in another file though to keep it clean.
There are some people (here) who generate a lot of HPs.
 

> 71. Entities: Latin chars are in there. What do we need bullets for?
We don't.

> 72. Comments: the comment element is a bad idea. SGML comments are
> documented and supported.

They are rather different in that a comment can surround a whole
nested stack of SGML elements, and could be nested. I don't suppose  
SGML comments can?

> 73. Link types: we should look at HyTime before we go much further
> on this.
Well, there are only 9 pages on hypertext in HyTime (More Time than
Hy) and in that I can't see any mention of link types.  As I said
above (with a different metaphor), I think this should be a well
defined and entrenched gate into uncharted territory.

> In the midaswww-1.0 browser: [by the way: I've fixed all these in my copy]
> 

> 74. HREF's with quotes don't work
Fixed with Tony's fix
> 75. Unrecognized tags are treated as data, rather than ignored.
> 76. numeric character references and entity references aren't supported.
Could you post diffs please for those Dan? Thanks.

> 77. ETAGO doesn't end XMP, LISTING, PLAINTEXT unless it's the right
> GI. (e.g. <XMP>foo</foo> blah : blah should not be in the XMP element.)
> 

> 78. local: access scheme is weird. I suggest we go with ftp: and
> local-file: to match MIME, and get rid of local: and file:
Covered in another message.  I am prepared to split
file to ftp: and local: even though there are many (decnet, afs,
etc) ways to get at files and the client may be the best judge of  
what will work for him.

 Now, what about the SAVEDAS address so that from just the content of
the document the partial UDIs can be resolved? I think that is a
useful thing, and could be essential. I will put that in as Standard.
 

> Well, that's all I can think of. Good night.
I hope you slept well...

I have made a provisional list of link relationships. I hope
they show the utility of the attribute. They can always be ignored!

> Dan

Tim