Digest of Tim's mail

Tim Berners-Lee <timbl@oclc.org>

Date: Tue, 5 Jul 94 11:28:59 EDT
Message-id: <9407051525.AA04336@www3.cern.ch>
Reply-To: html-ig@oclc.org
Originator: html-ig@oclc.org
Sender: html-ig@oclc.org
Precedence: bulk
From: Tim Berners-Lee <timbl@oclc.org>
To: Multiple recipients of list <html-ig@oclc.org>
Subject: Digest of Tim's mail
X-Listprocessor-Version: 6.0c -- ListProcessor by Anastasios Kotsikonas
X-Comment: HTML Implementation Group (Private)

I realize that I have had no replies because all my messages
are still being bounced by the list processor.

(I guess I forgot
which system I have to send mail from, but the returned message
has an error which says that tbl@www0.cern.ch, which is
a nonsense mail address, is not subscribed. So I don't know where to
fix it from.)

Stu, please forward these to the list.

I would like any feedback on the first as soon as possible.
_______________________________________________________________
Date: 30 June 1994

Folks,

If this work is to be ratified by the IETF, some group
must be cast as an IETF working group to discuss it.
Experience shows that if the experts edit a doc and turn it
over to a random group to deliberate over, it falls
apart.  This means that this group who are doing the
work must be the IETF working group.  There are some
pressures to do this.  There have also been pressures to
do this from the IETF, which may mean that we get some
support in making sure the process works.

What do you think?  The mechanisms of archived discussion
are all there -- the group mailing list could be specified
as www-html, and this list could be kept as an editors' list,
or the html-ig list could be opened as the working group list.

I would propose to chair the group, with Dan editing the document
of course.  We would establish a set of milestones
and a strict charter that the spec be descriptive of current
WWW practice, and then only when that spec is done should
the group tackle Dave's HTML+ document.  Or however we like
to arrange the work, but the charter would enable us to
throw out anyone wanting to open up lengthy irrelevant
discussions, or to ask whether WWW should in fact use RTF,
etc etc.

We would have to take care that we don't get any newbie
influx screwing the list.  I would propose that meetings
only occur if there is a quorum of real implementors
available at an IETF.  I would also chair the group
and have anyone not implementing HTML parsers etc
sit at the back.

The same thing applies to HTTP, but the issues are
a bit different, BTW.

Who would be able to come to meetings at the next IETF
in Toronto July 25-29? 

Use this space for free expression of your feelings:

Tim

___________________________________________________________________
> From: "Daniel W. Connolly" <connolly@oclc.org>

> I've been wrestling with the idea of reorganizing the HTML document
> into a normative part and an informative rationale, much like the
> POSIX 1003.1 document.

There is a good tradition of legally correct but unreadable
standards.  I don't believe we should follow it.
Many people want all the stuff about an element together,
and don't want to have to read it with grep.

> Depending on how the document is published, the packaging changes
> significantly. 

Exactly.  But we have hypertext.  Let's use it.
I suggest that the normative parts, discursive parts (including proposals)
and examples be separated for each element, and for our purposes
left close together:

	/Elements/IMG/Norm.html
		     /Discuss/		(WIT discussion)
		     /Examples.html
		     /Explanation.html
		     /All.html		( Norm + Explanation + Examples )

We can then run off documents combining these as we want,
including making the browseable version All.html automatically.
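
A minimal sketch of how All.html might be generated from the per-element
parts, assuming the file names in the layout above. The function name
build_all, the part order, and the plain concatenation are illustrative
assumptions, not an existing tool.

```python
from pathlib import Path

# Assumed part order, taken from the "( Norm + Explanation + Examples )" note.
PARTS = ["Norm.html", "Explanation.html", "Examples.html"]

def build_all(element_dir: Path) -> str:
    """Concatenate the parts that exist for one element and write All.html."""
    chunks = []
    for name in PARTS:
        part = element_dir / name
        if part.exists():          # a missing part is simply skipped
            chunks.append(part.read_text())
    combined = "\n".join(chunks)
    (element_dir / "All.html").write_text(combined)
    return combined
```

Calling build_all(Path("Elements/IMG")) would regenerate that element's
combined page; other combinations could be run off by changing PARTS.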

If the bits are split into 10 places it will be very difficult to
keep track of it.  I would like also to
use exactly the same text for the HTML level 2 and HTML+ spec
where appropriate, as we should be explicitly aware of where they
differ.  Like:

	/Elements/IMG/NormL1.html
		     /NormL2.html
		     /Discuss/		(WIT discussion)
		     /Examples.html
		     /Explanation.html
		     /All.html		( Norm + Explanation + Examples )

Yes, the code management of it basically has to restart.   :-(
But I think the extra power would be worth it.

> I've also been playing around with the FrameMaker/DocBook tools that
> we have here at HaL.  Is anybody out there planning to include this
> with their docset? 

	"docset"?

> If I produced a DocBook version of this thing,
> would that be valuable to folks? Or would you rather just take the
> PostScript version and print that?

	I would only want to be able to make HTML, plain text and
	postscript.

> Sometimes I'm tempted to take a little bit of the verbiage from the
> HTML files, stick it in the DTD, and call that the HTML 2.0 spec.

> Then somebody else can write user documentation, tutorials, "how to
> write a browser" documents, etc. --

ok, but..

>  stick all the stuff about how to
> compose search URLs from ISMAP documents in there.

That is spec material, not tutorial!  It must be rigorous and defined
in one unique place.

> 

> So let's inventory exactly what we require for an HTML 2.0 spec:
> 

> I invite folks to rate each of the following as:
> 	5 - must have this for my purposes
> 	4 - may have this, and I think it should
> 	3 - may have this, but I don't care
> 	2 - may have this, but I'd rather it did not
> 	1 - must not have this
> 

						Ratings:       Dan Tim
Normative content:
  * An SGML Declaration and one or more DTD subsets		5   5
  * Minimal conformance definition				5   5
  * Definition of element semantics				5   5	
    (e.g. what rendering distinctions MUST be made)
  * Element reference						4   5
  * Examples of recommended usage				3   4
  * Explanation of operation of anchors, forms, ISINDEX, ISMAP	2   5
  * Explanation of WWW linking and addressing			2   5
	     (enough to give semantics of HTML)
  * Security Issues						2   5
	     (mandatory for any RFC-Tim)
(if only there were time...)
  * Test Suite							4   3

Informative content:
  * Publication History						4   3
  * Summary of Changes since draft-iiir-html-01			4   3
  * "Typical Rendering" instructions				3   5
     (From experience, needed to explain semantics)
  * Historical notes about browser implementations that		3
		conflict with the SGML standard
  * Examples of common authoring errors				3   4
  * Rationale behind contentious issues				3   3
		(e.g. "why P is a container")
  * Proposed language changes					3   4

Navigation Features and Media:
  * A Postscript format file					5   5
  * A collection of HTML nodes					4   5
  * A plain text format file					4   5
  * A DocBook document						3   3

  * List of Reviewers						5   5
  * Revision History						4   3
  * Numbered Sections						4   4
  * Title page							3   3
  * Abstract							3   5
  * Index							3   3
       (nice but can't use HTML-L2 as source -Tim)

Publication Forums/Audiences:
  * Publication through the IETF as an RFC or FYI		4   5
  * Publication through the Davenport group			3   3
  * Publication through SGML Open				3   3

_______________________________________________________________________
> From: "Daniel W. Connolly" <connolly@oclc.org>
> In message <9406161206.aa11328@dali.scocan.sco.COM>, Murray Maloney writes:
> >
> >Proposal:  Identify the control characters in ISO 8859/1
> >that are recognized as valid HTML.
> 

> Seconded.

Thirded

> > identify those
> >control characters which are not valid by specifying
> >them as SHUNCHARs in the SGML declaration, and document
> >them in the HTML specification.  For each control character
> >that is valid, identify its meaning and potential uses.

Tolerant parsers, one can note, should ignore unexpected
control characters.  This can of course be done logically
in the conversion from real characters to SGML input characters
so a real SGML parser can still be used.
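
As a rough illustration of that pre-filtering step, the sketch below drops
unexpected control bytes before the data reaches the SGML parser. The set
of bytes kept (HT, LF, CR, plus ESC) is an assumption based on the list
proposed in this thread, and strip_unexpected_controls is a hypothetical
name.

```python
# Assumption: HT (9), LF (10) and CR (13) are the meaningful controls;
# ESC (27) is passed through because it may start an escape sequence.
KEEP = {9, 10, 13, 27}

def strip_unexpected_controls(data: bytes) -> bytes:
    """Drop control bytes in the range 0-31 that are not in KEEP."""
    return bytes(b for b in data if b >= 32 or b in KEEP)
```

Only the range 0-31 is filtered here; everything from 32 upward is left
untouched for the parser to deal with.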

> I'd say this list is:
> 

> 	Decimal		ASCII "code"	HTML Meaning
> 					in BODY		in PRE
> 	9		HT		word break	col := (col+8) mod 8
> 	10		LF		word break	col := 0; row := row+1
> 	13		CR		word break	col := 0
> 

> Hmmm... about CR and LF in PRE... what about Mac generated documents
> that only use CR vs. unix that only uses LF vs. DOS that uses CRLF?
> The above definition works for unix and DOS, but not Mac. Is that
> OK for everybody?
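
The quoted PRE rules can be sketched as a small cursor model to show why
CR-only text is a problem. One assumption is made loudly here: the tab
rule is taken to mean "advance to the next multiple of 8", since the
expression in the table as written would not move the column forward.
The name pre_cursor is hypothetical.

```python
def pre_cursor(text):
    """Return (row, col) after processing text under the quoted PRE rules."""
    row = col = 0
    for ch in text:
        if ch == "\t":
            col += 8 - (col % 8)      # assumed rule: next multiple of 8
        elif ch == "\n":              # LF: col := 0; row := row+1
            col, row = 0, row + 1
        elif ch == "\r":              # CR: col := 0
            col = 0
        else:
            col += 1
    return row, col
```

Under this model LF-only and CRLF text both advance the row, while
CR-only text stays on row 0 and overwrites itself.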

Hey!  We are talking document transfer format on the net, folks!
All text/plain is transferred in NETASCII format, ie with a
mandatory CR LF as line terminator.  It is totally irrelevant
what the internal format is: on VMS, the lines are stored
with a leading 16-bit character count.  On VM
it's all in EBCDIC!  The servers and clients
do the conversion. Mac generated documents have to get converted
by the server.  This is interoperability...

[Incidentally, this can be coded and is in libwww but it
is a little weird.  There are functions TOASCII() and
FROMASCII(), and when data is prepared for transmission,
any occurrence of '\n' in the C runtime version
of a file is replaced with { FROMASCII(13), FROMASCII(10) }
and then the whole buffer, as it leaves, is converted
using TOASCII().  On a Mac, '\n' is in fact 13, which
weirds out unix programmers but is valid C.  The C RTL
makes sure that each line of the file is a sequence
of characters terminated by '\n' on *any* system.]
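
The same idea can be sketched in Python rather than C. This is not the
libwww code: to_wire is a hypothetical name, and the sketch assumes the
text already uses '\n' as its only line terminator and that ISO 8859-1
is the character set on the wire.

```python
def to_wire(text: str) -> bytes:
    """Encode runtime text for transmission with CR LF line terminators."""
    # Each runtime '\n' becomes the byte pair 13, 10 on the wire.
    return text.replace("\n", "\r\n").encode("latin-1")
```

The character-set conversion that TOASCII() performs on an EBCDIC system
is stood in for here by the final encode step.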

I don't believe the concept of CR -> col := 0 is useful in this
post-tty era.

I believe lots of unix servers output only LF,
as they just pipe unix files out.

I suggest that the official form be NETASCII
but that parsers be strongly encouraged to ignore the CR
and only use the LF.
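
A tolerant receiver along those lines might look like the sketch below,
which discards every CR and breaks lines only on LF. The name wire_lines
is hypothetical, and the input is assumed to be the raw bytes received.

```python
def wire_lines(data: bytes) -> list:
    """Split received text into lines, ignoring CR and breaking only on LF."""
    return data.replace(b"\r", b"").split(b"\n")
```

This accepts both strict CR LF and bare LF input. CR-only input collapses
into a single line, which is why such documents would have to be
converted by the server before sending.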

How about in this and many other cases making a strict DTD
and a tolerant DTD?

Note that all text/xxx types are probably sent in NETASCII.
They will be converted into local form very early on,
so any documentation should make it plain that it is documenting
that which passes on the wire.

> >For all control characters which are not valid, list
> >the characters and their codes, and specify the error
> >(if any) which may result if the character is discovered.
> 

> Perhaps somebody could run some tests on existing browsers to see
> whether it's reasonable to say whether other chars 0-8, 14-31 should
> be ignored altogether or treated as wordbreaks.
> 

> Also... do we leave open the possibility that folks will want to use
> these unused characters for special purposes (such as graphic code set
> switches) in the future?

The escape sequences have been used for Japanese character encoding
already -- warp to Japan and you will see a bunch of it.
So escape sequences should be reserved for their ISO meanings.
And ESC must be reserved for use as an escape sequence initiator.
(probably CSI should be reserved similarly too)
From the point of view of SGML of course one would have to
interpret these as a precursor to SGML processing.

I know that using the escape sequences is maybe not the
way to go, but they are used, and certainly ESC should not be allowed
in any *other* role.
Tim

_____________________________________________________________________