Re: Last call: Intro, SGML, MIME sections

lilley (lilley@afs.mcc.ac.uk)
Thu, 4 May 95 11:32:29 EDT

Dan said:

> * Hypertext structure. How do you like the glossary?
> Are the links sprinkled throughout too much noise?

Having each and every occurrence of a glossary word made into a link does
give a rather busy effect. However, as one cannot assume that people
have read the pages in a given order, or indeed that they started reading
at the top of a page, the normal printed document conventions do not
apply.

My feeling is that the increase in precision is probably worth the
sub-optimal aesthetics. It does mean that other links become a little
buried, however.

> But I'm most concerned with the first three sections right now:

> HTML as an Application of SGML

> Hence the terminals above parse as:

> HTML
> |
> \-HEAD, BODY
> | |
> \-TITLE \-P
> | |
> | \-<P>,"Some text. ",EM
> | |
> | \-<EM>,"*wow*",</EM>
> \-<TITLE>,"Parsing Example",</TITLE>

Given certain historical problems with P, I would be happier if the
first occurrence of P in an example in the standard showed a closing P tag
somewhere. Either in the example document or in the parse tree.

Yes, I am aware that the closing </p> can be omitted. But as the parse
tree shows HEAD and BODY being inferred, could it not show </P>, </BODY>
and </HTML> being inferred as well? Just to make the point early on?
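To illustrate the kind of inference meant here, a rough Python sketch. The helper name `close_paragraphs` and the token-list input are hypothetical, and the rule is deliberately simplified: only P is handled, and an open paragraph is assumed to end at the next `<P>` start tag or at the end of the body (the full SGML omitted-tag rules are more general).

```python
def close_paragraphs(tokens):
    """Return tokens with the omitted </P> end tags made explicit.

    Hypothetical, simplified helper: only P is considered, and a paragraph
    is assumed to end at the next <P> start tag or at the end of input.
    """
    out = []
    p_open = False
    for tok in tokens:
        if tok.upper() == "<P>":
            if p_open:
                out.append("</P>")  # inferred: a new paragraph closes the old one
            p_open = True
        elif tok.upper() == "</P>":
            p_open = False  # explicit end tag, nothing to infer
        out.append(tok)
    if p_open:
        out.append("</P>")  # inferred: end of body closes the open paragraph
    return out
```

Applied to the BODY tokens of the parse example, `["<P>", "Some text. ", "<EM>", "*wow*", "</EM>"]`, this appends a single inferred `</P>` after `</EM>`.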

> The syntax character set for all HTML documents is ISO-646-IRV.

The word syntax is not a link, so the term 'syntax character set' does
not seem to be defined. It would aid clarity if it were.

> Note that the terminating semicolon is only necessary when the character
> following the reference would otherwise be recognized as markup:

True, but perhaps this should say a little more strongly that the
trailing ; is not actually wrong, just that it can be omitted in this
instance if you really want.
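A minimal sketch of how a parser might treat the optional semicolon, in Python. The entity table `ENTITIES` is a hypothetical subset, and the sketch treats `;` as the only reference-close delimiter (a simplification of the SGML rules). The entity name runs over SGML name characters (letters, digits, `.` and `-`), so the reference ends at the first character that cannot continue the name.

```python
import re

# Hypothetical subset of the HTML entity table, for illustration only.
ENTITIES = {"lt": "<", "gt": ">", "amp": "&", "quot": '"'}

# "&", then a name (letter followed by name characters), then an optional ";".
REF = re.compile(r"&([A-Za-z][A-Za-z0-9.\-]*);?")

def expand_refs(text):
    """Replace known entity references; unknown names are left untouched."""
    def repl(m):
        return ENTITIES.get(m.group(1), m.group(0))
    return REF.sub(repl, text)
```

Both `expand_refs("a &lt; b")` and `expand_refs("a &lt b")` give `"a < b"`, so the trailing `;` is harmless either way. By contrast, `"&ltb"` is left alone because the name is read as `ltb`, which is exactly the case where the semicolon is required.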

(Quoting from the HTML version)
> <P>
> To include comments in an <A HREF="html-spec_12.html#GLOSS11">
> HTML document</A> that will be eliminated in
> the mapping to terminals, surround them with <SAMP>`'</SAMP>. After
> the comment delimiter, all text up to the next
> occurrence of <SAMP>`--&#62;'</SAMP> is ignored.

Should that be

... surround them with <SAMP>`&lt;!--'</SAMP> ... ?
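For reference, the simplified rule as the draft words it (after `<!--`, everything up to the next `-->` is ignored) can be sketched in Python as below. The function name is hypothetical, dropping an unterminated comment to the end of the text is an assumption, and real SGML comment declarations are more involved than this.

```python
def strip_comments(text):
    """Remove comments using the draft's simplified rule: after "<!--",
    all text up to the next "-->" is ignored."""
    out = []
    pos = 0
    while True:
        start = text.find("<!--", pos)
        if start == -1:
            out.append(text[pos:])  # no more comments: keep the rest
            break
        out.append(text[pos:start])  # keep text before the comment
        end = text.find("-->", start + 4)
        if end == -1:
            break  # unterminated comment: drop the rest (an assumption)
        pos = end + 3  # resume just after "-->"
    return "".join(out)
```

For example, `strip_comments("x<!-- one --> y <!-- two -->z")` returns `"x y z"`.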

In the example HTML document,

> <IMG SRC ="triangle.xbm" alt="Warning:">
> Be sure to read these <b>bold instructions</b>.

Could that be changed to

<IMG SRC ="triangle.xbm" alt="Warning: ">
Be sure to read these <b>bold instructions</b>.

i.e. a trailing space after "Warning:".

> Version
> To help avoid future compatibility problems, the version parameter may
> be used to give the version number of the specification to which the
> document conforms.

If omitted, what does it default to? The current highest version number that
has been standardised?

> Charset
> The charset parameter (as defined in section 7.1.1 of RFC 1521[MIME])
> may be given to specify the character encoding scheme used to
> represent the HTML document as a sequence of octets.

Again it would be helpful to say explicitly what happens when this is omitted
(assume ISO Latin1?)

> HTML user agents must support the ISO-8859-1 character encoding scheme,
> and hence the US-ASCII character encoding scheme. (9)

I feel you should either use ASCII or ISO-646-IRV throughout (and have a
footnote explaining the relationship between the two).
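The "and hence" in the quoted sentence rests on ISO-8859-1 agreeing with US-ASCII in its lower half. A quick Python check of that relationship, using the standard `ascii` and `latin-1` codecs (the function name is only for illustration):

```python
def latin1_agrees_with_ascii():
    """Check that every 7-bit byte decodes to the same character under
    US-ASCII and ISO-8859-1, i.e. ISO-8859-1 is a superset of US-ASCII."""
    for b in range(128):
        raw = bytes([b])
        if raw.decode("ascii") != raw.decode("latin-1"):
            return False
    return True
```

This returns `True`. The converse does not hold: a byte such as 0xE9 decodes to "é" under ISO-8859-1 but raises `UnicodeDecodeError` under ASCII.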

Lastly, I should say that my overall feeling from reading the spec is that it
seems rather stiff and impenetrable, even when you understand what it is talking
about. I would compare the language used to an ISO standard, which you may take
as a compliment or not according to preference ;-)

--
Chris Lilley, Technical Author
+-------------------------------------------------------------------+
|       Manchester and North HPC Training & Education Centre        |
+-------------------------------------------------------------------+
| Computer Graphics Unit,             Email: Chris.Lilley@mcc.ac.uk |
| Manchester Computing Centre,        Voice: +44 61 275 6045        |
| Oxford Road, Manchester, UK.          Fax: +44 61 275 6040        |
| M13 9PL                            BioMOO: ChrisL                 |
|     URI: http://info.mcc.ac.uk/CGU/staff/lilley/lilley.html       | 
+-------------------------------------------------------------------+
|     "The first W in WWW will not wait."   François Yergeau        |
+-------------------------------------------------------------------+