Re: Comment Declarations

Paul Grosso (paul@arbortext.com)
Mon, 22 May 95 13:33:33 EDT

> From: Glenn Adams <glenn@stonehand.com>
>
> ****** CURRENT TEXT ******
>
> | 2.2.5. Comments
> |
> | To include comments in an HTML document that will be
> | eliminated in the mapping to terminals, surround them with
> | `<!--' and `-->'. After the comment delimiter, all text up
> | to the next occurrence of `-->' is ignored. Hence comments
> | cannot be nested. White space is allowed between the closing
> | `--' and `>', but not between the opening `<!' and `--'.
> |
> | For example:
> |
> | <HEAD>
> | <TITLE>HTML Guide: Recommended Usage</TITLE>
> | <!-- $Id: html-sgml.sgm,v 1.4 1995/05/06 01:44:46 connolly Exp $ -->
> | </HEAD>
> |
> | NOTE - Some historical HTML implementations incorrectly
> | consider any `>' character to be the termination of a
> | comment.
>
>
> ****** REVISED TEXT ******
>
> 2.2.5. Comments
>
> An HTML document may contain comments whose contents shall be ignored
> by user agents. Comments may occur in two forms: as simple comments,
> and as comment declarations.
>
> 2.2.5.1. Simple Comments
>
> A simple comment is delimited on either side by '--'. Simple comments
> do not nest. Simple comments may only occur within markup and may appear
> wherever a parameter separator would occur.
>
> For example:
>
> <!DOCTYPE HTML
> -- Use Experimental DTD --
> PUBLIC "-//MYORG//DTD MY EXPERIMENTAL DTD//EN"
> >
>
> A simple comment is terminated only by '--' or the end of the
> entity, whichever comes first. In particular, no other delimiters
> such as '>' should be recognized in a simple comment.

The reference to "end of the entity" is misleading. It is never correct
for the "closing" COM delimiter to be omitted. I would also consider
adding a note or sentence making it even clearer that the COM delimiter
always terminates an open comment--in particular, surrounding it in single
or double quotes does not make any difference. One possible concrete
rewording:

A simple comment is always and only terminated by the next occurrence
of the '--' comment delimiter. In particular, no other delimiters
such as '>' should be recognized in a simple comment, and the '--'
delimiter is recognized even if contained in quotes.
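
To make the rule in the suggested wording concrete, here is a minimal Python sketch of how a scanner might find the end of a simple comment. The function name read_simple_comment is hypothetical and not taken from any SGML tool; the sketch assumes the text is already positioned at the opening '--'. It looks only for the next '--', gives no special meaning to quotes or to '>', and treats a missing closing delimiter as an error rather than as an implicit end.

```python
def read_simple_comment(text, start=0):
    """Return (body, end) for the simple comment opening at text[start].

    `end` is the index just past the closing '--'.
    """
    if not text.startswith("--", start):
        raise ValueError("no opening '--' at position %d" % start)
    # The very next '--' closes the comment; quotes and '>' are not special.
    close = text.find("--", start + 2)
    if close == -1:
        # The closing delimiter may never be omitted.
        raise ValueError("simple comment is never closed")
    return text[start + 2:close], close + 2
```

For the input `-- a > b "--" tail`, the comment body ends at the quote character just before the quoted '--', since that '--' is the closing delimiter.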

>
> NOTE - a parameter separator currently appears in an HTML
> document only within markup declarations, which, at the
> current time, is limited to the document type declaration.
>
> NOTE - separators occurring within descriptive markup, e.g., in
> tags, may not contain simple comments.
>
> 2.2.5.2. Comment Declarations
>
> When a comment is required in a context where other markup is not used,
> a comment declaration may be used; for example, within the content of
> an element.
>
> A comment declaration is composed of a markup declaration open
> delimiter '<!' followed by one or more simple comments separated by
                                                         ^
                                                         optionally

[see production 91--the (s|comment)* construct makes the
whitespace between what you're calling "simple comments" optional]

> whitespace and terminated by a markup declaration close delimiter '>'.
>
> No whitespace may appear between the '<!' and the start of the first
> simple comment; however whitespace may appear after the last simple
> comment and before the closing delimiter '>'.
>

[I fear your collection of examples (1) is too much and (2) doesn't
clearly enough distinguish between valid and invalid ones. Below
is one possible rewrite.]

> For example:
>
> <!-- This comment declaration contains one simple comment -->
>
> <!-- This comment declaration --
> -- contains --
> -- four simple comments --
> -- and has optional whitespace after the last simple comment --
> >
>
> <! -- This comment declaration is illegal due to whitespace before
> the first simple comment -->
>
> <!-- This comment declaration doesn't terminate properly >
> <P>and this seeming paragraph element should be treated as part of
> the comment -- but this text will not since the preceding '--'
> will terminate the ill-formed simple comment within the comment
> declaration and since the text starting with "but this text..."
> is not whitespace.
>
> <!-- The following comment declaration is illegal: -->
>
> <!----->
>
> <!-- since it is equivalent to the following: -->
>
> <!-- -- ->
>
> <!-- note that the first contained simple comment is well-formed
> but the second is not --
> >
>
> ****** END OF TEXT ******
>

Some examples of valid comment declarations follow:

<!-- This comment declaration contains one simple comment -->

<!-- This comment declaration --
-- contains --
-- four simple comments --
-- and has optional whitespace after the last simple comment --
>

The following are invalid comment declarations:

<! -- This comment declaration is illegal due to whitespace before
the first simple comment -->

<!-- This comment declaration doesn't terminate properly >
<P>and this seeming paragraph element and following data up to
the next double dash comment delimiter should be treated as part
of the comment . . .

<!--
If you use a double dash as punctuation in the text of your
comment, you will create an invalid structure--like this.
-->
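
The valid and invalid cases above can be checked mechanically. What follows is a rough Python sketch, not a conforming SGML parser: the function name is_comment_declaration is hypothetical, the whitespace set (space, tab, CR, LF) is an assumption, and it follows the draft wording in requiring at least one simple comment immediately after '<!'. Whitespace between simple comments is treated as optional, in line with the (s|comment)* construct mentioned earlier.

```python
def is_comment_declaration(text):
    """True if text is exactly one well-formed comment declaration."""
    if not text.startswith("<!--"):
        # Rejects a missing '<!' and whitespace before the first comment.
        return False
    pos = 2  # index just past '<!'
    while True:
        if text.startswith("--", pos):
            # A simple comment runs to the very next '--'.
            close = text.find("--", pos + 2)
            if close == -1:
                return False  # closing '--' omitted
            pos = close + 2
        elif pos < len(text) and text[pos] in " \t\r\n":
            pos += 1  # optional whitespace between or after comments
        else:
            break
    # Only the closing '>' may remain.
    return text[pos:] == ">"
```

A double dash used as punctuation inside the comment text closes the first simple comment early, so the text that follows is neither whitespace nor a new comment and the declaration is rejected.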

paul

Paul Grosso
VP Research, ArborText, Inc.
and
Chief Technical Officer, SGML Open

Email: paul@arbortext.com