Prologue Support [was: SGML Declaration vs. System Declaration]

Dan Connolly (connolly@w3.org)
Mon, 8 May 95 16:58:24 EDT

James Clark writes:
> > Date: Fri, 5 May 95 16:25:39 EDT
> > From: connolly@w3.org (Dan Connolly)
> > (We could put
> > in an application convention that prohibits marked sections without
> > conflicting with SGML: SGML says that an application must specify that
> > all its documents are conforming SGML documents, but not the
> > converse.)
>
> That's not how I would interpret 15.2.2:
>
> A conforming SGML application shall require its documents to be
> conforming SGML documents, and shall not prohibit any markup that this
> International Standard would allow in such documents.
>
> NOTE - For example, an application markup convention could recommend
> that only certain minimization functions be used, but could not
> prohibit the use of other functions if they are allowed by the formal
> specification.

This whole issue is not clear to me at all.

I aim to specify, in the HTML 2.0 document, with SGML as a normative
reference, a language in the formal sense of the word; that is, a set
of strings over some set of symbols.

[The issue of the set of terminal symbols is hairy all by itself, but
for the sake of argument, let's fix the terminal symbols, or alphabet,
at the ISO-8859-1 character repertoire.]

So I stick some public text in the specification -- some "SGML code,"
if you will. Two questions come up: (1) what is the language specified
by that public text, and (2) if I want my language to be a strict
subset of that language, under what circumstances do I still have a
conforming SGML application?

For example, here's a conforming SGML document that is not in the
language that I intend to specify:

<!doctype input public "-//IETF//DTD HTML 2.0//EN">
<input>

So I wrote this in the HTML 2.0 spec:

|To identify information as an HTML document conforming to this
|specification, each document should start with the prologue:
|
|<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
|
|(11)
|
|If the body of a text/html message entity does not begin with a
|document type declaration, an HTML user agent should infer the above
|document type declaration.
|
|HTML user agents are required to support the above document type
|declaration, the following document type declarations, and no others.
|
|<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict//EN">
|<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
|
|In particular, they may support other formal public identifiers, or
|other document types altogether. They may support an internal declaration
|subset with supplemental entity, element, and other markup
|declarations, or they may not.

The idea is that the HTML language is specified as those conforming
SGML documents whose prologue is one of the above, given that the
FPIs resolve to the public text given in the spec.
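
To make that membership test concrete, here is a rough sketch in Python.
It is not an SGML parser: the regular expression, the function name
classify_prologue, and the exact-string comparison of public identifiers
are all assumptions made for illustration. It also takes the stricter
reading in which an internal declaration subset puts a document outside
the language, which is exactly the point in question below.

```python
import re

# The three prologues quoted from the draft above.
ALLOWED_FPIS = {
    "-//IETF//DTD HTML 2.0//EN",
    "-//IETF//DTD HTML 2.0 Strict//EN",
    "-//IETF//DTD HTML Strict//EN",
}
DEFAULT_FPI = "-//IETF//DTD HTML 2.0//EN"

# Hypothetical, simplified pattern. A real SGML parser would also handle
# comments, other literal delimiters, system identifiers, and so on.
DOCTYPE_RE = re.compile(
    r'\s*<!DOCTYPE\s+(?P<name>[A-Za-z][A-Za-z0-9.-]*)\s+PUBLIC\s+'
    r'"(?P<fpi>[^"]*)"\s*(?P<subset>\[)?',
    re.IGNORECASE,
)


def classify_prologue(text):
    """Return (status, fpi); status is 'accepted', 'inferred' or 'rejected'."""
    match = DOCTYPE_RE.match(text)
    if match is None:
        if text.lstrip().upper().startswith("<!DOCTYPE"):
            # Some document type declaration this sketch does not recognize.
            return ("rejected", None)
        # No declaration at all: infer the default, as the draft says.
        return ("inferred", DEFAULT_FPI)
    fpi = match.group("fpi")
    if match.group("name").upper() != "HTML":
        return ("rejected", fpi)  # wrong document element, e.g. INPUT
    if match.group("subset"):
        return ("rejected", fpi)  # assumption: internal subsets excluded
    if fpi not in ALLOWED_FPIS:
        return ("rejected", fpi)
    return ("accepted", fpi)
```

Under these assumptions the "input" example above is rejected because its
document type name is not HTML, even though its public identifier is one
of the three listed.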

On the other hand, if we're not allowed to have application
conventions that prohibit marked sections, then how can we prohibit
internal declaration subsets? In fact, how can we prohibit documents
that define a vastly different grammar by redefining parameter
entities and declaring new element types? Can a conforming SGML
application even specify the document element?

Clues?

Dan