Re: using NOTATIONs inline (Dan Connolly)
Date: Mon, 8 Jun 92 00:17:48 -0500
From: (Dan Connolly)
Subject: Re: using NOTATIONs inline
Newsgroups: comp.text.sgml
Organization: Engineering, CONVEX Computer Corp., Richardson, Tx., USA
In article <> you write:
>Dan Connolly <> writes:
>|   The WWW group is attempting to define a multimedia interchange
>|   format called HTML.  . . .
>Why not use HyTime?
Partly because of ignorance (we've heard of HyTime, but we don't
know the details). I'd expect a HYTIME engine to be quite a bit
of work to implement. And partly because, as I understand it, HYTIME
doesn't go as far as to prescribe a DTD. The WWW project needs
one particular language, not a whole architecture.

I'd certainly like to know more about HYTIME's techniques for addressing
documents, esp. elements of documents.

Now for the WWW gang:
>|   That is, is it possible to put an arbitrary 8 bit binary stream
>|   _inside_ an SGML document? My guess is: no. But if we use
>|   CDATA, can we include anything that doesn't contain the closing
>|   tag in full?
>If by "the closing tag in full" you mean the entire end-tag, complete
>with etago, generic identifier, and tagc, as in "</image>", this is not
>the way SGML does it.  CDATA and SDATA are terminated by an etago
>"delimiter-in-context", which is an etago (end-tag open, "</") delimiter
>followed by a name start character, or a grpo (group open, "(")
>delimiter if concurrent document types are allowed.  In the reference
>concrete syntax, this means that the regular expression "</[(a-z]"
>matches the end of CDATA and SDATA elements.
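
To make the termination rule above concrete, here is a minimal Python
sketch that applies the quoted regular expression to find where CDATA
content would end. The function name cdata_end is hypothetical, and
matching upper-case letters as name start characters is an assumption
on my part; the quoted pattern only lists a-z.

```python
import re

# The pattern quoted above for the reference concrete syntax: "</" followed
# by a name start character or "(".  IGNORECASE is an assumption here; the
# quoted pattern lists lower-case letters only.
ETAGO_IN_CONTEXT = re.compile(r"</[(a-z]", re.IGNORECASE)


def cdata_end(content: str) -> int:
    """Return the index at which CDATA content would be terminated, or -1.

    Hypothetical helper: it does not check whether what follows is a legal
    end-tag, only whether the delimiter-in-context occurs.
    """
    match = ETAGO_IN_CONTEXT.search(content)
    return match.start() if match else -1


if __name__ == "__main__":
    # "</x" ends the data even though the intended end-tag is "</image>".
    print(cdata_end("abc</x def</image>"))
    # "</" followed by a space or a digit is not a delimiter-in-context.
    print(cdata_end("a </ b </3 c"))
```

The point of the sketch is that the data stops at the first "</" plus
letter, whatever tag name follows, so arbitrary binary content cannot be
trusted to survive.
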
>You can also use marked sections, with a CDATA status keyword, in which
>case the CDATA is terminated by the mse delimiter (marked section end,
>|   Someone made the point that an SGML document is only allowed to
>|   include SGML characters as specified by the SGML declaration, and if
>|   we're going to use the default SGML declaration, we have to stick to
>|   the characters blessed by it.
>Blessed and blessed.  The SGML declaration is supposed to reflect the
>reality of the document, not enforce arbitrary limits on it.  So you
>write an SGML declaration which fits the document.
>|   That's not my understanding. I thought that inside CDATA (or SDATA,
>|   I think) you could put _anything_ but the closing tag in full.
>As said above, the etago delimiter-in-context terminates the data,
>regardless of whether it's a legal end-tag in that context.
>You should be aware that the SGML parser will parse the contents of the
>"binary" content, and ignore record start, and treat record ends
>differently from other characters.  In addition, it's an error for an SGML
>entity to contain characters with any of the numbers listed in the
>SHUNCHAR part of the SYNTAX declaration.  This is _not_ what you want
>with binary data.
>|   What's the scoop? Do we have to use external entities for raw data?
>Yes.  An external entity that is not an SGML text entity requires a
>notation identifier, so you only need to list the entities in the DTD,
>with notation, and refer to them by name in the document instance.
>If this is not satisfactory, you should declare the objects to be CDATA,
>and use a binary to text-only transformation scheme.  There are several
>such schemes.  Among them, base64 is the preferred encoding in my view,
>since it's available as part of the new Multipurpose Internet Mail
>Extensions (MIME) RFC-to-be.  (The latest draft is available for
>anonymous FTP as and MIME.6.txt for
>two weeks from today.  Section 5.2 which concerns the base64 encoding is
>also available as  Transformation
>back to the binary form from the text-only form may be done on the fly
>by the application before sending the data to the notation interpreter.
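
The base64 route described above can be sketched in Python with the
standard base64 module. This is only an illustration under assumptions:
the helper names encode_for_cdata and decode_from_cdata are made up, and
the sample bytes are arbitrary. Since the base64 alphabet contains no
"<", the encoded text can never contain an etago delimiter.

```python
import base64


def encode_for_cdata(raw: bytes) -> str:
    """Turn raw bytes into line-wrapped base64 text (hypothetical helper)."""
    # encodebytes() inserts a newline after every 76 output characters.
    return base64.encodebytes(raw).decode("ascii")


def decode_from_cdata(text: str) -> bytes:
    """Recover the raw bytes before handing them to a notation interpreter."""
    # b64decode() discards characters outside the base64 alphabet by default,
    # so the line breaks added by encodebytes() are harmless here.
    return base64.b64decode(text)


if __name__ == "__main__":
    # Arbitrary sample data: every byte value, plus a sequence that would
    # terminate CDATA if it appeared unencoded.
    raw = bytes(range(256)) + b"</image>"
    text = encode_for_cdata(raw)
    assert "<" not in text              # no etago can occur in the encoding
    assert decode_from_cdata(text) == raw
    print(text.splitlines()[0])
```
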
My idea is to use MIME encodings, but put these attachments _outside_
the SGML text, in an attached (or external) body part.
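
A rough sketch of that arrangement, using Python's standard email
package: the SGML text travels as one body part and the raw data as a
separate, base64-encoded part. Everything specific here is an assumption
for illustration only: the function name build_message, the entity name
fig1, the file name figure1.gif, the gif notation, and the sample
markup are all hypothetical, not part of any WWW or MIME proposal.

```python
from email.message import EmailMessage

# Hypothetical SGML text: the image is declared as an external entity with
# a notation and referred to by name, instead of being embedded inline.
SGML_TEXT = """<!DOCTYPE doc [
<!NOTATION gif SYSTEM "GIF">
<!ENTITY fig1 SYSTEM "figure1.gif" NDATA gif>
]>
<doc><p>See the figure.<image src=fig1></doc>
"""


def build_message(sgml_text: str, image_bytes: bytes) -> EmailMessage:
    """Bundle SGML text and a binary attachment as separate body parts."""
    msg = EmailMessage()
    msg["Subject"] = "Document with external image"
    # First body part: the SGML text itself.
    msg.set_content(sgml_text, subtype="sgml")
    # Second body part: the raw data; bytes content is base64-encoded
    # by the email package's default content manager.
    msg.add_attachment(image_bytes, maintype="image", subtype="gif",
                       filename="figure1.gif")
    return msg


if __name__ == "__main__":
    fake_image = b"GIF89a" + bytes(range(256))  # placeholder bytes, not a real GIF
    message = build_message(SGML_TEXT, fake_image)
    for part in message.iter_parts():
        print(part.get_content_type(), part["Content-Transfer-Encoding"])
```
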

>In addition to being much easier to deal with in SGML, this also makes
>SGML documents containing such content robust with respect to file
>transfer, etc.
>Hope this helps,

Thanks. Mostly it confirms my suspicions, but it should also provide
a somewhat authoritative answer (no references to ISO 8879 here :-)
to the WWW project.

>Erik Naggum       |  +47-295-0313     |  ISO 8879 SGML     |  Memento,
>Naggum Software   |   "fuzzface"      |  ISO 10744 HyTime  |  terrigena.
>Boks 1570, Vika   | <>  |  JTC 1/SC 18/WG 8  |  Memento,
>0118 OSLO, NORWAY | <> |  SGML UG SIGhyper  |  vita brevis.