Re: comments on the DTD in Nov 16 draft

Daniel W. Connolly (connolly@hal.com)
Fri, 25 Nov 94 19:12:34 EST

In message <9411251913.AA08469@texcel.no.texcel.no>, Paul Grosso writes:
>>
>> >Creating a copy of ISO Added Latin 1 character entity declaration set
>> >---------------------------------------------------------------------
>> >
>> >Why is the HTML spec creating a new character entity set that is
>> >basically identical to the ISO set (as far as I can tell) and giving it
>> >a different FPI?
>>
>> With the above clarification, could you "get into the details of
>> a suggestion" please?
>
>after discussion with Dave Raggett and James Clark, i stand down from
>my earlier misgivings. provided that the idea is still to put &Agrave;
>in the document rather than &#192; or whatever. that way, if i receive
>an HTML document, i can process it with MY HTML doctype which i will
>create by reading my copy of ISO Latin 1 SDATA entity declarations
>in place of the HTML Latin 1.

An HTML document that you receive may have &Agrave; or it may very
well have &#192; in it. How will your system process &#192;? The
document character set does include the "Right Part of Latin Alphabet
Nr. 1" after all.

For the purposes of HTML, the Added Latin 1 entity set was not meant
to extend the expressive capability of the language, but only to
provide mnemonics, and to facilitate 7-bit clean transmission.
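
To make the "mnemonics" and "7-bit clean" points concrete, here is a
minimal sketch in Python. It uses the standard library's html module as
a stand-in for an HTML entity table (an assumption: that table is a much
later one than the entity set under discussion, and only the &Agrave;
name is relied on here). The helper name to_7bit is hypothetical. It
shows that the named and numeric forms are two spellings of the same
character, and that either spelling keeps the transmitted text in ASCII.

```python
from html import unescape
from html.entities import codepoint2name


def to_7bit(text: str, use_names: bool = True) -> str:
    """Rewrite every non-ASCII character as an entity or numeric reference.

    Hypothetical helper for illustration only. It does not escape '&' or
    '<' already present in the input.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                      # already 7-bit clean
        elif use_names and cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # mnemonic form
        else:
            out.append(f"&#{cp};")              # numeric character reference
    return "".join(out)


named = to_7bit("\u00c0 la carte")                     # uses &Agrave;
numeric = to_7bit("\u00c0 la carte", use_names=False)  # uses &#192;

# Both spellings are pure ASCII and decode to the same character.
assert named.isascii() and numeric.isascii()
assert unescape(named) == unescape(numeric) == "\u00c0 la carte"
```

A receiving system therefore has to handle both spellings; swapping in a
different set of entity declarations changes only how the named form is
resolved.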

I am more confused by the moment...

What exactly is standardized by the ISO Added Latin 1 entity set?
The entity names, the SDATA entities in the public text, the bindings
between them, or all of the above?

I thought it was just the names; i.e. if a DTD includes, by reference,
the ISO Added Latin 1 entity set, then &Agrave; is valid markup --
but there is no telling how it is represented in the ESIS.

I thought the public text was intended to be edited locally for
each SGML system; that the SDATA "[agrave ]" thingies were supposed
to be replaced by SDATA "esc-my-typesetter's-agrave-thingy" on
a per-system basis.

Is it the case, rather, that all SGML systems (that intend to support
the ISO entity sets) are supposed to recognize an SDATA entity of
"[agrave ]" as some sort of special character?

Please explain. I've seen numerous discussions on comp.text.sgml, and
I think they have served only to confuse me.

Dan

>> >CDATA terminated by any ETAGO
>> >-----------------------------
>i'd suggest
>
><!ENTITY % literal "CDATA"
> -- historical, non-conforming parsing mode where
> the only markup signal is the first end tag (for
> any element) encountered and this terminates the
> <literal> element
> -->
>
>of course, this assumes you want to describe what SGML will do.
>if the comment is supposed to be describing what browsers do,
>then ignore my suggestion.

It's supposed to be describing what browsers do, so I guess I'll
ignore this suggestion. There wouldn't be anything "non-conforming"
about a content model where ETAGO delimiter-in-context was the only
markup signal. That's just plain CDATA. This is "a historical,
non-conforming parsing mode..."

Strictly speaking, this isn't allowed in "An SGML Application Conforming
to ..." so it's really just a "Note to implementors." Perhaps I'll
color it that way more explicitly.
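
The distinction can be sketched in Python. The first function follows
the SGML rule for CDATA content: it ends at the first ETAGO
delimiter-in-context, i.e. "</" followed by a name start character
(taken here to be an ASCII letter, as in the reference concrete syntax).
The second function is an assumption about the historical mode, which
this message does not spell out: it treats only the full end tag of the
element itself as markup. Both function names are hypothetical.

```python
import re

# "</" immediately followed by a name start character (an ASCII letter here).
ETAGO_IN_CONTEXT = re.compile(r"</(?=[A-Za-z])")


def sgml_cdata_end(text: str, start: int = 0) -> int:
    """Index where SGML CDATA content ends: the first ETAGO in context."""
    m = ETAGO_IN_CONTEXT.search(text, start)
    return m.start() if m else len(text)


def full_end_tag_end(text: str, element: str, start: int = 0) -> int:
    """Index where content ends if only the element's own end tag counts.

    Assumed model of the historical browser behaviour, for illustration.
    """
    pattern = re.compile(r"</" + re.escape(element) + r"\s*>", re.IGNORECASE)
    m = pattern.search(text, start)
    return m.start() if m else len(text)


content = "if (a </b) { print('</p>'); }</xmp>"

# Plain CDATA: the stray "</b" already terminates the content.
assert content[:sgml_cdata_end(content)] == "if (a "

# Assumed historical mode: everything up to "</xmp>" is literal text.
assert content[:full_end_tag_end(content, "xmp")] == "if (a </b) { print('</p>'); }"
```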

Dan