Re: Revised language on: ISO/IEC 10646 as Document Character Set

Erik van der Poel (erik@netscape.com)
Tue, 9 May 95 16:42:21 EDT

>OK. So I'm saying it has been put somewhere. Does this address
>your concern, or would you like to suggest something more?

Perhaps it would be clearer if we had some wording in there that
explained the relationship between "charset" and "document character
set"? MIME has an appendix on the canonical encoding model to deal
with the confusion surrounding CRLF, Base64, Quoted-Printable, etc:

Step 1. Creation of local form.
...
Step 2. Conversion to canonical form.
...
Step 3. Apply transfer encoding.
...
Step 4. Insertion into entity.
...

Perhaps there should be something like that in the HTML spec:

Step 1. Receive document in "charset"
Step 2. Convert document from "charset" to processing character set
Step 3. Parse document, converting numeric character references
        from the 10646 document character set to the processing
        character set
etc
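
To make the distinction concrete, here is a rough sketch of those three
steps in Python. It is only an illustration, not wording proposed for
the spec. It assumes the processing character set is Unicode (Python's
str type) and handles only decimal references of the form &#NNN;. The
function names are made up for this example.

```python
import re

# Decimal numeric character references, e.g. "&#233;".
_NUMERIC_REF = re.compile(r"&#([0-9]+);")


def decode_document(raw: bytes, charset: str) -> str:
    """Steps 1 and 2: take the bytes as received in `charset` and
    convert them to the processing character set (assumed here to be
    Unicode, i.e. a Python str)."""
    return raw.decode(charset)


def resolve_numeric_refs(text: str) -> str:
    """Step 3 (in part): interpret each numeric reference as a code
    position in the 10646 document character set, regardless of the
    charset the document arrived in."""
    def _replace(match: re.Match) -> str:
        code = int(match.group(1))
        # Leave references outside the range chr() accepts untouched.
        return chr(code) if code <= 0x10FFFF else match.group(0)

    return _NUMERIC_REF.sub(_replace, text)


def process(raw: bytes, charset: str) -> str:
    """Run the steps in order: decode first, then resolve references."""
    return resolve_numeric_refs(decode_document(raw, charset))


# The same document sent in two different charsets.
as_latin1 = "\u00e9 = &#233;".encode("iso-8859-1")
as_utf8 = "\u00e9 = &#233;".encode("utf-8")
same = process(as_latin1, "iso-8859-1") == process(as_utf8, "utf-8")
```

The point of the sketch is that "charset" only affects the first two
steps, while &#233; means the same character in both cases because it
is always read against the document character set.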

Or is this already in the spec? It's hard to tell since the spec
is divided into several pieces and I can't do a search on "charset"
across the whole thing. Do you have an updated *.txt file yet?
(Just asking.)

>The consensus I've heard is that ISO10646 is "good enough" and that
>nobody is interested in HTML-based communications using characters
>that are not in ISO10646. Could you give a motivating example of the
>sorts of things that you want to do, and tell me whether you think it
>conflicts with the current wording of the HTML 2.0 document?

Actually, it's not that *I* want to use chars not in 10646. My concern
is that the HTML spec should not attempt to restrict people from using
charsets that *they think* (this is key) are "richer" than 10646.
What's the point of restricting the charset to subsets of 10646?

MIME is in many ways just a framework. One of the WG's decisions was
not to restrict the charset -- instead, people would be allowed to
register charsets. I think HTML should similarly avoid restricting
the charset.

Erik