Short answer: yes.
Long answer:
> Just to clarify things in my mind, would the following be allowed
> in your world?
OK. Good. I like specific examples. They tend to elucidate a lot of
subtleties.
> HTTP headers followed by HTML document:
>
> HTTP/1.0 200 OK
> Date: Saturday, 29-Apr-95 03:53:33 GMT
> Server: ...
> MIME-version: 1.0
> Content-Type: text/html; charset=iso-2022-jp
> Last-modified: Tuesday, 18-Apr-95 16:10:13 GMT
> Content-length: 15132
>
> <TITLE>...</TITLE>
> <BODY>
> Here is some normal text.
> Here is a 10646 numerical entity: &#23598732;.
> Here is some ISO-2022-JP text: ...
> </BODY>
>
OK... so what we have above is an HTTP response, which is a status
line followed by what's called (in MIME and HTTP) a message entity.
To interpret the message entity, you look at the Content-Type. It
says "text/html". So you look at the HTML spec. My working draft (to
be released ASAP!) says:
|3.2 HTML Document Representation
|
| A message entity with a content type of "text/html" represents an HTML
| document, consisting of a single text entity. The charset parameter
| (whether implicit or explicit) identifies a character encoding. The
| text entity consists of the characters determined by this character
| encoding and the octets of the body of the message entity.
So we take the charset parameter, iso-2022-jp, and we use that
to map the octets of the body of the message entity to a sequence
of characters.
During this step, the octets represented by '...' in:
> Here is some ISO-2022-JP text: ...
turn into characters. Nothing surprising happens to this stuff yet:
> Here is a 10646 numerical entity: &#23598732;.
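To make that step concrete, here's a minimal sketch in Python (my
choice for illustration, nothing the spec mandates). The function name
decode_entity_body and the iso-8859-1 fallback for a missing charset
parameter are assumptions, and the sample body uses a small stand-in
number (26085) for the numeric reference rather than the one quoted
above.

```python
from email.message import Message


def decode_entity_body(content_type: str, body: bytes,
                       default_charset: str = "iso-8859-1") -> str:
    """Map the octets of a message body to characters via the charset parameter.

    `default_charset` is an assumed fallback for the implicit-charset case.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset parameter in lower case,
    # or the fallback when the header carries no charset parameter.
    charset = msg.get_content_charset(default_charset)
    return body.decode(charset)


# Hypothetical sample body: ASCII text plus three kanji, encoded as ISO-2022-JP.
sample_text = (
    "Here is a 10646 numerical entity: &#26085;.\n"
    "Here is some ISO-2022-JP text: \u65e5\u672c\u8a9e\n"
)
sample_octets = sample_text.encode("iso-2022-jp")

decoded = decode_entity_body("text/html; charset=iso-2022-jp", sample_octets)
```

The kanji come back as characters, while the numeric reference is
still just the characters "&", "#", some digits, and ";". Nothing has
interpreted it yet.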
OK. Now we have a text entity: a sequence of characters. To parse
as per ISO8879, we need to know the document character set. In
the internationalization document (which I don't have handy... sorry)
we're specifying that the document character set for HTML is ISO10646.
So to interpret:
> Here is a 10646 numerical entity: &#23598732;.
We look up 23598732 in the ISO10646 specification, and see what
character it maps to.
Simple, no?
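A rough sketch of that lookup, again in Python and again only an
illustration. The name resolve_numeric_refs is made up, the pattern
handles only the simple "&#digits;" form, and Python's chr() stands in
for the ISO10646 table. One caveat: chr() accepts code points only up
to 0x10FFFF, so a number as large as 23598732 is left untouched by
this sketch rather than mapped to anything.

```python
import re

# Largest code point Python's chr() accepts; an implementation limit of
# this sketch, not something taken from the HTML draft.
PYTHON_MAX_CODE_POINT = 0x10FFFF

# Assumed simplification: only decimal references terminated by ";".
_NUMERIC_REF = re.compile(r"&#([0-9]+);")


def resolve_numeric_refs(text: str) -> str:
    """Replace each &#N; with the character numbered N in the document character set."""

    def replace(match: re.Match) -> str:
        code = int(match.group(1))
        if code <= PYTHON_MAX_CODE_POINT:
            return chr(code)
        # Out of range for chr(): leave the reference as written.
        return match.group(0)

    return _NUMERIC_REF.sub(replace, text)


resolved = resolve_numeric_refs("Here is a 10646 numerical entity: &#26085;.")
```

The point is that the number is looked up in the document character
set, no matter which charset the octets arrived in.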
> If this is allowed, I agree that this would be a good way to migrate
> to the Brave New World of 10646.
One by one, we're all coming to this very conclusion.
Dan