Re: ISO/IEC 10646 as Document Character Set

Glenn Adams (glenn@stonehand.com)
Thu, 4 May 95 12:24:19 EDT

From: erik@netscape.com (Erik van der Poel)
Date: Thu, 04 May 95 08:52:54 -0700

> This RFC specifies a document character set which is used in the
> interpretation of characters in the document entity and in the
> entities referenced from the document entity. This document
> character set is ISO/IEC 10646-1:1993.

Perhaps specify UCS-4? (As opposed to UCS-2.)

The encoding form is irrelevant. By definition, specifying 10646 as
the document character set provides potential access to all coded characters
in the standard.
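
A minimal sketch of that point in Python, under stated assumptions: the sample markup, the `code_points` helper, and the use of Python's `utf-16-be` and `utf-32-be` codecs as stand-ins for UCS-2 and UCS-4 are illustrative choices, not anything taken from the draft RFC. It shows the same document resolving to the same 10646 code points whichever encoding carried it, including plain ASCII through a numeric character reference.

```python
import html

# Hypothetical sample document; &#20013; refers to code position 20013 (U+4E2D).
DOC = "<p>caf\u00e9 &#20013;</p>"

def code_points(raw: bytes, charset: str) -> list[int]:
    """Decode the entity, resolve character references, return code points."""
    text = raw.decode(charset)
    return [ord(ch) for ch in html.unescape(text)]

# Three representations of the same document entity.
encodings = ["utf-8", "utf-16-be", "utf-32-be"]
results = {enc: code_points(DOC.encode(enc), enc) for enc in encodings}

# A pure ASCII entity can still reach U+4E2D through a numeric reference.
ascii_points = code_points(b"<p>&#20013;</p>", "ascii")
```

The byte sequences differ in length and content, but every entry in `results` is the same list of code points, which is the sense in which the encoding form does not affect the document character set.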

> This RFC does not specify the actual character set or character
> encoding scheme used in the representation of the document entity
> or any referenced entity. It is the responsibility of communicating
> agents to agree upon an actual character set or encoding scheme.
> The manner in which such an agreement is negotiated is outside the
> scope of this RFC.

If the RFC is intended to be the spec for the "text/html" Internet
Media Type (as used in MIME and HTTP), then it should say *something*
about charset.

The HTML RFC should not say anything about charset, for the reason that Larry pointed out.
The encoding form is related only to the representation of an entity;
therefore, it is unrelated to the document character set.

The right place to specify this is in the HTTP RFC and/or along with other
transport specs.
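
As a rough illustration of leaving this to the transport layer, the sketch below reads the encoding from a MIME-style Content-Type label and only then decodes the entity. The `decode_entity` helper and its ISO-8859-1 fallback are assumptions made for the example, not something either RFC is known to require here.

```python
from email.message import Message

def decode_entity(body: bytes, content_type: str,
                  default: str = "iso-8859-1") -> str:
    """Decode an entity body using the charset named by the transport label.

    The fallback charset is a hypothetical default chosen for this sketch.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or default
    return body.decode(charset)

# Same characters, two different representations, each labeled by the transport.
text_a = decode_entity("caf\u00e9".encode("utf-8"), "text/html; charset=UTF-8")
text_b = decode_entity("caf\u00e9".encode("iso-8859-1"),
                       "text/html; charset=ISO-8859-1")
```

Both calls yield the same character string, so the HTML layer never needs to know which encoding was on the wire.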

Glenn