> This RFC specifies a document character set which is used in the
> interpretation of characters in the document entity and in the
> entities referenced from the document entity. This document
> character set is ISO/IEC 10646-1:1993.
Perhaps specify UCS-4? (As opposed to UCS-2.)
The encoding form is irrelevant. By definition, specifying 10646 as
the document character set provides potential access to all coded characters
in the standard.
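
A minimal sketch of that point, not part of the RFC text. It assumes Python's "utf-16-be" and "utf-32-be" codecs as stand-ins for UCS-2 and UCS-4 (they coincide for characters in the Basic Multilingual Plane), and the sample string and helper name are purely illustrative:

```python
# Sample characters, all in the BMP (an illustrative choice).
text = "caf\u00e9 \u65e5\u672c"

# Assumption: utf-16-be / utf-32-be stand in for UCS-2 / UCS-4 here.
ucs2_bytes = text.encode("utf-16-be")
ucs4_bytes = text.encode("utf-32-be")

def code_points(data, codec):
    """Decode bytes and return the 10646 code positions they represent."""
    return [ord(ch) for ch in data.decode(codec)]

cp2 = code_points(ucs2_bytes, "utf-16-be")
cp4 = code_points(ucs4_bytes, "utf-32-be")

# The byte sequences differ in length, but the coded characters are identical.
print(len(ucs2_bytes), len(ucs4_bytes), cp2 == cp4)
```

The octets on the wire differ between the two forms, while the sequence of coded characters, which is all the document character set is concerned with, does not.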
> This RFC does not specify the actual character set or character
> encoding scheme used in the representation of the document entity
> or any referenced entity. It is the responsibility of communicating
> agents to agree upon an actual character set or encoding scheme.
> The manner in which such an agreement is negotiated is outside the
> scope of this RFC.
If the RFC is intended to be the spec for the "text/html" Internet
Media Type (as used in MIME and HTTP), then it should say *something*
about charset.
The HTML RFC should not say anything about charset, for the reason that Larry pointed out.
The encoding form is related only to the representation of an entity;
therefore, it is unrelated to the document character set.
The right place to specify this is in the HTTP RFC and/or along with other
transport specs.
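
A rough sketch of that division of labour, under stated assumptions: the charset label arrives in a transport-level Content-Type header, the fallback to ISO-8859-1 when no label is present is an assumption made for the example, and the function name and sample document are hypothetical. Numeric character references are then resolved against 10646 code positions regardless of how the entity was encoded:

```python
import html
from email.message import Message

def decode_entity(body: bytes, content_type: str) -> str:
    """Decode an entity using the charset labelled by the transport."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Assumption: fall back to ISO-8859-1 when no charset parameter is given.
    charset = msg.get_content_charset() or "iso-8859-1"
    return body.decode(charset)

source = "<p>caf\u00e9 caf&#233;</p>"

# Same document, two different representations on the wire.
latin1_doc = source.encode("iso-8859-1")
utf8_doc = source.encode("utf-8")

# html.unescape resolves &#233; to code position 233 (U+00E9) in both cases.
a = html.unescape(decode_entity(latin1_doc, "text/html; charset=iso-8859-1"))
b = html.unescape(decode_entity(utf8_doc, "text/html; charset=utf-8"))

print(latin1_doc != utf8_doc, a == b)
```

The reference `&#233;` means the same character in both entities because it is interpreted against the document character set, while the charset parameter only tells the receiver how to turn octets back into characters.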
Glenn