Re: ISO/IEC 10646 as Document Character Set

Glenn Adams (glenn@stonehand.com)
Wed, 3 May 95 23:47:50 EDT

From: Larry Masinter <masinter@parc.xerox.com>
Date: Wed, 3 May 1995 19:43:30 PDT

> 1. We need language in the RFC that specifies what to do in the default
> case that the CHARSET parameter is not present in the Content-Type
> response.

Actually, I disagree, if by "the RFC" you mean the HTML RFC. I think
this is transport dependent. If you get HTML by SMTP mail, the default
should be what the mail transport says it is (US-ASCII) while if you
get something by HTTP the default can be something else.
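
To make the transport-dependence concrete, here is a minimal sketch of a default that is chosen by the transport rather than by HTML. The function name, the table, and the ISO-8859-1 value for HTTP are illustrative assumptions; the only default stated above is US-ASCII for mail.

```python
from typing import Optional

# Hypothetical per-transport defaults. US-ASCII for mail comes from the
# message above; ISO-8859-1 for HTTP is an assumed stand-in for
# "something else".
TRANSPORT_DEFAULTS = {
    "smtp": "us-ascii",
    "http": "iso-8859-1",
}


def effective_charset(transport: str, declared: Optional[str] = None) -> str:
    """Return the charset to assume for an HTML entity received over `transport`.

    An explicit CHARSET parameter always wins; otherwise the transport's own
    default applies. HTML itself contributes no default.
    """
    if declared:
        return declared.strip().lower()
    key = transport.lower()
    if key not in TRANSPORT_DEFAULTS:
        raise ValueError(f"no default charset known for transport {transport!r}")
    return TRANSPORT_DEFAULTS[key]
```

The point of the lookup table is that adding a new transport means adding a row, not changing anything about HTML.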

How about if we don't try to solve this problem in the HTML working
group?

You are partly correct in throwing this out of the HTML arena. However,
the HTML spec *should* say something about any assumptions (or lack of
assumptions) about the character encoding used in the representation
of the document entity or of the collection of entities that make up an
HTML document.

Thus, the HTML RFC should say something like:

This RFC specifies a document character set which is used in the
interpretation of characters in the document entity and in the
entities referenced from the document entity. This document
character set is ISO/IEC 10646-1:1993.

This RFC does not specify the actual character set or character
encoding scheme used in the representation of the document entity
or any referenced entity. It is the responsibility of communicating
agents to agree upon an actual character set or encoding scheme.
The manner in which such an agreement is negotiated is outside the
scope of this RFC.

How's that?
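
As a rough illustration of the distinction between the document character set and the actual encoding, here is a minimal sketch. It assumes Python's `str` type, whose code points coincide with ISO/IEC 10646 code positions; the function name and the sample bytes are made up for the example.

```python
import html


def to_document_characters(raw: bytes, agreed_encoding: str) -> str:
    """Map an entity's bytes into the document character set.

    `agreed_encoding` stands for whatever the communicating agents agreed on
    outside of HTML. Numeric character references are then resolved against
    the document character set, not against that encoding.
    """
    text = raw.decode(agreed_encoding)  # actual encoding -> 10646 code points
    return html.unescape(text)          # &#945; -> code position 945


# Byte 0xE1 is GREEK SMALL LETTER ALPHA in ISO-8859-7; &#945; names the same
# character by its ISO/IEC 10646 code position.
greek = to_document_characters(b"\xe1 = &#945;", "iso-8859-7")

# In an ISO-8859-1 entity the reference still means alpha, even though that
# encoding cannot represent alpha directly.
latin = to_document_characters(b"&#945;", "iso-8859-1")
```

Both calls end up with the same character for `&#945;`, which is what fixing the document character set independently of the encoding is meant to guarantee.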

Glenn