Re: Revised language on: ISO/IEC 10646 as Document Character Set

Albert Lunde (
Wed, 10 May 95 02:41:35 EDT

> So, how about defining the "document character set" to be the union
> of the "charset" and 10646? Numeric character references could use
> 10646 codepoints for 10646 characters, and the characters not in 10646
> could have other numbers (the actual numbers would be outside the
> scope of the HTML spec). Use of non-10646 numeric char refs could be
> discouraged or even prohibited, if the WG feels this is necessary.
> >255 to over 30 thousand. If this turns out not to be sufficient,
> >I'm sure we can do an extension mechanism or format negotiation
> >to allow for use of a different document character set.
> Hmmm... perhaps that extension mechanism would be called "charset"? :-)

Well, it's not a simple-minded use of "charset".

I don't know whether the gimmick you suggest above is feasible. One
advantage of a fixed document charset is that you can convert from
one character encoding to another in a simple-minded way without
losing the meaning of numeric references. But the encodings you have
in mind are large enough that possibly no one would convert them to
anything else, and your scheme would still preserve the numeric
references to 10646 characters.
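
To illustrate the point about conversion, here is a minimal sketch in
Python. It assumes decimal numeric references only, and the helper
names (resolve_numeric_refs, transcode) are made up for this example;
Python's chr() stands in for the 10646 repertoire. The document is
transcoded from ISO-8859-1 to UTF-8 without touching the references,
and they still resolve to the same characters.

```python
import re


def resolve_numeric_refs(text: str) -> str:
    """Replace decimal references such as &#233; with the character
    at that code point in the document character set (assumed here
    to be ISO/IEC 10646, which chr() follows)."""
    return re.sub(r"&#([0-9]+);", lambda m: chr(int(m.group(1))), text)


def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Simple-minded conversion: decode from one encoding, encode in
    another. Numeric references are plain ASCII and pass through."""
    return data.decode(src).encode(dst)


# One literal e-acute plus two numeric references (233 and 948).
source_text = "caf\u00e9 &#233; &#948;"

latin1 = source_text.encode("iso-8859-1")
utf8 = transcode(latin1, "iso-8859-1", "utf-8")

# The byte streams differ, but the references mean the same thing.
resolved_latin1 = resolve_numeric_refs(latin1.decode("iso-8859-1"))
resolved_utf8 = resolve_numeric_refs(utf8.decode("utf-8"))
```

If the numbers instead referred to positions in whatever "charset"
the document happened to arrive in, the transcoding step would have
to find and rewrite every reference.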

Even if this is feasible, I'd favor leaving the requirement for
10646 in the HTML 2.0 spec, and resolving this kind of extension
in an internationalization document.

For what it's worth, prior discussion of the creation of new Chinese
characters seems to pop up in the archives in early Feb at:

    Albert Lunde