Re: charset parameter (long)

Larry Masinter (masinter@parc.xerox.com)
Fri, 13 Jan 95 15:13:16 EST

There was a lot of confusion about the restriction that the current
draft describes levels 0, 1, and 2 and is compatible only with
charset=ISO-8859-1 and charset=US-ASCII. I thought, though, that we
had agreed that level 2 would describe roughly the "current practice"
common throughout the web, and that it would take level
2.1 or some such to allow other charset parameters.

#>Also, do we really want to get into the business of multi-charsets w/in 1
#>document??

# Emphatically yes!

Emphatically no! That is, you may well want to support multiple
scripts within the same document (English, Russian, Greek, Japanese),
but you can do that using a single charset which covers all of those
scripts.
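
A minimal sketch of the single-charset approach, in Python. UTF-8 is
used purely as an assumed example of one charset that covers several
scripts; the message above does not name a specific charset, and the
sample string is made up for illustration.

```python
# Illustration only: UTF-8 is an assumed example of a multi-script charset.
text = "English, Русский, Ελληνικά, 日本語"

# One charset label describes the encoding of the whole document.
encoded = text.encode("utf-8")
decoded = encoded.decode("utf-8")
roundtrip_ok = decoded == text

# A charset limited to Western European characters cannot carry the same text.
try:
    text.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
```

The document needs only one charset label, and no switching happens
inside the markup.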

Trying to handle character-encoding at the SGML level is a mistake,
because the SGML tags themselves are represented in the overall charset.
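
To see why, here is a small Python sketch. The two charsets and the
sample markup are assumptions chosen for illustration: the same tags
come out as different byte sequences depending on the charset, so a
parser cannot find a tag (or any in-document declaration) until the
bytes have already been decoded.

```python
# Illustration only: sample markup and charsets are assumed, not from the draft.
markup = "<em>text</em>"

ascii_bytes = markup.encode("us-ascii")
utf16_bytes = markup.encode("utf-16-le")

# The literal ASCII byte pattern for the tag appears in one stream only.
tag_visible_in_ascii = b"<em>" in ascii_bytes
tag_visible_in_utf16 = b"<em>" in utf16_bytes

# Once decoded with the correct charset, the tag is visible again.
tag_after_decode = "<em>" in utf16_bytes.decode("utf-16-le")
```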

Now, you might be able to use multiple encodings within the same
*document* by using some kind of inclusion mechanism, and having the
included data stream be encoded in a different character encoding than
the top-level one, but you want the encoding to be at the level of
granularity of charset negotiation.

For the discussion on character sets, think of documents being
represented at three levels:

  entity:     a stream of entities,
                represented in SGML by
  character:  a stream of characters,
                represented by a character encoding (charset) by
  byte:       a stream of bytes.

Don't try to express changes at the byte->character level through
declarations in the character->entity stream.
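
The three levels can be sketched as two separate steps in Python. This
is only a rough illustration: the function names are hypothetical, the
charset is assumed to arrive from outside the document (for example from
the charset parameter), and the standard library's html.unescape stands
in for real SGML entity resolution.

```python
import html

def bytes_to_characters(data: bytes, charset: str) -> str:
    """byte -> character: needs a charset supplied from outside the document."""
    return data.decode(charset)

def characters_to_entities(chars: str) -> str:
    """character -> entity: resolve character and entity references.

    html.unescape is a stand-in here, not a full SGML parser.
    """
    return html.unescape(chars)

def read_document(data: bytes, charset: str) -> str:
    # The charset is fixed before any markup is looked at.
    return characters_to_entities(bytes_to_characters(data, charset))

raw = b"<p>caf&eacute; &#169;</p>"
resolved = read_document(raw, "iso-8859-1")
```

Nothing in the second step can change how the first step was carried
out, which is why the charset has to be settled before the markup is
read.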