Re: Charset labelling (Was: Comments on: "Character Set" Considered Harmful)

Gavin Nicol (gtn@ebt.com)
Fri, 28 Apr 95 04:20:22 EDT

>I propose the following usage for charset labeling:
>
> <META HTTP-EQUIV=Content-Type CONTENT="text/html; charset=iso-2022-jp">
>
..
>We still have the chicken-and-egg problem for canonical Unicode as
>Larry Masinter pointed out.

To say the very least...
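
To make the problem concrete, here is a minimal sketch in Python. The
helper name sniff_charset is hypothetical, and I am assuming that
"canonical Unicode" means a 16-bit form, with Python's "utf-16-be" codec
standing in for it. A reader that scans the raw bytes for an ASCII
"charset=" label finds it in an ASCII-superset encoding such as
iso-2022-jp, but cannot find it in 16-bit Unicode, because it would
already need to know the encoding in order to read the tag.

```python
import re

# Matches an ASCII "charset=<label>" sequence in raw, undecoded bytes.
META_CHARSET = re.compile(rb"charset=([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def sniff_charset(data: bytes):
    """Return the lowercased charset label found in raw bytes, or None.

    Hypothetical helper: it only works when the tag characters are stored
    as plain ASCII bytes, which is exactly the chicken-and-egg limitation.
    """
    match = META_CHARSET.search(data)
    return match.group(1).decode("ascii").lower() if match else None


TAG = '<META HTTP-EQUIV=Content-Type CONTENT="text/html; charset=iso-2022-jp">'

# ASCII-superset encoding: the label is visible before decoding.
found_in_jp = sniff_charset(TAG.encode("iso-2022-jp"))

# 16-bit Unicode (assumed form): every character is two bytes, so the
# ASCII pattern never matches and the label cannot be read.
found_in_unicode = sniff_charset(TAG.encode("utf-16-be"))
```

Here found_in_jp is "iso-2022-jp" and found_in_unicode is None.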

> A couple of possible remedies:
> (1) restricting HTML to the UTF-8 form of Unicode

I do not think this is reasonable. It places an artificial limitation on
the data we can handle.

> (2) use "filename.html" ("filename.htm" on Windows and CD-ROMs) for
> "ASCII-tag-character-superset encodings" and use "filename.uhtml"
> ("filename.uht" on Windows and CD-ROMs).

I do not think this is reasonable either.

How about defining a new MIME type for filesystem-based documents (or
perhaps just storing such documents together with their MIME headers)?
Why is it absolutely vital to store the documents as bare HTML when that
is obviously insufficient to identify the encoding?

One could assume that documents with the .htm(l) extension use
ISO-8859-1 (the default, same as HTTP), and that documents with a .www
(or whatever) extension carry HTTP headers.
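
A rough sketch of that rule, in Python, might look like the following.
The function name charset_for is hypothetical, ".www" is only the
placeholder extension from above, and I am assuming the headers sit at
the start of the file in ordinary MIME form, followed by a blank line
and then the document body.

```python
from email import message_from_bytes
from pathlib import PurePath

# Default assumed for plain .htm/.html files, same as the HTTP default.
DEFAULT_CHARSET = "iso-8859-1"


def charset_for(filename: str, data: bytes) -> str:
    """Pick a charset from the file extension (hypothetical helper).

    .htm/.html -> assume the ISO-8859-1 default.
    .www       -> read the charset from MIME headers stored in the file,
                  falling back to the default if none is given.
    """
    ext = PurePath(filename).suffix.lower()
    if ext in (".htm", ".html"):
        return DEFAULT_CHARSET
    if ext == ".www":
        # Parse the leading header block; the body after the blank line
        # is left untouched.
        headers = message_from_bytes(data)
        return headers.get_content_charset(DEFAULT_CHARSET)
    raise ValueError(f"unrecognized extension: {ext!r}")


stored = b"Content-Type: text/html; charset=ISO-2022-JP\n\n<html>...</html>"
plain_charset = charset_for("index.html", b"<html>...</html>")
labelled_charset = charset_for("index.www", stored)
```

With these inputs plain_charset is "iso-8859-1" and labelled_charset is
"iso-2022-jp". The point of the design is that the label travels with
the document, outside the encoded body, so it can be read before any
decoding takes place.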