To say the very least...
> A couple possible remedies:
> (1) restricting HTML to UTF8 form of Unicode
I do not think this is reasonable. It places an artificial limitation
on the data we can handle.
> (2) use "filename.html" ("filename.htm" on Windows and CDROMS) for
> "ASCII-tag-character-superset encodings" and use "filename.uhtml"
> ("filename.uht" for Windows and CDROMS).
I do not think this is reasonable either.
How about defining a new MIME type for filesystem-based documents (or
perhaps just storing such documents together with their MIME headers)?
Why is it absolutely vital to store the documents as bare HTML when
that is obviously insufficient to identify the encoding?
One could assume that documents with a .htm(l) extension are in
ISO-8859-1 (the default, same as HTTP), and that documents with a .www
(or whatever) extension are stored with their HTTP headers in front of
the body.
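
As a rough sketch of the convention above (Python, purely illustrative):
the function name decode_document is made up, ".www" is only the
placeholder extension suggested here, and falling back to ISO-8859-1
when the stored headers carry no charset is my assumption, not part of
any specification.

```python
import os
from email.parser import BytesParser
from email.policy import default as default_policy
from typing import Tuple

# Assumed default for bare HTML files, mirroring the HTTP default.
DEFAULT_CHARSET = "iso-8859-1"


def decode_document(filename: str, raw: bytes) -> Tuple[str, str]:
    """Return (charset, text) for a document read from the filesystem.

    Hypothetical helper: the extension decides whether the file is bare
    HTML or a body preceded by MIME/HTTP-style headers.
    """
    ext = os.path.splitext(filename)[1].lower()

    if ext in (".htm", ".html"):
        # Bare HTML: no headers stored, so assume the default charset.
        return DEFAULT_CHARSET, raw.decode(DEFAULT_CHARSET)

    if ext == ".www":
        # Headers, a blank line, then the body, as in an HTTP response
        # without the status line.
        msg = BytesParser(policy=default_policy).parsebytes(raw)
        # Charset comes from the Content-Type header if one is given.
        charset = msg.get_content_charset() or DEFAULT_CHARSET
        body = msg.get_payload(decode=True)
        return charset, body.decode(charset)

    raise ValueError("unrecognised extension: " + ext)
```

A file saved as "page.www" would then start with a line such as
"Content-Type: text/html; charset=utf-8", followed by a blank line and
the document itself, while "page.html" stays exactly as it is today.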