Re: Comments on: "Character Set" Considered Harmful

Albert Lunde (Albert-Lunde@nwu.edu)
Wed, 26 Apr 95 22:51:22 EDT

> >>Can anyone think of cases where the charset parameter will *not*
> >>suffice? I have a nagging feeling, but nothing firm in my mind...
> >
> >I would like the same files to be used locally (e.g., CDROMs). In that
> >case, I would not have a charset parameter.
> Good point!
> How does your server know how to label the content it sends?

One approach that would avoid SGML issues would be to define
a standard suggested "wrapper/header" format for meta-information
about HTML files that wouldn't be passed over the wire in the document
body or handed over to SGML. (MIME may be an option....)

Implementations would be free to represent meta-information in other
ways, but this would provide a common format for disk interchange.
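
As a rough sketch of what reading such a wrapper might look like, here is
a Python illustration. The layout is purely my assumption, not anything
standardized: ASCII MIME-style header lines, one blank line, then the
document bytes in whatever encoding the header names. The function name
read_wrapped is hypothetical.

```python
import email

def read_wrapped(raw: bytes):
    """Split a wrapped file into (charset, decoded body text).

    Assumed layout (hypothetical): ASCII MIME-style header lines,
    a single blank line, then the document bytes in the encoding
    named by the Content-Type charset parameter.
    """
    head, sep, body = raw.partition(b"\n\n")
    if not sep:
        raise ValueError("no blank line between header and body")
    # Only the header is parsed as a MIME header block; the body
    # bytes never pass through the header parser.
    msg = email.message_from_string(head.decode("ascii"))
    charset = msg.get_content_charset() or "us-ascii"
    return charset, body.decode(charset)

sample = b"Content-Type: text/html; charset=ISO-8859-1\n\n<P>caf\xe9</P>\n"
charset, text = read_wrapped(sample)
```

A server could strip the wrapper and turn the header into the charset
parameter it sends; a local browser could read it straight off the disk.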

The downside is that this could make more work for servers.

Offhand, this looks like a can of worms because of the mix
of HTML, HTTP, MIME, and SGML issues lurking around the edges.

Still, I'd suspect that forcing all character encodings to be
supersets of US-ASCII so that meta-information including charset
can be read from tags would be more gross on the SGML end and
might close out some other options (EBCDIC?).
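
To illustrate why in-document labels lean on US-ASCII compatibility, here
is a small Python sketch. The META-style tag, the label "IBM037", and the
helper names are assumptions for illustration only; cp037 is one EBCDIC
codec that Python ships.

```python
import re

# Scans raw bytes for an ASCII-encoded "charset=" label.
_CHARSET_RE = re.compile(rb"charset=([A-Za-z0-9_.:-]+)", re.IGNORECASE)

def sniff_charset(raw: bytes):
    """Return the lowercased charset label found in raw bytes, or None."""
    m = _CHARSET_RE.search(raw)
    return m.group(1).decode("ascii").lower() if m else None

def make_doc(label: str) -> str:
    # Hypothetical in-document label carried in a tag.
    return ('<META HTTP-EQUIV="Content-Type" '
            f'CONTENT="text/html; charset={label}">\n<P>caf\u00e9</P>')

latin1_bytes = make_doc("ISO-8859-1").encode("latin-1")
ebcdic_bytes = make_doc("IBM037").encode("cp037")

found_latin1 = sniff_charset(latin1_bytes)  # label is visible as ASCII
found_ebcdic = sniff_charset(ebcdic_bytes)  # label bytes are not ASCII
```

The same tag encoded in EBCDIC has no ASCII "charset=" byte sequence in
it at all, so a reader that doesn't already know the encoding has
nothing to find.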

Another problem with trying to put the charset in the same file
as the document is that you then have to parse this information
and correct it when translating encodings. If I were an implementor,
I might favor a second file for meta-information.
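
A minimal Python sketch of the second-file idea follows. The ".meta"
suffix, the one-line sidecar format, and the function names are all
hypothetical. The point is only that a transcoder rewrites the document
bytes and the label without parsing the document itself.

```python
import tempfile
from pathlib import Path

def sidecar_for(doc: Path) -> Path:
    """Hypothetical convention: page.html -> page.html.meta"""
    return doc.with_name(doc.name + ".meta")

def transcode(doc: Path, new_charset: str) -> None:
    """Re-encode doc and update its sidecar label to match."""
    meta = sidecar_for(doc)
    old_charset = meta.read_text(encoding="ascii").strip()
    text = doc.read_bytes().decode(old_charset)
    doc.write_bytes(text.encode(new_charset))
    # The label lives outside the document, so no markup is touched.
    meta.write_text(new_charset + "\n", encoding="ascii")

with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "page.html"
    doc.write_bytes("<P>caf\u00e9</P>".encode("iso-8859-1"))
    sidecar_for(doc).write_text("iso-8859-1\n", encoding="ascii")
    transcode(doc, "utf-8")
    new_bytes = doc.read_bytes()
    new_label = sidecar_for(doc).read_text(encoding="ascii").strip()
```

The cost is the usual one for sidecar files: the two can get separated
or drift out of sync when files are copied around.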

If we can't "solve" this promptly I'd be tempted to say for 2.x (small x)
that representing meta-information for documents on disk
is an implementation issue, and look for nicer fixes
later.

-- 
    Albert Lunde                      Albert-Lunde@nwu.edu