Re: Charset labelling (Was: Comments on: "Character Set" Considered

Glenn Adams (glenn@stonehand.com)
Fri, 28 Apr 95 10:10:29 EDT

Date: Fri, 28 Apr 95 08:28:20 EDT
From: Gary.Adams@east.sun.com (Gary Adams - Sun Microsystems Labs BOS)

[...discussion of HTTP-EQUIV mode of tagging...]

In either case the server will open the file, process the header lines,
and output the appropriate information on the wire.

As has been pointed out elsewhere, one can't parse an SGML file to obtain the
character set tag without knowing the character set which the tag purports
to identify. To quote from ISO 8879:1986, clause 13.1, which defines
the production [172] "document character set" in the SGML declaration:

NOTE -- It is recognized that the recipient of a document must be able
to translate it to his system character set before the document can be
processed by machine. There are two basic approaches to communicating
this information:

a) If the character set is standard, registered, or otherwise capable of
being referenced by an identifying name or number, that identifier can
be communicated to the recipient of the document. The communication
                                                  ^^^^^^^^^^^^^^^^^
must necessarily occur outside of the document...
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

And the last para. on pg. 452 of the SGML Handbook, referring to the above,
says:

"As the last note implies, the document character set parameter is ignored
by the SGML parser because the document is already in the document character
set. The parameter is intended for a human to read in printed form, in
order to determine how to translate an incoming document to the local
system character set."
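
To make the circularity concrete, here is a rough sketch in Python (my
choice of language, not anything from the thread). The META line, the helper
names, and the three codec names are assumptions picked for illustration.
It encodes the same label in three character sets and then scans the raw
bytes for an ASCII "charset=" marker, as a recipient who has not yet been
told the character set might try to do.

```python
# Hypothetical in-document label; {name} is the charset it purports to identify.
LABEL = '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset={name}">'


def encode_labelled(name: str) -> bytes:
    """Encode the label in the very character set that it names."""
    return LABEL.format(name=name).encode(name)


def find_label_naively(raw: bytes) -> bool:
    """Scan undecoded bytes for the marker, assuming ASCII-compatible bytes."""
    return b"charset=" in raw


def decode_with_external_label(raw: bytes, charset: str) -> str:
    """Decode using a charset name communicated outside the document."""
    return raw.decode(charset)


# Codec names are Python's; they stand in for a Latin-1, a 16-bit Unicode,
# and an EBCDIC document respectively.
results = {
    name: find_label_naively(encode_labelled(name))
    for name in ("iso-8859-1", "utf-16-be", "cp037")
}
print(results)

# With the name supplied out of band, the EBCDIC bytes decode without guessing.
print(decode_with_external_label(encode_labelled("cp037"), "cp037"))
```

The naive scan finds the label only when the document happens to be in an
ASCII-compatible character set; for the 16-bit and EBCDIC cases the label is
present but unreadable until the character set is known by some other means.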

Regards,
Glenn