Re: SGML/MIME charsets, ERCS/Unicode [was: New DTD (final version?) ]

Roy T. Fielding (fielding@avron.ICS.UCI.EDU)
Thu, 9 Feb 95 15:53:42 EST

Dan writes [in response to Gavin]:

>>It was long ago decided in the HTTP working group that HTTP does not
>>require strict conformance to MIME in this area. It was also noted
>>that this is not in the MIME *standard*, but rather in a draft being
>>circulated.
>
> So we can agree that we haven't seen the last word on this. I think
> that it's silly for HTTP to fail to interoperate cleanly with MIME.
> I think there will be some give and take on both sides... we'll see.

I should not be saying it on this list, but I guess it has to be said.
HTTP is not a MIME-conformant application! It is not and will never be,
unless some god swoops down and magically updates every SMTP server and
gateway on the Internet.

HTTP does interoperate cleanly with MIME -- all that is required is a
gateway that converts HTTP messages to something that is MIME-conformant
(which is the same thing that any valid implementation of sendmail or
mmdf does). No conversion is necessary going the other way.
The HTTP spec already includes a section on the differences between
MIME and HTTP in order to simplify the conversion task.
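
To make the gateway's job concrete, here is a minimal sketch of one
conversion step, assuming the gateway only needs to add a MIME-Version
header and a mail-safe transfer encoding. The function name
http_entity_to_mime is hypothetical, and line-ending and date
normalization are left out.

```python
import base64


def http_entity_to_mime(headers, body):
    """Hypothetical gateway step: label an HTTP entity for mail transport.

    headers: dict of header name -> value taken from the HTTP message
    body:    the entity body as raw bytes
    """
    out = dict(headers)
    # Assumption: the HTTP message may not carry MIME-Version, so add it.
    out.setdefault("MIME-Version", "1.0")
    try:
        body.decode("ascii")
    except UnicodeDecodeError:
        # 8-bit data (e.g. ISO-8859-1 text) cannot cross a 7-bit gateway
        # unmodified, so encode it and say so in the header.
        out["Content-Transfer-Encoding"] = "base64"
        payload = base64.encodebytes(body)
    else:
        # ASCII-only body: label it 7bit. Line-length limits and CRLF
        # normalization are omitted in this sketch.
        out["Content-Transfer-Encoding"] = "7bit"
        payload = body
    return out, payload
```

Nothing comparable is needed in the other direction, since a message
that is already MIME-conformant can be carried over HTTP as-is.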

Finally, regarding character set issues: they don't belong here.
HTML should be defined independently of the document character set to
whatever extent is possible under SGML. Under no circumstances is this
group tasked to fix SGML such that it better handles character sets.
Under no circumstances will this group ever require that Web clients
and/or browsers use a specific character set other than ISO-8859-1 --
making it easy to use other character sets is desirable, but defining
a lingua franca is absolutely out of the question for this working group.

The same goes for HTTP -- it should be possible to transmit documents
in any character set using HTTP. Under no circumstances will the http-wg
ever require that Web clients and/or browsers use a specific character
set other than ISO-8859-1.

ISO-8859-1 is required because at least one character set must be
required, and ISO-8859-1 was the most appropriate 8-bit,
ASCII-inclusive set when the Web was invented.

If you want to talk about lingua francas and what-the-parser-should-do
and the future of the web, etc., it should be done on www-talk. Setting
standards for internal browser and server implementations is not a job
for the IETF.

If people are looking for something to fight for regarding Unicode, let
me suggest that they first get the three (4?) variations of Unicode
registered with IANA such that I can include their official names in the
HTTP/1.0 specification. It's damn difficult to provide for character set
negotiation when there is no single standard for the character set name.
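
A rough sketch of why the name matters, assuming negotiation is a plain
name match between what the client lists and what the server can
supply. The function pick_charset and the labels "x-unicode-a" and
"x-unicode-b" are made up for illustration; they are not registered
names.

```python
def pick_charset(accepted, available):
    """Return the first server charset whose name the client listed.

    accepted:  comma-separated charset names sent by the client
    available: list of charset names the server can supply
    Names are compared case-insensitively; returns None if nothing matches.
    """
    wanted = {name.strip().lower() for name in accepted.split(",")}
    for name in available:
        if name.lower() in wanted:
            return name
    return None


# Both sides use the same label: the match succeeds.
print(pick_charset("ISO-8859-1, x-unicode-a", ["X-Unicode-A"]))

# Two different labels for what might be the same character set:
# the match fails even though both sides could handle the data.
print(pick_charset("x-unicode-a", ["x-unicode-b"]))
```

With one registered name per variation, the second case could not
happen.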

.....Roy Fielding ICS Grad Student, University of California, Irvine USA
<fielding@ics.uci.edu>
<URL:http://www.ics.uci.edu/dir/grad/Software/fielding>