Re: HTML-WG digest 107

Daniel W. Connolly (connolly@beach.w3.org)
Tue, 18 Jul 95 07:20:23 EDT

In message <183C60115@pmail.accent.co.il>, "David Baron" writes:

>We propose to do all within HTML. (The HTTP protocol does
>allow 8-bit asciis, etc., but the SGML DTD
>mandates latin-1 and the current HTML spec mandates 7-bits.

Hello? Where does the HTML spec mandate 7 bits? Please
cite your source.

Please see:

http://www.w3.org/hypertext/WWW/MarkUp/html-spec/html-spec_4.html#SEC20

..

Charset
The charset parameter (as defined in section 7.1.1 of RFC
1521[MIME]) may be given to specify the character encoding
scheme used to represent the HTML document as a sequence of
octets. The default value is outside the scope of this specification;
but for example, the default is `US-ASCII' in the context of
MIME mail, and `ISO-8859-1' in the context of HTTP.

HTML Document Representation

A message entity with a content type of `text/html' represents
an HTML document, consisting of a single text entity. The
`charset' parameter (whether implicit or explicit) identifies a
character encoding scheme. The text entity consists of the characters
determined by this character encoding scheme and the octets of the body
of the message entity.
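
As a rough illustration of the two paragraphs quoted above, here is a
minimal Python sketch. It is not part of the spec. The helper name
decode_html_entity, the DEFAULT_CHARSET table, and the "mail"/"http"
context labels are assumptions made up for this example. The point is
that the charset comes from the content type (or the transport's
default), never from the document markup.

```python
from email.message import Message

# Defaults quoted in the spec text; which one applies depends on the
# transport, not on the HTML document. This table is an assumption.
DEFAULT_CHARSET = {"mail": "us-ascii", "http": "iso-8859-1"}


def decode_html_entity(octets: bytes, content_type: str, context: str) -> str:
    """Hypothetical helper: turn a message body's octets into characters."""
    header = Message()
    header["Content-Type"] = content_type
    # An explicit charset parameter wins; otherwise use the implicit default.
    charset = header.get_content_charset() or DEFAULT_CHARSET[context]
    return octets.decode(charset)
```

For example, decode_html_entity(body, "text/html; charset=ISO-8859-8",
"http") would decode with the explicit charset, while a bare "text/html"
over HTTP falls back to ISO-8859-1.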

>UTF-7 is expressed in latin-1 characters.

To be more precise (or at least, to use the language of
the HTML 2.0 spec), UTF-7 is expressed in octets, without
use of octets 128-255.
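
A quick way to check this claim, sketched in Python using its built-in
"utf-7" codec. The helper name uses_only_7bit_octets and the sample
string are assumptions chosen for illustration.

```python
def uses_only_7bit_octets(octets: bytes) -> bool:
    """Hypothetical helper: True if no octet falls in the range 128-255."""
    return all(octet < 128 for octet in octets)


# Sample Hebrew text (an arbitrary choice for this sketch).
text = "\u05e9\u05dc\u05d5\u05dd"

utf7_octets = text.encode("utf-7")
latin8_octets = text.encode("iso-8859-8")

seven_bit_utf7 = uses_only_7bit_octets(utf7_octets)      # UTF-7 stays below 128
seven_bit_8859 = uses_only_7bit_octets(latin8_octets)    # ISO-8859-8 does not
round_trip_ok = utf7_octets.decode("utf-7") == text
```

The same characters can thus travel as 7-bit octets or as 8-bit octets;
which one is in use is a property of the encoding scheme, not of the
characters.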

>We will propose additions to HTML3 for specifying other
>encodings, paragraph reading-order for bidi, etc.

Please don't make up markup for character encoding schemes.
Markup for languages, writing directions, etc. is appropriate.
Markup for character encoding schemes is not.

See also:

http://www.ics.uci.edu/pub/ietf/html/draft-ietf-html-charset-harmful-00.txt
HTML Working Group D. Connolly
INTERNET-DRAFT MIT/W3C
draft-ietf-html-charset-harmful-00.txt May 2, 1995
Expires November, 1995

Dan